Powers and Pitfalls in Sequence Analysis: The 70% Hurdle

Peer Bork
European Molecular Biology Laboratory (EMBL), 69012 Heidelberg, Germany, and Max-Delbrück-Centrum, D-13122 Berlin-Buch, Germany

This extract was created in the absence of an abstract.

High-throughput technologies impress us almost every week with novel global results and big numbers. They often reveal important general trends that are impossible to realize with classical, low-throughput experimental methods, yet (so far) they provide fewer insights into specific, molecular detail. Because of the amount of data involved, high-throughput technologies imply the use of bioinformatics methods that deal with information transformation, storage, and analysis. By necessity, most of these processes are automated.

Partly because of the nature of current publication schemes, the accuracy and error margins of a given method are often found only in the small print. It is obvious that each method has its limits and that some information will be lost or diluted during data processing. Because of the current need to integrate and add value to data, results from high-throughput experiments (if made publicly accessible) are often taken further by third-party research that relies on the quality of these data. Thus, I believe that public awareness of error margins for high-throughput experimental and computational methods should be increased; the incredibly valuable data accumulating in various heterogeneous databases permit powerful analyses but should not be overinterpreted. In the following discussion, I will concentrate on the limits of computational sequence analysis, which is far from perfect (Table 1), even though sequencing itself is highly automated and accurate and sequence information is described in simple linear terms (using a four-letter alphabet). On average, an accuracy of 70% in predicting functional and structural features has to be considered a success (Table 1).

Table 1. Selected Examples of Prediction Accuracy in Different Areas of Sequence Analysis

Limitations in the Total Knowledge Base of Protein Function

As these analysis methods are knowledge based, one of …
