  • Perspective

Target-decoy search strategy for increased confidence in large-scale protein identifications by mass spectrometry

Abstract

Liquid chromatography coupled with tandem mass spectrometry (LC-MS/MS) has become the preferred method for conducting large-scale surveys of proteomes. Automated interpretation of tandem mass spectrometry (MS/MS) spectra can be problematic, however, for a variety of reasons. As most sequence search engines return results even for 'unmatchable' spectra, proteome researchers must devise ways to distinguish correct from incorrect peptide identifications. The target-decoy search strategy represents a straightforward and effective way to manage this effort. Despite the apparent simplicity of this method, some controversy surrounds its successful application. Here we clarify our preferred methodology by addressing four issues based on observed decoy hit frequencies: (i) the major assumptions made with this database search strategy are reasonable; (ii) concatenated target-decoy database searches are preferable to separate target and decoy database searches; (iii) the theoretical error associated with target-decoy false positive (FP) rate measurements can be estimated; and (iv) alternate methods for constructing decoy databases are similarly effective once certain considerations are taken into account.
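As a rough illustration of the concatenated target-decoy database mentioned in point (ii), the sketch below appends one reversed copy of each target protein to the target set. It is a minimal sketch, not the authors' implementation: the function name `build_concatenated_db`, the `REV_` prefix and the toy sequences are assumptions made for this example.

```python
def build_concatenated_db(target, decoy_prefix="REV_"):
    """Return the target entries plus one reversed decoy entry per protein.

    `target` maps protein names to amino acid sequences. The decoy prefix
    is a hypothetical naming convention used only in this sketch.
    """
    combined = dict(target)  # copy so the caller's dictionary is untouched
    for name, sequence in target.items():
        # A reversed decoy keeps the length and amino acid composition
        # of the original protein but reads it back to front.
        combined[decoy_prefix + name] = sequence[::-1]
    return combined


# Toy sequences, invented for illustration only.
target = {"P1": "MKTAYIAKQR", "P2": "GSHMLEDPVK"}
db = build_concatenated_db(target)
print(sorted(db))  # ['P1', 'P2', 'REV_P1', 'REV_P2']
```

Searching this single combined database lets target and decoy sequences compete for each spectrum, which is the situation the abstract contrasts with separate target and decoy searches.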

Figure 1: Overlap between target (forward) and decoy (reversed) sequences is negligible.
Figure 2: The distribution of potential peptide matches is consistent between target and decoy databases at several mass tolerances.
Figure 3: Incorrect identifications are equally selected from target and decoy (reversed) sequences.
Figure 4: Separate searching overestimates FP rates by underestimating low-scoring correct identifications.
Figure 5: Estimating the theoretical error associated with target-decoy estimations.
Figure 6: Evaluation of alternate decoy databases.
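
The overlap check summarized in the Figure 1 caption can be approximated with a small in-silico digest. The sketch below is illustrative only: it assumes cleavage after every lysine (K) or arginine (R), ignores missed cleavages, and uses a hypothetical minimum peptide length of six residues; none of these settings is taken from the article.

```python
def tryptic_peptides(sequence, min_length=6):
    """Split a protein after each K or R and keep peptides of usable length."""
    peptides = set()
    start = 0
    for i, residue in enumerate(sequence):
        if residue in "KR":
            if i + 1 - start >= min_length:
                peptides.add(sequence[start:i + 1])
            start = i + 1
    # Keep the C-terminal remainder if it is long enough.
    if len(sequence) - start >= min_length:
        peptides.add(sequence[start:])
    return peptides


def overlap_fraction(target_sequences):
    """Fraction of target peptides that also occur among reversed-decoy peptides."""
    target_peps = set()
    decoy_peps = set()
    for sequence in target_sequences:
        target_peps |= tryptic_peptides(sequence)
        decoy_peps |= tryptic_peptides(sequence[::-1])
    if not target_peps:
        return 0.0
    return len(target_peps & decoy_peps) / len(target_peps)


# Toy sequence, invented for illustration only.
print(overlap_fraction(["MKTAYIAKQRGSHMLEDPVK"]))  # 0.0
```

A fraction near zero on a real database would be in line with the negligible overlap the caption describes; a large fraction would suggest that decoy hits cannot be cleanly separated from target hits.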

Acknowledgements

This work was supported in part by the US National Institutes of Health (grants GM67945 and HG00041 to S.P.G.). We thank S. Beausoleil, P. Everley, S. Gerber and W. Haas for continuing and insightful discussions, and Sage-N for implementing our idea of the pseudo-reversed searches on their SEQUEST platform.

Author information

Corresponding author

Correspondence to Steven P Gygi.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Supplementary information

Supplementary Fig. 1

False positive identifications can be estimated by doubling decoy hits from a search against a concatenated target/decoy database. (PDF 416 kb)
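
The doubling rule in this caption can be written out as a short calculation. The sketch below assumes that each accepted hit from a concatenated search is labelled with its protein name and that decoy entries carry a hypothetical `REV_` prefix. Dividing the doubled decoy count by the total number of accepted hits (target plus decoy) to obtain a rate is an assumption of this sketch; the caption itself states only the doubling.

```python
def estimate_false_positives(hits, decoy_prefix="REV_"):
    """Estimate false positives among accepted hits from a concatenated search.

    `hits` is a list of protein names, one per accepted peptide
    identification. Returns (estimated_fp_count, estimated_fp_rate).
    """
    total = len(hits)
    decoy_hits = sum(1 for protein in hits if protein.startswith(decoy_prefix))
    # Incorrect matches are assumed to fall on target and decoy sequences
    # equally often, so each decoy hit implies one hidden incorrect target hit.
    estimated_fp = 2 * decoy_hits
    fp_rate = estimated_fp / total if total else 0.0
    return estimated_fp, fp_rate


# Toy hit list, invented for illustration only.
hits = ["P1", "REV_P2", "P3", "P4"]
print(estimate_false_positives(hits))  # (2, 0.5)
```

In practice the score threshold that defines an accepted hit would be adjusted until the estimated rate reaches an acceptable level.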

Supplementary Fig. 2

The distribution of potential peptide matches is consistent between target and decoy databases. (PDF 22 kb)

Supplementary Fig. 3

Example supporting the necessity for target/decoy competition. (PDF 47 kb)

Supplementary Fig. 4

Relative scores shift to smaller values for less than half of peptide hits when searched against composite target-decoy databases as opposed to separate databases. (PDF 41 kb)

Supplementary Fig. 5

Using decoy hits to guide the choice of appropriate selection criteria. (PDF 160 kb)

Supplementary Table 1

Slopes of best-fit lines for precision values shown in Figure 5b. (PDF 14 kb)

Supplementary Methods (DOC 43 kb)

About this article

Cite this article

Elias, J., Gygi, S. Target-decoy search strategy for increased confidence in large-scale protein identifications by mass spectrometry. Nat Methods 4, 207–214 (2007). https://doi.org/10.1038/nmeth1019

  • DOI: https://doi.org/10.1038/nmeth1019
