Review
Approaches for systematic proteome exploration

https://doi.org/10.1016/j.bioeng.2007.01.001

Abstract

With the recent completion of the Human Genome Project, gene function, protein abundance and expression patterns in tissues and cell types have emerged as central areas of interest for the scientific community. A mapped human proteome will extend the value of the genome sequence, and large-scale efforts aimed at elucidating protein localization, abundance and function are invaluable for biomarker and drug discovery. This research area, termed proteomics, is more demanding than any genome sequencing effort, and performing it on a large scale is a highly diverse task. The proteomics field therefore employs a range of methods to examine different aspects of the proteome, including protein localization, protein–protein interactions, posttranslational modifications and alterations of protein composition (e.g. differential expression) in tissues and body fluids. Here, some of the most commonly used methods, including chromatographic separations combined with mass spectrometry and a number of affinity proteomics concepts, are discussed and exemplified.

Introduction

With the completion of the human genome sequence (Consortium, 2004; Lander et al., 2001; Venter et al., 2001), the main focus has shifted from sequencing to annotation of gene function and further to the analysis of protein expression patterns in body fluids, tissues and cells of different origin. A driving force in genomics and proteomics is the aim of shedding light on the cellular events that decide the fate of an organism or a cell. However, precise knowledge of protein function is not always necessary in an initial phase; even the bare recognition of a differentially expressed protein is of high interest. Such proteins are potential marker molecules that may be utilized for diagnostics, treatment or even prevention of severe diseases such as various cancers. The impact of proteins as targets in drug development is reflected by the fact that more than 80% of all available pharmaceutical drugs act through proteins (Drews, 2000).

Today, genome sequences from more than 400 organisms, representing all branches of life (archaeal, bacterial and eukaryal), are publicly available (listed at www.genomesonline.org), and the number of sequenced genomes increases every day. Currently, 1600 genomes are being sequenced in ongoing projects (Liolios et al., 2006). The available genome sequences provide useful information to the proteomics community. From the DNA sequence it is possible to predict which proteins may be present in a sample. Furthermore, a known gene sequence is commonly used to identify and verify proteins detected by various proteomics methods. Modern sequencing technologies provide highly accurate sequences. For example, the human genome is 99% complete and the predicted error rate is about 1 event per 100 000 bases (Consortium, 2004). Altogether, the human genome comprises more than three billion bases, and the number of protein-coding genes is predicted to be over 23 000 (Ensembl version 40, August 2006, www.ensembl.org). However, most genomes still remain to be functionally annotated, and a comprehensive proteome exploration is even more challenging than any genome sequencing effort. While genomes are essentially static over time, the composition and levels of transcribed mRNAs and expressed proteins fluctuate continuously.
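As a minimal illustration of how a known gene sequence can be turned into a theoretical protein and a list of candidate peptides, the sketch below translates a coding DNA sequence with the standard genetic code and then performs an in silico tryptic digest. The example sequence, the function names and the simplified trypsin rule (cleavage after K or R unless the next residue is P, no missed cleavages) are assumptions made for illustration; they are not taken from the methods reviewed here.

```python
# Standard genetic code, built from the conventional TCAG ordering of bases.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(cds: str) -> str:
    """Translate a coding DNA sequence, stopping at the first stop codon."""
    protein = []
    for pos in range(0, len(cds) - 2, 3):
        residue = CODON_TABLE[cds[pos:pos + 3].upper()]
        if residue == "*":  # stop codon ends the protein
            break
        protein.append(residue)
    return "".join(protein)


def tryptic_digest(protein: str) -> list[str]:
    """Split a protein after K or R, except when the next residue is P."""
    peptides, start = [], 0
    for i, residue in enumerate(protein):
        next_residue = protein[i + 1] if i + 1 < len(protein) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):  # C-terminal peptide without a cleavage site
        peptides.append(protein[start:])
    return peptides


if __name__ == "__main__":
    example_cds = "ATGGCCAAGTGGCGTCCCGAATAA"  # hypothetical toy sequence
    protein = translate(example_cds)
    print(protein)                  # MAKWRPE
    print(tryptic_digest(protein))  # ['MAK', 'WRPE']
```

Theoretical peptide lists of this kind are what a database search compares against experimentally observed peptides; a real workflow would also handle missed cleavages, modifications and peptide masses.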

Proteomes of higher eukaryotes are very complex, and several properties make proteomic exploration enormously demanding. For instance, the dynamic range of a biological sample spans more than 10 orders of magnitude in the relative abundance of different proteins (Anderson and Anderson, 2002). This range makes it difficult to detect rare protein species without extensive preparatory work such as fractionation or depletion of several abundant protein species. Other complicating factors are alternative promoter usage, alternative splicing and post-translational modifications (PTMs) (Godovac-Zimmermann et al., 2005). The three-dimensional structure and the range of physical and chemical properties among proteins also have to be considered, as they make some proteins less suitable for study with certain approaches. Moreover, many proteins are fragile and prone to degradation, so sample handling is a crucial part of many proteomics methods. Facing the complexity of comprehensive proteome analysis, the proteomics field has employed a broad range of methods to study different aspects, including protein localization, protein–protein interactions, posttranslational modifications and alterations of protein composition in tissues and body fluids as a consequence of disease or developmental state.

In many cases, a useful strategy to significantly reduce the complexity of the analysis is to deal with sub-proteomes, i.e. to focus on proteins from a predefined protein class, a specific body fluid or tissue, a certain organelle, or proteins with certain PTMs or from particular protein complexes. To obtain a less complex sample, suitable for a more focused sub-proteome analysis, classical biochemical operations such as centrifugation and filtration are often employed. Furthermore, fractionation of proteins with respect to molecular size (Chertov et al., 2005; Tirumalai et al., 2003), hydrophobicity (Yuan and Desiderio, 2005) and isoelectric point (Davidsson et al., 2002) has been reported. Subsequent detection of proteins in the different fractions is then facilitated by the lower sample complexity and the possibility of analyzing material corresponding to larger initial sample volumes. Enrichment of proteins containing certain post-translational modifications is performed by applying various techniques for affinity capture, including immunoprecipitation. Phosphorylated proteins are most frequently selected by immobilized metal-ion affinity chromatography (IMAC) using various metal ions, including Fe3+, Ga3+, Al3+ or Zr4+, as the ligand (Reinders and Sickmann, 2005). Recently, a novel strategy based on TiO2 columns has also been reported (Sano and Nakamura, 2004).

While the above-mentioned methods are relatively generic, other, more selective strategies for depleting abundant proteins require knowledge of the sample under study. For example, the dominant protein in human plasma, human serum albumin (HSA), constitutes over 50% of the total protein content and is hence the first target of choice for plasma depletion strategies. Depletion is often achieved by passing the sample over a highly selective affinity medium. For example, HSA depletion is most commonly achieved on Cibacron Blue-Sepharose media (Travis et al., 1976) or using monoclonal antibodies (Greenough et al., 2004; Steel et al., 2003). To reach even lower detection limits, additional abundant proteins have to be removed. Several studies have compared different depletion strategies in investigations of human plasma (Bjorhall et al., 2005; Gong et al., 2006; Ramström et al., 2005). One should bear in mind that when depletion procedures are applied, other peptides and proteins may be co-depleted and thus not detected in the subsequent analysis (Granger et al., 2005). This risk is, however, in many cases counterbalanced by the improved detection limits achieved for many other proteins in the sample. In addition to fractionation and depletion, consistent sample treatment in general is of utmost importance for achieving results that are comparable between laboratories and between experiments.
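The gain from depleting a single dominant protein can be estimated with simple arithmetic. The sketch below assumes that the same total amount of protein is loaded before and after depletion and that no other proteins are co-depleted; the function and parameter names are hypothetical and chosen only for this illustration.

```python
def enrichment_factor(abundant_fraction: float,
                      depletion_efficiency: float = 1.0) -> float:
    """Fold increase in the relative share of the remaining proteins.

    abundant_fraction: share of total protein made up by the depleted
        protein (0.5 for a protein that is 50% of the total).
    depletion_efficiency: share of that protein actually removed
        (1.0 means complete removal). Assumes no co-depletion.
    """
    if not 0.0 <= abundant_fraction < 1.0:
        raise ValueError("abundant_fraction must be in [0, 1)")
    if not 0.0 <= depletion_efficiency <= 1.0:
        raise ValueError("depletion_efficiency must be in [0, 1]")
    removed = abundant_fraction * depletion_efficiency
    # The remaining proteins now make up (1 - removed) of the original
    # total, so at equal loading their share rises by 1 / (1 - removed).
    return 1.0 / (1.0 - removed)


if __name__ == "__main__":
    # A protein at 50% of total content, completely removed.
    print(enrichment_factor(0.5))  # 2.0
    # The same protein removed with an assumed 90% efficiency.
    print(round(enrichment_factor(0.5, 0.9), 2))
```

Under these assumptions, completely removing a protein that makes up half of the total content yields only a two-fold relative enrichment of everything else, which is consistent with the observation above that additional abundant proteins have to be removed to reach lower detection limits.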

The proteomics field is highly diverse, and several approaches are required to cover its many aspects, as they contribute different kinds of information about different protein categories. An attempt to coordinate this broad mission is the Human Proteome Organization (HUPO). HUPO is an international organization that supports proteomics initiatives in its member nations and aims to set standards and guidelines for how the scientific community should work together to map the human proteome (Hanash, 2004).

This article describes and exemplifies some of the most commonly employed methods in proteomics research. We focus on four main aspects of proteomics: expression analysis, localization, quantification and interactions. These areas complement each other to give a more global view of the proteome. Most of the methods mentioned are useful in more than one application and are relevant to several of the sections in this review.

Expression analysis

Protein expression is tightly regulated, and specific sets of proteins are expressed in different tissues and cell types. The protein composition of every single cell is highly dynamic and adapts to factors such as cell cycle progression and environmental fluctuations. One aspect of expression proteomics, among other definitions, is to investigate which protein species are present in a defined sample.

Given that there are 23 000 protein coding genes in the human genome, the actual number of protein species

Localization

Knowledge about protein localization provides useful information and might indicate which cellular pathways a certain protein functions in. Protein localization information also supports putative protein–protein interactions that have already been proposed by alternative methods (Kumar et al., 2002). In addition, information on the localization of a certain protein in a tissue, at the cellular and subcellular level, may provide knowledge of pathogenic mechanisms involved in different diseases (Uhlen

Interactions

Protein–protein interactions and interactions between proteins and other molecules are essential for life. In many cases knowledge about a protein's molecular environment, in terms of direct or indirect interaction partners, would add information sufficient to assign protein function. A number of different approaches aimed at mapping protein interactions have been developed. Depending on the type of interaction, different methods are preferred. When searching for stable complexes with high

Quantification

One of the most challenging and important tasks in proteomics research is biomarker discovery, the identification of biomolecules whose concentrations are altered in certain disorders. For these studies, quantitative data are required. Several methods, including serial analysis of gene expression (SAGE) (Velculescu et al., 1995) and quantitative real-time reverse transcriptase PCR (qRT-PCR), allow quantitative analysis of transcribed RNA molecules in a cell. Today, the DNA microarray

Conclusions

A wide variety of efforts to map the human proteome are currently in progress in the scientific community. Various approaches and techniques are used and different questions are raised. Fortunately, the possibility of mapping and understanding the proteomes of different organisms has increased greatly with the sequencing of a large number of genomes, including the human genome. The knowledge of the genetic code can be utilized in different ways. Either a genome-based approach can be used,

Acknowledgements

The authors would like to express their gratitude to Dr. Andersson-Svahn and Dr. Barbe for kindly providing Fig. 2.

References (114)

  • M. Janzi et al.

    Serum microarrays for large scale screening of protein levels

    Mol. Cell Prot.

    (2005)
  • J. Ji et al.

    Strategy for qualitative and quantitative analysis in proteomics based on signature peptides

    J. Chromatogr. B

    (2000)
  • L.W. Miller et al.

    Selective chemical labeling of proteins in living cells

    Curr. Opin. Chem. Biol.

    (2005)
  • V.K. Mootha et al.

    Integrated analysis of protein composition, tissue diversity, and gene regulation in mouse mitochondria

    Cell

    (2003)
  • F. Nilsson et al.

    The use of phage display for the development of tumour targeting agents

    Adv. Drug Deliv. Rev.

    (2000)
  • P.A. Nygren et al.

    Binding proteins from alternative scaffolds

    J. Immunol. Methods

    (2004)
  • P.H. O’Farrell

    High resolution two-dimensional electrophoresis of proteins

    J. Biol. Chem.

    (1975)
  • S.E. Ong et al.

    Stable isotope labeling by amino acids in cell culture, SILAC, as a simple and accurate approach to expression proteomics

    Mol. Cell Prot.

    (2002)
  • A. Skerra

    Lipocalins as a scaffold

    Biochim. Biophys. Acta

    (2000)
  • S. Stamm et al.

    Function of alternative splicing

    Gene

    (2005)
  • L.F. Steel et al.

    Efficient and specific removal of albumin from human serum samples

    Mol. Cell Prot.

    (2003)
  • J. Steen et al.

    High-throughput protein purification using an automated set-up for high-yield affinity chromatography

    Protein Exp. Purif.

    (2006)
  • R.S. Tirumalai et al.

    Characterization of the low molecular weight human serum proteome

    Mol. Cell Prot.

    (2003)
  • M.E. Belov et al.

    Zeptomole-sensitivity electrospray ionisation–Fourier transform ion cyclotron resonance mass spectrometry of proteins

    Anal. Chem.

    (2000)
  • H.K. Binz et al.

    High-affinity binders selected from designed ankyrin repeat protein libraries

    Nat. Biotechnol.

    (2004)
  • H.K. Binz et al.

    Engineering novel binding proteins from nonimmunoglobulin domains

    Nat. Biotechnol.

    (2005)
  • K. Bjorhall et al.

    Comparison of different depletion strategies for improved resolution in proteomic analysis of human serum samples

    Proteomics

    (2005)
  • B.L. Brizzard et al.

    Immunoaffinity purification of FLAG epitope-tagged bacterial alkaline phosphatase using a novel monoclonal antibody and peptide elution

    Biotechniques

    (1994)
  • G. Cagney et al.

    De novo peptide sequencing and quantitative profiling of complex protein mixtures using mass-coded abundance tagging

    Nat. Biotechnol.

    (2002)
  • P. Chaurand et al.

    New developments in profiling and imaging of proteins from tissue sections by MALDI mass spectrometry

    J. Proteome Res.

    (2006)
  • O. Chertov et al.

    Enrichment of low-molecular-weight proteins from biofluids for biomarker discovery

    Expert Rev. Proteomics

    (2005)
  • C.T. Chien et al.

    The two-hybrid system: a method to identify and clone genes for proteins that interact with a protein of interest

    Proc. Natl. Acad. Sci. U.S.A.

    (1991)
  • I.H.G.S. Consortium

    Finishing the euchromatic sequence of the human genome

    Nature

    (2004)
  • P. Davidsson et al.

    Identification of proteins in human cerebrospinal fluid using liquid-phase isoelectric focusing as a prefractionation step followed by two-dimensional gel electrophoresis and matrix-assisted laser desorption/ionisation mass spectrometry

    Rapid Commun. Mass Spectrom.

    (2002)
  • B. Domon et al.

    Mass spectrometry and protein analysis

    Science

    (2006)
  • J. Drews

    Drug discovery: a historical perspective

    Science

    (2000)
  • A.D. Ellington et al.

    In vitro selection of RNA molecules that bind specific ligands

    Nature

    (1990)
  • S. Fields et al.

    A novel genetic system to detect protein–protein interactions

    Nature

    (1989)
  • J. Godovac-Zimmermann et al.

    Perspectives in spicing up proteomics with splicing

    Proteomics

    (2005)
  • Y. Gong et al.

    Different immunoaffinity fractionation strategies to characterize the human plasma proteome

    J. Proteome Res.

    (2006)
  • D.R. Goodlett et al.

    Differential stable isotope labeling of peptides for quantitation and de novo sequence derivation

    Rapid Commun. Mass Spectrom.

    (2001)
  • A. Gorg et al.

    Current two-dimensional electrophoresis technology for proteomics

    Proteomics

    (2004)
  • J. Granger et al.

    Albumin depletion of human plasma also removes low abundance proteins including the cytokines

    Proteomics

    (2005)
  • C. Greenough et al.

    A method for the rapid depletion of albumin and immunoglobulin from human plasma

    Proteomics

    (2004)
  • S.P. Gygi et al.

    Quantitative analysis of complex protein mixtures using isotope-coded affinity tags

    Nat. Biotechnol.

    (1999)
  • S.P. Gygi et al.

    Correlation between protein and mRNA abundance in yeast

    Mol. Cell Biol.

    (1999)
  • E. Harlow et al.

    Antibodies: A Laboratory Manual

    (1988)
  • J.A. Heyman et al.

    Genome-scale cloning and expression of individual open reading frames using topoisomerase I-mediated ligation

    Genome Res.

    (1999)
  • T.P. Hopp et al.

    A short polypeptide marker sequence useful for recombinant protein identification and purification

    BioTechnology

    (1988)
  • W.K. Huh et al.

    Global analysis of protein localization in budding yeast

    Nature

    (2003)