Review
Approaches for systematic proteome exploration
Introduction
With the completion of the human genome sequence (Consortium, 2004; Lander et al., 2001; Venter et al., 2001), the main focus has shifted from sequencing to the annotation of gene function, and further to the analysis of protein expression patterns in body fluids, tissues and cells of different origin. A driving force in genomics and proteomics is to shed light on the cellular events that decide the fate of organisms and cells. In an initial phase, however, precise knowledge of protein function is not always necessary; even the bare recognition of a differentially expressed protein is of high interest. Such proteins are potential marker molecules that may be used for the diagnosis, treatment or even prevention of severe diseases such as various cancers. The importance of proteins as targets in drug development is reflected by the fact that more than 80% of all available pharmaceutical drugs act through proteins (Drews, 2000).
Today, genome sequences from more than 400 organisms, representing all three domains of life (archaea, bacteria and eukarya), are publicly available (listed at www.genomesonline.org), and the number of sequenced genomes increases every day. Currently, 1600 genomes are being sequenced in ongoing projects (Liolios et al., 2006). The available genome sequences provide useful information to the proteomics community. From the DNA sequence, it is in principle possible to predict which proteins may be present in a sample. Furthermore, a known gene sequence is commonly used to identify and verify proteins detected by various proteomics methods. Modern sequencing technology provides highly accurate sequences. For example, the human genome is 99% complete and the predicted error rate is about 1 event per 100 000 bases (Consortium, 2004). Altogether, the human genome contains more than three billion bases, and the number of protein-coding genes is predicted to be over 23 000 (Ensembl version 40, August 2006, www.ensembl.org). However, most genomes still remain to be functionally annotated, and a comprehensive exploration of the proteome is even more challenging than any genome sequencing effort. While genomes are essentially static over time, the composition and levels of transcribed mRNAs and expressed proteins fluctuate from one time point to the next.
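To illustrate how a DNA sequence can, in principle, predict a protein, the sketch below translates a coding sequence with the standard genetic code, from the first ATG to the first in-frame stop codon. It is a minimal sketch under loud assumptions: the input is taken to be an already spliced coding sequence, so introns, alternative splicing and PTMs (all discussed in this review) are ignored, and the function name `translate_orf` is hypothetical.

```python
# Standard genetic code, with codons enumerated in T, C, A, G order for
# each of the three positions; '*' marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate_orf(dna: str) -> str:
    """Translate from the first ATG to the first in-frame stop codon.

    Simplifying assumption: the input is a spliced coding sequence, so
    introns, alternative splicing and PTMs are not modelled.
    """
    dna = dna.upper()
    start = dna.find("ATG")
    if start == -1:
        return ""  # no start codon, so no predicted protein
    protein = []
    for i in range(start, len(dna) - 2, 3):
        residue = CODON_TABLE.get(dna[i:i + 3], "X")  # 'X' for ambiguous bases
        if residue == "*":
            break
        protein.append(residue)
    return "".join(protein)


print(translate_orf("CCATGGCTTGGAAATAGCC"))  # MAWK
```

Real gene models are far more involved; this only shows the codon-to-residue step that links a genome sequence to a predicted protein.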
Proteomes of higher eukaryotes are very complex, and several of their properties make proteome exploration enormously demanding. For instance, the relative abundances of different proteins in a biological sample can span more than 10 orders of magnitude (Anderson and Anderson, 2002). This dynamic range makes it difficult to detect rare protein species without extensive preparatory work, such as fractionation or depletion of several abundant protein species. Further complicating factors are alternative promoter usage, post-transcriptional events such as alternative splicing, and post-translational modifications (PTMs) (Godovac-Zimmermann et al., 2005). The three-dimensional structure of proteins and the range of their physiological and chemical properties must also be considered, since these make some proteins less suitable for study by certain approaches. Moreover, many proteins are fragile and prone to degradation, so sample handling is a crucial part of many proteomics methods. Faced with the complexity of comprehensive proteome analysis, the proteomics field has employed a broad range of methods to study different aspects, including protein localization, protein–protein interactions, post-translational modifications and changes in protein composition in tissues and body fluids as a consequence of abnormal or developmental states.
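The dynamic range quoted above is the base-10 logarithm of the ratio between the highest and the lowest protein concentration in a sample. A minimal sketch follows; the protein names and concentrations are hypothetical round numbers chosen only to show the calculation, not measured values.

```python
import math


def dynamic_range_orders(concentrations: dict[str, float]) -> float:
    """Orders of magnitude between the most and least abundant protein.

    All concentrations must be positive and expressed in the same unit.
    """
    values = concentrations.values()
    return math.log10(max(values) / min(values))


# Hypothetical concentrations in g/L, for illustration only:
# one highly abundant protein and one present at about 1 pg/mL (1e-9 g/L).
sample = {"abundant_protein": 40.0, "rare_protein": 1e-9}
print(round(dynamic_range_orders(sample), 1))  # 10.6
```

A ratio of this size means that any sample load large enough to bring the rare protein above a typical detection limit will swamp the analysis with the abundant one, which is why fractionation and depletion are needed.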
In many cases, a useful strategy to significantly reduce complexity during the analysis is to work with sub-proteomes, i.e. to focus on proteins from a predefined protein class, a specific body fluid or tissue, a certain organelle, proteins carrying certain PTMs, or particular protein complexes. To obtain a less complex sample, suitable for a more focused sub-proteome analysis, classical biochemical operations such as centrifugation and filtration are often employed. Furthermore, fractionation of proteins by molecular size (Chertov et al., 2005; Tirumalai et al., 2003), hydrophobicity (Yuan and Desiderio, 2005) and isoelectric point (Davidsson et al., 2002) has been reported. Subsequent detection of proteins in the individual fractions is then facilitated by the lower sample complexity and by the possibility of analyzing material corresponding to larger initial sample volumes. Proteins carrying certain post-translational modifications are enriched by various affinity-capture techniques, including immunoprecipitation. Phosphorylated proteins are most frequently selected by immobilized metal-ion affinity chromatography (IMAC), using metal ions such as Fe3+, Ga3+, Al3+ or Zr4+ as the ligand (Reinders and Sickmann, 2005). Recently, a novel strategy based on TiO2 columns has also been reported (Sano and Nakamura, 2004).
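Fractionation by isoelectric point separates proteins according to the pH at which their net charge is zero. As an illustration, that pH can be estimated from sequence alone with the Henderson-Hasselbalch equation and a bisection search. This is a sketch under stated assumptions: the pKa values below are one commonly used set (tools differ), and effects of PTMs such as phosphorylation and of the folded three-dimensional structure are ignored.

```python
# One commonly used set of pKa values (an assumption: tools differ).
PKA_POSITIVE = {"Nterm": 8.6, "K": 10.8, "R": 12.5, "H": 6.5}
PKA_NEGATIVE = {"Cterm": 3.6, "D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}


def net_charge(sequence: str, ph: float) -> float:
    """Net charge of a peptide at a given pH (Henderson-Hasselbalch)."""
    positive = 1 / (1 + 10 ** (ph - PKA_POSITIVE["Nterm"]))
    negative = 1 / (1 + 10 ** (PKA_NEGATIVE["Cterm"] - ph))
    for residue in sequence.upper():
        if residue in PKA_POSITIVE:
            positive += 1 / (1 + 10 ** (ph - PKA_POSITIVE[residue]))
        elif residue in PKA_NEGATIVE:
            negative += 1 / (1 + 10 ** (PKA_NEGATIVE[residue] - ph))
    return positive - negative


def isoelectric_point(sequence: str, tolerance: float = 1e-4) -> float:
    """Bisect on pH 0-14 for the pH at which the net charge is zero.

    Net charge decreases monotonically with pH, so bisection works as long
    as the charge changes sign within this range (true for typical peptides).
    """
    low, high = 0.0, 14.0
    while high - low > tolerance:
        mid = (low + high) / 2
        if net_charge(sequence, mid) > 0:
            low = mid  # still positively charged: the pI lies at higher pH
        else:
            high = mid
    return (low + high) / 2
```

For example, a lysine-rich peptide such as `"KKKK"` yields a pI well above 9, while an aspartate-rich one such as `"DDDD"` yields a pI below 4, so the two would end up in different fractions.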
While the methods mentioned above are relatively generic, more selective strategies for depleting abundant proteins require knowledge of the sample under study. For example, the dominant protein in human plasma, human serum albumin (HSA), constitutes over 50% of the total protein content and is hence the first target of choice in plasma depletion strategies. Depletion is often achieved by passing the sample over a highly selective affinity medium; HSA depletion, for example, is most commonly performed on Cibacron Blue-Sepharose (Travis et al., 1976) or with monoclonal antibodies (Greenough et al., 2004; Steel et al., 2003). To reach even lower detection limits, additional abundant proteins have to be removed. Several studies have compared different depletion strategies for human plasma (Bjorhall et al., 2005; Gong et al., 2006; Ramström et al., 2005). One should bear in mind that depletion procedures may co-deplete other peptides and proteins, which then escape detection in the subsequent analysis (Granger et al., 2005). In many cases, however, this risk is counterbalanced by the improved detection limits achieved for many other proteins in the sample. In addition to fractionation and depletion, sample treatment in general is of utmost importance for obtaining results that are comparable between laboratories and between experiments.
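The gain from depletion can be made concrete with a simple ratio: if proteins making up a fraction f of the total protein mass are removed and the same total amount of protein is then loaded, every remaining protein's share of the load increases by a factor 1/(1 - f). The sketch below assumes idealized conditions (complete removal of the targets, no co-depletion), which the text notes do not always hold; the function name is hypothetical.

```python
def enrichment_factor(depleted_fractions: list[float]) -> float:
    """Fold increase in the share of each remaining protein after depletion.

    Idealized assumptions: each listed protein is removed completely,
    nothing else is co-depleted, and the same total protein amount is
    loaded before and after depletion.
    """
    removed = sum(depleted_fractions)
    if not 0 <= removed < 1:
        raise ValueError("depleted fractions must sum to a value in [0, 1)")
    return 1 / (1 - removed)


# Removing a single protein that makes up 50% of the total protein mass:
print(enrichment_factor([0.5]))  # 2.0
```

Because HSA alone exceeds 50% of plasma protein, removing it at least doubles the relative load of everything else under these assumptions; removing further abundant proteins raises the factor further, which is why multi-protein depletion lowers detection limits.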
The proteomics field is highly diverse, and covering its many aspects requires several approaches, since each contributes different kinds of information about different protein categories. One attempt to coordinate this broad mission is the Human Proteome Organization (HUPO), an international organization that supports proteomics initiatives in its member nations and aims to set standards and guidelines for how the scientific community should work together to map the human proteome (Hanash, 2004).
This article describes and exemplifies some of the most commonly employed methods in proteomics research. We focus on four main aspects of proteomics, namely expression analysis, localization, interactions and quantification, as these areas complement each other to give a more global view of the proteome. Most of the methods mentioned are useful in more than one application and could be discussed in several of the sections of this review.
Expression analysis
Protein expression is tightly regulated, and specific sets of proteins are expressed in different tissues and cell types. The protein composition of every single cell is highly dynamic and adapts to factors such as cell cycle progression and environmental fluctuations. One aspect of expression proteomics, among other definitions, is to investigate which protein species are present in a defined sample.
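Deciding which protein species are present is commonly done by comparing measured peptide masses with peptides predicted from sequence databases, as in peptide mass fingerprinting. The sketch below shows the in-silico side of that comparison under simplifying assumptions: trypsin is taken to cleave after K or R except before P, with no missed cleavages and no modifications; masses are neutral monoisotopic masses; and the matching tolerance of 0.01 Da and all function names are hypothetical choices for illustration.

```python
# Monoisotopic residue masses in daltons (standard values).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056


def tryptic_digest(protein: str) -> list[str]:
    """Cleave after K or R unless the next residue is P (no missed cleavages)."""
    peptides, start = [], 0
    for i, residue in enumerate(protein):
        next_residue = protein[i + 1] if i + 1 < len(protein) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        peptides.append(protein[start:])
    return peptides


def peptide_mass(peptide: str) -> float:
    """Neutral monoisotopic mass: sum of residue masses plus one water."""
    return sum(RESIDUE_MASS[r] for r in peptide) + WATER


def count_matches(protein: str, observed: list[float], tol: float = 0.01) -> int:
    """Number of predicted peptides whose mass matches an observed mass."""
    predicted = [peptide_mass(p) for p in tryptic_digest(protein)]
    return sum(any(abs(m - o) <= tol for o in observed) for m in predicted)


print(tryptic_digest("MAKPLRGEK"))  # ['MAKPLR', 'GEK']
```

Scoring each candidate protein in a database by `count_matches` against a measured peak list gives a crude ranking; practical search tools add statistics, missed cleavages and modifications.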
Given that there are 23 000 protein-coding genes in the human genome, the actual number of protein species
Localization
Knowledge about protein localization provides useful information and might indicate in which cellular pathways a certain protein functions. Localization data can also support putative protein–protein interactions that have already been proposed by other methods (Kumar et al., 2002). In addition, information on the localization of a certain protein in a tissue, at the cellular and subcellular level, may provide knowledge of the pathogenic mechanisms involved in different diseases (Uhlen
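One simple way localization data can support putative interactions is as a filter: a proposed pair is more plausible if both partners have been observed in at least one common compartment. The sketch below assumes localization is available as a set of compartment labels per protein; the protein names, compartments and function name are hypothetical.

```python
def colocalized_pairs(
    candidate_pairs: list[tuple[str, str]],
    localization: dict[str, set[str]],
) -> list[tuple[str, str]]:
    """Keep candidate interactions whose partners share a compartment."""
    supported = []
    for a, b in candidate_pairs:
        shared = localization.get(a, set()) & localization.get(b, set())
        if shared:
            supported.append((a, b))
    return supported


# Hypothetical proteins and compartments, for illustration only.
localization = {
    "protA": {"nucleus"},
    "protB": {"nucleus", "cytoplasm"},
    "protC": {"mitochondrion"},
}
pairs = [("protA", "protB"), ("protA", "protC")]
print(colocalized_pairs(pairs, localization))  # [('protA', 'protB')]
```

Pairs without a shared compartment are better treated as less supported than as disproven, since proteins can relocate and localization data are incomplete.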
Interactions
Protein–protein interactions, and interactions between proteins and other molecules, are essential for life. In many cases, knowledge about a protein's molecular environment, in terms of its direct or indirect interaction partners, would provide information sufficient to assign a function to the protein. A number of different approaches for mapping protein interactions have been developed, and the preferred method depends on the type of interaction. When searching for stable complexes with high
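The idea that interaction partners can point to a protein's function is often illustrated by "guilt by association": an uncharacterized protein is given the function most common among its annotated direct partners. The sketch below is that simple neighbour-counting scheme under stated assumptions: only direct partners are counted, all interactions are weighted equally, ties are broken arbitrarily, and every protein name and annotation is hypothetical.

```python
from collections import Counter
from typing import Optional


def predict_function(
    protein: str,
    interactions: list[tuple[str, str]],
    annotations: dict[str, str],
) -> Optional[str]:
    """Majority vote over the annotated direct interaction partners."""
    partners = {b for a, b in interactions if a == protein}
    partners |= {a for a, b in interactions if b == protein}
    votes = Counter(annotations[p] for p in partners if p in annotations)
    if not votes:
        return None  # no annotated partners: no prediction
    return votes.most_common(1)[0][0]  # ties broken arbitrarily


# Hypothetical interaction data, for illustration only.
interactions = [("unknown1", "p1"), ("unknown1", "p2"), ("p3", "unknown1"), ("p4", "p5")]
annotations = {"p1": "DNA repair", "p2": "DNA repair", "p3": "transport", "p4": "transport"}
print(predict_function("unknown1", interactions, annotations))  # DNA repair
```

Published methods typically weight interactions by experimental confidence and consider indirect neighbours; this version only shows the basic reasoning.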
Quantification
One of the most challenging and important tasks in proteomics research is biomarker discovery, i.e. the identification of biomolecules whose concentrations are altered in certain disorders. Such studies require quantitative data. Several methods, including serial analysis of gene expression (SAGE) (Velculescu et al., 1995) and quantitative real-time reverse transcriptase PCR (qRT-PCR), allow quantitative analysis of the RNA molecules transcribed in a cell. Today, the DNA microarray
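For qRT-PCR, relative quantification is commonly expressed with the comparative Ct (2^-ddCt) method: the threshold cycle of the target transcript is normalized to a reference gene, and that difference is compared between sample and control. A minimal sketch, assuming close to 100% amplification efficiency for both genes and a stably expressed reference gene; the Ct values in the usage line are hypothetical.

```python
def relative_expression(
    ct_target_sample: float,
    ct_reference_sample: float,
    ct_target_control: float,
    ct_reference_control: float,
) -> float:
    """Fold change of a target transcript by the 2^-ddCt method.

    Assumes close to 100% amplification efficiency for both the target and
    the reference gene, so each cycle doubles the product.
    """
    delta_sample = ct_target_sample - ct_reference_sample
    delta_control = ct_target_control - ct_reference_control
    return 2 ** -(delta_sample - delta_control)


# Hypothetical Ct values: relative to the reference gene, the target crosses
# the threshold two cycles earlier in the sample than in the control.
print(relative_expression(22.0, 18.0, 24.0, 18.0))  # 4.0
```

Note that this quantifies transcripts; as the introduction points out, mRNA and protein levels fluctuate independently, so transcript-level changes do not necessarily carry over to the proteome.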
Conclusions
A large variety of efforts to map the human proteome are currently in progress in the scientific community, using various approaches and techniques to address different questions. The possibility of mapping and understanding the proteomes of different organisms has increased greatly with the sequencing of a large number of genomes, including the human genome. This genomic knowledge can be utilized in different ways. Either a genome-based approach can be used,
Acknowledgements
The authors would like to express their gratitude to Dr. Andersson-Svahn and Dr. Barbe for kindly providing Fig. 2.
References (114)
… et al. Selective enrichment of monospecific polyclonal antibodies for antibody-based proteomics efforts. J. Chromatogr. A (2004).
… et al. Affinity proteomics for systematic protein profiling of chromosome 21 gene products in human tissues. Mol. Cell. Proteomics (2003).
… et al. The human plasma proteome: history, character, and diagnostic prospects. Mol. Cell. Proteomics (2002).
… et al. Proteomic profiling of pancreatic cancer for biomarker discovery. Mol. Cell. Proteomics (2005).
… et al. From gene expression analysis to tissue microarrays—a rational approach to identify therapeutic and diagnostic targets in lymphoid malignancies. Mol. Cell. Proteomics (2006).
… Prokaryotic expression of antibodies and affibodies. Curr. Opin. Biotechnol. (2004).
… et al. Complementary profiling of gene expression at the transcriptome and proteome levels in Saccharomyces cerevisiae. Mol. Cell. Proteomics (2002).
… et al. Quantitative phosphoproteomics applied to the yeast pheromone signaling pathway. Mol. Cell. Proteomics (2005).
… HUPO initiatives relevant to clinical proteomics. Mol. Cell. Proteomics (2004).
… et al. Mass spectrometric analysis of protein mixtures at low levels using cleavable 13C-isotope-coded affinity tag and multidimensional chromatography. Mol. Cell. Proteomics (2003).