Methods and approaches in the analysis of gene expression data
Introduction
The rapid technological development in the field of genomics has created an unprecedented situation in biology. Large volumes of information, from the genome sequence to high-throughput functional data, have shifted the attention of biologists towards an understanding of the global mechanisms behind biological phenomena. For example, the Human Genome Project has revealed a considerable number of genes with no clear sequence homology with previously characterised genes; understanding how these new genes act in driving the physiology of a human cell will be a major challenge for many years to come. Genetics and functional genomics platforms such as gene expression profiling, proteomics, yeast two-hybrid, transgenic technology and functional screening clearly all have roles to play in deciphering this puzzle. High-density macro- and microarrays have acquired a special role in this challenging field; these consist of ordered collections of thousands of different DNA sequences that can be used to measure DNA and RNA variation in biological samples. They have many applications (Lipschultz et al., 1999) but are most commonly used in expression profiling (Bowtell, 1999). Because of their relatively low cost and high gene coverage (they can measure the expression of thousands of genes in a single experiment), the use of these arrays is rapidly spreading in both academic and industrial institutions, contributing to the exponential increase in publicly available genome-wide transcription data.
The availability of so much data is a major challenge, both for the infrastructure required to store and manipulate it and for the analytical tools required to extract meaningful information. These problems, originally limited to local institutions, need to be approached at a more global level. Large publicly accessible gene expression databases will be required to share and cross-mine different experimental data. However, common standards in data quality and experimental procedures may need to be established in order to make this a worthwhile effort. Recently, the European Bioinformatics Institute (EBI) has announced plans for a microarray database that could satisfy these criteria (http://www.ebi.ac.uk/arrayexpress/News/news.html), but the current funding situation may hamper rapid expansion.
The development of data analysis strategies and tools to cope with the complexity of the data is a sizeable task. Current methods for analysis are based on comparison rules and pattern recognition algorithms such as cluster analysis. These methods are very effective data exploration tools that have already revealed a great deal of information in many areas of immunological research. The new field of gene expression data analysis is rapidly moving towards more statistically robust methods (for example, to predict disease membership or to model biological variables by means of gene expression).
In this review we present an overview of the methodologies currently used in the analysis of gene expression data. In the second section of the manuscript we introduce some principles of data analysis with particular reference to clustering and other classification methods. In the third section we describe some of the most relevant applications in the field of immunology.
Data acquisition and data manipulation
High-density array technology provides an assay for the simultaneous measurement of the expression level of thousands of genes in a single experiment. Each array consists of a solid support (usually nylon or glass) where cDNA or oligonucleotides are arrayed in a fixed pattern. Fluorescent or radioactive probes derived from messenger RNA are hybridised to the complementary DNA on the array. The radioactive or fluorescence emissions of specifically bound probe are detected using an appropriate
Comparison of independent and paired samples
The comparison of two independent samples (e.g. diseased versus normal tissue) is the simplest experimental situation. As discussed previously, although a number of statistical tests are available to assess the significance of the observed differences, most of the groups active in this field use filtering rules based on fold difference criteria. Here we will review two examples based on two different experimental designs.
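As an illustration of the kind of fold-difference filtering rule mentioned above, the following is a minimal sketch rather than the procedure used by any particular group. The gene names, intensity values, the two-fold threshold (`min_fold`) and the background floor (`floor`) are all hypothetical assumptions chosen for the example.

```python
import math


def fold_change_filter(control, treated, min_fold=2.0, floor=1.0):
    """Select genes whose expression differs by at least `min_fold`
    (up or down) between two independent samples.

    `control` and `treated` map gene identifiers to intensities.
    `min_fold` and `floor` are illustrative assumptions, not values
    taken from the text.
    """
    threshold = math.log2(min_fold)
    selected = {}
    for gene, control_value in control.items():
        if gene not in treated:
            continue  # compare only genes measured in both samples
        # Clamp intensities to a floor so that near-background values
        # do not produce artificially large ratios.
        c = max(control_value, floor)
        t = max(treated[gene], floor)
        log2_ratio = math.log2(t / c)
        if abs(log2_ratio) >= threshold:
            selected[gene] = log2_ratio
    return selected


# Hypothetical intensities for normal versus diseased tissue.
normal = {"geneA": 100.0, "geneB": 250.0, "geneC": 40.0, "geneD": 0.0}
diseased = {"geneA": 420.0, "geneB": 260.0, "geneC": 10.0, "geneD": 0.5}

hits = fold_change_filter(normal, diseased)
for gene, ratio in sorted(hits.items()):
    print(f"{gene}: log2 ratio = {ratio:.2f}")
```

Working on the log2 scale treats induction and repression symmetrically (a four-fold decrease gives -2, a four-fold increase gives +2). A rule of this kind ignores measurement variance, which is one reason statistical tests may be preferred when replicates are available.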
A filtering rule strategy applied to a problem of immunological interest
Conclusion and future perspectives
Virtually every laboratory can now access some form of microarray technology and monitor the expression of a significant percentage of the transcriptional capacity of the human genome. This unprecedented volume of information is changing the way experimental research is done in many areas of biology. The classical hypothesis-driven ‘single gene’ approach is now paralleled by a more global approach. Although experiments aiming to characterise the transcriptional response of a biological system
Acknowledgments
We acknowledge that Dragoni I., Zanders E., Gillian A. and Falciani F. generated and analysed the data on the rheumatoid arthritis populations described in Section 3.5 while working at the GlaxoWellcome medicine research centre in Stevenage, UK. We would like to thank Dr. Brian Champion and Dr. Gareth Maslen (Lorantis Ltd.) for critical reading of the manuscript and for useful comments. We are also indebted to Dr. Dov Stekel (Oxford Gene Technology, Oxford, UK) for his encouragement and for
References (58)
- et al. Monitoring gene expression using DNA microarrays. Curr. Opin. Microbiol. (2000)
- et al. Microdissection, microchip arrays, and molecular analysis of tumor cells (primary and metastases). Semin. Radiat. Oncol. (1998)
- et al. Coordinate regulation of the expression of axonal proteins by the axonal microenvironment. Dev. Biol. (1986)
- et al. Analysis of gene expression data using self-organizing maps. FEBS Lett. (1999)
- et al. Distinct types of diffuse large B-cell lymphoma identified by gene expression profiling. Nature (2000)
- et al. Probing lymphocyte biology by genomic-scale gene expression analysis. J. Clin. Immunol. (1998)
- et al. Broad patterns of gene expression revealed by clustering analysis of tumor and normal colon tissues probed with oligonucleotide arrays. Proc. Natl. Acad. Sci. USA (1999)
- et al. Global approaches to quantitative analysis of gene expression patterns observed by use of two-dimensional gel electrophoresis. Clin. Chem. (1984)
- et al. Clustering gene expression patterns. J. Comput. Biol. (1999)
- Options available — from start to finish — for obtaining expression data by microarray. Nature (1999)