Abstract
Microarrays are routinely used to assess mRNA transcript levels on a genome-wide scale. As their use and acceptance increase, there is intensified focus on appropriate methods of data generation and interpretation, with important questions being asked about the best data analysis methods. The development of such 'best practices' is needed, as microarrays (in particular, Affymetrix oligonucleotide arrays) are becoming increasingly important in human clinical trials, both for differential diagnosis and for monitoring pharmacological efficacy. Here, representatives from high-volume microarray core centres consider the current status of 'best practices', focusing on the broadly used Affymetrix oligonucleotide arrays.
Acknowledgements
The authors thank their respective funding agencies, particularly the larger collaborative funding initiatives from the Department of Defense and the Doris Duke Charitable Foundation CSDA that make systematic and large-scale studies of the bioinformatics and biostatistics of genome-wide data sets possible. The authors also thank S. Hilmer, A. DeBiase and G. Miyada for their critique of the manuscript.
Author information
Additional information
Corresponding author: ehoffman@cnmcresearch.org
Eric P. Hoffman is at the Research Center for Genetic Medicine, Children's National Medical Center, Washington DC 20010, USA. e-mail: ehoffman@cnmcresearch.org
Tarif Awad, John Palma, Teresa Webster, Earl Hubbell and Janet A. Warrington are at Affymetrix, Santa Clara, California 95051, USA. e-mails: tarif_awad@affymetrix.com; john_palma@affymetrix.com; teresa_webster@affymetrix.com; earl_hubbell@affymetrix.com; janet_warrington@affymetrix.com
Avrum Spira is at The Pulmonary Center, Boston University Medical Center and the Bioinformatics Program, Boston University, Boston, Massachusetts 02118, USA. e-mail: aspira@lung.bumc.bu.edu
George Wright is at the Biometric Research Branch, Division of Cancer Treatment and Diagnosis, National Cancer Institute, National Institutes of Health, Bethesda, Maryland 20892, USA. e-mail: wrightge@mail.nih.gov
Jonathan Buckley and Tim Triche are at the Children's Hospital, University of California, Los Angeles, California 90089, USA. e-mails: buckley@hsc.usc.edu; triche@hsc.usc.edu
Ron Davis, Robert Tibshirani and Wenzhong Xiao are at Stanford University, Palo Alto, California 94303, USA. e-mails: dbowe@stanford.edu; tibs@stat.stanford.edu; wzxiao@pmgm2.stanford.edu
Wendell Jones is at Expression Analysis Inc., Durham, North Carolina 27713, USA. e-mail: wjones@expressionanalysis.com
Ron Tompkins is at Harvard University, Boston, Massachusetts 02115, USA. e-mail: rtompkins@partners.org
Mike West is at the Institute of Statistics and Decision Sciences, Duke University, Durham, North Carolina 27708, USA. e-mail: mw@stat.duke.edu
Related links
DATABASES
LocusLink
FURTHER INFORMATION
Affymetrix Developers' Network
ArrayExpress microarray database
The Children's National Medical Center Microarray Center
HOPGENE Program for Genomic Applications
Glossary
- A-, B- AND C-SERIES ARRAYS
A series of human, rat and mouse Affymetrix arrays released in 2003, in which the A array contained the best-characterized genes, and the B and C arrays contained less well-defined expressed sequence tags. As of 2004, all probe sets have been condensed so that a single microarray per species covers the entire genome.
- CROSS-SECTIONAL DESIGN
The use of different subjects in an experimental and control group or groups. The statistical analysis compares the median and variation within each group relative to the other groups.
- FEATURE
Typically one element (spot) on a microarray. In spotted cDNA or oligonucleotide arrays, features correspond to genes or transcripts; in Affymetrix arrays, there are typically 22 elements per probe set and often multiple probe sets per gene, so a feature might refer to a single oligonucleotide, a probe pair or a probe set, or a gene with multiple probe sets. In bioinformatics it is most often synonymous with a gene.
- FLUORESCENCE-ACTIVATED CELL SORTING (FACS)
A method whereby dissociated and individual living cells are sorted, in a liquid stream, according to the intensity of fluorescence that they emit as they pass through a laser beam.
- FLUOROPHORE
A small molecule, or a part of a larger molecule, that can be excited by light to emit fluorescence.
- ISCHAEMIA
The loss of blood supply, and hence oxygenation, to a tissue or cells.
- LASER CAPTURE MICRODISSECTION
A technique in which individual cells, or regions of tissue, are excised from a histological preparation, using specially equipped microscopes, and isolated for further study.
- LONGITUDINAL DESIGN
The use of multiple samples from the same subject. With this design, each subject serves as their own control, eliminating confounding inter-individual variations at baseline; paired t-tests are used to interpret the data.
- NEGATIVE CELL ISOLATION
The use of antibodies or other reagents to remove all unwanted cells from a mixed population of cells. In this method, the desired cells are not exposed to bound antibodies, thereby avoiding potential activation or other molecular alteration in the desired cells.
- PENALTY WEIGHT
In Affymetrix arrays, hybridization to the 'mismatch' probe of a probe pair might or might not be considered as a form of measurement of noise or background, and can be factored into the signal seen with the paired 'perfect match' as a penalty weight.
- PHOTOLITHOGRAPHY
The process of using light to either etch or activate regions of a surface (substrate). This method is used in microelectronics to create integrated circuits and processors.
- REAL-TIME PCR
The quantification of the amount of PCR product during each cycle of a PCR reaction. The product concentration, as a function of cycle number, provides a good estimation of the relative quantity of the mRNA being tested.
- RESECTION
Surgical removal of tissue, most commonly used for removing tumorous masses from surrounding tissue.
- S1 NUCLEASE PROTECTION
An experimental method for determining mRNA transcript concentration in a tissue or cell RNA sample. It involves using labelled DNA probes that bind the RNA, with overhanging non-hybridized tails of the probe then being digested by the S1 nuclease. This creates a smaller labelled DNA probe that is indicative of the abundance of the mRNA being tested.
- SURVIVAL DATA ANALYSIS
A battery of statistical methods applied to data when mortality is often the only, or best, measured outcome.
- TIME-SERIES STUDY
The use of a series of samples taken at defined time points after a defined stimulus. In mice and rats, the samples at different time points are usually from different animals. In humans, time-series studies are necessarily longitudinal to avoid additional confounding noise.
- TUKEY'S BI-WEIGHT ESTIMATOR
Many statistical tests require underlying definitions that are assumed to be valid (for example, tumour versus non-tumour), and require data that show a normal distribution. Microarray data, and the clinical information underlying the definition of samples, are often less exact, with genes or samples often behaving as statistical outliers. Tukey's bi-weight estimator is one of the M-class of statistical estimators, which are less sensitive to outliers and perform more gracefully when underlying assumptions are inexact.
- WILCOXON'S SIGNED RANK
A statistical test that investigates the population median of paired differences. It is well suited for microarray work as it treats each gene as an independent variable and does not require normal distributions of the data.
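As a rough illustration of the two robust statistics defined in the glossary above, the following Python sketch computes a one-step Tukey bi-weight estimate of location and the Wilcoxon signed-rank sums for paired samples. The function names, the tuning constant c = 5, the small epsilon that guards against division by zero, and the example values are all assumptions made for illustration and are not taken from this article. A real analysis would normally rely on an established statistics package, which would also supply p-values.

```python
from statistics import median


def tukey_biweight(values, c=5.0, epsilon=1e-4):
    """One-step Tukey bi-weight estimate of location (illustrative sketch).

    c and epsilon are assumed tuning values, not taken from the article.
    """
    values = list(values)
    if not values:
        raise ValueError("values must be non-empty")
    center = median(values)
    # Median absolute deviation: a robust measure of spread.
    mad = median([abs(v - center) for v in values])
    weights = []
    for v in values:
        u = (v - center) / (c * mad + epsilon)
        # Points far from the median (|u| >= 1) get zero weight.
        weights.append((1.0 - u * u) ** 2 if abs(u) < 1.0 else 0.0)
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)


def wilcoxon_signed_rank(before, after):
    """Return (W_plus, W_minus), the signed-rank sums of paired differences.

    Zero differences are dropped; tied absolute differences share the
    average of the ranks they span.
    """
    diffs = [a - b for b, a in zip(before, after) if a != b]
    ordered = sorted(diffs, key=abs)
    ranks = [0.0] * len(ordered)
    i = 0
    while i < len(ordered):
        j = i
        while j < len(ordered) and abs(ordered[j]) == abs(ordered[i]):
            j += 1
        average_rank = (i + 1 + j) / 2  # mean of ranks i+1 .. j
        for k in range(i, j):
            ranks[k] = average_rank
        i = j
    w_plus = sum(r for r, d in zip(ranks, ordered) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, ordered) if d < 0)
    return w_plus, w_minus


if __name__ == "__main__":
    # Hypothetical signal values with one gross outlier (50.0).
    print(tukey_biweight([10.0, 10.2, 9.9, 10.1, 50.0]))
    # Hypothetical paired measurements from the same five subjects.
    print(wilcoxon_signed_rank([10, 12, 9, 11, 14], [12, 15, 8, 11, 18]))
```

In the first example the outlying value receives zero weight, so the estimate stays close to the bulk of the data rather than being pulled towards the arithmetic mean. In the second, the smaller of the two rank sums is the quantity usually compared against tabulated critical values.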
About this article
Cite this article
The Tumor Analysis Best Practices Working Group. Expression profiling — best practices for data generation and interpretation in clinical trials. Nat Rev Genet 5, 229–237 (2004). https://doi.org/10.1038/nrg1297