Controlling the number of false discoveries: application to high-dimensional genomic data

https://doi.org/10.1016/S0378-3758(03)00211-8

Abstract

Researchers conducting gene expression microarray experiments are often interested in identifying genes that are differentially expressed between two groups of specimens. A straightforward approach to the identification of such “differentially expressed” genes is to perform a univariate analysis of group mean differences for each gene, and then identify those genes that are most statistically significant. However, with the large number of genes typically represented on a microarray, using nominal significance levels (unadjusted for multiple comparisons) will lead to the identification of many genes that truly are not differentially expressed, “false discoveries.” A reasonable strategy in many situations is to allow a small number of false discoveries, or a small proportion of the identified genes to be false discoveries. Although previous work has considered control for the expected proportion of false discoveries (commonly known as the false discovery rate), we show that these methods may be inadequate. We propose two stepwise permutation-based procedures to control with specified confidence the actual number of false discoveries and approximately the actual proportion of false discoveries. Limited simulation studies demonstrate substantial gain in sensitivity to detect truly differentially expressed genes even when allowing as few as one or two false discoveries. We apply these new methods to analyze a microarray data set consisting of measurements on approximately 9000 genes in paired tumor specimens, collected both before and after chemotherapy on 20 breast cancer patients. The methods described are broadly applicable to the problem of identifying which variables of any large set of measured variables differ between pre-specified groups.

Introduction

Technological advances have made possible detailed genetic characterization of biological specimens. For example, cDNA microarray technology (Schena et al., 1995) permits the simultaneous evaluation of expression levels of thousands of genes on a single specimen, generating a gene expression “profile” for that specimen. The cDNA microarray technology has now been applied to profile numerous human cancer specimens; it is hoped that gene expression profiles of tumors might aid in distinguishing aggressive from indolent tumors and might guide choice of therapies. Another exciting opportunity comes with the completion of the initial sequencing and analysis of the human genome. More than 1.4 million single nucleotide polymorphisms (SNPs) in the human DNA sequence have now been identified (International Human Genome Sequencing Consortium, 2001). Differing patterns of SNPs might be related to risk of developing disease or predict response to, or toxicity from, drug therapies. A typical experimental approach in these settings would be to select specimens from two or more groups to be compared, and then to measure a large number of characteristics on each specimen. Often one would want to identify characteristics that univariately are significantly different between the groups. The issue we address in this paper is how one can identify individual characteristics that are significantly different between groups of specimens while maintaining control over spurious findings amid the potentially enormous number of comparisons being made.

The particular example we consider in this paper consists of gene expression profiles obtained by cDNA microarray analysis of approximately 9000 genes for 40 paired breast tumor specimens. The specimens were collected on 20 breast cancer patients, before and after chemotherapy. Our interest is in identifying genes whose expression levels differed significantly after chemotherapy as compared to before. The example data are continuous, log-transformed expression ratios that measure the relative abundance of each gene's mRNA in the test specimen compared to a reference sample using a two-color fluorescent probe hybridization system (Schena et al., 1995). We note, however, that the general approaches described in this paper are very broadly applicable to both continuous and discrete data, and to censored data.

If one were to simply conduct univariate tests of characteristics, for example gene expression levels, using conventional significance levels, there would be an enormous multiple comparisons problem. Many genes would likely be claimed significantly differentially expressed when, in truth, they were not differentially expressed, i.e., a type I error. Such false claims of significance are often called “false discoveries.” One can use a procedure to account for these multiple comparisons and control the probability of any false discovery. This overall probability of any error is usually referred to as the familywise error (FWE) rate. A Bonferroni adjustment to the p-values, or preferably less conservative stepwise procedures such as those described by Westfall and Young (1993, pp. 72–74) and Troendle (1996) can be used to control the familywise error rate. For example, Callow et al. (2000) have applied the Westfall and Young method to microarray data. These procedures will guarantee that the probability of any false discovery is less than the designated significance level, e.g., 0.05. However, the criterion of not making any false discovery is too stringent for most microarray investigations, in which the identification of these genes will be followed by further study of them for biological relevance. On the other hand, making no adjustment for multiple comparisons could generate many false leads.
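To make the notion of FWE control concrete, the sketch below contrasts the single-step Bonferroni adjustment with Holm's step-down procedure, a simple stepwise refinement that is uniformly at least as powerful. This is a generic illustration only (function names are ours); it is not the resampling-based stepwise method of Westfall and Young (1993), which uses the joint permutation distribution of the statistics.

```python
import numpy as np

def bonferroni(pvals, alpha=0.05):
    """Single-step Bonferroni: reject H_i when p_i <= alpha/k.
    Guarantees P(any false discovery) <= alpha."""
    p = np.asarray(pvals)
    return p <= alpha / p.size

def holm(pvals, alpha=0.05):
    """Holm's step-down procedure: compare the i-th smallest p-value
    to alpha/(k - i + 1), stopping at the first failure. Controls the
    FWE rate at alpha but rejects at least as much as Bonferroni."""
    p = np.asarray(pvals)
    k = p.size
    order = np.argsort(p)
    rejected = np.zeros(k, dtype=bool)
    for rank, idx in enumerate(order):
        if p[idx] <= alpha / (k - rank):
            rejected[idx] = True
        else:
            break
    return rejected
```

With p-values (0.01, 0.013, 0.04, 0.2) and alpha = 0.05, Bonferroni (threshold 0.0125) rejects only the first hypothesis, while Holm also rejects the second, illustrating the gain from stepping down.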

A reasonable compromise is to use a procedure that will allow some false discoveries, but not too many. A simple procedure is to lower the nominal significance level and appeal to Bonferroni, e.g., using a significance level of 0.001 would ensure in expectation at most 10 false discoveries with 10,000 variables. A slightly more complex procedure attempts to control for the expected proportion of discoveries (identified genes) that are false discoveries (with the proportion set to 0 when no genes are identified): Order the univariate p-values from the k variables, P(1) ≤ P(2) ≤ ⋯ ≤ P(k). To keep the expected false discovery proportion less than γ (e.g., γ=0.10), identify as differentially expressed those genes that are associated with the indices 1,2,…,i, where i is the largest index satisfying P(i) ≤ iγ/k. This procedure is attributed to Eklund by Seeger (1968) and was studied by Benjamini and Hochberg (1995). Tusher et al. (2001) present a procedure they call significance analysis of microarrays (SAM) for estimating a false discovery rate from data. Their procedure is based on estimating the expected number of false positives from a complete null permutation distribution; they do not discuss the statistical properties of their procedure. Procedures targeting control of the expected number or proportion of false discoveries rather than the actual number or proportion can give a false sense of security. This is demonstrated in Tables 1 and 2 for simulated data with 10,000 variables. In Table 1, we consider using a univariate nominal significance level of 0.001. The expected number of false discoveries is less than or equal to 10, but the spread of the distribution of the actual number of false discoveries becomes quite large when the correlation between the variables increases. For example, there is a 10% chance of having 18 or more false discoveries with block correlation 0.5. The same problem arises when controlling the expected false discovery proportion.
Table 2 displays the distribution of the actual false discovery proportion when using the simple procedure described above to control the expected false discovery proportion to be less than 0.10. Even with no correlation the results here are troubling: 10% of the time the false discovery proportion will be 0.29 or more.
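The Eklund/Benjamini–Hochberg step-up procedure described above can be sketched as follows (a minimal illustration; the function and variable names are ours):

```python
import numpy as np

def fdr_step_up(pvals, gamma=0.10):
    """Identify the genes with the i smallest p-values, where i is the
    largest index with P_(i) <= i*gamma/k. This controls the *expected*
    false discovery proportion at gamma, not the actual proportion."""
    p = np.asarray(pvals)
    k = p.size
    order = np.argsort(p)
    # indices (0-based) where the sorted p-value meets its step-up bound
    ok = np.nonzero(p[order] <= gamma * np.arange(1, k + 1) / k)[0]
    rejected = np.zeros(k, dtype=bool)
    if ok.size:
        # reject everything up to and including the largest such index
        rejected[order[: ok[-1] + 1]] = True
    return rejected
```

Note that the step-up scan rejects all hypotheses below the largest index satisfying the bound, even those whose p-values individually fail their own comparison.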

In this paper, we discuss methods for controlling the actual (rather than expected) number and proportion of false discoveries. We prove that our procedure for the former guarantees control with specified confidence, and provide evidence that our procedure for the latter achieves approximate control. That is, application of these methods will allow statements such as “with 95% confidence, the number of false discoveries does not exceed 2” or “with approximate 95% confidence, the proportion of false discoveries does not exceed 0.10”. In Section 2, we describe our procedures for controlling the number and proportion of false discoveries and provide justifications for the algorithms which appear in Appendix A. The methods are applied to the analysis of the pre-post chemotherapy breast cancer specimens in Section 3. In Section 4, we describe some limited simulation studies to assess our procedures and to compare them to procedures designed to control the FWE rate. We end with a discussion in Section 5.
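To convey the flavor of permutation-based control of the actual number of false discoveries, the following sketch implements a simplified single-step variant for paired data: reject genes whose observed |t|-statistic exceeds the (1−α)-quantile of the (d+1)-th largest statistic over random sign flips of the paired differences, so that under the complete null hypothesis, with confidence 1−α, no more than d false discoveries are made. This is our own simplified illustration, not the authors' step-down Procedure A; all names are ours.

```python
import numpy as np

def kfwe_singlestep(diffs, d=1, alpha=0.05, n_perm=2000, seed=0):
    """Single-step permutation sketch controlling, with confidence
    1 - alpha, the event of making more than d false discoveries
    (under the complete null). diffs: (n_pairs, k) array of paired
    differences (e.g., post- minus pre-chemotherapy log-ratios)."""
    rng = np.random.default_rng(seed)
    n, k = diffs.shape

    def stats(x):
        # one-sample |t|-statistic per gene on the paired differences
        return np.abs(x.mean(0) / (x.std(0, ddof=1) / np.sqrt(n)))

    obs = stats(diffs)
    perm_dplus1 = np.empty(n_perm)
    for b in range(n_perm):
        signs = rng.choice([-1.0, 1.0], size=(n, 1))  # sign-flip pairs
        perm_dplus1[b] = np.sort(stats(diffs * signs))[-(d + 1)]
    crit = np.quantile(perm_dplus1, 1 - alpha)
    return obs > crit
```

Allowing d > 0 lowers the critical value relative to controlling the FWE rate (d = 0, the largest permutation statistic), which is the source of the sensitivity gains reported in Section 4.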

Section snippets

Controlling false discoveries

We consider two probabilistic frameworks for inference. The first, a population model, specifies that for each group the k-dimensional vector of variables associated with a specimen is independent and identically distributed with a multivariate distribution that can depend on the group. For unpaired data, the multidimensional distribution of the subvector of null hypothesis variables is assumed to be the same in the different groups. For paired data, the pairs represent population strata, and

Application to microarray data from 40 paired breast tumor specimens

We demonstrate the methods on a subset of a previously published data set involving gene expression from cDNA microarrays using specimens from 65 breast tumors from 42 individuals (Perou et al., 2000). Our analysis is based on data from 20 individuals with specimens taken both before and after a 16-week course of doxorubicin chemotherapy. The microarray analyses generate two gene expression profiles for each specimen, one before, and one after chemotherapy. Each profile consists of log

Simulation studies

We conducted several simulations to assess the performance of Procedures A and B and to compare them to the procedure to control the FWE rate. Specifically, we evaluated the same procedures that we applied to the breast tumor data in the previous section. Similar to the paired breast tumor example, we considered cases in which 8000 hypotheses were to be tested, and we assumed 40 paired specimens. The paired difference data were generated as multivariate Gaussian with a variety of block

Discussion

This paper has considered both the control of the absolute number of false discoveries as well as the control of the false discovery proportion. Investigators may prefer one or the other of these types of control. For example, if an investigator considers 10 false discoveries out of 100 discoveries acceptable but not 10 false discoveries out of 12 discoveries, then he or she would be more interested in controlling the false discovery proportion. On the other hand, if following up 10 false

Acknowledgements

This study utilized the high-performance computational capabilities of the Biowulf/LoBoS3 cluster at the National Institutes of Health, Bethesda, MD. We thank the referees and editorial board member for their helpful comments.

References (12)

  • Y. Benjamini et al.

    Controlling the false discovery rate: a practical and powerful approach to multiple testing

    J. R. Statist. Soc. Ser. B

    (1995)
  • M.J. Callow et al.

    Microarray expression profiling identifies genes with altered expression in HDL-deficient mice

    Genome Res.

    (2000)
  • Y. Chen et al.

    Ratio-based decisions and the quantitative analysis of cDNA microarray images

    J. Biomed. Opt.

    (1997)
  • International Human Genome Sequencing Consortium

    Initial sequencing and analysis of the human genome

    Nature

    (2001)
  • E.L. Lehmann

    Nonparametrics

    (1975)
  • C.M. Perou et al.

    Molecular portraits of human breast tumors

    Nature

    (2000)
