Genomics, Prior Probability, and Statistical Tests of Multiple Hypotheses

Kenneth F. Manly (1,4), Dan Nettleton (2), and J.T. Gene Hwang (3)

  1. Department of Molecular and Cellular Biology, Roswell Park Cancer Institute, Buffalo, New York 14263, USA
  2. Department of Statistics, Iowa State University, Ames, Iowa 50011, USA
  3. Departments of Mathematics and Statistical Science, Cornell University, Ithaca, New York 14853, USA


Genomic methods have made statistical multiple-test methods important to geneticists and molecular biologists. These tests apply to identification of quantitative trait loci and measurement of changes in RNA or DNA abundance by microarray methods. Recently developed multiple-test methods provide more statistical power when many of the tested null hypotheses are false. At the same time, these methods can provide stringent control of errors in cases when most or all of the tested null hypotheses are true. These methods control errors in a different way from previous hypothesis tests, controlling or estimating quantities called the posterior error rate (PER), false discovery rate (FDR), or proportion of false positives (PFP), rather than the type I error. In this study, we attempt to clarify the relationships among these methods and demonstrate how the proportion of true null hypotheses among all tested hypotheses plays an important role.
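The role of the proportion of true null hypotheses can be illustrated with simple arithmetic. The sketch below is not taken from the article: it assumes that every test is run at the same per-test significance level `alpha`, that every false null hypothesis is detected with the same probability `power`, and that the proportion of false positives can be summarized as expected false positives divided by expected total rejections. The function name and all numeric values are hypothetical.

```python
def expected_pfp(m, pi0, alpha, power):
    """Expected false positives divided by expected total rejections.

    m     -- number of hypotheses tested (illustrative)
    pi0   -- assumed proportion of true null hypotheses
    alpha -- per-test type I error rate
    power -- assumed probability of rejecting a false null hypothesis
    """
    false_pos = m * pi0 * alpha          # true nulls rejected by chance
    true_pos = m * (1.0 - pi0) * power   # false nulls correctly rejected
    total = false_pos + true_pos
    return false_pos / total if total > 0 else 0.0


# Same test, same threshold, different proportions of true nulls.
for pi0 in (0.5, 0.9, 0.99):
    print(pi0, round(expected_pfp(10000, pi0, 0.05, 0.8), 3))
```

Under these assumptions, a threshold of 0.05 gives an expected proportion of false positives of about 6% when half the null hypotheses are true and 36% when 90% of them are true, even though the type I error rate of each individual test is unchanged.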

Genomic methods, those that evaluate many genes or many genomic locations for some property, often require testing a large set of statistical hypotheses, called a family of hypotheses. Such a family may include thousands of hypotheses. For example, detection of quantitative trait loci involves testing a statistical association between trait values and genotypes at several hundred marker loci (Lander and Botstein 1989). Microarray analysis of RNA expression may involve looking for changes among thousands of RNA species (Lockhart et al. 1996). Combining the two techniques (Jansen and Nap 2001; Brem et al. 2002; Schadt et al. 2003) involves testing pairwise associations between thousands of RNA expression patterns and genotypes at hundreds of marker loci.

Naive application of standard hypothesis tests with no adjustment for multiple testing will yield large numbers of nonreproducible positive results or false discoveries (Sorić 1989). On the other hand, using multiple testing methods to control the family-wise type I error rate (FWER, …
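The scale of the problem with unadjusted tests can be sketched with a short calculation. The example below is illustrative only: the counts of transcripts and markers are hypothetical, every null hypothesis is assumed to be true, and the probability of at least one false positive additionally assumes that the tests are independent, which real genomic data generally are not.

```python
def naive_testing_summary(n_traits, n_markers, alpha):
    """Family size and false positives for unadjusted tests, all nulls true.

    Returns (number of tests, expected false positives,
             probability of at least one false positive under independence).
    """
    m = n_traits * n_markers               # one test per trait-marker pair
    expected_false_pos = m * alpha         # each true null rejected w.p. alpha
    p_any = 1.0 - (1.0 - alpha) ** m       # independence assumed here only
    return m, expected_false_pos, p_any


# Hypothetical study: 5000 expression traits tested against 300 markers.
m, expected_fp, p_any = naive_testing_summary(5000, 300, 0.05)
print(m, expected_fp, p_any)
```

With these hypothetical numbers the family contains 1.5 million tests, so an unadjusted 0.05 threshold would be expected to produce tens of thousands of positive results even if no real association existed.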
