Genome-wide detection of human copy number variations using high-density DNA oligonucleotide arrays

Daisuke Komura (1, 2, 8), Fan Shen (3, 8), Shumpei Ishikawa (1, 8), Karen R. Fitch (3), Wenwei Chen (3), Jane Zhang (3), Guoying Liu (3), Sigeo Ihara (1), Hiroshi Nakamura (1, 2), Matthew E. Hurles (4), Charles Lee (5), Stephen W. Scherer (6), Keith W. Jones (3), Michael H. Shapero (3), Jing Huang (3, 9), and Hiroyuki Aburatani (1, 7, 9)
  1 Research Center for Advanced Science and Technology, The University of Tokyo, Meguro, Tokyo 153-8904, Japan;
  2 Department of Advanced Interdisciplinary Studies, Graduate School of Engineering, The University of Tokyo, Bunkyo-ku, Tokyo 113-8656, Japan;
  3 Affymetrix, Inc., Santa Clara, California 95051, USA;
  4 The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SA, United Kingdom;
  5 Department of Pathology, Brigham and Women’s Hospital and Harvard Medical School, Boston, Massachusetts 02115, USA;
  6 The Centre for Applied Genomics and Program in Genetics and Genomic Biology, The Hospital for Sick Children, Toronto, Ontario, M5G 1L7, Canada;
  7 Japan Science and Technology Agency, Kawaguchi, Saitama, 332-0012, Japan
  8 These authors contributed equally to this work.

Abstract

Recent reports indicate that copy number variations (CNVs) within the human genome contribute to nucleotide diversity to a larger extent than single nucleotide polymorphisms (SNPs). In addition, CNVs may contribute more to human disease susceptibility than previously expected, although their phenotypic consequences are not yet fully understood. We recently reported a comprehensive view of CNVs among 270 HapMap samples using high-density SNP genotyping arrays and BAC array CGH. In this report, we describe a novel algorithm that, applied to Affymetrix GeneChip Human Mapping 500K Early Access (500K EA) arrays, identified 1203 CNVs ranging in size from 960 bp to 3.4 Mb. The algorithm consists of three steps: (1) intensity pre-processing, which improves the resolution of pairwise comparisons by directly estimating allele-specific affinity and reduces signal noise by incorporating probe and target sequence characteristics via an improved version of the Genomic Imbalance Map (GIM) algorithm; (2) CNV extraction, which uses an adapted SW-ARRAY procedure to detect candidate CNV regions automatically and robustly; and (3) copy number inference, in which all pairwise comparisons are summarized to define CNV boundaries more precisely and to estimate CNV copy number accurately. Independent testing of a subset of CNVs by quantitative PCR and mass spectrometry demonstrated a verification rate of >90%. Compared with other methods, high-resolution oligonucleotide arrays may yield more precise CNV boundaries, enabling a more accurate analysis of the relationship between CNVs and other genomic features.
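Step (2) can be illustrated with a minimal sketch in the spirit of SW-ARRAY, a Smith-Waterman-style search for maximal-scoring runs of adjacent probes. It assumes per-probe log2 ratios from a single pairwise comparison that have already been pre-processed and ordered by genomic position. The function names and the `threshold` and `min_score` parameters are hypothetical, and the fixed `min_score` cutoff stands in for the statistical significance testing a full implementation would apply. This is not the authors' code.

```python
from typing import List, Tuple


def max_scoring_segment(scores: List[float]) -> Tuple[float, int, int]:
    """Return (score, start, end) of the best contiguous run, end inclusive.

    Smith-Waterman-style local scoring: the running score restarts whenever
    it is not positive, so only locally favourable runs accumulate.
    Returns (0.0, -1, -1) when no run has a positive score.
    """
    best, best_start, best_end = 0.0, -1, -1
    running, start = 0.0, 0
    for i, s in enumerate(scores):
        if running <= 0.0:
            running, start = s, i  # restart the run at probe i
        else:
            running += s  # extend the current run
        if running > best:
            best, best_start, best_end = running, start, i
    return best, best_start, best_end


def detect_cnv_segments(
    log_ratios: List[float],
    threshold: float = 0.2,  # hypothetical per-probe offset
    min_score: float = 1.0,  # hypothetical fixed cutoff (no permutation test)
    gain: bool = True,
) -> List[Tuple[int, int, float]]:
    """Return (start, end, score) for each candidate gain (or loss) region.

    log_ratios: per-probe log2 ratios for one pairwise comparison, ordered
    by genomic position (assumed already pre-processed).
    """
    sign = 1.0 if gain else -1.0
    # Probes deviating in the requested direction by more than `threshold`
    # score positively; all other probes penalise the run.
    scores = [sign * r - threshold for r in log_ratios]
    segments = []
    while True:
        score, start, end = max_scoring_segment(scores)
        if start < 0 or score < min_score:
            break
        segments.append((start, end, score))
        # Mask the reported region so later passes find disjoint segments.
        for i in range(start, end + 1):
            scores[i] = float("-inf")
    return sorted(segments)
```

For example, a ratio profile of five 0.0 values, four 1.0 values, ten 0.0 values, three -1.0 values and five 0.0 values yields one gain over probes 5 to 8 and one loss over probes 19 to 21. Gains and losses are scanned separately so that opposite deviations cannot cancel within a run; the summarization across all pairwise comparisons described in step (3) is outside the scope of this sketch.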

Footnotes

  • 9 Corresponding authors.

    9 E-mail jing_huang{at}affymetrix.com; fax (408) 732-7025.

    9 E-mail haburata-tky{at}umin.ac.jp; fax 81-3-5452-5355.

  • [Supplemental material is available online at www.genome.org. The array data from this study have been submitted to GEO under accession nos. GSE5013 and GSE5173.]

  • Article published online before print. Article and publication date are at http://www.genome.org/cgi/doi/10.1101/gr.5629106

    • Received June 12, 2006.
    • Accepted August 29, 2006.
