Elsevier

NeuroImage

Volume 60, Issue 3, 15 April 2012, Pages 1608-1621
A large scale multivariate parallel ICA method reveals novel imaging–genetic relationships for Alzheimer's disease in the ADNI cohort

https://doi.org/10.1016/j.neuroimage.2011.12.076

Abstract

The underlying genetic etiology of late onset Alzheimer's disease (LOAD) remains largely unknown, likely due to its polygenic architecture and a lack of sophisticated analytic methods to evaluate complex genotype–phenotype models. The aim of the current study was to overcome these limitations in a bi-multivariate fashion by linking intermediate magnetic resonance imaging (MRI) phenotypes with a genome-wide sample of common single nucleotide polymorphism (SNP) variants. We compared associations between 94 different brain regions of interest derived from structural MRI scans and 533,872 genome-wide SNPs using a novel multivariate statistical procedure, parallel-independent component analysis, in a large, national multi-center subject cohort. The study included 209 elderly healthy controls, 367 subjects with amnestic mild cognitive impairment and 181 with mild, early-stage LOAD, all of them Caucasian adults, from the Alzheimer's Disease Neuroimaging Initiative cohort. Imaging was performed on comparable 1.5 T scanners at over 50 sites in the USA/Canada. Four primary “genetic components” were associated significantly with a single structural network including all regions involved neuropathologically in LOAD. Pathway analysis suggested that each component included several genes already known to contribute to LOAD risk (e.g. APOE4) or involved in pathologic processes contributing to the disorder, including inflammation, diabetes, obesity and cardiovascular disease. In addition, significant novel genes identified included ZNF673, VPS13, SLC9A7, ATP5G2 and SHROOM2. Unlike conventional analyses, this multivariate approach identified distinct groups of genes that are plausibly linked in physiologic pathways, perhaps epistatically. Further, the study exemplifies the value of this novel approach to explore large-scale data sets involving high-dimensional gene and endophenotype data.

Introduction

Late onset Alzheimer's disease (LOAD), the commonest cause of late-life dementia (Bekris et al., 2010), has high heritability (Gatz et al., 2006a, Gatz et al., 2006b). However, its etiopathology, pathogenesis and major risk genes are only partly known, mainly due to its genetic complexity and heterogeneity. The “amyloid hypothesis” seems insufficient to fully explain LOAD etiology, and alternative hypotheses continue to be advanced (Pimplikar et al., 2010).

To date, only one gene of major effect, apolipoprotein E ε4 (APOE4), replicates as significantly influencing LOAD risk (Strittmatter et al., 1993), but it does not account for all genetic variability, suggesting the interplay of multiple, mostly unidentified susceptibility loci of smaller effect size acting multiplicatively under a common disease variant model (Eccles and Tapper, 2010) and/or with environmental factors (Traynor and Singleton, 2010). Recent high-throughput genome-wide association studies (GWAS) (Grupe et al., 2007, Harold et al., 2009, Seshadri et al., 2010, van Es and van den Berg, 2009) have identified and replicated, in addition to APOE4, other genes such as BIN1, CLU, ABCA7, CR1, PICALM, MS4A6A, CD33, MS4A4E and CD2AP, all of which (apart from APOE) have modest effect sizes and cumulatively account for only 35% of the population attributable risk (Ku et al., 2011, Naj et al., 2011). However, if LOAD risk is mediated in part by common polymorphisms that individually confer low disease risk but act in concert, typical univariate GWAS might not have enough power to consistently detect these effects unless they use very large sample sizes. This might be an inherent problem, as obtaining such large samples is usually quite difficult. More importantly, univariate studies do not take into account the effect of multiple genes at once. This matters because major LOAD risk factors include obesity, cerebrovascular disease and diabetes, all disorders with significant genetic underpinnings (Profenno et al., 2010), suggesting that causative genes might belong to common biological pathways shared by these conditions. To circumvent some of these issues, multivariate analyses have been suggested as an approach to identify important genetic factors in LOAD (Gandhi and Wood, 2010).

MRI captures robust phenotypic neuroanatomical LOAD biomarkers, most consistently implicating posterior cingulate and entorhinal cortices, hippocampus and other medial temporal structures (Jack et al., 2010a, Jack et al., 2010b, Smith, 2010, Villain et al., 2010), corresponding to sites of early, severe LOAD-related neuropathology. Imaging genetics attempts to bridge genetic variations with phenotypic trait markers, relating genotypic variations to underlying biological disease etiologies and increasing statistical power, thereby requiring smaller sample sizes (Potkin et al., 2009). However, such strategies require tools to simultaneously accommodate thousands of data points per feature set (e.g. ~10^5 voxels from imaging data and up to 10^6 SNPs from genetic data), posing a major statistical challenge. Often, large scale studies are performed in a univariate fashion that significantly limits either one or both feature sets. However, these techniques can curtail the usefulness of multidimensional data to identify potentially informative relationships. Conventional voxel-wise analyses are computationally time consuming on a genome-wide scale and do not effectively capture cumulative effects spread over multiple genes. Prior analyses (Biffi et al., 2010, Potkin et al., 2009, Shen et al., 2010) on the multi-site MRI/genetic ADNI dataset used massively univariate approaches: GWAS, which confirmed the risk status of APOE4 and identified TOMM40 (Shen et al., 2010), and hypothesis-driven analyses using pre-selected known affected brain regions plus GWAS, which reinforced the status of promising individual genes of interest (Biffi et al., 2010). However, no analyses have evaluated the premise that genetic determinants are not randomly distributed among relevant biological pathways but instead grouped together among specific biological processes, nor have they detected predicted groups of common, interactive risk polymorphisms.

Parallel independent component analysis (Para-ICA), a novel multivariate, data-driven, hypothesis-free statistical technique, extends ICA to analyze multiple modalities simultaneously (Calhoun et al., 2009). Para-ICA simultaneously identifies clusters of associated, likely interacting genes related to either: (a) functional brain networks, (b) related structural brain regions, or (c) physiologic processes, e.g. EEG patterns or other potential endophenotypes, and shows their relationships (Calhoun et al., 2009). Beginning with two modalities (here, SNPs and MRI-derived regional brain volume/thickness), we sought to discover underlying factors from both modalities and their connections. Similar to conventional ICA analyses, extracted structural MRI components are maximally independent within modality, and loading coefficients represent variation among individuals. Networks or components extracted from genetic data are groups of interacting SNP loci, contributing to varying degrees to a genetic process affecting a downstream biological function, i.e. linear SNP combinations highly associated with related phenotypes. To date, this technique has been used mainly in schizophrenia and healthy controls to find genes responsible for brain structure and function using MRI and EEG patterns (Jagannathan et al., 2010, Liu et al., 2008, Meda et al., 2010). However, subject and SNP numbers in those studies were typically small.
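The two-modality decomposition described above can be summarized schematically. The notation below is an illustrative assumption drawn from the way Para-ICA is commonly formulated, not notation taken from this article: X_1 and X_2 are the subject-by-feature matrices (brain regions and SNPs), A_m are the subject-by-component loading matrices, S_m are the component maps, W_m are the unmixing matrices, Y_m is the nonlinearly transformed source estimate whose entropy H is maximized, and a_{1i}, a_{2j} are columns of A_1 and A_2 for the component pairs being linked.

```latex
% Schematic only: symbols are illustrative assumptions, not the article's notation.
X_m = A_m S_m, \qquad m \in \{1, 2\}

\max_{W_1,\, W_2} \Big\{ H(Y_1) + H(Y_2)
  + \sum_{(i,j)} \operatorname{Corr}\big(a_{1i},\, a_{2j}\big)^{2} \Big\}
```

Read this way, the first two terms push each modality toward maximally independent components, while the last term rewards pairs of imaging and genetic components whose loadings covary across subjects.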

Genetic and structural MRI data from the ADNI sample provide an ideal test bed to explore LOAD and to validate application of Para-ICA to larger datasets. The number of subjects (> 800) and the large genotypic dataset (> 600,000 SNPs) allow the feasibility of scaling up this technique to be examined in a dataset for which valid results from conventional, hypothesis-driven analyses have already been published (Biffi et al., 2010). Because many LOAD risk genes remain to be discovered, the technique can simultaneously be used to identify novel risk genes, as it identifies clusters of related, interacting SNPs.

We had the following goals: 1) to evaluate whether Para-ICA could be scaled up to deal with larger populations and many more SNPs than previously analyzed; 2) to identify new risk genes for LOAD and their corresponding endophenotypes; and 3) to explore the different LOAD-mediating biological interactive pathways in which the identified risk genes might participate. We hypothesized that the method might identify previously unknown LOAD risk genes, as well as known candidate genes, and that the identified genes would group into LOAD-associated physiologic pathways and processes.

Materials and methods

We evaluated associations between two data modalities, structural MRI (sMRI; regional brain volumes and cortical thicknesses) and genome-wide genotypic data (SNPs), to reveal multivariate relationships between structural brain regions and SNPs that differed between healthy controls, MCI and AD subjects.

Alzheimer's Disease Neuroimaging Initiative (ADNI) study

Data used in the preparation of this article were obtained from the ADNI database (adni.loni.ucla.edu). ADNI results from efforts of many co-investigators from a broad range of academic institutions and private corporations, with subjects recruited from over 50 sites across the U.S. and Canada. For up-to-date information, see www.adni-info.org. The Principal Investigator of this initiative is Michael W. Weiner, MD, VA Medical Center and University of California-San Francisco.

Genotype–phenotype associations (Para-ICA)

Para-ICA was implemented using the Fusion ICA Toolbox v2.0a (http://icatb.sourceforge.net) in Matlab 7.0 to compute independent genetic/imaging networks and simultaneously identify and quantify the association between the two modalities/features. This variant of ICA was designed for multimodality processing: it extracts components using an entropy term based on information theory to maximize independence, and enhances the interconnection by maximizing the linkage function in a joint estimation
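To make the idea of a cross-modality linkage concrete, here is a minimal sketch in Python. It is not the Fusion ICA Toolbox implementation and does not perform the joint estimation itself. It assumes that per-subject loading coefficients have already been estimated for each modality and that linkage is measured by squared Pearson correlation between loading vectors. The function names and the toy numbers are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length lists of loadings."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

def strongest_link(mri_loadings, snp_loadings):
    """Return (mri_index, snp_index, r) for the component pair with the
    largest squared correlation. Each argument is a list of components,
    and each component is a list of per-subject loading coefficients."""
    best = None
    for i, mri_component in enumerate(mri_loadings):
        for j, snp_component in enumerate(snp_loadings):
            r = pearson(mri_component, snp_component)
            if best is None or r * r > best[2] ** 2:
                best = (i, j, r)
    return best

# Toy loadings for five hypothetical subjects (illustrative values only).
mri = [[1, 2, 3, 4, 5], [2, 1, 2, 1, 2]]
snp = [[5, 3, 4, 1, 2], [2, 4, 6, 8, 10]]
i, j, r = strongest_link(mri, snp)
print(i, j, round(r, 3))
```

In this toy example the first imaging component and the second genetic component vary together across subjects, so that pair is reported as the strongest link. In an actual joint estimation, a term of this kind would be optimized together with the independence criterion rather than computed after the fact.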

Results

Initial data pre-processing with a univariate “GWAS like” analysis (p < 0.05 uncorrected) revealed N = 27,150 SNPs that differed significantly across groups. It confirmed SNPs from APOE (ε4; p = 6.6E-16; ε3; p = 3.6E-09) and TOMM40 (p = 7.25E-08) as the top three candidate genes, whose genotypes differed significantly across diagnostic groups, as identified in prior GWAS of the same parent dataset (Potkin et al., 2009).
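The pre-filtering step can be illustrated with a small sketch. The article does not state in this excerpt which univariate test was used, so the code below substitutes a permutation test on the between-group sum of squares of additively coded genotypes (0/1/2). That choice, the function names, and the toy data are all assumptions made for illustration. Only the uncorrected p < 0.05 threshold comes from the text.

```python
import random

def between_group_ss(values, labels):
    """Between-group sum of squares of one SNP's genotype codes."""
    grand_mean = sum(values) / len(values)
    by_group = {}
    for value, label in zip(values, labels):
        by_group.setdefault(label, []).append(value)
    return sum(len(vs) * (sum(vs) / len(vs) - grand_mean) ** 2
               for vs in by_group.values())

def permutation_p(values, labels, n_perm=999, seed=0):
    """Permutation p-value: how often shuffled group labels give a
    between-group sum of squares at least as large as the observed one."""
    rng = random.Random(seed)
    observed = between_group_ss(values, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if between_group_ss(values, shuffled) >= observed:
            hits += 1
    return (1 + hits) / (1 + n_perm)

def prefilter_snps(genotypes, labels, alpha=0.05):
    """Indices of SNPs whose uncorrected p-value falls below alpha."""
    return [index for index, snp in enumerate(genotypes)
            if permutation_p(snp, labels) < alpha]

# Toy data: 18 hypothetical subjects in three diagnostic groups.
labels = ["HC"] * 6 + ["MCI"] * 6 + ["AD"] * 6
snp_separated = [0] * 6 + [1] * 6 + [2] * 6   # differs by group
snp_null = [0, 1, 2, 0, 1, 2] * 3             # identical in every group
kept = prefilter_snps([snp_separated, snp_null], labels)
print(kept)
```

Because the total sum of squares is unchanged by shuffling labels, ranking permutations by the between-group sum of squares is equivalent to ranking them by an F statistic, which avoids dividing by a zero within-group variance. At genome-wide scale a closed-form test would be far cheaper than permutations; the permutation form is used here only to keep the sketch dependency-free.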

Discussion

As hypothesized, we validated a scaled-up Para-ICA approach to reveal novel interactive genes and pathways for LOAD, thus highlighting one of the primary advantages of Para-ICA which is the use of modest sample sizes compared to conventional GWAS analyses to effectively capture genotype–phenotype relationships. Dominant loading coefficients were contained in all major regions affected by LOAD pathology in the single structural component significantly associated with four different SNP/genetic

Conflict of interest statement

Dr. Andrew Saykin receives research support from Eli Lilly and Company, Siemens AG, and Welch Allyn Inc.

The following are the supplementary materials related to this article.

Scatter plots for associations between (loading coefficients) for all four genetic networks with S1. Also shown is the mean regression line with 95% confidence interval. Diagnostic groups are identified by different colors. Note the clear separation of association terms between groups.

Between-group comparison chart of

Acknowledgments

The study was supported by the following grants and research support: to Dr. Andrew Saykin from Eli Lilly and Company, Siemens AG, Welch Allyn Inc., the NIH (R01 CA101318 [PI], R01 AG19771 [PI], RC2 AG036535 [Core Leader], P30 AG10133–18S1 [Core Leader], and U01 AG032984 [Site PI and Chair, Genetics Working Group]), the Indiana Economic Development Corporation (IEDC #87884), and the Foundation for the NIH, and to Dr. Vince Calhoun from NIH R01 EB005846. We would also like to thank Mrs. Joanna

References (62)

  • L.A. Profenno et al. Meta-analysis of Alzheimer's disease risk with obesity, diabetes, and related disorders. Biol. Psychiatry (2010)
  • V. Ramaswamy et al. Developmental disability: duplication of zinc finger transcription factors 673 and 674. Pediatr. Neurol. (2010)
  • A.J. Saykin et al. Alzheimer's Disease Neuroimaging Initiative biomarkers as quantitative phenotypes: genetics core aims, progress, and plans. Alzheimers Dement. (2010)
  • L. Shen et al. Whole genome association study of brain-wide imaging phenotypes for identifying quantitative trait loci in MCI and AD: A study of the ADNI cohort. NeuroImage (2010)
  • B.J. Traynor et al. Nature versus nurture: death of a dogma, and the road ahead. Neuron (2010)
  • E. Urcelay et al. Enhanced proliferation of lymphoblasts from patients with Alzheimer dementia associated with calmodulin-dependent activation of the Na+/H+ exchanger. Neurobiol. Dis. (2001)
  • D.G. Walker et al. Human postmortem brain-derived cerebrovascular smooth muscle cells express all genes of the classical complement pathway: a potential mechanism for vascular damage in cerebral amyloid angiopathy and Alzheimer's disease. Microvasc. Res. (2008)
  • L. Yang et al. Unbiased screening for transcriptional targets of ZKSCAN3 identifies integrin beta 4 and vascular endothelial growth factor as downstream targets. J. Biol. Chem. (2008)
  • J.R. Bamburg et al. Cytoskeletal pathologies of Alzheimer disease. Cell Motil. Cytoskeleton (2009)
  • S. Bareiss et al. Delta-catenin/NPRAP: a new member of the glycogen synthase kinase-3beta signaling complex that promotes beta-catenin turnover in neurons. J. Neurosci. Res. (2010)
  • L.M. Bekris et al. Genetics of Alzheimer disease. J. Geriatr. Psychiatry Neurol. (2010)
  • A. Biffi et al. Genetic variation and neuroimaging measures in Alzheimer disease. Arch. Neurol. (2010)
  • M.E. Calkins et al. The Consortium on the Genetics of Endophenotypes in Schizophrenia: model recruitment, assessment, and endophenotyping methods for a multisite collaboration. Schizophr. Bull. (2007)
  • V.D. Calhoun et al. A method for making group inferences from functional MRI data using independent component analysis. Hum. Brain Mapp. (2001)
  • R. Dominguez. Actin filament nucleation and elongation factors—structure–function relationships. Crit. Rev. Biochem. Mol. Biol. (2009)
  • D. Eccles et al. The influence of common polymorphisms on breast cancer. Cancer Treat. Res. (2010)
  • P. Eikelenboom et al. The significance of neuroinflammation in understanding Alzheimer's disease. J. Neural. Transm. (2006)
  • G. Gallo. Tau is actin up in Alzheimer's disease. Nat. Cell Biol. (2007)
  • S. Gandhi et al. Genome-wide association studies: the key to unlocking neurodegeneration? Nat. Neurosci. (2010)
  • M. Gatz et al. Role of genes and environments for explaining Alzheimer disease. Arch. Gen. Psychiatry (2006)
  • A. Grupe et al. Evidence for novel susceptibility genes for late-onset Alzheimer's disease from a genome-wide association study of putative functional variants. Hum. Mol. Genet. (2007)
Data used in preparation of this article were obtained from the Alzheimer's Disease Neuroimaging Initiative (ADNI) database (adni.loni.ucla.edu). As such, the investigators within ADNI contributed to the design and implementation of ADNI and/or provided data but did not participate in the analysis or writing of this report. A complete listing of ADNI investigators can be found at: http://adni.loni.ucla.edu/wpcontent/uploads/how_to_apply/ADNI_Acknowledgement_List.pdf.

1 For the Alzheimer's Disease Neuroimaging Initiative.