Introduction

Natural history cohort studies of HIV-1 were initiated in the United States and Europe during the peak of the AIDS epidemic (circa 1990). At the time, genome search methods were limited to family-based approaches, relegating the search for host genetic factors to candidate gene studies informed by a limited understanding of the pathophysiology of HIV [1••]. Nonetheless, progress was made, and the identification of the co-receptors for viral entry [2, 3] quickly led to the identification of allelic variants of the primary co-receptor, CCR5, that were depleted in HIV-1-infected individuals [4] and the carriers of which displayed delayed progression once infected with the virus. This key discovery demonstrated that host genetic factors could influence the course of HIV infection, provide insights into mechanisms, and suggest therapeutic targets. For an excellent review, the reader is directed to An and Winkler [1••].

The turn of the century brought new assay platforms and analytic methodologies that would allow for whole-genome searches to be performed efficiently in unrelated individuals. To date, 11 genome-wide searches of varying design have been performed [5, 6, 7•, 8–10, 11•, 12–14]. While the concordance among the genome-wide association study (GWAS) findings is striking, these similarities are due in part to the phenotypes and subgroups of individuals selected for study.

GWAS of HIV Pathogenesis

The search for novel host genetic factors that influence HIV pathogenesis has focused on a restricted and largely overlapping set of phenotypes. The studies have logically focused primarily on three clinically relevant end points and measurements, which are summarized below. It follows that the susceptibility loci identified thus far participate in innate and adaptive immunity. All of the studies reviewed below are of high quality in terms of study design, genomic data collection, and data analysis. Moreover, a compelling mechanistic rationale exists for each gene identified.

HIV RNA viral load at set point refers to the level at which viral replication attains a steady state following the acute phase of the initial HIV infection. Although set point is challenging to define given the typical follow-up interval in most cohort studies (i.e., bi-annual visits), and many HIV-infected individuals were excluded because they did not meet the inclusion criteria, the concordance among GWAS findings to date is striking (Table 1). Specifically, three loci have mapped to the major histocompatibility locus on chromosome 6 and have been verified in every cohort examined for viral load set point to date. Human leukocyte antigen (HLA) complex P5 (HCP5), HLA class B (HLA-B), and HLA-C harbor protective alleles that were associated with lower viral load at set point [6, 7•, 13]. The high degree of correlation (termed linkage disequilibrium [LD]) between HCP5 and HLA-B made their independent associations with viral load at set point difficult to disentangle because few cases carry the rare recombination events between these loci. However, the association originally mapped to the HCP5 locus was subsequently dissected [6, 15••]. These analyses suggest that the HCP5 locus is associated with higher viral load at set point and that the HLA-B locus (primarily HLA-B*57) is responsible for the protective effects detected by the single nucleotide polymorphism (SNP) located in HCP5 [15••].
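To make the notion of LD concrete, the following is a minimal sketch of the standard pairwise r² statistic for two biallelic loci, computed from a haplotype frequency and two allele frequencies. The function name and the example frequencies are hypothetical and are not drawn from any of the studies reviewed here; real analyses estimate haplotype frequencies from phased genotype data.

```python
def ld_r_squared(p_ab: float, p_a: float, p_b: float) -> float:
    """Return r^2 between two biallelic loci.

    p_ab: frequency of the haplotype carrying allele A (locus 1) and allele B (locus 2)
    p_a:  frequency of allele A at locus 1
    p_b:  frequency of allele B at locus 2
    """
    # Coefficient of disequilibrium: observed haplotype frequency minus the
    # frequency expected if the two loci were independent.
    d = p_ab - p_a * p_b
    denom = p_a * (1.0 - p_a) * p_b * (1.0 - p_b)
    if denom == 0.0:
        raise ValueError("r^2 is undefined when either locus is monomorphic")
    return (d * d) / denom


# Hypothetical frequencies, for illustration only.
perfect_ld = ld_r_squared(p_ab=0.5, p_a=0.5, p_b=0.5)   # alleles always travel together
no_ld = ld_r_squared(p_ab=0.25, p_a=0.5, p_b=0.5)       # alleles are independent
print(perfect_ld, no_ld)
```

When r² between two markers approaches 1, a SNP at one locus is a near-perfect proxy for an allele at the other, which is why only the rare recombinant haplotypes can separate their effects.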

Table 1 Genome-wide searches for HIV-related traits

Disease progression, defined as the time from seroconversion until the point at which immunosuppression occurs (i.e., a CD4+ T-cell count less than 350 cells/mm3 or initiation of highly active antiretroviral therapy [HAART]), is a clinical end point of considerable interest. The extremes of the distribution in terms of disease progression, rapid progressors (RP) and long-term non-progressors (LTNP), have been the focus of several genome searches. The loci previously identified with the related phenotype, viral load at set point (HCP5, HLA-B, and HLA-C), together with variation in the zinc ribbon domain containing 1 (ZNRD1) gene, are associated with disease progression [6, 7•, 12].

Three studies that employed unique designs, described below, identified additional novel loci for disease progression. The first study sought to refine the LTNP phenotype by excluding elite controllers. Elite controllers differ from LTNPs in that they suppress RNA viral load to levels that are below the limit of detection. Exclusion of elite controllers in a GWAS for LTNP uncovered an additional risk locus, C-X-C chemokine receptor type 6 (CXCR6), validated in several cohorts [11•]. The second study sought to capture the entire spectrum of progression by initially screening three subgroups (i.e., RP, moderate progressors, LTNP), followed by replication in a larger cohort [8]. Variation in the prospero homeobox 1 (PROX1) gene was associated with slower progression. The third study first performed a two-stage linkage analysis in two family-based cohorts of macaques, and then replicated an association signal detected on the X chromosome with viral load at set point and disease progression in a cohort of HIV-infected individuals [14]. The association signal mapped to an intergenic SNP located between the gene encoding ribosomal protein S6 kinase alpha-6 (RPS6KA6) and the gene encoding cylicin-1 (CYLC1). Subsequent validation in a larger sample may allow the gene underlying this association to be definitively identified.

Whereas the association signals with LTNP show considerable overlap with loci detected using disease progression [6, 7•, 8, 12] as the phenotype, analysis of RP yielded unique loci [10] that have proven difficult to replicate in other cohorts. This inability to replicate may be due in part to the under-representation of RP in most cohorts. However, the possibility that individuals who are RP or LTNP harbor risk alleles that are unique to each tail of the distribution cannot be discounted. The minor alleles of SNPs mapping to the gene encoding protein arginine methyltransferase 6 (PRMT6), the gene encoding sex-determining region Y-box 5 (SOX5), and the gene encoding transforming growth factor, beta receptor associated protein 1 (TGFBRAP1) were depleted in RP. The risk allele mapping to the retinoid X receptor gamma (RXRG) gene was enriched in RP.

The majority of genome searches performed to date have focused on European-descent populations [5, 6, 7•, 8, 10, 11•, 12]. This approach is reasonable as it reflects the demographics of the epidemic when the cohorts analyzed thus far were initiated. The recent development of methods to account for more complex population substructure has paved the way for the examination of populations with more diverse ancestry, such as Africans [9] and African Americans [13]. Although the first recently reported GWAS for viral load set point in African Americans failed to identify risk loci that exceeded the significance thresholds required of genome-wide searches, the associations with HCP5 and HLA-C were validated [13]. Examination of a completely different phenotype, maternal-to-child transmission in a cohort of HIV-serodiscordant children of HIV-infected mothers from Malawi, yielded several positional candidate genes. However, none exceeded the a priori significance thresholds. Further examination of these suggestive association signals may provide insights into the host genomic influence on the vertical transmission of HIV. Both studies suggest that novel phenotypes may provide additional novel genes that influence other facets of HIV transmission and pathogenesis.
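The stringency of genome-wide significance thresholds follows from the number of tests performed. As a minimal sketch, assuming a simple Bonferroni correction (the studies reviewed here may have used other corrections, and the function names and the figure of one million tests are illustrative assumptions), a family-wise error rate of 0.05 spread across roughly one million independent tests gives the conventional threshold of 5 × 10⁻⁸:

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test p-value threshold that holds the family-wise error rate at alpha."""
    if n_tests <= 0:
        raise ValueError("n_tests must be a positive integer")
    return alpha / n_tests


def passes_genome_wide(p_value: float, alpha: float = 0.05,
                       n_tests: int = 1_000_000) -> bool:
    """True if a single-SNP p-value survives the Bonferroni-corrected threshold."""
    return p_value < bonferroni_threshold(alpha, n_tests)


# Hypothetical p-values, for illustration only.
print(bonferroni_threshold(0.05, 1_000_000))
print(passes_genome_wide(1e-9), passes_genome_wide(1e-6))
```

A signal with a p-value of 10⁻⁶ would be striking in a candidate gene study but remains only suggestive at this threshold, which is the situation described for the studies above.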

To date, two genome searches have pursued novel HIV traits. The first involved examination of not only circulating RNA viral load, but also viral DNA that serves as an estimate of the HIV viral reservoir [5]. In addition to the verification of the previous associations with HCP5 and HLA-C, two additional associations with both lower RNA and DNA viral load were identified. The first was in the syndecan 2 (SYND2) gene and the second was with an intergenic SNP that detected two flanking positional candidate genes: DEAH (Asp-Glu-Ala-His) box polypeptide 40 (DDX40) and the human homolog of yippee-like 2 (YPEL2) [16]. Future validation efforts may be able to identify which of the two genes (DDX40 or YPEL2) underlies this latter association signal.

Opportunities for Future Research

Although the discoveries made thus far using genome-wide searches are clear, many opportunities remain for additional discoveries. Perhaps the most pragmatic opportunity lies in the secondary analysis of the currently available GWAS data. Three analytic methods are likely to yield additional insights: meta-analysis, focused gene–gene interactions (i.e., a specific type of gene–gene interaction termed epistasis), and pathway analysis [17]. Locus-specific meta-analyses [12], the examination of specific gene–gene interactions [6], and pathway analysis performed by Fellay and colleagues [6] have been pursued in a subset of the GWAS described. The availability of several new and imminent GWAS datasets suggests that a more in-depth series of analyses is possible. These types of analyses may uncover additional genetic associations that could not surpass the statistical significance thresholds in the component studies due to limited power.
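The gain in power from meta-analysis can be illustrated with a minimal sketch of a fixed-effect, inverse-variance-weighted combination of per-cohort effect estimates for a single SNP. This is one common approach and is offered as an assumption; it is not necessarily the method used in the locus-specific meta-analyses cited above. The function name and the effect sizes are hypothetical.

```python
import math
from typing import Sequence, Tuple


def fixed_effect_meta(betas: Sequence[float],
                      ses: Sequence[float]) -> Tuple[float, float, float, float]:
    """Combine per-cohort effect estimates for one SNP.

    betas: per-cohort effect estimates (e.g., regression coefficients)
    ses:   matching standard errors, all strictly positive
    Returns (pooled beta, pooled standard error, z statistic, two-sided p-value).
    """
    if len(betas) == 0 or len(betas) != len(ses):
        raise ValueError("betas and ses must be non-empty and of equal length")
    if any(se <= 0 for se in ses):
        raise ValueError("standard errors must be positive")

    # Each cohort is weighted by the inverse of its sampling variance.
    weights = [1.0 / (se * se) for se in ses]
    total_weight = sum(weights)
    pooled_beta = sum(w * b for w, b in zip(weights, betas)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    z = pooled_beta / pooled_se
    # Two-sided p-value under a standard normal null distribution.
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return pooled_beta, pooled_se, z, p


# Two hypothetical cohorts reporting the same modest effect.
beta, se, z, p = fixed_effect_meta([0.2, 0.2], [0.1, 0.1])
print(beta, se, z, p)
```

In this illustration each cohort alone has a z statistic of 2, whereas the pooled estimate has a smaller standard error and a larger z statistic, which is how signals that fall short of significance in the component studies can emerge when cohorts are combined.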

The GWAS reported to date have focused primarily on European-descent male populations that reflect the demographics of the AIDS epidemic at the time that natural history cohorts were built. Whereas considerable value exists in studying these cohorts, the demographics of HIV disease have shifted. Women of color are at highest risk for new infection. Cohorts that represent this shift in demographics (i.e., non-European descent, women), such as the Women’s Interagency HIV Study (WIHS) [18], are currently available. Gender-specific or gender-modified genetic associations were observed for two of the novel loci discovered by GWAS to date [12, 14]. In addition, the emergence of natural history cohorts of HIV in non-European-descent populations, such as the Centre for the AIDS Programme of Research in South Africa (CAPRISA) [19], will allow for the examination of the influence of different HIV subtypes on the genetic associations identified to date and may result in the identification of novel associations. An important caveat to GWAS in populations of non-European descent is that genetic marker coverage in the current commercially available arrays is variable. This limitation was in evidence in the study of Loeuillet and colleagues, where a key risk allele was not tagged in a commonly used commercial array for genome-wide variation measurement [16]. Fortunately, the 1000 Genomes Project (1KGP, www.1000genomes.org) aims to dramatically expand the catalog of variation for the next generation of GWAS search tools by identifying nearly all variants that exist at any appreciable frequency in human populations.

Recent advances in DNA resequencing have revolutionized the fields of genetics and genomics. Without doubt, whole-genome sequencing will eventually supersede the current GWAS approach (i.e., measuring relatively common sequence variations). The barriers to the application of this genome search tool by research groups of even modest resources include the cost, the error profiles and limitations of the new sequencing platforms, which differ from those of traditional sequencing technologies, and, foremost, the bioinformatic challenges (for a review see [20, 21]). However, deep resequencing of candidate gene regions with a high prior index of suspicion, such as those identified in multiple independent GWAS and supported by functional studies, is a method that is currently tenable and suffers more modestly from the barriers identified above. This method currently serves as a powerful adjunct to GWAS and is useful in the identification of rare variants and/or sequence anomalies not currently captured in commercially available genotyping arrays [22].

The greatest frontier for the discovery of host genetic factors that influence the pathogenesis of HIV lies ahead. To date, the genome searches have focused primarily on plasma RNA viral load and disease progression as estimated by peripheral blood CD4+ T-cell count. Though the concordance of the findings has affirmed the value of studying these traits and outcomes, the genes that these phenotypes have implicated play a primary role in innate and/or adaptive immunity. The utility of examining novel phenotypes, including in vitro characterization, is evident [16]. Recent advances in the understanding of the molecular mechanisms of HIV proviral latency [23] may inform the selection of future phenotypic analyses. The longitudinal data accrued in most natural history cohorts to date are limited by participant burden and cost, with bi-annual visits being the most common time interval. However, the modeling of more complex patterns of change over time [24] (e.g., J-shape curves for viral load following acute infection, CD4+ T-cell count decline trajectories, latent variable analysis) may provide phenotypes that are superior to those examined to date. The availability of banked serial biological specimens in many cohorts for which GWAS were reported suggests that cost-effective studies can be performed by the addition of novel phenotypes that can be coupled with currently available genome-wide genetic marker data.

The emergence of HAART has naturally led to the examination of phenotypes captured prior to HAART due to the considerable complexity and ongoing evolution of these therapies in terms of drug targets. However, variable responses to HAART as well as differences in adverse event profiles remain a fundamental barrier to the success of these treatments [25]. Recent identification of gene polymorphisms that predict hypersensitivity reactions to different HAART drugs suggests that this line of inquiry is tenable [25]. The recent development of resource-efficient drug exposure measures, such as the measurement of HAART deposition in hair, holds promise to provide additional insights into pharmacogenomic risk factors as confounding due to self-reported adherence is largely circumvented [26–28].

GWAS of HIV-Associated Comorbid Disease

With the advent of HAART, HIV infection transitioned from an acute to a chronic disease. It is now well accepted that chronic HIV infection and/or HAART may influence several co-morbid diseases of aging as well as elicit disease-specific conditions [29, 30]. These conditions include, but are not limited to, neuropathy [31], nephropathy, atherosclerosis [32], metabolic perturbations [33], and neurocognitive disorders [34•]. In addition, the role of the host genome in the setting of co-infections that are common in HIV-infected individuals (e.g., hepatitis C, human papillomavirus) is an active area of research, particularly given the increased risk for common co-morbid disease (e.g., chronic kidney disease [35]).

The complex and poorly understood natural history of HIV infection necessitated rigorous longitudinal follow-up and in-depth characterization of the participants (e.g., demographic characteristics, clinical characteristics, co-morbid diseases, cell repository) enrolled into natural history cohorts. A natural byproduct of these intense and sustained studies is the opportunity to contribute not only to an understanding of HIV and, subsequently, response to HAART, but also to an understanding of common co-morbid disorders and diseases. Below, two GWAS that serve as compelling exemplars of the importance of studying comorbid diseases in the setting of HIV are described.

Risk for chronic kidney disease is strikingly elevated in the setting of HIV infection. In addition, differences by race are observed with African Americans at increased risk. Mapping by admixture linkage disequilibrium, a method that detects ancestral risk alleles for disease in groups of individuals from recently mixed populations, resulted in the identification of a novel risk locus (i.e., myosin, heavy chain 9, non-muscle [MYH9]) [36]. Although recent evidence suggests that an adjacent locus may underlie the association with the MYH9 locus [37], the power of the approach is clear.

An equally promising discovery by GWAS of atherosclerosis in the setting of HIV was recently reported by Shrestha and colleagues [38]. Their search resulted in the identification of two SNPs in tight LD, mapping to the ryanodine receptor 3 (RYR3) gene, that were associated with carotid intima-media thickness. Not only did previous work implicate these SNPs in the etiology of cardiovascular disease, but the RYR3 protein is also known to interact with the HIV Tat protein. Clearly, additional research is warranted to better understand the role of the host genome in risk for common disease within the context of HIV.

Systems Biology: A New Vista to Understanding HIV Pathogenesis

Clearly, the complex host–viral interactions that underlie HIV pathogenesis will not be unraveled by GWAS alone. The integration of several other components of both cellular and organism-level processes, termed a systems biology approach, will be required. Recent examinations of different functional RNA (fRNA) classes and gene expression profiles of specific host immune cell populations have yielded unique insights into HIV infection [1••]. The examination and integration of genomic, epigenomic, transcriptomic, proteomic, metabolomic, and viral protein interactome data are requisite. Though bioinformatic, computational, and statistical barriers to the integration of these data exist, new solutions to these challenges emerge daily. The integration of these data is sure to suggest novel therapeutic opportunities to interfere with the host–viral interaction to stymie effective infection.

Conclusions

The success of GWAS in the identification of loci that influence HIV infection and control of viral levels (i.e., set point) is clear. However, the genes discovered to date and their variations explain only a portion of the variance in these traits. The addition of novel phenotypes in cohorts with pre-existing genome-wide data, the examination of novel cohorts by GWAS, and the application of novel analytic approaches and data mining will undoubtedly yield novel insights into HIV pathogenesis and risk for common co-morbid diseases. GWAS remains a cost-effective strategy to identify genes of interest that can be the focus of more resource-intensive functional and molecular studies. Finally, a systems biology approach will permit the integration of GWAS findings with other facets of cellular and organism-level biology and the prioritization of targets for future therapies.