Introduction

Genetic variation has been shown to play an important role in acquired immune deficiency syndrome (AIDS), with numerous AIDS restriction genes (ARGs) affecting HIV-1 cell entry, acquired and innate immunity, and cytokine defenses against HIV-1 (O’Brien and Nelson 2004). In the United States, HIV is now one of the leading causes of death in young, predominantly male, adults (Anderson 2001). While there is little evidence to suggest a gender difference in progression to AIDS (Pezzotti et al. 1996), the immune environments between males and females differ in T helper (TH) lymphocyte response, antibody production and cell-mediated immunity (Gleicher and Barad 2007; Marriott and Huet-Hudson 2006; McCarthy 2000; Whitacre 2001; Whitacre et al. 1999). Proteins encoded by the Y chromosome may play a role in this difference between male and female immune responses, as well as variable immune responses within males (Teuschert et al. 2006; Wesley et al. 2007). In humans, males are more likely to exhibit risky behaviors such as physical aggression, resulting in injuries, and are more likely to travel to new geographical regions (Bosch et al. 2003; Carvajal-Carmona et al. 2000; Carvajal-Carmona et al. 2003; Cavalli-Sforza and Hewlett 1982; Mesa et al. 2000), resulting in more exposure to infectious diseases. These and other circumstances have created unique selective environments that likely specifically impacted Y chromosome evolution and variation (Jobling et al. 1998).

The haploid, 60-megabase Y chromosome is unique for genetic studies because it is carried only by males. Further, the 95% of the Y chromosome that does not undergo recombination contains 78 protein-coding genes that, because of redundancy, encode only 27 distinct proteins (Jobling and Tyler-Smith 2003; Skaletsky et al. 2003). Previous work has defined a Y chromosome phylogeny of 311 haplogroups in which 20 major clades are evident (Karafet et al. 2008). Over the past decade, attempts have been made to carry out association studies of Y chromosome mutations with differing male phenotypes, and some effects related to infertility, testicular cancer and hypertension have been suggested (Krausz et al. 2004).

Previous studies have also examined Y chromosome genes for immune cell expression, immunological function, and the role of the encoded proteins in the transplantation response. We know that of the 11 ubiquitously expressed genes, five (DDX3Y, UTY, USP9Y, SMCY and TMSB4Y) are highly expressed in immunological tissues (Ditton et al. 2004; Lahn and Page 1997; Skaletsky et al. 2003). All five of these proteins also function as transplantation antigens in graft-versus-host disease (Agulnik et al. 1994; Ivanov et al. 2005; Laurin et al. 2006; Torikai et al. 2004; Vogt et al. 2000; Warren et al. 2000). These results suggest that several Y chromosome genes play a role in immunological monitoring.

Extensive studies of Y chromosome variability have focused on human population structure and migration patterns. Comparisons of the Y chromosome, autosomes, and the mitochondrial DNA along with human history suggest different migration patterns of males and females (Hamilton et al. 2005; Seielstad et al. 1998; Wilder et al. 2004; Wood et al. 2005). Some have argued that, in general, females and males in pre-agricultural societies migrated at equal rates (Wilkins and Marlowe 2006). In contrast, some studies have reported elevated female migration rates (Hamilton et al. 2005; Seielstad et al. 1998), which must be balanced with more recent male directional migration associated with colonization (Bosch et al. 2003; Mesa et al. 2000; Wilkins 2006). Diamond (2005) and colleagues (Dobson and Carper 1996; Wolfe et al. 2007) have argued that infectious diseases played a major role in the European colonization and the associated failure to expand in the tropics. Taken together, these findings indicate a unique history of the Y chromosome relative to autosomes and the X chromosome.

We postulated that underlying functional variants in any infection response and resistance could be identified by a Y chromosome major haplogroup analysis. Markers for the ten known major haplogroups of European Americans and African Americans were genotyped in five major HIV-1/AIDS cohorts. The haplogroups were then examined for association with HIV-1 infection, progression, viral load set point and highly active antiretroviral therapy (HAART) response.

Materials and methods

Study population

We studied 3,727 males (2,292 European Americans, 1,233 African Americans, and 202 individuals from other racial groups) from five HIV-1/AIDS cohorts: Multicenter AIDS Cohort Study (MACS, n = 1,594) (Phair et al. 1992); San Francisco City Cohort (SFCC, n = 195) (Buchbinder et al. 1994); Multicenter Hemophilia Cohort Study (MHCS, n = 703) (Goedert et al. 1989); Hemophilia Growth and Development Study (HGDS, n = 213) (Hilgartner et al. 1993); and the AIDS Linked to Intravenous Experience (ALIVE, n = 1,022) (Vlahov et al. 1998). Participants were assigned to one of three categories as follows: seronegative (SN, negative result with HIV antibody test, n = 656); seroconverter (SC, date of HIV-1 infection known, n = 920); and seroprevalent (SP, HIV-1-positive before study enrollment, n = 2,151). SCs were limited to those with 2 years or fewer between first positive and last negative clinic visits. Study subjects were HAART-naïve, so follow-up was censored at a conservative HAART initiation cutoff date of 1 January 1996, because HAART became available in 1996. The exception was 467 MACS subjects who were used only for HAART analyses (see “HAART analysis” section). Estimated dates of seroconversion were available for SP subjects from their respective cohorts (Buchbinder et al. 1994; Goedert et al. 1989; Phair et al. 1992).

Individual DNA samples were extracted from the immortal lymphoblastoid B cell lines as previously described (Dean et al. 1996). This study was approved by the Protocol Review Office of the Institutional Review Board of the National Cancer Institute. Informed consent was obtained from all individuals at the study sites.

Y chromosome analysis

Each male was genotyped for ten single nucleotide polymorphisms (SNPs) located on the non-recombining region of the Y chromosome (NRY) using the 5′ nuclease assay (TaqMan Assay-by-Design SNP genotyping products, Applied Biosystems) (Lind et al. 2007). Males were then assigned to 1 of 11 major Y chromosome haplogroups (those common in Europeans and Africans) according to the Y Chromosome Consortium (2002). Individuals who had the M89 derived allele but did not belong to the H, I or K haplogroups were joined to form the juxtaposed F* (xI, H, K) group, referred to as F* throughout the manuscript. After some promising results were obtained for haplogroup I (Y-I), five additional markers, M253, M227, P37, M26, and M223 (Cinnioglu et al. 2004; Rootsi et al. 2004; Underhill et al. 2001), were genotyped to define the I1a*, I1a4, I1b*, I1b2, and I1c subhaplogroups (the corresponding YCC 2008 (Karafet et al. 2008) names are I1, I1b, I2a, I2a2 and I2b, respectively). The remaining Y-I haplogroup subjects who lacked the derived alleles at these five markers were assigned to the I* (xI1a*, xI1a4, xI1b*, xI1b2, xI1c) paragroup.

Statistical analyses

Haplogroups with frequency less than 5% were excluded from statistical analysis. Haplogroups compared were I, R and F* in European Americans, and I, R and E in African Americans. Statistical analyses were performed with SAS 9.1 (SAS Institute, Cary, NC, USA). P values reported were nominal and uncorrected for multiple comparisons. The proportion of false positives due to multiple testing was estimated using the QVALUE (version 1.0) program (Storey and Tibshirani 2003). A q value cutoff of 0.05 was used for significance. Throughout the main body of this paper q values are given when P ≤ 0.05 and in the display items q values ≤0.05 are noted.
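The false-discovery-rate step can be illustrated with a minimal sketch. It uses the Benjamini–Hochberg adjustment, which is a simpler and more conservative stand-in for the Storey and Tibshirani procedure implemented in QVALUE (that procedure additionally estimates the proportion of true null hypotheses). The function name and the example P values are hypothetical and are not taken from this study.

```python
def bh_adjust(p_values):
    """Return Benjamini-Hochberg adjusted P values in the input order.

    Stand-in for QVALUE: equivalent to assuming that every tested
    hypothesis could be a true null (pi0 = 1).
    """
    m = len(p_values)
    # Visit P values from largest to smallest so a running minimum
    # enforces monotonicity of the adjusted values.
    order = sorted(range(m), key=lambda i: p_values[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for rank_from_top, idx in enumerate(order):
        rank = m - rank_from_top  # 1-based rank in ascending order
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical nominal P values, for illustration only.
example_p = [0.001, 0.008, 0.039, 0.041, 0.60]
example_q = bh_adjust(example_p)

# Apply the 0.05 cutoff to the adjusted values.
significant = [p for p, q in zip(example_p, example_q) if q <= 0.05]
```

In this toy example only the two smallest P values survive the 0.05 cutoff, even though four of the five are nominally below 0.05.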

Population structure

For 90% of the European-American and 97% of the African-American SCs, data for 800K autosomal SNPs from the Affymetrix 6.0 Genechip platform (Troyer et al. in preparation) with call rates greater than or equal to 95% were subjected to principal components analysis (PCA) using the EIGENSOFT program (Price et al. 2006) to examine and adjust for potential population stratification. An ANOVA F test was performed on the recovered eigenvectors, grouped by Y chromosome haplogroup. Further, the top three most informative eigenvector values for each subject were included in a Cox proportional hazards model to correct for their contribution to the estimated hazard ratios.
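The ANOVA step on an eigenvector can be sketched as a one-way F statistic computed across haplogroups. This is only an illustration of the calculation, not the SAS procedure used in the study; the function name and the eigenvector coordinates below are hypothetical.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a dict of label -> values."""
    all_values = [v for values in groups.values() for v in values]
    n_total = len(all_values)
    k = len(groups)
    grand_mean = sum(all_values) / n_total

    ss_between = 0.0  # variation of group means around the grand mean
    ss_within = 0.0   # variation of subjects around their group mean
    for values in groups.values():
        mean = sum(values) / len(values)
        ss_between += len(values) * (mean - grand_mean) ** 2
        ss_within += sum((v - mean) ** 2 for v in values)

    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical eigenvector-1 coordinates per haplogroup (not study data).
pc1 = {"I": [1.0, 2.0, 3.0], "R": [2.0, 3.0, 4.0], "F*": [5.0, 6.0, 7.0]}
f_stat, df_b, df_w = one_way_anova_f(pc1)
```

With three haplogroups the between-group degrees of freedom are 2, which matches the df = 2 reported for the I, R and F* comparisons.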

HIV infection and progression

Separate analyses were conducted for European-American and African-American subjects. Haplogroup frequencies were compared in HIV-1-negative and HIV-1-positive subjects using the log-likelihood chi-square test. For SC subjects the Kaplan–Meier and Cox proportional hazard model analyses were used to examine the relationship between Y haplogroups and four AIDS-related outcomes: time to CD4 T lymphocyte count of fewer than 200 per cubic millimeter (CD4 < 200); AIDS-1993 definition (Centers for Disease Control 1992); AIDS-1987 definition (Centers for Disease Control 1986); and death from AIDS-defining illnesses prior to HAART use. The Cox models were stratified by cohort in order to account for individual cohort effects. The nine ALIVE European American samples, of which only three had AIDS outcome data, were not included in the stratified analyses to maintain statistical power. The time dependence of AIDS progression was examined initially using graphical analyses and later with partitioned time interval Cox analysis. Genes known to influence HIV-1 progression [CCR5-Δ32 (Dean et al. 1996), CCR2-64I (Smith et al. 1997), CCR5-P1/P1 promoter (Martin et al. 1998), and IL10-5A (Shin et al. 2000)] were accounted for in the Cox proportional hazard models. More generally, the full set of known ARGs was summarized as a genetic propensity index (O’Brien and Nelson 2004) that also included HLA-B*27, HLA-B*57, HLA-B*35, HLA-B*35Px, KIR3DS1, TSG101, SDF, RANTES, and HLA class I heterozygosity for each subject for the four AIDS-related outcomes and was used as a continuous variable in the Cox analyses. SP subjects who avoided one or more AIDS outcomes for more than 9 years were included only in the categorical analyses of AIDS progression and outcomes, as in our previous reports (Martin et al. 1998; Shin et al. 2000; Smith et al. 1997).
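The Kaplan–Meier estimate used for the survival curves can be sketched in a few lines. This is a generic product-limit estimator written for illustration, not the SAS implementation used in the study; the function name and the follow-up data are hypothetical.

```python
def kaplan_meier(times, events):
    """Return [(time, survival)] at each distinct event time.

    times  -- follow-up in years since seroconversion
    events -- 1 if the outcome (e.g. AIDS-1987) was observed, 0 if censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        observed = 0  # outcomes observed at time t
        removed = 0   # subjects leaving the risk set at time t
        while i < len(data) and data[i][0] == t:
            observed += data[i][1]
            removed += 1
            i += 1
        if observed > 0:
            # Product-limit step: multiply by the fraction surviving at t.
            survival *= 1.0 - observed / at_risk
            curve.append((t, survival))
        at_risk -= removed
    return curve


# Hypothetical follow-up data for six seroconverters (not study data).
years = [2.0, 3.0, 3.0, 5.0, 8.0, 9.5]
events = [1, 1, 0, 1, 0, 1]
curve = kaplan_meier(years, events)
```

Censored subjects, such as those reaching the 1 January 1996 cutoff without an outcome, reduce the risk set but do not lower the curve.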

HIV-1 viral load set point analysis

Viral load measurements for 381 European-American SCs from the MACS cohort were quantified using either the reverse-transcription polymerase chain reaction (RT-PCR, Amplicor; Roche Diagnostics, Nutley, NJ, USA) or the Roche Ultrasensitive Assay described previously (Lyles et al. 2000). Viral load set points were calculated as the mean of all HIV-1 RNA measurements obtained for these SCs between 6 months and 3 years after the first HIV-seropositive visit and were analyzed as log10 HIV-1 RNA copies/mL.
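The set point calculation can be sketched as follows. The sketch assumes that each measurement is log10-transformed before averaging and that the 6-month and 3-year bounds are inclusive; the text does not specify either detail. The function name and example values are hypothetical.

```python
import math


def viral_load_set_point(measurements, min_years=0.5, max_years=3.0):
    """Mean log10 HIV-1 RNA copies/mL within the set point window.

    measurements -- list of (years since first seropositive visit, copies/mL)
    Returns None when no measurement falls inside the window.
    Assumption: log10 is applied per measurement, then averaged.
    """
    logs = [
        math.log10(copies)
        for years, copies in measurements
        if min_years <= years <= max_years and copies > 0
    ]
    if not logs:
        return None
    return sum(logs) / len(logs)


# Hypothetical visits: only the 1.0- and 2.0-year visits are in the window.
visits = [(0.2, 1e6), (1.0, 1e4), (2.0, 1e5), (3.5, 1e3)]
set_point = viral_load_set_point(visits)
```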

HAART analysis

HAART was defined (Office of AIDS Research Advisory Council 2006) as one of the following: (1) two or more nucleoside reverse transcriptase inhibitors (NRTIs) in combination with at least one protease inhibitor (PI) or one non-nucleoside reverse transcriptase inhibitor (NNRTI); (2) one NRTI in combination with at least one PI and at least one NNRTI; (3) a regimen consisting of ritonavir and saquinavir in combination with one NRTI and no NNRTIs; and (4) an abacavir- or tenofovir-containing regimen of three or more NRTIs without both PIs and NNRTIs. Combinations of zidovudine (AZT) and stavudine (d4T) with either a PI or NNRTI were not considered HAART. The date of HAART initiation was defined as the midpoint between the last visit without HAART and the first visit at which HAART was reported. Only subjects with less than 1 year between these visits were included in the study. Baseline viral load and CD4 counts were taken within 6 months of HAART initiation.
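The midpoint rule for the HAART initiation date, together with the 1-year inclusion criterion, can be sketched as below. The sketch assumes that 1 year means 365 days and that the midpoint is rounded down to a whole day; neither detail is stated in the text. The function name and the visit dates are hypothetical.

```python
from datetime import date, timedelta


def haart_initiation_date(last_visit_without, first_visit_with, max_gap_days=365):
    """Midpoint between the two visits, or None if the subject is excluded.

    Assumptions: '1 year' is taken as 365 days, and the midpoint is
    rounded down to a whole day.
    """
    gap = first_visit_with - last_visit_without
    if gap.days < 0:
        raise ValueError("first HAART visit precedes last visit without HAART")
    if gap.days >= max_gap_days:
        return None  # visits too far apart; subject not included
    return last_visit_without + timedelta(days=gap.days // 2)


# Hypothetical visit dates, 184 days apart.
start = haart_initiation_date(date(1996, 3, 1), date(1996, 9, 1))

# Hypothetical subject with two years between visits, hence excluded.
excluded = haart_initiation_date(date(1996, 3, 1), date(1998, 3, 1))
```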

We used only the European-American MACS cohort subjects to detect the suppression of viral load upon HAART. Time to suppression was defined as the time from HAART initiation to the first undetectable viral load visit for each subject. Differential survival rates were examined graphically using Kaplan–Meier analyses and with the hazard ratio using Cox proportional hazards models adjusted for clinical AIDS prior to HAART, baseline viral load, CD4 cell count, and CCR5-Δ32. The measurement of HIV-1 RNA was obtained by RT-PCR (Amplicor HIV Monitor Assay, Roche Diagnostics, Nutley, NJ). We performed separate analyses for the suppression thresholds of <200 copies/mL and <50 copies/mL. Failure to reach viral suppression (<200 copies/mL) was reported as HAART failure.

Results

In our analyses, we focused on the European-American and African-American samples, and excluded other racial groups due to their small sample size in these cohorts. The Y major haplogroup frequencies of the remaining 3,490 subjects are shown in Table 1.

Table 1 Frequency (%) of Y chromosome haplogroups in European-American and African-American seroconverters, seroprevalents and seronegatives

European Americans

Population structure and stratification analysis

Since specific Y chromosomes track population stratification in a population study, we determined whether any observed signals were due to the Y chromosome itself or due to an autosomal region associated with the Y chromosome because of geographical origins. We examined data for 800K autosomal SNPs and applied the EIGENSOFT-PCA (Price et al. 2006) approach to identify and correct for population stratification in the I, R and F* haplogroup subjects. The known substructure of European Americans (Price et al. 2008) from northwest to southeast Europe and the division within southern Europeans were evident. An ANOVA of the top three eigenvector values (principal components) showed that I and R haplogroup subjects were not significantly different from each other (vector-1: P = 0.60; vector-2: P = 0.08; vector-3: P = 0.67). However, the distribution of the F* subjects along the first and second eigenvectors was significantly different from that of the I and R subjects (vector-1: df = 2, P = 0.0001; vector-2: df = 2, P = 0.001; vector-3: df = 2, P = 0.91). For completeness, the progression analyses (of SCs) were corrected for these top three most informative eigenvectors.

HIV-1 infection and AIDS progression

An analysis of susceptibility to infection showed no significant frequency difference when comparing HIV-1-negative and SC subjects in European Americans (P = 0.38; Table 1). A survival difference for the AIDS-1987 outcome was observed between the SCs belonging to the three most common (I, R, and F*) European-American haplogroups (df = 2, log-rank P = 0.02, q = 0.19; Fig. 1), where faster disease progression in the Y-I haplogroup subjects was evident. A trend toward faster AIDS progression in the Y-I haplogroup was also suggested by the Cox proportional hazards analyses [Table 2; relative hazard (RH) = 1.35, P = 0.07 for AIDS-1987]. A close examination of survival plots indicated acceleration to AIDS in the latter stage of HIV-1 pathogenesis among the Y-I haplogroup subjects. To address the time dependence of AIDS progression of Y-I haplogroup individuals, various partitioned time intervals were examined (data not shown). Generally, the most significant Cox models were divided into early progressors (0–7 years after seroconversion) and late/slow progressors (>7 years after seroconversion) for CD4 < 200 and AIDS-1993 outcomes; and at 9 years for the AIDS-1987 and death. Our Cox model interval analyses indicated that Y-I haplogroup subjects depleted CD4 cells more quickly (RH = 2.05, P = 0.007, q = 0.10) and progressed to AIDS (RH = 2.84, P = 0.001, q = 0.03) and death (RH = 2.48, P = 0.003, q = 0.07) faster in the latter phase (Table 2). The faster progression to AIDS-1987 and death (in the MHCS cohort) results were significant after multiple test corrections among the Y-I haplogroup individuals (Table 3). This direction of association was consistent in all the cohorts, but the results were not always as significant (Supplementary Table 1). An additional categorical analysis, including the seroprevalent subjects, also showed the same trend (results not shown).

Fig. 1

Survival analyses of most common European-American and African-American Y chromosome haplogroups in all AIDS cohorts. Number of individuals (n) and number of events, Cox proportional hazards model P values (P) and relative hazards (RH) are presented. a–c Kaplan–Meier survival curve of European-American individuals for three AIDS-related outcomes. Haplogroup R is the reference group. Haplogroup E was excluded from further analyses as it had only 35 subjects (5.8% of seroconverters, Table 1) and behaved nearly identically to the reference haplogroup R for the major AIDS outcomes. d Kaplan–Meier survival curve of African-American individuals. Most common E haplogroup is used as the reference group

Table 2 Faster progression of Y-I haplogroup subjects to AIDS outcomes among European-American seroconverters in early and late time intervals
Table 3 Significant associations of Y-I haplogroup with AIDS progression and viral suppression outcomes with corresponding q values (≤0.05)

To address whether any one of the Y-I subhaplogroups carried the faster progression signal, we further genotyped the Y-I haplogroup subjects for the most common Y-I subhaplogroup markers (I1a*, I1a4, I1b*, I1b2 and I1c) known from Europe. Y-I subhaplogroups show geographical frequency differences in Europe, most probably representing their places of origin. Subhaplogroups I1a* and I1a4 are mostly found in northern Europe. I1b* and I1b2 are most frequent in Eastern Europe and the Balkans, with I1b2 reaching the highest frequency in Sardinia. Finally, I1c, despite its wide range in Europe, is found mostly in northwest Europe (Rootsi et al. 2004). The observed frequencies of I1a*, I1a4, I1b*, I1b2 and I1c were 59.2, 0.3, 11.5, 2.8 and 23.1%, respectively. The remaining 3.1% of the Y-I haplogroup samples that lacked these five subhaplogroup-defining derived alleles were labeled I*. The Cox analyses suggested a relatively faster progression signal for the relatively common I1a* subhaplogroup (RH = 1.42, P = 0.09 for AIDS-1987; Table 4) compared to the other subhaplogroups. However, a Kaplan–Meier survival analysis did not find significant differences between the subhaplogroups (df = 4, log-rank P = 0.72; Supplementary Fig. 1), indicating that the Y-I haplogroup as a whole best explains the acceleration to AIDS outcomes rather than any particular Y-I subhaplogroup. To correct for any population stratification effect, we included the top three eigenvector values for each SC as continuous variables in the Cox model. In these adjusted Cox models, Y-I haplogroup individuals trended toward faster CD4 depletion and progression to AIDS and death later in HIV infection (Table 2). The faster progression to AIDS-1987 result was significant after multiple test corrections (Table 3). This direction of association was consistent in all the cohorts, but the results were not always as significant (Supplementary Table 2).

Table 4 Association of Y-I subhaplogroups with progression to AIDS outcomes among European American seroconverters

We also examined the frequency of Y chromosome haplogroups among subjects with different AIDS-defining illnesses (Supplementary Table 3). An analysis of haplogroup frequencies among cases and controls for AIDS-defining disease revealed a trend toward elevated frequency of Y-I haplogroup cases compared to SCs for all of the eight disease categories examined and a trend toward elevated malignancy development (RH = 2.0, P = 0.009, q = 0.11) and Kaposi sarcoma (RH = 2.30, P = 0.007, q = 0.10) for the Y-I haplogroup (Supplementary Table 3). These findings were consistent with a faster AIDS progression observed in the Y-I haplogroup subjects. A similar analysis of AIDS-defining disease as the first outcome showed seven of the eight disease categories with elevated Y-I haplogroup frequencies. However, neither the results nor the Cox models for specific disease outcomes were significant (results not shown).

Genetic cofactors

We systematically examined and accounted for the effects of well-known ARGs. First, to evaluate the CCR5-Δ32 influence on AIDS progression, we analyzed the Y-I haplogroup association using CCR5-Δ32 in a Cox model as a covariate. Again, the results indicated a significantly faster progression to AIDS-1987 (after 9 years) in the Y-I haplogroup subjects (adjusted RH = 2.93, P = 0.0009, q = 0.03). The Y-I haplogroup also maintained its significant result when CCR2-64I, CCR5-P1/P1 promoter and IL10-5A variants were included in the Cox model (results not shown). Further, when we included a genetic propensity index which accounts for HLA-B*27, HLA-B*57, HLA-B*35, HLA-B*35Px, KIR3DS1, TSG101, SDF, RANTES and HLA class I heterozygosity effects in the Cox model, the progression difference was still significant for the Y-I haplogroup (Table 2). This progression difference was also observed in individual cohorts, but the results were not always as significant (Supplementary Table 4). Moreover, the Y-I haplogroup association with rapid CD4 cell depletion, and rapid progression to AIDS-1987 and to death remained significant after multiple test corrections (Table 3).

Viral load and HAART

Plasma viral load set points among subjects infected with HIV-1 may be a strong indicator of future disease progression because higher viral load levels lead to faster CD4 cell depletion and AIDS progression (Bruisten et al. 1997; Henrard et al. 1995; Katzenstein et al. 1996; Lyles et al. 2000). An ANOVA of plasma viral load set point among HIV-infected patients showed no significant differences between I, R, and F* haplogroup subjects (df = 2, P = 0.36; Supplementary Table 5). In contrast, among AIDS patients treated with HAART, Y-I haplogroup-bearing subjects took significantly longer to reach overall viral suppression in a Cox model (<200 HIV RNA copies/mL; RH = 0.62, P = 0.001, q = 0.03; Fig. 2) and to reach an undetectable viral load (<50 HIV RNA copies/mL; RH = 0.69, P = 0.02, q = 0.19; Table 5). The slower viral suppression result observed in the Y-I haplogroup was still significant after multiple test corrections (Table 3). The Y-I subhaplogroups had the same trend for AIDS progression, but the significant result was not due to a single subhaplogroup (Table 5). Overall, more individuals with the Y-I haplogroup trended toward HAART failure [OR = 2.42 (95% CI 1.29–4.54), Fisher’s exact P = 0.009, q = 0.11].

Fig. 2

Kaplan–Meier survival curves for viral suppression of I, R, and F* haplogroups in the MACS (European American) cohort subjects. Number of individuals (n) and number of events, Cox proportional hazards model P values (P) and relative hazards (RH) are presented. Haplogroup R is the reference group. P = 0.001 corresponds to false-discovery-rate q = 0.03

Table 5 Slower HIV-1 suppression of Y-I subjects among European American haplogroups

Analyses of African Americans

An initial analysis of susceptibility to infection showed no significant frequency difference when comparing HIV-1-negative and SC subjects in African Americans (P = 0.45; Table 1). A Kaplan–Meier survival analysis for AIDS-1987 outcome did not demonstrate a significant difference between the SCs in seven African American Y haplogroups (df = 6, log-rank P = 0.20; Fig. 1). A simplified Kaplan–Meier survival analysis for AIDS-1987 outcome using only the E, R, and I haplogroups also did not demonstrate a significant difference between the three haplogroups (df = 2, log-rank P = 0.34; Fig. 1). Furthermore, survival analyses using the Cox proportional hazard model for CD4 < 200, AIDS-1993, AIDS-1987, and time-to-death outcomes did not find a significant progression difference between the E, R, and I haplogroups (Supplementary Table 6). Subsequent modified Cox analyses, adjusting for the effects of ARGs via genetic propensity index values (Supplementary Table 7) and for population stratification (Supplementary Table 8), also did not find a significant progression difference between the E, R, and I haplogroups. Further analysis of specific outcomes in African Americans failed to show any significant associations with the examined haplogroups.

Discussion

Our study is the first to examine whether major Y haplogroups have an association with HIV infection and progression to AIDS. We genotyped subjects as one of 11 common Y haplogroups in European Americans and African Americans, and observed faster progression to all four AIDS outcomes in European-American Y-I haplogroup subjects in later years of infection. The early and initial stages of AIDS development indicated by the HIV-1 infection rate and plasma viral load levels, in the I, R, and F* haplogroups were not significantly different from each other. However, the Y-I haplogroup showed significantly longer HIV-1 viral suppression time and higher failure rate in HAART, outcomes that relate to later stages in AIDS progression. Moreover, the significant results after false-discovery-rate corrections for AIDS-1987, AIDS-related death, and HIV-1 viral suppression analyses suggest a potential biological basis for these observations. An independent analysis of AIDS-related illnesses also suggests the increased risk of the Y-I haplogroup in all disease categories. A non-significant opposite trend was observed in African American samples, but results depended on only six individuals. It is hard to draw haplogroup specific conclusions based on the few (<5%) Y-I haplogroup subjects in African Americans.

There are two possible explanations for the Y-I haplogroup effect. First, a locus (or loci) on the Y chromosome is responsible for faster progression to AIDS. As our results do not indicate an infection difference between the haplogroups, this locus might influence the AIDS progression pathway and lead to faster immunosuppression and AIDS outcomes. The relative hazard estimates (1.76–2.84, Table 2) indicate a moderate effect on AIDS progression in the Y-I haplogroup. The alternative explanation for the Y chromosome signal is that we are tracing an autosomal locus in European Americans associated with the Y-I haplogroup. In European Americans, many ARG polymorphisms have been identified and found to influence HIV-1 infection, AIDS progression and mortality (O’Brien and Nelson 2004). A classic example of these protective genes against HIV-1 infection and AIDS progression, the CCR5-Δ32 variant, shows a north to south gradient in European populations (Libert et al. 1998). Similarly, the Y-I haplogroup is found mostly in northern Europe, and its frequency declines toward southern Europe (Rootsi et al. 2004). This observation may suggest that we see the effect of an autosomal locus rather than a true Y chromosome effect. However, the effect of the Y-I haplogroup in the latter stage of disease progression is independent of the 13 well-studied autosomal AIDS restriction/susceptibility loci considered in these analyses. Moreover, the European American Y-I subhaplogroups and the population stratification analyses fail to show any specific geographical or population substructure basis for the Y-I haplogroup AIDS progression effect.

Our report of the involvement of Y chromosome major haplogroups in HIV progression and HAART outcomes has both strengths, largely elaborated above, and weaknesses. The genotyping of the known major haplogroups from Europe and Africa precisely differentiated relevant Y chromosome major haplogroups, yet failed to differentiate all of the underlying subhaplogroups. Both progression and HAART viral suppression signals were seen for the Y-I haplogroup. Therefore, the five known Y-I subhaplogroups were genotyped, but none of them specifically carried the associations. That these two major and different endpoints were both associated with the Y-I haplogroup suggests that an underlying biological process could be at work. A strict Bonferroni correction for the total 414 tests (325 European-American and 89 African-American Cox proportional hazards and viral load tests) presented in figures and tables yields a significance cutoff of 0.05/414 = 1.2 × 10⁻⁴. Only one of the nominal P values across this study was more significant than this stringent correction. However, applying the false-discovery-rate approach, 12 P values (0.0001–0.001) yield q values (Storey and Tibshirani 2003) ≤0.05 (Table 3). Those 12 tests show that the Y-I haplogroup is significantly associated with accelerated AIDS progression, particularly later in infection (after 7–9 years), and with failure to suppress virus during HAART. While these results localize an AIDS progression and HAART treatment signal to the Y chromosome, the underlying causative variant was not identified.
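The Bonferroni cutoff quoted above follows directly from the test counts given in the text, as this short check shows. The variable names are chosen for illustration only.

```python
# Test counts as stated in the text.
european_american_tests = 325
african_american_tests = 89
n_tests = european_american_tests + african_american_tests

# Bonferroni: divide the family-wise alpha by the number of tests.
alpha = 0.05
bonferroni_cutoff = alpha / n_tests  # about 1.2e-4
```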

Variation on the human Y chromosome is shaped by mutations ranging from single nucleotide changes to inversions, duplications, and deletions, causing larger structural changes that generate copy number variations (Repping et al. 2006). Some of these structural changes and copy number variations have been linked to diseases such as oligo/azoospermia and gonadoblastoma, and shown to be represented frequently in certain Y haplogroups (Krausz et al. 2004; Repping et al. 2003). However, no such mutation or structural change has been documented to be particularly associated with the Y-I haplogroup. The non-recombining nature of the Y chromosome makes the task of discovering the locus involved in HIV progression significantly more difficult than it would be elsewhere in the genome. The Y chromosome can be considered a complete haplotype block; therefore, all the genes on the Y chromosome are positional candidate disease genes, with the primary candidates being those expressed in the immune tissues and involved in the immunological response.

In conclusion, we present a unique association between infectious disease and the Y chromosome haplogroup I (Y-I). The 12 significant false-discovery-rate q values for AIDS progression and HAART outcomes observed in the Y-I haplogroup and the independence of the Y chromosome effect from that of the examined autosomal loci suggest a causal variant at a locus on the Y-I haplogroup. Eleven of these 12 associations are with survival and have strong relative hazards ranging from 2.4 to 6.3. Although a “winner’s curse” may be inflating these estimates, the consistency of these results points to a significant role for the Y chromosome in human health. Further comprehensive genetic and functional studies of the Y chromosome should speed the discovery of the hidden variation related to HIV-1/AIDS.