Introduction

Prostate cancer is the most commonly diagnosed non-cutaneous cancer in men in the United States with over 200,000 new cases and 30,000 deaths estimated in 2010 (Jemal et al. 2010). Over 35 prostate cancer risk loci have been reported to date from a series of genome-wide association studies (GWAS). One locus maps to a region on chromosome 19q13.33 marked by SNP rs2735839, located ~600 bp downstream of KLK3, the gene that encodes prostate-specific antigen (PSA).

This locus was initially reported in a GWAS of prostate cancer conducted in subjects from the UK and Australia (P = 1.5 × 10⁻¹⁸) (Eeles et al. 2008). Other prostate cancer GWAS efforts, including the Cancer Genetic Markers of Susceptibility (CGEMS) project in the US, did not report a genome-wide significant association between SNPs on chromosome 19q13.33 and prostate cancer risk (Gudmundsson et al. 2009; Sun et al. 2009; Thomas et al. 2008). An important difference in study design could account for the divergent results in the GWAS: in the study that reported the prostate cancer association for the KLK3 locus, controls were selected to have low PSA levels (<0.5 ng/ml) (Eeles et al. 2008) whereas the other studies did not select controls based on PSA levels. In a follow-up analysis of the UK GWAS, the signal was observed although not as strongly as in the discovery data set (Kote-Jarai et al. 2008). In a subsequent re-analysis of the CGEMS GWAS (within the Prostate, Lung, Colon and Ovarian Cancer Screening Trial, PLCO) an association between rs2735839 and prostate cancer risk was seen only when control individuals were restricted to those with very low PSA levels (Ahn et al. 2008).

In a previous study, we catalogued common SNPs and insertion/deletion polymorphisms (indels) in the KLK3 region by a resequence analysis of a 56 kb region flanking rs2735839 (chr19:56,019,829–56,076,043 bps; NCBI Build 36.3) in 78 unrelated individuals of European ancestry (Parikh et al. 2010). Based on these results, we tagged the genomic region surrounding rs2735839 and genotyped 24 SNPs in five prostate cancer case–control studies from the United States, France, Norway and Finland (1994; Calle et al. 2002; Naess et al. 2008; Prorok et al. 2000; Valeri et al. 2003).

In our analysis, we did not observe a significant association between rs2735839 and prostate cancer risk, but found suggestive evidence for three highly correlated SNPs in the KLK3 gene. Interestingly, the association with these markers was observed only for nonaggressive prostate cancer, defined by a Gleason score lower than 7 and disease stage <III, and not for cases with Gleason score 8 or greater or Stage III and above (also known as advanced disease). These three SNPs are also associated with baseline serum PSA levels. Our results suggest that the observed association between germline variation in the KLK3 gene and nonaggressive prostate cancer, at least in part, may be mediated by the association between the germline variation and PSA levels.

Methods

Sample collections

This study included 3,522 prostate cancer cases and 3,338 control subjects from four case–control studies nested within cohorts and one hospital-based case–control study (Supplemental Table 1), previously analyzed in the Cancer Genetic Markers of Susceptibility (CGEMS) study. The cohort studies were: the Prostate, Lung, Colon and Ovarian (PLCO) Cancer Screening Trial (Gohagan et al. 2000; Prorok et al. 2000), the Alpha-Tocopherol, Beta-Carotene Cancer Prevention Study (ATBC) (1994), the American Cancer Society Prevention Study II Nutrition Cohort (CPS-II) (Calle et al. 2002) and the Cohort of Norway (CONOR) (Naess et al. 2008). The case–control study was the French Prostate Case–Control Study (CeRePP) (Valeri et al. 2003). Details of PLCO, ATBC, CPS-II, CONOR and CeRePP have been described previously in CGEMS (Yeager et al. 2009). All studies were approved by the appropriate institutional review boards.

SNP selection, pre-genotyping quality control (QC) and genotyping

A set of tagging single-nucleotide polymorphisms (SNPs) was identified by pairwise analysis of linkage disequilibrium in the program GLU (http://code.google.com/p/glu-genetics), using a linkage disequilibrium coefficient threshold of r² ≥ 0.8 and a minor allele frequency (MAF) ≥0.05, based on the resequencing data in the KLK3 region on chr19:56,019,829–56,076,043 (NCBI Build 36.3) (Supplemental Table 2) (Parikh et al. 2010). Twelve tag SNPs, covering an 8 kb region surrounding rs2735839, were selected for genotyping. They capture 90% of common variation (MAF ≥0.05) across the 8 kb region surrounding rs2735839 (r² ≥ 0.8) based on the 1,000 Genomes Project data (overall 40 SNPs in the Caucasians from Utah, USA (CEU) population samples, June 2010 release, http://www.1000genomes.org/). An additional 12 SNPs with r² > 0.3 with rs2735839 were also included. The program Tagger was used to estimate the proportion of common variation across the region captured by the tag SNPs based on a multi-marker tagging approach (de Bakker et al. 2005). The 24 SNPs (Fig. 1; Supplemental Table 3) extend across the entire KLK3 gene and in addition cover 12,907 bps upstream and 3,424 bps downstream of the gene (Parikh et al. 2010).
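
The pairwise tagging criterion can be illustrated with a minimal sketch. The Python below is not the GLU or Tagger implementation. It assumes genotypes coded as minor-allele counts (0, 1, 2), uses the squared genotype correlation as an approximation of the LD coefficient r², and greedily selects tags so that every SNP with MAF ≥ 0.05 has r² ≥ 0.8 with at least one tag. The function names and the toy genotype data are hypothetical.

```python
from statistics import mean

def maf(g):
    """Minor allele frequency from genotypes coded as 0/1/2 allele counts."""
    p = sum(g) / (2 * len(g))
    return min(p, 1 - p)

def r2(a, b):
    """Squared Pearson correlation between two genotype vectors
    (an approximation of the haplotype-based LD coefficient)."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return 0.0
    return cov * cov / (va * vb)

def greedy_tags(genos, r2_min=0.8, maf_min=0.05):
    """Greedy pairwise tagging: repeatedly pick the untagged SNP that
    covers the largest number of still-untagged common SNPs."""
    common = [s for s, g in genos.items() if maf(g) >= maf_min]
    covers = {
        s: {s} | {t for t in common if r2(genos[s], genos[t]) >= r2_min}
        for s in common
    }
    untagged, tags = set(common), []
    while untagged:
        best = max(sorted(untagged), key=lambda s: len(covers[s] & untagged))
        tags.append(best)
        untagged -= covers[best]
    return tags

# Hypothetical toy data: snpB duplicates snpA, snpD is monomorphic.
toy_genotypes = {
    "snpA": [0, 1, 2, 0, 1, 2, 0, 1, 0, 0],
    "snpB": [0, 1, 2, 0, 1, 2, 0, 1, 0, 0],
    "snpC": [2, 0, 0, 1, 0, 1, 2, 0, 1, 0],
    "snpD": [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
}
print(greedy_tags(toy_genotypes))
```

In this toy example one of the two perfectly correlated SNPs is enough to tag both, and the monomorphic SNP is ignored because it fails the MAF filter.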

Fig. 1

a Genomic structure of the KLK3 region indicating the location of the 24 SNPs genotyped. SNPs that associate with prostate cancer and/or PSA levels are colored in black (nonaggressive prostate cancer and PSA levels) and in red (only PSA levels). Locations of other SNPs are indicated by arrows. One SNP that was excluded from analysis based on deviation from Hardy–Weinberg equilibrium is shown with a gray arrow. The boundary of the KLK3 gene is indicated with a horizontal arrow. Exons are indicated by boxes and introns by solid lines. b Linkage disequilibrium (LD) map for the 24 genotyped SNPs in KLK3. Each diamond represents the correlation coefficient r² between a pair of SNPs. Darker shades represent stronger LD

After pre-genotyping quality control at the Core Genotyping Facility (CGF) of the National Cancer Institute (http://cgf.nci.nih.gov/operations/pregenotyping-qaqc.html), 24 SNPs were genotyped in 8,505 samples using TaqMan genotyping assays (ABI, Foster City, CA, USA). For quality control assessment, 5.2–6.1% of samples from each of the five studies were genotyped in duplicate (a total of 466 samples) with 98.1% genotype concordance overall. When 13 duplicate pairs with low completion rates (<80%) were excluded, the concordance for the remaining 453 duplicate pairs increased to 99.6%. One SNP, rs2735839, had previously been genotyped as part of CGEMS for all studies except CONOR. Genotypes for this SNP were retrieved from CGEMS for subjects in PLCO, ATBC, CPS-II and CeRePP and combined with the other genotypes before analysis.

Post-genotyping quality control

As per the standard CGF protocol, samples with <80% completion were excluded (n = 612), which increased the overall completion rate to >98% in the combined data set and the locus completion rate to >90%. An additional 90 subjects with <80% estimated CEU admixture were excluded, as well as 159 participants who did not meet eligibility requirements (Supplemental Table 4) based on prior GWAS and dense SNP genotyping (Yeager et al. 2007, 2009). Deviations from Hardy–Weinberg equilibrium (HWE) were tested using an exact test in the control group of each cohort. Significant deviation from HWE (P < 1 × 10⁻⁴) was observed for one SNP (CGF_34700), which was excluded from further analysis. A final genotype data set from 6,860 subjects for 23 SNPs was analyzed. A two-group χ² test of equal proportions (Newcombe 1998) was performed to compare the MAF of each of the 23 SNPs analyzed in this study with that in the CEU population available in the 1,000 Genomes Project data (June 2010 release, http://www.1000genomes.org/). Power was estimated using Quanto under the log-additive model at α = 0.05 [http://hydra.usc.edu/gxe/ (Gauderman 2002)]. This study had >80% power to detect an association with an odds ratio of 1.25 (assuming a MAF of 0.05 and a prevalence of prostate cancer of 1.5067%) (http://seer.cancer.gov/csr/1975_2007/).
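
As an illustration of the HWE filter, the following sketch implements one common form of the exact test: conditional on the observed allele counts, it enumerates every possible heterozygote count and sums the probabilities of all configurations that are no more likely than the observed one. This is an assumption about the general approach, not a description of the software used in this study; the function name and the example genotype counts are hypothetical.

```python
from math import exp, lgamma, log

def _log_fact(n):
    """log(n!) via the log-gamma function."""
    return lgamma(n + 1)

def hwe_exact_p(n_aa, n_ab, n_bb):
    """Exact HWE P value from genotype counts (AA, AB, BB)."""
    n = n_aa + n_ab + n_bb
    n_a = 2 * n_aa + n_ab   # copies of allele A
    n_b = 2 * n_bb + n_ab   # copies of allele B

    def log_prob(h):
        # Probability of h heterozygotes given n individuals and n_a A alleles.
        aa = (n_a - h) // 2
        bb = (n_b - h) // 2
        return (_log_fact(n) - _log_fact(aa) - _log_fact(h) - _log_fact(bb)
                + h * log(2)
                + _log_fact(n_a) + _log_fact(n_b) - _log_fact(2 * n))

    # The heterozygote count must have the same parity as the allele count.
    hets = range(n_a % 2, min(n_a, n_b) + 1, 2)
    probs = {h: exp(log_prob(h)) for h in hets}
    p_obs = probs[n_ab]
    # Small tolerance so that ties with the observed configuration are included.
    total = sum(p for p in probs.values() if p <= p_obs * (1 + 1e-9))
    return min(1.0, total)

# Hypothetical control-group counts; the filter threshold is taken from the text.
HWE_THRESHOLD = 1e-4
for counts in [(25, 50, 25), (50, 0, 50)]:
    p = hwe_exact_p(*counts)
    print(counts, "exclude" if p < HWE_THRESHOLD else "keep")
```

The first set of counts matches HWE proportions and is kept; the second has no heterozygotes at all and is excluded.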

Statistical analysis

Single locus case–control association analysis was performed using the programs GLU and PLINK. To incorporate prostate cancer stage and grade at diagnosis, we distinguished nonaggressive prostate cancer (Gleason score <7 AND disease stage <III, n = 1,654), advanced prostate cancer (Gleason score ≥8 OR disease stage ≥III, n = 678) and aggressive prostate cancer (Gleason score ≥7 OR disease stage ≥III, n = 1,475), as defined in CGEMS (Thomas et al. 2008; Yeager et al. 2007). Logistic regression models were adjusted for study (ATBC, CPS-II, CeRePP, PLCO and CONOR), arm (for CeRePP and PLCO), age (in 10-year categories) and two principal components of population structure as identified in the PLCO study (these two principal components were significant predictors of disease status under the null logistic model). Both a two degree of freedom (df) genotype effect model and a 1 df trend effect model were tested. We found no evidence of heterogeneity in genetic effects across studies using the Q and I² statistics.
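
The heterogeneity check can be sketched as follows, assuming the usual inverse-variance definitions: Cochran's Q is the weighted sum of squared deviations of the per-study log odds ratios from their pooled estimate, and I² = max(0, (Q − df)/Q) × 100. The per-study estimates below are hypothetical placeholders, not results from this study.

```python
def heterogeneity(betas, ses):
    """Return (pooled log OR, Cochran's Q, I-squared in percent)
    from per-study log odds ratios and their standard errors."""
    weights = [1.0 / se ** 2 for se in ses]          # inverse-variance weights
    pooled = sum(w * b for w, b in zip(weights, betas)) / sum(weights)
    q = sum(w * (b - pooled) ** 2 for w, b in zip(weights, betas))
    df = len(betas) - 1
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return pooled, q, i2

# Hypothetical per-allele log ORs and standard errors for five studies.
toy_betas = [-0.30, -0.22, -0.28, -0.25, -0.20]
toy_ses = [0.10, 0.12, 0.15, 0.09, 0.20]
pooled, q, i2 = heterogeneity(toy_betas, toy_ses)
print(f"pooled log OR = {pooled:.3f}, Q = {q:.2f}, I2 = {i2:.1f}%")
```

Under the null hypothesis of homogeneous effects, Q is compared with a χ² distribution with (number of studies − 1) degrees of freedom; a Q below its degrees of freedom gives I² = 0.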

We analyzed the association between SNPs in KLK3 and serum PSA levels in 702 control subjects and 461 incident prostate cancer cases from the PLCO cohort. PSA measurements at baseline (the first PSA test performed as men entered the PLCO Cancer Screening Trial) were available for this study (Thomas et al. 2008; Ahn et al. 2008). We excluded men diagnosed with prostate cancer in their first year of the PLCO trial (n = 217), as many presented with extremely high PSA levels, probably due to more advanced disease than in men diagnosed later in the trial (range: 0.32–617.80 ng/ml, average 19.11 ng/ml), and control subjects with missing PSA level information (n = 20). The average time between baseline PSA and diagnosis for the cases included in this study was 2.5 years (median 2.0 years). Average PSA levels were 1.53 ng/ml (median 1.11) in control subjects; 8.94 ng/ml (median 4.12) in all cases; and 4.23 ng/ml (median 3.43) in cases diagnosed after the first year of the PLCO trial. For each SNP of interest, we defined a group of individuals without a minor allele (G = 0) and a group with at least one minor allele (G = 1) and estimated the mean baseline PSA in these two groups. To minimize the influence of the larger values, we also created a truncated PSA level for control subjects (at 4 ng/ml) and cases (at 8 ng/ml) and performed a similar analysis. For a formal statistical test, we modeled PSA as a function of genotype (G), using a generalized linear model with a gamma distribution and a log link function. The results were essentially equivalent to those obtained by fitting a linear model to the log-transformed PSA level. We adjusted for center, age group and population stratification. Only one principal component, estimated to adjust for population structure, was a significant predictor of PSA level under the null linear model based on prior analyses in CGEMS (Yeager et al. 2007).
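
The carrier versus noncarrier comparison can be sketched without covariates as follows. This is a simplified, unadjusted illustration, not the adjusted models fitted in this study. With a single binary predictor and no covariates, a log-link gamma model estimates the ratio of arithmetic group means, whereas a linear model on log-transformed PSA estimates the ratio of geometric means. The helper names and the toy PSA values are hypothetical.

```python
from math import exp, log
from statistics import mean

def truncate(values, cap):
    """Cap each PSA value at `cap` (e.g. 4 ng/ml for control subjects)."""
    return [min(v, cap) for v in values]

def carrier_effect(psa, g):
    """Compare PSA between noncarriers (G = 0) and carriers (G = 1).

    Returns (mean G=0, mean G=1, ratio of arithmetic means,
    ratio of geometric means)."""
    grp0 = [v for v, gi in zip(psa, g) if gi == 0]
    grp1 = [v for v, gi in zip(psa, g) if gi == 1]
    m0, m1 = mean(grp0), mean(grp1)
    log_diff = mean([log(v) for v in grp1]) - mean([log(v) for v in grp0])
    return m0, m1, m1 / m0, exp(log_diff)

# Hypothetical baseline PSA values (ng/ml) and carrier status.
toy_psa = [1.0, 2.0, 4.0, 12.0, 0.5, 1.0, 2.0]
toy_g = [0, 0, 0, 0, 1, 1, 1]

m0, m1, ratio_arith, ratio_geom = carrier_effect(truncate(toy_psa, 4.0), toy_g)
print(f"G=0 mean {m0:.2f}, G=1 mean {m1:.2f}, "
      f"mean ratio {ratio_arith:.2f}, geometric mean ratio {ratio_geom:.2f}")
```

Truncation limits the influence of the single large value (12.0 ng/ml) on the noncarrier mean, which is the motivation given above for the truncated analysis.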

Imputation

We imputed genotypes for an approximately 56 kb region on chromosome 19q13.33 (chr19:56,019,829–56,076,043 bps; NCBI Build 36.3) for 2,303 samples (1,179 cases and 1,124 control subjects) from the PLCO Cancer Screening Trial using IMPUTEv2 (Howie et al. 2009). The PLCO subjects were genotyped as part of the CGEMS prostate cancer whole-genome scan (Phase 1A with Illumina HumanHap300 and Phase 1B with Illumina HumanHap240 assays, Illumina Corp., San Diego, CA, USA) (Thomas et al. 2008; Yeager et al. 2007). Other studies that were part of this work did not have whole-genome SNP data and were not used for imputation. We used 120 estimated CEU haplotypes from the June 2010 release of the 1,000 Genomes Pilot 1 data (Durbin et al. 2010) as well as 298 estimated CEU and Tuscans in Italy (TSI) haplotypes from the HapMap3 project to carry out genotype imputation (Altshuler et al. 2010). Association analysis was conducted using SNPTESTv2 (Marchini et al. 2007) with logistic regression models (1 df trend test) adjusted for age (in 10-year categories) and two principal components of genetic structure as identified in the PLCO study (Yeager et al. 2007).

Bioinformatic analysis of a KLK3 coding variant

Four computational tools were used to predict the functional impact on the KLK3 protein of the non-synonymous substitution caused by rs17632542. PolyPhen-2 (Polymorphism Phenotyping) (Ramensky et al. 2002), SIFT (Sorting Intolerant From Tolerant) (Ng and Henikoff 2001), SNPs&GO (SNPs and Gene Ontology) (Calabrese et al. 2009) and ProPhylER (Protein Phylogeny and Evolutionary Rates) (Binkley et al. 2010) predict functionally relevant disease-related mutations based on one or more of the following: sequence conservation, steric and biochemical properties of amino acid residues and protein functional annotation (SNPs&GO only).

Results

A total of 3,522 prostate cancer cases and 3,338 control subjects from five case–control studies were analyzed in the current study (Supplemental Table 1), including 1,654 men diagnosed with nonaggressive prostate cancer (Gleason score <7 and disease stage <III) and 678 men with advanced prostate cancer (Gleason score ≥8 or stage ≥III). We also analyzed men with aggressive prostate cancer (n = 1,475) by including cases with a Gleason score of 7 or higher (Gleason score ≥7 or stage ≥III).

SNP selection and genotyping

Twenty-four SNPs in a 22 kb region surrounding rs2735839, the initial marker found to be associated with prostate cancer risk, were genotyped (physical locations shown in Fig. 1a). Twelve of these SNPs were selected to tag the genomic region of KLK3 at a linkage disequilibrium (LD) coefficient threshold of r² ≥ 0.8 and a minor allele frequency (MAF) ≥0.05. They capture 90% of common variation (r² ≥ 0.8 and MAF ≥0.05) across the 8 kb region surrounding rs2735839 based on the 1,000 Genomes Project data. An additional 12 SNPs with an r² > 0.3 with rs2735839 were also genotyped to capture markers in both modest and high LD with this SNP. One SNP was excluded from further analysis because of substantial deviation from Hardy–Weinberg equilibrium. No significant differences were observed between minor allele frequencies of SNPs in control subjects of the current study and those of the CEU population from the 1,000 Genomes Project (Supplemental Table 3). The linkage disequilibrium structure across the KLK3 locus in control subjects (Fig. 1b) reflects the relatively low degree of LD seen in our resequence analysis (Parikh et al. 2010).

Association of SNPs in KLK3 with prostate cancer

Four SNPs were associated with overall prostate cancer at a significance level of P < 1 × 10⁻³ (Table 1). The most significant association was with rs17632542 (P = 3.41 × 10⁻⁴, per-allele trend OR 0.77, 95% CI 0.67–0.89; unconstrained heterozygote OR (ORhet) 0.80, 95% CI 0.69–0.93 and homozygote OR (ORhom) 0.37, 95% CI 0.16–0.86). Two additional SNPs, both highly correlated with rs17632542 (r² ≥ 0.97), showed a similar level of significance: rs62113214 (P = 3.57 × 10⁻⁴) and rs62113212 (P = 1.17 × 10⁻³). No other SNPs were nominally significant after adjusting for any of the three SNPs (data not shown). The marker initially reported to be associated with prostate cancer risk in a GWAS (Eeles et al. 2008) was not significant in our study (rs2735839, P = 0.197).
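
As a consistency check on how a per-allele OR, its 95% CI and the trend P value relate, the sketch below recovers an approximate Wald statistic from a reported OR and CI. It assumes a symmetric CI on the log scale and a normal approximation, which may differ from the test actually used; the function name is hypothetical. Applied to the rounded values reported for rs17632542 (OR 0.77, 95% CI 0.67–0.89), it gives a P value of the same order as the reported one; an exact match is not expected because the OR and CI are rounded to two decimals.

```python
from math import erfc, log, sqrt

def wald_from_ci(odds_ratio, ci_low, ci_high, z_crit=1.959964):
    """Approximate log OR, standard error, z statistic and two-sided
    P value from an odds ratio and its 95% confidence interval."""
    beta = log(odds_ratio)
    se = (log(ci_high) - log(ci_low)) / (2 * z_crit)   # CI width on the log scale
    z = beta / se
    p = erfc(abs(z) / sqrt(2))                         # two-sided normal tail
    return beta, se, z, p

beta, se, z, p = wald_from_ci(0.77, 0.67, 0.89)
print(f"log OR = {beta:.3f}, SE = {se:.3f}, z = {z:.2f}, P = {p:.2e}")
```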

Table 1 Association results for all prostate cancer cases as compared to control subjects for the 23 SNPs in the KLK3 region

When the analysis was stratified by disease severity, the association was observed only with nonaggressive prostate cancers (Gleason score <7 and disease stage <III) for the three highly correlated SNPs (Table 2). SNP rs62113214 had the lowest P value (P = 4.72 × 10⁻⁵, per-allele trend OR 0.68, 95% CI 0.57–0.82; unconstrained heterozygote OR (ORhet) 0.70, 95% CI 0.57–0.85 and homozygote OR (ORhom) 0.34, 95% CI 0.11–1.04). The two highly correlated SNPs rs17632542 (P = 5.49 × 10⁻⁵) and rs62113212 (P = 5.74 × 10⁻⁵) showed similar results. SNP rs2735839 was marginally associated with nonaggressive prostate cancer (P = 0.031). No SNPs were associated with advanced prostate cancer (P > 0.05) as defined by a Gleason score ≥8 or disease stage ≥III (Table 3) or with aggressive prostate cancer (P > 0.05) as defined by a Gleason score ≥7 or stage ≥III when compared to control subjects (Supplemental Table 5). When nonaggressive cases were compared to advanced cases, a significant difference was seen for the same three SNPs (rs62113212: P = 3.23 × 10⁻⁵, per-allele trend OR 0.59, 95% CI 0.46–0.76), indicating that this signal could be related to the less aggressive and often latent forms of prostate cancer, which are diagnosed partly as a consequence of PSA screening (Supplemental Table 6).

Table 2 Association results for nonaggressive prostate cancer (Gleason score <7 and disease stage <III) as compared to control subjects for the three highly correlated SNPs and rs2735839
Table 3 Association results for advanced prostate cancer (Gleason score ≥8 or disease stage ≥III) as compared to control subjects for the three highly correlated SNPs and rs2735839

KLK3 SNPs and serum PSA levels

The PSA test is commonly used to screen for prostate cancer and has been shown to increase detection of early, nonaggressive prostate cancer. We assessed baseline PSA levels according to different genotypes at the three highly correlated SNPs: rs17632542, rs62113212 and rs62113214, as well as at rs2735839. Because PSA information was only available in the PLCO cohort, we performed the analysis in this subset (702 control subjects and 461 cases). Baseline PSA levels differed significantly (Table 4) between control individuals without a minor allele and those with one or more minor alleles at any one of the three highly correlated SNPs. Mean PSA levels were 1.61 ng/ml (95% CI 1.49–1.72) in men without a minor allele, and 1.12 ng/ml (95% CI 0.96–1.28) in men with at least one minor allele at rs62113212. The significance of the difference was confirmed by a generalized linear model (P = 9.7 × 10⁻⁵). The box plot (Fig. 2a) shows that the entire distribution of PSA levels was shifted to lower values in the group with the minor allele (G = 1). Because of the high correlation with rs62113212, the results for rs17632542 and rs62113214 were nearly identical (Table 4). After truncating PSA at 4 ng/ml in control subjects, to limit the influence of larger values, we still found the mean PSA levels for the two groups to be significantly different [1.49 ng/ml (95% CI 1.41–1.58) vs. 1.11 ng/ml (95% CI 0.96–1.26), P = 1.5 × 10⁻⁵ for rs62113212].

Table 4 Association between the three highly correlated SNPs as well as rs2735839 and baseline PSA levels in the PLCO cohort
Fig. 2

Box plots showing PSA levels (ng/ml) at baseline in control subjects from the PLCO study for carriers (G = 1) and noncarriers (G = 0) of minor alleles at rs62113212 (a) and rs2735839 (b). Median values are indicated by a horizontal line; the 25th and 75th percentiles are at the bottom and top of the boxes

PSA levels were also lower in men carrying one or more minor alleles at rs2735839 (1.28 ng/ml for carriers of at least one minor allele compared to 1.64 ng/ml) as shown in Table 4 and Fig. 2b. Although not statistically significant, cases with at least one minor allele at any of the four SNPs also had lower PSA levels than their respective counterparts.

Genotype imputation

Imputation was performed in the PLCO GWAS data set genotyped within the Cancer Genetic Markers of Susceptibility (CGEMS) study (Thomas et al. 2008; Yeager et al. 2007) utilizing the publicly available HapMap and 1,000 Genomes pilot project data sets (June 2010 release). This added 239 imputed SNPs to the 27 SNPs already genotyped as part of the CGEMS study (Thomas et al. 2008; Yeager et al. 2007) in the 56 kb region (chr19: 56,020,000–56,076,000) for a combined data set of 266 SNPs in 1,179 cases and 1,124 control subjects. We did not observe a stronger association (P < 1 × 10⁻⁴) for either prostate cancer overall or specifically for nonaggressive prostate cancer using imputed genotype data (Supplemental Tables 7 and 8). However, the imputation was performed in a subset of the cohorts (approximately 20% of subjects).

Prediction of functional effects of rs17632542 on KLK3

The three SNPs most significantly associated with prostate cancer risk in our study (rs62113212, rs62113214 and rs17632542) are all located in the KLK3 gene (Fig. 1a). The first two are located in introns 2 and 4, respectively, whereas rs17632542 lies within exon 4. The minor allele (C) of rs17632542 causes a non-synonymous amino acid change from isoleucine (hydrophobic) to threonine (polar) at position 179 (Ile179Thr) in KLK3. This amino acid is conserved in humans, chimpanzee and rhesus but not in other mammals or vertebrates. The amino acid change is predicted to be deleterious by Sorting Intolerant from Tolerant (SIFT) analysis, with a SIFT score of 0.03. A benign impact is predicted by PolyPhen-2 and a moderate impact is predicted by ProPhylER (multivariate analysis of protein polymorphism (MAPP) P value of 0.0058). Finally, when functional annotation is taken into account (SNPs&GO), a neutral substitution is predicted (reliability index 5 out of 10).

Discussion

Here we describe a comprehensive fine mapping effort in the KLK3 gene based on resequence data across a region of chr19q13.33, surrounding rs2735839, a SNP previously associated with prostate cancer (Eeles et al. 2008; Kote-Jarai et al. 2008) and/or serum prostate-specific antigen (PSA) levels (Ahn et al. 2008). Based on a comprehensive catalog of common variation in a 56 kb region surrounding KLK3 by deep resequencing (Parikh et al. 2010) we now report the follow-up genotyping of tag SNPs in close to 7,000 subjects drawn from five prostate cancer studies. We did not observe significant association between rs2735839 and overall prostate cancer risk. However, three highly correlated, neighboring SNPs in KLK3 were associated with overall prostate cancer risk. This signal was detected only in association with nonaggressive prostate cancer and was not present in men with advanced prostate cancer. The significance level for these three SNPs is modest and did not reach the threshold for genome-wide significance in GWAS (2007). Nevertheless, our findings suggest that these three SNPs are associated with either developing or being diagnosed with nonaggressive prostate cancer, potentially due to differential case identification related to PSA level. Two of the SNPs are intronic but the third causes a non-synonymous amino acid substitution in the KLK3 protein albeit with minimal predicted shift in function.

A recent report did not find a significant association between SNPs in KLK genes and prostate cancer risk in a Swedish population-based case–control study comprising 1,419 prostate cancer cases and 736 controls. This study analyzed SNPs in coding and promoter regions of all 15 kallikrein genes and included both rs17632542 and rs2735839 (but not rs62113212 or rs62113214) (Klein et al. 2010). We did not observe additional signals in KLK3 or in the neighboring KLK2 and KLK15 genes by imputation.

We also observed that baseline PSA levels were lower in control subjects carrying one or more minor alleles at the three highly correlated SNPs. This may indicate that one or more of these alleles directly cause a reduction in serum PSA levels, possibly through regulatory effects (on transcription of the gene), through altered protein stability or reduced detection of serum PSA. However, since many factors could play a role in regulating serum PSA levels this question still remains open and requires validation and further investigations. Reduced PSA levels were also noted in carriers of minor alleles at rs2735839 although the effect was less pronounced.

The implications of our study directly pertain to the relationship between germline KLK3 variants, serum PSA levels and detection of prostate cancer through PSA screening. Since the introduction of the PSA test as a screening tool in the US in the late 1980s, reported prostate cancer incidence rates have risen sharply (Welch and Albertsen 2009), especially for early stage disease and younger individuals (Shao et al. 2009). A significant number of men are likely to be ‘over-diagnosed’ as a result of PSA screening, where patients with clinically insignificant cancers that are unlikely to progress within their lifetime are diagnosed and treated, often with grave adverse effects on quality of life (Welch and Albertsen 2009). Two large randomized screening trials are now ongoing to assess the benefits of the PSA test for prostate cancer screening and reduction of mortality. Although interim results have been published, it is not clear if the test reduces mortality from prostate cancer (Andriole et al. 2009; Schroder et al. 2009).

We have fine mapped a challenging association signal in the KLK3 gene to three highly correlated SNPs centromeric to the original signal reported by Eeles et al. (2008). Since the same three SNPs were also significantly associated with baseline PSA levels in the prospective PLCO cohort, we interpret our results to suggest that the effect may in part be a consequence of PSA screening. It is possible that the KLK3 locus also contributes to prostate cancer risk. Further studies are needed to dissect the contribution of genetic variation in KLK3 to PSA levels and prostate cancer risk separately in an effort to elucidate the possibility of pleiotropic effects of the KLK3 locus.

URLs

http://cgf.nci.nih.gov/operations/pregenotyping-qaqc.html

http://code.google.com/p/glu-genetics/

http://pngu.mgh.harvard.edu/~purcell/plink/

http://www.1000genomes.org/