Introduction

Molecular and genetic studies demonstrate that breast cancer is a heterogeneous disease. Several classifiers are available for distinguishing tumor types based on prognosis and prediction of response to chemotherapy and hormonal therapy [1–3]. Molecular features are associated with substantially different outcomes [4] and with wide variability in response to standard therapies [5, 6]. Symptomatic tumors that tend to be large and palpable on presentation have substantially higher risk of recurrence than tumors detected by screening [7]. For these larger tumors, neoadjuvant, or preoperative, chemotherapy makes it possible to assess response to treatment and may provide insights into the tumor’s biology. Studies examining the degree to which pathologic complete response (pCR) to therapy is predictive of recurrence-free survival (RFS) or overall survival (OS) have given mixed results in relatively unselected populations [8–12].

The I-SPY 1 TRIAL (investigation of serial studies to predict your therapeutic response with imaging and molecular analysis) is a multicenter neoadjuvant study of women with histologically confirmed invasive breast cancers. This report describes associations between molecular markers assessed in pretreatment tumor biopsy samples and response to neoadjuvant chemotherapy at the time of surgery, longer-term disease outcomes, and the relationship between response and RFS.

Methods

Study design and patient selection

The I-SPY 1 TRIAL, whose methods have been described in detail elsewhere [13, 14], was a collaboration of the American College of Radiology Imaging Network (ACRIN), Cancer and Leukemia Group B (CALGB), and the Specialized Programs of Research Excellence (SPORE). All patients gave written consent and had histologically confirmed invasive breast cancers measuring at least 3 cm by clinical examination or imaging, with no evidence of distant metastatic disease. Patients with clinical stage 1 disease by exam were eligible if tumor size was >3 cm by imaging. Patients with T4 or inflammatory disease were eligible. Neoadjuvant chemotherapy consisted of an initial anthracycline-based regimen, after which patients either underwent surgery or received a taxane-based regimen prior to surgery.

Assays were conducted in nine laboratories. Data were integrated for central accession and analysis using NCICB’s caINTEGRATOR application (https://caintegrator-stage.nci.nih.gov/ispy/index2.jsp); the I-SPY 1 data version used is dated February 2011.

Standard pathology biomarkers

Hormone and HER2 receptor expression were measured from pretreatment core biopsies. Estrogen and progesterone receptor status were determined by immunohistochemistry (IHC) and calculation of Allred scores [15] at the study sites. HER2 status was determined locally by IHC and/or fluorescence in situ hybridization (FISH) assays. HER2 testing (IHC and FISH) was also performed centrally at the University of North Carolina (UNC) [13, 16]. HER2 status was considered positive if either local or central assays were positive. Ki67 was recorded as low (<10%), medium (10–20%), or high (>20%), as described in detail in supplemental methods [17].
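The receptor and Ki67 rules above can be written out as a minimal sketch. The function names and the handling of unavailable assays (None) are assumptions made for illustration; they are not taken from the trial's data pipeline.

```python
from typing import Optional


def her2_status(local_positive: Optional[bool],
                central_positive: Optional[bool]) -> Optional[bool]:
    """Combine local and central HER2 calls: positive if either assay is positive.

    None marks an unavailable assay (an assumption of this sketch).
    """
    if local_positive or central_positive:
        return True
    if local_positive is None and central_positive is None:
        return None  # no assay available, so the status stays unknown
    return False


def ki67_category(percent_positive: float) -> str:
    """Bin a Ki67 percentage into low (<10%), medium (10-20%), or high (>20%)."""
    if not 0 <= percent_positive <= 100:
        raise ValueError("Ki67 must be a percentage between 0 and 100")
    if percent_positive < 10:
        return "low"
    if percent_positive <= 20:
        return "medium"
    return "high"
```

Under these rules a tumor that is negative locally but positive centrally is classified as HER2 positive, and a Ki67 value of exactly 10% or 20% falls in the medium group.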

Evaluation of pathologic response

pCR was defined as no invasive tumor present in either breast or axillary lymph nodes. Residual Cancer Burden (RCB) [9] was assessed and included the primary tumor bed area, overall invasive cancer and in situ disease cellularity, number of positive lymph nodes, and diameter of the largest metastasis. I-SPY 1 TRIAL pathologists were centrally trained; all cases were re-reviewed and scored for RCB as a dichotomous outcome (0, I vs. II, III) and by class (0, I, II, III). Data were recorded using NCI’s Oracle Clinical Remote Data Capture version 4.5 electronic database.
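As a small illustration of the two response endpoints, the sketch below encodes the pCR definition and the dichotomized RCB grouping. The function names and the string labels for the groups are hypothetical. The RCB class itself is taken as an input and is not computed here.

```python
def is_pcr(invasive_in_breast: bool, invasive_in_nodes: bool) -> bool:
    """pCR: no invasive tumor in either the breast or the axillary lymph nodes."""
    return not invasive_in_breast and not invasive_in_nodes


def rcb_dichotomized(rcb_class: str) -> str:
    """Collapse RCB class (0, I, II, III) into the two groups 0/I and II/III."""
    groups = {"0": "0/I", "I": "0/I", "II": "II/III", "III": "II/III"}
    try:
        return groups[rcb_class.strip().upper()]
    except KeyError:
        raise ValueError(f"unknown RCB class: {rcb_class!r}") from None
```

Because the definition refers only to invasive tumor, residual in situ disease does not enter `is_pcr` in this sketch.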

RNA analysis

Tissue samples immediately frozen in OCT were assayed on 44,000-feature Agilent human oligonucleotide microarrays (catalog # G4112F). Total RNA purification and microarray hybridization were done as previously described [18]. Background was subtracted, and Lowess-normalized log2 ratios of Cy3 and Cy5 intensity values were calculated [19]. The primary microarray data presented in this study are available in the GEO database under accession number GSE22226.
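For a single microarray feature, the background subtraction and log ratio can be sketched as follows. This is a simplified illustration only: the channel orientation (Cy5 over Cy3), the floor applied to non-positive signals, and the function name are assumptions, and the intensity-dependent Lowess normalization step is omitted entirely.

```python
import math


def log2_ratio(cy5: float, cy5_bg: float, cy3: float, cy3_bg: float,
               floor: float = 1.0) -> float:
    """Background-subtracted log2(Cy5/Cy3) for one microarray feature.

    Signals at or below background are clamped to `floor` so the
    logarithm stays defined (an assumption of this sketch).
    """
    red = max(cy5 - cy5_bg, floor)
    green = max(cy3 - cy3_bg, floor)
    return math.log2(red / green)
```

With foreground intensities of 900 and 300 and a background of 100 in each channel, the ratio is 800/200, giving a log2 ratio of 2.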

Intrinsic subtype classification was determined by the PAM50 50-gene assay as described [18]. The risk of recurrence score (ROR-S) classified patients as having high, medium, or low risk of relapse using predefined cut-points as described previously [19].

The 70-gene prognostic profile was determined using representative probes and data normalization as previously described [20]. This profile classified patients as having high or low risk of relapse using the predefined threshold [20, 21].

The wound healing signature [22] was used to classify tumors as quiescent or activated. A gene-expression signature predictive of p53 genotype [23] was used to classify tumors as p53 wild type or mutated.

DNA analysis

DNA copy number abnormality was assessed by a molecular inversion probe (MIP) platform with focal amplification and high resolution (~10 K bp) as previously described [24–26]. Direct p53 genotyping was performed and mutations were detected by the Roche p53 AmpliChip beta test array [27, 28] and a combination of two approaches described in supplemental methods.

Statistical analysis

The primary endpoint for the trial was RFS according to the STEEP criteria [29]. Time to recurrence was computed from the start of treatment, and RFS at 3 years was determined by Kaplan–Meier analysis.
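The Kaplan–Meier estimate of RFS at a fixed time point can be illustrated with a short, dependency-free sketch. The function names and the toy follow-up data in the usage note are hypothetical and are not drawn from the trial.

```python
from typing import List, Tuple


def kaplan_meier(times: List[float], events: List[bool]) -> List[Tuple[float, float]]:
    """Return the (time, survival) steps of the Kaplan-Meier estimate.

    times  : follow-up in years from the start of treatment
    events : True if an RFS event was observed, False if censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_leaving = 0
        # Group all patients who share this follow-up time.
        while i < len(data) and data[i][0] == t:
            n_events += 1 if data[i][1] else 0
            n_leaving += 1
            i += 1
        if n_events:
            survival *= 1 - n_events / at_risk
            curve.append((t, survival))
        at_risk -= n_leaving
    return curve


def survival_at(curve: List[Tuple[float, float]], t: float) -> float:
    """Step-function lookup: estimated event-free probability at time t."""
    s = 1.0
    for step_time, step_survival in curve:
        if step_time > t:
            break
        s = step_survival
    return s
```

For six hypothetical patients with follow-up times 0.5, 1.2, 2.0, 2.5, 3.5, and 4.0 years and events at 0.5, 2.0, and 4.0 years, the estimate at 3 years is (5/6) × (3/4) = 0.625. Censored patients leave the risk set without lowering the curve.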

Associations of signature classifications with pCR and RFS were assessed by logistic regression and proportional hazards modeling, respectively. Association of pCR with RFS within each risk subset was also tested. These analyses were conducted using JMP Version 8.0.1, SAS Institute Inc.

We took HR/HER2 categories to be standard and addressed the ability of other signatures to predict RFS and pCR assuming HR/HER2 category is given. We used multivariate Cox regression of RFS on molecular signature classification, adjusting for the predefined contribution of HR and HER2 (as derived from a Cox model fit of RFS on HR and HER2). Similarly, we employed multivariate logistic regression to evaluate whether other molecular signature classifications were independently predictive of pCR given HR/HER2. These analyses were conducted using Bioconductor R [30].
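One way to read "adjusting for the predefined contribution of HR and HER2" is as a two-step Cox model in which the receptor terms enter as a fixed offset. The notation below is an assumption introduced for illustration (z denotes the dichotomized signature, γ its log hazard ratio); the original methods do not give these symbols, and the actual implementation may differ.

```latex
% Step 1: fit RFS on receptor status alone to obtain \hat\beta_{HR} and \hat\beta_{HER2}
h(t \mid \mathrm{HR}, \mathrm{HER2})
  = h_0(t)\,\exp\!\big(\beta_{\mathrm{HR}}\,\mathrm{HR} + \beta_{\mathrm{HER2}}\,\mathrm{HER2}\big)

% Step 2: hold the fitted receptor contribution fixed and estimate only \gamma
h(t \mid \mathrm{HR}, \mathrm{HER2}, z)
  = h_0(t)\,\exp\!\big(
      \underbrace{\hat\beta_{\mathrm{HR}}\,\mathrm{HR} + \hat\beta_{\mathrm{HER2}}\,\mathrm{HER2}}_{\text{fixed offset}}
      + \gamma z\big)
```

Under this reading, exp(γ) is the hazard ratio attributable to the signature beyond receptor status, and testing γ = 0 asks whether the signature adds to HR and HER2 in predicting RFS.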

Differences in rates of pCR and rates of RCB class 0 or I within molecular signature-defined subgroups were assessed by χ2 tests.
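For a comparison of response rates between two subgroups, the χ2 test reduces to a 2×2 table. The sketch below computes the Pearson statistic without continuity correction, which is an assumption, since the text does not say whether a correction was applied. The function name and the example counts in the usage note are hypothetical.

```python
import math
from typing import Tuple


def chi_square_2x2(a: int, b: int, c: int, d: int) -> Tuple[float, float]:
    """Pearson chi-square test (no continuity correction) for the table

                    pCR   no pCR
        subgroup 1   a      b
        subgroup 2   c      d

    Returns (statistic, p_value) on 1 degree of freedom.
    """
    n = a + b + c + d
    row1, row2, col1, col2 = a + b, c + d, a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column needs at least one observation")
    statistic = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # Chi-square survival function with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value
```

With hypothetical counts of 10 responders among 30 patients in one subgroup and 20 among 30 in the other, `chi_square_2x2(10, 20, 20, 10)` gives a statistic of about 6.67 and a p value just under 0.01.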

Patients were categorized as low- or high-risk by each molecular signature (HR+/HER2− versus others, luminal (A and B) versus others (HER2-enriched, basal, and normal-like) [31], p53 wild type versus mutation, 70-gene prognosis signature low versus high [20], and wound-quiescent versus wound-activated [22]) and by stage 3 (including inflammatory) versus earlier stage to ensure comparable degrees of freedom. The pairwise concordances of the molecular signatures were compared by Kendall’s rank correlation and Fisher’s exact test.
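Pairwise agreement between two dichotomized classifiers can be illustrated as below. The sketch cross-tabulates two high/low calls, reports the fraction of concordant patients, and computes a two-sided Fisher's exact p value from the hypergeometric distribution. Function names are hypothetical, the two-sided rule (summing all tables no more probable than the observed one) is one common convention and an assumption here, and Kendall's rank correlation is not shown.

```python
from math import comb
from typing import Sequence, Tuple


def two_by_two(x: Sequence[bool], y: Sequence[bool]) -> Tuple[int, int, int, int]:
    """Cross-tabulate two dichotomized classifiers (True = high risk)."""
    if len(x) != len(y):
        raise ValueError("both classifiers must cover the same patients")
    a = sum(1 for xi, yi in zip(x, y) if xi and yi)          # high / high
    b = sum(1 for xi, yi in zip(x, y) if xi and not yi)      # high / low
    c = sum(1 for xi, yi in zip(x, y) if not xi and yi)      # low / high
    d = sum(1 for xi, yi in zip(x, y) if not xi and not yi)  # low / low
    return a, b, c, d


def concordance(a: int, b: int, c: int, d: int) -> float:
    """Fraction of patients given the same risk call by both classifiers."""
    return (a + d) / (a + b + c + d)


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact p value for a 2x2 table with fixed margins."""
    row1, row2, col1 = a + b, c + d, a + c
    total_tables = comb(row1 + row2, col1)

    def prob(k: int) -> float:
        # Hypergeometric probability of k patients in the top-left cell.
        return comb(row1, k) * comb(row2, col1 - k) / total_tables

    observed = prob(a)
    k_min, k_max = max(0, col1 - row2), min(row1, col1)
    p = sum(prob(k) for k in range(k_min, k_max + 1)
            if prob(k) <= observed * (1 + 1e-9))
    return min(1.0, p)
```

For a hypothetical table with 3 high/high, 1 high/low, 1 low/high, and 3 low/low patients, the concordance is 0.75 and the two-sided p value is 34/70, or about 0.49, which shows how little a small sample can establish even at 75% agreement.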

All statistical analyses were performed by one statistician (CY) and verified by a second statistician (DB) to confirm the results.

Results

Patient characteristics

Between May 2002 and March 2006, 237 patients were enrolled at nine institutions as described previously [13]. As shown in the CONSORT diagram (Fig. 1), 221 patients received an anthracycline as initial neoadjuvant chemotherapy, with 95% receiving a taxane, and were considered evaluable. Of these, 215 (97%) underwent surgical resection and had pathologic data available for analysis; IHC receptors were available for 210 and RCB was available for 196 (93%). We attempted microarrays on all 210 patients but could generate high quality gene-expression arrays for just 149 (Fig. 1). In addition, 171 patients had p53 gene mutation chip data and 153 had copy number variation by MIP arrays. Patients with available gene-expression arrays, who did and did not receive trastuzumab, were not significantly different from the overall cohort (Table 1).

Fig. 1

CONSORT diagram: patients available for analysis. Of the 237 patients enrolled in the study, 16 patients withdrew. Of the 221 patients available for analysis, six decided not to undergo surgery after completing neoadjuvant chemotherapy, leaving 215 patients available for pathologic response analysis

Table 1 Demographics and characteristics of patients in the I-SPY 1 TRIAL

Among the 215 patients, 20 of 67 (30%) HER2+ patients received neoadjuvant trastuzumab, as previously described [13]. Of the 46 HER2+ patients who did not, 17 (36%) received adjuvant trastuzumab. Radiation and hormonal adjuvant therapy were also given at physician discretion as clinically indicated (Table 1). Analysis of RFS was limited to patients who did not receive trastuzumab.

Most (65%) patients had clinically or pathologically confirmed axillary lymph node involvement at diagnosis and 90% had tumors of intermediate or high histologic grade. Median follow-up for survival and RFS was 3.9 years.

Early outcomes: residual disease measured at the time of surgical resection

The overall rate of pCR was 27%, and the rate of RCB scores of 0 or I was 37%.

As shown in Table 2, the response to therapy varied considerably by marker subset. For the subsets defined by clinical assay (IHC markers), pCR rates were lowest for the HR+/HER2− subset (9%) and highest for the HR−/HER2+ subset (54%). Of the PAM50 intrinsic subtypes, pCR rates were lowest for luminal A (3%) and highest for HER2-enriched population (50%). For Ki67, the pCR rates varied from 5% for the low group to 35% for the high group. Other biomarkers were associated with high rates of pCR, including p53 null mutations by Gene Chip (47%) and amplification at 17q as measured by MIP array (45%).

Table 2 Distribution and response rates by molecular phenotypes and profiles

The rates of pCR and RCB scores of 0 or I for the good prognosis categories of four gene-expression prognostic classifiers (70-gene prognosis signature, ROR-S, wound healing, and p53 mutation signatures) were all low (Table 2). In this population of patients treated with neoadjuvant chemotherapy, a minority of patients had good prognosis profiles: 9% were classified as 70-gene low risk, 27% as ROR-S low, 25% as wound healing quiescent, and 49% as p53 wild type. The respective pCR rates were 0, 6, 7, and 9%. Rates of pCR were higher for poor prognosis signatures, including 70-gene high risk (24%), wound healing signature activated (26%), ROR-S moderate risk (17%) and high risk (36%), and p53 mutation predicted by expression profile (34%). Clinical outcomes were better when pCR or an RCB score of 0 or I was achieved.

Recurrence-free and overall survival

Three-year RFS and OS for the entire cohort were 78 and 85%, respectively. When RFS for the population was stratified by molecular signatures, outcomes by subtype differed substantially (Table 2).

We dichotomized the classifiers, using low versus medium and high for the analyses shown in Tables 3 and 4. Although developed using heterogeneous methods and populations, dichotomized tumor classifiers and signatures were highly correlated, with concordances generally >70% (Table 3). As shown in Table 4, for the breast cancer subtypes known to be associated with a better outcome (e.g., luminal, 70-gene low risk, wound healing quiescent, and p53 wild type), the RFS was relatively high for patients who did not achieve pCR, but pCR was associated with a better outcome, regardless of subtype. Poor prognosis tumors tend to be more sensitive to chemotherapy, and thus have a higher rate of pCR. The hazard ratio for RFS with pCR versus no pCR was more favorable within the dichotomized molecular classifications than for the population as a whole. Associations between RFS and pCR differed the most for HR+/HER2− versus not (hazard ratio 0.17 [95% CI 0.04–0.51]), which was also where rates of pCR differed the most (9 vs. 38%). For the dichotomized breast cancer molecular classifiers (Table 4), the difference between highest and lowest rates of pCR varied from 19 to 29%.

Table 3 Correlations among the molecular signatures
Table 4 Pathological complete response and recurrence-free survival by molecular subtypes

In a multivariate model, the factors that added to HR and HER2 in predicting RFS were clinical stage (stage 3 vs. not), wound healing (activated vs. quiescent), ROR-S, and predicted p53 mutation signature (Table 5). The only factor that added to HR and HER2 in predicting pCR was Ki67 low and medium versus high (data not shown). The analysis shown was performed using the dichotomized groupings for the molecular signatures, but the results were qualitatively the same when the original groupings were used (data not shown).

Table 5 Univariate vs. multivariate Cox analyses adjusting for predefined HR/HER2 contribution

When receptor status and pCR were fixed in a multivariate model, most molecular signatures and clinical stage improved the ability to predict RFS (data not shown). The most likely reason is that the molecular classifiers can identify additional patients in the “no pCR” group who have excellent outcomes, the majority of whom are in the HR+/HER2− receptor subtype. The Kaplan–Meier plots for the hormone receptor positive patients without a pCR, stratified by four of the molecular signatures and by clinical stage, showed that the low risk classification identified patients with a much better outcome than the high risk classification (Fig. 2).

Fig. 2

Stratification, by molecular classifier, of the hormone receptor positive HER2 negative subgroup that did not achieve a pathologic complete response. The patients in the HR+/HER2− subgroup that did not achieve a pathologic complete response are stratified by the molecular subtypes as shown: a 70-gene prognosis profile (blue line low risk; gold line high risk); b wound healing signature (blue line quiescent; gold line activated); c risk of relapse subtype score (ROR-S) (blue line low risk; gold line medium and high risk); d p53 predicted mutation (blue line predicted wild type; gold line predicted mutation); e clinical stage (blue line clinical stage 2; gold line clinical stage 3). Stratification of the “no pCR” HR+/HER2− patient group by molecular signatures and clinical stage

An unsupervised clustering of the clinical features of the patients in I-SPY 1 who did not receive trastuzumab is shown in Fig. 3. Like Table 3, this figure illustrates the high degree of overlap among clinical variables and molecular signatures. A cluster of tumors with many high risk features appears in red in the lower left. Some of these patients had a pCR and good outcomes, and others had no pCR and poor outcomes, but the molecular features of these tumors are the same and do not appear to provide information to discriminate between good and poor outcomes.

Fig. 3

Heat map by molecular features and outcomes for all patients

Discussion

The I-SPY 1 collaboration demonstrates that standards for imaging, data and tissue collection can be followed and molecular profiling from small specimens is achievable. Molecular profiles were generated for over 65% of all patients (improving as the trial proceeded), and these patients are representative of the entire data set.

Patients who present with large breast tumors, as exemplified by the I-SPY 1 cohort, have biologically poor-risk cancers, as evidenced by 91% having 70-gene high risk profile and the fact that many are interval cancers [32]. Even within this clinically high risk population, response to therapy was heterogeneous. HER2 positivity and HR negativity were associated with a greater rate of pCR, as were four poor prognosis molecular signatures: wound-activated signature, ROR-S high risk, 70-gene poor-risk, and p53 predicted mutation. Patients with good prognosis signatures had a lower chance of short-term (pCR, RCB) response to chemotherapy, but had better long-term (RFS, OS) outcomes, even when their tumors did not respond to therapy. These findings support the emerging consensus that patients with good risk signatures (wound healing quiescent, 70-gene low, and ROR-S low) have low rates of early recurrence in spite of large tumor size. The molecular profiles vary by the percent of the population they classify as low risk, the fraction that respond to therapy, and the outcomes among those without pCR, even though the data set was not sufficiently large to show a statistical difference.

The International Breast Cancer Study Group (IBCSG), NSABP, and MD Anderson Cancer Center [33] have found that pCR rates are much higher in patients with HR-negative tumors than in those with HR-positive tumors. These observations are consistent with our results and with adjuvant studies that show patients with HR-negative disease benefit more from chemotherapy [34] than do patients with HR-positive disease.

Molecular profiles may provide the opportunity to identify, beyond HR and HER2 status, what might be driving tumor behavior and outcomes. In a multivariate model, when receptor types were fixed, the factors that added to RFS included clinical stage, wound healing signature, ROR-S, and p53 predicted mutation. When pCR was also fixed, most of the dichotomized molecular markers added some additional predictive value, likely because of the ability to identify patients in the “no pCR” group who have excellent outcomes, largely the HR+/HER2− subgroup, though not exclusively. Given that the low proliferative HR+ subset is at risk for late recurrence, longer follow-up and additional studies will be required to validate this observation.

Molecular signatures are currently being used to identify low risk patients who are less likely to benefit from chemotherapy regardless of nodal status [35] or in the setting of HR+ node-negative disease [36]. Such patients have been shown to have low rates of response to chemotherapy and very low rates of early recurrence [37]. Confirmation of chemotherapy benefit in molecularly low risk patients will be forthcoming from the TAILORx [38] and MINDACT [39] trials. In the follow-on I-SPY 2 TRIAL, an adaptive-design neoadjuvant trial to test the ability of phase 2 agents in combination with chemotherapy to increase pCR, 70-gene low risk, HR-positive and HER2-negative patients are being excluded from randomization. In I-SPY 1, none of the 11 patients with a 70-gene low risk prognosis profile had a pCR or a recurrence (Fig. 2).

In the I-SPY cohort, the wound healing signature identified the largest fraction of low risk patients (based on RFS) of any signature. The genes consistent with an activated wound environment characterize women with poor outcomes, in keeping with increasing evidence that supports targeting the inflammatory pathway in high risk cancers [40] and breast cancer in particular [41, 42]. The activated wound healing signature is associated with poor outcomes across multiple tumor types and may well reflect the importance of the microenvironment in tumor behavior.

Although pCR and RCB are very predictive of RFS among the poor prognosis molecular profiles, the profiles do not predict an individual patient’s response to standard chemotherapy. A substantial fraction of tumors with the highest risk features have a complete response to therapy and do well, while others with that same signature have a poor response and poor outcome. Ongoing analysis is focusing on the I-SPY 1 patients who did not have a complete response to therapy and had early recurrence, using the described biomarkers as well as phosphoprotein profiles, to explore targets for future therapeutic intervention.

Our study is limited by the short follow-up time. Patients with HR-positive tumors continue to be at risk for recurrence for many years, and early recurrence data may not reflect the overall outcome [34]. However, in this select group of patients where almost all patients had grade 2 or 3 disease, recurrence risk is likely to be concentrated in the first 5 years [43]. The Oxford Overview Analysis of the early breast cancer trials strongly suggests that the benefit of chemotherapy is reflected by distant disease-free survival at 5 years, where the survival curves for patients with chemotherapy versus without initially diverge but are then parallel, so any survival benefit from chemotherapy is likely to be manifest in the first 5 years [44]. The median follow-up period of 3.9 years in the I-SPY cohort should reflect the benefits in HER2-positive and triple negative disease, where the risk of recurrence is early [13].

Molecular and biological heterogeneity were substantial even within the high risk group of patients in the I-SPY TRIAL. In this patient cohort, HR and HER2 status were the most predictive of pCR, but the molecular signatures add to the ability of the receptors to predict RFS. The task that remains is to use current and emerging markers to identify optimal biological subsets for new therapeutic agents. Importantly, molecular marker data should be collected routinely in trials so that markers and imaging that are early predictors of outcome can be related to the target endpoint of RFS [45]. The I-SPY 1 database, with its rich collection of genomic and protein expression data, is an important resource for exploring emerging and new biomarkers associated with resistance and response to standard therapy.