Main

The crude incidence of breast cancer (BC) in Europe is 109.8 per 100 000 women per year, and the disease is responsible for 38.4 deaths per 100 000 women annually (Pestalozzi et al, 2005). Significant improvements in both disease-free survival (DFS) and overall survival (OS) have been obtained with the extensive use of adjuvant systemic therapies (EBCTCG, 2005). In the last few decades, proliferation markers have been extensively evaluated as prognostic tools in BC. However, the only prognostic factors utilised in clinical decision making are some histologic features (e.g. tumour size, histologic grade, nodal status and lymphovascular invasion), hormone receptor status, HER-2 status and age (Colozza et al, 2005; Hayes, 2005).

Ki-67 is present in all proliferating cells and there is great interest in its role as a marker of proliferation (Gerdes et al, 1983). The Ki-67 antibody reacts with a nuclear non-histone protein of 395 kDa that is present in all active phases of the cell cycle but absent in the G0 phase (Cattoretti et al, 1992). MIB-1 is a monoclonal antibody against recombinant parts of the Ki-67 antigen; a good correlation exists between Ki-67 and MIB-1 (Cattoretti et al, 1992).

Recently, gene array techniques have revealed the Ki-67 gene's role in several ‘proliferation signatures’, showing that a set of genes with increased expression patterns is correlated with tumour cell proliferation rates, as assessed by the Ki-67 labelling index (Perou et al, 1999; Whitfield et al, 2006). Moreover, Ki-67 is one of the 21 prospectively selected genes of the Oncotype DX™ assay used to predict the risk of recurrence in a node-negative, tamoxifen-treated BC population enrolled in the National Surgical Adjuvant Breast and Bowel Project B-14 (NSABP B-14) trial, as well as to predict the magnitude of chemotherapy benefit in women with node-negative, estrogen receptor (ER)-positive BC enrolled in the NSABP B-20 trial (Paik et al, 2004, 2006).

Despite the large number of published papers analysing the prognostic role of Ki-67 in early BC, it is still not considered an established factor for use in clinical practice, probably because most of the studies are retrospective and because some uncertainty remains about the way Ki-67 should be assessed (Eifel et al, 2001; Goldhirsch et al, 2003; Colozza et al, 2005; Urruticoechea et al, 2005). Because a more convincing demonstration of the prognostic role of Ki-67 in early BC would be of value for initiating further research on Ki-67 assessment methods, we performed this literature-based meta-analysis to better quantify the prognostic impact of Ki-67 expression.

Materials and methods

Publication selection

For this meta-analysis, we selected studies evaluating the relationship between Ki-67/MIB-1 status and prognosis in early BC published until May 2006. To fulfill our selection criteria, the studies had to have been published as a full paper in English. Articles were identified by an electronic PubMed search using the following keywords: ‘breast cancer’, ‘Ki-67’, ‘MIB-1’, ‘proliferative index’, ‘proliferative marker’, ‘survival’ and ‘prognostic’. We also screened references from the relevant literature, including all the identified studies and reviews. To avoid duplicate data, we identified articles that included the same cohort of patients by reviewing interstudy similarity in the country in which the study was performed, investigators in the study, source of patients, recruitment period and inclusion criteria. When the authors reported the same patient population in several publications, only the most recent or most complete study was included in this analysis.

Data extraction

Information was carefully extracted from all publications by three authors (EA, GC and MP). The following data were collected from each study: publication date, first author's last name, antibody and cut-off used for assessing Ki-67 positivity, distribution of Ki-67 status, follow-up period, treatment, nodal status and data allowing us to estimate the impact of Ki-67 expression on DFS and/or OS.

We did not define a minimum number of patients or a minimum duration of median follow-up for a study to be included in our meta-analysis. The exclusion criteria are described below and were not driven by individual study results.

Statistical methods

Ki-67 was considered positive or negative according to the cut-off values provided by the authors. For the quantitative aggregation of the survival results, the impact of Ki-67 expression on prognosis was measured using Hazard Ratio (HR). For each study, this HR was estimated by a method depending on the results provided in the original publication. The most accurate method was to retrieve the estimated HR and its variance using two of the following parameters: the HR point estimate, the log-rank statistic or its P-value, and the O–E statistic (difference between numbers of observed and expected events) or its variance. If those data were not available, we looked for the total number of events, the number of patients at risk in each group and the log-rank statistic or its P-value, to estimate the HR. Finally, if the only useful data were in the form of graphical representations of the survival distributions, we extracted from them the survival rates at specified time-points in order to reconstruct the HR estimate and its variance, with the assumption that the rate of patients censored was constant during the study follow-up (Parmar et al, 1998).
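As an illustration of the first two extraction routes described above, the following minimal Python sketch recovers the log hazard ratio and its variance either from a reported HR with its confidence interval or from a log-rank P-value combined with the number of events and the group sizes. It is a sketch under stated assumptions, not the exact computation used for this meta-analysis: the function names are hypothetical, the variance approximation V ≈ events × n1 × n2 / (n1 + n2)² is one commonly used approximation in the spirit of Parmar et al (1998), and the direction of the effect must be supplied by the reader because a P-value carries no sign. The curve-reading route is not covered here.

```python
import math
from statistics import NormalDist


def log_hr_from_ci(hr, lower, upper, level=0.95):
    """Return (ln HR, variance of ln HR) from a reported HR and its CI.

    Assumes the interval is symmetric on the log scale.
    """
    z = NormalDist().inv_cdf(0.5 + level / 2)
    se = (math.log(upper) - math.log(lower)) / (2 * z)
    return math.log(hr), se ** 2


def log_hr_from_logrank(p_two_sided, events, n_pos, n_neg, worse_if_positive=True):
    """Return (ln HR, variance of ln HR) from a two-sided log-rank P-value.

    Assumption: the log-rank variance V is approximated by
    events * n_pos * n_neg / (n_pos + n_neg) ** 2.
    Then O-E = z * sqrt(V), ln HR = (O-E) / V and var(ln HR) = 1 / V.
    """
    v = events * n_pos * n_neg / (n_pos + n_neg) ** 2
    z = NormalDist().inv_cdf(1 - p_two_sided / 2)
    o_minus_e = z * math.sqrt(v)
    if not worse_if_positive:
        o_minus_e = -o_minus_e  # the sign comes from the direction reported in the paper
    return o_minus_e / v, 1 / v


# Hypothetical inputs, for illustration only (not data from any included study).
ci_log_hr, ci_var = log_hr_from_ci(2.0, 1.0, 4.0)
lr_log_hr, lr_var = log_hr_from_logrank(0.05, events=100, n_pos=50, n_neg=50)
print(round(math.exp(ci_log_hr), 2), round(ci_var, 4))
print(round(math.exp(lr_log_hr), 2), round(lr_var, 4))
```

Working on the log scale keeps the estimate and its variance in the form needed for the pooling step, where each study is weighted by the inverse of its variance.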

Three independent readers read the curves to reduce reading variability. If authors reported survival for three or more groups, we pooled the results to make a comparison between two groups feasible. Whenever possible, HR estimates were calculated for subgroups, such as node-negative, node-positive or untreated patients. Results were crosschecked against those from the original publication to be sure that they were not discrepant, in particular when the survival rates had to be read from the survival curves.

The individual HR estimates were combined into an overall HR using Peto's method (Yusuf et al, 1985). We carried out χ² tests for heterogeneity and, if the assumption of homogeneity of the individual HRs had to be rejected, we used a random-effect model in place of a fixed-effect model. By convention, an observed HR>1 implied a worse prognosis for the group with positive Ki-67 expression. The impact of Ki-67 on survival was considered to be statistically significant if the 95% confidence interval (CI) for the overall HR did not include 1. We used the authors’ definitions for DFS and OS.
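The pooling and heterogeneity steps can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the software actually used: the function names are hypothetical; the fixed-effect estimate is an inverse-variance weighted mean of the log HRs, which coincides with the Peto-type estimate when each log HR is (O–E)/V with variance 1/V; the heterogeneity statistic is Cochran's Q referred to a χ² distribution with k–1 degrees of freedom; and the random-effect estimate uses the DerSimonian-Laird between-study variance, which is our assumption because the text does not name a specific random-effect estimator. At least two studies are required.

```python
import math


def chi2_sf(q, df):
    """Upper-tail probability of a chi-square variable with integer df >= 1."""
    x = q / 2
    if df % 2 == 0:
        a, s = 1.0, math.exp(-x)
    else:
        a, s = 0.5, math.erfc(math.sqrt(x))
    while a < df / 2:  # recurrence on the regularised upper incomplete gamma function
        s += x ** a * math.exp(-x) / math.gamma(a + 1)
        a += 1
    return s


def fixed_effect(log_hrs, variances):
    """Inverse-variance weighted pooled ln HR and its variance."""
    weights = [1 / v for v in variances]
    total = sum(weights)
    pooled = sum(w * y for w, y in zip(weights, log_hrs)) / total
    return pooled, 1 / total


def heterogeneity(log_hrs, variances):
    """Cochran's Q, its degrees of freedom and the chi-square P-value."""
    pooled, _ = fixed_effect(log_hrs, variances)
    q = sum((y - pooled) ** 2 / v for y, v in zip(log_hrs, variances))
    df = len(log_hrs) - 1
    return q, df, chi2_sf(q, df)


def random_effects(log_hrs, variances):
    """DerSimonian-Laird estimate (assumed estimator): tau^2, pooled ln HR, variance."""
    q, df, _ = heterogeneity(log_hrs, variances)
    weights = [1 / v for v in variances]
    c = sum(weights) - sum(w * w for w in weights) / sum(weights)
    tau2 = max(0.0, (q - df) / c)
    pooled, var = fixed_effect(log_hrs, [v + tau2 for v in variances])
    return tau2, pooled, var


def hr_with_ci(log_hr, var, z=1.96):
    """Back-transform to the HR scale with an approximate 95% CI."""
    se = math.sqrt(var)
    return math.exp(log_hr), math.exp(log_hr - z * se), math.exp(log_hr + z * se)


# Hypothetical study-level inputs, for illustration only.
log_hrs = [0.0, 1.0]
variances = [0.1, 0.1]
q, df, p_het = heterogeneity(log_hrs, variances)
tau2, pooled, var = random_effects(log_hrs, variances)
hr, lower, upper = hr_with_ci(pooled, var)
significant = lower > 1 or upper < 1  # the 95% CI does not include 1
```

In this toy example the two hypothetical studies disagree, the heterogeneity test rejects homogeneity, and the random-effect interval widens accordingly, which is the behaviour that motivates switching models when homogeneity is rejected.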

All the statistical calculations for our meta-analysis were performed with personal computing.

Results

Characteristics of the studies

Out of 68 studies published between 1989 and 2006, 46 had sufficient information for HR extraction, including 38 studies evaluable for DFS and 35 for OS; some studies were evaluable for, or had analysed, only one of these end points. Tables 1 and 2 list the evaluable studies with their main characteristics, and Table 3 presents the main results of this meta-analysis. The reasons for considering an article non-evaluable were: (a) no univariate analysis reported; (b) no possibility of calculating the HR using one of the methods mentioned above, either because the distribution of Ki-67 was not reported in the article or because Ki-67 was analysed in combination with other prognostic markers, rendering the analysis impossible; (c) overlapping data published in different journals; and (d) inclusion of metastatic BC patients. Table 4 lists all the studies considered non-evaluable for this meta-analysis but used in the sensitivity analysis.

Table 1 Main characteristics of all studies included in the meta-analysis for overall survival
Table 2 Main characteristics of all studies included in the meta-analysis for disease-free survival
Table 3 HR values and heterogeneity test for all subgroups analysis in patients with early breast cancer
Table 4 Studies that were not evaluable for this meta-analysis, but included in the sensitivity test

The number of patients included in each study varied from 42 to 863, and the follow-up period varied from 23.6 months (mean) to 16.3 years (median). Different antibodies were used across the studies: anti-Ki-67 was used in 24 studies (52.1%), anti-MIB-1 in 24 studies (52.1%), both antibodies in five studies (Keshgegian and Cnaan, 1995; Veronese et al, 1995; Bevilacqua et al, 1996; Querzoli et al, 1996; Billgren et al, 2002), anti-Ki-S5 in two studies (Rudolph et al, 1999a; Esteva et al, 2004) and anti-Ki-S11 in one study (Rudolph et al, 1999b). The cut-off values used were those chosen by the authors (range: 3.5–34%). Thresholds were defined as the mean or median value, the best cut-off value or an established arbitrary value.

Out of the 38 evaluable studies for DFS (10 954 patients), subgroup analysis was possible in 15 studies with node-negative patients (3370 patients) (Sahin et al, 1991; Weikel et al, 1991, 1995; Gaglia et al, 1993; Bevilacqua et al, 1996; Brown et al, 1996; Pierga et al, 1996; Railo et al, 1997; Jansen et al, 1998; Clahsen et al, 1999; Harbeck et al, 1999; Rudolph et al, 1999a; Billgren et al, 2002; Trihia et al, 2003; Erdem et al, 2005), in eight with node-positive patients (1430 patients) (Weikel et al, 1991, 1995; Gaglia et al, 1993; Pierga et al, 1996; Jansen et al, 1998; Billgren et al, 2002; Trihia et al, 2003; Esteva et al, 2004) and in six with untreated node-negative patients (736 patients) (Sahin et al, 1991; Weikel et al, 1991; Bevilacqua et al, 1996; Railo et al, 1997; Jansen et al, 1998; Harbeck et al, 1999). Regarding OS (9472 patients), of all 35 studies, subgroup analysis was possible in nine studies with node-negative patients (1996 patients) (Jensen et al, 1995; Weikel et al, 1995; Bevilacqua et al, 1996; Brown et al, 1996; Domagala et al, 1996; Fresno et al, 1997; Rudolph et al, 1999a; Trihia et al, 2003; Erdem et al, 2005), in four with node-positive patients (857 patients) (Weikel et al, 1995; Domagala et al, 1996; Gonzalez et al, 2003; Trihia et al, 2003) and in two that included only untreated patients (node-negative and node-positive) (284 patients) (Pinder et al, 1995; Bevilacqua et al, 1996).

Meta-analysis

The main meta-analysis results (overall population, DFS and OS) are shown in Figures 1 and 2. For the overall population, worse DFS (HR 1.93, 95% CI 1.74–2.14; P<0.001) and OS (HR 1.95, 95% CI 1.70–2.24; P<0.001) were observed among patients considered Ki-67 positive. Worse prognosis was observed both in node-negative patients (DFS (HR 2.31, 95% CI 1.83–2.92; P<0.001); OS (HR 2.54, 95% CI 1.65–3.91; P<0.001)) and in node-positive patients (DFS (HR 1.59, 95% CI 1.35–1.87; P<0.001); OS (HR 2.33, 95% CI 1.83–2.95; P<0.001)). In the subgroup analysis of untreated patients, worse DFS was found in node-negative patients (HR 2.72, 95% CI 1.97–3.75; P<0.001), as well as worse OS in node-negative and node-positive patients taken together (HR 1.79, 95% CI 1.22–2.63; P=0.001).

Figure 1 Results of the meta-analysis with all evaluable studies for DFS. A hazard ratio (HR)>1 implies a worse DFS for the group with increased Ki-67. The size of each square is proportional to the number of patients included in the study. The centre of the lozenge gives the combined HR for the meta-analysis and its extremities the 95% CI.

Figure 2 Results of the meta-analysis with all evaluable studies for OS. An HR>1 implies a worse OS for the group with increased Ki-67. The size of each square is proportional to the number of patients included in the study. The centre of the lozenge gives the combined HR for the meta-analysis and its extremities the 95% CI.

The need to exclude some studies because they do not report enough data for aggregation is a well-known and important problem when conducting a meta-analysis, because the excluded studies often show a smaller effect than the studies published with full details and evaluable for the meta-analysis. To assess the impact of bias related to the unevaluable studies (which might lead to an overestimation of the effect), we performed an analysis on the overall patient populations including both evaluable and unevaluable studies. For papers reporting only HR estimates obtained in multivariate analyses, we used this HR estimate together with its variance. For those with uncertainties related to the number of events, and hence to the variance of the HR estimate, we made a rough approximation of the variance. Finally, for the studies where no useful information could be retrieved from the publication, we considered that the HR estimate was 1 (i.e. no impact at all for Ki-67) and used a minimal variance compared to the included studies of the same size. Even in this sensitivity analysis, we still observed a significant adverse impact of Ki-67 positivity on DFS (HR 1.74, 95% CI 1.56–1.95; P<0.001; heterogeneity test P<0.001) and OS (HR 1.76, 95% CI 1.54–2.00; P<0.001; heterogeneity test P<0.001).
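The logic of this sensitivity analysis can be illustrated with a minimal Python sketch, assuming a simple inverse-variance fixed-effect pooling and purely hypothetical inputs (the function name, the study values and the variances assigned to the null studies are illustrative assumptions, not data from this meta-analysis). Studies with no retrievable information enter with ln HR = 0 (HR = 1), which pulls the pooled estimate towards the null; the question is whether the confidence interval still excludes 1.

```python
import math


def pooled_hr(log_hrs, variances, z=1.96):
    """Inverse-variance fixed-effect pooled HR with an approximate 95% CI."""
    weights = [1 / v for v in variances]
    total = sum(weights)
    log_pooled = sum(w * y for w, y in zip(weights, log_hrs)) / total
    se = math.sqrt(1 / total)
    return (math.exp(log_pooled),
            math.exp(log_pooled - z * se),
            math.exp(log_pooled + z * se))


# Hypothetical evaluable studies: three studies, each with HR = 2.
evaluable_log_hrs = [math.log(2.0)] * 3
evaluable_vars = [0.04, 0.04, 0.04]

# Hypothetical unevaluable studies entered as "no effect" (HR = 1, ln HR = 0).
# Assumption: they receive the same variance as evaluable studies of similar size.
null_log_hrs = [0.0, 0.0]
null_vars = [0.04, 0.04]

hr_eval, lo_eval, hi_eval = pooled_hr(evaluable_log_hrs, evaluable_vars)
hr_all, lo_all, hi_all = pooled_hr(evaluable_log_hrs + null_log_hrs,
                                   evaluable_vars + null_vars)
still_significant = lo_all > 1  # the 95% CI of the conservative estimate excludes 1
```

Assigning a small variance to the null studies gives them a large weight, so the choice is deliberately conservative: an effect that survives this dilution is unlikely to be explained by the exclusion of unevaluable studies alone.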

Discussion

The present meta-analysis confirms that high Ki-67 expression in patients with early BC confers a worse prognosis in the overall population and quantifies its univariate prognostic impact. The same adverse effect was observed in subgroup analyses of node-negative, node-positive and untreated patients. This is the first meta-analysis of published studies to evaluate the association between Ki-67/MIB-1 expression and prognosis in early BC. Prognostic markers may be defined as markers that are associated with some clinical outcome, typically a time-to-event outcome such as OS or DFS, independently of any treatment or intervention. The best setting in which to apply this concept is an untreated population, which helps to identify the so-called pure prognostic marker. Prognostic markers may also be used to aid the decision-making process for adjuvant therapy; for example, they may help to determine whether a patient should receive adjuvant chemotherapy or how aggressive that therapy should be (McShane et al, 2005).

Ki-67 has been assayed in many studies as a prognostic and/or predictive marker in early BC. Regarding its predictive role, very few trials of primary systemic therapy have been published, and these are mostly retrospective with conflicting results (Colozza et al, 2005); we therefore felt that the assessment of the predictive role of Ki-67 was out of scope for this meta-analysis.

Our meta-analysis was carried out using published results from the literature, and we therefore acknowledge some limitations of our approach, which is, however, much less expensive than a meta-analysis using individual patient data. The language selection could favour positive studies, following the assumption that they are more often published in English, whereas the negative ones tend to be published more often in local journals in the authors' native languages (Egger et al, 1997). However, we did not identify many papers published in a national language (Italian, Russian, Serbian, German) (Lelle, 1990; Topic et al, 2002; Kushlinskii et al, 2004; Costarelli et al, 2005). This language effect has been called the ‘Tower of Babel bias’ and, in at least one of 36 consecutive meta-analyses, the exclusion of papers for linguistic reasons produced different results from those that would have been obtained if this exclusion criterion had not been used (Gregoire et al, 1995). Another possible source of confusion is the use of the same cohort of patients in different publications, although studies that were clearly based on the analysis of the same patient cohorts were excluded from this meta-analysis.

Some authors consider meta-analyses using individual data to be the gold standard of evidence (Stewart and Parmar, 1993; Oxman et al, 1995). This approach is normally considered to be a new study that takes into account all studies performed on the topic, published or not, and that requires the investigators to update the individual data; it is thus much more time consuming, complex and costly. In a comparison between a meta-analysis based on individual patient data and one based on extracted data, the overall duration was found to be 1–5 years for the former and only 1–5 months for the latter. Additionally, the overall cost of an individual patient data meta-analysis is $50 000 to $500 000, whereas for an extracted data study it is in the range of $5000 to $30 000 (Piedbois and Buyse, 2004). Therefore, a meta-analysis of the published literature is worthwhile, especially in a situation such as this one, in which the resources to conduct a meta-analysis based on individual data are very unlikely to be found.

The method used for extrapolating HR might be a source of some variability in the HR estimates. When no other useful information was available, we extrapolated the HR from the survival curves using several time points during follow-up for reading the corresponding survival rates, assuming that censored observations were uniformly distributed. The estimation of survival rates based on the graphical representation of the survival curves was performed independently by three of the authors and we compared our HR estimate and its statistical significance with the results published in each individual trial. We did not identify any major contradiction between our results and the results available in the papers.

The adverse impact of Ki-67 positivity on both OS and DFS was observed in the overall population as well as in the node-negative and node-positive subgroups. Significant heterogeneity was detected when considering the whole population and node-negative patients. It is not considered appropriate to define a single measure (i.e. the HR associated with Ki-67 positivity in this case) from studies with inherent dissimilarities. The disparity among the conclusions of different studies, which is responsible for the observed heterogeneity, can be quantified by applying quality scores to the studies included in the meta-analysis. However, these scores do not always explain the observed results (Greenland, 1994). In this case, the methodological characteristics of each study must be taken into consideration.

Cattoretti et al (1992) reported better success in staining Ki-67 in paraffin-embedded samples after the new antibodies anti-MIB-1 and anti-MIB-3 had been developed. Although several antibodies are now commercially available to stain Ki-67, anti-MIB-1 is the most frequently used in recent studies (Urruticoechea et al, 2005). In our meta-analysis, antibodies other than anti-MIB-1 and anti-Ki-67 were included, such as anti-Ki-S5 (Rudolph et al, 1999a; Esteva et al, 2004) and anti-Ki-S11 (Rudolph et al, 1999b), albeit representing only a minority of the cases. Moreover, Ki-67 expression is usually estimated as the percentage of tumour cells positively stained by the antibody, with nuclear staining being the most common criterion of positivity. The use of different antibodies and scoring protocols without a standard minimum number of cells to be counted may account for some of the differences between the studies.

In our meta-analysis, some studies have used 10% as the cut-off (arbitrary value), whereas others have chosen mean, median, the optimal cut-off value or arbitrary values, and these differences might be responsible for the difficulty in determining a standard threshold in daily practice. However, some authors have described that the choice of the cut-off point for IHC may depend on the clinical objective: if Ki-67 is used to exclude patients with slowly proliferating tumours from chemotherapeutic protocols, a cut-off of 10% will help avoid overtreatment. In contrast, if Ki-67 is used to identify patients sensitive to chemotherapy protocols, it is preferable to set the cut-off at 25% (Spyratos et al, 2002). In the context of this meta-analysis, we may assume that increased Ki-67 leads to an increased risk of relapse and/or death and that a relative increase is estimated although the baseline risk (the risk in the group considered Ki-67 negative) is not the same in all the studies.

A further limitation of our meta-analysis is that it assesses only the univariate prognostic value of Ki-67. So, we cannot infer from our meta-analysis that Ki-67 is an independent factor; the answer to that question should come from a prospective study (it is likely that a meta-analysis of individual data would not solve the question as the intersection of the sets of covariates available in the individual studies is most probably very small).

To better clarify the prognostic role of ER status, Sotiriou et al (2006) used gene array profiling to explore the implications of the joint distribution of ER status and gene expression grade index (GGI) to predict clinical outcome. They found that almost all ER-negative tumours were associated with high GGI scores (high grade), whereas ER-positive tumours were associated with a heterogeneous mixture of GGI values. This means that GGI adds prognostic information when the ER status is known, whereas the opposite is not true. Unfortunately, owing to the lack of information in the published studies used in our analysis, we could not assess the impact of Ki-67 expression in the ER-negative and ER-positive subpopulations or by grade, which are well-known risk factors associated with worse outcome. Table 5 summarises the main results of the recent gene signatures for prognosis/prediction in BC.

Table 5 Main results from the recent gene expression signatures in breast cancer

Despite years of research and hundreds of reports of tumour markers in oncology, the number of markers that have emerged as clinically useful is quite small. The development of the REporting of tumour MARKer Studies (REMARK) guidelines was the major task of the NCI-EORTC First International Meeting on Cancer Diagnosis, representing a collaborative effort of statisticians, clinicians and laboratory scientists. The guidelines contain 20 recommendations derived from studies on tumour markers and covering study design, methods of statistical analysis, preplanned hypotheses, patient and specimen characteristics, and assay methods. The widespread use of published guidelines for analytical methods and the reporting of results would greatly facilitate the development of alternative analyses and meta-analyses (Alonzo, 2005; McShane et al, 2005).

Despite some limitations, this meta-analysis supports the prognostic role of Ki-67 in early BC by showing a significant association between its expression and the risk of recurrence and death in all populations considered and for both outcomes, DFS and OS. Had the proposed REMARK guidelines been employed in all the studies selected for this meta-analysis, and had all necessary information been available, our literature-based meta-analysis would have better characterised the role of Ki-67 as a prognostic marker.