Introduction

Metastatic bone disease is common among patients with advanced-stage cancer, with prevalence estimates of roughly 70% of patients with advanced breast or prostate cancer and up to 95% of patients with multiple myeloma [1–4]. When cancer metastasizes to the bone, it can have profoundly negative effects on patients. For example, bone metastases often lead to debilitating and potentially life-threatening skeletal-related events (SREs) such as pathological fractures and malignant spinal cord compression [2, 3, 5]. Bone metastases are also associated with extreme pain and decreased health-related quality of life (HRQL) [2, 4, 6, 7].

Several types of treatments are available for patients with bone metastases, including external beam radiotherapy and analgesics such as nonsteroidal anti-inflammatory medications and opioid analgesics [1, 4, 5]. One of the primary treatment approaches is the administration of bisphosphonate medication, which has been shown to reduce the incidence of SREs such as pathological fractures, while providing pain relief and resulting in improved HRQL [3, 5, 8–12]. In clinical trials examining effectiveness of bisphosphonates, relief of bone pain is often a key outcome [9, 13, 14]. Because pain cannot be quantified with objective clinical measures, assessment of pain requires the use of patient-reported outcome (PRO) measures to capture patients' subjective experience of the presence and severity of pain and its impact on physical, functional, social, and emotional well-being. Consequently, it is necessary to use well-developed and validated PRO measures of pain to assess the effectiveness of existing bisphosphonates and new treatments for patients with bone metastases.

The primary goal of the current review was to examine PRO measures used to assess pain in trials of bisphosphonates for the treatment of bone metastases. These measures were then evaluated with regard to the FDA guidance on PROs, first issued in February 2006 and updated in December 2009 [15, 16] and the European Medicines Agency reflection paper [17]. Recommendations are provided for assessment of pain in future trials of treatments for bone metastases. As PRO measures assessing additional endpoints of HRQL and functional status are often included in bisphosphonate trials [13], a secondary goal of the current review was to identify and examine measures of these constructs within trials primarily assessing pain relief among patients with metastatic disease.

Methods

Literature search

A literature search was performed to identify studies that used PROs to assess pain and associated functional status or HRQL in clinical trials of bisphosphonates for patients with bone metastases. The search was conducted using the PubMed database (comprised of MEDLINE, HealthStar, CancerLit, AIDSline, and OLDMEDLINE) and restricted to articles written in English and published during the 10-year period from January 1999 to April 2009.

An initial search identified citations mentioning bisphosphonates in general or one of four specific bisphosphonate medications (clodronate, ibandronate, pamidronate, and zoledronic acid), yielding 6,449 citations. Then, the bisphosphonate search was crossed with the search for articles examining pain associated with bone metastases (search phrase: metasta* AND [bone OR skeletal] AND pain), yielding 370 citations. In addition to identifying patient-reported measures of pain, this review also aimed to locate studies using PROs to assess functioning and HRQL related to the pain of bone metastases. However, a separate search was not conducted to identify PROs assessing functional status or HRQL because these articles would be identified within the literature search focusing on pain.
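The way the two searches were crossed can be sketched as a simple Boolean query string. This is an illustration only: the pain phrase is taken from the text (with brackets written as parentheses), but the exact drug search string is not reported, so the wildcard term `bisphosphonate*`, the quoting of multi-word terms, and the helper names `or_group` and `build_query` are assumptions.

```python
def or_group(terms):
    """Join terms with OR and wrap the group in parentheses."""
    return "(" + " OR ".join(terms) + ")"


# Drug terms: the general class plus the four named agents.
# ASSUMPTION: "bisphosphonate*" stands in for the unreported general term.
DRUG_TERMS = [
    "bisphosphonate*",
    "clodronate",
    "ibandronate",
    "pamidronate",
    "zoledronic acid",
]

# Pain phrase as reported in the text, with [ ] rewritten as ( ).
PAIN_PHRASE = "metasta* AND (bone OR skeletal) AND pain"


def build_query(drug_terms=DRUG_TERMS, pain_phrase=PAIN_PHRASE):
    """Cross the drug search with the pain search using AND."""
    # Quote multi-word terms so they are treated as phrases (assumed convention).
    quoted = [f'"{t}"' if " " in t else t for t in drug_terms]
    return f"{or_group(quoted)} AND ({pain_phrase})"


print(build_query())
```

Crossing the two groups with AND is what narrows the 6,449 bisphosphonate citations to the 370 that also mention pain associated with bone metastases.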

Abstract review

The 370 abstracts were reviewed in order to select articles for more detailed full-text review. At this stage of the review process, the goal was to identify and obtain any articles that could have included a PRO measure. For this project, a measure was considered to be a PRO if it fit the definition stated in the FDA Guidance on Patient Reported Outcomes [15]. In this document, a PRO is defined as “any report of the status of a patient's health condition that comes directly from the patient, without interpretation of the patient's response by a clinician or anyone else.” PRO instruments may be patient-completed questionnaires or structured interviews. These instruments can be used to assess a wide variety of concepts, ranging from symptoms to more complex constructs such as functional status or quality of life.

Abstracts were selected for subsequent full-text article review if they mentioned the following: specific PRO measures (e.g., FACT-G and BPI), “patient-reported” or “subjective,” visual analog scale or VAS, constructs that are likely to be assessed by PROs (e.g., quality of life, function, pain, pain intensity, pain score, symptomatic improvement, and symptomatic response), or pain assessed via analgesic use (e.g., “pain assessed as number of days on opioids”). Clinical, bio-marker, and performance-based measures (e.g., Eastern Cooperative Oncology Group Performance Status [18], Karnofsky Performance Status [19]) were not considered relevant for the current review of PRO measures because they do not involve patient reports. Furthermore, the current review excluded patient-reported measures of constructs that were not directly related to pain, functional status, or HRQL, such as treatment satisfaction and time spent travelling to the hospital to receive bisphosphonate infusions. The following types of articles were also excluded from this review: case studies, review articles, meta-analyses, studies with a sample size of less than ten patients, letters, commentaries, retrospective studies (e.g., chart review rather than PRO), and cost-effectiveness studies.

Full-text article review

Based on the abstract review, 68 articles were obtained for full-text review following the same inclusion/exclusion criteria described above. During this full-text review, several additional study characteristics were considered when determining whether to include an article. The current review aimed to identify PROs used to assess outcomes of bisphosphonate trials. Therefore, articles in which PROs were administered only at baseline were excluded. Also excluded were studies focused only on instrument validation without reporting trial outcomes. Similarly, PRO measures mentioned in the “Methods” section without subsequent results were excluded from this review. Articles were also excluded if they reported assessment of pain or another construct that was likely based on a PRO, but did not provide a name or description of a PRO measure.

When an original article and a secondary analysis of the same trial were both published within the 10-year time window of the current review, the secondary analysis was excluded in order to avoid double-counting individual studies. However, two secondary analyses were included in the current review because the primary analysis was not published within the 10-year time window of this review [20, 21]. Furthermore, two additional secondary analyses were included in this review because they focus on PRO results that were not reported in the original publication (included secondary analyses: Body et al. [22] and Diel et al. [10]; excluded primary publications: Body et al. [23] and Tripathy et al. [24]).

Data extraction procedures

A total of 49 articles were selected for inclusion in the current review. Table 1 presents the following information on each study with a PRO assessing pain: citation, specific bisphosphonate treatment, total sample size, description of the pain PRO measure (as presented in the article itself), the reference provided in each article for the pain PRO, and information regarding whether the article specifies that the measure is patient-reported. Table 2 presents similar information for the subset of 19 studies that used PRO measures to assess functional status or HRQL. Studies in Tables 1 and 2 are grouped based on the PRO measures that were administered.

Table 1 Pain measures used in trials of bisphosphonate treatment for bone metastases
Table 2 Function and HRQL measures used in trials of bisphosphonate treatment for bone metastases

When analgesic use was assessed separately from pain, we did not report this analgesic measure even if it was a PRO. In some studies, however, analgesic use (which may or may not have been patient reported) was one component of an overall pain score which also incorporated a patient rating of pain. In these situations, the overall pain assessment is included in the tables, and the individual components are listed (e.g., Berruti et al. [25], Jagdev et al. [26], Mitsiades et al. [27], and Wang et al. [28]). Similarly, functional status measures were occasionally used as one component of a pain composite score. In these situations, the functional status measure is listed as part of the composite score in Table 1, even if the functional status measure was not patient-reported.

Recent literature has distinguished among three types of single-item measures of pain intensity [29–31]. A visual analog scale (VAS) is a line, most frequently 100-mm long, with each end of the line labeled with categorical descriptors representing the minimum and maximum of pain intensity, such as no pain to extreme pain. Patients are asked to place a mark on the line that represents their pain intensity level between the two extremes. A numerical rating scale (NRS) consists of a range of numbers, usually 0 to 10, with anchors at each end of the scale representing no pain and extreme pain. Respondents are asked to choose the number that best represents their level of pain intensity. A verbal rating scale (VRS) consists of a list of descriptors or phrases that represent varying degrees of pain intensity. Each of these descriptors often has a number associated with it (e.g., 0 = none, 1 = mild pain, 2 = moderate pain, 3 = severe pain, and 4 = intolerable pain). In the articles reviewed for the current study, single-item scales were frequently called a “VAS” even if they were actually an NRS. In Table 1, single-item measures have been categorized based on the definitions of VAS, NRS, and VRS described here, regardless of the label used by the authors. However, the descriptions of measures in the fourth column of Table 1 use the exact wording from the original sources, regardless of whether these labels were used correctly. Thus, there are several measures described by the original authors as a VAS, but categorized in the current review as an NRS. In some cases, descriptions of measures were not sufficiently clear to allow us to categorize them with certainty. In these circumstances, we categorized measures based on the terminology used by the authors. All three of these single-item approaches can be used to assess current pain as well as worst pain or average pain during a specified recall period, such as 24 h or 7 days.
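The categorization rule described above can be expressed as a short decision procedure. The sketch below is illustrative rather than the procedure actually used for Table 1: the `SingleItemMeasure` fields and the `categorize` function are hypothetical names, and the order of the checks (line first, then verbal descriptors, then numbers) is an assumption chosen so that a VRS whose descriptors carry numbers is not mistaken for an NRS.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SingleItemMeasure:
    """Hypothetical summary of how an article describes a single-item scale."""
    author_label: str                      # label used by the authors, e.g. "VAS"
    is_line: bool = False                  # patient places a mark on a line
    numeric_options: Optional[int] = None  # count of discrete numeric choices, if stated
    verbal_descriptors: bool = False       # choices are words or phrases


def categorize(measure: SingleItemMeasure) -> str:
    """Assign VAS, NRS, or VRS from the description, not from the authors' label."""
    if measure.is_line:
        return "VAS"
    if measure.verbal_descriptors:
        # Descriptors often have numbers attached, so check this before NRS.
        return "VRS"
    if measure.numeric_options is not None:
        return "NRS"
    # Description too unclear to categorize: fall back on the authors' terminology.
    return measure.author_label


# A scale the authors call a "VAS" but describe as 0 to 10 numeric choices.
print(categorize(SingleItemMeasure("VAS", numeric_options=11)))  # NRS
```

The final fallback mirrors the handling of unclear descriptions, where the authors' own terminology was retained.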

Results

Summary of studies in this review

A total of 49 studies were located that included patient-reported measures of pain to assess outcomes of bisphosphonate treatment for bone metastases. Sample sizes of the individual studies (excluding secondary and pooled analyses) ranged from 10 to 1,648 patients. The most common treatment under investigation was zoledronic acid, which was examined in 22 of the studies. Other bisphosphonate treatments included clodronate (6 studies), etidronate (1 study), ibandronate (8 studies), and pamidronate (15 studies). Three of the pamidronate studies included another bisphosphonate as a treatment comparator. Trial designs varied across the studies reviewed, with designs ranging from open-label, single-center studies to randomized, double-blind, multi-center controlled trials.

Pain measures (49 studies)

All 49 studies are listed in Table 1, grouped according to 12 categories of pain measures. The first four measures are multi-item scales: (1) the Brief Pain Inventory (BPI), (2) Wisconsin Brief Pain Questionnaire (BPQ), (3) McGill–Melzack Pain Questionnaire, and (4) a questionnaire from Guy's Hospital Assessment of Response Study. The next four categories of measures are reported by the original authors as “visual analogue scales”: (5) VASs with a specified length, (6) ten-point VASs, (7) five-point VASs, and (8) unspecified VASs. Then, four additional groups of measures are presented: (9) single-item scales that are not labeled by the original authors as VASs, (10) scores derived by multiplying pain severity and pain frequency, (11) composite scores involving combinations of bone pain with other constructs, and (12) a face scale. Table 1 includes descriptions of each measure, quoted from the articles included in the current review.

The BPI was the most common formally developed and named instrument used for assessing pain. This questionnaire was developed by the Pain Research Group of the WHO Collaborating Center for Symptom Evaluation in Cancer Care [32]. The original development article states that “depending on the patient, it can be self-administered or used in a clinical interview, [and] the form of administration has little effect on the outcome.” The BPI was adapted from the Wisconsin Brief Pain Questionnaire for use with cancer patients to assess intensity and interference of pain. Both short and long form versions of the BPI include questions about pain location, severity, relief, and interference. The four pain severity items ask patients to rate their worst, least, average, and current pain (i.e., “pain right now”) over the past week (in the long form of the BPI) or 24 h (in the short form) using a 0–10 NRS (0 = no pain or 10 = pain as bad as you can imagine). The seven pain interference items ask the patients to rate the degree to which pain limits their functions using a 0–10 NRS (0 = no interference or 10 = interferes completely). A total of 13 studies used part or all of the BPI, including 11 studies using an English version and two studies using the Greek version. Among the 11 English studies, there was variation in the items that were used. For example, some studies focused on a composite of the four pain scores (worst, least, average, and current pain), while other studies appear to have used a smaller subset of these items, but this information was not always presented clearly. Seven of the studies using the BPI did not specify that the instrument was completed by patients, and some studies erroneously described the response options as “1 to 10” or “ten-point scale” [33, 34], when there are actually 11 response options ranging from 0 to 10. In addition, one of the studies appears to have used the worst pain item as a single-item NRS measure [33]. Furthermore, it is likely that most or all of these studies were using items from the most recent version of the BPI, which the instrument developer calls the short form [35]. However, few of the studies specified which version of the instrument was used.
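To make the structure of the BPI severity items concrete, the following sketch computes a composite of the four severity ratings. It assumes the composite is the simple mean of the four items; the reviewed articles did not always state how their composite was formed, so this is an assumption, and the function name `bpi_severity_composite` is hypothetical. The range check reflects the point made above that each item has 11 response options (0 to 10), not ten.

```python
SEVERITY_ITEMS = ("worst", "least", "average", "current")
VALID_RATINGS = range(0, 11)  # 0..10 inclusive: 11 response options


def bpi_severity_composite(ratings):
    """Mean of the four BPI pain severity items (ASSUMED scoring rule).

    `ratings` maps each item name to an integer rating on the 0-10 NRS.
    """
    missing = [item for item in SEVERITY_ITEMS if item not in ratings]
    if missing:
        raise ValueError(f"missing severity items: {missing}")
    for item in SEVERITY_ITEMS:
        if ratings[item] not in VALID_RATINGS:
            raise ValueError(f"{item} rating must be an integer from 0 to 10")
    return sum(ratings[item] for item in SEVERITY_ITEMS) / len(SEVERITY_ITEMS)


example = {"worst": 8, "least": 2, "average": 5, "current": 4}
print(bpi_severity_composite(example))  # 4.75
```

A study that reports only the worst pain item, as one of the reviewed trials appears to have done, is using a single-item NRS rather than this composite, which is why the items administered need to be stated explicitly.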

Three additional multi-item scales were each used in only one study. The Wisconsin Brief Pain Questionnaire was administered in a study by Berenson et al. [36], although minimal details were provided regarding the characteristics or administration of the instrument. This measure was designed to be a self-administered assessment of pain associated with cancer and other diseases [37]. It assesses constructs similar to those subsequently included on the BPI, as described above. The McGill–Melzack Pain Questionnaire was used in a study by Ernst et al. [38], which administered only the six-point Present Pain Intensity scale of this instrument, ranging from no pain to excruciating pain. The study identified in the current review did not include other items of this instrument, such as those for which patients are instructed to select words that best describe their pain experience [39, 40]. Finally, Berruti et al. [25] administered a brief unnamed questionnaire, but provided minimal description and an incorrect citation as explained in Table 1.

The most common approach for assessing pain intensity, which was used in 24 studies, was to administer a single-item scale such as VAS, NRS, and VRS measures (these three types of single-item measures are defined above in the “Methods” section). Most articles did not explicitly state that these scales were completed by the patient, but it is likely that they were patient-completed in all cases. There was substantial variation in descriptions and citations of these single items. In four studies, the VAS was described in terms of a specific length (e.g., 10 cm or 100 mm), while the most frequently used single-item was a 0–10 NRS. Two studies mentioned that a VAS was used to assess pain, but did not provide any description of the VAS. Less than half of the studies using a single-item provided a citation for the scale, and there was great variation in the citations among the articles that did provide a reference. One of the single-item scales combined two constructs, requiring patients to simultaneously rate pain and analgesic use [27]. In sum, although it was common to use a single-item for pain assessment, there was substantial variation in the type of single-item used, strategy for implementation, citation, and clarity with which the measure was described. Furthermore, no studies mentioned that the single-item measure was validated for use in the target population.

Several studies derived a single pain score from a combination of multiple scores. For example, five studies included a pain score that was computed by multiplying severity and frequency of pain. Three of these five studies cite a pamidronate clinical trial published by Theriault et al. [41] when discussing this approach. However, Theriault and colleagues do not provide a citation or details for the origin of this scoring system. Two additional studies included a pain rating as one component of a composite score that also incorporated analgesic consumption and performance status or overall health [26, 28]. In addition, one study conducted by Vitale et al. [42] computed a pain score by adding scores from a 100-mm VAS for pain to a Trial Outcome Index, which was the “sum of the physical and functional domains of the FACT-G.” In this study, the authors referred to the VAS as the “Huskisson” VAS, but did not provide a citation for this VAS. Huskisson published several articles on the measurement of pain using visual and graphic methods [43–45]. His publications suggest that VASs “provide the patients with a robust, sensitive, reproducible method of expressing pain severity” [44]. For the current review, we were unable to locate a specific VAS called the “Huskisson VAS.” It is possible that Vitale et al. [42] developed a new VAS according to Huskisson's principles and called it the Huskisson VAS.
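The severity-by-frequency approach can be illustrated with a few lines of code. Because the origin and details of this scoring system are not documented in the cited sources, the category bounds used here (severity and frequency each rated 0 to 3) are assumptions exposed as parameters, and `product_pain_score` is a hypothetical name.

```python
def product_pain_score(severity, frequency, max_severity=3, max_frequency=3):
    """Pain score computed as severity multiplied by frequency.

    ASSUMPTION: both ratings are integers from 0 up to the stated maxima;
    the reviewed articles do not document the origin of this scoring system.
    """
    if not 0 <= severity <= max_severity:
        raise ValueError("severity out of range")
    if not 0 <= frequency <= max_frequency:
        raise ValueError("frequency out of range")
    return severity * frequency


# Mild but constant pain and severe but occasional pain receive the same score.
print(product_pain_score(1, 3))  # 3
print(product_pain_score(3, 1))  # 3
```

The example shows one reason such derived scores are difficult to interpret: clinically different pain patterns can map to an identical value.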

Finally, one study used an approach called a “face scale,” which involved assessing patients' bone pain based on choosing a face that most reflected their current mood [46]. This face scale appears to be similar to items from the Functional Health Assessment Charts used in the Dartmouth Primary Care Cooperation Project (COOP Project) [47].

HRQL and functional status measures (19 studies)

Of the 49 studies using a PRO measure to assess pain, 19 studies included at least one PRO assessing functional status or HRQL. These studies are listed in Table 2, grouped according to the seven measures used: (1) European Organization for Research and Treatment of Cancer QLQ-C30 (EORTC QLQ-C30), (2) EORTC Breast Cancer Module (QLQ-BR23), (3) Functional Assessment of Cancer Therapy-General (FACT-G), (4) EuroQol EQ-5D (EQ-5D), (5) Prostate Cancer-Specific Quality of Life Instrument (PROSQOLI), (6) the Edmonton Symptom Assessment System (ESAS), and (7) linear analog scale assessment of quality of life. The EORTC QLQ-C30 and the FACT-G were the most commonly used HRQL and functional status measures, while the other five measures were each used in only a single study. Table 2 includes descriptions of each measure, quoted from the articles included in the current review.

Seven studies administered the EORTC QLQ-C30, which was developed to evaluate the quality of life of patients participating in international oncology trials [48]. The EORTC QLQ-C30 was designed to be relevant to a broad range of cancer patients. It may be administered along with separate questionnaire modules designed for specific cancer types or treatments (although only one study in the current review administered one of these supplemental modules, discussed below). This measure includes 30 items which contribute to five functional scales (physical, role, cognitive, emotional, and social), three symptom scales (fatigue, pain, and nausea and vomiting), a global health scale, and a quality of life scale. In the most recent version (version 3.0), the first 28 items are rated on a four-point Likert scale (1 = not at all or 4 = very much), and the last two items are rated on a seven-point Likert scale (1 = very poor or 7 = excellent) [49]. Among the seven studies administering the EORTC QLQ-C30, there were differences in how the questionnaire was used. Two studies analyzed only the physical functioning domain of this questionnaire, while another study described only the functional scales without indicating whether the other scales were also administered. The study by Mañas et al. [50] does not specify which version of the EORTC QLQ-C30 was used, but describes a series of “yes/no questions,” which is suggestive of version 1.0 before response options were modified to the four-point Likert scales used in subsequent versions. Six of the seven studies did not specify which version was administered, and four of the studies did not mention whether the instrument was completed by patients.

In addition to administering the EORTC QLQ-C30, Wardley et al. [51] assessed quality of life using the Breast Cancer Module (QLQ-BR23). This condition-specific questionnaire was developed by the EORTC Study Group to be used in conjunction with the EORTC QLQ-C30. Developed according to the EORTC guidelines for module development, the QLQ-BR23 focuses on aspects of quality of life specifically related to breast cancer with 23 items assessing treatment modalities, body image, sexuality, and future perspective [52].

Nine studies administered the FACT-G, which was developed by Cella et al. [53] as a quality of life questionnaire for use in patients receiving cancer treatment. This questionnaire is part of the Functional Assessment of Chronic Illness Therapy (FACIT) measurement system, which is a collection of HRQL questionnaires focusing on chronic illnesses [54]. Version 4 of the FACT-G (the most recent version) contains 27 items, which comprise four subscales (physical, social/family, emotional, and functional). Each item is rated on a five-point Likert scale (0 = not at all or 4 = very much) [53, 55]. Among the nine studies that administered the FACT-G, there was some variation in the way the subscales were used. Two studies indicated that only the functional and physical well-being subscales were analyzed. In one of these studies, Facchini et al. [56] calculated a Trial Outcome Index (TOI) by summing only the functional and physical subscales. This approach to calculating the TOI is not entirely consistent with recommendations from the instrument developers. The developers specified that the TOI can be calculated for use as an endpoint in clinical trials based on these two subscales as well as an “additional concerns” subscale from one of the FACIT disease-, treatment-, or condition-specific scales [55]. In the other study, Mystakidou et al. [34] stated that the functional and physical subscales were used to calculate a “total physical and functional QOL score,” which also appears to be inconsistent with instructions from the instrument developers. Only three of the nine studies specified the version of the FACT-G that was used, and five studies did not specify that this questionnaire was completed by patients.
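The difference between the developers' TOI and the two-subscale sum used in some trials can be sketched as follows. This is a simplified illustration, not the official scoring algorithm: it assumes seven items in each of the physical and functional subscales, assumes item responses have already been reverse-scored where the scoring guidelines require it, and ignores handling of missing items. The function names are hypothetical.

```python
def subscale_sum(items, n_expected=None):
    """Sum 0-4 item responses (ASSUMED already reverse-scored where needed)."""
    if n_expected is not None and len(items) != n_expected:
        raise ValueError(f"expected {n_expected} items, got {len(items)}")
    if any(not 0 <= x <= 4 for x in items):
        raise ValueError("item responses must be between 0 and 4")
    return sum(items)


def trial_outcome_index(physical_items, functional_items, additional_concerns_items):
    """TOI = physical + functional + 'additional concerns' subscale.

    The additional concerns items come from a disease-, treatment-, or
    condition-specific scale; without them the result is not a TOI as
    described by the developers.
    """
    if not additional_concerns_items:
        raise ValueError("TOI requires an additional concerns subscale")
    return (
        subscale_sum(physical_items, 7)        # ASSUMPTION: 7 physical items
        + subscale_sum(functional_items, 7)    # ASSUMPTION: 7 functional items
        + subscale_sum(additional_concerns_items)
    )


# Hypothetical responses: 7 physical, 7 functional, 10 additional concerns items.
print(trial_outcome_index([4] * 7, [3] * 7, [2] * 10))  # 69
```

Summing only the first two terms yields a physical-plus-functional score, which is what the trials discussed above reported under the TOI label.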

The large zoledronic acid trial published by Saad et al. [57] was the only study in this review to administer the EQ-5D. The EQ-5D is a generic preference-based health status instrument used in health economic evaluations. Patients report their functioning in five dimensions: mobility, self-care, usual activities, pain/discomfort, and anxiety/depression [58, 59]. Responses to these five items are used to derive the EQ-5D index score, which represents overall health. After completing the five dimension items, patients complete the EQ-5D VAS, on which they rate their current health on a scale ranging from 0 (worst imaginable health state) to 100 (best imaginable health state).

Ernst et al. [38] used the nine-item PROSQOLI questionnaire, which was derived by Stockler et al. [60] from a longer version originally developed by Tannock et al. [61]. The original version was designed to assess the effects of systemic treatments in men with advanced hormone-resistant prostate cancer. The Stockler version consists of nine linear analog self-assessment scales assessing pain and quality of life, the six-point Present Pain Intensity (PPI) verbal scale from the McGill Pain Questionnaire, and an analgesic score. These nine linear scales are each 10-cm long, and the score is measured in millimeters with 100 representing best function or quality of life.

The small ibandronate trial published by Mancini et al. [62] was the only study in this review to use an item from the ESAS, which was developed and validated as a self-report symptom intensity measure for patients with cancer [63, 64]. The ESAS originally consisted of eight 100-mm VASs for pain, activity, nausea, depression, anxiety, drowsiness, appetite, and sensation of well-being. A ninth VAS for shortness of breath was subsequently added, and some versions also include an additional blank VAS for a symptom that can be added by the patient. The Mancini et al. [62] study only used the well-being item of the ESAS.

Lastly, one study administered a measure described as a “linear analog scale assessment (LASA) of quality of life” [65]. Linear analog self-assessments, also referred to in the literature as LASAs or linear analog scale assessments, use 100-mm lines with descriptors at each extreme [66]. Patients are asked to mark their current state along the line, and scores are measured in distance (e.g., centimeters or millimeters) to this mark from point 0. Mystakidou et al. [65] do not provide a citation or additional details for this measure.

Discussion

A diverse range of PRO measures has been used to assess pain and its impact among patients with bone metastases, with little consistency in measurement approach across studies. Based on this review, there appears to be no consensus on a strategy for assessing pain in patients with bone metastases. Furthermore, presentation of measures in the published articles often lacked clear description, information on measurement properties, citations, or a consistent approach to naming the instruments and method of administration. Given this lack of measurement consistency and clarity, it is difficult to directly compare findings across studies in order to understand the relative potential for pain relief offered by the various bisphosphonates and other treatments. Comparisons across studies using different outcome measures would require calculating effect sizes for each measure [67], an approach which is based on the assumption that different measures are truly assessing the same construct. The lack of measurement consistency across clinical trials is similar to the inconsistent methods clinicians use to assess pain in clinical settings, as reported by patients in a recent large international survey [68].
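As a sketch of what calculating effect sizes for each measure involves, the example below computes a standardized mean difference (Cohen's d with a pooled standard deviation). This is one common effect size and is used here as an assumption; the review does not specify which effect size would be appropriate, and the data shown are hypothetical values invented for illustration.

```python
from math import sqrt
from statistics import mean, stdev


def cohens_d(group_a, group_b):
    """Standardized mean difference using the pooled sample standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    if n_a < 2 or n_b < 2:
        raise ValueError("each group needs at least two observations")
    var_a, var_b = stdev(group_a) ** 2, stdev(group_b) ** 2
    pooled_sd = sqrt(((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2))
    return (mean(group_a) - mean(group_b)) / pooled_sd


# HYPOTHETICAL reductions in pain score for two arms of a trial.
treatment_change = [2, 4, 6]
control_change = [1, 3, 5]
print(cohens_d(treatment_change, control_change))  # 0.5
```

Expressing results in standard deviation units places a 100-mm VAS and a 0–10 NRS on a common metric, but the comparison is only meaningful if both instruments measure the same construct, which is the assumption noted above.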

The results of this review raise questions regarding instrument development and validation. It is frequently recommended that PRO instruments be psychometrically evaluated in the population under investigation [69, 70]. However, none of the pain measures reviewed for this study (Table 1) were developed specifically for patients with bone metastases, and none of the articles mentioned instrument validation conducted within this population. Similarly, none of the HRQL/function measures (Table 2) are specifically targeted towards this population. As a result, it is not possible to know whether these instruments are capturing the aspects of pain and its impact that are most relevant and important to patients with bone metastases. Furthermore, the measurement approaches in these studies frequently do not meet the standards for PRO development and validation set forth in the FDA guidance document [15].

A first step toward improving measurement of pain in patients with bone metastases would be to establish the content validity of frequently used instruments from the patient's perspective [71, 72]. This process would require qualitative interviews in which patients are asked about the relevance of the items to their condition, as well as the clarity and comprehensiveness of each item. These interviews could help determine whether single items, which are commonly used, are sufficient for capturing pain among these patients. Although there is support for reliability and validity of single-item pain intensity measures, pain is known to have qualities beyond the single dimension of intensity, with descriptors such as tingly, deep, sharp, or dull [31, 73]. Consequently, a thorough assessment of pain associated with bone metastases may require a multidimensional instrument assessing the range, types, frequency, duration, location, and impact of bone pain. It is possible that the pattern of a patient's ratings across all of these dimensions could influence treatment decisions.

Several PRO instruments have recently been developed specifically for use in patients with bone metastases. For example, a bone metastases module has been drafted to supplement the European Organization for Research and Treatment of Cancer (EORTC) Core Questionnaire [6]. This new module includes 22 items assessing symptoms, functional interference, and psychosocial domains. Furthermore, the FACIT system now has a questionnaire called the Functional Assessment of Cancer Therapy-Bone Pain (FACT-BP), which was recently developed to assess bone pain and its effects on quality of life. In an initial validation study, this 16-item questionnaire demonstrated good internal consistency reliability, construct validity, and sensitivity to change [74]. A related questionnaire designed to assess treatment satisfaction and convenience in this population was also examined in this validation study. None of these new condition-specific measures were used in clinical trials of bisphosphonates meeting criteria for inclusion in the current review. However, the FACT-BP is being used in clinical trials, and results will likely be published in the future. If these condition-specific measures are widely adopted, they may substantially improve outcomes assessment in future trials of treatments for bone metastases.

Even as new condition-specific measures are developed and implemented, it is likely that previously existing PRO measures of pain will continue to be used in many trials of treatments for bone metastases. There are four steps authors can take when drafting manuscripts to enhance clarity. First, we recommend explicitly stating when a measure is patient-reported rather than clinician-reported. Second, clear and accurate terminology should be used to identify and describe concepts. For example, there is a difference between the terms “PRO measure” and “quality of life (QOL) measure.” PRO implies that the measure is completed directly by patients, whereas QOL refers to the content of a measure, indicating that it was designed to capture quality of life. Despite this distinction, the literature in this field includes examples of researchers erroneously referring to any PRO measure as a QOL measure, even if the measure does not assess QOL [75]. Third, all PRO measures should be clearly named and/or described so that readers can understand exactly how pain was assessed in each study. For example, VAS and NRS measures should be clearly described, and when measures have multiple versions, the version number should be specified. Fourth, validated PRO measures should be implemented according to instructions from instrument developers, and any deviations from the instrument's intended use and scoring approach should be specified. We also recommend avoiding the use of unusual item subsets or the creation of new composite scores derived from multiple measures. These idiosyncratic approaches are not validated, and they are difficult to interpret. Together, these four steps will enhance clarity and consistency of results, while facilitating interpretation of findings and comparisons across studies.

Several limitations of the current review should be acknowledged. First, this review focused only on studies of bisphosphonate treatment because this is the most commonly administered pharmaceutical treatment for patients with bone metastases. Therefore, we cannot comment on PRO measures used to assess pain related to other treatments such as radiotherapy. Second, the literature search conducted for this review only located articles that mentioned “pain” in the abstract or title. There may be published trials of bisphosphonates that included measures of pain, but did not explicitly mention pain in the title or abstract. Such articles are not included in the current review. Third, this review did not include a thorough search for the complete psychometric validation history of each instrument. It is possible that some of these measures could have been validated in the target population, although this was never mentioned in any of the articles included in this review. Fourth, this review focused on identifying and describing measures, rather than reporting the results of each study. Therefore, we cannot comment on which measures were most likely to reflect change in patients' conditions.

Findings of the current review suggest that pain is often a key outcome of trials examining treatment for bone metastases. However, results also highlight the measurement challenges for the field as new treatments are introduced and evaluated. Future research is needed to develop instruments specifically for assessing pain in patients with bone metastases, while validating previously existing measures for use in this population. In recent years, the Initiative on Methods, Measurement, and Pain Assessment in Clinical Trials (IMMPACT) has led to some consensus among leading researchers on the optimal measurement strategies for assessing chronic pain in clinical trials [29, 30]. These efforts have provided a multidisciplinary expert consensus on recommended measures and interpretation of PRO results. The inconsistencies revealed by the current review suggest that a similar effort focusing on assessment of pain associated with bone metastases would be a helpful first step toward improving the evaluation of treatments for these patients. With improved assessment tools, it may be possible to identify treatments targeting specific types of pain experienced by patients with bone metastases.