Research report
Relationships among measures of treatment outcome in depressed patients

https://doi.org/10.1016/S0165-0327(02)00080-0

Abstract

Background: Studies attempting to identify predictors of antidepressant response in patients with major depression have reported inconsistent results. One explanation may be the different definitions of outcome used. Methods: 187 depressed subjects were recruited and randomised to treatment with fluoxetine or nortriptyline. At baseline and 6 weeks, subjects were rated on the 17- and 27-item Hamilton Depression Rating Scale (HDRS-17, HDRS-27), the Montgomery–Asberg Depression Rating Scale (MADRS) and the Clinical Global Impression Scale (CGI), and completed the Hopkins Symptom Checklist (SCL-90) and the Social Adjustment Scale (SAS). Relationships among the outcome measures were assessed. Receiver operating characteristic (ROC) curves were used to assess how well the MADRS and HDRS predicted the clinician's global rating. Results: All outcome measures were moderately to highly correlated. All measures were significantly related to the clinician's global impression, but the strongest associations were with the MADRS score. ROC analysis showed that a score of 8 on the HDRS or 14 on the MADRS was the optimal compromise between sensitivity and specificity for dividing this sample into responders and non-responders. A 60% reduction in HDRS and MADRS scores, rather than a 50% reduction, appeared the most valid division between responders and non-responders. Limitations: We relied on clinician judgement as the validating criterion. The results apply only to a sample of moderately depressed outpatients. Conclusions: The MADRS score seemed to reflect a clinician's impression of change most accurately. Dividing a sample into responders and non-responders can be approached empirically.

Introduction

The question of which antidepressant works best for an individual patient is of fundamental clinical importance. A number of investigators have attempted to identify predictors of response in major depression using variables such as depression severity, depression subtype, comorbidity, gender and age (Joyce and Paykel, 1989). So far, these studies have reported inconsistent results. For example, melancholia has been reported to be associated with a better response (Heiligenstein et al., 1994, Roose et al., 1994, Parker, 2001), a poorer response (Davidson and Turnbull, 1983) and no difference in response (Joyce et al., 1994, Zimmerman and Spitzer, 1989) to antidepressants. Comorbid panic disorder has been reported both to predict a poorer response to antidepressants (Brown et al., 1996, Grunhaus et al., 1986) and to make no difference to response (Joyce et al., 1994). Explanations for these inconsistencies have usually concentrated on sampling issues, patient characteristics and non-linear relationships between the outcome measures and the predictor variables.

Another potential explanation for the discrepancies among studies is the definition of outcome itself. Varying definitions of response will have implications for which factors are found to be predictive. A definition such as a 50% reduction in depression scores will yield a mixture of partial and complete responders, potentially obscuring relationships between possible predictors and response (Kocsis, 1990). More generally, the definition of outcome will affect both the strength and the direction of the relationship between outcome and predictor variables. An elegant study by Tedlow et al. (1998) highlighted this issue. They showed that the relationship between severity of depression and response to antidepressants depended almost entirely on the definition of outcome used. If outcome was defined as the HDRS-17 score at the end of the trial, baseline severity was strongly related to poor outcome (r=0.41, P<0.0001). If outcome was defined as the change in HDRS-17 score from baseline to endpoint, there was no relationship (r=0.07, P=0.26). If outcome was defined as the percentage reduction in HDRS-17 score from baseline, the direction reversed: greater baseline severity predicted a modestly better response (r=−0.15, P=0.02).

A second major problem is the lack of an accepted, validated measure of response. There is a large number of scales claiming to measure depression, with over 30 in English alone (Snaith, 1993). The current solution to this dilemma over validity is an ad hoc one. The HDRS appeared in 1960 (Hamilton, 1960) and was rapidly incorporated into research practice. It has become the premier measure of outcome: a recent review noted that the HDRS was the scale on which conclusions were based in 66% of research reports (Snaith, 1996). One could argue that this consistency is useful for comparing results across studies and for meta-analyses. However, there are two significant difficulties. The first concerns how the HDRS is applied. It exists in several versions, although the 17-item version is the one usually used, and, more concerning, definitions of outcome differ: studies use cut-off scores ranging from 5 to 8 (Hedlund and Vieweg, 1979), or a 50–60% reduction in score, to define response. The more restrictive criteria will clearly identify a different group of patients from the less restrictive ones, yet the choice of criterion appears to be arbitrary, set by convention within different research groups.
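How much the choice of criterion changes who counts as a responder can be seen in a minimal sketch. It applies two endpoint cut-offs (5 and 8, the ends of the range quoted above) and two percentage-reduction thresholds (50% and 60%) to the same six patients. The patients, their scores and the helper names `by_endpoint_cutoff` and `by_percent_reduction` are hypothetical, chosen only for illustration.

```python
# Hypothetical (baseline, week-6) HDRS-17 scores; not data from any study.
PATIENTS = {
    "A": (20, 5),
    "B": (24, 11),
    "C": (18, 8),
    "D": (30, 11),
    "E": (16, 7),
    "F": (22, 14),
}


def by_endpoint_cutoff(patients, max_score):
    """Responder if the week-6 score is at or below max_score."""
    return {pid for pid, (_, end) in patients.items() if end <= max_score}


def by_percent_reduction(patients, min_percent):
    """Responder if the score fell by at least min_percent of baseline.

    Integer cross-multiplication avoids floating-point rounding when a
    patient sits exactly on the threshold.
    """
    return {
        pid
        for pid, (base, end) in patients.items()
        if (base - end) * 100 >= min_percent * base
    }


criteria = {
    "endpoint <= 5": by_endpoint_cutoff(PATIENTS, 5),
    "endpoint <= 8": by_endpoint_cutoff(PATIENTS, 8),
    ">= 50% reduction": by_percent_reduction(PATIENTS, 50),
    ">= 60% reduction": by_percent_reduction(PATIENTS, 60),
}

for name, responders in criteria.items():
    print(f"{name:>16}: {sorted(responders)}")
```

In this toy sample, patient D is a responder under both percentage criteria but under neither cut-off, while C and E meet the 8-point cut-off and the 50% criterion but not the 60% one.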

The second, and possibly more serious, difficulty is whether the HDRS-17 is an appropriate instrument for measuring outcome in depression studies. Most investigators who have studied the psychometric properties of the HDRS have called its integrity into question (Bech and Coppen, 1990). A comprehensive review by Gibbons et al. (1993) argued that HDRS scores do not measure a single, unidimensional index of global depression but five distinct factors, each comprising different symptom items. They concluded that the HDRS score is a weak index of the severity of the depressive syndrome and that its clinical value is unclear, and they questioned why most investigators continued to use it despite consistent criticism (Gibbons et al., 1993).

At least two studies have highlighted the problem of results differing according to the outcome measure used. Lonnqvist et al. (1994) reported a drug trial showing significant improvement on the MADRS but no significant improvement on the HDRS. Poirier and Boyer (1999) claimed that venlafaxine may be superior to paroxetine in treatment-resistant depression: there was a significant difference in remission rate (HDRS score less than 10) and response rate (at least 50% improvement in HDRS score), but not in change in HDRS score or on the Clinical Global Impression Scale (CGI). Were the patients better or not? Both studies mentioned these discrepancies and then focused on the outcome measure that highlighted the positive result, although they at least acknowledged the differing outcomes and commented on them.

This paper attempts to determine which outcome measure may be most appropriate in a group of depressed outpatients recruited for a study of predictors of response to antidepressant treatment. We included a number of possible outcome measures: the clinician-rated HDRS-17, HDRS-27, MADRS and CGI, and the self-report Hopkins Symptom Checklist (SCL-90) and Social Adjustment Scale (SAS). There are three steps. The first is to explore the relationships among the various putative outcome measures. The second is to relate these measures to the clinician's global impression of improvement; we felt that an experienced clinician's global judgement of whether a patient had responded was as valid as any other criterion against which depression scales could be judged. The third is to explore ways of dividing the group into responders and non-responders, and whether this can be done empirically rather than simply by following precedent.
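The third step, choosing a cut-off empirically against the clinician's judgement, can be sketched as a receiver operating characteristic (ROC) analysis: every observed endpoint score is tried as a cut-off, and sensitivity and specificity are computed with the clinician's yes/no rating as the reference standard. Taking the "optimal compromise" to be the cut-off that maximises Youden's J (sensitivity + specificity − 1) is an assumption of this sketch; other criteria, such as the point closest to the top-left corner of the ROC plot, are also common. The scores and judgements are invented, and `roc_points` and `best_cutoff` are hypothetical helper names.

```python
from dataclasses import dataclass


@dataclass
class CutoffResult:
    cutoff: int
    sensitivity: float
    specificity: float

    @property
    def youden(self) -> float:
        # Youden's J statistic: sensitivity + specificity - 1
        return self.sensitivity + self.specificity - 1.0


def roc_points(scores, clinician_responder):
    """Evaluate every observed endpoint score as a candidate cut-off.

    The scale classes a patient as a responder when score <= cutoff,
    because lower endpoint scores mean fewer symptoms. The clinician's
    yes/no judgement is treated as the reference standard.
    """
    pairs = list(zip(scores, clinician_responder))
    positives = sum(1 for _, responded in pairs if responded)
    negatives = len(pairs) - positives
    results = []
    for cutoff in sorted(set(scores)):
        true_pos = sum(1 for s, responded in pairs if s <= cutoff and responded)
        true_neg = sum(1 for s, responded in pairs if s > cutoff and not responded)
        results.append(CutoffResult(cutoff, true_pos / positives, true_neg / negatives))
    return results


def best_cutoff(scores, clinician_responder):
    """Cut-off with the largest Youden's J (the lowest one if several tie)."""
    return max(roc_points(scores, clinician_responder), key=lambda r: r.youden)


# Hypothetical week-6 scale scores and clinician judgements (not study data).
scores = [2, 4, 5, 6, 9, 10, 12, 13, 15, 19]
clinician_responder = [True, True, True, True, True, False, True, False, False, False]

best = best_cutoff(scores, clinician_responder)
print(f"best cut-off: {best.cutoff}, J = {best.youden:.3f}")
```

On these toy data the best cut-off is 9, which correctly classifies all four non-responders and five of the six clinician-rated responders. The same function could be run on percentage-reduction values instead of endpoint scores, with the comparison direction flipped, to compare thresholds such as 50% and 60%.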

Subjects

The depressed patients for this study were recruited as part of the Christchurch Outcome of Depression Study. For inclusion in this study, the current principal diagnosis needed to be major depression, for which the treating clinician considered that treatment with an antidepressant drug was appropriate. Apart from the oral contraceptive and an occasional hypnotic, patients were required to be free of all psychotropic drugs for a minimum of 2 weeks. Patients were excluded if they had a history

Characteristics of sample

A total of 195 subjects (111 females and 84 males) completed the 6-week treatment trial. The mean age was 31.9 years (±11.2). The mean scores of various depression measures at baseline and 6 weeks are shown in Table 1.

Relationship among the outcome measures

Table 1 also shows the mean scores at baseline and 6 weeks and the Pearson correlations at 6 weeks. All the outcome measures are moderately to highly correlated. The strongest correlations are between the various clinician-administered measures. The high correlation between the

Discussion

The results from this study suggest that the decision on what measures should be used when assessing treatment response in major depression is more than a matter of convention and precedent. Although all the outcome measures chosen were significantly correlated, some appear to more accurately reflect a clinician’s judgement of improvement. Using the clinician’s judgement also allowed us to divide the subjects into responders and non-responders by selecting the score and percentage improvement

References (25)

  • L. Grunhaus et al. Simultaneous panic and depressive disorder: response to antidepressant treatments. J. Clin. Psychiatry (1986).
  • M. Hamilton. A rating scale for depression. J. Neurol. Neurosurg. Psychiatry (1960).