Depressive Response Sets due to gender and culture-based Differential Item Functioning

https://doi.org/10.1016/S0191-8869(01)00203-3

Abstract

Two studies tested a “strong” version of Nolen-Hoeksema's [Nolen-Hoeksema, S. (1987). Sex differences in unipolar depression: evidence and theory. Psychological Bulletin, 101, 259–282.] hypothesis of depressive response sets in samples of Australian respondents (Study I, n=1111) and US respondents (Study II, n=300). Both studies used a Rasch version of Thalbourne's Manic-Depressiveness Scale (MDS), whose contents are consistent with atypical depression (i.e. depressive episodes with hypomanic symptoms). As predicted, tests for differential item functioning in both studies revealed that women are more likely to worry about “being poor” than equally depressive men (P<0.05), thus ruling out the alternative hypothesis that depressive response sets are simply a byproduct of more frequent or stronger depression in women. Australian women, Australian men, and US women used the MDS items in a similar fashion, whereas equally depressed US men seriously underreported their symptoms. Yet, using top-down purification to derive an unbiased baseline set of items, a “split” 12-item Rasch measure (R-MDS) could be developed that is not affected by differential test functioning due to gender or cultural differences. Comparison of the R-MDS measure and the original MDS scores revealed that while women are more depressive than men by 0.4 SD, the absence of gender bias in the R-MDS decreased this effect by about 20% (0.08 SD). Moreover, gender bias interacted with culture, suggesting that comparisons of depression levels of diverse groups within the same culture may be biased as well. Raw score to R-MDS conversion tables are included.

Introduction

The available literature agrees that women are almost twice as likely as men to report protracted sadness, apathy, low self-esteem and other symptoms that are indicative of depression. Yet there is still no consensus on how men and women differ in their expression of depressive symptoms (Nolen-Hoeksema, 1987, 1994; Santor & Ramsay, 1998). For instance, based on factor analytic techniques, Williamson (1998) identified social withdrawal, indecisiveness, and irritability as characteristic of depressed men. Wilhelm and Hadzi-Pavlovic (1997) suggest that women are more likely to report anxiety, but this hypothesis disagrees with Breslau, Schultz, and Peterson's (1996) finding that anxiety disorders predict the occurrence of depressive disorders for men and women alike. Given this conflicting picture, it is understandable that “negative symptoms” (i.e. symptoms of depression) are often difficult to identify in psychiatric practice (Greden & Tandon, 1991; Muller & Davids, 1999). This problem is exacerbated by the fact that depressive symptoms may vary with the nature of patients' psychiatric illnesses (Gibbons, Clark, Cavanaugh, & Davis, 1985).

A coherent framework to address the expression of depressive symptoms is provided by Nolen-Hoeksema's (1987) “response set” explanation, according to which women tend to show passive ruminating responses, whereas men actively use distraction to cut off depression before it ramifies. The response set explanation has received consistent empirical support in descriptive research (e.g. Butler & Nolen-Hoeksema, 1994) as well as in laboratory research (e.g. Lyubomirsky & Nolen-Hoeksema, 1993). The detrimental effects of negative rumination are further demonstrated in an experiment showing that the level of depression in mildly-to-moderately depressed subjects increased when they were induced to ruminate, while depression decreased when distractive behaviors were induced (Nolen-Hoeksema & Morrow, 1994). Despite such findings, the possibility remains that in daily life similar levels of depression naturally produce similar negative ruminative behaviors in men and women. Thus, the greater rumination exhibited by women might be a direct consequence of their more frequent depression, thereby implying that gender differences merely serve to moderate the expression of depressive response sets.

This paper tests the “strong” response set hypothesis according to which women express depression differently than equally depressed men. Although Nolen-Hoeksema's theorizing clearly implies the existence of qualitative gender differences, the response set hypothesis has not been tested in this form. We attribute this to the fact that the stronger version poses technical problems that are difficult to solve within the framework of classical test theory (Thissen, Steinberg, & Gerrard, 1986). In particular, a rigorous test of the strong response set hypothesis requires that depression first be assessed in a gender-neutral fashion. No bias tests were performed in previous research on depressive symptoms (Butler & Nolen-Hoeksema, 1994; Christensen et al., 1999; Lyubomirsky & Nolen-Hoeksema, 1993; Nolen-Hoeksema, 1994; Nolen-Hoeksema & Morrow, 1994), and it is therefore not clear whether the requirement of gender-neutral measurement was met. The following describes how the strong version of Nolen-Hoeksema's basic hypothesis can be tested by removing gender bias within a Rasch (1960) scaling approach.

Most applications of Rasch scaling, as well as related Item Response Theory methods, in clinical and personality testing (e.g. Birenbaum, 1986; Carter & Wilkinson, 1984; Cooke & Michie, 1997; Gibbons et al., 1985; Lange & Houran, 1999) are primarily intended to obtain unidimensional scales with interval level measurement properties. Unfortunately, as pointed out by Thissen et al. (1986) and Santor and Ramsay (1998), one of the most useful properties of such methods, namely, their ability to detect and quantify a particular kind of item bias called Differential Item Functioning (DIF), is often ignored. DIF violates the assumption of local independence, which requires that items' measurement properties should not be affected by external variables such as gender or culture. As such, DIF is evidenced by the fact that equally depressed men and women respond systematically differently to the same item. Although formal methods to determine the presence of differential item functioning are discussed in a later section, we point out that DIF is also often accompanied by psychometric anomalies (Tanzer, 1996) and that it may produce “phantom” (i.e. actually non-existing) factors in factor analysis (Lange, Irwin, & Houran, 2000).
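As a sketch of what DIF means under the dichotomous Rasch model, the idea can be written out as follows. The notation is introduced here for illustration and is not taken from the article: \theta_n is the trait level of respondent n, \delta_i is the location (difficulty) of item i, and the superscripts F and M denote locations calibrated for women and men.

```latex
% Dichotomous Rasch model: probability that respondent n endorses item i
P(X_{ni} = 1 \mid \theta_n) = \frac{\exp(\theta_n - \delta_i)}{1 + \exp(\theta_n - \delta_i)}

% No DIF: the item location is the same in both groups
\delta_i^{(F)} = \delta_i^{(M)}

% DIF: respondents with equal \theta but different group membership
% have different endorsement probabilities for item i
\Delta_i = \delta_i^{(M)} - \delta_i^{(F)} \neq 0
```

Under this convention, \Delta_i > 0 would mean that item i is easier to endorse for women than for men at the same trait level, which is the pattern the strong response set hypothesis predicts for a rumination item.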

Because Nolen-Hoeksema's (1987) response set hypothesis entails that women are more ruminative than equally depressed men, it follows that measures of depression and rumination should show DIF (cf. Santor & Ramsay, 1998). There is indirect evidence however that gender is not the only source of DIF in depression related measures and that cultural factors may play a role as well. For instance, Thompson, Kaslow, Weiss, and Nolen-Hoeksema (1998) administered a revised version of the Children's Attributional Style Questionnaire (CASQ) to large samples of African American and Caucasian American girls and boys. Suggestive of cultural DIF, the psychometric properties of the CASQ varied by race since the Caucasian data showed higher internal consistency. To address issues related to cultural DIF we compare samples of North American and Australian respondents. Differential item and test functioning in depression due to age have recently been reported as well (Christensen et al., 1999). We judged, however, that our samples contained too few older respondents to address this factor in a satisfactory manner.

The strong response set hypothesis will be tested in two studies. To address the possible effects of cultural DIF, Study I uses Australian respondents, while Study II is based on North American respondents. Both studies combine the data sets of projects in which Thalbourne's Manic-Depressiveness Scale (MDS) was administered for purposes not related to the present research. The MDS addresses history of manic symptoms as well as history of depression (Thalbourne, Delin, & Bassett, 1994). While relatively new, the MDS has profitably been used in clinical research (for a review see Thalbourne, Keough, & Crawley, 1999) and two preliminary validation studies have been reported (Lester, 1999; Thalbourne & Bassett, 1998). The MDS consists of 18 True/False type questions which were largely derived from the clinical criteria listed in the DSM-III and DSM-III-R. Table 1 lists twelve of the MDS questions in abbreviated form. The remaining questions are listed in the Method section to Study I.

Note that Item 1 (“worried about being poor”) corresponds most closely to Nolen-Hoeksema's notion of negative rumination, and we predict therefore that women are more likely to endorse this item than equally manic-depressive men. A test of this hypothesis requires that the remaining MDS items contain no gender DIF as such DIF may combine to produce bias at the test level, thereby making it impossible to identify equally manic-depressive respondents. Bias at the test level is customarily referred to as Differential Test Functioning (DTF). When tests consist of many nearly equivalent items (e.g. as in educational testing), DTF can be removed simply by discarding the items showing DIF. However, this approach is not feasible for shorter instruments like the MDS as too few items would remain. For this reason we use Rasch (1960) scaling as this allows DTF to be neutralized by equating differentially functioning items across genders or cultures based on an unbiased, i.e. DIF-free, baseline set of items. Where the context allows, we use the terms DIF and DTF as synonymous with bias.
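The idea of equating differentially functioning items against a DIF-free baseline can be illustrated with a minimal data-preparation sketch. It assumes that "splitting" an item means replacing it with one group-specific copy per group, where each respondent answers only the copy for their own group and the other copy is missing by design; the unsplit items then act as the common baseline when the data are calibrated with Rasch software. The function name `split_item`, the item names, and the toy responses are hypothetical and are not taken from the article.

```python
from typing import Dict, List, Optional

Response = Optional[int]  # 1 = True, 0 = False, None = missing by design


def split_item(
    responses: List[Dict[str, int]],
    groups: List[str],
    item: str,
) -> List[Dict[str, Response]]:
    """Replace `item` with one group-specific copy per group.

    Each respondent keeps their answer only on the copy belonging to their
    own group; the copies for other groups are set to None.
    """
    if len(responses) != len(groups):
        raise ValueError("responses and groups must have the same length")
    labels = sorted(set(groups))
    out: List[Dict[str, Response]] = []
    for row, group in zip(responses, groups):
        # Keep all unsplit (baseline) items unchanged.
        new_row: Dict[str, Response] = {k: v for k, v in row.items() if k != item}
        # Add one copy of the split item per group.
        for label in labels:
            new_row[f"{item}_{label}"] = row[item] if group == label else None
        out.append(new_row)
    return out


# Hypothetical toy data, invented purely for illustration.
data = [
    {"item1": 1, "item2": 0, "item3": 1},
    {"item1": 0, "item2": 0, "item3": 1},
    {"item1": 1, "item2": 1, "item3": 0},
]
gender = ["F", "M", "F"]

split = split_item(data, gender, "item1")
for row in split:
    print(row)
```

In this layout the two copies of the split item receive separate location estimates, while all respondents remain on a common scale through the shared baseline items. The estimation itself is not shown here and would be left to dedicated Rasch software.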

Section snippets

Participants

We combined the results of eight different research projects conducted at Adelaide University, Australia, into a single data set consisting of 742 women and 369 men. The average age of these 1111 individuals was 27.7 years (SD=8.51, Median=22, range=17–74 years). Most respondents were undergraduate students who volunteered their participation. We estimate that at most 1% of the respondents were non-Australian.

Materials

The Manic-Depressiveness Scale (MDS) consists of 18 True/False items, twelve of which

Study II

To replicate our basic findings, while simultaneously addressing the possible effects of cultural differences on the expression of depressive symptoms, we also analyzed a data set of responses obtained from volunteer student participants at Stockton College, USA. Because the data had originally been gathered for different purposes, it was not possible to screen out non-US respondents and the sample probably contained a small number of foreign students as well. Again, the basic hypothesis is

Conclusions

Consistent with Lester's (1999) findings, six of the manic items of the MDS did not survive the top-down item purification process due to severe item misfit and low internal consistency. In addition, several of the twelve remaining items showed statistically significant differential item functioning. However, an unbiased baseline set of items could be identified and this allowed the “splitting” of the remaining items to arrive at a Rasch scale that is neutral with respect to gender and culture. This

Acknowledgements

We would like to thank Ben Wright and Anne Sustik, as well as the attendees of the Midwestern Objective Measurement Seminar, held at the MESA Institute, University of Chicago, June 4 1999, for their valuable suggestions concerning this research.

References (51)

  • L.D. Butler et al. (1994). Gender differences in responses to depressed mood in a college sample. Sex Roles.
  • J.E. Carter et al. (1984). A latent trait analysis of the MMPI. Multivariate Behavior Research Monographs.
  • H. Christensen et al. (1999). Age differences in depression and anxiety symptoms: a structural equation modelling analysis of data from a general population sample. Psychological Medicine.
  • B.E. Clauser et al. (1998). Using statistical procedures to identify differentially functioning test items. Educational Measurement: Issues and Practice.
  • D.J. Cooke et al. (1997). An item response theory analysis of the Hare Psychopathy Checklist-Revised. Psychological Assessment.
  • S.E. Embretson (1996). The new rules of measurement. Psychological Assessment.
  • J.F. Greden et al. (1991). Negative schizophrenic symptoms: pathophysiology and clinical implications.
  • J. Hattie (1985). Methodology review: assessing unidimensionality of tests and items. Applied Psychological Measurement.
  • P.C. Kendall et al. (1987). Issues and recommendations regarding the use of the Beck Depression Inventory. Cognitive Therapy and Research.
  • D. Lester (1999). Comment on “Manic-depressiveness and its correlates”. Psychological Reports.
  • J.M. Linacre et al. (1998). A user's guide to Winsteps, Bigsteps, Ministep Rasch-Model computer programs.
  • S. Lyubomirsky et al. (1993). Self-perpetuating properties of dysphoric rumination. Journal of Personality and Social Psychology.
  • R.J. Mislevy et al. (1990). BILOG 3: item analysis and test scoring with binary logistic models.
  • R. Nandakumar et al. (1993). Refinements of Stout's procedure for assessing latent trait unidimensionality. Journal of Educational Statistics.