Depressive Response Sets Due to Gender- and Culture-Based Differential Item Functioning
Introduction
The available literature agrees that women are almost twice as likely as men to report protracted sadness, apathy, low self-esteem and other symptoms indicative of depression. Yet there is still less than perfect agreement on how men and women differ in their expression of depressive symptoms (Nolen-Hoeksema, 1987, Nolen-Hoeksema, 1994, Santor and Ramsay, 1998). For instance, using factor analytic techniques, Williamson (1998) identified social withdrawal, indecisiveness, and irritability as characteristic of depressed men. Wilhelm and Hadzi-Pavlovic (1997) suggest that women are more likely to report anxiety, but this hypothesis is at odds with Breslau, Schultz, and Peterson's (1996) finding that anxiety disorders predict the occurrence of depressive disorders in men and women alike. Given this conflicting picture, it is understandable that “negative symptoms” (i.e., symptoms of depression) are often difficult to identify in psychiatric practice (Greden and Tandon, 1991, Muller and Davids, 1999). The problem is exacerbated by the fact that depressive symptoms may vary with the nature of a patient's psychiatric illness (Gibbons, Clark, Cavanaugh, & Davis, 1985).
A coherent framework for the expression of depressive symptoms is provided by Nolen-Hoeksema's (1987) “response set” explanation, according to which women tend to show passive, ruminative responses, whereas men actively use distraction to cut off depression before it ramifies. The response set explanation has received consistent empirical support in descriptive research (e.g. Butler & Nolen-Hoeksema, 1994) as well as in laboratory research (e.g. Lyubomirsky & Nolen-Hoeksema, 1993). The detrimental effects of negative rumination are further demonstrated by an experiment in which the level of depression in mildly to moderately depressed subjects increased when they were induced to ruminate, but decreased when they were induced to engage in distracting behaviors (Nolen-Hoeksema & Morrow, 1994). Despite such findings, the possibility remains that in daily life similar levels of depression naturally produce similar negative ruminative behaviors in men and women. If so, the greater rumination exhibited by women might be a direct consequence of their more frequent depression, implying that gender differences merely moderate the expression of depressive response sets.
This paper tests the “strong” response set hypothesis, according to which women express depression differently from equally depressed men. Although Nolen-Hoeksema's theorizing clearly implies the existence of such qualitative gender differences, the response set hypothesis has not been tested in this form. We attribute this to the fact that the strong version poses technical problems that are difficult to solve within the framework of classical test theory (Thissen, Steinberg, & Gerrard, 1986). In particular, a rigorous test of the strong response set hypothesis requires that depression first be assessed in a gender-neutral fashion. No bias tests were performed in previous research on depressive symptoms (Butler and Nolen-Hoeksema, 1994, Christensen et al., 1999, Lyubomirsky and Nolen-Hoeksema, 1993, Nolen-Hoeksema, 1994, Nolen-Hoeksema and Morrow, 1994), so it is unclear whether the requirement of gender-neutral measurement was met. The following describes how the strong version of Nolen-Hoeksema's basic hypothesis can be tested by removing gender bias within a Rasch (1960) scaling approach.
Most applications of Rasch scaling and related Item Response Theory methods in clinical and personality testing (e.g. Birenbaum, 1986, Carter and Wilkinson, 1984, Cooke and Michie, 1997, Gibbons et al., 1985, Lange and Houran, 1999) are primarily intended to obtain unidimensional scales with interval-level measurement properties. Unfortunately, as pointed out by Thissen et al. (1986) and Santor and Ramsay (1998), one of the most useful properties of such methods, namely their ability to detect and quantify a particular kind of item bias called Differential Item Functioning (DIF), is often ignored. DIF violates the assumption of local independence, which requires that items' measurement properties not be affected by external variables such as gender or culture. In the present context, DIF is evidenced when equally depressed men and women respond systematically differently to the same item. Although formal methods for detecting differential item functioning are discussed in a later section, we note that DIF is often accompanied by psychometric anomalies (Tanzer, 1996) and that it may produce “phantom” (i.e., actually non-existent) factors in factor analysis (Lange, Irwin, & Houran, 2000).
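To make the definition concrete: for dichotomous (True/False) items such as those of the MDS, the Rasch model gives the probability of a True response as a logistic function of the difference between a respondent's trait level θ and the item's difficulty b, so two equally depressed respondents should have the same endorsement probability. DIF means that b differs between groups. The minimal Python sketch below assumes hypothetical θ and b values (they are not estimates from the MDS data), and the helper names `rasch_p` and `logit` are illustrative. It shows that, under the Rasch model, the log-odds gap between an equally depressed man and woman equals the gap in item difficulty, which is one way DIF can be quantified in logits.

```python
import math


def rasch_p(theta: float, b: float) -> float:
    """Rasch probability of a True response for a person at trait level
    theta on an item of difficulty b (both in logits)."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def logit(p: float) -> float:
    """Log-odds of a probability."""
    return math.log(p / (1.0 - p))


# Hypothetical values for illustration only (not MDS estimates).
theta = 0.5     # a man and a woman who are equally depressed
b_men = 1.0     # item difficulty when calibrated on men
b_women = 0.2   # the same item is easier to endorse for women

p_men = rasch_p(theta, b_men)
p_women = rasch_p(theta, b_women)
dif_contrast = b_men - b_women  # size of the DIF in logits

print(f"P(True | man)   = {p_men:.3f}")
print(f"P(True | woman) = {p_women:.3f}")
# Under the Rasch model, the log-odds gap at equal theta equals the difficulty gap.
print(f"log-odds gap = {logit(p_women) - logit(p_men):.2f}, "
      f"DIF contrast = {dif_contrast:.2f}")
```

Without DIF (b_men equal to b_women) both probabilities would coincide, since θ is the same for both respondents.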
Because Nolen-Hoeksema's (1987) response set hypothesis entails that women are more ruminative than equally depressed men, it follows that measures of depression and rumination should show DIF (cf. Santor & Ramsay, 1998). There is indirect evidence, however, that gender is not the only source of DIF in depression-related measures and that cultural factors may play a role as well. For instance, Thompson, Kaslow, Weiss, and Nolen-Hoeksema (1998) administered a revised version of the Children's Attributional Style Questionnaire (CASQ) to large samples of African American and Caucasian American girls and boys. Suggestive of cultural DIF, the psychometric properties of the CASQ varied by race, with the Caucasian data showing higher internal consistency. To address issues related to cultural DIF, we compare samples of North American and Australian respondents. Differential item and test functioning in depression due to age has recently been reported as well (Christensen et al., 1999). However, we judged that our samples contained too few older respondents to address this factor satisfactorily.
The strong response set hypothesis will be tested in two studies. To address the possible effects of cultural DIF, Study I uses Australian respondents, while Study II is based on North American respondents. Both studies combine data sets from projects in which Thalbourne's Manic-Depressiveness Scale (MDS) was administered for purposes unrelated to the present research. The MDS addresses history of manic symptoms as well as history of depression (Thalbourne, Delin, & Bassett, 1994). Although relatively new, the MDS has been used profitably in clinical research (for a review see Thalbourne, Keough, & Crawley, 1999), and two preliminary validation studies have been reported (Lester, 1999, Thalbourne and Bassett, 1998). The MDS consists of 18 True/False questions, largely derived from the clinical criteria listed in DSM-III and DSM-III-R. Table 1 lists twelve of the MDS questions in abbreviated form. The remaining questions are listed in the Method section of Study I.
Note that Item 1 (“worried about being poor”) corresponds most closely to Nolen-Hoeksema's notion of negative rumination, and we therefore predict that women are more likely to endorse this item than equally manic-depressive men. A test of this hypothesis requires that the remaining MDS items contain no gender DIF, because such DIF may combine to produce bias at the test level, making it impossible to identify equally manic-depressive respondents. Bias at the test level is customarily referred to as Differential Test Functioning (DTF). When tests consist of many nearly equivalent items (e.g. in educational testing), DTF can be removed simply by discarding the items showing DIF. However, this approach is not feasible for shorter instruments like the MDS, as too few items would remain. For this reason we use Rasch (1960) scaling, which allows DTF to be neutralized by equating differentially functioning items across genders or cultures on the basis of an unbiased, i.e. DIF-free, baseline set of items. Where the context allows, we use the terms DIF and DTF as synonymous with bias.
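The two ideas in this paragraph can be sketched in Python under stated assumptions. First, DTF is taken here as the difference in expected raw score (the sum of Rasch probabilities across items) between groups at the same trait level; item-level DIF running in the same direction accumulates into such a test-level gap. Second, equating DIF items against a baseline set is sketched as item splitting: each DIF item is replaced by one column per group, while the DIF-free baseline items keep a shared column that links the groups to a common scale. All difficulties and responses are hypothetical toy values, and `expected_score` and `split_dif_items` are illustrative helpers, not the authors' code. The split data would still have to be calibrated with Rasch software that treats the `None` cells as missing, which is not shown.

```python
import math
from typing import List, Optional, Sequence, Tuple


def rasch_p(theta: float, b: float) -> float:
    """Rasch probability of a True response at trait level theta, item difficulty b."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def expected_score(theta: float, difficulties: Sequence[float]) -> float:
    """Expected raw score (number of True answers) at trait level theta."""
    return sum(rasch_p(theta, b) for b in difficulties)


def split_dif_items(
    responses: Sequence[Sequence[int]],
    groups: Sequence[str],
    item_names: Sequence[str],
    dif_items: Sequence[str],
) -> Tuple[List[str], List[List[Optional[int]]]]:
    """Give each DIF item a separate column per group (None = not applicable).

    Items outside dif_items form the baseline set and keep one shared column,
    which is what links the groups to a common scale.
    """
    levels = sorted(set(groups))
    names: List[str] = []
    for name in item_names:
        if name in dif_items:
            names.extend(f"{name}_{g}" for g in levels)
        else:
            names.append(name)
    rows: List[List[Optional[int]]] = []
    for row, g in zip(responses, groups):
        new_row: List[Optional[int]] = []
        for name, x in zip(item_names, row):
            if name in dif_items:
                # Response goes in this person's group column; other groups get None.
                new_row.extend(x if g == level else None for level in levels)
            else:
                new_row.append(x)
        rows.append(new_row)
    return names, rows


# Hypothetical difficulties (logits) for a 4-item scale; items 3 and 4 show DIF.
b_men = [-1.0, 0.0, 0.5, 1.0]
b_women = [-1.0, 0.0, 0.0, 0.5]
for theta in (-1.0, 0.0, 1.0):
    dtf = expected_score(theta, b_women) - expected_score(theta, b_men)
    print(f"theta={theta:+.1f}: DTF (women - men) = {dtf:.3f} points")

# Toy True/False data: split the DIF items, keep items 1-2 as the shared baseline.
names, rows = split_dif_items(
    responses=[[1, 0, 1, 1], [0, 1, 0, 1]],
    groups=["F", "M"],
    item_names=["i1", "i2", "i3", "i4"],
    dif_items=["i3", "i4"],
)
print(names)  # ['i1', 'i2', 'i3_F', 'i3_M', 'i4_F', 'i4_M']
print(rows)   # [[1, 0, 1, None, 1, None], [0, 1, None, 0, None, 1]]
```

Because baseline items 1 and 2 have identical difficulties in both groups, they contribute nothing to the DTF in this example, which is why they can serve as the anchor for the split items.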
Section snippets
Participants
We combined the results of eight different research projects conducted at Adelaide University, Australia, into a single data set consisting of 742 women and 369 men. The average age of these 1111 individuals was 27.7 years (SD=8.51, Median=22, range=17–74 years). Most respondents were undergraduate students who volunteered their participation. We estimate that at most 1% of the respondents were non-Australian.
Materials
The Manic-Depressiveness Scale (MDS) consists of 18 True/False items, twelve of which
Study II
To replicate our basic findings, while simultaneously addressing the possible effects of cultural differences on the expression of depressive symptoms, we also analyzed a data set of responses obtained from volunteer student participants at Stockton College, USA. Because the data had originally been gathered for other purposes, it was not possible to screen out non-US respondents, and the sample probably contained a small number of foreign students as well. Again, the basic hypothesis is
Conclusions
Consistent with Lester's (1999) findings, six of the manic items of the MDS did not survive the top-down item purification process because of severe item misfit and low internal consistency. In addition, several of the twelve remaining items showed statistically significant differential item functioning. However, an unbiased baseline set of items could be identified, and this allowed the remaining items to be “split” so as to arrive at a Rasch scale that is neutral with respect to gender and culture. This
Acknowledgements
We would like to thank Ben Wright and Anne Sustik, as well as the attendees of the Midwestern Objective Measurement Seminar, held at the MESA Institute, University of Chicago, on June 4, 1999, for their valuable suggestions concerning this research.
References (51)
Atypical depression with hypomanic symptoms. Journal of Affective Disorders (2001).
The responsiveness of the Hamilton Depression Rating Scale. Journal of Psychiatric Research (2000).
Application of modern psychometric theory in psychiatric research. Journal of Psychiatric Research (1985).
Scaling MacDonald's AT-20 using item-response theory. Personality and Individual Differences (1999).
Top-down purification of Tobacyk's Revised Paranormal Belief Scale. Personality and Individual Differences (2000).
The structure and stability of the functional independence measure. Archives of Physical Medicine and Rehabilitation (1994).
Diagnostic and statistical manual of mental disorders (1994).
Effect of dissimulation motives and anxiety on response pattern appropriateness. Applied Psychological Measurement (1986).
Comments on the O'Neil and McPeek paper.
Sex differences in depression: a role for preexisting anxiety. Psychiatry Research (1996).
Gender differences in responses to depressed mood in a college sample. Sex Roles.
A latent trait analysis of the MMPI. Multivariate Behavioral Research Monographs.
Age differences in depression and anxiety symptoms: a structural equation modelling analysis of data from a general population sample. Psychological Medicine.
Using statistical procedures to identify differentially functioning test items. Educational Measurement: Issues and Practice.
An item response theory analysis of the Hare Psychopathy Checklist—Revised. Psychological Assessment.
The new rules of measurement. Psychological Assessment.
Negative schizophrenic symptoms: pathophysiology and clinical implications.
Methodology review: assessing unidimensionality of tests and items. Applied Psychological Measurement.
Issues and recommendations regarding the use of the Beck Depression Inventory. Cognitive Therapy and Research.
Comment on “Manic-depressiveness and its correlates”. Psychological Reports.
A user's guide to Winsteps, Bigsteps, Ministep Rasch-Model computer programs.
Self-perpetuating properties of dysphoric rumination. Journal of Personality and Social Psychology.
BILOG 3: item analysis and test scoring with binary logistic models.
Relationship of psychiatric experience and interrater reliability in assessment of negative symptoms. Journal of Nervous and Mental Disease.
Refinements of Stout's procedure for assessing latent trait unidimensionality. Journal of Educational Statistics.
Cited by (37)
Gender-based differential item function for the difficulties in emotion regulation scale. Personality and Individual Differences (2016). Citation excerpt: When a measure contains DIF, it can cause problems, both for the validity of the measurement and in the interpretation of results that employ the measurement (Clauser & Mazor, 1998). Recent attention has been given to potential gender-related DIF in measurement relevant to emotional experiences, such as the Anxiety Sensitivity Index (Van Dam, Earleywine, & Forsyth, 2009), the Center for Epidemiological Studies, Depression Scale (Cole, Kawachi, Maller, & Berkman, 2000; Covic, Pallant, Conaghan, & Tennant, 2007), the Thalbourne Manic-Depressiveness Scale (Lange, Thalbourne, Houran, & Lester, 2002) and the BPD criteria (Sharp et al., 2014). Additionally, other work has examined gender-based DIF in other psychological measurement, including the Brief Fear of Negative Evaluation Scale (Harpole et al., 2014), and the Multidimensional Personality Questionnaire Stress Reaction Scale (Smith & Reise, 1998).
Performance of the 6-item Kessler scale for measuring serious mental illness in Hong Kong. Comprehensive Psychiatry (2012). Citation excerpt: Further analysis of the severity profile of each item endorsed by our respondents, such as through IRT analysis, should shed light on whether these findings are related to the differential sensitivity of item category across respondents who varied in symptom severity [51]. Thus, sex bias has been found in the reporting of functional disability [52] and depressive symptom severity [53]. It is also possible that respondents with more severe SMI would endorse more severe symptom categories differently from those with milder symptoms.
Evaluation of the "Consultation and Relational Empathy" (CARE) measure by means of Rasch-analysis at the example of cancer patients. Patient Education and Counseling (2011).
Screening for depression: Rasch analysis of the dimensional structure of the PHQ-9 and the HADS-D. Journal of Affective Disorders (2010).
Chat-up lines as male displays: Effects of content, sex, and personality. Personality and Individual Differences (2007).
Differential Item Functioning and Item Bias (Chapter 5). Handbook of Statistics (2006). Citation excerpt: DIF has become an integral part of test validation, being included among the Standards (Standard 7.3), and is now a key component of validity studies in virtually all large-scale assessments. In addition, the study of DIF is being extended to other fields that utilize constructed variables, such as psychology (Bolt et al., 2004; Dodeen and Johanson, 2003; Lange et al., 2002) and the health sciences (Gelin et al., 2004; Gelin and Zumbo, 2003; Iwata et al., 2002; Kahler et al., 2003; Panter and Reeve, 2002). In these areas, the assessment of DIF is not only a concern for selection bias, but also for ensuring the validity of research examining between-group differences on traits measured with constructed variables.