Elsevier

Psychoneuroendocrinology

Volume 30, Issue 7, August 2005, Pages 698-714
Estimating between- and within-individual variation in cortisol levels using multilevel models

https://doi.org/10.1016/j.psyneuen.2005.03.002

Summary

Cortisol measures often are used to examine variation in hypothalamic–pituitary–adrenal axis (HPA) activity as well as broader patterns of differential health. However, substantial within-individual variation renders single cortisol measurements unreliable as estimates for probing differences between individuals and groups. A standard practice to clarify between-individual differences involves collecting multiple samples from each participant and then deriving person-specific averages. By ignoring information about variation at between- and within-individual levels, this technique impedes cross-study comparison of results, ignores data useful for future study design, and hinders the analysis of cross-level interactions. This report describes how multilevel approaches can simultaneously model between- and within-individual variation in diurnal cortisol levels without using crude averages. We apply these models to data from children in Nepal (n=29, 11–15 samples per child), Mongolia (n=47, 8–12 samples per child) and the US (n=1269, 1–6 samples per child). Using the Nepal data, we show how an analysis of crude time-adjusted aggregates does not detect an association between aggressive behavior and cortisol levels, while a multilevel analysis does. More importantly, we argue that the ‘roadmap’ to variation generated by these multilevel models provides meaningful information about the predictive accuracy—not just statistical significance—of relationships between cortisol levels and individual-level variables, such as psychopathology, age, and gender. The ‘roadmap’ also facilitates comparison between the results from different studies and estimation of the necessary number of cortisol measurements for future investigations.

Introduction

As an essential and measurable product of the HPA system, cortisol has become a key variable in studies of human response to the environment (Flinn and England, 1997, Gunnar, 2001, Kirschbaum and Hellhammer, 2000, Pollard, 1995, Worthman, 1999). Such studies have established the existence of stable between-individual differences in several aspects of cortisol secretion, including overall mean or basal levels (Kirschbaum et al., 1990), diurnal trends (Smyth et al., 1997, Stone et al., 2001), and response to stressors (Kirschbaum et al., 1998) or morning awakening (Kudielka and Kirschbaum, 2003, Wust et al., 2000). Between-individual differences in these aspects of cortisol secretion have been associated with a variety of health measures, including post-traumatic stress disorder (Carrion et al., 2002, Glover and Poland, 2002, Yehuda et al., 2000) and other psychiatric conditions (Goodyer et al., 1996, Goodyer et al., 2001, Harris et al., 2000, Weber et al., 2000), physical illness (Flinn and England, 1997, Heim et al., 2000), and psychosocial functioning (Adam and Gunnar, 2001, Decker, 2000, Kiecolt-Glaser et al., 1997, Koertge et al., 2002, Melamed et al., 1999, Nicolson and Van Diest, 2000, Pruessner et al., 1999, Schulz et al., 1998, Van Eck et al., 1996).

Consequently, comparisons involving cortisol levels assume that individuals differ in their patterns of cortisol secretion and that these differences exhibit some stability over time. However, momentary assessments of cortisol depend on a number of factors, including whether the person recently consumed food or caffeine, the time of day at which the measurement was taken, the person's general basal cortisol level, whether infectious or inflammatory processes are active, or whether the person was currently anticipating a stressful situation (Pollard, 1995). The measurement can also reflect error due to the processes involved in extracting, storing and analyzing samples. These factors can be divided into three rough categories—between-individual differences, within-individual variation, and measurement error.

Such complexities present well-recognized challenges to study design and interpretation of cortisol data. The relative contributions of each of these three sources of variation (individual difference, acute effects, and method error) cannot be parsed by a single measure from each member of a population. Collecting multiple measurements from each individual in a sample allows comparison of the amount of variation that exists between individual means relative to the amount of variation observed within each individual. If variation in individual means is high relative to within-individual variation, then we can be confident that a single measure of cortisol reflects stable between-individual differences in tonic levels (see Fig. 1). If, however, within-individual variation is high relative to between-individual variation, then any single measure of cortisol tells us less about an individual's tonic levels and more about the situation in which the cortisol measurement was taken.

A clear understanding of these sources of variation prevents incorrectly interpreting within-individual variation as a signal of between-individual difference. Conversely, it avoids the erroneous conclusion that all variation in cortisol measurements is a result of within-individual variation or measurement error, and that we can say nothing reliable about individual differences.

Researchers aiming to assess between-individual differences in HPA activity have designed studies to control (experimentally and/or statistically) for the known and unknown situational factors that can generate within-individual variation. A standard practice in research conducted in naturalistic settings is to collect repeated observations on each individual to estimate reliable individual cortisol level means or medians (Decker, 2000, Dettling et al., 2000, Fisher et al., 2000, Flinn and England, 1997, Harris et al., 2000, Koertge et al., 2002, McBurnett et al., 2000, Watamura et al., 2002, Weber et al., 2000, Wolf et al., 2002, Wust et al., 2000). However, simply analyzing aggregates removes information about within-individual variation which can be used (1) to improve interpretations of current results, (2) to inform future study design, and (3) to explore interactions across levels of analysis.

In this paper, we describe how a class of models, alternatively referred to as multilevel (Goldstein, 2003, Snijders and Bosker, 2002), mixed (Littell et al., 1996, Singer, 1998, Verbeke and Molenberghs, 2000), or hierarchical linear models (Raudenbush and Bryk, 2002), can be used to analyze simultaneously within- and between-individual variation in cortisol measures, and thereby avoid the drawbacks of analyzing crude means. We identify issues specific to modeling cortisol data, including diurnal changes in mean and variance, and suggest specific and broadly applicable solutions to resolve these issues. We apply this approach to illustrate how a multilevel analysis detects an association between aggressive behavior and cortisol levels, while an analysis of crude means does not. Finally, we show how simple statistics provided by multilevel models can inform the number and organization of cortisol measurements for future studies. Analyses of cortisol measurements collected from populations in Nepal, Mongolia and the US are used for illustrative purposes.

For simplicity, we will only describe the application of multilevel models to the estimation of between-person differences in the means and diurnal slopes of momentary cortisol assessments (e.g. salivary and plasma) taken in naturalistic settings. However, this approach would provide similar benefits when applied to cortisol responsivity to acute challenges, and more broadly to other physiological indicators that exhibit significant within-individual variation.

The use and report of results based on averages has drawbacks for analysis, for the interpretation of results, for testing assumptions about between-individual differences, and for the design of future studies.

First, although an averaged measure is generally a more reliable indicator of between-individual difference than any of its single summands (Epstein, 1986, Pruessner et al., 1997), if reliability statistics are not reported we have no way of establishing how much of an improvement is yielded by any specific aggregation (Decker, 2000, Harris et al., 2000). Furthermore, such reports are necessary to determine how much of the variation in cortisol levels we can expect to model with between-individual variables. For example, if between-individual differences account for only 30% of the variance in estimated cortisol levels, then any between-individual trait could explain at most 30% of the variance in these estimated cortisol levels, which corresponds to a correlation of at most about 0.55 (the square root of 0.30). This is an important fact to consider when interpreting the significance and magnitude of results, comparing results across studies, and examining the fit of models of between-individual differences.

More broadly, the lack of good information on the general reliability of cortisol measures impedes future study design. As Gunnar (2001) has noted, “the true association between HPA axis activity and behavioral dispositions probably requires aggregation across many days. Unfortunately, no studies are available (especially with children) to help the researcher determine, a priori, how many days of sampling are needed.” In reality, any past study where repeated cortisol measures were collected can provide such information, so long as the requisite statistics are reported. Regrettably, very few published reports provide descriptive statistics indicating how cortisol measurements are correlated within individuals, and fewer still describe how these within-individual correlations may differ across varying time intervals and at different times of day (Coste et al., 1994, Kirschbaum et al., 1990).

Finally, measurement aggregation also ignores the possibility that a variable may have different effects at different times of day. For example, one must disaggregate measurements to examine whether depression is related to higher morning cortisol levels, but not necessarily higher evening levels (Harris et al., 2000). This limitation extends to all cross-level interactions of within-individual and between-individual variables. For example, using crude aggregates, it would be difficult to assess whether the effect of naturalistic anticipation of a stressor on cortisol levels is different for females versus males.

The apparently disparate problems generated by analyzing crude aggregates derive from a common source—inadequate examination of the sources of variation. Multilevel models provide a powerful means to model data simultaneously at the levels of the moment and the individual, to estimate variation at each of these levels, and to see how known variables predict the variation at these different levels (Goldstein, 2003, Singer and Willett, 2003, Snijders and Bosker, 2002). They also offer an improvement over repeated-measures ANOVA models, which have been used in the past to model repeated cortisol measures, because they do not require that data be completely balanced (that each individual has the same number of observations), that observations be regularly spaced in time, or that all observations be present (Goldstein, 2003).

Several researchers have applied multilevel, mixed or hierarchical linear models to analyze cortisol data. Kirschbaum and others (Kirschbaum et al., 1990, Shirtcliff et al., in press) have used latent state-trait models to assess trait levels of cortisol. Stone et al. (2001) have used random coefficient regressions to model the diurnal cycle of cortisol in a number of samples, Smyth et al. (1998) and Van Eck et al. (1996) have used mixed models to examine sources of both within- and between-individual variation, and Adam and Gunnar (2001) have used hierarchical linear models to estimate individual differences in diurnal cortisol cycles. We build on these previous approaches to develop a general multilevel framework for analyzing cortisol that addresses issues specific to cortisol secretion, including diurnal variation in means and the possibility of within-day correlations. We further propose guidelines for reporting the results of these models that would provide more comparability of effects across studies and would better inform the design of future collection protocols.

In a traditional linear regression represented by

$$\mathrm{LNCORT} = \beta_0 + \beta_1 \times \mathrm{TIME} + \varepsilon,$$

the natural log of cortisol, LNCORT, varies systematically with time of day (TIME), while the variation that cannot be accounted for by TIME is captured with a random variable, $\varepsilon$. In this case $\beta_1$ captures the population's average diurnal slope, which is typically negative over the day (see Fig. 2, Population Model (A)). In traditional linear regressions, the values of $\varepsilon$ are expected to be normally distributed with a mean of 0. This extra unmodeled variation can arise from a number of sources including measurement error, situational factors, and stable differences between individuals. With the appropriate data, multilevel models allow the researcher to partition this random variation into conceptually distinct parts. For example, consider cortisol data collected on 30 individuals at five points during 1 day, where $\mathrm{LNCORT}_{ij}$ represents the $j$th measurement on the $i$th person. The following model describes these data:

$$\mathrm{LNCORT}_{ij} = \beta_0 + \beta_1 \times \mathrm{TIME} + b_i + \varepsilon_{ij}$$

As in the last model, $\mathrm{LNCORT}_{ij}$ is modeled by time of day. What distinguishes this model is that random variation is partitioned into two components. Specifically, the random variable $b_i$ has a unique value for each person and captures the deviation of the $i$th individual's mean from the population mean (accounting for time of day), while $\varepsilon_{ij}$ captures the deviation of single cortisol observations from the $i$th individual's overall mean. In this model, the mean for the $i$th individual ($\beta_0 + b_i$) is less interesting than how much of the total random variance, $\mathrm{Var}(\mathrm{LNCORT}_{ij} \mid \mathrm{TIME}) = \mathrm{Var}(b_i) + \mathrm{Var}(\varepsilon_{ij})$, can be partitioned into between-individual ($\mathrm{Var}(b_i)$) and within-individual ($\mathrm{Var}(\varepsilon_{ij})$) components.
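The partition of random variance into between- and within-individual components can be illustrated with a small simulation. The following is a minimal sketch under stated assumptions, not the estimation procedure used in this report: it assumes a balanced design (every person sampled at the same times), fits the fixed part by pooled least squares, and recovers the two variances with a simple method-of-moments (one-way ANOVA) estimator rather than the likelihood-based estimation that multilevel software typically uses. The function names, sampling times, and parameter values are hypothetical.

```python
import random
from statistics import mean


def simulate(n_people, n_obs, beta0, beta1, tau2, sigma2, seed=0):
    """Draw LNCORT_ij = beta0 + beta1*TIME + b_i + e_ij for a balanced design."""
    rng = random.Random(seed)
    # Hypothetical sampling times: n_obs points evenly spaced over -6..+6 hours.
    times = [-6.0 + 12.0 * j / (n_obs - 1) for j in range(n_obs)]
    rows = []
    for person in range(n_people):
        b_i = rng.gauss(0.0, tau2 ** 0.5)         # person-level deviation
        for t in times:
            e_ij = rng.gauss(0.0, sigma2 ** 0.5)  # moment-level deviation
            rows.append((person, t, beta0 + beta1 * t + b_i + e_ij))
    return rows


def variance_components(rows):
    """Return method-of-moments estimates (Var(b_i), Var(e_ij)) for balanced data."""
    ts = [t for _, t, _ in rows]
    ys = [y for _, _, y in rows]
    # Pooled least-squares fit of the fixed part beta0 + beta1*TIME.
    t_bar, y_bar = mean(ts), mean(ys)
    beta1 = (sum((t - t_bar) * (y - y_bar) for t, y in zip(ts, ys))
             / sum((t - t_bar) ** 2 for t in ts))
    beta0 = y_bar - beta1 * t_bar
    # Group the time-adjusted residuals by person.
    resid = {}
    for person, t, y in rows:
        resid.setdefault(person, []).append(y - beta0 - beta1 * t)
    n = len(resid)
    k = len(rows) // n  # observations per person (balanced design assumed)
    person_mean = {p: mean(r) for p, r in resid.items()}
    grand = mean(list(person_mean.values()))
    ms_within = sum((x - person_mean[p]) ** 2
                    for p, r in resid.items() for x in r) / (n * (k - 1))
    ms_between = k * sum((m - grand) ** 2
                         for m in person_mean.values()) / (n - 1)
    tau2_hat = max(0.0, (ms_between - ms_within) / k)  # truncate at zero
    return tau2_hat, ms_within


# 30 people, five samples each, as in the example in the text.
rows = simulate(n_people=30, n_obs=5, beta0=2.0, beta1=-0.1, tau2=0.3, sigma2=0.7)
tau2_hat, sigma2_hat = variance_components(rows)
print(f"between-individual: {tau2_hat:.2f}  within-individual: {sigma2_hat:.2f}")
```

With only 30 people the between-individual estimate is noisy; increasing `n_people` in the simulation brings both estimates close to the values used to generate the data.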

This is a very simple version of a multilevel model. Multilevel models are often referred to as mixed models or mixed effects models because they include not only fixed effects (e.g. time of day, age, psychiatric diagnosis, gender), but also random effects (e.g. unique individual means, unique individual diurnal slopes). These general models can be used to analyze a wide range of phenomena, including students' school achievement (Snijders and Bosker, 2002), changes in the severity of addiction in response to treatment programs (Wallace and Green, 2002), adolescent growth in height (Goldstein and Woodhouse, 2001), and interviewer effects (O'Muircheartaigh and Campanelli, 1998). Each phenomenon presents a new modeling challenge with different sets of assumptions about covariation between observations. Fortunately, the behavior of cortisol secretion in humans is richly documented, enabling more confident formulation of assumptions made for model building. Specifically, several key issues require consideration when modeling cortisol observations across the day:

  1. First, mean values show clear diurnal change. Researchers have applied a number of techniques to model this diurnal pattern, including splines (Van Eck et al., 1996), polynomials (Decker, 2000), linear slopes (Smyth et al., 1998, Stone et al., 2001), and time-of-day specific means or medians (Decker, 2000, Flinn and England, 1997, Knutsson et al., 1997).

  2. Second, correlations between observations can depend on the time interval between observations as well as the time of day at which observations were taken. For example, within-individual correlations may be greater within a day than across days, or they may be greater in the morning than the evening.

  3. Third, the shape of diurnal cycles may vary among individuals (Adam and Gunnar, 2001, Smyth et al., 1997, Stone et al., 2001).

To account for these observations and by illustration, we describe the stepwise construction of a model for cortisol over the day, beginning with a simple population mean model and then adding successive layers of complexity. This includes consecutively (1) allowing individual means to differ, (2) considering the possibility that an individual's measurements correlate more within days than between days, and (3) allowing diurnal slopes to vary between individuals. We then use the resultant underlying multilevel model to assess how fixed effects, such as psychopathology, group status, or gender, account for between-individual variation. We show how this approach provides more comparable estimates of effect and more powerful statistical tests than that provided by crude aggregation. Using information from these models, we also describe how to estimate the appropriate number of cortisol measurements in future studies which seek to examine between-individual differences in means or diurnal slopes.
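One common way to let an individual's measurements correlate more strongly within days than between days is to add a day-level random effect nested within person. The sketch below is an illustrative assumption, not the specification used in this report: it takes three hypothetical variance components (person, day within person, and moment within day) and returns the correlations they imply for two measurements taken on the same day versus on different days.

```python
def implied_correlations(person_var, day_var, moment_var):
    """Correlations implied by nested person / day / moment random effects.

    Two measurements on the same person and same day share the person and
    day effects; measurements on different days share only the person effect.
    """
    if min(person_var, day_var, moment_var) < 0:
        raise ValueError("variance components must be non-negative")
    total = person_var + day_var + moment_var
    if total == 0:
        raise ValueError("total variance is zero; correlations are undefined")
    same_day = (person_var + day_var) / total
    different_day = person_var / total
    return same_day, different_day


# Hypothetical variance components on the log-cortisol scale.
same_day, different_day = implied_correlations(0.2, 0.1, 0.7)
print(f"same day: {same_day:.2f}  different days: {different_day:.2f}")
```

When the day-level variance is zero, the two correlations coincide and the model reduces to the two-level case with a single within-individual correlation.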

A key statistic when considering the relative proportion of within- and between-individual variation is the expected correlation among measurements from the same individual. This statistic is often referred to as the intra-class (or intra-unit, intra-individual) correlation coefficient (ICC). Since the ICC assesses the degree of correlation within individuals, it can conversely indicate the degree of difference between individuals. For any sample, the ICC is easily calculated with estimates from multilevel models of both between-individual variance and within-individual variance. Specifically:

$$\mathrm{ICC} = \frac{\tau^2}{\tau^2 + \sigma^2}$$

where $\tau^2$ is a commonly used symbol for between-individual variance and $\sigma^2$ for within-individual variance (Singer, 1998, Snijders and Bosker, 2002). If the ICC is 0.30 for cortisol measurements taken on a set of individuals every morning over several days, we would expect the correlation between any pair of measurements on the same individual to be about 0.30.
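The ICC calculation is a one-line ratio once the two variance estimates are in hand. The following is a minimal sketch; the function name and the input values are hypothetical, and the variances would in practice come from a fitted multilevel model.

```python
def icc(between_var, within_var):
    """Intra-class correlation: between-individual share of total variance."""
    if between_var < 0 or within_var < 0:
        raise ValueError("variances must be non-negative")
    total = between_var + within_var
    if total == 0:
        raise ValueError("total variance is zero; the ICC is undefined")
    return between_var / total


# Hypothetical estimates: between-individual 0.3, within-individual 0.7.
print(round(icc(0.3, 0.7), 2))
```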

This ICC statistic has several other useful interpretations. For example, it indicates the proportion of total variance ($\tau^2 + \sigma^2$) attributable to between-individual differences ($\tau^2$). If $\sigma^2$ were 0, for example, then the ICC would equal 1 and all of the variance in cortisol measurements would be attributable to differences between individuals. In this case, we could clearly differentiate people according to cortisol measures, and we would only have to take one measure to get a reliable estimate of this difference. If, however, $\tau^2 = 0$, then the ICC would equal 0 and we could infer that very little, if any, of the variation in cortisol measurements is attributable to between-individual differences. In this case it would be difficult to justify characterizing a person as ‘having’ reliably high or low cortisol levels when compared to other individuals. As we will show, cortisol displays small but non-zero ICC across a wide range of studies.

A final interpretation of the ICC is as the reliability statistic in classical measurement theory, in this case for a single assessment of cortisol as a measure of individual difference (Snijders and Bosker, 2002). The higher the ICC the more reliably a single measure of cortisol reflects true between-individual differences. This is different from commonly reported estimates of inter- and intra-assay reliability, typically coefficients of variation (CV), which measure the reliability of momentary cortisol measurements. Rather, the ICC indicates the degree to which momentary cortisol measurements are stable within individuals at different times. Thus, it generally indicates lower reliability than that described by intra- and inter-assay reliability statistics. In addition, since coefficients of variation measure random variation, high CVs indicate low reliability while high ICCs, which measure ‘true’ variation, indicate high reliability.
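Reading the ICC as the reliability of a single assessment also suggests how reliability grows when several assessments are averaged. The sketch below applies the Spearman-Brown formula from classical measurement theory, which is a standard result rather than something taken from this excerpt. It assumes that all pairs of measurements on the same person are equally correlated, an assumption that may not hold if within-day correlations exceed between-day correlations. The function names and example values are hypothetical.

```python
def reliability_of_mean(icc, k):
    """Spearman-Brown reliability of the mean of k equally correlated measures."""
    if not 0 <= icc <= 1:
        raise ValueError("icc must lie between 0 and 1")
    if k < 1:
        raise ValueError("k must be at least 1")
    return k * icc / (1 + (k - 1) * icc)


def measurements_needed(icc, target, max_k=10_000):
    """Smallest k whose averaged measure reaches the target reliability."""
    if not 0 < icc <= 1:
        raise ValueError("icc must be positive and at most 1")
    if not 0 < target < 1:
        raise ValueError("target must lie strictly between 0 and 1")
    for k in range(1, max_k + 1):
        if reliability_of_mean(icc, k) >= target:
            return k
    raise ValueError("target not reached within max_k measurements")


# Hypothetical case: single-measure ICC of 0.30, desired reliability of 0.80.
print(measurements_needed(0.30, 0.80))
print(round(reliability_of_mean(0.30, 5), 2))
```

The search over `k` is used instead of a closed-form expression only to avoid rounding surprises at the boundary; both give the same answer.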

Description of samples and data collection methods

Mongolia study. 47 participants, 3–4 salivary samples per day, 3 days. 8–12 samples per person.

Mongolian boys aged 3.3–10.3 years were recruited for a study of behavioral disorders, ecological risk factors, and endocrine profiles. The study comprised four groups: institutionalized children, urban poor, urban middle class, and semi-rural. The Institutional Review Board of Emory University School of Medicine approved the study. Informed consent was obtained from parents if possible; otherwise

Population-level diurnal model

As shown earlier, one of the simplest models of cortisol measurements collected over the waking day is a linear function:

$$\mathrm{LNCORT}_{ij} = \beta_0 + \beta_1 \times \mathrm{TIME} + \varepsilon_{ij}$$

Here, $\mathrm{LNCORT}_{ij}$ is the natural log of the $j$th cortisol measure for the $i$th individual. We define TIME as hours since waking centered at 6 h post-waking. Because of this centering, $\beta_0$ represents the population mean of log cortisol at 6 h after awakening (see Fig. 2, Population Model (A)).
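The effect of centering TIME can be seen in a small least-squares fit. This is a minimal sketch with made-up, noise-free values, not data from the studies described here; the function name and the `center` parameter are hypothetical. With centering at 6 h the intercept is the fitted value at 6 h post-waking, whereas with no centering it is the fitted value at waking.

```python
from statistics import mean


def fit_population_model(hours_since_waking, lncort, center=6.0):
    """Least-squares fit of LNCORT = beta0 + beta1*TIME, TIME = hours - center."""
    time = [h - center for h in hours_since_waking]
    t_bar, y_bar = mean(time), mean(lncort)
    sxx = sum((t - t_bar) ** 2 for t in time)
    if sxx == 0:
        raise ValueError("need at least two distinct sampling times")
    beta1 = sum((t - t_bar) * (y - y_bar) for t, y in zip(time, lncort)) / sxx
    beta0 = y_bar - beta1 * t_bar
    return beta0, beta1


# Made-up log-cortisol values that decline by 0.1 per hour and equal 2.0 at 6 h.
hours = [0.5, 4.0, 8.0, 12.0]
values = [2.0 - 0.1 * (h - 6.0) for h in hours]

beta0, beta1 = fit_population_model(hours, values)
print(f"centered at 6 h: beta0={beta0:.2f}, beta1={beta1:.2f}")

beta0_raw, _ = fit_population_model(hours, values, center=0.0)
print(f"uncentered:      beta0={beta0_raw:.2f}")
```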

Discussion

Multilevel or mixed effects models provide the researcher with a flexible set of modeling strategies and inferential tools for examining variation in cortisol measures both within and between individuals. In addition to offering a straightforward approach to partitioning variance into within- and between-individual components, these models are flexible enough to consider several levels of variation (i.e. individual, occasion, and measurement) and to examine complex, time-dependent correlations

Conclusion

Multilevel approaches provide the flexibility to model variation in cortisol at multiple levels under a variety of study designs. By partitioning variation, multilevel models permit examination of the degree to which fixed effects, such as psychopathology, social stressors, or mood, explain variance at each of these different levels, a significant improvement over crude assessments of total explained variance. In preserving information about the precision of person-level estimates, they also

Acknowledgements

We greatly appreciate access to cortisol data from the Great Smoky Mountain study collected under the directorship of and in collaboration with E. Jane Costello. We gratefully acknowledge laboratory assistance provided by Katrina Trivers and Linda Cangelose and the field assistance of Holbrook Kohrt and Suren Baigal (Agency of the Prevention and Protection of Children from Abuse and Neglect, Ulaanbaatar, Mongolia) in Mongolia, and Richard Kunz and Indra Rai in Nepal. Finally, we appreciate the

References (73)

  • C. Heim et al. The potential role of hypocortisolism in pathophysiology of stress-related bodily disorders. Psychoneuroendocrinology (2000)
  • C. Kirschbaum et al. Cortisol and behavior: 2. Application of a latent state-trait model to salivary cortisol. Psychoneuroendocrinology (1990)
  • J. Koertge et al. Cortisol and vital exhaustion in relation to significant coronary artery stenosis in middle-aged women with acute coronary syndrome. Psychoneuroendocrinology (2002)
  • B.M. Kudielka et al. Awakening cortisol responses are influenced by health status and awakening time but not by menstrual cycle phase. Psychoneuroendocrinology (2003)
  • N. Nicolson et al. Salivary cortisol patterns in vital exhaustion. Journal of Psychosomatic Research (2000)
  • J.C. Pruessner et al. Increasing correlations between personality traits and cortisol stress responses obtained by data aggregation. Psychoneuroendocrinology (1997)
  • M.P. Roy et al. Psychological, cardiovascular, and metabolic correlates of individual differences in cortisol stress recovery in young men. Psychoneuroendocrinology (2001)
  • J.M. Smyth et al. Individual differences in the diurnal cycle of cortisol. Psychoneuroendocrinology (1997)
  • J.M. Smyth et al. Stressors and mood measured on a momentary basis are associated with salivary cortisol secretion. Psychoneuroendocrinology (1998)
  • A.A. Stone et al. Individual differences in the diurnal cycle of salivary free cortisol: a replication of flattened cycles for some individuals. Psychoneuroendocrinology (2001)
  • O.T. Wolf et al. Salivary cortisol day profiles in elderly with mild cognitive impairment. Psychoneuroendocrinology (2002)
  • S. Wust et al. Genetic factors, perceived chronic stress, and free cortisol response to awakening. Psychoneuroendocrinology (2000)
  • M. Bartels et al. Heritability of daytime cortisol levels in children. Behavior Genetics (2003)
  • J. Coste et al. Reliability of hormonal levels for assessing the hypothalamic-pituitary-adrenocortical system in clinical pharmacology. British Journal of Psychiatry (1994)
  • E.J. Costello et al. The Great Smoky Mountains study of youth: goals, design, methods, and the prevalence of DSM-III-R disorders. Archives of General Psychiatry (1996)
  • S. Epstein. Does aggregation produce spuriously high estimates of behavior stability? Journal of Personality and Social Psychology (1986)
  • M.V. Flinn et al. Social economics of childhood glucocorticoid stress response and health. American Journal of Physical Anthropology (1997)
  • H. Goldstein. Multilevel Statistical Models (2003)
  • H. Goldstein et al. Modelling repeated measurements
  • I.M. Goodyer et al. Adrenal secretion during major depression in 8 to 16 year olds. I: altered diurnal rhythms in salivary cortisol and dehydroepiandrosterone (DHEA) at presentation. Psychological Medicine (1996)
  • I.M. Goodyer et al. Possible role of cortisol and dehydroepiandrosterone in human development and psychopathology. British Journal of Psychiatry (2001)
  • M.R. Gunnar. The role of glucocorticoids in anxiety disorders: a critical analysis
  • T.O. Harris et al. Morning cortisol as a risk factor for subsequent major depressive disorder in adult women. British Journal of Psychiatry (2000)
  • J.K. Kiecolt-Glaser et al. Marital conflict in older adults: endocrinological and immunological correlates. Psychosomatic Medicine (1997)
  • C. Kirschbaum et al. Salivary cortisol in psychobiological research: an overview. Neuropsychobiology (1989)
  • C. Kirschbaum et al. Salivary cortisol
