Introduction

The annual costs of major depressive disorder (MDD) are estimated at 83.1 billion dollars in the United States, with nearly two thirds of this cost arising from functional disability [1]. The costs of MDD are high in part because it takes so long for patients to recover from the illness [2]. Current treatment guidelines recommend that an initial treatment be tried for a long enough period to determine how much it will benefit a patient [3]. On average, at least 4 weeks are needed to attain response, and 6 weeks to attain remission, during treatment with an initial selective serotonin reuptake inhibitor (SSRI) antidepressant, but remission can take 12 weeks or longer [4]. Because most patients fail to enter remission with the first antidepressant prescribed [4], they commonly enter a period of serial trial and error, switching or combining medications [5] and typically taking 1 year or more to hit upon a successful treatment [6, 7]. It is not surprising that with this “hit-or-miss” approach, 26% of those who fail to improve with the first treatment simply stop taking medication, frequently within the first 2 weeks [8], and up to 42% of patients discontinue medication within the first 30 days [9].

What is needed is an improved method of selecting antidepressant medications for individual patients. The available antidepressants are generally regarded as equally effective on average, but clinicians have sought to personalize selection by matching groups of patients with specific symptoms to medications with different putative mechanisms of action (MOAs) (eg, SSRIs vs serotonin-norepinephrine reuptake inhibitors vs bupropion). Although some data suggest that subsets of patients may be more likely to benefit from medications with particular MOAs, the number needed to treat (NNT) to see such differences can be so large (NNT = 27) [10] that the value is questionable. MOA is more routinely considered in clinical decision making for second-line treatment (ie, after a patient fails to benefit from initial SSRI monotherapy). The Sequenced Treatment Alternatives to Relieve Depression (STAR*D) study examined medications with differing MOAs, either singly or in combination, after patients failed to respond to an initial SSRI [4, 11]. The results from this second level of treatment showed numeric superiority for switching to a medication with a different MOA, and slightly more so for combining treatments with different MOAs, although none of these differences reached statistical significance [11, 12]. The field is thus left with no clear evidence base for choosing among existing medications to maximize benefit for the individual patient.

Some have argued that existing monotherapies are inherently inadequate and that the central challenge is to develop treatments for MDD with fewer side effects and more rapid onset of action [13]. Our immediate imperative, however, is to be “smarter” in our use of existing agents. We need treatment paradigms for MDD that reliably match patients with the right treatment, either before or early in its course, in order to retain patients in treatment, minimize disability and suffering, avoid treatment-emergent adverse events such as worsening suicidal ideation [14], and prevent the development of negative attitudes and expectations that may perpetuate poor outcome [2]. This is the impetus for developing a personalized medicine approach based on biomarkers that could predict the likelihood of success with any given treatment. Clinical care would be greatly improved if we had reliable clinical predictors of treatment outcome. Most patients who are going to benefit from our current medications start to experience some improvement within the first 2 weeks of treatment, but early symptom improvement is a nonspecific predictor of benefit, and most early physiologic measures, such as plasma hormone levels or changes in blood pressure during treatment, have lacked sensitivity and specificity as predictors. Nevertheless, physiologic biomarkers could, in theory, speed recovery from MDD by matching patients with the treatment most likely to be effective for them given their neurobiologic characteristics. We review here the literature supporting the use of different types of biomarkers, before or during treatment, to direct selection of an initial or second-line treatment in MDD.

When Should Biomarker Measurements Be Made?

Two complicating factors in biomarker development are the timing and conditions of measurement. The ideal biomarker would be measurable at diagnosis to assist in selection of the first treatment. Thus far, however, pretreatment predictors have identified only indicators of general prognosis, not which specific treatment is likely to benefit a particular patient [2]. While some potential biomarkers (eg, genotype) presumably are stable over time, others (eg, measures of gene expression or brain function) may emerge only during treatment [2, 15]. For this reason, much research now is aimed at identifying biomarkers that emerge early in the course of treatment and may indicate whether the medication the patient is receiving is likely to lead to remission [16••, 17••]. To the extent that such biomarkers are determined by genetic factors and emerge reliably in response to a particular treatment, they may represent “response endophenotypes” [2]. Treatment-emergent biomarkers cannot help with the initial treatment selection; nevertheless, if a biomarker can be used to change an ineffective treatment to one more likely to be effective within a few weeks of treatment initiation, it could still shorten the duration and reduce the number of ineffective antidepressant treatment trials [2].

The use of treatment-emergent biomarkers also offers certain advantages. First, measuring biomarkers “within patients” likely enhances the stability, statistical reliability, and therefore predictive accuracy of the biomarker. Second, measuring biomarkers in response to newly administered treatments may help overcome confounds inherent in pretreatment, cross-sectional measures (eg, number and severity of prior episodes, the current phase of illness, and the extent and types of prior and current treatment) [15]. Examination of dynamic measures during the current treatment may detect features common across individuals responding to that treatment, regardless of such confounding factors [2]. The literature offers limited guidance as to when in the course of treatment such treatment-emergent biomarkers should be measured. For quantitative electroencephalographic (QEEG) measures, changes in the first week of treatment appear to be predictive [16••, 17••]. For changes in gene expression or brain-derived neurotrophic factor (BDNF) levels, however, the most reliable data are for pre- to post-treatment changes, and it remains unclear how early in treatment predictive changes could be detected.

It is important to note that difficulty in identifying practical biomarkers is not unique to the treatment of depression. Although there are successful biomarkers for disease processes (eg, elevated thyroid-stimulating hormone in hypothyroidism), relatively few biomarkers in clinical medicine are useful for choosing a particular medication treatment. The challenges are particularly great for brain diseases because of the relative inaccessibility of the brain and our limited understanding of the basic pathophysiology of these disorders [18].

Which Biomarkers Appear to Have Clinical Usefulness?

At present, no biomarker has sufficiently proven utility to be ready for clinical application. However, several types of biomarkers show promise for predicting clinical response. The evidence supporting each type is considered separately below.

Brain Structural Measures

Several different brain structural measures have demonstrated usefulness as pretreatment predictors of treatment outcome. Recent meta-analyses of structural neuroimaging studies indicate that depressed patients have reduced gray matter in multiple areas, including the anterior cingulate cortex (ACC) [19], subgenual cingulate cortex [20], and hippocampus [21]. The most robust evidence is for the hippocampus, for which larger volumes predicted better response after 8 weeks of pharmacotherapy in two separate samples [22, 23]. Furthermore, in a prospective study, smaller hippocampal volumes were predictive of clinical outcome 3 years later [24]. In another prospective study, larger hippocampal volume was associated with a lower probability of relapse in men at a 2-year follow-up [25]. The predictive utility of structural data is not limited to the hippocampus, as gray matter density in the ACC and posterior cingulate cortex was also predictive of clinical remission following 8 weeks of fluoxetine treatment [26]. Notably, only one of the studies cited above examined the relationship between treatment and brain structure directly through longitudinal assessment [24]. Studies also have been limited by small sample sizes and by measurement approaches that relied either on manual delineation of the hippocampus and amygdala or on whole-brain voxel-based morphometry methods sensitive to registration and partial volume confounds.

The structural integrity of the fiber tracts between neural areas affected in depression is another source of information that may facilitate prediction of treatment response. Although the application of diffusion tensor imaging to the study of depression in adulthood is relatively novel, evidence from studies of individuals with late-life depression indicates that this imaging technique, which tracks the diffusion properties of water through brain tissue in vivo, has predictive potential for delineating treatment responders. For example, nonresponders to 12 weeks of citalopram [27] or escitalopram treatment [28] showed a greater prevalence of microstructural abnormalities in white matter pathways connecting the cortex with limbic and paralimbic areas such as the anterior cingulate, as estimated using region-of-interest and voxel-based analysis approaches. These abnormalities may be associated with poorer outcome because they impair mood regulatory interactions between prefrontal and limbic areas [29]. The integrity of these corticolimbic pathways is compromised by adverse life events [30] as well as by genetic polymorphisms (eg, 5-HTTLPR) [31]. Although diffusion tensor imaging metrics of fiber integrity may constitute a useful predictor of treatment outcome, there are as yet insufficient data to assess their usefulness. Prior studies also have been limited to measuring scalar metrics such as fractional anisotropy, which reflects the extent to which diffusion within a voxel is directionally restricted, in arbitrarily delineated white matter regions. This approach may overlook significant findings in nonstudied regions. Furthermore, the automated voxel-based approaches that have been used at times, while convenient for exploratory analysis, are susceptible to registration confounds that may contribute to regional discrepancies in findings. No study to date has incorporated more refined tractography measurement approaches to better determine whether the structural connectivity of specific white matter tracts predicts treatment response.
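
As a concrete illustration of the scalar metric just described, the following sketch computes fractional anisotropy from the three eigenvalues of a fitted diffusion tensor. The eigenvalues shown are hypothetical; in practice they come from tensor fitting in standard diffusion imaging software.

    import numpy as np

    def fractional_anisotropy(eigenvalues):
        """Standard FA formula: 0 = perfectly isotropic diffusion,
        1 = diffusion restricted to a single direction."""
        l1, l2, l3 = eigenvalues
        numerator = np.sqrt((l1 - l2) ** 2 + (l2 - l3) ** 2 + (l3 - l1) ** 2)
        denominator = np.sqrt(2.0 * (l1 ** 2 + l2 ** 2 + l3 ** 2))
        return numerator / denominator

    # Hypothetical eigenvalues (units of 10^-3 mm^2/s) for single voxels
    print(fractional_anisotropy((1.7, 0.3, 0.3)))  # ~0.80: coherent fiber tract
    print(fractional_anisotropy((0.8, 0.7, 0.7)))  # ~0.08: weakly directional diffusion

Lower FA within a tract of interest is the kind of "microstructural abnormality" that the region-of-interest and voxel-based analyses cited above compare between responders and nonresponders.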

Although considerable data link brain structural measurements to treatment outcome, these measures seem to be primarily indicators of general prognosis. Although they may indicate the likelihood that a patient will recover, regardless of the treatment selected, they have not been examined for their usefulness in selecting a particular option from a set of potential treatments for individual patients.

Brain Functional Measures

As an alternative to assessing the structural integrity of brain areas and networks associated with treatment outcome, assessing the functional properties of these circuits may provide a more complete picture of a patient’s biological state before treatment. Two types of functional MRI data have demonstrated the most promise as biomarkers of treatment outcome: 1) intrinsic connectivity analyses performed while the patient is resting in the scanner with eyes closed; and 2) task-related activations, specifically the viewing of negative emotional facial expressions.

The functional connectivity between regions can be assessed by measuring spontaneous low-frequency (typically 0.01–0.1 Hz) fluctuations in resting state blood oxygen level-dependent signal [32], which are phase locked between areas. The correlation in these fluctuations between cortical and limbic areas therefore serves as a functional connectivity measure that reflects functioning in these mood-regulating pathways. Anand and colleagues [33] were the first to use this technique to show that corticolimbic connectivity is decreased in depression. Anand and colleagues [34] also showed that corticolimbic connectivity increased as scores on the Hamilton Depression Rating Scale decreased during treatment, suggesting that assessment of resting state corticolimbic connectivity could be useful for predicting antidepressant treatment response.
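
For readers unfamiliar with this analysis, the sketch below illustrates the basic computation under stated assumptions: two region-of-interest time series have already been extracted from resting-state fMRI data acquired with a 2-s repetition time (the ROI names, the TR, and the synthetic data are illustrative), the signals are band-pass filtered to the 0.01–0.1 Hz range, and their Pearson correlation is taken as the functional connectivity estimate.

    import numpy as np
    from scipy.signal import butter, filtfilt

    TR = 2.0              # repetition time in seconds (assumed)
    fs = 1.0 / TR         # sampling frequency of the BOLD time series
    n_vols = 240          # 8 min of resting-state data at TR = 2 s

    # Synthetic stand-ins for mean BOLD signals from two ROIs (eg, a prefrontal
    # region and the amygdala); real data would come from preprocessed fMRI
    # after motion and nuisance regression.
    rng = np.random.default_rng(0)
    shared = rng.standard_normal(n_vols)
    prefrontal = shared + 0.5 * rng.standard_normal(n_vols)
    amygdala = shared + 0.5 * rng.standard_normal(n_vols)

    # Band-pass filter to the low-frequency range typically analyzed (0.01-0.1 Hz)
    b, a = butter(2, [0.01, 0.1], btype="band", fs=fs)
    prefrontal_f = filtfilt(b, a, prefrontal)
    amygdala_f = filtfilt(b, a, amygdala)

    # Functional connectivity = correlation of the filtered time series
    fc = np.corrcoef(prefrontal_f, amygdala_f)[0, 1]
    print(f"corticolimbic functional connectivity estimate: r = {fc:.2f}")

In the studies cited above, it is this kind of correlation value, computed between corticolimbic regions, that decreases in depression and increases with symptom improvement.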

An additional method of assessing function within the neural circuitry for emotional processing is reactivity to the processing of negative facial expressions. When viewing negative facial expressions, depressed patients show exaggerated changes in activity in the limbic system, particularly the amygdala, in comparison with healthy controls [35]. Studies that have used baseline neural reactivity to emotional facial expressions as a predictor of treatment response generally have been underpowered, but encouraging signs indicate that this task [36, 37] and analogous tasks [38] may be of utility for prediction. Increased reactivity seems to normalize after successful antidepressant pharmacotherapy [39, 40] and cognitive-behavioral therapy [41], and may be a sufficiently consistent finding to be of eventual clinical utility. When viewing negative emotional faces, depressed patients have greater amygdala activation but also reduced coactivation of the dorsal anterior cingulate and increased coactivation in the subgenual cingulate [42]. In a treatment study, 8 weeks of fluoxetine administration ameliorated the deficient connections between the amygdala and anterior cingulate [43]. These changes in task-related reactivity are complementary to differences between treatment responders and nonresponders in resting state connectivity within corticolimbic circuits [44].

Fluorodeoxyglucose positron emission tomography (PET) scanning has shown some promise as a predictor of response to medication [45]. The number of studies indicating some predictive value for fluorodeoxyglucose PET is encouraging, although results have been inconsistent [45] and confounded by relatively small sample sizes and by heterogeneity in treatment and imaging methods. PET imaging with the serotonin transporter ligand [(11)C]-3-amino-4-(2-dimethylaminomethylphenylsulfanyl)-benzonitrile (DASB) has not been shown to predict treatment response in MDD, although patients with depression have low DASB binding potential [46, 47]. DASB binding potential also is associated with a 5-HT2A single-nucleotide polymorphism (SNP) that has been associated with SSRI treatment response in some studies, suggesting that this technique may be worthy of further study [48].

The best-documented brain functional biomarker for predicting antidepressant treatment response is QEEG. QEEG signals are generated by assemblages of neurons in the cortex and deeper structures and as such provide a global measure of brain function [49]. Responders to medication differ from nonresponders in QEEG power, either in the resting state or during simple tasks [50]. Three complementary measures of brain electrical activity have shown significant promise for predicting treatment response: cordance, low-resolution brain electromagnetic tomography (LORETA), and the Antidepressant Treatment Response (ATR) Index. Cordance is a QEEG power measure (calculated from a full scalp electrode array) that is more strongly associated with perfusion of the cerebral cortex underlying each electrode than are other power measures [51, 52]. Cordance accurately characterizes brain function on the cortical convexities (eg, dorsolateral prefrontal cortex) and has demonstrated usefulness for characterizing medication response [53–60]. LORETA extends the topographic capabilities of QEEG, enabling characterization of brain activity not only on the cortical convexities but also in specific deeper cortical areas (eg, the ACC and medial orbitofrontal cortex) [61]. While cordance may in fact reflect activity of areas such as the ACC that is projected to the surface [62], LORETA permits attribution of electrical activity to specific deeper structures. Cordance and LORETA require whole-head electrode montages, which provide a view of function over all cortical regions; these techniques are therefore well suited to exploration of brain function, but they share the disadvantage of requiring up to 75 min of recording in a QEEG laboratory facility. The ATR Index is a biomarker optimized to predict medication response; it is calculated from data collected with a five-electrode montage tightly focused on the frontal regions of the brain and can be recorded in only 10 min in a general office-based setting [16••, 17••].
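
Neither the cordance algorithm nor the ATR computation is reproduced here, but both build on band-limited QEEG power. The sketch below shows only that generic building block: absolute theta-band (4–8 Hz) power for a single channel estimated with Welch's method. The sampling rate, epoch length, band limits, and synthetic signal are illustrative assumptions, not the published algorithms.

    import numpy as np
    from scipy.signal import welch

    fs = 256                  # EEG sampling rate in Hz (assumed)
    duration = 60             # seconds of artifact-free resting EEG (assumed)
    t = np.arange(duration * fs) / fs

    # Synthetic single-channel EEG: a 6-Hz theta rhythm buried in noise
    rng = np.random.default_rng(1)
    eeg = 10e-6 * np.sin(2 * np.pi * 6 * t) + 5e-6 * rng.standard_normal(t.size)

    # Power spectral density via Welch's method
    freqs, psd = welch(eeg, fs=fs, nperseg=4 * fs)

    # Absolute power in the theta band (4-8 Hz), integrated with the rectangle rule
    theta = (freqs >= 4) & (freqs <= 8)
    theta_power = psd[theta].sum() * (freqs[1] - freqs[0])
    print(f"theta-band power: {theta_power:.3e} V^2")

The published measures combine such band power values across electrodes and frequency bands in different ways, which is one reason the montage size and recording time differ so markedly between cordance or LORETA and the ATR Index.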

Most studies of brain functional biomarkers have been small and are therefore inadequate to fully assess the utility of these biomarkers. One of the largest studies performed, which examined the ATR Index, is the national multisite study Biomarkers for the Rapid Identification of Treatment Effectiveness in Major Depression (BRITE-MD), which evaluated neurophysiologic and genomic predictors of response and remission in MDD. BRITE-MD enrolled 375 MDD patients and collected comprehensive clinical, neurophysiologic, and genomic data. Using the ATR Index as a dichotomous predictor, BRITE-MD developed one of the only predictors of differential response to two antidepressants with different putative MOAs (escitalopram and bupropion) [16••, 17••]. A positive ATR biomarker predicted response and remission to escitalopram with 74% overall accuracy, and those with a positive ATR were more than 2.4 times as likely to respond to escitalopram as those with a negative ATR (68% vs 28%; P = 0.001) [16••]. Conversely, those with a negative ATR who were switched to bupropion were 1.9 times as likely to respond to bupropion alone as those who remained on escitalopram (53% vs 28%; P = 0.034) (Figs. 1 and 2) [17••].

Fig. 1

Logistic regression model of escitalopram and bupropion responders stratified by Antidepressant Treatment Response (ATR) Index values. ATR values are shown for patients randomly assigned to each treatment and who responded to escitalopram or bupropion treatment. Patients who responded to escitalopram tended to have higher ATR values, and those who responded to bupropion tended to have lower ATR values. Markers represent observed values, and lines represent modeled values

Fig. 2

Logistic regression model of escitalopram and bupropion remitters stratified by Antidepressant Treatment Response (ATR) Index values. ATR values are shown for patients randomly assigned to each treatment and who remitted with escitalopram or bupropion treatment. Patients who remitted with escitalopram tended to have higher ATR values, and those who remitted with bupropion tended to have lower ATR values. Markers represent observed values, and lines represent modeled values

Critically, this is one of the only instances in which a biomarker has predicted differential response to two antidepressant medications with distinct MOAs. The NNT to see a benefit from use of the ATR Index, based on these results, is 10 to 11, which is within the range proposed for a clinically useful measure [2, 17••]. However, these results must be interpreted with the caveat that treatment was not assigned prospectively on the basis of ATR Index values [2]. It is encouraging that another group independently replicated these findings in a naturalistic study in which antidepressants were selected by clinician choice [63]. These results warrant further exploration of these two medications with distinct MOAs using a wider range of biomarkers.
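
To make these effect sizes concrete, the sketch below recomputes the reported relative likelihood of response from the published rates and illustrates the general number-needed-to-treat formula (NNT = 1 / absolute risk reduction). The NNT inputs are deliberately hypothetical; the cited NNT of 10 to 11 was derived by the original investigators from the trial data, not from the two response rates alone.

    # Relative likelihood of response to escitalopram by ATR status (reported rates)
    response_positive_atr = 0.68
    response_negative_atr = 0.28
    relative_risk = response_positive_atr / response_negative_atr
    print(f"relative likelihood of response: {relative_risk:.1f}x")  # ~2.4x

    # General NNT formula: 1 / absolute risk reduction.
    # The rates below are hypothetical, chosen only to illustrate the arithmetic;
    # they are not the values used to derive the NNT of 10-11 cited in the text.
    rate_biomarker_guided = 0.50
    rate_usual_selection = 0.40
    nnt = 1.0 / (rate_biomarker_guided - rate_usual_selection)
    print(f"illustrative NNT: {nnt:.0f}")  # 10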

Genomic Measures

Pharmacogenetic investigations postulate that responsiveness to or tolerability of treatment may be influenced by inherited factors, and several observational studies suggest an inherited basis for antidepressant treatment outcomes. Although pharmacokinetic factors under genetic control are correlated with response and toxicity to tricyclic antidepressants, there is little evidence that the same is true for response to or tolerability of the SSRIs [64]. The current widespread use of SSRIs in the treatment of depression has nonetheless generated a large body of literature on SSRI pharmacogenetics [65]. Most of these studies focus on candidate genes related to monoamine function, including the serotonin transporter (the molecular target of SSRIs), tryptophan hydroxylase-1, monoamine oxidase A, and the type 2A serotonin receptor. These studies have demonstrated only a few associations with treatment response, most of which have not been consistently replicated.

Several large pharmacogenetic studies, including genome-wide association studies (GWAS), have used the STAR*D sample. Because the findings have been extensively reviewed previously [66, 67], only the most prominent findings are highlighted here. In an analysis of 763 SNPs in 68 candidate genes, a single SNP in the type 2A serotonin receptor showed statistical association with citalopram response [64, 68]. Several research teams have specifically addressed the association between citalopram treatment response and genetic variation in the serotonin transporter using the STAR*D sample [68–70]. Most showed no association with measures of efficacy, while one showed modest association for a particular haplotype in only a subset of the STAR*D sample [70]. Taken together, these studies suggest an overall lack of association between this most obvious of candidate genes and citalopram response.

A recent GWAS of drug response in 700 German inpatients from two cohorts treated with a variety of drugs found that no SNP was significant at the genome-wide level using results from individual (n = 339) or pooled (n = 361) genotyping [71]. A set of 328 SNPs was carried forward and genotyped in 832 Caucasians from STAR*D for replication; although 46 SNPs showed P values < 0.05, none withstood multiple-test correction. Garriock and colleagues [72••] performed comprehensive GWAS on the STAR*D sample for response and remission phenotypes and identified 41 and 39 SNPs with principal components ancestry-adjusted P < 0.0001 for the remission and response phenotypes, respectively. They found modest levels of association, although nothing at the conventional genome-wide significance level (P < 5 × 10⁻⁸). The strongest finding for response and remission occurred 51 kb upstream of UBE3C (P = 3.63 × 10⁻⁷; additive OR, 1.68), which encodes ubiquitin protein ligase E3C, a gene significantly downregulated by stress in the adult primate ventromedial prefrontal cortex. More recently, 706 European individuals treated with escitalopram or nortriptyline underwent GWAS, with the strongest finding in the combined sample localizing to a duplicated region of chromosome 1 (P = 3.82 × 10⁻⁷). In drug-specific analyses, the authors reported a genome-wide significant finding for nortriptyline-treated individuals (P = 3.56 × 10⁻⁸) within the uronyl-2-sulfotransferase gene and a suggestive finding in the coding region of the interleukin-11 gene (P = 2.83 × 10⁻⁶) for escitalopram-treated individuals. There is little overlap among the three GWAS, suggesting prominent heterogeneity between studies and low power within any single study [73]. The lack of strong, clear, and consistent associations between genetic polymorphisms and treatment response is in some ways not surprising; depression is a complex and heterogeneous disorder, and multiple additive genetic factors are likely to contribute to response. Moreover, these reports do not offer a comprehensive assessment of the role of variation across the genome or of the potential for multiple additive effects [72••].
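
For readers less familiar with these thresholds, the short sketch below shows the arithmetic behind the two correction levels mentioned above, assuming a simple Bonferroni adjustment: the conventional genome-wide threshold of 5 × 10⁻⁸ corresponds to an α of 0.05 spread over roughly one million independent common variants, and even the far less stringent study-wide threshold for the 328 replication SNPs was not met.

    alpha = 0.05

    # Conventional genome-wide significance: alpha Bonferroni-corrected for
    # roughly one million independent common variants across the genome.
    genome_wide_threshold = alpha / 1_000_000
    print(genome_wide_threshold)  # 5e-08

    # Bonferroni-corrected threshold for the 328 replication SNPs (assumed
    # correction method); a SNP at P = 0.05 clearly does not survive it.
    replication_threshold = alpha / 328
    print(f"{replication_threshold:.1e}")  # ~1.5e-04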

Gene expression analysis in neuropsychiatric disorders has been challenging because of the relative inaccessibility of brain tissue. Direct assessment of postmortem brain samples from individuals with MDD has indicated some differences in gene expression that correlate with disease state [74], indicating detectable differential gene expression in at least some subregions of the central nervous system. To assess expression in living humans, these studies have been extended to the analysis of mRNA from circulating mononuclear cells in whole blood [75, 76]. Support for the analysis of gene expression in peripheral blood comes from the assessment of inter- and intraindividual differences in primate blood and brain: it was recently demonstrated that interindividual gene expression differences are conserved between primate brain and primate peripheral blood, supporting the general model that blood sampling provides useful information about interindividual expression differences in brain subregions [77]. A limited number of small studies have demonstrated gene expression changes in leukocyte mRNA in response to antidepressant or lithium treatment in patients with MDD or bipolar disorder [75]. In some small studies, antidepressant treatment tended to normalize gene expression patterns, and the degree of normalization was proportional to the degree of symptom improvement [75, 78, 79]. These data suggest that peripheral expression signatures may be relevant to MDD and that the disease state is reflected in expression changes in the leukocyte transcriptome. However, sample sizes have been limited, so the reliability of these findings for predicting treatment response remains unknown. Rich sets of data have nevertheless been generated to study gene expression as quantitative traits controlled by specific DNA variants in mice, humans, and other species. These data demonstrate strongly that cis-regulatory DNA variants control the basal level of gene expression in a variety of tissues, including blood, and contribute to neurologic phenotypes in the mouse. These expression quantitative trait loci are a powerful resource for complex disease gene mapping, as recently reviewed by Cookson et al. [80], and the promise of this approach suggests that it should be pursued in future biomarker development.

Proteomic and Metabolomic Measures

Proteomic and metabolomic biomarkers of treatment response in MDD remain in very early stages of development, and none have demonstrated reliability for predicting treatment response. These potential biomarkers are attractive for further research, however, because of the relatively constrained space of biomarker targets. Still, the field faces several challenges. First among them is the source of materials to be examined. Cerebrospinal fluid might be considered as the source most closely reflective of brain activity, but it is not easily accessible on a routine, risk-free basis that is likely to be acceptable to patients. Urine, although perhaps the most easily collected in humans, is furthest removed from brain function, and the degree to which saliva is reflective of brain function is uncertain given the present state of the field. Thus, plasma appears to be the rational source for proteomic and metabolomic measurements because it is easily accessible, and many small molecules from the brain reach the circulation en route to excretion; several also are exchanged or transported across the blood–brain barrier.

Proteomic investigations in MDD mostly have been performed on cerebrospinal fluid [81] and on specific brain regions collected at autopsy [82]. Comprehensive proteomic investigations of plasma from MDD patients have yet to be reported, and no clinical study has been directed at identifying protein signatures related to different treatments and responses in MDD. Metabolomic reports, although limited, highlight the opportunities that this approach offers for research on MDD [83]. Paige and colleagues [84] compared gas chromatography–mass spectrometry profiles of plasma extracts from depressed, remitted, and never-depressed older adults, revealing differences in levels of several fatty acids, glycerol, and γ-aminobutyric acid. Focused investigations of one or a few metabolites in bipolar disorder and MDD have drawn attention to the role of neurotransmitter abnormalities (norepinephrine, dopamine, serotonin, γ-aminobutyric acid, glutamate/glutamine) and of lipids, including arachidonic and other fatty acids [83].

Of all serum protein measures, BDNF is the most clearly implicated in the pathogenesis and treatment of MDD [85]. The origins of serum BDNF are unclear, although in animal models it does cross the blood–brain barrier [86]. There is strong evidence of low serum BDNF in unmedicated MDD patients and of recovery of serum BDNF levels after antidepressant therapy [87], including in a large meta-analysis [88]. These findings suggest that BDNF has the potential to be a biomarker of treatment response, although only a few studies have explored changes during treatment in humans [87–89].

Conclusions

The development of biomarkers to guide treatment decision making in MDD would offer significant advantages. Several putative biomarkers have been identified that provide information about the general prognosis for recovery from depression and, in some instances, about whether a specific treatment may lead to remission. Several questions must be addressed before biomarkers can be introduced into clinical practice. These include the following:

  1. When is the optimal time in the course of treatment to measure such biomarkers to obtain maximum predictive accuracy?

  2. Are these biomarkers predictive of differential response (or remission), or instead of prognosis for any treatment that the patient would receive?

  3. Do the biomarkers have sufficient sensitivity, specificity, and reproducibility for predicting response and remission that they can be relied upon in clinical practice?

For some putative biomarkers, existing data suggest that the answers to these questions are favorable and that clinical application may be practical and useful. It is likely that no single biomarker will be sufficient to direct clinical treatment decisions and that a panel of multimodal biomarkers will be needed to achieve the degree of accuracy required for clinical utility. Research should focus on replicating existing findings and on examining groups of biomarkers in sufficiently large cohorts of patients with MDD to evaluate the clinical effectiveness of biomarker-guided treatment.