Elsevier

NeuroImage

Volume 51, Issue 2, June 2010, Pages 752-764
Sparse logistic regression for whole-brain classification of fMRI data

https://doi.org/10.1016/j.neuroimage.2010.02.040

Abstract

Multivariate pattern recognition methods are increasingly being used to identify multiregional brain activity patterns that collectively discriminate one cognitive condition or experimental group from another, using fMRI data. The performance of these methods is often limited because the number of regions considered in the analysis of fMRI data is large compared to the number of observations (trials or participants). Existing methods that aim to tackle this dimensionality problem are less than optimal because they either over-fit the data or are computationally intractable. Here, we describe a novel method based on logistic regression using a combination of L1 and L2 norm regularization that more accurately estimates discriminative brain regions across multiple conditions or groups. The L1 norm, computed using a fast estimation procedure, ensures a fast, sparse and generalizable solution; the L2 norm ensures that correlated brain regions are included in the resulting solution, a critical aspect of fMRI data analysis often overlooked by existing methods. We first evaluate the performance of our method on simulated data and then examine its effectiveness in discriminating between well-matched music and speech stimuli. We also compared our procedures with other methods which use either L1-norm regularization alone or support vector machine-based feature elimination. On simulated data, our methods performed significantly better than existing methods across a wide range of contrast-to-noise ratios and feature prevalence rates. On experimental fMRI data, our methods were more effective in selectively isolating a distributed fronto-temporal network that distinguished between brain regions known to be involved in speech and music processing. These findings suggest that our method is not only computationally efficient, but it also achieves the twin objectives of identifying relevant discriminative brain regions and accurately classifying fMRI data.

Introduction

Multivariate pattern recognition (MPR) methods are rapidly becoming a popular tool for analyzing fMRI data (Cox and Savoy, 2003; De Martino et al., 2008; Haynes et al., 2007; Kriegeskorte et al., 2006; Mourao-Miranda et al., 2005; Pereira et al., 2009). These methods use fMRI data to detect activity patterns in brain regions that collectively discriminate one cognitive condition or participant group from another. Most fMRI studies that use MPR methods restrict the analysis to specific brain regions of interest (ROIs) (Cox and Savoy, 2003; Haynes et al., 2007); however, this approach is problematic if the ROIs are not known a priori. In these cases, a data-driven approach that incorporates multiple brain regions is desirable for several reasons. For one, it is possible that no single brain region can accurately discriminate between a given set of experimental stimuli, task conditions or participant groups, and simultaneously incorporating multiple brain regions may be necessary to describe the distributed networks subserving differential brain processes. Therefore, the MPR method used in fMRI data analysis should, ideally, consider activity patterns in all brain regions and identify, in an unbiased manner, the subset of regions that discriminates between experimental conditions. Hereafter, we refer to MPR methods that include activity patterns across the entire brain as “whole-brain classifiers.”

Designing a whole-brain classifier presents a number of technical challenges since the number of regions considered in the analysis of fMRI data (“features”) is large compared to the number of observations (trials or participants). Typically, this results in over-fitting of the data, leading to high classification accuracies for data used in designing the classifier, but poor classification accuracies for independent “test” data. Furthermore, a common characteristic of fMRI data is that the number of brain regions involved in a given cognitive task is typically small relative to the total number of brain regions. Selecting the brain regions that are most relevant in discriminating cognitive tasks or conditions overcomes the problem of over-fitting and improves the generalization performance of the classifier. Identifying these relevant regions is also critical for understanding which brain regions can discriminate between stimulus conditions. Taken together, the problem of whole-brain classification can be distilled to two key problems: (1) feature selection, or selection of only those relevant regions that discriminate between cognitive conditions, and (2) designing a classifier using these selected regions.

The problem of feature selection has been extensively studied by the machine learning community (Kohavi, 1997). The overall goal of feature selection is to identify subsets of features that are most useful in discriminating two or more conditions of interest. Existing methods for feature selection can be grouped into two categories: filter and wrapper (Guyon, 2003; Kohavi, 1997). In the filter strategy, features are selected independently of classification, and the selected features are then used in designing the classifier. The features are ranked based on univariate scores such as correlation or mutual information between a feature and an experimental manipulation. This strategy has been implemented in a number of fMRI studies (Haynes and Rees, 2005; Mitchell et al., 2004; Mourao-Miranda et al., 2006). A limitation of the filter strategy is that it applies only univariate measures and therefore does not consider the relationships between features while selecting them. This is a major limitation since fMRI data is inherently multivariate, with strong spatial correlation between neighboring voxels. Furthermore, this method does not consider classifier performance in selecting features. In contrast, the wrapper strategy selects the features that maximize the performance of the classifier. The selected features are then used in designing the classifier, as in the support vector machine-based recursive feature elimination algorithm (SVM-RFE) developed by Guyon et al. (2002) and Guyon (2003). This method has been applied for feature selection and classification of fMRI data by De Martino et al. (2008). A weakness of this approach is that the thresholds used to select features are arbitrary, and different datasets may require different threshold settings (De Martino et al., 2008).
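To make the filter strategy concrete, the following is a minimal sketch that ranks features by the absolute Pearson correlation between each feature and the class labels, and keeps the top k. It is an illustration only: the function names `pearson` and `filter_rank`, the toy data, and the choice of correlation as the univariate score are assumptions, not taken from any of the cited studies.

```python
import math

def pearson(a, b):
    """Pearson correlation between two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # a constant feature carries no information
    return cov / math.sqrt(var_a * var_b)

def filter_rank(X, y, k):
    """Indices of the k features with the largest |correlation| with y.

    Each feature is scored on its own, so relationships between
    features are ignored (the limitation noted in the text).
    """
    p = len(X[0])
    scores = [abs(pearson([row[j] for row in X], y)) for j in range(p)]
    return sorted(range(p), key=lambda j: scores[j], reverse=True)[:k]

# Toy data (hypothetical): 4 observations, 3 features.
X = [
    [1.0, 0.3, 5.0],
    [2.0, 0.1, 5.0],
    [3.0, 0.4, 5.0],
    [4.0, 0.2, 5.0],
]
y = [0, 0, 1, 1]
top = filter_rank(X, y, 1)
```

Because each score is computed one feature at a time, two strongly correlated voxels receive near-identical ranks regardless of whether they are jointly useful to a classifier, which is why the text treats this as a limitation for fMRI data.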

An alternative strategy was recently proposed to simultaneously address the problem of feature selection and classifier design (Krishnapuram et al., 2005; Tipping, 2001; Zou and Hastie, 2005). In this strategy, feature selection is included as part of the classifier design, ensuring efficient use of data and faster computation since the classifier does not need to be repeatedly trained during feature selection. In this approach, regularization is used to prevent over-fitting of the data and thereby improve generalizability of the classifier. Regularization-based approaches have been successfully applied to problems such as EEG/MEG source localization (Phillips et al., 2002), classification of multi-sensor EEG data (van Gerven et al., 2009) and gene selection in microarray data analysis (Zou and Hastie, 2005). Moreover, these approaches are well-suited for the analysis of fMRI data which, as mentioned earlier, is characterized by a large number of features and limited training data. SVM-based feature selection using L1, L2 or L0 regularization has also been proposed in the literature (Bi et al., 2003; Perkins et al., 2003; Weston et al., 2003).

Here, we present a novel method, LR12, based on logistic regression with a combination of L1 and L2 norm regularization to accurately estimate discriminative brain regions from whole-brain fMRI data. The use of L1 norm regularization results in sparse solutions, thereby helping in feature selection. However, when features are highly correlated, as in fMRI data, using L1 norm regularization alone selects only a subset of the relevant features. Using L2 norm regularization in addition to L1 helps in selecting all correlated and relevant voxels. Furthermore, our method uses a novel and fast component-wise update procedure to estimate discriminative brain regions; this procedure is used to maximize the logistic regression cost function that includes L1 and L2 norm regularization (Krishnapuram et al., 2005). The L1 norm and fast estimation procedure ensure rapid computation and a generalizable solution. The L2 norm provides additional benefit by including correlated brain regions in the solution, a critical step often overlooked by existing methods. We first evaluate the performance of our LR12 method on simulated data and then examine its effectiveness in discriminating between well-matched music and speech stimuli. We also compare our procedures with other logistic regression methods and SVM-RFE.
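As a rough illustration of how combined L1 and L2 penalties act on a logistic regression, the sketch below minimizes the mean logistic loss plus `lam1 * ||w||_1 + (lam2 / 2) * ||w||_2^2` with a plain proximal-gradient loop. This is not the component-wise update procedure described in the paper; it is a simpler, swapped-in optimizer chosen for brevity. The names `fit_lr_l1_l2`, `lam1`, `lam2`, `step` and `n_iter`, the omission of an intercept, the {0, 1} label coding and the toy data are all assumptions made for this example.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def soft_threshold(v, t):
    """Proximal operator of t * |v|: shrink v toward zero by t."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0

def fit_lr_l1_l2(X, y, lam1, lam2, step=0.1, n_iter=500):
    """Minimize mean logistic loss + lam1*||w||_1 + (lam2/2)*||w||_2^2.

    Proximal gradient: a gradient step on the smooth part (loss plus
    L2 term), followed by soft-thresholding for the L1 term.
    No intercept is fitted; labels are assumed to be 0 or 1.
    """
    n, p = len(X), len(X[0])
    w = [0.0] * p
    for _ in range(n_iter):
        # Residuals: predicted probability minus observed label.
        resid = [sigmoid(sum(wj * xj for wj, xj in zip(w, row))) - yi
                 for row, yi in zip(X, y)]
        w_next = []
        for j in range(p):
            grad = sum(r * row[j] for r, row in zip(resid, X)) / n + lam2 * w[j]
            w_next.append(soft_threshold(w[j] - step * grad, step * lam1))
        w = w_next
    return w

# Toy data (hypothetical): features 0 and 1 are identical and informative,
# feature 2 is unrelated to the labels.
X = [
    [1.0, 1.0, 0.1],
    [1.0, 1.0, -0.1],
    [-1.0, -1.0, 0.1],
    [-1.0, -1.0, -0.1],
]
y = [1, 1, 0, 0]
w = fit_lr_l1_l2(X, y, lam1=0.05, lam2=0.1)
w_sparse = fit_lr_l1_l2(X, y, lam1=1.0, lam2=0.1)
```

On this toy problem the uninformative third weight stays at zero while the two identical features share equal nonzero weights, and a sufficiently large `lam1` drives every weight to zero. With real, merely correlated features the balance between the two penalties would need to be tuned, for example by cross-validation.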

Logistic regression with regularization

Logistic regression fits a separating hyperplane, a linear function of the input features, between two conditions or classes. Here, we use the terms conditions and class labels interchangeably. Given a set of training data, the goal is (1) to estimate the hyperplane that accurately predicts the class label of a new example and (2) to identify a subset of the features that is most informative about the class distinction. Let $x = [x_1, x_2, \ldots, x_p]^T \in \mathbb{R}^p$ be a vector of input features (voxels) and y (y is

Results

We first compare the performance of LR12, LR12-UST, LR1, and SVM-RFE on simulated datasets by evaluating the sensitivity, false positive rate, and accuracy in feature selection, as well as the cross-validation accuracy, provided by each of these methods at various CNRs and feature prevalence rates. We then compare these methods on experimental data.
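The feature-selection measures named above can be computed from a ground-truth mask of relevant features and the mask of features a method selected. The sketch below uses the standard confusion-matrix definitions; the exact definitions used in the paper are not visible in this excerpt, so these formulas, the function name `selection_metrics` and the example masks are assumptions.

```python
def selection_metrics(true_mask, selected_mask):
    """Sensitivity, false positive rate and accuracy of feature selection.

    true_mask[j] is truthy if feature j is truly discriminative;
    selected_mask[j] is truthy if the method selected feature j.
    """
    pairs = list(zip(true_mask, selected_mask))
    tp = sum(1 for t, s in pairs if t and s)          # relevant, selected
    fn = sum(1 for t, s in pairs if t and not s)      # relevant, missed
    fp = sum(1 for t, s in pairs if not t and s)      # irrelevant, selected
    tn = sum(1 for t, s in pairs if not t and not s)  # irrelevant, rejected
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    false_positive_rate = fp / (fp + tn) if (fp + tn) else 0.0
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    return sensitivity, false_positive_rate, accuracy

# Hypothetical example: 5 features, 2 truly relevant, 2 selected.
true_mask = [1, 1, 0, 0, 0]
selected_mask = [1, 0, 1, 0, 0]
sens, fpr, acc = selection_metrics(true_mask, selected_mask)
```

In a simulation the true mask is known by construction, which is what makes these measures available there and not on experimental data.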

Discussion

We developed a novel whole-brain classification algorithm based on logistic regression for analysis of functional imaging data. Our LR12 method incorporates L1 and L2 norm regularization to achieve optimal feature selection in the presence of highly correlated features. This method provides three key improvements over existing methods: first, the LR12 method can be scaled to whole-brain analysis; second, the method provides a data-driven mechanism to eliminate voxels which do not discriminate

Conclusions

We developed a new method for whole-brain classification based on a combination of L1 and L2 norm regularization. Our method provides a completely data-driven and computationally efficient approach for both accurate feature selection and classification of whole-brain fMRI data. Critically, it does not require user-specified thresholds for feature selection, as the recursive feature elimination method does. In the case of fMRI data, where voxels are spatially correlated, the combination of L1 and L2

Acknowledgments

We thank Dr. Lucina Uddin, Dr. Elena Rykhlevskaia and Rohan Dixit for reviewing the manuscript and for their insightful comments. We also thank the reviewers for their comments and suggestions, which improved the manuscript. This research was supported by the National Institutes of Health (R01 HD047520, R01 HD045914, and NS058899), the National Science Foundation (BCS/DRL 0449927) and the Lucas Foundation.

References (38)

  • M. van Gerven et al. Interpreting single trial data using groupwise regularisation. NeuroImage (2009)
  • Z. Wang. A hybrid SVM-GLM approach for fMRI data analysis. NeuroImage (2009)
  • O. Yamashita et al. Sparse estimation automatically selects voxels relevant for the decoding of fMRI activity patterns. NeuroImage (2008)
  • Abrams, D.A., Bhatara, A.K., Ryali, S., Balaban, E., Levitin, D.J., Menon, V., submitted for publication. Music and...
  • J. Bi et al. Dimensionality reduction via sparse support vector machines. J. Mach. Learn. Res. (2003)
  • E. Formisano et al. “Who” is saying “what”? Brain-based decoding of human voice and speech. Science (2008)
  • A.D. Friederici et al. The role of left inferior frontal and superior temporal cortex in sentence comprehension: localizing syntactic and semantic processes. Cereb. Cortex (2003)
  • K.J. Friston et al. Movement-related effects in fMRI time-series. Magn. Reson. Med. (1996)
  • G.H. Glover et al. Self-navigated spiral fMRI: interleaved versus single-shot. Magn. Reson. Med. (1998)