Sparse logistic regression for whole-brain classification of fMRI data
Introduction
Multivariate pattern recognition (MPR) methods are rapidly becoming a popular tool for analyzing fMRI data (Cox and Savoy, 2003, De Martino et al., 2008, Haynes et al., 2007, Kriegeskorte et al., 2006, Mourao-Miranda et al., 2005, Pereira et al., 2009). These methods use fMRI data to detect activity patterns in brain regions that collectively discriminate one cognitive condition or participant group from another. Most fMRI studies that use MPR methods restrict the analysis to specific brain regions of interest (ROIs) (Cox and Savoy, 2003, Haynes et al., 2007); however, this approach is problematic if the ROIs are not known a priori. In these cases, a data-driven approach that incorporates multiple brain regions is desirable. It is possible that no single brain region can accurately discriminate among a given set of experimental stimuli, task conditions or participant groups, and simultaneously incorporating multiple brain regions may be necessary to describe the distributed networks subserving differential brain processes. Therefore, the MPR method used in fMRI data analysis should, ideally, consider activity patterns in all brain regions and identify, in an unbiased manner, the subset of regions that discriminates between experimental conditions. Hereafter, we refer to MPR methods that include activity patterns across the entire brain as “whole-brain classifiers.”
Designing a whole-brain classifier presents a number of technical challenges, since the number of regions considered in the analysis of fMRI data (“features”) is large compared to the number of observations (trials or participants). Typically, this results in over-fitting of the data, leading to high classification accuracies for the data used in designing the classifier but poor classification accuracies for independent “test” data. Furthermore, a common characteristic of fMRI data is that the number of brain regions involved in a given cognitive task is typically small relative to the total number of brain regions. Selecting the brain regions that are most relevant in discriminating cognitive tasks or conditions mitigates the problem of over-fitting and improves the generalization performance of the classifier. Identifying these relevant regions is also critical for understanding which brain regions can discriminate between stimulus conditions. Taken together, whole-brain classification can be distilled to two key problems: (1) feature selection, or selection of only those relevant regions that discriminate between cognitive conditions, and (2) designing a classifier using these selected regions.
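The over-fitting problem described above can be illustrated with a minimal sketch. It assumes purely synthetic noise data and an unregularized least-squares linear classifier (not the method of this paper); the sample sizes and variable names are hypothetical. With many more features than observations, the classifier fits random training labels perfectly, while its accuracy on independent test data is expected to stay near chance.

```python
import numpy as np

rng = np.random.default_rng(0)
n_train, n_test, n_features = 20, 200, 500  # far more features than observations

# Pure-noise features and random labels: there is nothing real to learn.
X_train = rng.standard_normal((n_train, n_features))
y_train = rng.choice([-1.0, 1.0], size=n_train)
X_test = rng.standard_normal((n_test, n_features))
y_test = rng.choice([-1.0, 1.0], size=n_test)

# Minimum-norm least-squares fit of a linear classifier, no regularization.
w, *_ = np.linalg.lstsq(X_train, y_train, rcond=None)

train_acc = np.mean(np.sign(X_train @ w) == y_train)
test_acc = np.mean(np.sign(X_test @ w) == y_test)
print(f"train accuracy: {train_acc:.2f}, test accuracy: {test_acc:.2f}")
```

Because the training set has fewer observations than features, the linear system can be solved exactly, which is why feature selection or regularization is needed before the training accuracy means anything.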
The problem of feature selection has been extensively studied by the machine learning community (Kohavi, 1997). The overall goal of feature selection is to identify subsets of features that are most useful in discriminating two or more conditions of interest. Existing methods for feature selection can be grouped into two categories: filter and wrapper (Guyon, 2003, Kohavi, 1997). In the filter strategy, features are selected independently of classification, and the selected features are then used in designing the classifier. The features are ranked based on univariate scores such as correlation or mutual information between a feature and an experimental manipulation. This strategy has been implemented in a number of fMRI studies (Haynes and Rees, 2005, Mitchell et al., 2004, Mourao-Miranda et al., 2006). A limitation of the filter strategy is that it applies only univariate measures and therefore does not consider the relationships between features while selecting them. This is a major limitation since fMRI data is inherently multivariate, with strong spatial correlation between neighboring voxels. Furthermore, this method does not consider classifier performance in selecting features. In contrast, the wrapper strategy selects the features that maximize the performance of the classifier. The selected features are then used in designing the classifier, as in the support vector machine-based recursive feature elimination algorithm (SVM-RFE) developed by Guyon et al. (2002) and Guyon (2003). This method has been applied for feature selection and classification of fMRI data by De Martino et al. (2008). A weakness of this approach is that the thresholds used to select features are arbitrary, and different datasets may require different threshold settings (De Martino et al., 2008).
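As a rough illustration of the filter strategy, the sketch below ranks features by the absolute Pearson correlation between each feature and the condition label and keeps the top k. The function name `filter_select`, the choice of k, and the synthetic data are assumptions made for this example; the cited studies may use other univariate scores.

```python
import numpy as np

def filter_select(X, y, k):
    """Return indices of the k features with the largest |Pearson r| with y."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    denom = np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc)
    # Guard against constant features, which have zero variance.
    r = (Xc.T @ yc) / np.where(denom == 0, 1.0, denom)
    return np.argsort(-np.abs(r))[:k]

rng = np.random.default_rng(1)
n, p = 100, 50
y = np.repeat([0.0, 1.0], n // 2)   # two conditions, 50 observations each
X = rng.standard_normal((n, p))
X[:, :3] += 2.0 * y[:, None]        # only the first three features carry signal

selected = filter_select(X, y, k=3)
print(sorted(selected.tolist()))
```

Each feature is scored on its own, so the ranking ignores correlations between features, and k must be set by the user; these are the limitations the filter strategy is criticized for above.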
An alternative strategy was recently proposed to simultaneously address the problems of feature selection and classifier design (Krishnapuram et al., 2005, Tipping, 2001, Zou and Hastie, 2005). In this strategy, feature selection is included as part of the classifier design, ensuring efficient use of data and faster computation since the classifier does not need to be repeatedly trained during feature selection. In this approach, regularization is used to prevent over-fitting of the data and thereby improve the generalizability of the classifier. Regularization-based approaches have been successfully applied to problems such as EEG/MEG source localization (Phillips et al., 2002), classification of multi-sensor EEG data (van Gerven et al., 2009) and gene selection in microarray data analysis (Zou and Hastie, 2005). Moreover, these approaches are well suited to the analysis of fMRI data which, as mentioned earlier, is characterized by a large number of features and limited training data. SVM-based feature selection using L1, L2 or L0 regularization has also been proposed in the literature (Bi et al., 2003, Perkins et al., 2003, Weston et al., 2003).
Here, we present a novel method, LR12, based on logistic regression with a combination of L1 and L2 norm regularization, to accurately estimate discriminative brain regions from whole-brain fMRI data. The use of L1 norm regularization results in sparse solutions, thereby helping in feature selection. However, when features are highly correlated, as in fMRI data, L1 norm regularization alone selects only a subset of the relevant features. Using L2 norm regularization in addition to L1 helps in selecting all correlated and relevant voxels. Furthermore, our method uses a novel and fast component-wise update procedure to estimate discriminative brain regions; this procedure is used to maximize the logistic regression cost function that includes L1 and L2 norm regularization (Krishnapuram et al., 2005). The L1 norm and the fast estimation procedure ensure rapid computation and a generalizable solution. The L2 norm provides additional benefit by including correlated brain regions in the solution, a critical step often overlooked by existing methods. We first evaluate the performance of our LR12 method on simulated data and then examine its effectiveness in discriminating between well-matched music and speech stimuli. We also compare our procedure with other logistic regression methods and SVM-RFE.
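The combination of L1 and L2 penalties can be sketched as follows. This is not the component-wise update procedure used by LR12: for brevity it uses plain proximal gradient descent (soft-thresholding after each gradient step) on a penalized logistic loss. The function names, the penalty weights `lam1` and `lam2`, and the synthetic data with five correlated "voxels" are all assumptions made for illustration.

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t * ||v||_1: shrink towards zero by t."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def fit_elastic_net_logistic(X, y, lam1, lam2, n_iter=2000):
    """Minimize mean logistic loss + lam1*||w||_1 + 0.5*lam2*||w||_2^2.

    X: (n, p) features; y: labels in {-1, +1}. Returns (w, b).
    """
    n, p = X.shape
    w, b = np.zeros(p), 0.0
    # Step size from an upper bound on the Lipschitz constant of the
    # gradient of the smooth part (logistic loss plus L2 penalty).
    lipschitz = 0.25 * (np.linalg.norm(X, 2) ** 2 / n + 1.0) + lam2
    step = 1.0 / lipschitz
    for _ in range(n_iter):
        margin = y * (X @ w + b)
        # d/dz log(1 + exp(-z)) = -1 / (1 + exp(z)), written with tanh
        # to avoid overflow for large |z|.
        g = -y * 0.5 * (1.0 - np.tanh(0.5 * margin))
        grad_w = X.T @ g / n + lam2 * w
        grad_b = g.mean()
        w = soft_threshold(w - step * grad_w, step * lam1)  # L1 handled here
        b = b - step * grad_b                               # bias is unpenalized
    return w, b

# Synthetic data: five highly correlated informative "voxels", the rest noise.
rng = np.random.default_rng(0)
n, p, n_corr = 80, 40, 5
y = np.repeat([-1.0, 1.0], n // 2)
latent = y + 0.5 * rng.standard_normal(n)
X = rng.standard_normal((n, p))
X[:, :n_corr] = latent[:, None] + 0.3 * rng.standard_normal((n, n_corr))

w_l1, b_l1 = fit_elastic_net_logistic(X, y, lam1=0.05, lam2=0.0)
w_en, b_en = fit_elastic_net_logistic(X, y, lam1=0.05, lam2=0.1)
train_acc = np.mean(np.sign(X @ w_en + b_en) == y)
print("informative voxels kept, L1 only:", np.count_nonzero(w_l1[:n_corr]))
print("informative voxels kept, L1 + L2:", np.count_nonzero(w_en[:n_corr]))
print("training accuracy, L1 + L2:", train_acc)
```

Setting `lam2=0.0` reduces the sketch to L1-only logistic regression, so the two fits can be compared directly; how many correlated voxels each fit retains depends on the data and on the penalty weights.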
Section snippets
Logistic regression with regularization
Logistic regression fits a separating hyperplane between two conditions or classes that is a linear function of the input features. Here, we use the terms conditions and class labels interchangeably. Given a set of training data, the goal is (1) to estimate the hyperplane that accurately predicts the class label of a new example and (2) to identify the subset of features that is most informative about the class distinction. Let $x = [x_1, x_2, \ldots, x_p]^T \in \mathbb{R}^p$ be a vector of input features (voxels) and y (y is
Results
We first compare the performance of LR12, LR12-UST, LR1, and SVM-RFE on simulated datasets by evaluating the sensitivity, false positive rate and accuracy of feature selection, as well as the cross-validation accuracy, provided by each of these methods at various contrast-to-noise ratios (CNRs) and feature prevalence rates. We then compare these methods on experimental data.
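The feature-selection measures named above can be computed from the known set of truly discriminative features in a simulation and the set selected by a method. The sketch below uses the standard definitions of sensitivity, false positive rate and accuracy; whether these match the exact definitions used in the evaluation is an assumption, and the function name and example masks are hypothetical.

```python
import numpy as np

def selection_metrics(true_mask, selected_mask):
    """Compare selected features against the ground-truth discriminative set.

    Both arguments are boolean arrays of length p (one entry per feature).
    Assumes at least one truly discriminative and one non-discriminative feature.
    """
    true_mask = np.asarray(true_mask, dtype=bool)
    selected_mask = np.asarray(selected_mask, dtype=bool)
    tp = np.sum(true_mask & selected_mask)      # relevant and selected
    fp = np.sum(~true_mask & selected_mask)     # irrelevant but selected
    fn = np.sum(true_mask & ~selected_mask)     # relevant but missed
    tn = np.sum(~true_mask & ~selected_mask)    # irrelevant and rejected
    return {
        "sensitivity": tp / (tp + fn),
        "false_positive_rate": fp / (fp + tn),
        "accuracy": (tp + tn) / true_mask.size,
    }

# Ten features, of which the first four are truly discriminative.
true_mask = np.array([1, 1, 1, 1, 0, 0, 0, 0, 0, 0], dtype=bool)
selected_mask = np.array([1, 1, 1, 0, 1, 0, 0, 0, 0, 0], dtype=bool)
metrics = selection_metrics(true_mask, selected_mask)
print(metrics)
```

In this example three of the four relevant features are recovered and one of the six irrelevant features is wrongly selected, giving a sensitivity of 0.75, a false positive rate of 1/6 and an accuracy of 0.8.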
Discussion
We developed a novel whole-brain classification algorithm based on logistic regression for the analysis of functional imaging data. Our LR12 method incorporates L1 and L2 norm regularization to achieve optimal feature selection in the presence of highly correlated features. This method provides three key improvements over existing methods: first, the LR12 method can be scaled to whole-brain analysis; second, the method provides a data-driven mechanism to eliminate voxels which do not discriminate
Conclusions
We developed a new method for whole-brain classification based on a combination of L1 and L2 norm regularization. Our method provides a completely data-driven and computationally efficient approach for both accurate feature selection and classification of whole-brain fMRI data. Critically, it does not require user-specified thresholds for feature selection, as the recursive feature elimination method does. In the case of fMRI data, where voxels are spatially correlated, the combination of L1 and L2
Acknowledgments
We thank Dr. Lucina Uddin, Dr. Elena Rykhlevskaia and Rohan Dixit for reviewing the manuscript and for their insightful comments. We also thank the reviewers for their comments and suggestions, which improved the manuscript. This research was supported by the National Institutes of Health (R01 HD047520, R01 HD045914, and NS058899), the National Science Foundation (BCS/DRL 0449927) and the Lucas Foundation.
References (38)
- et al. Prediction and interpretation of distributed neural activity with sparse models. NeuroImage (2009).
- et al. Functional magnetic resonance imaging (fMRI) “brain reading”: detecting and classifying distributed patterns of fMRI activity in human visual cortex. NeuroImage (2003).
- et al. Combining multivariate voxel selection and support vector machines for mapping and classification of fMRI spatial patterns. NeuroImage (2008).
- et al. Reading hidden intentions in the human brain. Curr. Biol. (2007).
- et al. Bach speaks: a cortical “language-network” serves the processing of music. NeuroImage (2002).
- Wrappers for feature selection. Artif. Intell. (1997).
- et al. Musical structure is processed in “language” areas of the brain: a possible role for Brodmann Area 47 in temporal coherence. NeuroImage (2003).
- et al. Classifying brain states and determining the discriminating activation patterns: Support Vector Machine on functional MRI data. NeuroImage (2005).
- et al. The impact of temporal compression and space selection on SVM analysis of single-subject and multi-subject fMRI data. NeuroImage (2006).
- et al. Systematic regularization of linear inverse solutions of the EEG source localization problem. NeuroImage (2002).