
NeuroImage

Volume 56, Issue 1, 1 May 2011, Pages 185-196

Fast and robust extraction of hippocampus from MR images for diagnostics of Alzheimer's disease

https://doi.org/10.1016/j.neuroimage.2011.01.062

Abstract

Assessment of temporal lobe atrophy from magnetic resonance images is part of clinical guidelines for the diagnosis of prodromal Alzheimer's disease. As the hippocampus is known to be among the first areas affected by the disease, fast and robust measurement of hippocampal volume would be of great importance in clinical decision making. We propose a method for automatically computing the volume of the hippocampus using a modified multi-atlas segmentation framework that includes an improved initialization and a correction for the partial volume effect. The method produced a high similarity index, 0.87, and correlation coefficient, 0.94, with semi-automatically generated segmentations. When comparing hippocampal volumes extracted from 1.5 T and 3 T images, the absolute difference was low: 3.2% of the volume. The correct classification rate for Alzheimer's disease and cognitively normal cases was about 80%, while an accuracy of 65% was obtained for classifying stable and progressive mild cognitive impairment cases. The method was evaluated in three cohorts consisting altogether of about 1000 cases, with the main emphasis on the analysis of the ADNI cohort. The computation time of the method is about 2 minutes on a standard laptop computer. The results show a clear potential for applying the method in clinical practice.

Research highlights

► Automatic segmentation provides a reliable estimate of hippocampus volume in MRI. ► Multi-atlas segmentation of hippocampus is computed in 2 minutes (laptop computer). ► Partial volume correction improves the classification accuracy.

Introduction

In current guidelines (Dubois et al., 2007), the diagnostic criteria for probable Alzheimer's disease (AD) require the presence of both impairment in episodic memory and at least one supportive feature: medial temporal lobe atrophy, an abnormal cerebrospinal fluid (CSF) biomarker, a specific pattern in PET, or a proven AD autosomal dominant mutation. In addition, the guidelines specify a list of exclusion criteria. Similar components can also be found in the recent EFNS guideline (Waldemar et al., 2007). The revision of the criteria for AD, mild cognitive impairment (MCI) and preclinical AD is also ongoing and will place further emphasis on biomarkers and imaging.

In the medial temporal lobe (MTL), volume loss in the hippocampi, entorhinal cortex and amygdala is a hallmark of AD. The guidelines (Dubois et al., 2007) suggest that the volume loss is “evidenced on MRI with qualitative ratings using visual scoring”. Qualitative and subjective ratings may, however, lead to different results between interpreters, and the diagnosis made even by a single interpreter may vary when images are re-examined. Therefore, there is a clear need for objective methods for the assessment of hippocampal volume. Although automated tools are being actively developed in many research groups, developing robust, accurate and fast automatic methods is a highly challenging problem, and automatic methods are still very much lacking in clinical practice.

Several methods have been published for segmenting the hippocampus (Chupin et al., 2009a, Chupin et al., 2009b, Fischl et al., 2002, Lötjönen et al., 2010, Morra et al., 2008, van der Lijn et al., 2008, Wolz et al., 2010a). All these methods segment the hippocampus as a whole, although in reality it contains sub-structures. However, accurate segmentation of these sub-structures is difficult in most images currently available in clinical practice. We therefore concentrate in this work on segmenting the hippocampus as a single structure. One of the main objectives of this work is to develop tools for clinical decision making.

Although many published methods are promising, some room for interpretation remains in accuracy, robustness or computational speed. First, there is no true gold standard for defining segmentation accuracy. Currently, manual segmentations by clinical experts represent the clinical gold standard for hippocampal segmentation. Therefore, if the difference between automatically and manually generated segmentations equals the difference between two manual segmentations, the automatic segmentation is typically considered as accurate as the manual segmentation. There are numerous measures characterizing segmentation accuracy: overlap between manually and automatically generated segmentations, such as the Dice similarity index or recall and precision values; distances between object surfaces; differences in object volumes; or differences in the ability to classify a subject into the correct group. Classification accuracy is an important measure if the ultimate goal is to use a biomarker in diagnostics, but it reflects the robustness of segmentation rather than segmentation accuracy as such. For example, if an automatic method is consistent but systematically overestimates the volume, i.e., the measure is biased, the accuracy of the segmentation is obviously decreased. This systematic and consistent error does not, however, affect classification accuracy or the ability to detect a statistical difference between two populations. A less robust or consistent algorithm, in contrast, introduces noise into the measurements and thus makes classification less accurate. In diagnostics, the consistency of segmentation is therefore even more important than ensuring that the segmentation is unbiased.
As there are different guidelines for manual segmentation of the hippocampus, even the clinical gold standards are biased relative to each other; efforts to harmonize these guidelines are ongoing (Boccardi et al., 2010). All these indicators may lead to conflicting interpretations, making the evaluation of results sometimes cumbersome. Second, methods are often validated using a relatively small database or otherwise constrained data, e.g., from a single site or a single scanner manufacturer. A clear problem in evaluating accuracy is the limited number of manually segmented cases available, because producing a representative set of manual segmentations is highly laborious. These issues make an extensive evaluation of robustness in real clinical conditions difficult. Third, the computation time of a segmentation method is not considered in many scientific publications, although it is a relevant issue in clinical practice. Computation times of hours, the requirement for special computer facilities, or the need for careful and laborious parameter tuning all decrease the feasibility of a method in the clinical setting. In summary, demonstrating the usefulness of a method for clinical practice is a laborious task and still often leaves some room for interpretation.
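The overlap measures mentioned above are straightforward to compute on binary segmentation masks. The following sketch (not part of the original paper's software) shows the Dice similarity index together with recall and precision, which illustrates why Dice is sensitive to both over- and under-segmentation:

```python
import numpy as np

def overlap_measures(auto_seg, manual_seg):
    """Compute Dice similarity index, recall and precision between two
    binary segmentation masks given as boolean numpy arrays of equal shape."""
    a = np.asarray(auto_seg, dtype=bool)
    m = np.asarray(manual_seg, dtype=bool)
    intersection = np.logical_and(a, m).sum()
    dice = 2.0 * intersection / (a.sum() + m.sum())
    recall = intersection / m.sum()       # fraction of the manual label recovered
    precision = intersection / a.sum()    # fraction of the automatic label correct
    return dice, recall, precision
```

Note that a consistent but biased method, as discussed above, shifts precision and recall in opposite directions while its volume estimates remain perfectly usable for group separation.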

Atlas-based segmentation is a commonly used technique for segmenting image data. In atlas-based segmentation, an intensity template is registered non-rigidly to an unseen image, and the resulting transformation is used to propagate tissue-class or anatomical-structure labels of the template into the space of the unseen image. The segmentation accuracy can be improved considerably by combining basic atlas-based segmentation with techniques from machine learning, e.g. classifier fusion (Heckemann et al., 2006, Klein et al., 2005, Rohlfing et al., 2004, Warfield et al., 2004). In this approach, several atlases from different subjects are registered to the unseen data. The label that the majority of the warped label images predict for each voxel is used for the final segmentation of the unseen image. This multi-atlas segmentation was shown to produce the best segmentation accuracy for subcortical structures in a comparison study (Babalola et al., 2008). However, the major drawback of multi-atlas segmentation is that it is computationally expensive; for example, van der Lijn et al. (2008) reported computation times of several hours.
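The classifier-fusion step described above reduces, in its simplest majority-voting form, to counting label votes per voxel. A minimal sketch (an illustration of the general technique, not the authors' implementation) could look as follows:

```python
import numpy as np

def majority_vote_fusion(warped_labels):
    """Fuse label images propagated from several atlases by majority voting.

    `warped_labels` is a list of integer label arrays of equal shape, one per
    atlas, already warped into the space of the unseen image. Each voxel of
    the result receives the label predicted by the most atlases."""
    stacked = np.stack(warped_labels, axis=0)        # shape: (n_atlases, ...)
    n_labels = int(stacked.max()) + 1
    # Count, for every voxel, how many atlases vote for each label.
    votes = np.stack([(stacked == k).sum(axis=0) for k in range(n_labels)],
                     axis=0)
    return votes.argmax(axis=0)                      # winning label per voxel
```

Per-atlas weighting or intensity-based refinement (as in the graph-cut and EM variants cited below) can be layered on top of this basic vote.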

In (Lötjönen et al., 2010), we recently presented a method for fast and robust multi-atlas segmentation of volumetric image data. The tool was based on a fast non-rigid registration algorithm, atlas selection, and the use of intensity information via graph-cut or expectation maximisation (EM) algorithms. Atlas selection and intensity modeling significantly improved the segmentation accuracy. The computation time for segmenting the hippocampus was 3–4 minutes using an 8-core workstation. This was clearly shorter than in many published methods and is no longer a limiting factor in many applications. However, an even shorter computation time would make online segmentation more attractive in clinical practice and allow more freedom in planning clinical workflows. Other requirements for clinical use are that no manual tuning of segmentation parameters should be needed, and that complex and expensive computer facilities and maintenance should not be required. In this work, we propose two major methodological contributions to our previously published method: 1) the use of an intermediate template space between the unseen data and the atlas spaces to speed up computation, and 2) the use of partial volume modeling in segmenting the hippocampus to improve classification accuracy.

In (Lötjönen et al., 2010), atlas selection was performed first: the unseen data and all atlases were registered non-rigidly to a template, and the atlases most similar to the unseen data were selected. Then, multi-atlas segmentation was applied: each of the selected atlases was registered separately and non-rigidly to the unseen data, and classifier fusion was performed. The innovation of our current work is that the transformations computed in the atlas-selection step are used to initialize the transformations when registering the atlases to the unseen data. The process becomes much faster because only a small refinement of the transformations from the atlases to the unseen data is needed. The intermediate template space, used in our atlas-selection step, has previously been utilized to speed up and improve the accuracy of non-rigid registration by Tang et al. (2010), using an initialization based on principal component analysis, and by Rohlfing et al. (2009), using subject-specific templates generated by a regression model.

The volume of the hippocampus is typically 1–3 ml in elderly subjects, including Alzheimer's disease cases. In a typical clinical setting, the voxel size of MR images is around 1 × 1 × 1 mm³, which means that the hippocampus is represented by only 1000–3000 voxels. Up to 80–90% of these voxels lie on the surface of the object, which means that the partial volume effect may dramatically affect the volume estimate. Multiple approaches have been published for estimating partial volume effects in the EM framework (Acosta et al., 2009, Shattuck et al., 2001, Tohka et al., 2004). In this work, we used the method proposed by Tohka et al. (2004).
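The core idea behind these partial volume models can be illustrated with the noise-free two-tissue mixing case: a boundary voxel's intensity is a convex combination of the two pure-tissue means, and the mixing fraction recovers the sub-voxel volume contribution. The sketch below illustrates only this basic mixel idea, not the statistical estimator of Tohka et al. (2004), which additionally models noise:

```python
def tissue_fraction(intensity, mean_a, mean_b):
    """Estimate the fraction f of tissue A in a voxel modeled as a
    noise-free linear mix of two pure tissues:
        intensity = f * mean_a + (1 - f) * mean_b
    The fraction is clamped to the physically meaningful range [0, 1]."""
    f = (intensity - mean_b) / (mean_a - mean_b)
    return min(max(f, 0.0), 1.0)

def pv_corrected_volume(intensities, mean_tissue, mean_background, voxel_ml):
    """Sum per-voxel tissue fractions over a list of boundary-voxel
    intensities and convert to millilitres using the voxel volume."""
    return voxel_ml * sum(tissue_fraction(i, mean_tissue, mean_background)
                          for i in intensities)
```

With 80–90% of hippocampal voxels on the boundary, counting each boundary voxel as all-or-nothing can easily shift a 1–3 ml volume estimate by a clinically relevant margin, which is why the fractional correction matters.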

In addition to the methodological contributions, we demonstrate, using large data cohorts, the performance of automatically computed hippocampal volumes 1) in the diagnostics of Alzheimer's disease and 2) in comparison with semi-automatically generated volumes. Data from almost 1000 cases originating from three different patient cohorts are used. For comparison, only 60 cases were used in our previous paper (Lötjönen et al., 2010).

In this article, we first introduce a method utilizing the template space to speed up computation and an approach for modeling the partial volume effect. Thereafter, the data used and the experiments performed are described. Finally, the results are shown and discussed.

Section snippets

Classification based on multi-atlas segmentation

Fig. 1 summarizes our multi-atlas segmentation pipeline (Lötjönen et al., 2010) including also the contributions made in this work (indicated by the blue text). Step 1: Both unseen data and atlases are registered non-rigidly to a template. The atlases most similar to the unseen data, measured by normalized mutual information in the template space, are selected to be used in the next step. Step 2: Non-rigid transformations between the unseen data and the selected atlases are computed. Our
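The atlas-selection criterion in Step 1 — ranking atlases by their normalized mutual information (NMI) with the unseen image in the template space — can be sketched as follows. This is an illustrative histogram-based NMI estimate, not the paper's registration software; images are assumed to be already resampled into a common template space:

```python
import numpy as np

def normalized_mutual_information(x, y, bins=32):
    """NMI = (H(X) + H(Y)) / H(X, Y), estimated from a joint histogram
    of two images given as numpy arrays of equal size."""
    joint, _, _ = np.histogram2d(x.ravel(), y.ravel(), bins=bins)
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1)
    py = pxy.sum(axis=0)

    def entropy(p):
        p = p[p > 0]                      # ignore empty histogram bins
        return -(p * np.log(p)).sum()

    return (entropy(px) + entropy(py)) / entropy(pxy)

def select_atlases(unseen, atlases, n_select):
    """Rank atlases (registered to the template space) by their NMI with
    the unseen image and keep the n_select most similar ones."""
    scores = [normalized_mutual_information(unseen, a) for a in atlases]
    order = np.argsort(scores)[::-1]      # highest similarity first
    return [atlases[i] for i in order[:n_select]]
```

NMI ranges from 1 (independent images) to 2 (identical intensity distributions), so ranking by it directly yields the most similar atlases for Step 2.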

ADNI cohort

Table 2 shows the similarity index and its standard deviation, the intra-class correlation coefficient of the volumes, and the computation times for two computers: an 8-core workstation (Intel Xeon E5420 @ 2.50 GHz) and a dual-core laptop (Intel Core2 Duo P8600 @ 2.4 GHz). When compared with the inter-rater accuracy reported in four publications, our method gives comparable results. The computation times are also in a clinically acceptable range.

There is no threshold for the

Discussion

In this work, we proposed and validated a method for automatic segmentation of the hippocampus from MR images. Our final objective is to develop a tool for supporting decision making in real clinical conditions. A segmentation tool must be accurate, robust and fast enough to be attractive in clinical practice. Our preliminary analysis shows that it is possible to generate fully automatic segmentations whose accuracy corresponds to semi-automatic segmentation, and the computation time is

Acknowledgments

This work was partially funded under the 7th Framework Programme by the European Commission (http://cordis.europa.eu/ist; EU-Grant-224328-PredictAD; Name: From Patient Data to Personalized Healthcare in Alzheimer's Disease).

Data collection and sharing for this project was funded by the Alzheimer's Disease Neuroimaging Initiative (ADNI) (National Institutes of Health Grant U01 AG024904). ADNI is funded by the National Institute on Aging, the National Institute of Biomedical Imaging and

References (35)

  • T. Rohlfing et al.

    Evaluation of atlas selection strategies for atlas-based image segmentation with application to confocal microscopy images of bee brain

    Neuroimage

    (2004)
  • D. Shattuck et al.

    Magnetic resonance image tissue classification using a partial volume model

    Neuroimage

    (2001)
  • J. Tohka et al.

    Fast and robust parameter estimation for statistical partial volume models in brain MRI

    Neuroimage

    (2004)
  • F. van der Lijn et al.

    Hippocampus segmentation in MR images using atlas registration, voxel classification and graph cuts

    Neuroimage

    (2008)
  • R. Wolz et al.

    LEAP: Learning embeddings for atlas propagation

    Neuroimage

    (2010)
  • R. Wolz et al.

    Measurement of hippocampal atrophy using 4D graph-cut segmentation: application to ADNI

    Neuroimage

    (2010)
  • K.O. Babalola et al.

    Comparison and evaluation of segmentation techniques for subcortical structures in brain MRI

    Med. Image Comput. Comput. Assist. Interv. MICCAI

    (2008)
1 Data used in preparation of this article were obtained from the Alzheimer's Disease Neuroimaging Initiative (ADNI) database (www.loni.ucla.edu/ADNI). As such, the investigators within the ADNI contributed to the design and implementation of ADNI and/or provided data but did not participate in the analysis or writing of this report. A complete listing of ADNI investigators can be found at: www.loni.ucla.edu\ADNI\Collaboration\ADNI_Authorship_list.pdf.
