Pattern recognition methods and applications in biomedical magnetic resonance

https://doi.org/10.1016/S0079-6565(00)00036-4

Introduction

High-resolution one- and multi-dimensional NMR spectra measured on biological samples can be extremely complex. For example, 1H NMR spectra of pure proteins in solution, of biofluids and of tissue extracts contain many thousands of resonances. Multi-dimensional NMR spectra of such systems additionally provide a great deal of spin-connectivity information, and thus further complexity. In many cases, visual inspection of such spectra reveals only a small percentage of the information available in the data. For this reason there has been a move towards computer-based methods for extracting the maximum information from such complex spectra. This article is concerned with the application of such multivariate statistical methods, sometimes termed chemometrics or bio-informatics, to the extraction of information from high-resolution NMR spectra of biological samples. There are also many examples in the literature of the use of computer-based methods for enhancing information recovery from NMR spectra of proteins, for assigning chemical structures from NMR spectra and for interpreting MR images. Given the diversity of these topics, this article will not cover the first two areas but will provide a brief overview of some important MRI studies with literature citations. The bulk of the article describes chemometric techniques and their application to extracting meaningful information from complex 1H NMR spectra, based on studies of biological fluids, tissue extracts, intact tissues and food substances.

Statistical methods have been applied to NMR data for many years, and some pioneering work on automatic identification of structural fragments from NMR spectra was carried out in the DENDRAL project in the 1960s. However, the first concerted application of multivariate statistics to high-resolution NMR spectra in the biomedical area was the classification of 1H NMR spectra of rat urine samples according to the type of organ toxin that had been administered to the animals, thus providing a fast, non-invasive way of assessing candidate drug toxicity [1], [2]. As will be evident in this article, this approach has now been extended significantly. The term metabonomics has recently been coined to describe it, by analogy with genomics and proteomics, and this requires some explanation.

Recently, there has been a huge effort to determine the human genetic sequence (the Human Genome Project), and its essential completion has been announced. The complete genomes of a number of lower organisms, such as infective pathogens, have already been published, and this is likely to lead to a revolution in the search for new drugs. The whole science of genomics has consequently burgeoned. Identifying which genes are up- or down-regulated in a disease process or by a drug might lead to new insights into drug discovery or drug safety assessment. Each gene codes for a specific protein, and the focus in molecular biology is now turning to the identification of all of the proteins in an organism, the proteome. This science of proteomics, which is generally based on two-dimensional (2D) gel electrophoresis for separation of the proteins followed by mass spectrometry (MS) for their identification and quantitation, is at an earlier stage of development than genomics. At present, this process is slow, labour-intensive and expensive, and its ability to identify relevant processes is uncertain.

However, many such proteins are enzymes, and every enzyme in an organism is involved in the formation or reaction of a myriad of small molecule metabolites and so a third class of information to complement the genome and the proteome can be conceived. We have coined the term metabonomics, defined as the quantitative measurement of the dynamic multiparametric metabolic response of living systems to pathophysiological stimuli or genetic modification, to describe the science of studying the dynamic and time-dependent profile of metabolites within an organism and how they are altered by some biological process [3]. As will become clear, currently the only effective way to extract meaningful metabonomic information is by combining high resolution NMR spectroscopy of biofluids, cells and tissues with multivariate statistical methods, termed collectively here for brevity as pattern recognition (PR). The concept of metabonomics has arisen from our application of 1H NMR spectroscopy to study the multi-component metabolic composition of biofluids, cells and tissues and from studies utilising PR, expert systems and related bio-informatic tools to interpret and classify complex NMR-generated metabolic data sets [1], [2], [3], [4], [5].

A related area is metabolic control analysis [6], [7], which includes the concept of the “metabolome”, defined as the total small-molecule complement of a cell. Since the metabolic profile of an organism fluctuates rapidly to maintain homeostasis, the concept of a fixed metabolome may not be useful. By contrast, metabonomics deals with detecting, identifying, quantitating and cataloguing the history of time-related metabolic changes in an integrated biological system rather than in the individual cell. Such multi-dimensional metabolic trajectories can then be related to the biological events in an ongoing pathophysiological process.

Genomics, proteomics and metabonomics (bionomics)

As introduced above, one of the intellectual products of the molecular biology revolution has been the concept of genomics, which is basically a semi-quantitative approach to the measurement of gene expression. The genomic approach involves the observation of differential gene expression as a result of genetic modification, disease or xenobiotic toxicity (a xenobiotic is a foreign compound, e.g. a drug). The technology involves new generations of proprietary “gene chips”, small, disposable but

Sample handling and spectral processing

Flow-injection NMR probes have become widely available as a result of the need for high throughput in combinatorial chemistry studies, but the technique is also ideally suited to biofluid NMR spectroscopy. The approach is now used routinely; compared with conventional automatic operation using 5 mm NMR tubes and an automatic sample changer, it results in a significantly increased rate of sample throughput, requires minimal spectrometer optimisation such as magnetic field

Training and prediction

In this approach, a “training set” of sample data is used to build a statistical model which classifies the samples correctly. This predictive model is then evaluated using an independent “validation set” of sample data. Supervised methods can use a reduced data dimensionality such as principal components (PCs), but the methods usually involve all of the descriptor dimensions, choosing those that are most appropriate. The methods fall into two main types: those which predict a class and those which return a
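The training/validation workflow described above can be illustrated with a minimal sketch. It is not the method used in any of the cited studies: it assumes a simple nearest-centroid classifier, and the data are synthetic four-variable "spectra" generated by a hypothetical helper, make_synthetic, with made-up class names ("control", "dosed"). The point is only to show a model being built on one set of samples and then scored on an independent set.

```python
import random
from statistics import fmean


def centroid(rows):
    """Column-wise mean of a list of equal-length feature vectors."""
    return [fmean(col) for col in zip(*rows)]


def fit_nearest_centroid(samples, labels):
    """Build the model: one mean profile (centroid) per class."""
    by_class = {}
    for x, y in zip(samples, labels):
        by_class.setdefault(y, []).append(x)
    return {y: centroid(rows) for y, rows in by_class.items()}


def predict(model, x):
    """Assign x to the class with the closest centroid (squared Euclidean)."""
    def dist2(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(model, key=lambda y: dist2(model[y]))


def make_synthetic(n_per_class, rng):
    """Hypothetical stand-in for binned spectra: 4 variables, 2 classes."""
    profiles = {
        "control": [1.0, 1.0, 1.0, 1.0],
        "dosed": [1.0, 3.0, 0.2, 1.0],
    }
    samples, labels = [], []
    for label, profile in profiles.items():
        for _ in range(n_per_class):
            # Add small Gaussian noise to each variable of the class profile.
            samples.append([v + rng.gauss(0.0, 0.1) for v in profile])
            labels.append(label)
    return samples, labels


rng = random.Random(0)
train_x, train_y = make_synthetic(20, rng)  # training set
valid_x, valid_y = make_synthetic(10, rng)  # independent validation set

# Build the model on the training set only.
model = fit_nearest_centroid(train_x, train_y)

# Evaluate on samples the model has never seen.
hits = sum(predict(model, x) == y for x, y in zip(valid_x, valid_y))
accuracy = hits / len(valid_y)
print(f"validation accuracy: {accuracy:.2f}")
```

Keeping the validation samples out of the model-building step is what makes the reported accuracy an estimate of predictive performance rather than of goodness of fit. A classifier of this kind returns a class label only.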

Tissue extracts

Howells et al. [65] studied 58 1H NMR spectra from perchloric acid extracts of three normal tissues (liver, kidney and spleen) and five rat tumours (GH3 pituitary, fibrosarcoma, Morris hepatomas 7777 and 9618a, and Walker carcinosarcoma). Instead of quantifying individual metabolites, they used statistical PR techniques to classify the spectra into groups. This differentiated spectra from normal and malignant rat tissue biopsies, and from different types of cancer. It was suggested that this

Applications of NMR-PR to in vivo NMR spectra

One of the earliest papers on using PR methods on in vivo NMR spectra was that by Morvan et al. on the 31P NMR spectra of muscle in myopathies [74]. PR has also been applied to the analysis of in vivo 31P NMR spectra of tumours [75]. Using four different classes of tumour and three types of normal tissue, cluster analysis and artificial neural networks were successful in separating and classifying the majority of samples analysed. Although the phosphomonoester and inorganic phosphate peaks

Method development and evaluation

A review has been published [81] which highlights various MRI segmentation algorithms that use PR methods. Techniques

Applications of NMR-PR to foods

Applications in the food and related industries have mainly used 2H NMR spectroscopy, site-specific natural isotopic fractionation studied by nuclear magnetic resonance (SNIF-NMR), isotope ratio analysis using MS and elemental trace analysis, combined with interpretation using principal components analysis (PCA) and linear discriminant analysis (LDA). Areas of application include wine and wine constituents, fruit juices, caffeine and plant oils. The whole field has been reviewed [95], [96].

Stable isotope analysis of grape juices and fermented products

Concluding remarks

It is clear that the complexity of NMR spectra of biological systems can be regarded as a difficulty in relation to their interpretation, but this challenge, if successfully overcome, offers tremendous opportunities for increased understanding of biological processes. It appears that a promising approach for releasing the information in such complex data sets lies in the power of computer PR algorithms. Thus the combination of NMR spectroscopy with PR methods promises to be a major way

References (113)

  • K.P.R. Gartland et al., J. Pharm. Biomed. Anal. (1990)
  • J.C. Lindon et al., Annu. Rep. NMR Spectrosc. (1999)
  • D. Moka et al., J. Pharm. Biomed. Anal. (1998)
  • T.R. Brown et al., J. Magn. Reson. B (1996)
  • R.D. Farrant et al., J. Pharm. Biomed. Anal. (1992)
  • B.R. Kowalski et al., Pattern Recogn. (1976)
  • S. Wold, Pattern Recogn. (1976)
  • M. Clark et al., Quant. Struct.-Act. Relat. (1993)
  • A.C. Schulte et al., J. Magn. Reson. (1997)
  • M. Spraul et al., J. Pharm. Biomed. Anal. (1994)
  • C.L. Gavaghan et al., FEBS Lett. (2000)
  • E. Holmes et al., Anal. Biochem. (1994)
  • E. Holmes et al., Chemom. Intell. Lab. Syst. (1998)
  • M.L. Anthony et al., J. Pharm. Biomed. Anal. (1995)
  • S. Kruse et al., Chemom. Intell. Lab. Syst. (1991)
  • R. Stoyanova et al., J. Magn. Reson. (1995)
  • V. Digesu et al., Pattern Recogn. Lett. (1994)
  • L.P. Clarke et al., Magn. Reson. Imaging (1993)
  • M. Vaidyanathan et al., Magn. Reson. Imaging (1995)
  • M. Vaidyanathan et al., Magn. Reson. Imaging (1997)
  • W.E. Phillips et al., Magn. Reson. Imaging (1995)
  • M. Vaidyanathan et al., Magn. Reson. Imaging (1997)
  • M.B. Merickel et al., Comput. Med. Imaging Graph. (1991)
  • K.P.R. Gartland et al., Mol. Pharmacol. (1991)
  • J.K. Nicholson et al., Xenobiotica (1999)
  • J.C. Lindon et al., Concepts Magn. Reson. (2000)
  • H. Kacser et al., Rate control of biological processes
  • R. Goodacre et al., Cytotechnology (1996)
  • B. Sinclair, The Scientist (1999)
  • M.J. Geisow, Nature Biotechnol. (1998)
  • N.L. Anderson et al., Toxicol. Pathol. (1996)
  • L. Aicher et al., Electrophoresis (1998)
  • W. El-Deredy, NMR Biomed. (1997)
  • G. Hagberg, NMR Biomed. (1998)
  • E. Holmes et al., Current Opinions in Drug Development (2000)
  • M.A. Sharaf et al., Chemometrics (1986)
  • R.L. Somorjai, Multivariate statistical methods
  • M. Spraul et al., Anal. Commun. (1997)
  • J.T.W.E. Vogels et al., J. Chemom. (1996)
  • W. Lehnert et al., Eur. J. Paediatr. (1986)
  • R.A. Wevers et al., Clin. Chem. (1995)
  • B.R. Kowalski et al., J. Am. Chem. Soc. (1972)
  • B.R. Kowalski et al., J. Am. Chem. Soc. (1973)
  • H. Wold, Estimation of principal components and related models by iterative least squares
  • V.S. Rose et al., Quant. Struct.-Act. Relat. (1991)
  • J.A. Hartigan, Clustering Algorithms (1975)
  • H. Wold
  • K.G. Joreskog et al., Systems under Indirect Observation (1982)
  • I.E. Frank et al., J. Chem. Inf. Comput. Sci. (1984)
  • N.J. Nilsson, Learning Machines (1965)