Original Article
Correspondence analysis is a useful tool to uncover the relationships among categorical variables

https://doi.org/10.1016/j.jclinepi.2009.08.008

Abstract

Objective

Correspondence analysis (CA) is a multivariate graphical technique designed to explore the relationships among categorical variables. Epidemiologists frequently collect data on multiple categorical variables with the goal of examining associations among these variables. Nevertheless, CA appears to be an underused technique in epidemiology. The objective of this article is to present the utility of CA in an epidemiological context.

Study Design and Setting

The theory and interpretation of CA for two variables and for more than two variables are illustrated through two examples.

Results

The output of CA is a graphical display of the rows and columns of a contingency table, designed to permit visualization of the salient relationships among the variable response categories in a low-dimensional space. Such a representation gives a more global picture of the relationships among row–column pairs, which pairwise analyses alone would not detect.
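
To make the idea of a low-dimensional display concrete, here is a minimal sketch of simple CA in Python with NumPy. It assumes the standard textbook formulation, in which the matrix of standardized residuals of the table is decomposed by a singular value decomposition (SVD), and it is not the authors' implementation. The function name `correspondence_analysis` and the 3 × 3 table of counts are hypothetical illustrations. The signs of SVD axes are arbitrary, so a map produced this way may appear mirrored relative to another program's output.

```python
import numpy as np

def correspondence_analysis(table):
    """Simple CA of a two-way contingency table (hypothetical helper).

    Returns row principal coordinates F, column principal coordinates G,
    and the principal inertia of each dimension.
    """
    N = np.asarray(table, dtype=float)
    P = N / N.sum()                      # correspondence matrix
    r = P.sum(axis=1)                    # row masses
    c = P.sum(axis=0)                    # column masses
    # Standardized residuals: (p_ij - r_i * c_j) / sqrt(r_i * c_j)
    S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
    U, sv, Vt = np.linalg.svd(S, full_matrices=False)
    # Principal coordinates: scale singular vectors by 1/sqrt(mass) and by sv
    F = (U * sv) / np.sqrt(r)[:, None]
    G = (Vt.T * sv) / np.sqrt(c)[:, None]
    inertia = sv ** 2                    # principal inertias (eigenvalues)
    return F, G, inertia

# Illustrative 3 x 3 table of made-up counts (not data from the article)
table = [[20, 5, 5],
         [5, 20, 5],
         [5, 5, 20]]
F, G, inertia = correspondence_analysis(table)
share = inertia / inertia.sum()          # proportion of total inertia per dimension
print("Share of inertia per dimension:", np.round(share, 3))
print("Row coordinates (dims 1-2):\n", np.round(F[:, :2], 3))
print("Column coordinates (dims 1-2):\n", np.round(G[:, :2], 3))
```

The first two columns of `F` and `G` are what would be plotted together on a two-dimensional CA map, and `share` indicates how much of the total inertia each dimension accounts for, a common guide to how many dimensions to retain.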

Conclusion

When the study variables of interest are categorical, CA is an appropriate technique to explore the relationships among variable response categories and can play a complementary role in analyzing epidemiological data.


Background

What is new?

Key finding:

  1. Correspondence analysis (CA) is an underutilized multivariate technique designed specifically to explore the relationships within and between two or more categorical variables.

What this adds to what was known:

  1. CA analyzes binary, ordinal, and nominal data without the distributional assumptions required by traditional multivariate techniques, and it preserves the categorical nature of the variables.

  2. CA provides a unique graphical display showing how the variable response categories are related.
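
Because CA works directly on counts of category combinations, the categories never need to be converted into numeric scores. A minimal pure-Python sketch of that first step, cross-tabulating raw categorical records into a contingency table, is shown here; the helper `contingency_table` and the country/language records are hypothetical.

```python
from collections import Counter

def contingency_table(records, row_levels, col_levels):
    """Cross-tabulate (row_category, col_category) pairs into counts.

    Categories are used only as labels; no numeric scores are assigned.
    Pairs that never occur get a count of 0.
    """
    counts = Counter(records)
    return [[counts[(r, c)] for c in col_levels] for r in row_levels]

# Made-up (country, primary language) records, for illustration only
records = [("Canada", "English"), ("Canada", "French"), ("France", "French"),
           ("Canada", "English"), ("Spain", "Spanish"), ("France", "French")]
table = contingency_table(records,
                          row_levels=["Canada", "France", "Spain"],
                          col_levels=["English", "French", "Spanish"])
print(table)  # [[2, 1, 0], [0, 2, 0], [0, 0, 1]]
```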

What is the implication, what should change now:
  1. Epidemiologists should

Description of CA

In this hypothetical study, we are interested in exploring the relationship between country of residence and primary language spoken. The contingency table of the frequencies is shown in Table 1. Let i = 1, 2, 3, 4, 5 represent the levels of the row variable, country, and j = 1, 2, 3, 4, 5 represent the levels of the column variable, language (Table 1). A chi-square test reveals that there is a statistically significant association (P < 0.0001). However, this test does not tell us how the two
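
The chi-square test and CA are closely linked: the Pearson statistic divided by the grand total n equals the total inertia that CA decomposes into dimensions. A minimal NumPy sketch of that relationship follows, assuming the usual Pearson statistic with expected counts E_ij = n_i+ n_+j / n and (I − 1)(J − 1) degrees of freedom for an I × J table (16 for a 5 × 5 table). The function `chi_square_and_inertia` and the 5 × 5 counts are invented for illustration and are not the values of Table 1. If SciPy is available, `scipy.stats.chi2.sf(stat, dof)` would give the corresponding P value.

```python
import numpy as np

def chi_square_and_inertia(table):
    """Pearson chi-square statistic, its degrees of freedom, and the CA
    total inertia (statistic divided by the grand total). Hypothetical helper."""
    N = np.asarray(table, dtype=float)
    n = N.sum()
    expected = np.outer(N.sum(axis=1), N.sum(axis=0)) / n   # E_ij = n_i+ * n_+j / n
    stat = ((N - expected) ** 2 / expected).sum()
    dof = (N.shape[0] - 1) * (N.shape[1] - 1)
    return stat, dof, stat / n

# Invented 5 x 5 counts (rows: countries, columns: languages); NOT Table 1
table = np.array([[30,  5,  3,  2,  0],
                  [ 4, 25,  6,  1,  4],
                  [ 2,  6, 28,  3,  1],
                  [ 1,  2,  4, 20, 13],
                  [ 3,  2,  1, 14, 20]])
stat, dof, total_inertia = chi_square_and_inertia(table)
print(f"chi-square = {stat:.1f} on {dof} df; total inertia = {total_inertia:.3f}")

# The same total inertia is the sum of squared singular values of the
# standardized residual matrix that CA decomposes into dimensions.
P = table / table.sum()
r, c = P.sum(axis=1), P.sum(axis=0)
S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
print(np.isclose((np.linalg.svd(S, compute_uv=False) ** 2).sum(), total_inertia))  # True
```

The chi-square test only says whether an association exists; splitting its inertia across dimensions is what lets CA show which countries and languages drive it.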

Extension to more than two variables: application in frailty

Possibly the most useful epidemiological application of CA is exploring the relationships among more than two variables. Multiple correspondence analysis (MCA) extends CA to this setting. We illustrate MCA with an application involving multiple binary variables from our research on frailty.
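
One common way to compute MCA is to apply simple CA to the indicator (complete disjunctive) matrix, which has one 0/1 column per category of every variable. The following NumPy sketch assumes that formulation; the functions `indicator_matrix` and `mca` and the eight rows of binary deficit indicators are hypothetical and are not the frailty data analyzed in the article.

```python
import numpy as np

def indicator_matrix(data):
    """Complete disjunctive coding: one 0/1 column per category of each variable."""
    columns, labels = [], []
    for j, var in enumerate(data.T):           # iterate over variables (columns)
        for level in np.unique(var):
            columns.append((var == level).astype(float))
            labels.append(f"V{j + 1}={level}")
    return np.column_stack(columns), labels

def mca(data, n_dims=2):
    """MCA computed as simple CA of the indicator matrix (one common formulation)."""
    Z, labels = indicator_matrix(np.asarray(data))
    P = Z / Z.sum()
    r, c = P.sum(axis=1), P.sum(axis=0)        # individual and category masses
    S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
    U, sv, Vt = np.linalg.svd(S, full_matrices=False)
    G = (Vt.T * sv) / np.sqrt(c)[:, None]      # category principal coordinates
    return G[:, :n_dims], sv[:n_dims] ** 2, labels

# Made-up binary deficit indicators (1 = present): 8 people, 3 variables
data = np.array([[1, 1, 0], [1, 1, 1], [0, 0, 0], [0, 0, 1],
                 [1, 1, 0], [0, 0, 0], [1, 0, 1], [0, 1, 0]])
coords, inertias, labels = mca(data)
for label, (x, y) in zip(labels, coords):
    print(f"{label:6s} {x:7.3f} {y:7.3f}")
```

As a rough reading guide, categories that lie close together on the first two dimensions tend to occur in the same individuals. Other formulations, such as CA of the Burt matrix, yield the same standard coordinates (axis directions) but different principal inertias.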

Frailty in the elderly population is generally acknowledged to be a state of increased vulnerability to stressors because of impairments in multiple

Discussion

Faced with the challenge of examining the associations among several categorical variables in our frailty research initiative, we chose to use MCA. The graphical display provides a user-friendly overview of the salient relationships among the variable categories, which are not easily captured by visual inspection of contingency tables. CA can be used for nominal, ordinal, or binary variables [11]. In addition, unlike traditional principal component analysis (PCA) or factor analysis (FA), which requires an assumption of

Acknowledgments

This study was supported by grants from the Solidage Research Group and the Dr. Joseph Kaufmann Chair in Geriatric Medicine, McGill University; the Canadian Initiative on Frailty and Aging; the Canadian Institutes of Health Research (CIHR) International Opportunity Program Development Grant 68739; the CIHR team grant in frailty and aging 82945; and the Johns Hopkins Older Americans Independence Center (National Institutes of Health award P50AG-021334-01).

References (35)

  • N. Sourial et al.

    A correspondence analysis revealed frailty deficits aggregate and are multidimensional

    J Clin Epidemiol

    (2010)
  • J. Coste et al.

    Clinical and psychological diversity of non-specific low-back pain. A new approach towards the classification of clinical subgroups

    J Clin Epidemiol

    (1991)
  • P.S. Nagpaul

    Correspondence analysis. Guide to advanced data analysis using IDAMS software

    (1999)
  • J.B. Meigs

    Invited commentary: insulin resistance syndrome? Syndrome X? Multiple metabolic syndrome? A syndrome at all? Factor analysis reveals patterns in the fabric of correlated metabolic risk factors

    Am J Epidemiol

    (2000)
  • R. Kahn et al.

    The metabolic syndrome: time for a critical appraisal: joint statement from the American Diabetes Association and the European Association for the Study of Diabetes

    Diabetes Care

    (2005)
  • E.S. Ford

    Factor analysis and defining the metabolic syndrome

    Ethn Dis

    (2003)
  • B. Muthen

    Contributions to factor analysis of dichotomous variables

    Psychometrika

    (1978)
  • B. Harris

    Tetrachoric correlation coefficient. Encyclopedia of statistical sciences

    (1988)
  • J.P. Benzécri

    Correspondence analysis handbook

    (1992)
  • S.E. Clausen

    Applied correspondence analysis

    (1998)
  • M. Friendly

    Correspondence analysis. Visualizing categorical data

    (2000)
  • N.T. Higgs

    Practical and innovative uses of correspondence analysis

    Statistician

    (1991)
  • R.B. Cattell

    The Scree test for the number of factors

    Multivariate Behav Res

    (1966)
  • SAS Institute Inc

    %PLOTIT macro documentation [Internet]. SAS support

  • SAS Institute Inc

    SAS 9.1.3 output delivery system: user's guide, volumes 1 and 2

    (2006)
  • M.J. Greenacre

    Correspondence analysis in practice

    (2007)
  • G.D. Garson

    Correspondence analysis [Internet]. Statnotes: topics in multivariate analysis
