Comparison of residential geocoding methods in population-based study of air quality and birth defects

https://doi.org/10.1016/j.envres.2006.01.004

Abstract

Our population-based case–control study of air quality and birth defects in Texas relied on the geocoding of maternal residence from vital records for the assignment of air pollution exposures during early pregnancy. We attempted to geocode the maternal addresses for 5338 birth defect cases and 4574 frequency-matched controls using an automated procedure with standard matching criteria in ArcGIS 8.2 and 8.3. Initially, we matched 7266 observations (73%). To increase the proportion of successful matches, we used an interactive procedure for the 2646 addresses that were initially not geocoded by the software. This yielded an additional 985 matches (37%). Using the same 2646 initially unmatched addresses, we compared the results of this interactive procedure to those of an automated procedure using lower standards. The automated procedure with lower standards yielded more matches (n=1559, 59%) but with questionable accuracy. We included the interactively geocoded observations in our final data set. Their inclusion did not affect the estimates of air pollution exposure but increased our statistical power to detect associations between air quality and risk of selected birth defects. The geocoded and not geocoded populations differed in the distribution of Latino ethnicity (51% vs 59%) and ethnicity was independently associated with air pollution exposures (P<0.05). Geocoding status also appeared to modify the association between ethnicity and risk of birth defects; Latina women appeared to have a slightly lower risk of birth defects than non-Latina women in the geocoded population and to have a slightly higher risk in the not geocoded population. Incomplete geocoding may have resulted in a selection bias because of the underrepresentation of Latinas in our study population.

Introduction

The use of geographic information systems (GIS) is becoming increasingly popular in environmental epidemiology (Nuckols et al., 2004). Geocoding, also called address matching, is one of the many tools available to the researcher in a variety of GIS software applications. It assigns latitude and longitude coordinates to addresses by linking to a reference theme or electronic street map that contains both address and geographic information (Bonner et al., 2003; Cayo and Talbot, 2003; Vine et al., 1997). Matching rates are typically 40–80% using commercially available software (Krieger et al., 2001). Only recently has the public health literature addressed the accuracy of geocoding methods (Bonner et al., 2003; Cayo and Talbot, 2003); a recent study compared the match rates, accuracy, and repeatability of commercial geocoding services and found that all three measures needed to be considered when deciding on a geocoding method (Whitsel et al., 2004).
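As an illustration of how an address can be linked to a reference street map, the sketch below interpolates a house number along a street segment that stores an address range and endpoint coordinates. It is a minimal example of address-range interpolation, not the procedure used by ArcGIS or by the study; the `Segment` type, the `REFERENCE` table, and all coordinates are hypothetical, and real geocoders also handle street side (odd/even parity), spelling tolerance, and scoring.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Segment:
    """One street segment of a hypothetical reference map."""
    street: str
    zip_code: str
    low: int                      # lowest house number on the segment
    high: int                     # highest house number on the segment
    start: Tuple[float, float]    # (lat, lon) at the low end
    end: Tuple[float, float]      # (lat, lon) at the high end


# Hypothetical reference data, for illustration only.
REFERENCE: List[Segment] = [
    Segment("MAIN ST", "78701", 100, 198, (30.2700, -97.7400), (30.2710, -97.7400)),
    Segment("MAIN ST", "78701", 200, 298, (30.2710, -97.7400), (30.2720, -97.7400)),
]


def geocode(number: int, street: str, zip_code: str,
            reference: List[Segment] = REFERENCE) -> Optional[Tuple[float, float]]:
    """Return interpolated (lat, lon), or None if no segment matches."""
    for seg in reference:
        if (seg.street == street and seg.zip_code == zip_code
                and seg.low <= number <= seg.high):
            span = seg.high - seg.low
            # Fraction of the way along the segment, by house number.
            frac = (number - seg.low) / span if span else 0.0
            lat = seg.start[0] + frac * (seg.end[0] - seg.start[0])
            lon = seg.start[1] + frac * (seg.end[1] - seg.start[1])
            return (lat, lon)
    return None  # unmatched: would count against the match rate
```

An address whose street name, ZIP code, or house number has no counterpart in the reference table returns `None`, which corresponds to an unmatched record.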

Addresses may go unmatched because of a lack of standardization of street addresses, incorrectly spelled street or city names, missing suffixes (e.g., drive, street, road), non-existent house numbers, incorrect ZIP codes, use of apartment numbers, or limitations of the reference map (e.g., the reference map was created prior to the building of the home or after the destruction of a home) (Vine et al., 1997). In rural areas, additional problems include the use of rural routes and post office boxes instead of numerical street addresses; matching rates in rural areas can be as low as 20–30% (Vine et al., 1997). In the context of our population-based environmental epidemiology study, where inclusion in the study cohort relied on the successful geocoding of the participant's address, incomplete geocoding could have, at best, reduced the statistical power of the analyses. At worst, incomplete geocoding could have led to a selection bias with a potential impact on the validity of our study results.
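Several of the causes listed above (inconsistent suffixes, apartment numbers, post office boxes, rural routes) can be handled or flagged before matching. The following is a minimal sketch of such pre-processing, assuming a small hypothetical suffix table and simple regular expressions; it is not the standardization used in the study and does not correct misspelled street or city names.

```python
import re
from typing import Optional

# Hypothetical, deliberately small suffix table.
SUFFIXES = {"STREET": "ST", "DRIVE": "DR", "ROAD": "RD", "AVENUE": "AVE"}

# Trailing unit designators such as "APT 4" or "#12".
UNIT_RE = re.compile(r"\s+(?:APT|UNIT|STE|#)\s*\S+$")

# Addresses that have no numerical street location to match.
NON_STREET_RE = re.compile(r"^(?:P\s*O\s+BOX|POST OFFICE BOX|RURAL ROUTE|RR|RT)\b")


def standardize(raw: str) -> Optional[str]:
    """Return a normalized street address, or None for PO boxes and rural routes."""
    text = re.sub(r"[.,]", "", raw.upper())      # drop periods and commas
    text = re.sub(r"\s+", " ", text).strip()     # collapse whitespace
    if NON_STREET_RE.match(text):
        return None                              # cannot be street-geocoded
    text = UNIT_RE.sub("", text)                 # drop apartment/unit number
    words = text.split()
    if words:
        # Abbreviate only the final word, where the suffix normally sits.
        words[-1] = SUFFIXES.get(words[-1], words[-1])
    return " ".join(words)
```

For example, `standardize("123 Main Street, Apt. 4")` yields `"123 MAIN ST"`, while a post office box or rural route address yields `None` and would need a different handling strategy.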

In this paper, we compare the results of two geocoding methods, an interactive (manual) geocoding procedure and an automated geocoding procedure employing low matching standards, both intended to increase the geocoding match rate. We also explore the possibility of selection bias resulting from the incomplete geocoding of the study population.
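The counts reported in the abstract allow the yields of the two methods to be tallied directly. The short calculation below uses only those counts; the variable names are illustrative, and the combined match rate after interactive geocoding is derived here by arithmetic rather than quoted from the text.

```python
# Counts taken from the abstract.
CASES = 5338
CONTROLS = 4574
TOTAL = CASES + CONTROLS                 # all addresses submitted
AUTO_STANDARD = 7266                     # automated, standard criteria
UNMATCHED = TOTAL - AUTO_STANDARD        # left over after the first pass
INTERACTIVE = 985                        # interactive procedure on the leftovers
AUTO_LOW = 1559                          # automated, lower standards, same leftovers


def pct(part: int, whole: int) -> float:
    """Percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


summary = {
    "standard_pct_of_total": pct(AUTO_STANDARD, TOTAL),
    "interactive_pct_of_unmatched": pct(INTERACTIVE, UNMATCHED),
    "low_standard_pct_of_unmatched": pct(AUTO_LOW, UNMATCHED),
    # Derived: standard automated matches plus interactive matches.
    "combined_pct_of_total": pct(AUTO_STANDARD + INTERACTIVE, TOTAL),
}

for name, value in summary.items():
    print(f"{name}: {value}%")
```

The first three figures reproduce the 73%, 37%, and 59% reported in the abstract; the fourth shows the share of all addresses geocoded once the interactive matches are included.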

Materials and methods

We conducted a population-based case–control study investigating the association between exposure to carbon monoxide, nitrogen dioxide, ozone, sulfur dioxide, and particulate matter less than 10 μm in diameter (PM10) during weeks 3–8 of pregnancy and the risk of selected cardiac birth defects and oral clefts in seven Texas counties (Bexar, Dallas, El Paso, Harris, Hidalgo, Tarrant, Travis) between 1997 and 2000 (Gilboa et al., 2005). Cases were selected from the Texas Birth Defects Registry and

Comparison of automatically and interactively geocoded results

Of the 9912 cases and controls combined, 7266 (73.3%) were geocoded automatically using the standard matching criteria to the address reported in either the vital record (n=7012) or the Texas Birth Defects Registry (n=254). The 2646 addresses that were initially not geocoded by the software were submitted to an interactive procedure. The interactive procedure resulted in 985 (37%) matches for either the vital record address (n=914) or the Texas Birth Defects Registry address (n=71); we were

Discussion

The epidemiologic literature has recently given increased attention to GIS analyses and specifically the advantages and disadvantages of a variety of geocoding methods (Whitsel et al., 2004; Bonner et al., 2003; Cayo and Talbot, 2003; Krieger, 2003; McElroy et al., 2003; Krieger et al., 2001). This literature has provided epidemiologists with a number of concrete suggestions on logistics and execution in addition to cautionary notes on the geocoding of post office boxes (Hurley et al., 2003),

Acknowledgments

The authors thank Mary Ethen and Lisa Marengo of the Birth Defects Epidemiology and Surveillance Branch and Anna Vincent and Rachelle Moore of the Vital Statistics Unit of the Texas Department of State Health Services. The authors also acknowledge GIS support from Sonya Krogh of Computer Sciences Corp. We thank Maria Mirabelli of the University of North Carolina for reviewing an early version of the manuscript.

References (13)

  • M.R. Bonner et al. Positional accuracy of geocoded addresses in epidemiologic research. Epidemiology (2003)
  • M.R. Cayo et al. Positional error in automated geocoding of residential addresses. Int. J. Health Geogr. (2003)
  • S.M. Gilboa et al. Relation between ambient air quality and selected birth defects, Seven County Study, Texas, 1997–2000. Am. J. Epidemiol. (2005)
  • S.E. Hurley et al. Post office box addresses: a challenge for geographic information system-based studies. Epidemiology (2003)
  • N. Krieger. Place, space and health: GIS and epidemiology. Epidemiology (2003)
  • N. Krieger et al. On the wrong side of the tracts? Evaluating the accuracy of geocoding in public health research. Am. J. Public Health (2001)
There are more references available in the full text version of this article.