Comparison of residential geocoding methods in population-based study of air quality and birth defects
Introduction
The use of geographic information systems (GIS) is becoming increasingly popular in environmental epidemiology (Nuckols et al., 2004). Geocoding, also called address matching, is one of the many tools available to the researcher in a variety of GIS software applications. It assigns latitude and longitude coordinates to addresses by linking to a reference theme or electronic street map that contains both address and geographic information (Bonner et al., 2003; Cayo and Talbot, 2003; Vine et al., 1997). Matching rates are typically 40–80% using commercially available software (Krieger et al., 2001). Only recently has the public health literature addressed the accuracy of geocoding methods (Bonner et al., 2003; Cayo and Talbot, 2003); a recent study compared the match rates, accuracy, and repeatability of commercial geocoding services and found that all three measures needed to be considered when deciding on a geocoding method (Whitsel et al., 2004).
Addresses may go unmatched because of a lack of standardization of street addresses, incorrectly spelled street or city names, missing suffixes (e.g., drive, street, road), non-existent house numbers, incorrect ZIP codes, use of apartment numbers, or limitations of the reference map (e.g., the reference map was created prior to the building of the home or after the destruction of a home) (Vine et al., 1997). In rural areas, additional problems include the use of rural routes and post office boxes instead of numerical street addresses; matching rates in rural areas can be as low as 20–30% (Vine et al., 1997). In the context of our population-based environmental epidemiology study, where inclusion in the study cohort relied on the successful geocoding of the participant's address, incomplete geocoding could have, at best, reduced the statistical power of the analyses. At worst, incomplete geocoding could have led to a selection bias with a potential impact on the validity of our study results.
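The failure modes listed above can be made concrete with a minimal pre-geocoding triage sketch. It is illustrative only and is not the procedure used in the study: the function name `triage`, the small suffix table, and the regular expressions are all hypothetical, and production geocoders rely on much larger standardization tables.

```python
import re

# Hypothetical, deliberately tiny suffix table (assumption, not from the study).
SUFFIXES = {"STREET": "ST", "DRIVE": "DR", "ROAD": "RD", "AVENUE": "AVE"}

# Patterns for address types that a street-based reference map cannot match.
PO_BOX = re.compile(r"\bP\.?\s*O\.?\s*BOX\b|\bPOST\s+OFFICE\s+BOX\b")
RURAL_ROUTE = re.compile(r"\b(RURAL\s+ROUTE|RR)\s*\d+\b")
# Trailing apartment/unit designator, e.g. "APT 4" or "#4".
UNIT = re.compile(r"\s+(APT|UNIT|#)\s*\w+$")


def triage(address: str) -> tuple[str, str]:
    """Return (status, cleaned_address) for one raw address string."""
    # Uppercase, drop commas, and collapse runs of whitespace.
    text = re.sub(r"\s+", " ", address.upper().replace(",", " ")).strip()

    if PO_BOX.search(text):
        return ("po_box", text)
    if RURAL_ROUTE.search(text):
        return ("rural_route", text)

    # Remove the apartment/unit designator, then standardize suffix spellings.
    text = UNIT.sub("", text)
    words = [SUFFIXES.get(word, word) for word in text.split(" ")]
    cleaned = " ".join(words)

    if not words[0].isdigit():
        return ("no_house_number", cleaned)
    if words[-1] not in SUFFIXES.values():
        return ("missing_suffix", cleaned)
    return ("ok", cleaned)


if __name__ == "__main__":
    for raw in ["123 Main Street Apt 4", "P.O. Box 12", "RR 2 Box 15", "456 Oak"]:
        print(raw, "->", triage(raw))
```

Flagging the reason an address is likely to fail, rather than only recording that it failed, makes it easier to decide which records to route to an interactive procedure and to check whether unmatched records cluster in rural areas.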
In this paper, we compare the results of two geocoding methods, an interactive (manual) geocoding procedure and an automated geocoding procedure employing low matching standards, both intended to increase the geocoding match rate. We also explore the possibility of selection bias resulting from the incomplete geocoding of the study population.
Materials and methods
We conducted a population-based case–control study investigating the association between exposure to carbon monoxide, nitrogen dioxide, ozone, sulfur dioxide, and particulate matter less than 10 μm in diameter (PM10) during weeks 3–8 of pregnancy and the risk of selected cardiac birth defects and oral clefts in seven Texas counties (Bexar, Dallas, El Paso, Harris, Hidalgo, Tarrant, Travis) between 1997 and 2000 (Gilboa et al., 2005). Cases were selected from the Texas Birth Defects Registry and
Comparison of automatically and interactively geocoded results
Of the 9912 cases and controls combined, 7266 (73.3%) were geocoded automatically using the standard matching criteria to the address reported in either the vital record or the Texas Birth Defects Registry. The 2646 addresses that were initially not geocoded by the software were submitted to an interactive procedure. The interactive procedure resulted in 985 (37%) matches for either the vital record address or the Texas Birth Defects Registry address; we were
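The counts reported in this snippet can be checked with a few lines of arithmetic. The sketch below uses only the three counts stated above (9912, 7266, and 985); the cumulative match rate it prints is derived here and is not a figure quoted in this excerpt, and it assumes the 985 interactive matches are all drawn from the 2646 records left unmatched by the automatic step.

```python
# Counts taken from the text above.
total = 9912                # cases and controls combined
auto_matched = 7266         # geocoded automatically, standard criteria
interactive_matched = 985   # matched by the interactive procedure

# Records passed on to the interactive procedure.
unmatched_after_auto = total - auto_matched

# Match rates for each step and for both steps together.
auto_rate = auto_matched / total
interactive_rate = interactive_matched / unmatched_after_auto
cumulative_rate = (auto_matched + interactive_matched) / total

# Records still unmatched after both steps (derived, not quoted in the text).
still_unmatched = unmatched_after_auto - interactive_matched

print(f"Submitted to interactive procedure: {unmatched_after_auto}")
print(f"Automatic match rate:    {auto_rate:.1%}")
print(f"Interactive match rate:  {interactive_rate:.1%}")
print(f"Cumulative match rate:   {cumulative_rate:.1%}")
print(f"Still unmatched:         {still_unmatched}")
```

The first two rates reproduce the 73.3% and 37% given in the text, which confirms that the interactive percentage is expressed relative to the 2646 initially unmatched addresses rather than to the full study population.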
Discussion
The epidemiologic literature has recently given increased attention to GIS analyses and specifically the advantages and disadvantages of a variety of geocoding methods (Whitsel et al., 2004; Bonner et al., 2003; Cayo and Talbot, 2003; Krieger, 2003; McElroy et al., 2003; Krieger et al., 2001). This literature has provided epidemiologists with a number of concrete suggestions on logistics and execution in addition to cautionary notes on the geocoding of post office boxes (Hurley et al., 2003),
Acknowledgments
The authors thank Mary Ethen and Lisa Marengo of the Birth Defects Epidemiology and Surveillance Branch and Anna Vincent and Rachelle Moore of the Vital Statistics Unit of the Texas Department of State Health Services. The authors also acknowledge GIS support from Sonya Krogh of Computer Sciences Corp. We thank Maria Mirabelli of the University of North Carolina for reviewing an early version of the manuscript.
References (13)
- et al. Positional accuracy of geocoded addresses in epidemiologic research. Epidemiology (2003)
- et al. Positional error in automated geocoding of residential addresses. Int. J. Health Geogr. (2003)
- et al. Relation between ambient air quality and selected birth defects, Seven County Study, Texas, 1997–2000. Am. J. Epidemiol. (2005)
- et al. Post office box addresses: a challenge for geographic information system-based studies. Epidemiology (2003)
- Place, space and health: GIS and epidemiology. Epidemiology (2003)
- et al. On the wrong side of the tracts? Evaluating the accuracy of geocoding in public health research. Am. J. Public Health (2001)