Cyberinfrastructure and population health
Grid-Enabled Measures: Using Science 2.0 to Standardize Measures and Share Data

https://doi.org/10.1016/j.amepre.2011.01.004

Abstract

Scientists are taking advantage of the Internet and collaborative web technology to accelerate discovery in a massively connected, participative environment, a phenomenon referred to by some as Science 2.0. As a new way of doing science, this phenomenon has the potential to push science forward more efficiently than was previously possible. The Grid-Enabled Measures (GEM) database has been conceptualized as an instantiation of Science 2.0 principles by the National Cancer Institute (NCI) with two overarching goals: (1) promote the use of standardized measures, which are tied to theoretically based constructs; and (2) facilitate the ability to share harmonized data resulting from the use of standardized measures. The first is accomplished by creating an online venue where a virtual community of researchers can collaborate and come to consensus on measures by rating, commenting on, and viewing meta-data about the measures and associated constructs. The second is accomplished by connecting the constructs and measures to an ontological framework with data standards and common data elements such as the NCI Enterprise Vocabulary System (EVS) and the cancer Data Standards Repository (caDSR). This paper will describe the web 2.0 principles on which the GEM database is based, describe its functionality, and discuss some of the important issues involved with creating the GEM database, such as the role of mutually agreed-on ontologies (i.e., knowledge categories and the relationships among these categories) for data sharing.

Introduction

The digital and information age has transformed how we access information and interact with others, a transformation that has implications for the conduct of health science. For example, the NIH was one of the first scientific agencies to use networked information technologies (a prototype cyberinfrastructure) to coordinate input from literally hundreds of laboratories from around the world to document the more than 3 billion base pairs comprising the human genome.1 This technology-mediated effort virtually defined the concept of “team science”2 in an era of distributed computing. Likewise, current efforts to extract research value from the terabytes of data existing within electronic medical records hold the promise of reducing healthcare costs through comparative effectiveness studies,3 reducing health disparities by informing policy decisions at the systems level,4 and accelerating discovery by closing the gap in translational research.5, 6

A limiting factor in taking full advantage of these networked information technologies, however, has been the organizational challenge of deriving agreement from scientific communities on the common terms, measurements, and data elements that will make up the content and structure of the interconnected data systems.7, 8, 9 This paper describes one solution to that problem: the Grid-Enabled Measures (GEM) database. Whereas other systems have used consensus panels7 and psychometric analytic techniques10 to select common measures for very specific purposes, the GEM database is distinct in that it uses “web 2.0” functionality to solicit, comment on, vet, and select measures from the behavioral and population science communities in open and transparent ways. It is an example of what some National Science Foundation (NSF) grantees have referred to as “Science 2.0.”11, 12

Section snippets

Web 2.0, Health 2.0, and Science 2.0

In 2004, computer publishing entrepreneur Tim O'Reilly hosted a conference of information technology specialists to identify characteristics of successful websites that seemed to be thriving in spite of the financial downturn associated with the “dot.com” implosion. He referred to this new generation of the web as “web 2.0” to provide a sense of “lessons learned” along with a forecast of rising trends. Among the most notable changes he foresaw in the emerging online ecosystem was a movement

Open Science in Support of the National Institutes of Health

The need to marshal collective efforts in addressing the challenges of modern technology is readily apparent at the NIH. After succeeding in documenting the full human genome in 2003, the NIH community faced the task of identifying the precise connections between the DNA-sequenced base pairs and predictions for disease process. This link is necessary for advances in combating diseases such as cancer, cardiovascular disease, and neurologic disorders.26 This “needle-in-a-haystack”27 problem can

What Is the Grid-Enabled Measures Database?

The purpose of the GEM database is to serve as a portal for health scientists who wish to take advantage of the benefits of Science 2.0 to accelerate scientific discovery. The GEM database has two overarching goals: (1) promote use of standardized measures, which are tied to theoretically based constructs; and (2) facilitate sharing of harmonized data resulting from the use of standardized measures. Although the process by which the GEM database achieves these goals is unique, the overall

Principles Guiding Development of the Grid-Enabled Measures Database

The GEM database was designed to facilitate these standardization and sharing processes across scientific disciplines. Necessarily, the GEM database was built to be flexible enough to accommodate different types of measures (e.g., from self-report to biological). The system can accommodate independent and dependent variables across the health continuum, including prevention, diagnosis, treatment, and end-of-life issues, regardless of disease/wellness focus. Using principles of Science 2.0, GEM

Data-Driven Decisions

Identification of one measure as better than others, and a decision to promote it for standard use, should be data-driven. In the GEM database, users are given data to help with this process. These data include subjective outcomes such as averages and distributions of ratings, objective measures such as the number of times a measure has been downloaded from the database, and more traditional psychometric information such as a measure's reliability and validity.
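
As a rough illustration only, the three kinds of evidence named above (rating averages and distributions, download counts, and psychometric information) could be gathered into one per-measure summary. The sketch below is not the GEM database's schema or code: the `MeasureRecord` class, its field names, the `summarize` function, and the 1-to-5 rating scale are all assumptions made for this example.

```python
from collections import Counter
from dataclasses import dataclass, field
from statistics import mean
from typing import List, Optional


@dataclass
class MeasureRecord:
    """Hypothetical record for one measure (illustrative fields, not GEM's schema)."""
    name: str
    ratings: List[int] = field(default_factory=list)  # assumed 1-5 user ratings
    downloads: int = 0                                 # times the measure was downloaded
    reliability: Optional[float] = None                # a reported reliability coefficient


def summarize(measure: MeasureRecord) -> dict:
    """Combine subjective, objective, and psychometric evidence for one measure."""
    return {
        "name": measure.name,
        # Subjective evidence: average rating and how the ratings are distributed.
        "mean_rating": round(mean(measure.ratings), 2) if measure.ratings else None,
        "rating_distribution": dict(sorted(Counter(measure.ratings).items())),
        # Objective evidence: download count.
        "downloads": measure.downloads,
        # Traditional psychometric evidence, passed through unchanged.
        "reliability": measure.reliability,
    }


if __name__ == "__main__":
    example = MeasureRecord("Example scale", ratings=[5, 4, 4, 3],
                            downloads=12, reliability=0.87)
    print(summarize(example))
```

Keeping the rating distribution alongside the mean lets a reader see whether an average reflects agreement or a split of opinion, which matters when the summary is used to compare candidate measures.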

Wisdom of the Masses

Though crowdsourcing has its detractors, and even proponents acknowledge that it doesn't work effectively under all circumstances, the characteristics of the GEM database and its user community suggest that it meets the following conditions for effective use:40 (1) diversity of opinion: The users of the database will likely come from a variety of academic disciplines and have different scientific interests; (2) independence of members: Users will not be collectively

Open Access

Open access is an essential component of the philosophy underlying the GEM database. Users can easily find meta-data about measures and data sets. The measures themselves can be downloaded easily (where publicly available), and data can be identified and accessed in several different formats that should facilitate data sharing. However, the move toward open collaboration works only insofar as those working together are willing to share information or data and are willing to sublimate their own

The Importance/Challenges of Creating Ontologies

To meet the goals of the GEM database, common understanding of terms is essential. An ontology is a way of representing a knowledge domain to enable “a shared and common understanding that can be communicated between people and heterogeneous and distributed application systems.”44 Development of a useful ontology requires that the terms are defined with sufficient context so if two people—or computer systems—use the same terms, there is complete and correct communication between them. A good

The Grid-Enabled Measures Database As a Tool to Promote Integrative Data Analysis

Using standardized measures and sharing harmonized data are useful only if they ultimately result in more-efficient science and improved outcomes. As a tool, the GEM database can facilitate this improved way of doing science through its ability to combine shared constructs and associated measures across independent data sets. Scientific discovery can advance more quickly by combining data to create a cumulative knowledge base—literally standing on the shoulders of giants—as opposed to the

Providing Incentives for Use of the Grid-Enabled Measures Database and Addressing Issues Regarding Data Sharing

Scientists, especially those in academia, typically do not have institutional incentives to agree on standard measures and to share data with other scientists. Academia tends to reward individual scholarly productivity rather than collaborative work and shared resources. Historically, there has been little support to engage with others outside of one's own discipline.48 But this individualism only perpetuates a fractured scientific landscape. Using agreed-on measures

The Changing Landscape

The GEM database is certainly not the first tool of its kind to promote standardizing measures and sharing data. In fact, numerous other examples of both exist within and outside of government. Projects such as PROMIS,10 PhenX (https://www.phenx.org/; see Schad et al., this issue), and the NIH Toolbox (for the assessment of neurologic and behavioral functioning; www.nihtoolbox.org/default.aspx) have similar goals but use different processes. The GEM database tends to focus on the use of a

References (49)

  • K.H. Buetow et al.

    Infrastructure for a learning health care system: CaBIG

    Health Aff (Millwood)

    (2009)
  • M.C. Gibbons

    eHealth solutions for healthcare disparities

    (2007)
  • L.M. Etheredge

    A rapid-learning health system

    Health Aff (Millwood)

    (2007)
  • P.J. Stover et al.

    PhenX: a toolkit for interdisciplinary genetics research

    Curr Opin Lipidol

    (2010)
  • B.M. Psaty et al.

    Cohorts for Heart and Aging Research in Genomic Epidemiology (CHARGE) Consortium: design of prospective meta-analyses of genome-wide association studies from 5 cohorts

    Circ Cardiovasc Genet

    (2009)
  • Realizing the full potential of health information technology to improve healthcare for Americans: the path forward

    (2010)
  • M.M. Waldrop

    Science 2.0

    Sci Am

    (2008)
  • B. Shneiderman

    Computer science. Science 2.0

    Science

    (2008)
  • B.W. Hesse et al.

    Information support for cancer survivors

    Cancer

    (2008)
  • B.W. Hesse et al.

    Surveys of physicians and electronic health information

    N Engl J Med

    (2010)
  • S. Fox

    Cancer 2.0: a summary of recent research

    (2010)
  • B.W. Hesse et al.

    Social participation in health 2.0

    IEEE Computer

    (2010)
  • T. Ferguson

    E-patients: how they can help us heal healthcare

    (2007)
  • Open Health Toolkit: innovation in the global economy

    (2008)