Information Sciences

Volume 179, Issue 12, 30 May 2009, Pages 1870-1880

How valuable is medical social media data? Content analysis of the medical web

https://doi.org/10.1016/j.ins.2009.01.025

Abstract

Given the large amount and diversity of information available, it is still an open question where to search to satisfy a specific information need. In this paper, a content analysis of health-related information on the Web is performed to give an overview of the medical content available. In particular, the content of medical Question & Answer portals, medical weblogs, medical reviews and Wikis is compared. For this purpose, medical concepts are extracted from the text material with existing extraction technology, and the content of the different knowledge resources is compared on the basis of these concepts. Since medical weblogs describe experiences as well as information, it is of great interest to be able to distinguish between informative and affective posts. For this reason, a method to classify blog posts by their information content is presented, which exploits high-level features describing the medical and affective content of the posts. The results show that there are substantial differences in the content of the various health-related Web resources. Weblogs and answer portals mainly deal with diseases and medications. The Wiki and the encyclopedia provide more information on anatomy and procedures. While patients and nurses describe personal aspects of their lives, doctors aim to present health-related information in their blog posts. Search engines can exploit this knowledge of content differences and information content to improve ranking and search and to direct users to appropriate knowledge sources.

Introduction

Electronic media are increasingly used to obtain medical information and advice. Health information on the Internet ranges from personal experiences of medical conditions and patient discussion groups to peer-reviewed journal articles and clinical decision support tools. A study on how consumers in America search for health-related information shows that the Web is the most widely used resource for health information. Nevertheless, finding the best knowledge source to satisfy a specific information need is difficult, because relevant information can be either hidden in web pages or encapsulated in social media data such as blogs and Q&A portals. Through content analysis, this paper gives an overview of the content differences between the various social media resources on health-related topics.

We focus on health-related information provided on the Internet for two reasons. First, health-related experiences and medical histories offer unique data for research purposes, for practitioners, and for patients. Second, it is still an open question whether existing text and content analysis tools are able to process medical social media data and to identify relevant (medical) information in it.

Section snippets

Analyzing and assessing social media data in medicine

Research interest in social media analysis has increased over the last couple of years, driven by growing user interest in these tools. Most work has focused on weblogs. One line of research analyzes social aspects of weblog communities [16], [15]. Sekiguchi et al. detect blog topics based on interest similarities between users [22]. Approaches to content analysis and topic detection in weblogs determine information diffusion through blogspace [11] or analyze the sentiment of

Research questions

Weblogs and other social media data are gaining influence, so more sophisticated access to this data needs to be provided. Since different user groups have different requirements regarding the type of information requested, a search engine should enable patients and health care professionals to find experiences or information on diagnoses, treatments or medications, and to restrict search results to texts written by a particular author class (e.g., by a physician, a nurse, or a patient) or

Research design

Different sources of health-related information can be found on the Internet. Our work focuses mainly on social media tools, in particular on answer portals, Wikis, reviews and weblogs that are well known or that are provided by well-known communities or institutes (e.g., Mayo Clinic, National Library of Medicine). Section 4.1 describes the data collection that has been crawled from the indicated web pages. For the analysis whose results are described in Section 6, methods that identify

Evaluation methodology

Before we apply the introduced method for blog post classification to the weblog dataset, its performance is tested in a 10-fold cross-validation. For this purpose, some weblogs from all author groups have been randomly selected. The corresponding 1509 posts were manually classified as affective or informative. The evaluation corpus is almost balanced and consists of 771 affective and 738 informative posts. Table 2 shows the distribution of the two classes per author group.
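The evaluation setup described above can be illustrated with a minimal sketch of stratified 10-fold cross-validation. This is not the authors' implementation: the helper names (`stratified_folds`, `cross_validate`, `majority_baseline`) are hypothetical, the features are placeholders, and a majority-class baseline stands in for the actual classifier, whose details are not given in this excerpt. Only the class counts (771 affective, 738 informative) come from the text.

```python
import random
from collections import Counter


def stratified_folds(labels, k=10, seed=0):
    """Assign item indices to k folds so each fold keeps a similar class ratio."""
    rng = random.Random(seed)
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)
    folds = [[] for _ in range(k)]
    offset = 0
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        # Deal indices round-robin, continuing where the previous class stopped,
        # so that fold sizes differ by at most one item.
        for pos, idx in enumerate(indices):
            folds[(offset + pos) % k].append(idx)
        offset += len(indices)
    return folds


def cross_validate(features, labels, train_fn, k=10, seed=0):
    """Return the mean accuracy over k held-out folds."""
    folds = stratified_folds(labels, k, seed)
    accuracies = []
    for held_out in folds:
        held = set(held_out)
        train_idx = [i for i in range(len(labels)) if i not in held]
        model = train_fn([features[i] for i in train_idx],
                         [labels[i] for i in train_idx])
        correct = sum(model(features[i]) == labels[i] for i in held_out)
        accuracies.append(correct / len(held_out))
    return sum(accuracies) / k


def majority_baseline(train_features, train_labels):
    """Hypothetical stand-in classifier: always predict the majority class."""
    majority = Counter(train_labels).most_common(1)[0][0]
    return lambda _features: majority


# Class counts taken from the evaluation corpus; features are placeholders.
labels = ["affective"] * 771 + ["informative"] * 738
features = [None] * len(labels)

folds = stratified_folds(labels)
mean_accuracy = cross_validate(features, labels, majority_baseline)
```

Because the corpus is almost balanced, such a baseline stays close to the majority-class share of 771/1509, which gives a floor that any real classifier would be expected to beat.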

The purpose

Content analysis results

In Section 6.1, we study the medical content of the five resources of our dataset. In Section 6.2, the distribution of the two information types on the weblog dataset is presented. The results are discussed in Section 7.1.
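As a rough illustration of how the medical content of different resources might be compared once concepts have been extracted, the sketch below computes the share of each concept category per source. The function names and the toy concept lists are invented for illustration; they are not the paper's data, and the paper's actual extraction tool and category scheme are not reproduced here.

```python
from collections import Counter


def category_shares(concept_categories):
    """Map each category to its relative frequency in one source."""
    counts = Counter(concept_categories)
    total = sum(counts.values())
    return {category: n / total for category, n in counts.items()}


def dominant_category(concept_categories):
    """Return the category with the largest share in one source."""
    shares = category_shares(concept_categories)
    return max(shares, key=shares.get)


# Made-up toy input: one category label per extracted concept (assumption).
toy_concepts = {
    "weblog": ["disorder", "medication", "disorder", "procedure"],
    "wiki": ["anatomy", "procedure", "anatomy", "disorder"],
}

shares_by_source = {
    source: category_shares(categories)
    for source, categories in toy_concepts.items()
}
```

Working with shares rather than raw counts keeps sources of very different sizes comparable.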

Limitations and discussion of the results

Several conclusions can be drawn from the aforementioned results. Our hypotheses proved to be only partly true. Instead of the large diversity of topics we hypothesized, the Wiki and the encyclopedia showed a focus on anatomy. We conclude that the latter are best suited to finding information on anatomy, while people searching for information on disorders should be directed to weblogs or Q&A portals. We remark that we only considered one Wiki and one encyclopedia and that

References (28)

  • E. Agichtein, Finding high-quality content in social media, in: WSDM 2008: Proceedings of the International Conference...
  • A. Aronson, Effective mapping of biomedical text to the UMLS metathesaurus: the MetaMap program, in: Proceedings of the...
  • P.A. Bath, Health informatics: current issues and challenges, Journal of Information Science (2008)
  • F.R. Chaumartin, UPAR7: a knowledge-based system for headline sentiment tagging, in: Proceedings of the SemEval...
  • K. Denecke, Semantic structuring of and information extraction from medical documents using the UMLS, Methods of Information in Medicine (2008)
  • G.W. Ryan et al., Data management and analysis methods
  • A. Devitt, K. Ahmad, Sentiment polarity identification in financial news: a cohesion-based approach, in: Proceedings of...
  • A. Esuli, F. Sebastiani, SentiWordNet: a publicly available lexical resource for opinion mining, in: Proceedings of the...
  • G. Eysenbach, C. Kohler, What is the prevalence of health-related searches on the World Wide Web? Qualitative and...
  • C. Friedman, A broad-coverage natural language processing system, in: Proceedings of the AMIA Annual Symposium, 2000,...
  • D. Gruhl, R. Guha, D. Liben-Nowell, A. Tomkins, Information diffusion through blogspace, in: Proceedings of the 13th...
  • S.C. Herring, Longitudinal content analysis of weblogs: 2003–2004
  • J. Hillan, Physician use of patient-centered weblogs and online journals, Clinical Medicine and Research (2003)
  • W. Himmel, U. Reincke, H.W. Michelmann, Using text mining to classify lay requests to a medical expert forum and to...