How valuable is medical social media data? Content analysis of the medical web

doi:10.1016/j.ins.2009.01.025

Information Sciences

Volume 179, Issue 12, 30 May 2009, Pages 1870-1880

https://doi.org/10.1016/j.ins.2009.01.025

Abstract

Owing to the large amount and diversity of information available, it is still an open question where to search to satisfy a specific information need. In this paper, we perform a content analysis of health-related information on the Web to obtain an overview of the medical content available. In particular, we compare the content of medical Question & Answer portals, medical weblogs, medical reviews and Wikis. For this purpose, medical concepts are extracted from the text material with existing extraction technology, and the content of the different knowledge resources is compared on the basis of these concepts. Since medical weblogs describe experiences as well as information, it is of great interest to be able to distinguish between informative and affective posts. For this reason, we present a method to classify blog posts by their information content, which exploits high-level features describing the medical and affective content of the posts. The results show substantial differences in the content of the various health-related Web resources. Weblogs and answer portals mainly deal with diseases and medications, whereas the Wiki and the encyclopedia provide more information on anatomy and procedures. While patients and nurses describe personal aspects of their lives, doctors aim to present health-related information in their blog posts. Knowledge of these content differences and of the information content can be exploited by search engines to improve ranking and search and to direct users to appropriate knowledge sources.
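The abstract names the ingredients of the classifier (high-level features describing the medical and affective content of a post) but not its internals. The following Python sketch illustrates the general idea under loudly stated assumptions: the tiny word lists `MEDICAL_TERMS` and `AFFECTIVE_TERMS` are hypothetical stand-ins for a real medical concept extractor (such as MetaMap) and an affective lexicon (such as SentiWordNet), and the nearest-centroid rule is a swapped-in learner, not the authors' method.

```python
import re
from statistics import mean

# Hypothetical mini-lexicons (assumption): a real system would use a medical
# concept extractor over the UMLS and an affective lexicon instead.
MEDICAL_TERMS = {"diabetes", "insulin", "dose", "mg", "symptoms", "treatment",
                 "diagnosis", "medication", "blood", "pressure", "surgery"}
AFFECTIVE_TERMS = {"happy", "sad", "afraid", "angry", "love", "hate",
                   "tired", "worried", "glad", "scared", "awful"}


def features(text):
    """Return (medical share, affective share) of the post's word tokens."""
    tokens = re.findall(r"[a-z]+", text.lower())
    n = max(len(tokens), 1)  # avoid division by zero for empty posts
    medical = sum(t in MEDICAL_TERMS for t in tokens) / n
    affective = sum(t in AFFECTIVE_TERMS for t in tokens) / n
    return (medical, affective)


def train_centroids(posts, labels):
    """Nearest-centroid training: average feature vector per class label."""
    centroids = {}
    for label in set(labels):
        vectors = [features(p) for p, l in zip(posts, labels) if l == label]
        centroids[label] = tuple(mean(dim) for dim in zip(*vectors))
    return centroids


def classify(text, centroids):
    """Return the label whose centroid is closest in squared Euclidean distance."""
    x = features(text)
    return min(centroids,
               key=lambda lab: sum((a - b) ** 2 for a, b in zip(x, centroids[lab])))


if __name__ == "__main__":
    # Toy training posts invented for illustration.
    train_posts = [
        "Insulin dose and blood pressure treatment options for diabetes",
        "Medication dose of 500 mg helps symptoms",
        "I am so tired and worried, my day was awful",
        "I love my family and I am glad today",
    ]
    train_labels = ["informative", "informative", "affective", "affective"]
    centroids = train_centroids(train_posts, train_labels)
    print(classify("The diagnosis and treatment of diabetes need insulin", centroids))  # informative
    print(classify("I feel sad and scared, my week was awful", centroids))  # affective
```

Working on a few aggregate features rather than on raw words keeps the feature space small and independent of a particular vocabulary; how the original method weights or combines its features is not stated in this excerpt.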

Introduction

Electronic media are increasingly used to obtain medical information and advice. Health information on the Internet ranges from personal experiences of medical conditions and patient discussion groups to peer-reviewed journal articles and clinical decision support tools. A study on how consumers in America search for health-related information¹ shows that the Web is the most widely used resource for health information. Nevertheless, finding the best knowledge source to satisfy a specific information need is difficult, because relevant information can be hidden in web pages or encapsulated in social media data such as blogs and Q&A portals. Through content analysis, this paper aims to give an overview of the content differences among the various social media resources on health-related topics.

We focus on health-related information provided on the Internet for two reasons. First, health-related experiences and medical histories offer unique data for researchers, practitioners and patients. Second, it is still an open question whether existing text and content analysis tools are able to process medical social media data and to extract relevant (medical) information from them.

Section snippets

Analysis and assessing social media data in medicine

In the last couple of years, research interest in social media analysis has increased due to growing user interest in these tools. Most of this work has focused on weblogs. One research aspect is the analysis of social aspects in weblog communities [16], [15]. Sekiguchi et al. detect topics of blogs based on interest similarities of users [22]. Approaches to content analysis and topic detection from weblogs determine information diffusion through blogspace [11] or analyze the sentiment of

Research questions

Weblogs and other social media data are gaining influence, and for this reason, more sophisticated access to these data needs to be provided. Since different user groups have different requirements regarding the type of information requested, a search engine should enable patients and health care professionals to find experiences or information on diagnoses, treatments or medications, and to restrict search results to texts written by a particular author class (e.g., by a physician, a nurse, or a patient) or
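Restricting search results by author class, and possibly by information type, amounts to filtering a ranked result list on labels attached to each post. A minimal Python sketch, assuming each indexed post already carries both labels (for instance, produced by a blog post classifier); the `Post` record, its field names and the `restrict` function are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class Post:
    # Hypothetical index record; field names are illustrative only.
    url: str
    author_class: str  # e.g. "physician", "nurse", "patient"
    info_type: str     # "informative" or "affective"


def restrict(results: Iterable[Post],
             author_class: Optional[str] = None,
             info_type: Optional[str] = None) -> List[Post]:
    """Keep results matching the requested author class and/or information type.

    None means "no restriction" for that attribute; the input ranking order is preserved.
    """
    return [
        p for p in results
        if (author_class is None or p.author_class == author_class)
        and (info_type is None or p.info_type == info_type)
    ]
```

For example, `restrict(hits, author_class="patient", info_type="affective")` would keep only patients' experience reports. Filtering after ranking leaves the ranking function untouched; a production engine would more likely push such constraints into the index query.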

Research design

On the Internet, different sources of health-related information can be found. Our work focuses mainly on social media tools, in particular on answer portals, Wikis, reviews and weblogs that are well known or that are provided by renowned communities or institutes (e.g., Mayo Clinic, National Library of Medicine). Section 4.1 describes the data collection crawled from the indicated web pages. For the analysis whose results are described in Section 6, methods that identify

Evaluation methodology

Before we apply the introduced blog post classification method to the weblog dataset, we test its performance in a 10-fold cross-validation. For this purpose, weblogs from all author groups were randomly selected. The corresponding 1509 posts were manually classified as affective or informative. The evaluation corpus is almost balanced and consists of 771 affective and 738 informative posts. Table 2 shows the distribution of the two classes per author group.
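The text does not state how the ten folds were formed. One common procedure, assumed here rather than taken from the paper, is stratified k-fold cross-validation: the posts of each class are shuffled and dealt round-robin into k folds so that every fold roughly keeps the 771:738 class ratio, and each fold serves once as test set while the remaining nine are used for training. `train_and_score` is a hypothetical callback standing in for training the classifier and computing a score such as accuracy.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k=10, seed=0):
    """Assign each example index to one of k folds, preserving class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(k)]
    counter = 0
    for label in sorted(by_class):      # deterministic class order
        indices = by_class[label]
        rng.shuffle(indices)            # randomise which posts end up together
        for idx in indices:
            folds[counter % k].append(idx)  # deal round-robin across folds
            counter += 1
    return folds


def cross_validate(labels, train_and_score, k=10, seed=0):
    """Use each fold once as test set and the other k-1 folds for training; return the mean score."""
    folds = stratified_folds(labels, k, seed)
    scores = []
    for i, test_idx in enumerate(folds):
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        scores.append(train_and_score(train_idx, test_idx))
    return sum(scores) / k
```

With the label counts given above (`["affective"] * 771 + ["informative"] * 738`), this dealing yields nine folds of 151 posts and one of 150, each containing 77 or 78 affective posts.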

The purpose

Content analysis results

In Section 6.1, we study the medical content of the five resources in our dataset. In Section 6.2, we present the distribution of the two information types in the weblog dataset. The results are discussed in Section 7.1.

Limitations and discussion of the results

Several conclusions can be drawn from the aforementioned results. Our hypotheses proved to be only partly true. Instead of the large diversity of topics we hypothesized, the Wiki and the encyclopedia showed a focus on anatomy. We conclude that these two resources are best suited for finding information on anatomy, while people searching for information on disorders should be directed to weblogs or Q&A portals. We remark that we only considered one Wiki and one encyclopedia and that
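The routing conclusion above can be made concrete by computing, per resource, the relative frequency of each concept category and sending a query to the resource where the requested category has the largest share. A Python sketch under stated assumptions: the category labels, the resource names and all counts in the example are invented for illustration and are not the paper's measurements; `category_shares` and `best_resource` are hypothetical helpers.

```python
from collections import Counter


def category_shares(concept_categories):
    """Relative frequency of each concept category extracted from one resource."""
    counts = Counter(concept_categories)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {cat: n / total for cat, n in counts.items()}


def best_resource(category, shares_by_resource):
    """Return the resource in which `category` has the largest relative share."""
    return max(shares_by_resource,
               key=lambda r: shares_by_resource[r].get(category, 0.0))


if __name__ == "__main__":
    # Invented toy counts, for illustration only.
    toy = {
        "weblogs":   ["disease"] * 5 + ["medication"] * 3 + ["anatomy"] + ["procedure"],
        "qa_portal": ["disease"] * 4 + ["medication"] * 4 + ["anatomy"] + ["procedure"],
        "wiki":      ["disease"] * 2 + ["medication"] + ["anatomy"] * 5 + ["procedure"] * 2,
    }
    shares = {name: category_shares(cats) for name, cats in toy.items()}
    print(best_resource("anatomy", shares))  # wiki
    print(best_resource("disease", shares))  # weblogs
```

Normalising to shares rather than comparing raw counts matters because the crawled resources differ in size; raw counts would simply favour the largest collection.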

References (28)

E. Agichtein, Finding high-quality content in social media, in: WSDM 2008: Proceedings of the International Conference...
A. Aronson, Effective mapping of biomedical text to the UMLS metathesaurus: the MetaMap, in: Program Proceedings of the...
P.A. Bath, Health informatics: current issues and challenges, Journal of Information Science (2001).
F.R. Chaumartin, UPAR7: a knowledge-based system for headline sentiment tagging, in: Proceedings of the SemEval...
K. Denecke, Semantic structuring of and information extraction from medical documents using the UMLS, Methods of Information in Medicine (2008).
G.W. Ryan et al., Data management and analysis methods.
A. Devitt, K. Ahmad, Sentiment polarity identification in financial news: a cohesion-based approach, in: Proceedings of...
A. Esuli, F. Sebastiani, SentiWordNet: a publicly available lexical resource for opinion mining, in: Proceedings of the...
G. Eysenbach, C. Kohler, What is the prevalence of health-related searches on the World Wide Web? Qualitative and...
C. Friedman, A broad-coverage natural language processing system, in: Proceedings of the AMIA Annual Symposium, 2000,...
D. Gruhl, R. Guha, D. Liben-Nowell, A. Tomkins, Information diffusion through blogspace, in: Proceedings of the 13th...
S.C. Herring, Longitudinal content analysis of weblogs: 2006–2007.
J. Hillan, Physician use of patient-centered weblogs and online journals, Clinical Medicine and Research (2003).
W. Himmel, U. Reincke, H.W. Michelmann, Using text mining to classify lay requests to a medical expert forum and to...

Cited by (143)

Public library needs assessment to build a community-based library: Triangulation method with a social media data analysis
2022, Library and Information Science Research
Citation excerpt:
Reviews and information sharing posts (word-of-mouth) on social network services (SNS) and social online forum sites significantly impact the formation of public attitudes toward specific targets, and individuals are increasingly using such social sites for communication (Kwon & Wen, 2010). As the use of the Internet in daily life is increasing and information delivery and communication, such as positive or negative word-of-mouth, are more frequently performed through SNS or online sites, various issues, including saving people's time and money, must be considered in collecting large amounts of data using conventional data collection methods such as surveys (Denecke & Nejdl, 2009). The present study identified users' needs for libraries by analyzing the texts of SNS posts using a text mining technique and survey method.
Public libraries must respond to the needs of the communities they serve in order to remain relevant, but assessing these needs is especially challenging in the midst of the rapid development of information technology. This study examines needs assessments to understand the user community, library services, and expected sources to determine user needs regarding space and services. The research employed a mixed-method approach including semi-structured interviews, a questionnaire assessment, and Social Network Site (SNS) big data analysis. The study assessed the needs and characteristics of users and non-users at Yongsan-gu Public Library, South Korea. Data collected were used to examine how the library was affected by COVID-19, the steps it took to adjust and provide services, and how users have adapted to library use during the pandemic. The research results provide direction for building a future public library in regions that lack cultural infrastructure. The results also demonstrate that it is necessary to construct infrastructure linked to cultural projects by creating complex cultural and user-oriented spaces.
Pharmacy and medical students' attitudes and perspectives on social media usage and e-professionalism in United Arab Emirates
2021, Currents in Pharmacy Teaching and Learning
Citation excerpt:
Social media is an internet-based form of communication that enables users to create content, share, and/or exchange information and ideas in virtual communities and networks.1 It can be classified on the basis of social functions into collaborative projects such as Wikipedia, independent blogs or microblogs (e.g., Twitter, Tumblr), content communities (e.g., YouTube, Instagram), other social networking sites (e.g., Facebook, LinkedIn) and certain virtual games that contain elements of social networking within their applications.2,3 With the rise of social media technologies among today's undergraduate learners, it is imperative to examine the perceived impact that such platforms can have on education and on building their professional identities.
It is imperative to establish how students view and present themselves on social media and to assess level of awareness regarding the implications of their social media presence, e-professionalism, and accountability. The study objectives were to: 1) Determine the social media usage levels among medical and pharmacy students in the United Arab Emirates (UAE); 2) Characterize the students' views and perceptions, including their awareness of e-professionalism; and 3) Compare the responses in behavior between the two groups.
A cross-sectional study was performed on 575 undergraduate students from two study disciplines, pharmacy (n = 325) and medicine (n = 250). Minor revisions were made to previously validated assessment tools and pilot tested. The study sample included students from five different universities across the country.
In comparison to medical students, pharmacy students were observed to use social media more for learning purposes (χ² = 6.8, P < .05). However, medical students' opinions reflected more strongly on the context of accountability and e-professionalism (χ² = 15.8, P < .05). A considerable proportion (89%) of students felt it was discriminatory for prospective employers to use their social media profile information for investigative purposes while hiring. One-third of respondents reported sharing information that they would not want their employers to view, and 67.1% reported the same for information relevant to patients.
The research findings converge to address the need for educators and administrators in the UAE to develop guidelines concerning its safe use and proactively integrate e-professionalism into their respective curriculum.
Drivers of vulnerability to medicine smuggling and combat strategies: a qualitative study based on online news media analysis in Iran
2024, BMC Health Services Research
Sentiment Analysis of Pharmaceutical Data on Social Media: Nooj as an NLP Processor
2024, Lecture Notes in Networks and Systems
Attracting solvers' participation in crowdsourcing contests: The role of linguistic signals in task descriptions
2024, Information Systems Journal
Development and application of machine learning algorithms for sentiment analysis in digital manufacturing: A pathway for enhanced customer feedback
2023, Emerging Technologies in Digital Manufacturing and Smart Factories
