
Twitter: A disease-networking site

A University of Rochester study can predict health using lifestyle information on Twitter. Photo Courtesy Adam Sadilek

Even people without Twitter accounts are familiar with the little blue bird that flutters around social networking sites, tweeting messages of 140 characters or fewer. Many social networkers have joined in on the trend, tweeting and retweeting about anything from breaking news to the latest photo of Grumpy Cat.

However, a recent University of Rochester study suggests that Twitter is no longer just a site for interactions between friends.

Computer science professor Henry Kautz and postdoctoral researcher Adam Sadilek have discovered a new use for the website, one that may have major implications for the field of public health.

The paper, which was presented at the International Conference on Web Search and Data Mining on Feb. 8, introduced a model that quantifies the effects of certain lifestyle factors on health using data from Twitter.

The researchers claim the new model can predict the future health status of an individual with 91 percent accuracy.


The data

Researchers studied a sample of public tweets from the New York area, a data set representative of about 8 percent of the area’s population, according to the study.

Because many Twitter users keep their accounts public, the site is a quick and easy source of data, Kautz said. Traditional statistical analysis relies on surveys of individuals and medical professionals, which is not always efficient.

“We look at this as a way to augment the traditional work done in public health and epidemiology where you have to gather data from many institutions,” Kautz said. “This can become a very expensive and time-consuming process.”

To create a subset of the tweets, researchers focused on active users. Kautz described active users as people who posted at least three times a day during a three-month period.
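As a rough illustration of the "active user" cutoff Kautz describes, the sketch below keeps only users who average at least three tweets a day over a 90-day window. The function name and the (user, date) input format are hypothetical, and reading "three times a day" as an average rather than a strict daily minimum is an assumption; the article does not describe the researchers' actual filtering code.

```python
from collections import Counter
from datetime import date, timedelta

def active_users(tweets, start, days=90, min_per_day=3):
    """Return users who average at least `min_per_day` tweets per day
    over the `days`-long window beginning at `start`.

    `tweets` is an iterable of (user_id, date) pairs. Treating "three times
    a day" as an average over the window is an assumption; a stricter
    reading would require three tweets on every single day.
    """
    end = start + timedelta(days=days)
    # Count each user's tweets that fall inside the window.
    counts = Counter(user for user, day in tweets if start <= day < end)
    threshold = min_per_day * days
    return {user for user, n in counts.items() if n >= threshold}

# 270 tweets clears the 3 x 90 threshold; 269 does not.
tweets = [("alice", date(2012, 1, 1))] * 270 + [("bob", date(2012, 1, 1))] * 269
print(active_users(tweets, date(2012, 1, 1)))  # {'alice'}
```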

Researchers were interested in studying certain lifestyle factors, such as social status, exposure to pollution and interpersonal interactions. Kautz said he and Sadilek wanted to connect tweets indicative of these factors to public health. For instance, when someone tweets about being ill, the cause of illness can be linked to his or her behavior and lifestyle.

“There is a lot of interest not only in tracking disease, but in determining which factors are actually influencing or causing disease,” Kautz said.

Because most tweeting is done from mobile devices, tweets often carry a geographic coordinate. Kautz explained that these geo-tagged tweets provide snapshots of millions of people in real time, indicating what they are doing and where they are doing it. This instantaneous picture may be beneficial, according to Eileen O’Keefe, a clinical associate professor and director of the Program in Health Sciences at Boston University.

“There tends to be a time lag in collecting and analyzing data,” O’Keefe said. “I see this type of data analysis being useful in acute situations, such as emergencies, storms or in disease outbreak.”


The technology

Researchers began by classifying tweets as ‘connected’ or ‘not connected’ to illness, based on their content, Kautz said. After manually inferring the health state of a single user, the researchers applied machine-learning techniques to the mined data to classify tweets based on key words.

The model, Kautz said, looks not only at individual words but also at groups of words, and it can predict who is sick based on these key words in tweets.
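To show what classifying tweets by "individual words" and "groups of words" can look like, here is a minimal sketch that turns each tweet into unigrams and bigrams and trains a multinomial naive Bayes classifier on a handful of hand-labeled examples. Naive Bayes is a stand-in chosen for brevity, not necessarily the method the researchers used, and the training tweets, labels, and names such as `NaiveBayes` are invented for illustration.

```python
import math
from collections import Counter

def ngrams(text, n_max=2):
    """Lowercased single words plus word groups (here, bigrams) from a tweet."""
    words = text.lower().split()
    feats = list(words)
    for n in range(2, n_max + 1):
        feats += [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]
    return feats

class NaiveBayes:
    """Multinomial naive Bayes with add-one smoothing over n-gram features."""

    def fit(self, texts, labels):
        self.classes = set(labels)
        self.prior = Counter(labels)
        self.counts = {c: Counter() for c in self.classes}
        for text, label in zip(texts, labels):
            self.counts[label].update(ngrams(text))
        self.vocab = set().union(*self.counts.values())
        self.total = {c: sum(self.counts[c].values()) for c in self.classes}
        return self

    def predict(self, text):
        n_docs = sum(self.prior.values())

        def log_score(c):
            # log P(class) + sum of log P(feature | class), smoothed.
            score = math.log(self.prior[c] / n_docs)
            for f in ngrams(text):
                score += math.log(
                    (self.counts[c][f] + 1) / (self.total[c] + len(self.vocab))
                )
            return score

        return max(self.classes, key=log_score)

# Invented training tweets, hand-labeled for illustration only.
texts = [
    "i feel sick today", "got the flu and a fever", "sore throat and feel sick",
    "great game tonight", "lunch with friends today", "new photo of my cat",
]
labels = ["sick", "sick", "sick", "healthy", "healthy", "healthy"]
model = NaiveBayes().fit(texts, labels)
print(model.predict("feel sick with a fever"))  # sick
```

The bigram "feel sick" is what lets the model weigh a phrase rather than just the words "feel" and "sick" separately.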

The researchers also created GermTracker, a phone application that uses geo-tagged tweets to map where illnesses are being reported. An individual can use the application to track the spread of illnesses.

Kwansupa Panyawuthikrai, a graduate student studying innovation and technology in the Metropolitan College, said she would use this application.

“I would be more aware of what’s going on in the area,” said Panyawuthikrai. “I could choose where to go and where not to go. That would be great.”

However, College of Engineering junior Nikolaus Roman said he was skeptical of the application.

“It seems like a good idea at first, but I feel like overall it’s not going to have a huge impact on public health,” Roman said.


The findings

Kautz said he used the study’s findings to determine correlations between lifestyle factors and an individual’s likelihood of becoming ill.

An individual’s social status, for example, is related to his or her health. Past animal and human studies indicate that those with a higher social status have better immune systems, according to the study. The theory is that a lower social status, on the other hand, is linked to higher social stress, which impairs immune responses.

Researchers found a positive correlation between health and visits to public parks, Kautz said. They also found a negative correlation between health and exposure to bars, gyms and public transportation.
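To make "positive" and "negative" correlation concrete, the sketch below computes the Pearson correlation coefficient, a standard measure that runs from -1 to 1. The numbers are invented for five hypothetical users and are not data from the study; the article does not say which statistic the researchers used. Here, more park visits go with fewer illness-related tweets, which is the pattern a positive correlation between parks and health would produce.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)

# Invented numbers for five hypothetical users, not data from the study.
park_visits = [0, 1, 2, 3, 4]
sick_tweets = [5, 4, 4, 2, 1]
print(round(pearson(park_visits, sick_tweets), 2))  # -0.96
```

A correlation like this shows only that two quantities move together; on its own it does not show that one causes the other.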


Skeptics of the study

Although O’Keefe expressed interest in the research, she did not see it as an alternative to traditional methods.

“This research will complement what we’re doing in public health, but it will not replace it,” O’Keefe said.

The sampling, O’Keefe explained, is a limitation because it is not a random sample. An online Pew report calculated that only 13 percent of online adults use Twitter and an even smaller percentage keep their Twitter accounts public.

The paper acknowledged these limitations, noting that younger people and minorities represent the majority of Twitter users. The data does not represent older populations or those who do not use social media.

“To make a decision about public health, the data must be representative of the entire population,” O’Keefe added.

BU epidemiology professor Wayne LaMorte agreed with this limitation and raised others.

LaMorte said “crisp definitions” are vital in conducting studies such as this, and whether the study adhered to such definitions was his biggest concern. An individual tweeting that he or she is sick, for example, could have anything from a runny nose to a more serious illness.

LaMorte also said environmental exposures can be easily misclassified.

“Suppose I take a bus that drives past a gym every day and send a tweet as I do this,” LaMorte said. “Is the tweet going to read it as my having entered the gym?”

He said there are much more precise ways of collecting data.

“I can conduct a survey and provide definitions of what a cold is. I can ask if and how often you go to the gym or wash your hands,” LaMorte said. “They are saying I cannot do this as easily, but I can.”

Both LaMorte and O’Keefe agreed, however, that this use of technology will be valuable in the future. LaMorte cited newer technologies that already track illnesses, such as Google Flu Trends.


Reactions from the BU community

Students exhibited mixed attitudes toward the potential impact of this research.

Suzanne Cimolino, a sophomore in the College of General Studies, said the study seemed to be missing something.

“It seems like it needs something else, because how many people tweet about not feeling well?” Cimolino said. “Most of the people that I follow just tweet about stupid stuff.”

Unlike Cimolino, College of Arts and Sciences senior Hasan Alhelo said these findings could be helpful for the future of public health. He said implementing this type of technology on a large scale could bring researchers one step closer to preventing disease.

“This can provide an adequate estimation of geographical zones of high prevalence and high incidence of disease, making it much easier to locate the source of disease and to intervene,” Alhelo said.


The future

Kautz said he is designing a new study to test the validity of these findings.

“We will contact a sample of Twitter users whose tweets are classified as ‘flu-related’ and reimburse them to come into the hospital,” he said. These individuals will be given a blood test to confirm the presence of flu antibodies. Kautz said the research team also plans to expand its studies to encompass other illnesses, such as depression.

“Using this same technique, we can find people who are depressed and discover how depression can be viewed in terms of interactions with a social network,” he said.

The main goal, Kautz said, is to extend these studies to observe connections between global movements of individuals and the spread of disease.

Kautz explained that if researchers are able to observe disease outbreaks in one city and see an individual move from that location to another, they might be able to predict disease outbreaks in other areas.

