Using Social Media to Predict and Track Disease Outbreaks
Improving Surveillance
A pioneer in this field, Brownstein worked with collaborators at Children's Hospital Boston to launch one of the earliest social media tools in infectious disease surveillance, a website called HealthMap (http://healthmap.org/) that mines news websites, government alerts, eyewitness accounts, and other data sources for outbreaks of various illnesses reported around the world. The site aggregates those cases on a global map, with outbreaks displayed in real time. Brownstein's team recently launched Outbreaks Near Me, an iPhone application that delivers HealthMap directly to cell-phone users. Their newest endeavor is Flu Near You (https://flunearyou.org/), a website created with the American Public Health Association and the Skoll Global Threats Fund of San Francisco, California, which allows individuals to serve as potential disease sentinels by reporting their health status on a weekly basis.
Traditional flu surveillance by the Centers for Disease Control and Prevention (CDC) relies on outpatient reporting and virological test results supplied by laboratories nationwide. That system confirms outbreaks within about 2 weeks after they begin, but social media can flag more immediate concerns, according to Ashley Fowlkes, an epidemiologist in the CDC Influenza Division.
One of the CDC's more recent collaborators is Google, to which millions of people turn for flu-related web searches. In September 2008, after the company's researchers showed that spikes in flu queries and disease outbreaks often coincide, Google launched Google Flu Trends (http://www.google.org/flutrends/), a website that lets people compare volumes of flu-related search activity against reported incidence rates for the illness, with both displayed graphically on a map. According to Fowlkes, the CDC monitors Google Flu Trends as a potential source for early warnings in locations where health officials might want to mount a response.
But Fowlkes also cautions that online search behavior might have no bearing on whether an outbreak is really occurring. For instance, when the popular singer Rihanna announced (via Twitter) that she had the flu in October 2011, flu-related web queries surged. The timing of the spike and the search terms used (such as "Rihanna" alongside "flu") suggest the queries were as much in response to public curiosity as anything else, Fowlkes says. "The Google Flu Trends system tries to account for that type of media bias by modeling search terms over time to see which ones remain stable," she says. Otherwise, it would be vulnerable to "noisy" queries (i.e., those that might have nothing to do with changes in disease incidence).
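The idea of keeping only search terms that "remain stable" can be illustrated with a small sketch. This is not Google's actual method, which the article does not describe in detail; it is one plausible reading, assuming that a term counts as stable when its weekly volume correlates with reported ILI rates in every time window examined. The function names, the window length, the correlation threshold, and all the numbers below are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) * (x - mean_x) for x in xs)
    var_y = sum((y - mean_y) * (y - mean_y) for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def stable_terms(term_counts, ili_rates, window=4, min_r=0.5):
    """Return terms whose correlation with ILI is at least min_r in EVERY window.

    term_counts maps a search term to its weekly query counts; ili_rates holds
    the reported ILI rate for the same weeks. Windows do not overlap.
    """
    kept = []
    for term, counts in term_counts.items():
        scores = [
            pearson(counts[i:i + window], ili_rates[i:i + window])
            for i in range(0, len(ili_rates) - window + 1, window)
        ]
        if scores and min(scores) >= min_r:
            kept.append(term)
    return sorted(kept)


# Made-up weekly data: one term follows ILI, the other spikes once on celebrity news.
ili = [1, 2, 3, 4, 4, 3, 2, 1]
queries = {
    "flu symptoms": [10, 20, 30, 40, 40, 30, 20, 10],
    "rihanna flu": [0, 0, 0, 0, 0, 50, 0, 0],
}
result = stable_terms(queries, ili)
print(result)  # → ['flu symptoms']
```

A one-off spike like the celebrity-driven query fails the test because it correlates with illness reports in at most one window, which is the kind of "noisy" query the passage describes.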
Google doesn't publicize its flu search terms for fear that malicious hackers might use them to undermine the system (for instance, by creating fake outbreaks). That's unlike Twitter, a fast-growing "microblogging" platform used by hundreds of millions of registered users who collectively send more than 200 million "tweets" a day. Each tweet is at most 140 characters, which is limited but still long enough to add contextual information beyond what search terms can offer. That makes it easier to exclude noisy tweets, and it also allows scientists to mine for content describing what people think about treatments and other issues that could be crucial for delivering better outbreak responses, says Philip Polgreen, an associate professor at the University of Iowa Carver College of Medicine.
Scientists have found that tweet streams closely track reported cases of influenza-like illnesses (ILIs), conditions that cause fever with cough or sore throat but that aren't necessarily influenza, which has its own viral etiology. In one study, Nello Cristianini, a professor at the University of Bristol, found that phrases containing terms such as "home worse," "cough night," "sore head," and "swine flu" tracked with reported ILI outbreaks throughout the United Kingdom. And Polgreen found that terms including "flu," "swine," "influenza," "symptom," "shortage," "hospital," and "infection," among many others, tracked user concerns during the H1N1 pandemic in 2009. What's more, he reported, Twitter content predicted flu outbreaks 1–2 weeks ahead of the CDC's surveillance average.
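A rough sketch of how such keyword tracking and lead-time estimation might work follows. It is not the method used in either study: it simply counts tweets containing any of a few keywords taken from the list above, then shifts the weekly counts against later ILI reports to find the lag with the highest correlation. The function names, the sample tweets, and the weekly figures are all invented for illustration.

```python
import re
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) * (x - mean_x) for x in xs)
    var_y = sum((y - mean_y) * (y - mean_y) for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def count_keyword_tweets(weekly_tweets, keywords):
    """For each week, count the tweets that contain at least one keyword."""
    counts = []
    for tweets in weekly_tweets:
        hits = 0
        for text in tweets:
            words = set(re.findall(r"[a-z]+", text.lower()))
            if words & keywords:
                hits += 1
        counts.append(hits)
    return counts


def best_lead(tweet_counts, ili_rates, max_lead=3):
    """Return (lead_weeks, r) where tweets best correlate with ILI lead_weeks later."""
    best = (0, pearson(tweet_counts, ili_rates))
    for lead in range(1, max_lead + 1):
        r = pearson(tweet_counts[:-lead], ili_rates[lead:])
        if r > best[1]:
            best = (lead, r)
    return best


# Hypothetical tweets for two weeks, matched against a few of the article's terms.
keywords = {"flu", "swine", "influenza"}
sample = [
    ["I have the flu", "nice day"],
    ["Flu and cough all night", "swine flu shortage"],
]
weekly_hits = count_keyword_tweets(sample, keywords)
print(weekly_hits)  # → [1, 2]

# Made-up series in which tweet volume peaks one week before reported ILI.
tweets_per_week = [1, 2, 5, 9, 5, 2, 1, 1]
ili_per_week = [1, 1, 2, 5, 9, 5, 2, 1]
lead, r = best_lead(tweets_per_week, ili_per_week)
print(lead)  # → 1
```

As Fowlkes notes in the next paragraph, a strong correlation with ILI reports is not the same as agreement with laboratory-confirmed influenza, so a lead estimate like this says nothing by itself about true infection rates.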
Fowlkes emphasizes that although flu-related tweet streams correlate with the CDC's ILI surveillance, they don't always match up with laboratory-confirmed influenza. "All the social media systems need to be compared back with virologic data to see how well they correlate with true influenza infection," she says. "Otherwise you risk treating the wrong people."
Marcel Salathé, an assistant professor at The Pennsylvania State University, says that open access is in part what makes Twitter so promising as a health research tool. "I respect Google and what they're doing with Google Flu Trends, but those data are closed and proprietary, so scientists can't use them," he says. "On the other hand, tweets are full of slang, but we can use machine-learning algorithms to make sense of those messages."