About Nick: Nick is the Data Science Manager at DataKind, an organization committed to harnessing the power of data science in the service of humanity. He loves empowering mission-driven organizations through data, and was previously a Data Scientist at the Center for Data Science and Public Policy and the Data Science for Social Good Fellowship.
About Neal: Neal is Director of Social Impact at Tableau Software and Director of Tableau Foundation, which encourages the use of facts and analytical reasoning to solve the world’s problems. Neal has served in both private and nonprofit senior leadership positions at the intersection of information technology and social change.
Or, at least, they don’t think they have the data to do so.
The truth is, in this awesome age of the Internet, satellites, sensors and extreme connectivity, there’s an abundance of data being created all around us that can be mined, understood, and harnessed to gain new insights about our world and transform almost every sector.
Using data for good is a journey. From finding the best data, to structuring it to unleash its potential, to effectively communicating the results, the pathway is always different and never dull. We’re excited to share three of our favorite journeys below, which show not only that data is all around us but also that it can be a tremendous force for good.
1. Scraping real-time data sources to estimate inflation - The World Bank
In 2009, Kenya was struggling through a major food crisis. One million people faced starvation and the problem was exacerbated by extreme inflation that was difficult for governments and banks to measure. It takes time to gather the food price data used to calculate inflation indices, and the data exists only at the national level, ignoring regional differences. That doesn’t cut it when a crisis is in progress.
On a DataKind project with the World Bank, Max Richman led a team of volunteers to scrape novel data sources in order to get better, even real-time, estimates of inflation:
- humuch.com: scraped banana prices by continent from this site recording global food prices
- mFarm: scraped 1,000 days of pricing information about dry maize, a staple food crop in Kenya, from this marketplace platform
- Pick n’ Pay: scraped pricing data for 11 essential food types from this South African grocery chain’s website
The team supplemented this data with data from the World Bank and the Food and Agriculture Organization of the U.N., and scraped data from three cost-of-living sites (Numbeo, Xpatulator, and Expatistan) to validate that the compiled information was accurate.
As a result, the World Bank was able not only to identify the prices of staple foods at subnational levels, but also to use the information to detect an impending food crisis.
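To make the idea concrete, here is a minimal sketch of how scraped prices could be turned into a regional price index. It is not the team's actual method, which the project description above doesn't spell out. It assumes a simple unweighted index (the geometric mean of price changes per item, sometimes called a Jevons index), and the function name, record layout, regions, and prices are all made up for illustration.

```python
from collections import defaultdict
from math import exp, log


def jevons_index(records, base_period, current_period):
    """Return {region: index}, where 100 means no change since base_period.

    records: iterable of (region, item, period, price) tuples (assumed layout).
    The index is the geometric mean of price relatives (current / base)
    over the items observed in both periods for that region.
    """
    prices = defaultdict(dict)  # (region, item) -> {period: price}
    for region, item, period, price in records:
        prices[(region, item)][period] = price

    log_relatives = defaultdict(list)  # region -> [log(current / base), ...]
    for (region, _item), by_period in prices.items():
        if base_period in by_period and current_period in by_period:
            relative = by_period[current_period] / by_period[base_period]
            log_relatives[region].append(log(relative))

    return {
        region: 100 * exp(sum(values) / len(values))
        for region, values in log_relatives.items()
    }


# Made-up prices, for illustration only.
sample = [
    ("Nairobi", "dry maize", "2013-01", 30.0),
    ("Nairobi", "dry maize", "2013-02", 33.0),
    ("Nairobi", "bananas", "2013-01", 10.0),
    ("Nairobi", "bananas", "2013-02", 10.0),
    ("Kisumu", "dry maize", "2013-01", 28.0),
    ("Kisumu", "dry maize", "2013-02", 35.0),
]
index = jevons_index(sample, "2013-01", "2013-02")
for region, value in sorted(index.items()):
    print(f"{region}: {value:.1f}")
```

Because the index is computed per region, it can surface the regional differences that a single national figure hides. A production index would also weight items by their share of household spending, which this sketch leaves out.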
2. Using satellite imagery to measure poverty and target relief programs - GiveDirectly
GiveDirectly is a nonprofit that addresses poverty in rural Kenya and Uganda by sending direct cash transfers to households in need. Because research has shown that villages with more thatch-roof homes tend to be lower income than villages with more metal roofs, they use roof type as a simple yet effective proxy to measure poverty levels. Historically, they’ve sent surveyors out to count the ratio of thatch vs. metal roofs in a given village to determine where to target their funds, but this is a labor- and cost-intensive task.
It’s also a task that could be automated…with the right data.
Enter Google Maps! And DataKind volunteers Kush Varshney and Brian Abelson.
They developed an image processing and machine learning algorithm that used Google’s satellite imagery to classify whether a home had a thatch or metal roof. To train the algorithm, though, they first needed to gather some labelled sample data. They developed a simple web app and crowdsourced an effort to label nearly 1,500 images with households’ roof types.
Then, by using template matching, deriving color histogram features, and applying a random forest on top, they were able to identify roofs in a given area and whether they were thatch or metal, allowing GiveDirectly to determine where to target their services.
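For readers curious about what color histogram features look like in practice, here is a minimal sketch. It is not the volunteers' code: it skips template matching entirely, and it swaps the random forest for a much simpler nearest-centroid classifier so the example runs with only the Python standard library. The function names, bin count, and pixel values are all assumptions for illustration.

```python
def color_histogram(pixels, bins=4):
    """Return a normalized per-channel histogram for (r, g, b) pixels in 0-255."""
    hist = [0.0] * (3 * bins)
    width = 256 // bins
    for pixel in pixels:
        for channel, value in enumerate(pixel):
            hist[channel * bins + min(value // width, bins - 1)] += 1
    return [count / len(pixels) for count in hist]


def train(labelled_patches):
    """Average the histograms of each label: {label: centroid vector}."""
    by_label = {}
    for pixels, label in labelled_patches:
        by_label.setdefault(label, []).append(color_histogram(pixels))
    return {
        label: [sum(column) / len(vectors) for column in zip(*vectors)]
        for label, vectors in by_label.items()
    }


def classify(model, pixels):
    """Return the label whose centroid is closest to this patch's histogram."""
    features = color_histogram(pixels)

    def squared_distance(label):
        return sum((a - b) ** 2 for a, b in zip(features, model[label]))

    return min(model, key=squared_distance)


# Made-up roof patches: bright gray for metal, brown for thatch.
training = [
    ([(200, 200, 210)] * 16, "metal"),
    ([(220, 215, 225)] * 16, "metal"),
    ([(120, 90, 50)] * 16, "thatch"),
    ([(100, 80, 40)] * 16, "thatch"),
]
model = train(training)

village = [
    [(210, 205, 215)] * 16,
    [(110, 85, 60)] * 16,
    [(115, 95, 45)] * 16,
]
predictions = [classify(model, patch) for patch in village]
thatch_share = predictions.count("thatch") / len(predictions)
```

The last line computes the share of thatch roofs in a village, which is the kind of figure surveyors previously had to count by hand. Real satellite patches are far noisier than these uniform colors, which is one reason a stronger classifier such as a random forest is a better fit.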
3. Fighting Ebola with Data - NetHope
When fighting a contagious and deadly disease like Ebola, it’s important for information about the disease to spread faster than the disease itself. That’s the only way humanitarian aid organizations and government officials can respond quickly enough to stop an outbreak in its tracks.
NetHope helped make this happen in Guinea. They are an NGO that provides information communications infrastructure support to humanitarian groups in disaster zones.
During the Ebola outbreak, NetHope and Tableau Foundation partnered to use data analytics as a weapon in the arsenal of resources organizations were using to stop the disease. NetHope needed to optimize the priority and placement of communications infrastructure so cell towers and expensive satellite devices were installed where they were needed most – near Ebola Treatment Units that lacked connectivity and in areas with high disease burdens.
Data was everywhere. Road data was on OpenStreetMap. Disease data came from community health workers, NGOs and government sources, while other data was on GitHub and the Humanitarian Data Exchange. It wasn’t always consistent or clean – but it was critical.
And the larger Tableau community was there to help. Tableau Zen Masters - expert Tableau customers - jumped in and helped NetHope’s humanitarian crisis informatics expert, Dr. Jennifer Chan, wrangle the data into amazing visualizations.
One viz showed locations of cell towers, satellite equipment and Ebola Treatment Units, combined with data about the burden of disease so areas with the highest burden could get priority attention. Another viz created a common operational picture that allowed the dozens of responding NGOs to understand who was doing what and where resources could be more strategically utilized. It was a herculean effort made possible thanks to the generous contributions of skills and time from public, private, and NGO actors who came together to fight disease with data.
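The prioritization logic behind the first viz can be sketched in a few lines. This is only an illustration of the idea described above (put connectivity first where treatment units lack it and disease burden is highest), not NetHope's actual model. The field names, district labels, and case counts are hypothetical.

```python
def rank_sites(treatment_units, cases_by_district):
    """Rank unconnected treatment units by their district's case count, highest first."""
    candidates = [unit for unit in treatment_units if not unit["connected"]]
    return sorted(
        candidates,
        key=lambda unit: cases_by_district.get(unit["district"], 0),
        reverse=True,
    )


# Hypothetical data, for illustration only.
treatment_units = [
    {"name": "ETU 1", "district": "District A", "connected": True},
    {"name": "ETU 2", "district": "District B", "connected": False},
    {"name": "ETU 3", "district": "District C", "connected": False},
    {"name": "ETU 4", "district": "District D", "connected": False},
]
cases_by_district = {"District A": 400, "District B": 120, "District C": 310}

priority = [unit["name"] for unit in rank_sites(treatment_units, cases_by_district)]
print(priority)
```

Districts with no reported cases default to zero here, which pushes them to the bottom of the list. With data as inconsistent as the sources described above, a missing count may simply mean missing data, so a real ranking would need to treat those gaps more carefully.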
From scraped data to satellite imagery to visualized disparate data sources, we hope these stories make it clear: data is there for the wrangling and just needs to be unearthed to tackle tough challenges. For your next project, dig deep and get creative about finding and visualizing novel data sources.
Upcoming Reddit AMA
Interested in engaging with DataKind's founder Jake Porway? Join him next Wednesday, January 13 for a Reddit AMA on harnessing the power of data science for good.