Getting started and learning about our dataset
We started this project by looking for an external collaborator whose dataset we could work with. Luckily, Olin’s professors run a wide range of data-rich research projects, and we were able to work with Professor Scott Hersey on the environmental data he collected in Kwadela, a low-income township in South Africa. Because Scott works on campus and teaches a class here, we could meet with him to understand the context of the data and to ask about the circumstances of data collection, odd points, and outliers. Before this project, we had to invent our own context for datasets and catch probable errors using only our limited knowledge of each dataset's subject area.
During our meeting with Scott, we identified concrete research questions for our project, learned how the data should behave based on principles of chemistry and anthropology, and selected a handful of genuinely useful columns from the 50+ column CSV file Scott had originally provided. For example, we learned that homes in Kwadela usually have very poor insulation and rely on coal-burning stoves for warmth. These same stoves release large quantities of pollutants into the air and ultimately cause lung-related health problems for Kwadela's residents. By comparing trends in outdoor temperature, indoor temperature (which indicates stove usage), and pollutant concentrations, we expect to see how closely stove use is connected to pollution and cold weather.