On February 14, Analytics at Wharton, Wharton Customer Analytics, Penn Engineering, and Wharton Statistics collaborated to host the first Women in Data Science Conference (WiDS) at Penn. Among the impressive roster of PhD students, industry professionals, and professors who presented on a variety of topics were three Wharton undergrads who are working to develop a model that could help prevent forest fires in California.
Emily Fu, Zhun Yan Chang, and Melisa Lee (all W’21) began their research in Prof. Linda Zhao’s Modern Data Mining class for a final project. Emily, a southern California native, had a personal connection to the topic. “In late 2003, a fire broke out in the San Bernardino Mountains causing the displacement of more than 80,000 people,” she said. “My family was one of the families that ended up being evacuated.”
The group sought to answer a specific question: “Given its features, can we predict the size of a fire?”
Moving From Class Project to Real-World Impact
Answering that question posed several obstacles. Not only were the datasets difficult to aggregate, but they were also unusable when first pulled. For example, government fire datasets were inconsistent from year to year, which required the team to develop a script to ‘clean’ the data before they could use it in their model.
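To give a sense of what such a cleaning step can look like, here is a minimal Python sketch. It is not the team's script: the column names, the alias table, and the two sample rows are all hypothetical, made up only to illustrate how records from differently formatted yearly files might be mapped onto one consistent layout.

```python
# Hypothetical alias table: the article does not list the real column names.
# Each raw (lowercased) column name maps to one canonical name.
COLUMN_ALIASES = {
    "fire_size": "acres",
    "acres_burned": "acres",
    "cause": "cause",
    "stat_cause_descr": "cause",
    "year": "year",
    "fire_year": "year",
}


def clean_record(raw):
    """Map one raw row (a dict) onto canonical column names and types."""
    cleaned = {}
    for key, value in raw.items():
        canonical = COLUMN_ALIASES.get(key.strip().lower())
        # Skip columns we do not recognize and cells that are empty.
        if canonical is None or value in (None, ""):
            continue
        cleaned[canonical] = value
    if "acres" in cleaned:
        # Remove thousands separators before converting to a number.
        cleaned["acres"] = float(str(cleaned["acres"]).replace(",", ""))
    if "year" in cleaned:
        cleaned["year"] = int(cleaned["year"])
    if "cause" in cleaned:
        cleaned["cause"] = str(cleaned["cause"]).strip().lower()
    return cleaned


# Two made-up rows in two different (hypothetical) yearly formats.
raw_rows = [
    {"FIRE_YEAR": "2008", "FIRE_SIZE": "1,250.5", "STAT_CAUSE_DESCR": "Lightning"},
    {"Year": 2015, "Acres_Burned": 40, "Cause": " lightning "},
]
cleaned_rows = [clean_record(row) for row in raw_rows]
print(cleaned_rows)
```

Once every row shares the same column names and types, files from different years can be stacked into a single table and passed to a model.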
The largest obstacle, however, was using the data to tell a story. Many different variables are taken into account in the study of forest fires — such as the time of year, type of fire, and vegetation type — and the group needed to find a way to distill all of these variables into a narrative that had an actionable solution.
The team used random forest, a complex statistical technique that aggregates multiple decision ‘trees’ to identify important variables, but there were still too many variables. Having identified that the largest fires tend to occur during the summer and that the biggest spike was in lightning fires, the team decided to narrow their focus to lightning fires. They found that the top vegetation fuel type predictor, timber litter, was highly correlated with lightning fires, and so they proposed setting controlled fires early in the year to prevent fuel buildup during fire season. This way, several small controlled fires could remove layers of dead vegetation and prevent large, uncontrollable fires from occurring.
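For readers curious how aggregating trees can rank variables, the Python sketch below shows the core idea in a deliberately simplified form. It is not the team's model and not a full random forest: each "tree" makes only a single split, the data are synthetic, and the feature names are hypothetical. What it keeps are the two ingredients behind the technique: every tree is trained on a random resample of the rows and may choose only among a random subset of the features, and a feature's importance is the share of total error reduction it earns across all trees.

```python
import random


def sse(values):
    """Sum of squared errors of the values around their mean."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split_gain(rows, targets, feature):
    """Largest SSE reduction achievable with one threshold on `feature`."""
    base = sse(targets)
    best = 0.0
    for threshold in sorted({row[feature] for row in rows}):
        left = [t for row, t in zip(rows, targets) if row[feature] <= threshold]
        right = [t for row, t in zip(rows, targets) if row[feature] > threshold]
        if left and right:
            best = max(best, base - sse(left) - sse(right))
    return best


def stump_forest_importance(rows, targets, n_trees=50, n_candidates=2, seed=0):
    """Share of total SSE reduction credited to each feature across all trees."""
    rng = random.Random(seed)
    n_features = len(rows[0])
    totals = [0.0] * n_features
    for _ in range(n_trees):
        # Bootstrap: resample the rows with replacement.
        idx = [rng.randrange(len(rows)) for _ in rows]
        sample_rows = [rows[i] for i in idx]
        sample_targets = [targets[i] for i in idx]
        # Each tree may choose only among a random subset of the features.
        candidates = rng.sample(range(n_features), n_candidates)
        gains = {f: best_split_gain(sample_rows, sample_targets, f) for f in candidates}
        winner = max(gains, key=gains.get)
        totals[winner] += gains[winner]
    total = sum(totals)
    return [t / total for t in totals] if total > 0 else totals


# Synthetic data: only the first (hypothetical) feature drives the target.
feature_names = ["fuel_load", "unrelated_a", "unrelated_b"]
data_rng = random.Random(1)
rows = [[data_rng.random() for _ in feature_names] for _ in range(100)]
targets = [10 * row[0] + data_rng.gauss(0, 0.5) for row in rows]

importances = stump_forest_importance(rows, targets)
for name, score in sorted(zip(feature_names, importances), key=lambda p: -p[1]):
    print(f"{name}: {score:.3f}")
```

In this toy setup the one informative feature should rank first by a wide margin. With many real variables, scores are spread across far more candidates, which is one reason a ranking alone can still leave too many variables to build a story around.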
Given the nature of the project, they also hope to create a tangible impact with their findings, something that they did not anticipate at first but are currently looking into. “Our next steps are working on an article to present our findings that others can reference, and potentially work with Professor Zhao to do some impactful work with the California government,” Melisa said.
Preparing for the Conference
When the team found out over winter break that they had been invited to speak at the conference, they were excited to be part of an event with other high-profile speakers.
Prof. Zhao has been serving as a helpful mentor and advisor and has pushed them to go above and beyond the original purpose of their work.
For the team, the WiDS Conference felt like a positive way to support women in data science. “I think the conference is a great step. Seeing women in the field is a great step. Because you never know that you’re represented — the people you see in clubs and other domains are predominantly male,” Zhun Yan said.
“It’s important to remember as a female student that you are entitled to your education. Women have just as much right as men to participate in class and voice their opinions and should not feel imposter syndrome,” Melisa emphasized.
— Jonathan Lahdo
Posted: March 6, 2020