Forest fires pose significant environmental, economic, and public health risks. Predicting the likelihood of a fire from weather and site conditions can help allocate resources and minimize potential damage. The objective of this project is to build and train machine learning models that predict forest fires, optimizing for recall: a fire the model fails to flag (a false negative) could lead to catastrophic consequences, so the models should miss as few actual fires as possible.
The dataset used contains information on different weather conditions, ground moisture content, geographic locations (X, Y coordinates), and fire occurrence details in a specific area.
The first step in our analysis was loading the dataset using a custom load_data function and taking a quick look at its contents.
df = load_data(RAW_DATA)
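The body of load_data is not shown in this write-up. As a minimal sketch, assuming it is a thin wrapper around pandas.read_csv and that RAW_DATA points to a CSV file (both are assumptions, not details confirmed by the project), it might look like the following. The in-memory sample and its column names are made-up placeholders used only to exercise the function.

```python
import io

import pandas as pd


def load_data(path_or_buffer):
    """Hypothetical loader: read a CSV file (or file-like object) into a DataFrame."""
    df = pd.read_csv(path_or_buffer)
    # Strip stray whitespace from column names so later lookups are predictable.
    df.columns = df.columns.str.strip()
    return df


# Toy stand-in for RAW_DATA; these rows are illustrative, not the real dataset.
sample_csv = io.StringIO("X,Y, temp ,fire\n7,5,8.2,0\n7,4,18.0,1\n")
sample_df = load_data(sample_csv)
print(sample_df.head())
```

Keeping the loading logic in one function means any later cleanup (renaming columns, parsing dates) has a single place to live, whatever the real implementation does.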
The number of rows and columns in the dataset is given by df.shape, which returns a (rows, columns) tuple.
df.shape
To gain insights into the dataset's overall structure, including the presence of missing values and memory usage, we use:
df.info()
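To make these inspection calls concrete, the sketch below runs them on a tiny made-up DataFrame (the column names and values are placeholders, not the project's data). It also adds an explicit isna().sum() count, since info() reports non-null counts per column and the number of missing values has to be inferred from them.

```python
import numpy as np
import pandas as pd

# Toy frame with one deliberately missing temperature reading.
toy = pd.DataFrame(
    {
        "X": [1, 2, 3],
        "Y": [4, 5, 6],
        "temp": [8.2, np.nan, 18.0],
    }
)

# shape is an attribute, not a method: (number of rows, number of columns).
n_rows, n_cols = toy.shape
print(f"{n_rows} rows, {n_cols} columns")

# info() prints dtypes, non-null counts, and memory usage to stdout.
toy.info(memory_usage="deep")

# Count missing values per column directly.
missing_per_column = toy.isna().sum()
print(missing_per_column)
```

A column whose non-null count in info() is lower than the row count has missing values; the isna().sum() line states the same thing as an explicit number.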