This research studies two large data sets, one from the National Fire Incident Reporting System (NFIRS) and the other from the National Oceanic and Atmospheric Administration (NOAA). The analysis focuses on the effect that weather has on the damage done by human-caused fires. In the literature, virtually all big data analysis and predictive analytics are performed on a single data set, where the predictors come from the same data set as the target. The key contribution and insight of this work is to combine two data sets into a single training set and then use the first, the NOAA weather data, to predict the fire risk derived from the second, the NFIRS data. We analyzed ten years of data, from 2005 to 2014, covering about 47 million fire incidents across the United States. In this context, loss is measured as "Total Percent Loss", the combined content and property loss of a particular owner expressed as a percentage of the total value of that owner's content and property. Trends and patterns in the raw data sets, such as how weather affects fire behavior in incidents, guided the transition into the training set. The training set retains only selected attributes: the state and month in which each fire incident occurred, together with the primary predictors, the weather data (average temperature, minimum temperature, maximum temperature, average wind speed, and precipitation). We then trained a model on this set using the Gradient Boosting Tree (GBT) machine learning algorithm. Using weather data, the GBT model predicts the damage done by human-caused fires with an accuracy of 93 percent and a mean squared error (MSE) of 124.641 out of 10,000. In the final comparison of actual loss versus predicted loss on the unlabeled data, the fitting accuracy is 97 percent.
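
The abstract does not include code; the following is a minimal sketch, for illustration only, of the two-data-set approach it describes. The file names and column names (state, month, avg_temp, and so on) are hypothetical placeholders rather than actual NFIRS or NOAA field names, and scikit-learn's GradientBoostingRegressor stands in for whichever GBT implementation the study used.

```python
# Sketch: join NOAA weather summaries with NFIRS incidents by state and month,
# compute Total Percent Loss, and fit a gradient boosting regressor.
# All file and column names below are assumed placeholders.
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

# NFIRS fire incidents: one row per incident, with content/property loss and value.
fires = pd.read_csv("nfirs_incidents.csv")          # hypothetical file
fires["total_percent_loss"] = 100 * (
    (fires["content_loss"] + fires["property_loss"])
    / (fires["content_value"] + fires["property_value"])
)

# NOAA weather summaries keyed by state and month.
weather = pd.read_csv("noaa_weather.csv")           # hypothetical file

# Combine the two data sets into a single training set on state and month.
train = fires.merge(weather, on=["state", "month"], how="inner")

# Weather attributes named in the abstract serve as the predictors.
features = ["avg_temp", "min_temp", "max_temp", "avg_wind_speed", "precipitation"]
X_train, X_test, y_train, y_test = train_test_split(
    train[features], train["total_percent_loss"], test_size=0.2, random_state=0
)

# Gradient Boosting Tree regressor predicting Total Percent Loss from weather.
gbt = GradientBoostingRegressor()
gbt.fit(X_train, y_train)
print("MSE:", mean_squared_error(y_test, gbt.predict(X_test)))
```

In this sketch, state and month act as the join keys that link each incident to its weather conditions, while the weather variables themselves are the model inputs; whether the authors also fed state and month into the GBT as features is not specified in the abstract.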