A comparison of classification methods in predicting the presence of DNA profiles in sexual assault kits

Posted on: 2019-10-25
Degree: M.S.
Type: Thesis
University: Bowling Green State University
Candidate: Heckman, Derek J
Full Text: PDF
GTID: 2476390017485050
Subject: Mathematics
Abstract/Summary:
In 2014, Ohio began the Sexual Assault Kit Testing Initiative with the goal of analyzing all previously untested sexual assault kits (SAKs). Approximately 13,900 previously untested SAKs were sent for forensic analysis. Of these SAKs, a sample of 2,500 was drawn for statistical analysis. The goal was to gain general information about the SAKs and to answer a variety of specific questions, in the hope of identifying cost-saving measures for the future. Questions considered included which forensic samples most consistently produce Combined DNA Index System (CODIS)-eligible DNA profiles and what factors predict whether a kit will contain a DNA profile foreign to the victim, among others. The results of the initiative were published in Kerka, Heckman, Maddox, Sprague, & Albert (2018).

This thesis expands upon the work in the aforementioned article. In the article, a logistic regression model was constructed to predict whether an SAK would contain a CODIS-eligible DNA profile. It was estimated to have a misclassification rate of 34.2%. This thesis compares three other models to the logistic regression model to determine whether any improvement in performance can be made. The models tested were decision trees, bagged trees, and random forests. The decision tree had an estimated misclassification rate of 29.7%, offering a moderate improvement over the logistic regression model. In addition, the same models were compared for their ability to predict which SAKs would contain duplicate DNA profiles across multiple forensic samples (vaginal sample, anal sample, etc.). No model was able to predict this response satisfactorily.
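The models in the abstract are compared by misclassification rate, that is, the fraction of kits whose predicted label differs from the observed one. A minimal sketch of that calculation follows. The labels and both prediction vectors are made up for illustration and are not thesis data, and the function name `misclassification_rate` is a hypothetical helper, not something taken from the thesis.

```python
def misclassification_rate(y_true, y_pred):
    """Return the fraction of cases where the prediction differs from the truth."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one observation is required")
    errors = sum(1 for t, p in zip(y_true, y_pred) if t != p)
    return errors / len(y_true)


# Made-up labels: 1 = CODIS-eligible profile found, 0 = not found.
y_true = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]
pred_a = [1, 0, 0, 1, 0, 1, 1, 0, 0, 0]  # hypothetical model A, 3 errors
pred_b = [1, 0, 1, 1, 0, 1, 1, 0, 0, 0]  # hypothetical model B, 2 errors

rate_a = misclassification_rate(y_true, pred_a)
rate_b = misclassification_rate(y_true, pred_b)
print(f"model A: {rate_a:.1%}, model B: {rate_b:.1%}")
```

On the figures reported in the abstract, the move from 34.2% to 29.7% is a drop of 4.5 percentage points, or roughly a 13% relative reduction in errors.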
Keywords/Search Tags:Sexual assault, DNA profiles, Predict, Logistic regression model