This thesis investigates the modeling of human emotion perception in music, with particular emphasis on quantitative-analysis techniques and the results obtained from them. Time-varying emotion annotations of a data set constrained in style (mostly Romantic) and timbre (mostly piano) form the basis for analyses addressing the relatively unexplored problem of modeling individual annotators' responses. Existing and newly developed techniques are presented for extracting loudness, spectral centroid, onset density, articulation, and mode features. Multiple linear regression analyses of each individual's annotations against the extracted audio features reveal similarities and differences across annotators in annotation detail and in the use of audio cues. Further analyses evaluate the effectiveness of the feature set for predicting each annotator's response. A proposed color-based emotion visualization method provides a basis for future exploratory investigations in automatic emotion detection and tracking.
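
The following is a minimal sketch, not the thesis's actual analysis pipeline, of what a per-annotator multiple linear regression of a time-varying emotion annotation on time-aligned audio features might look like. The synthetic data, the feature names, and the use of scikit-learn are assumptions introduced purely for illustration.

```python
# Hypothetical sketch: regress one annotator's time-varying emotion rating
# (e.g., arousal) on time-aligned audio features. All values below are
# synthetic placeholders, not data from the thesis.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n_frames = 300  # e.g., one feature/annotation sample per second of audio

# Assumed feature columns: loudness, spectral centroid, onset density,
# articulation, mode (the thesis's extraction methods are not reproduced).
feature_names = ["loudness", "centroid", "onset_density", "articulation", "mode"]
X = rng.normal(size=(n_frames, len(feature_names)))

# One annotator's ratings (synthetic stand-in with a known dependence).
y = 0.6 * X[:, 0] + 0.3 * X[:, 2] + rng.normal(scale=0.2, size=n_frames)

model = LinearRegression().fit(X, y)
r_squared = model.score(X, y)  # how well the feature set predicts this annotator

# Per-feature weights indicate how strongly this annotator's response tracks
# each audio cue; comparing weights across annotators highlights differences
# in cue utilization.
for name, coef in zip(feature_names, model.coef_):
    print(f"{name:>14s}: {coef:+.3f}")
print(f"R^2 = {r_squared:.3f}")
```

Fitting one such model per annotator, rather than a single model over pooled annotations, is what allows the regression weights and goodness-of-fit values to be compared across individuals.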