
Multimodal Affect Modeling in Task-Oriented Tutorial Dialogue

Posted on: 2015-12-25
Degree: Ph.D
Type: Dissertation
University: North Carolina State University
Candidate: Grafsgaard, Joseph F
Full Text: PDF
GTID: 1477390020452417
Subject: Computer Science
Abstract/Summary:
Recent years have seen a growing recognition of the central role of affect and motivation in learning. In particular, learning-centered cognitive-affective states such as anxiety, confusion, engagement, and frustration may co-occur with positive or negative cycles of learning. Just as highly effective human tutors attend to more than whether a student is simply correct or incorrect, automated approaches may be used to identify and understand students' nonverbal behaviors. Observed channels such as posture, gesture, and facial expression provide key insights into students' affective and motivational states.

As students engage in computer-mediated, task-oriented dialogue with a human tutor, their affective states are expressed through posture, gesture, and facial expression. Posture and gesture were tracked using novel algorithms applied to Kinect depth images, which enabled automated labeling of whether a student was shifting posture or touching the lower face with a hand. Facial expression was recognized using an existing tool, the Computer Expression Recognition Toolbox (CERT), which identified fine-grained facial movements in each frame of video. Prior analyses within this body of work identified associations between nonverbal behavior and learning-centered affective states such as anxiety, effortful concentration, engagement, and frustration.

Preliminary work used hidden Markov models (HMMs) to analyze sequences of affective tutorial interaction. Descriptive HMMs were machine-learned from the combined context of facial expression, tutorial dialogue, and task progress in an unsupervised approach using the Baum-Welch algorithm. Subsequent work focused on the development of multimodal differential sequence mining, a technique that handles the large state space inherent in automatically generated multimodal data streams. This novel extension of differential sequence mining was applied to multimodal data streams of student task actions, dialogue, and nonverbal behavior, including facial expression, gesture, and posture. Multimodal sequences were found to be associated with the tutorial outcomes of engagement, frustration, and learning gain. Additionally, incoming student characteristics of general and computer science self-efficacy were linked to other multimodal sequences. Among these results, one-hand-to-face gestures occurred more frequently with the positive phenomena of engagement, learning, and general self-efficacy, while two-hands-to-face gestures occurred more frequently with frustration. This line of research improves automated understanding of learning-centered affect, with particular insight into how affect unfolds from moment to moment during tutoring. It may lead to systems that treat student affect not as transient states, but as interconnected links in a student's path toward learning.
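The unsupervised HMM training described above corresponds to standard Baum-Welch (EM) estimation over discrete observation symbols. The following is a minimal sketch of that general setup, assuming the hmmlearn library's CategoricalHMM and integer-coded session events; the event coding, number of hidden states, and variable names are illustrative assumptions, not the dissertation's actual configuration.

# Sketch: unsupervised (Baum-Welch / EM) training of a descriptive HMM over
# integer-coded multimodal events (e.g., facial actions, dialogue acts, and
# task actions mapped into a shared symbol vocabulary).
# Assumes hmmlearn (CategoricalHMM); codes and state count are illustrative.
import numpy as np
from hmmlearn import hmm

# Two example tutoring sessions, each a sequence of integer event symbols.
sessions = [
    [0, 2, 2, 5, 1, 3],
    [4, 0, 1, 1, 3, 3, 2],
]

# hmmlearn expects one concatenated column vector plus per-sequence lengths.
X = np.concatenate(sessions).reshape(-1, 1)
lengths = [len(s) for s in sessions]

model = hmm.CategoricalHMM(n_components=4, n_iter=100, random_state=0)
model.fit(X, lengths)                 # Baum-Welch (EM) estimation

print(model.transmat_)                # learned state transition probabilities
print(model.decode(X, lengths)[1])    # most likely hidden-state sequence

The multimodal differential sequence mining step can be illustrated in the same spirit: enumerate short subsequences of events and compare how often they occur in two outcome groups (for example, higher versus lower frustration). The sketch below is a generic rendering of the differential sequence mining idea under assumed data structures and thresholds; it is not the specific multimodal extension developed in the dissertation.

# Sketch: generic differential sequence mining over event streams.
# Contiguous subsequences, sequence-level support, and a fixed support gap
# are illustrative simplifications.
from collections import Counter

def patterns(seq, max_len=3):
    """All contiguous subsequences (n-grams) of length 1..max_len."""
    return {tuple(seq[i:i + n])
            for n in range(1, max_len + 1)
            for i in range(len(seq) - n + 1)}

def support(group, max_len=3):
    """Fraction of sequences in the group containing each pattern."""
    counts = Counter(p for seq in group for p in patterns(seq, max_len))
    return {p: c / len(group) for p, c in counts.items()}

def differential(group_a, group_b, min_gap=0.2):
    """Patterns whose support differs between the groups by at least min_gap."""
    sup_a, sup_b = support(group_a), support(group_b)
    diffs = {p: sup_a.get(p, 0.0) - sup_b.get(p, 0.0)
             for p in set(sup_a) | set(sup_b)}
    return sorted((p for p in diffs if abs(diffs[p]) >= min_gap),
                  key=lambda p: -abs(diffs[p]))

# Illustrative event streams (symbols stand in for facial, gesture, dialogue,
# and task events), split by an assumed outcome such as high vs. low frustration.
high_frustration = [["AU4", "2hands", "error"], ["AU4", "2hands", "hint"]]
low_frustration  = [["1hand", "correct"], ["1hand", "AU14", "correct"]]
for pat in differential(high_frustration, low_frustration):
    print(pat)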
Keywords/Search Tags: Affect, Multimodal, Tutorial, Facial expression, Student, States, Dialogue