
Generating Gestures from Speech for Virtual Humans Using Machine Learning Approaches

Posted on: 2015-05-25
Degree: Ph.D.
Type: Thesis
University: University of Southern California
Candidate: Chiu, Chung-Cheng
Full Text: PDF
GTID: 2478390017993503
Subject: Computer Science
Abstract/Summary:
There is a growing demand for animated characters capable of simulating face-to-face interaction using the same verbal and nonverbal behavior that people use. For example, research in virtual human technology seeks to create autonomous characters capable of interacting with humans through spoken dialog. Further, as video games have moved beyond first-person shooters, gameplay increasingly centers on social interaction, with virtual characters interacting with each other and with the player's avatar. Common to these applications is the expectation that autonomous characters exhibit behavior resembling that of a real human.

The focus of this work is generating realistic gestures for virtual characters, specifically the coverbal gestures performed in close relation to the content and timing of speech. A conventional approach to animating gestures is to construct gesture animations for each utterance the character speaks, either by handcrafting animations or by using motion capture. This approach is costly in time and money, and it is not feasible at all for characters designed to generate novel utterances on the fly.

This thesis applies machine learning to learn a data-driven gesture generator from human conversational data, one that can generate behavior for novel utterances and thereby save development effort. The work assumes that learning to generate gestures from speech is a feasible task. The framework exploits a gesture classification scheme to provide domain knowledge about gestures and to help the machine learning models realize gesture generation from speech. It decomposes the overall learning problem into two tasks: one learns the relation between speech and gesture classes, and the other generates gesture motion conditioned on those classes (see the sketch below). To facilitate training, this research used real-world conversation data from dyadic interviews together with motion capture data of humans gesturing while speaking. The evaluation experiments assess the effectiveness of each component by comparison with state-of-the-art approaches, and evaluate overall performance through studies involving human subjective ratings. An alternative machine learning framework was also proposed for comparison with the framework presented in this thesis. Overall, the experiments show that the framework outperforms state-of-the-art approaches.

The central contribution of this research is a machine learning framework capable of learning to generate gestures from conversation data that can be collected from different individuals while preserving the motion style of specific speakers. In addition, the framework allows the incorporation of data recorded through other media, thereby significantly enriching the training data. The resulting model provides an automatic approach for deriving a gesture generator that realizes the relation between speech and gestures. A secondary contribution is a novel time-series prediction algorithm that predicts gestures from the utterance. This algorithm can address time-series problems with complex input and can be applied to other applications that require classifying time-series data.
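The abstract describes the two-stage decomposition only at a high level. The following is a minimal Python sketch of that idea, not the thesis's actual algorithms: the gesture class set, the linear scorer in stage 1, and the trajectory generator in stage 2 are all hypothetical placeholders, assuming only numpy.

```python
# Hypothetical sketch of the two-stage decomposition: stage 1 maps speech
# features to a gesture class; stage 2 generates motion conditioned on that
# class. All names and models here are illustrative stand-ins.
import numpy as np

GESTURE_CLASSES = ["beat", "deictic", "iconic", "rest"]  # assumed label set

def classify_gesture(speech_features: np.ndarray, weights: np.ndarray) -> str:
    """Stage 1: predict a gesture class from per-frame speech features
    (e.g. prosodic descriptors), using a linear scorer pooled over time
    as a placeholder for the thesis's time-series prediction model."""
    scores = speech_features @ weights        # (frames, classes)
    mean_scores = scores.mean(axis=0)         # pool over the utterance
    return GESTURE_CLASSES[int(np.argmax(mean_scores))]

def generate_motion(gesture_class: str, n_frames: int, rng) -> np.ndarray:
    """Stage 2: produce a joint-angle trajectory for the predicted class.
    A real system would generate from motion-capture-trained models; here
    a smooth sinusoidal trajectory stands in."""
    phase = rng.standard_normal(10)           # 10 hypothetical joints
    t = np.linspace(0.0, 1.0, n_frames)[:, None]
    amplitude = 0.1 if gesture_class == "rest" else 1.0
    return amplitude * np.sin(2.0 * np.pi * t + phase)

rng = np.random.default_rng(0)
speech = rng.standard_normal((50, 8))         # 50 frames, 8 speech features
weights = rng.standard_normal((8, len(GESTURE_CLASSES)))
cls = classify_gesture(speech, weights)
motion = generate_motion(cls, n_frames=50, rng=rng)
print(cls, motion.shape)                      # e.g. "iconic (50, 10)"
```

Splitting the problem this way means the speech-to-class mapping and the class-to-motion mapping can be trained and evaluated separately, which is the point of the decomposition described above.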
Keywords/Search Tags: Gestures, Machine learning, Speech, Human, Data, Virtual, Characters