Natural communication between humans is not limited to speech, but often requires the simultaneous coordination of multiple streams of information (especially hand gestures) to complement or supplement understanding. This thesis describes a software architecture, called CLAVIUS, whose purpose is to generically interpret multiple modes of input as singular semantic utterances through a modular programming interface that supports various sensing technologies. This interpretation is accomplished by a new multi-threaded parsing algorithm that coordinates top-down and bottom-up methods asynchronously on graph-based unification grammars. The interpretation process follows a best-first approach in which partial parses are evaluated by a combination of scoring metrics related to criteria such as information content, grammatical structure, and language models. Furthermore, CLAVIUS relaxes two constraints of conventional parsing: it abandons the forced relative ordering of right-hand constituents in grammar rules, and it allows parses to be expanded with null constituents.

The effects of this parsing methodology, and of the scoring criteria it employs, are analyzed within the context of experiments and data collection on a small group of users. Both CLAVIUS and its component modules are trained on these data, and the results show improvements in accuracy, overcoming several difficulties encountered by other multimodal frameworks. General discussion of the linguistic behaviour of speakers in a multimodal context is also provided.
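The best-first strategy described above can be illustrated with a minimal sketch: partial parses sit on a priority queue ordered by a weighted combination of per-criterion scores, and the highest-scoring hypothesis is expanded first. The weights, criterion names, and data structures here are hypothetical illustrations, not CLAVIUS's actual interfaces.

```python
import heapq

# Hypothetical weights over the kinds of criteria mentioned in the abstract:
# multimodal coverage (information content), grammatical structure, and a
# language-model score. Real systems would tune these on held-out data.
WEIGHTS = {"coverage": 0.5, "structure": 0.3, "lm": 0.2}

def combined_score(partial):
    """Collapse per-criterion scores into a single best-first priority."""
    return sum(WEIGHTS[k] * partial["scores"][k] for k in WEIGHTS)

def best_first_search(agenda):
    """Pop the highest-scoring partial parse until a complete one is found."""
    # Negate scores because heapq is a min-heap; the index breaks ties
    # so the heap never compares the dict payloads directly.
    heap = [(-combined_score(p), i, p) for i, p in enumerate(agenda)]
    heapq.heapify(heap)
    while heap:
        _, _, p = heapq.heappop(heap)
        if p["complete"]:
            return p
        # A full parser would expand p here and push its successors.
    return None

# Two toy hypotheses for an utterance like "put that there" + a pointing
# gesture: one fuses speech and gesture, one uses speech alone.
agenda = [
    {"label": "point+speech", "complete": True,
     "scores": {"coverage": 0.9, "structure": 0.8, "lm": 0.7}},
    {"label": "speech-only", "complete": True,
     "scores": {"coverage": 0.6, "structure": 0.9, "lm": 0.9}},
]
best = best_first_search(agenda)
```

Under these illustrative weights, the fused multimodal hypothesis wins because its higher information content outweighs the speech-only hypothesis's better language-model score.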