Aligned corpus exploitation for developing bilingual document composition environments

Posted on: 2001-01-04
Degree: Dr
Type: Thesis
University: Universidad de Deusto (Spain)
Candidate: Casillas Rubio, Arantza
Full Text: PDF
GTID: 2465390014457978
Subject: Language
Abstract/Summary:
This thesis addresses the composition of multilingual specialized documents. We propose a composition methodology that combines several areas of NLP: corpus exploitation, structured document processing, markup languages, machine translation, and multilingual generation. From an aligned corpus we create two linguistic resources, translation memories and document structures, which are then used to compose similar documents.

Implementing this methodology requires new algorithms for processing SGML documents and for creating and manipulating translation memories from an aligned corpus. In this thesis we demonstrate that SGML is the best markup language for annotating structured specialized documentation that will be used to compose similar documents.

To demonstrate the effectiveness of the methodology we have implemented a prototype called BiGentor. It is a bilingual composition environment that combines all the implemented algorithms to carry out the composition of bilingual administrative documents. The methodology and algorithms have been evaluated on a bilingual (Spanish-Basque) administrative corpus. We conclude that the methodology can be applied to structured documents written in specialized language.
Keywords/Search Tags:Corpus, Document, Composition, Methodology, Bilingual, Specialized