Growing demand for vehicles has driven a rapid rise in vehicle ownership, and with it a rapid increase in the volume of user-generated information about faults, defects, and substandard service. The problem is how to process this user evaluation information and extract the key points from it, so that automobile manufacturers can improve vehicle quality and maintenance staff can locate faults faster and more accurately, ultimately delivering faster service. Solving it would support the healthy development of the auto industry and improve people's quality of life.

Information extraction technology is commonly used to search, reorganize, and reuse information. Its main task is to obtain the information users need from unstructured or semi-structured documents and convert it into a machine-readable format, which can then be applied in related fields through statistics and data analysis.

Ontology technology provides a clear concept hierarchy. By describing concepts and the relations between them, it lets a computer make use of human prior knowledge. Ontologies play a guiding and reference role in many areas, especially information extraction.

This thesis proposes an effective model for text classification. The model uses an integrated classification approach to analyze large volumes of user evaluation information and to investigate the relation between user evaluations and automotive faults. First, user evaluations are collected from the Internet, and the unstructured documents are transformed into structured documents using natural language processing. This step must take into account the particular characteristics of Chinese text and of automotive terminology.
Next, the model's vocabulary is expanded while the total number of terms is reduced with a matrix reduction technique, so that only the core vocabulary is kept. For text classification, the HowNet knowledge base is supplemented with a user-defined vehicle ontology vocabulary, which yields more accurate classification results. HowNet is used to calculate the similarity between HowNet vocabulary and the vehicle ontology vocabulary, so that the document collection can be classified according to the similarity of complaints. Building on advances in natural language processing, information extraction, and semantic web technology, a large number of documents can be classified efficiently with an automobile domain ontology. This makes it possible to track existing faults and safety hazards in auto parts. The study has practical value for related research.
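
The classification steps described above can be illustrated with a minimal Python sketch. It is not the thesis's implementation, and everything in it is an assumption made for illustration: the `ONTOLOGY` categories and terms are invented, the `SIMILAR_TERMS` table is a hypothetical stand-in for a HowNet-based similarity measure, and a simple document-frequency filter replaces the unspecified matrix reduction technique. Input documents are assumed to be already segmented into tokens.

```python
from collections import Counter

# Hypothetical mini-ontology: fault category -> characteristic terms.
# These entries are illustrative and not taken from the thesis.
ONTOLOGY = {
    "brakes": {"brake", "pedal", "pad", "disc"},
    "engine": {"engine", "stall", "oil", "idle"},
    "transmission": {"gearbox", "gear", "clutch", "shift"},
}

# Hypothetical similarity table standing in for a HowNet-based measure.
# A real system would derive graded similarities from the knowledge base.
SIMILAR_TERMS = {
    ("stalls", "stall"): 0.9,
    ("brakes", "brake"): 0.9,
}


def term_similarity(a, b):
    """Return 1.0 for identical terms, a table value if listed, else 0.0."""
    if a == b:
        return 1.0
    return SIMILAR_TERMS.get((a, b), SIMILAR_TERMS.get((b, a), 0.0))


def prune_vocabulary(docs, max_df_ratio=0.8):
    """Keep terms that occur in at most max_df_ratio of the documents.

    Assumption: this document-frequency filter is a simple substitute for
    the matrix reduction step, which the text does not specify.
    """
    doc_freq = Counter(term for doc in docs for term in set(doc))
    limit = max_df_ratio * len(docs)
    return {term for term, count in doc_freq.items() if count <= limit}


def classify(tokens, vocab):
    """Assign the category whose ontology terms are most similar to the tokens."""
    scores = {}
    for category, terms in ONTOLOGY.items():
        scores[category] = sum(
            max(term_similarity(tok, t) for t in terms)
            for tok in tokens
            if tok in vocab
        )
    best = max(scores, key=scores.get)
    # Return None when no token matches any category at all.
    return best if scores[best] > 0 else None


if __name__ == "__main__":
    # Toy complaints, assumed to be segmented into tokens upstream.
    complaints = [
        ["car", "brake", "pedal", "soft"],
        ["car", "engine", "stalls", "idle"],
        ["car", "gear", "shift", "hard"],
    ]
    vocab = prune_vocabulary(complaints)
    for tokens in complaints:
        print(tokens, "->", classify(tokens, vocab))
```

In this sketch the token "car" appears in every complaint, so it is pruned as uninformative, and each complaint is assigned to the category whose ontology terms it resembles most. A real system would replace the toy tables with a proper word segmenter, the actual vehicle ontology, and a similarity measure computed from HowNet.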