Text Mining Algorithms And Their Applications In Knowledge Management

Posted on:2009-05-31

Degree:Doctor

Type:Dissertation

Country:China

Candidate:Z G Xuan

Full Text:PDF

GTID:1119360272470445

Subject:Management Science and Engineering

Abstract/Summary:

With the advent of the knowledge-based economy, Knowledge Management (KM) plays a larger role in social and economic life than ever before. Most researchers focus on KM in enterprises, and little research addresses KM problems in Scientific Management Departments (SMDs). This dissertation studies KM in the SMDs of China. KM in SMDs differs from KM in other domains. For instance, the SMDs of China hold many research proposals that contain a great deal of knowledge. Mining and using the knowledge in these proposals can provide decision support for the SMDs at three levels: the whole discipline, the sub-domains of the discipline, and individual research projects.

Knowledge is contained in the contents of research proposals. To discover knowledge from these contents, several problems must be solved: the knowledge representation of research proposals cannot rely fully on a thesaurus; the contents of research proposals are not completely consistent with the subject field under which they were submitted; and the structure of the subject coding system is not entirely identical to that of the actual research fields. To address these issues, the following three pieces of work are carried out.

First, a bridge-connection pattern filtering algorithm is presented for extracting high-frequency words without a thesaurus. The frequencies of co-occurrence patterns of Chinese characters are counted from the documents. The supported frequencies of the patterns are obtained by eliminating the bridge-connection frequencies. Words are identified and extracted better from the supported frequencies than from the raw occurrence frequencies. The algorithm can be applied to Chinese information processing tasks that are sensitive to word frequencies. Using this algorithm, new features that do not exist in the thesaurus can be extracted from the proposals and added to the thesaurus.

Second, a revision algorithm for noisy texts is presented to study the effect of noisy data on clustering results. The algorithm first constructs a document similarity network based on the similarities of the documents' contents. The categories form the corresponding community structure in the network, and modularity is used to evaluate the quality of the categories. The noisy texts can then be revised by optimizing the modularity. This algorithm can be used in the preprocessing stage of text mining or in taxonomy building. In this dissertation, the research proposals assigned to subject codes are regarded as texts with noise. Using the presented algorithm, proposals submitted under the wrong subject codes can be transferred to the correct ones. From the revised data, models of the subject codes are built, and the intension and extension of each research area, expressed by its code, can be confirmed. Moreover, the relationships between codes can be analyzed.

Finally, inspired by node similarity in social networks, a new measure named community similarity is defined based on common connecting strengths. Based on this definition, a clustering algorithm is designed. Initially, each document is treated as a cluster. At each step, the two clusters with the largest similarity are merged. Because the relations both between and within clusters are taken into account, some merging errors can be avoided and better clustering results are obtained. Using this algorithm, the research proposals are clustered into subject categories, and the relations between subject categories and codes are analyzed.

Building on these theoretical results, the dissertation addresses several application issues in the funds management of the National Natural Science Foundation of China. More specifically, it analyzes the overall trends and patterns of basic discipline research, the current situation of each subject field, and the relations among the fields. This work can provide strong decision support for establishing development programs and strategies, adjusting the subject coding system, and managing projects.
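
The noise-text revision step described in the abstract (build a document similarity network, treat the categories as communities, and revise labels by optimizing modularity) can be illustrated with a small sketch. This is not the dissertation's algorithm, whose details the abstract does not give. It assumes Newman's weighted modularity as the quality measure and a simple greedy rule that moves one document at a time to whichever category raises modularity most. The function names `modularity` and `revise_labels`, and the toy similarity weights, are hypothetical and exist only for illustration.

```python
from collections import defaultdict


def modularity(weights, labels):
    """Weighted Newman modularity Q of a labelled undirected network.

    weights: dict mapping frozenset({i, j}) -> similarity weight (no self-loops)
    labels:  dict mapping document id -> category label
    """
    total = sum(weights.values())  # m: total edge weight in the network
    if total == 0:
        return 0.0
    internal = defaultdict(float)  # edge weight lying inside each category
    strength = defaultdict(float)  # summed node strengths of each category
    for edge, w in weights.items():
        i, j = tuple(edge)
        strength[labels[i]] += w
        strength[labels[j]] += w
        if labels[i] == labels[j]:
            internal[labels[i]] += w
    # Q = sum over categories of (L_c / m) - (d_c / 2m)^2
    return sum(
        internal[c] / total - (strength[c] / (2 * total)) ** 2 for c in strength
    )


def revise_labels(weights, labels, max_passes=10):
    """Greedy revision (an assumed strategy): move a document to another
    category whenever that move increases modularity; stop when a full
    pass makes no move or after max_passes passes."""
    labels = dict(labels)  # work on a copy, leave the input untouched
    categories = sorted(set(labels.values()))
    best_q = modularity(weights, labels)
    for _ in range(max_passes):
        moved = False
        for doc in sorted(labels):
            original = labels[doc]
            best_label = original
            for candidate in categories:
                if candidate == original:
                    continue
                labels[doc] = candidate
                q = modularity(weights, labels)
                if q > best_q + 1e-12:
                    best_q, best_label = q, candidate
            labels[doc] = best_label
            moved = moved or best_label != original
        if not moved:
            break
    return labels, best_q


# Toy data (invented): two tightly linked groups of proposals, {a, b, c}
# and {d, e, f}, joined by one weak link. Proposal "c" was filed under
# code "B" although its content resembles the proposals under code "A".
pairs = {("a", "b"): 1.0, ("a", "c"): 1.0, ("b", "c"): 1.0,
         ("d", "e"): 1.0, ("d", "f"): 1.0, ("e", "f"): 1.0,
         ("c", "d"): 0.1}
weights = {frozenset(p): w for p, w in pairs.items()}
noisy = {"a": "A", "b": "A", "c": "B", "d": "B", "e": "B", "f": "B"}

q_before = modularity(weights, noisy)
revised, q_after = revise_labels(weights, noisy)
print(revised["c"], q_before < q_after)
```

On this toy network the misfiled proposal is moved from code "B" to code "A" and modularity rises. Recomputing modularity from scratch for every candidate move keeps the sketch short but is only practical for small collections; a real implementation would update the score incrementally.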

Keywords/Search Tags:

Knowledge Management, Knowledge Discovery from Texts, Text Categorization, Text Clustering

Related items

1	Research On The Application Of Knowledge Discovering In The Customer Service System Of Communication Industry
2	Study On Support Vector Machines Classification Methods And Their Application In Text Categorization
3	Study On Methods And Strategies Of Knowledge Management In The Software Development Process Of Vipinfo Company
4	The Study Of Knowledge Discovery Based On Rough Set And Its Application In CRM
5	Study On The Value Fit And Corporate Culture Texts: The Concepts, Survey And Their Relationship
6	Listing Corporation Annual Report Text Knowledge Discovery Based On LDA Topic Model
7	Study On Customer Knowledge Management Of B2C Online Review
8	Study On Text Categorization Based On Decision Tree And K Nearest Neighbors
9	Research On Knowledge Discovery Of Coal Mine Safety Hidden Peril Management
10	Research And Application On Voyage Knowledge Discovery