Text Mining Algorithms And Their Applications In Knowledge Management

Posted on: 2009-05-31
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Z G Xuan
GTID: 1119360272470445
Subject: Management Science and Engineering
Abstract/Summary:
With the advent of the knowledge-based economy, Knowledge Management (KM) contributes far more to social and economic life than before. Most research focuses on KM in enterprises, and little work addresses the KM problems of Scientific Management Departments (SMDs). This dissertation studies KM in the SMDs of China. KM in SMDs differs from KM in other domains. For instance, the SMDs of China hold many research proposals that contain a great deal of knowledge. Mining and utilizing the knowledge in these proposals can provide strong decision support for the SMDs at three levels: the whole discipline, the sub-domains of the discipline, and individual research projects.

Knowledge is contained in the contents of research proposals. To discover knowledge from those contents, several problems have to be solved: the knowledge representation of research proposals cannot rely fully on the thesaurus; the contents of research proposals are not completely consistent with the subject field they were submitted to; and the structure of the subject coding system is not entirely identical to that of the actual research field. To address these issues, the following three pieces of work are carried out.

Firstly, a bridge-connection pattern filtering algorithm is presented for extracting high-frequency words without a thesaurus. The frequencies of co-occurrence patterns of Chinese characters are counted from the documents. The supported frequencies of the patterns are obtained by eliminating the bridge-connection frequencies. Words are identified and extracted better on the basis of the supported frequencies than on the basis of the raw occurrence frequencies. This algorithm can be applied to Chinese information processing, which is sensitive to word frequencies.
Using this algorithm, new features that do not exist in the thesaurus can be extracted from the proposals and added to the thesaurus.

Secondly, a revision algorithm for noise texts is presented to study the effect of noisy data on clustering results. In the algorithm, a document similarity network is first constructed from the similarities between document contents. The categories constitute the corresponding community structure in the network, and modularity is used to evaluate the quality of the categories. The noise texts are then revised by optimizing the modularity. This algorithm can be used in the preprocessing stage of text mining or in taxonomy building. In this dissertation, the research proposals belonging to subject codes are regarded as texts with noise. Using the presented algorithm, proposals submitted under the wrong subject codes are transferred to the correct ones. From the revised data, models of the subject codes are built, and the intension and extension of each research area, as expressed by its code, can be confirmed. Moreover, the relationships between codes can be analyzed.

Finally, inspired by node similarity in social networks, a new measure named community similarity is defined based on the common connecting strengths. Based on this definition, a clustering algorithm is designed. In the initial stage, each document is treated as a cluster. At each step, the two clusters with the largest similarity are merged. Because the relations both between and within clusters are taken into account, some merging errors can be avoided and better clustering results are obtained. Using this algorithm, the research proposals are clustered into subject categories, and the relations between subject categories and codes are analyzed.

Based on these theoretical results, the dissertation addresses several application issues in the funds management of the National Natural Science Foundation of China.
More specifically, we analyze the overall trends and patterns of basic discipline research, the current situation of all the subject fields, and their relations. This work can provide strong decision support for establishing development programs and strategies, adjusting the subject coding system, and managing projects.
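
The noise-text revision step summarized in the abstract (build a document similarity network, treat the subject codes as communities, and revise labels by optimizing modularity) might look roughly like the sketch below. It is only an illustration under stated assumptions: the abstract does not give the optimization procedure, so the sketch assumes Newman's weighted modularity and a simple greedy pass that moves one document at a time. The names `modularity` and `revise_labels`, the toy similarity values, and the labels "A" and "B" are all hypothetical.

```python
def modularity(weights, labels):
    """Weighted modularity (Newman) of a labeling.

    `weights` is a symmetric similarity matrix with a zero diagonal;
    `labels[i]` is the category of document i.
    """
    n = len(weights)
    degree = [sum(row) for row in weights]
    two_m = sum(degree)  # total weight counted in both directions (2m)
    if two_m == 0:
        return 0.0
    q = 0.0
    for i in range(n):
        for j in range(n):
            if labels[i] == labels[j]:
                q += weights[i][j] - degree[i] * degree[j] / two_m
    return q / two_m


def revise_labels(weights, labels, max_passes=10):
    """Assumed greedy scheme: move one document at a time to the existing
    category that most increases modularity, until no move helps."""
    labels = list(labels)
    categories = sorted(set(labels))
    for _ in range(max_passes):
        changed = False
        for i in range(len(labels)):
            best_label = labels[i]
            best_q = modularity(weights, labels)
            for candidate in categories:
                if candidate == labels[i]:
                    continue
                trial = labels[:i] + [candidate] + labels[i + 1:]
                q = modularity(weights, trial)
                if q > best_q + 1e-12:
                    best_label, best_q = candidate, q
            if best_label != labels[i]:
                labels[i] = best_label
                changed = True
        if not changed:
            break
    return labels


# Toy data (hypothetical): documents 0-2 are mutually similar, as are 3-5.
group = [0, 0, 0, 1, 1, 1]
weights = [
    [0.0 if i == j else (0.9 if group[i] == group[j] else 0.1)
     for j in range(6)]
    for i in range(6)
]

# Document 2 was "submitted" under the wrong code B.
submitted = ["A", "A", "B", "B", "B", "B"]
revised = revise_labels(weights, submitted)
print(revised)
```

Recomputing modularity from scratch for every trial move keeps the sketch short but costs O(n^2) per evaluation; a practical implementation on a large proposal collection would update the score incrementally.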
Keywords/Search Tags: Knowledge Management, Knowledge Discovery from Texts, Text Categorization, Text Clustering