Research And Implementation Of Myanmar Web-text Mining

Posted on: 2014-02-13
Degree: Master
Type: Thesis
Country: China
Candidate: D J Cun
GTID: 2268330422952536
Subject: Computer application technology
Abstract/Summary:
In recent years computers and the Internet have become increasingly popular in Myanmar, and many people now obtain information online. As a result, the number of websites in the Myanmar language has grown sharply, and efficiently finding the information one needs within this large and complex body of content has become a major problem. Because economic and scientific development in Myanmar has been slow, many of the relevant technologies are still immature and developing, and no in-depth research in this field has yet been carried out inside the country. Myanmar text mining is therefore a new challenge.

This thesis studies Myanmar text mining. It first introduces the background, objectives and significance of the research and its current state at home and abroad, and analyses the characteristics of the Myanmar language and the problems they pose for text mining. It then researches, analyses and implements the algorithms used in Myanmar text mining: Myanmar word segmentation, Myanmar word stemming, Myanmar stop-word removal, and a modified Myanmar text clustering algorithm.

Finally, a Myanmar Web text retrieval system and a Myanmar Web text clustering system are designed and implemented on the basis of these algorithms. Myanmar web text is first filtered to remove HTML tags, then segmented into words, stemmed, and stripped of stop words; each text is then represented with the Vector Space Model (VSM), and the Okapi similarity method is used to compute the relevance between a Myanmar text and the query keywords. In the text retrieval test, the experimental results show that the proposed algorithm can mine HTML documents on the web quickly and effectively. In the text clustering test, the experimental results show that the modified clustering algorithm improves stability, accuracy and reliability.
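The retrieval pipeline described in the abstract (HTML tag filtering, word segmentation, stop-word removal, a term-frequency vector per text, and Okapi relevance scoring) can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the thesis's implementation: the whitespace-based `segment` function is a hypothetical stand-in for a real Myanmar word segmenter (Myanmar script does not separate words with spaces), stemming is omitted, `STOP_WORDS` is a hypothetical list, and the Okapi weighting is the common BM25 form with defaults k1 = 1.2 and b = 0.75 and a non-negative IDF variant, which may differ from the exact formula the author used.

```python
import math
from collections import Counter
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects visible text from an HTML page, skipping <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip = 0  # nesting depth inside <script>/<style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.parts.append(data)


def strip_tags(html):
    """HTML tag filtering: return only the text content of the page."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)


def segment(text):
    # Hypothetical placeholder: real Myanmar text needs a dedicated word
    # segmenter; this sketch simply splits on whitespace.
    return text.split()


STOP_WORDS = {"the", "of"}  # hypothetical stop-word list


def preprocess(html):
    """Filter tags, segment, and drop stop words (stemming omitted here)."""
    return [t for t in segment(strip_tags(html)) if t not in STOP_WORDS]


class BM25:
    """Okapi BM25 ranking over term-frequency vectors (a sparse VSM)."""

    def __init__(self, docs, k1=1.2, b=0.75):
        self.k1, self.b = k1, b
        self.tfs = [Counter(d) for d in docs]  # term frequencies per document
        self.lens = [len(d) for d in docs]
        self.n = len(docs)
        self.avgdl = sum(self.lens) / self.n if self.n else 0.0
        # Document frequency: number of documents containing each term.
        self.df = Counter(t for tf in self.tfs for t in tf)

    def idf(self, term):
        n_q = self.df.get(term, 0)
        # The "+ 1" keeps IDF non-negative even for very common terms.
        return math.log((self.n - n_q + 0.5) / (n_q + 0.5) + 1.0)

    def score(self, query, i):
        tf, dl = self.tfs[i], self.lens[i]
        s = 0.0
        for term in query:
            f = tf.get(term, 0)
            if f == 0:
                continue
            # k1 saturates repeated terms; b normalizes by document length.
            denom = f + self.k1 * (1 - self.b + self.b * dl / self.avgdl)
            s += self.idf(term) * f * (self.k1 + 1) / denom
        return s

    def rank(self, query):
        """Return (score, document index) pairs, highest score first."""
        scores = [(self.score(query, i), i) for i in range(self.n)]
        return sorted(scores, key=lambda p: -p[0])
```

For example, `BM25([preprocess(h) for h in pages]).rank(preprocess(query))` ranks a list of HTML pages against a keyword query. BM25 is used here because it keeps the VSM's bag-of-words representation while damping repeated terms and penalizing long documents, which plain cosine similarity on raw counts does not do.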
Keywords: Myanmar text retrieval, VSM, Okapi, K-means, hierarchical clustering