Classification Method Research Of Text Primary And Secondary School Teaching Resources Based On KNN Algorithm

Posted on: 2019-11-25
Degree: Master
Type: Thesis
Country: China
Candidate: S Q Wang
GTID: 2417330563452994
Subject: Software engineering
Abstract/Summary:
With the introduction of the "Internet Plus" concept, the informatization of education in China has developed rapidly. Teaching is no longer restricted to books: as a new medium, online teaching resources offer teachers and students a richer selection of extracurricular knowledge. At the same time, however, online teaching resources are growing geometrically in number and come in great variety, so they urgently need to be classified and organized if they are to be used effectively. Teaching resources currently fall into categories such as video, audio, and text, and most of them are text resources. Research on text classification is therefore of great importance for teaching resources. This thesis takes text teaching resources for primary and secondary schools as its research object and, starting from the KNN algorithm, improves the algorithm by incorporating domain characteristics, thereby improving the efficiency and effectiveness of classification.

First, the thesis sets out the research background, significance, and current state of research, and introduces the relevant theory: text preprocessing, feature selection, classification methods, classification performance evaluation, and weight calculation methods. It then formulates classification standards, constructs a corpus, and summarizes the characteristics of text primary and secondary school teaching resources. Based on these characteristics, it revises and improves the text preprocessing procedure. It also analyzes the TF-IDF weight calculation method and the KNN classification algorithm in depth in light of these characteristics and proposes the following improvements.

(1) Improvement of the TF-IDF weight calculation method. The traditional TF-IDF algorithm considers only the term frequency (TF) and the inverse document frequency (IDF) of a feature term; that is, the more frequently a feature term appears and the fewer training texts it appears in, the more important the term is. Building on this, we propose the TF-IDF_ATC weight calculation method, which adds the parameter ATC to help determine the inter-class and intra-class distribution of a feature term's frequency and so to evaluate the term's weight more accurately.

(2) Improvement of the KNN classification algorithm through density cutting. Among text primary and secondary school teaching resources, liberal arts resources far outnumber science resources, so the sample density is unevenly distributed, which severely affects the classification results. By measuring the spatial density of the samples, we identify the texts in high-density regions. We propose separate cutting methods for intra-class regions and inter-class regions, paying special attention to cutting in the boundary regions where classes meet, so that the samples become more uniformly distributed while the computation time of classification is reduced.

Finally, comparison experiments on the Weka platform demonstrate the effectiveness of the improved algorithms.
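
For orientation, the traditional baseline that the abstract starts from (classical TF-IDF weighting followed by KNN voting) can be sketched as below. This is a minimal illustration only, not the thesis's implementation: the abstract does not define the ATC parameter or the density cutting rules, so neither is modelled here. The toy corpus, the class labels, the choice of cosine similarity, and the function names `tfidf_vectors`, `vectorize`, `cosine`, and `knn_predict` are all assumptions made for the example.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Classical TF-IDF for a list of token lists.

    Returns (vectors, idf); each vector is a dict term -> weight.
    """
    n = len(docs)
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))  # document frequency: count each term once per text
    idf = {term: math.log(n / count) for term, count in df.items()}
    vectors = [vectorize(tokens, idf) for tokens in docs]
    return vectors, idf


def vectorize(tokens, idf):
    """Weight = relative term frequency * IDF; terms unseen in training are dropped."""
    tf = Counter(tokens)
    total = len(tokens)
    return {t: (c / total) * idf[t] for t, c in tf.items() if t in idf}


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def knn_predict(query_vec, train_vecs, labels, k=3):
    """Majority vote among the k training texts most similar to the query."""
    ranked = sorted(
        zip(train_vecs, labels),
        key=lambda pair: cosine(query_vec, pair[0]),
        reverse=True,
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy corpus: two "liberal_arts" texts and two "science" texts.
docs = [
    ["poem", "author", "essay"],
    ["essay", "history", "author"],
    ["equation", "triangle", "proof"],
    ["equation", "function", "proof"],
]
labels = ["liberal_arts", "liberal_arts", "science", "science"]

vectors, idf = tfidf_vectors(docs)
query = vectorize(["triangle", "equation"], idf)
prediction = knn_predict(query, vectors, labels, k=3)
```

In this toy corpus, "triangle" occurs in one text and "equation" in two, so "triangle" receives the larger IDF, which is the behaviour of the traditional scheme described in (1). Note that the weight depends only on how many texts contain a term, not on which classes those texts belong to; that is the gap the abstract says TF-IDF_ATC is meant to address.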
Keywords/Search Tags:Text Classification, KNN, Primary and Secondary School Teaching Resources, TF-IDF, Sample Cutting