
Research On Data Mining Methods For University Official Websites

Posted on: 2019-11-12  Degree: Master  Type: Thesis
Country: China  Candidate: S S Wang
GTID: 2438330542464314  Subject: Computer system architecture
Abstract/Summary:
Data mining has developed gradually on the basis of the Internet. As information needs have deepened, from simple information acquisition to today's information retrieval and information mining, the demand for information has kept growing. Data mining technology aims to improve access to information, deepen the understanding of data, uncover information hidden in the data, and apply more powerful analysis tools to extract the latent significance of that data and deliver more valuable results. By combining techniques from statistics, artificial intelligence, and databases, data mining makes it possible to discover previously unknown information in very large volumes of data and to analyze that data more intelligently with mining algorithms.

To calculate word similarity, words are first converted into embeddings, and the similarity between two words is computed as the similarity between their embeddings. Unlike word2vec, this approach does not require training a neural network. Based on an analysis of the synonym thesaurus Cilin ("word forest"), each word is embedded using its Cilin code and then transformed with a locality-sensitive hashing algorithm. In this process each word becomes a 64-bit binary value, so the similarity between two words can be computed from the Hamming distance between those values. To further improve accuracy, the method also exploits the structural characteristics of Cilin: it combines the path information of each word in the Cilin hierarchy with its embedding, using a weighting scheme based on the Cilin tree structure. In comparison experiments, the resulting word similarity calculations achieved very good results.

The campus networks of colleges and universities emerged with the development of the Internet. They have played a key role in campus development and provide strong data support for campus construction. To mine and analyze university profile data, this thesis proposes a phrase similarity calculation method that combines phrase tree structure with the CilinSimHash algorithm. First, each phrase is converted into a tree structure whose root node is a number. Second, CilinSimHash similarity is computed by combining the synonym thesaurus Cilin with the SimHash algorithm. Finally, the phrase-structure similarity and the CilinSimHash similarity are combined by weighting to obtain the overall phrase similarity. The algorithm is applied to the analysis of university official website data, and the resulting data are then clustered to study the relationship between official website data and university evaluation indexes. Clustering analysis of the structured data extracted from university official websites, together with the related index data, shows that the development of colleges and universities remains unbalanced across different educational levels.
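The word-level pipeline described above (Cilin code, then a 64-bit locality-sensitive hash, then Hamming distance, then a weighted combination with a structure-based score) can be sketched roughly as follows. This is a minimal illustration, not the thesis's implementation: the 8-character Cilin code format ("Aa01A01=", five hierarchy levels plus a flag), the per-level feature weights, the MD5-based feature hash, the weight `alpha = 0.5`, and all function names (`cilin_features`, `simhash`, `path_similarity`, `combined_similarity`, etc.) are assumptions chosen for the example.

```python
import hashlib

BITS = 64                       # fingerprint length mentioned in the abstract
LEVEL_CUTS = (1, 2, 4, 5, 7)    # assumed end index of each Cilin hierarchy level


def feature_hash(feature: str) -> int:
    """Map a feature string to a 64-bit integer (first 8 bytes of an MD5 digest)."""
    digest = hashlib.md5(feature.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def cilin_features(code: str) -> dict:
    """Split an assumed 8-character Cilin code such as 'Aa01A01=' into its
    hierarchy prefixes; deeper levels get larger (illustrative) weights."""
    return {code[:cut]: float(level) for level, cut in enumerate(LEVEL_CUTS, start=1)}


def simhash(features: dict) -> int:
    """SimHash: each feature adds +weight or -weight to every bit position,
    and a bit is set in the fingerprint when its running total is positive."""
    totals = [0.0] * BITS
    for feature, weight in features.items():
        h = feature_hash(feature)
        for i in range(BITS):
            totals[i] += weight if (h >> i) & 1 else -weight
    fingerprint = 0
    for i, total in enumerate(totals):
        if total > 0:
            fingerprint |= 1 << i
    return fingerprint


def hamming_similarity(a: int, b: int) -> float:
    """1 - (Hamming distance / 64); identical fingerprints score 1.0."""
    return 1.0 - bin(a ^ b).count("1") / BITS


def word_similarity(code_a: str, code_b: str) -> float:
    """Hash-based similarity of two words given their (assumed) Cilin codes."""
    return hamming_similarity(simhash(cilin_features(code_a)),
                              simhash(cilin_features(code_b)))


def path_similarity(code_a: str, code_b: str) -> float:
    """Fraction of hierarchy levels on which two codes agree, from the top down
    (a hypothetical stand-in for word-forest path information)."""
    shared = 0
    for cut in LEVEL_CUTS:
        if code_a[:cut] != code_b[:cut]:
            break
        shared += 1
    return shared / len(LEVEL_CUTS)


def combined_similarity(structural: float, hashed: float, alpha: float = 0.5) -> float:
    """Weighted combination of a structure-based score and a SimHash score."""
    return alpha * structural + (1 - alpha) * hashed


if __name__ == "__main__":
    a, b = "Aa01A01=", "Aa01A02="
    print(path_similarity(a, b))  # 0.8: the codes agree on the first four levels
    print(combined_similarity(path_similarity(a, b), word_similarity(a, b)))
```

Because SimHash is locality-sensitive, codes that share more hierarchy prefixes share more weighted features and therefore tend to produce fingerprints with a smaller Hamming distance, with no neural-network training involved. The same `combined_similarity` pattern would apply at the phrase level, with a phrase-tree score in place of `path_similarity`; how the thesis actually builds that tree and chooses its weights is not specified here.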
Keywords/Search Tags: Data mining, word similarity, clustering analysis, campus website