
Research And Implementation Of Topic Discovery System For MOOC Comments

Posted on: 2023-09-03
Degree: Master
Type: Thesis
Country: China
Candidate: C Deng
Full Text: PDF
GTID: 2558307073491444
Subject: Computer technology
Abstract/Summary:
The mobile Internet is developing rapidly, and social platforms and online self-media platforms are flourishing. The growth of social networks has produced massive amounts of short text data in various forms, such as food reviews, movie reviews, microblog posts, and user questions. These data contain a wealth of information, including user emotions, descriptions of things, current events, and social knowledge. Extracting this hidden information automatically and efficiently is very important for tasks such as user sentiment analysis, recommendation systems, and public opinion detection. In this context, this thesis focuses on the education domain and combines deep learning with text data mining to perform short text topic discovery on comments and questions, and designs and implements a topic discovery system. The main work consists of the following four parts.

1. Scrapy, Selenium, and Redis are used together to collect course information and review information from the Chinese University MOOC website, and popular question information from the Zhihu platform. The collected data is deduplicated, cleaned, and filtered to construct a MOOC course comment dataset and a Zhihu popular question dataset.

2. A short text clustering algorithm based on sentence embedding and a stacked denoising autoencoder is proposed. The method takes into account the sparsity, irregularity, and limited semantic content of short texts. A pretrained language model fine-tuned on the task produces sentence embeddings for short texts, and these embeddings effectively capture positional, word, semantic, and contextual features. The stacked denoising autoencoder further extracts high-dimensional and abstract features from the sentence embeddings, and Dropout is used to simulate noise, which improves the robustness and generalization of the model. Multiple rounds of experiments show that the proposed algorithm outperforms the comparison models on public datasets and self-built datasets.

3. To address the shortcomings of existing short text topic discovery methods, a Chinese short text topic discovery method that combines word frequency and semantic features is designed. The pre-trained model brings to the method a large amount of linguistic knowledge learned from external corpora. Because traditional topic discovery methods rely on a single feature, this method uses both TF-IDF and semantic similarity calculation to extract statistical word features and text semantic features, and then combines the two to calculate a topic importance score. Experimental results confirm that the method discovers the hot topics contained in the comments and questions well, and that its overall performance is better than that of the baseline model.

4. Using the collected data and the proposed methods, a topic discovery system for comments and questions is designed and implemented. The system obtains the hot topics of comments and questions and constructs a short text topic database. It is built on the Flask back-end framework, the Bootstrap front-end framework, and the ECharts visualization library, and provides the visual display of MOOC comment topics, the visual display of Zhihu question topics, and a topic discovery interface, giving users intuitive topic discovery services.
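The idea in part 3 of combining a word-frequency feature with a semantic feature into a single topic importance score can be illustrated with a minimal sketch. The abstract does not give the actual formula, so everything specific here is an assumption: the weighted sum, the `alpha` parameter, the function names, and the toy two-dimensional vectors that stand in for embeddings from a pre-trained model.

```python
import math
from collections import Counter


def tfidf_scores(docs):
    """Average TF-IDF weight of each term over a list of tokenized documents."""
    n_docs = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(doc))
    totals = Counter()
    for doc in docs:
        for term, count in Counter(doc).items():
            tf = count / len(doc)
            # The +1 keeps terms that occur in every document above zero.
            idf = math.log(n_docs / doc_freq[term]) + 1.0
            totals[term] += tf * idf
    return {term: total / n_docs for term, total in totals.items()}


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def topic_importance(docs, term_vectors, cluster_vector, alpha=0.5):
    """Rank terms by alpha * normalized TF-IDF + (1 - alpha) * semantic similarity.

    The weighted sum is an assumed combination rule, not the thesis's formula.
    Terms without a vector in term_vectors are skipped.
    """
    stats = tfidf_scores(docs)
    top = max(stats.values())
    scores = {}
    for term, weight in stats.items():
        if term not in term_vectors:
            continue
        semantic = cosine(term_vectors[term], cluster_vector)
        scores[term] = alpha * (weight / top) + (1 - alpha) * semantic
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy data: tokenized comments and made-up 2-d "embeddings" (illustrative only).
comments = [
    ["teacher", "explains", "clearly"],
    ["teacher", "homework", "hard"],
    ["homework", "too", "much"],
]
vectors = {
    "teacher": [0.9, 0.1],
    "homework": [0.2, 0.8],
    "clearly": [0.7, 0.3],
    "hard": [0.1, 0.9],
}
cluster_center = [1.0, 0.0]  # hypothetical embedding of the comment cluster
ranking = topic_importance(comments, vectors, cluster_center)
```

With this toy data, "homework" and "teacher" have the same TF-IDF weight, but "teacher" ranks higher because its vector lies closer to the cluster vector. This is the kind of case where a single statistical feature cannot separate two candidates and a semantic feature can.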
Keywords/Search Tags: Short text clustering, Topic extraction, Topic discovery, Sentence embedding, Deep learning, Autoencoder