| The excavation of bamboo slips has greatly eased the shortage of historical materials for the Qin Dynasty, provided rich and reliable sources for the study of history from the late Warring States period to the Qin Dynasty, and has irreplaceable value for current historical research. Practical documents account for a large proportion of the bamboo slips. Compared with static systems of rules, practical documents often contain richer information and reflect the social conditions and norms of the time from different angles. For the problem of how to mine useful information from a bamboo slip corpus and analyze it, text mining technology, which is increasingly widely used, offers a good solution. This paper uses text mining to study bamboo slips: it introduces text mining analysis methods into the field of bamboo slip research, analyzes the texts using the first and second volumes of the Li-ye Bamboo Slips as the corpus, and explores how to improve the efficiency and change the methods of bamboo slip research, so as to grasp and reveal the content of the slips quickly and to promote research on historical knowledge discovery for the Qin Dynasty. The main work of this paper is as follows. First, the bamboo slips were digitized and preprocessed. Together with the members of the bamboo slips team, we digitized the contents of the first and second volumes of the Li-ye Bamboo Slips and ensured the accuracy of the data through two rounds of verification. In view of the characteristics of the slips and the needs of the subsequent experiments, we carried out preprocessing steps including word segmentation, stop-word removal, and feature weight calculation. Second, keyword extraction and word frequency statistics were used to analyze keyword co-occurrence in the first and second volumes of the Li-ye Bamboo Slips. Taking the 
top ten keywords as the index, we extracted all related passages from the corpus and computed word frequencies. The results show that Volumes I and II of the Li-ye Bamboo Slips contain apprentice registers, food management documents, and debt documents. Third, the TextRank summarization model was used to extract text summaries of the bamboo slips. Compressing the text of the slips extracts the key information, so that their subject content can be grasped from the perspective of events. A total of 6 summaries were obtained in the experiment, covering debt documents, food management documents, and legal documents. Fourth, a topic model was used to calculate the topic-word probability distribution of the bamboo slips. This captures the themes of the first and second volumes from the perspective of probability distributions, with the topic words reflecting the content of the texts. Perplexity was introduced as the evaluation metric for the model. Based on perplexity and repeated manual experiments, 12 was selected as the optimal number of topics, and 4 of these topics show a clear thematic tendency. This paper holds that the first and second volumes of the Li-ye Bamboo Slips mainly take the form of official documents, with specific contents involving documents, postal transmission, debts, grain, apprentice registers, legal cases, corvée, and so on. The computed results summarize the main contents well and correspond to the subject categories given in the first volume of the Liye collation. This paper has theoretical significance and reference value for text mining research in the field of bamboo slip studies. In particular, it is the first attempt to combine bamboo slips with text mining technology; it effectively improves the efficiency of processing bamboo slips and acquiring information from them, provides new ideas for bamboo slip research, and shows that China's rich cultural heritage can be understood 
and interpreted with the help of text mining methods, an approach that is both feasible and of practical significance. |
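
The abstract names TextRank as the summarization model but does not describe the implementation. As a rough illustration only, the following is a minimal sketch of TextRank-style extractive summarization, assuming the slips have already been segmented into token lists. The function names (`similarity`, `textrank`, `summarize`), the damping factor of 0.85, and the fixed iteration count are assumptions made for this sketch, not details taken from the paper. The sentence similarity used here is the normalized word-overlap measure from the original TextRank formulation.

```python
import math
from itertools import combinations


def similarity(a, b):
    """Word overlap between two token lists, normalized by log lengths."""
    if not a or not b:
        return 0.0
    denom = math.log(len(a)) + math.log(len(b))
    if denom <= 0:  # both sentences contain a single token
        return 0.0
    return len(set(a) & set(b)) / denom


def textrank(sentences, damping=0.85, iterations=50):
    """Score tokenized sentences with weighted PageRank on a similarity graph."""
    n = len(sentences)
    weights = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        weights[i][j] = weights[j][i] = similarity(sentences[i], sentences[j])
    out_sum = [sum(row) for row in weights]  # total edge weight per sentence
    scores = [1.0] * n
    for _ in range(iterations):
        scores = [
            (1 - damping) + damping * sum(
                weights[j][i] / out_sum[j] * scores[j]
                for j in range(n) if out_sum[j] > 0
            )
            for i in range(n)
        ]
    return scores


def summarize(sentences, k=1):
    """Return the k highest-scoring sentences in their original order."""
    scores = textrank(sentences)
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]
```

Called as `summarize(tokenized_sentences, k=6)`, this would return six sentences, matching the number of summaries reported above, although how the paper arrived at that number is not stated. For the Li-ye corpus, the token lists would come from the word segmentation and stop-word removal steps described in the preprocessing stage.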