
Study On Human Computer Collaborative Construction Method Of Literature Database Of Acupuncture Clinical Basic Research

Posted on: 2023-06-23  Degree: Master  Type: Thesis
Country: China  Candidate: H Y Liu  Full Text: PDF
GTID: 2544306842998699  Subject: Acupuncture and Massage
Abstract/Summary:
Purpose
Using the literature of acupuncture clinical basic research, this study constructed a manually annotated data set and applied named entity recognition to explore methods suited to information extraction and structured processing of that literature, laying a foundation for its large-scale structured processing. The aim was to address the time and labor cost, the omitted information, and the classification errors that arise when the literature database of acupuncture clinical basic research is built by hand, and to improve the standardization of the database.

Contents and methods
1. Construction of a manually annotated data set of acupuncture clinical basic research literature. Literature research and expert consultation were used to determine the entity types that require manual annotation. On this basis, acupuncture clinical basic research literature published in 2020 was manually annotated to generate the data set, which serves as the training corpus for the information extraction models.
2. Establishment of a deep-learning entity extraction model for acupuncture clinical basic research literature and standardization of results. To address the difficulty of entity recognition in this literature, several named entity extraction models were developed, drawing on named entity recognition methods for traditional Chinese medicine (TCM) texts. The model best suited to this literature was selected through comparative experiments, and glossaries of variant names for acupoints, diseases, and effects were built to standardize the extraction results.
3. Design and implementation of a structured literature processing system for acupuncture clinical basic research. Using the entity extraction model selected in part 2, the system was designed in terms of overall architecture, technical architecture, and system functions. Once built, the system was used to extract entities from the literature, and its output was compared with the manually extracted results in the database to check the accuracy of the model from part 2.

Results
1. The manual labeling entities were defined and the literature data set of acupuncture clinical basic research was constructed.
(1) The manually labeled entities comprise 5 categories and 25 types. The acupuncture treatment category includes acupuncture site, acupuncture method, needling method, retention time, treatment course, frequency, acupoint matching, corresponding syndrome of acupoint matching, and corresponding symptom of acupoint matching. The electroacupuncture treatment category includes electroacupuncture instrument, electroacupuncture frequency, and electroacupuncture waveform. The disease category includes disease name, case source, diagnostic criteria, inclusion criteria, and exclusion criteria. The experimental effect category includes effect and effect change. The research methods category includes research method, total sample size, treatment group sample size, statistical method, age group, and gender group.
(2) A manually annotated data set of 472 articles with entity labels was constructed, with 9121 entities labeled in total: 2971 related to acupuncture treatment, 486 to electroacupuncture treatment, 2190 to disease, 2759 to research methods, and 715 to experimental effect.
2. The entity extraction model and glossary for acupuncture clinical basic research literature were constructed.
(1) Four models were compared on the labeled data set using precision (P), recall (R), and F1 score. The bi-directional long short-term memory network (BiLSTM) achieved P, R, and F1 of 34.94%, 30.38%, and 30.59%, respectively. A second model, whose name is missing from the source text, achieved 45.22%, 37.17%, and 37.07%. The bi-directional long short-term memory network with conditional random fields (BiLSTM-CRF) achieved 54.99%, 55.56%, and 54.16%. The same model combined with bidirectional encoder representations from transformers (BERT-BiLSTM-CRF) achieved 51.81%, 48.87%, and 49.01%. BiLSTM-CRF scored highest on all three indicators among the four models and was therefore selected as the entity extraction model for this literature.
(2) A total of 2121 disease names, 17454 acupuncture points, and 5224 effects were manually extracted from 2854 acupuncture clinical basic research papers. Of these, 882 disease names, 1216 acupuncture points, and 2044 effects were entered in the glossary as variant names, and 498 standard disease names, 592 standard acupuncture points, and 1068 standard effects were obtained. The one-to-one correspondence between each variant name and its standard name forms a glossary of acupuncture points, diseases, and effects, which can be used to standardize the extraction results for acupuncture sites, points, disease names, and effects.
3. A structured literature processing system for acupuncture clinical basic research was designed and verified.
(1) The system supports document import, label setup, data annotation, annotation set generation, automatic computer extraction, manual proofreading, export of data and extraction results, model training, and other functions, so that the literature can be structured without the help of professional programmers.
(2) On 228 randomly selected articles, the overall accuracy of the system was 67.89%, and it identified 182 entities that had been omitted or incorrectly classified during manual extraction. By entity type, the three with the highest accuracy were statistical method (94.09%), exclusion criteria (91.85%), and electroacupuncture waveform (83.33%); the three with the lowest accuracy were corresponding symptom of acupoint matching (50.00%), disease name (50.68%), and treatment group sample size (52.43%). By publication year, accuracy was lowest for 2005 (60.87%) and highest for 2018 (85.68%).

Conclusion
Based on the deep learning model BiLSTM-CRF, the structured literature processing system for acupuncture clinical basic research can accurately extract the entities in the literature. It reduces the time and labor cost of manual extraction and the omission of entities, and makes it feasible to add new acupuncture clinical basic research literature to the database each year.
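The two computations the results rely on, scoring extracted entities against manual annotations and mapping variant names to standard names, can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the abstract does not state how the thesis matched entities or averaged its scores, so the sketch uses exact-match, micro-averaged precision, recall, and F1, and the names `Entity`, `precision_recall_f1`, `standardize`, and the sample glossary entries are hypothetical rather than taken from the thesis.

```python
from typing import Dict, Set, Tuple

# Hypothetical entity representation: (start offset, end offset, entity type).
Entity = Tuple[int, int, str]


def precision_recall_f1(gold: Set[Entity],
                        predicted: Set[Entity]) -> Tuple[float, float, float]:
    """Exact-match entity scores: an entity counts as correct only if its
    span and its type both agree with a manual annotation."""
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


def standardize(name: str, glossary: Dict[str, str]) -> str:
    """Map a variant name to its standard name; names that are not in the
    glossary are returned unchanged (apart from trimmed whitespace)."""
    cleaned = name.strip()
    return glossary.get(cleaned, cleaned)


# Illustrative data only, not drawn from the thesis data set.
gold_entities = {(0, 7, "acupoint"), (10, 18, "disease"), (20, 25, "effect")}
predicted_entities = {(0, 7, "acupoint"), (10, 18, "effect")}
p, r, f1 = precision_recall_f1(gold_entities, predicted_entities)

# Hypothetical glossary entries: variant name -> standard name.
acupoint_glossary = {"ST36": "Zusanli (ST36)", "Zusanli": "Zusanli (ST36)"}
standard_name = standardize(" ST36 ", acupoint_glossary)
```

In this toy case one of the two predicted entities matches a manual annotation exactly, so precision is 0.5 and recall is one third. Keeping unmatched names unchanged, rather than dropping them, leaves them visible for the manual proofreading step the system provides.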
Keywords/Search Tags: literature database of acupuncture clinical basic research, natural language processing, entity recognition, data annotation