Literature Classification Of Plant Phenotypic Genes Based On Machine Learning And Its Application

Posted on: 2020-11-19
Degree: Master
Type: Thesis
Country: China
Candidate: C Wang
GTID: 2370330578463404
Subject: Agriculture
Abstract/Summary:
With the development of bioinformatics, a large amount of literature is published every day in various journals. Given the rapid growth of the life science literature, manual labeling cannot manage it efficiently, so text mining technology has been applied to the biomedical field. Text mining can substantially improve the efficiency of literature classification, and literature classification can extract the content of interest to researchers from unorganized information. In this dissertation, literature is classified with machine learning classifiers, and literature related to plant phenotypes and genes is selected, to improve the efficiency of classification. The specific work is as follows:

(1) Data acquisition and pre-processing. Literature related to plant phenotype genes in the MEDLINE database was collected with web crawler software and then pre-processed. Pre-processing included literature cleaning, document segmentation, stemming and removal of stop words.

(2) Construction of the bag-of-words model, the TF-IDF model and the Word2vec model for feature processing of documents. To address feature processing in plant phenotype gene literature, the pre-processed document features are given different weights and the content of each document is converted into vector form, mainly based on attributes such as word frequency, inverse document frequency and text similarity. Reasonable hyperparameters are selected through experiments, and the classification performance of the different feature extraction methods is then evaluated.

(3) Classification of plant phenotypic gene literature using machine learning classifiers. After comparing the advantages and disadvantages of existing text categorization algorithms, support vector machine, naive Bayes and random forest methods were used to classify plant phenotypic gene literature, and a convolutional neural network was also applied, to obtain the classification performance of the different classifiers on the plant phenotype corpus.

The experimental results show that the convolutional neural network and the support vector machine perform similarly, each with an accuracy of about 90%. The support vector machine classifier performs better than the random forest and naive Bayes classifiers, whose accuracy is nevertheless over 85%. Research on classifying plant phenotypic gene literature improves retrieval efficiency, helps researchers explore the value behind the literature, and supports the screening of high-quality crop varieties, which gives the work theoretical significance.
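The pipeline described in steps (1) to (3) can be illustrated with a minimal, dependency-free sketch. It is not the thesis implementation. The stop-word list, the crude suffix stripping (a stand-in for a real stemmer), the six-sentence toy corpus and the `relevant`/`other` labels are all assumptions made for illustration. Only the bag-of-words counts, TF-IDF weighting and a naive Bayes classifier are shown; the support vector machine, random forest, Word2vec and convolutional neural network parts are omitted.

```python
import math
import re
from collections import Counter, defaultdict

# Tiny illustrative stop-word list (an assumption, not the list used in the thesis).
STOP_WORDS = {"the", "of", "in", "a", "an", "and", "to", "is", "for", "on", "with"}

def preprocess(text):
    """Lowercase, tokenize, drop stop words, and strip two common suffixes."""
    stems = []
    for tok in re.findall(r"[a-z]+", text.lower()):
        if tok in STOP_WORDS:
            continue
        # Crude suffix stripping; a real pipeline would use a proper stemmer.
        for suffix in ("ing", "s"):
            if tok.endswith(suffix) and not tok.endswith("ss") and len(tok) - len(suffix) >= 3:
                tok = tok[: -len(suffix)]
                break
        stems.append(tok)
    return stems

def tfidf_vectors(docs):
    """Return one {term: weight} dict per tokenized document (smoothed IDF)."""
    n_docs = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append({
            term: (count / len(doc)) * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, count in counts.items()
        })
    return vectors

class NaiveBayes:
    """Multinomial naive Bayes over bag-of-words counts with add-one smoothing."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.term_counts = defaultdict(Counter)
        for doc, label in zip(docs, labels):
            self.term_counts[label].update(doc)
        self.vocab = {t for counts in self.term_counts.values() for t in counts}
        return self

    def predict(self, doc):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n in self.class_counts.items():
            counts = self.term_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n / total_docs)  # log prior
            for term in doc:
                if term in self.vocab:  # ignore words never seen in training
                    score += math.log((counts[term] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label

# Hypothetical toy corpus; the thesis corpus comes from MEDLINE.
train = [
    ("The dwarf gene controls plant height in rice", "relevant"),
    ("Mutant genes alter leaf phenotype in maize", "relevant"),
    ("Flowering time phenotype is linked to a wheat gene", "relevant"),
    ("Clinical trial of a new drug for hypertension patients", "other"),
    ("Patients receiving the drug showed lower blood pressure", "other"),
    ("Hospital survey on patient care and nursing", "other"),
]
docs = [preprocess(text) for text, _ in train]
labels = [label for _, label in train]

vectors = tfidf_vectors(docs)            # feature weights, step (2)
model = NaiveBayes().fit(docs, labels)   # classifier, step (3)
prediction = model.predict(preprocess("A gene affecting grain phenotype in rice"))
print(prediction)
```

In this sketch the TF-IDF vectors are computed only to show how terms that occur in many documents (such as "gene" here) receive lower weight than rarer terms. The naive Bayes model itself is trained on raw bag-of-words counts.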
Keywords/Search Tags: Literature classification, Support vector machine, Naive Bayes, Random forest, Convolutional neural network