Parallel Machine Learning Algorithm For Large-scale Forestry Text Classification Based On Spark

Posted on:2020-06-05

Degree:Master

Type:Thesis

Country:China

Candidate:D Y Shi

Full Text:PDF

GTID:2393330575497725

Subject:Computer software and theory

Abstract/Summary:


With the gradual integration of new information technology into the forestry field, forestry-related texts have become large in scale and difficult to integrate. A review of related research, however, shows that existing work on forestry text classification does not meet the current needs of the field. The shortcomings fall into two areas: 1) the labels in existing classification systems are not well founded, and most classification algorithms are trained on small batches of data, which limits the practical usefulness of the resulting models; 2) most classification algorithms run in stand-alone environments and cannot handle real large-scale classification scenarios.

This thesis combines big data analysis technology with forestry text analysis and establishes a new set of classification labels. Feature weights are then computed with TF-IDF and Word2vec. Next, a parallelized XGBoost algorithm is implemented on the Spark computing framework and compared with three other parallel machine learning algorithms.

The results show that: 1) the combination of XGBoost and TF-IDF achieves significantly better classification performance than the other seven algorithm and feature-weighting combinations; 2) every algorithm is both more efficient and more accurate with TF-IDF features than with Word2vec features, indicating that the features TF-IDF extracts from Internet forestry texts better represent forestry characteristics; 3) running on Spark, the XGBoost algorithm is more efficient than its stand-alone version and can handle the classification of massive volumes of forestry text.
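The abstract does not say which TF-IDF variant the thesis used. As a minimal sketch, assuming raw term counts for TF and the smoothed IDF documented for Spark MLlib's `IDF` transformer, idf(t) = log((m + 1) / (df(t) + 1)), the weighting step looks like this. The function name `tf_idf` and the English placeholder tokens are hypothetical; in practice the tokens would come from a Chinese word segmenter.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Compute TF-IDF weights for a list of tokenized documents.

    Assumption: TF is the raw term count, and IDF uses the smoothed form
    idf(t) = log((m + 1) / (df(t) + 1)), where m is the number of documents
    and df(t) is the number of documents containing term t.
    """
    m = len(docs)
    # Document frequency: each term is counted at most once per document.
    df = Counter(term for doc in docs for term in set(doc))
    idf = {t: math.log((m + 1) / (n + 1)) for t, n in df.items()}
    weights = []
    for doc in docs:
        tf = Counter(doc)  # raw term counts within this document
        weights.append({t: c * idf[t] for t, c in tf.items()})
    return weights, idf


# Hypothetical toy corpus; "news" appears in every document.
docs = [
    ["forest", "fire", "policy", "news"],
    ["forest", "pest", "control", "news"],
    ["timber", "price", "market", "news", "timber"],
]
weights, idf = tf_idf(docs)
print(round(weights[2]["timber"], 4))  # 1.3863
```

A term that occurs in every document ("news") gets weight 0, while a term concentrated in one document ("timber", count 2) is weighted most heavily. On a Spark cluster the same computation would typically be expressed with `HashingTF` or `CountVectorizer` followed by `IDF` in `pyspark.ml.feature`, so that term counting and document-frequency aggregation are distributed across partitions.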

Keywords/Search Tags:

Spark, Chinese text classification, forestry text, machine learning, XGBoost


Related items

1	Research On Agro-short Text Mining Method Using Deep Learning
2	Application Research Of Agricultural Information Set In Jilin Province Based On Text Classification Algorithm
3	Research On Improved Swarm Intelligence Algorithm And Its Application In Agricultural Text Classification
4	Research And Implementation Of Aquatic Text Classification System Based On Deep Learning
5	Semi-supervised Agricultural Text Classification Based On Semantic Diffusion Kernel And Support Vector Machines
6	Research On Key Information Extraction For Forestry Text
7	Research And System Implementation Of Automatic Text Summarization For Forestry News
8	Research On Text Detection Algorithm Of Agricultural Materials Image Based On Segmentation
9	Research On Plant Image Recognition Based On Attention Mechanism
10	Application Of NLP Technology In Agricultural Public Opinion Analysis System