Research And Implementation Of Multi-classification Algorithms For Domain Applications

Posted on:2022-03-08

Degree:Master

Type:Thesis

Country:China

Candidate:Y P Deng

GTID:2492306317996579

Subject:Master of Engineering (in the field of Transportation Engineering)

Abstract/Summary:

Emerging information technologies such as the mobile Internet, the Internet of Things (IoT), artificial intelligence (AI), big data, and cloud computing are developing rapidly. The Internet is deeply integrated with many fields, and the resulting digital assets have become some of the most important assets in those fields. These assets are held in massive volumes of electronic files, which contain a wealth of valuable information waiting to be discovered.

In civil aviation, the NOTAM carries key information for flight crews, ground control personnel, and maintenance and logistics support personnel. However, current manual processing involves a great deal of repetitive labor and is prone to human error. Because the types of NOTAM are relatively fixed and well suited to computer processing, using artificial intelligence methods to classify NOTAM text automatically can greatly reduce staff workload and reduce errors.

NOTAM data has several distinctive characteristics: Chinese and English are mixed, the text is poorly structured, and there are many categories, usually hundreds or even thousands, many of which have very few data samples. Based on these characteristics of the NOTAM data set, this thesis carries out research and practice on multi-class text classification. The main research contents are as follows:

1) To address the mixed Chinese and English content and the poor structure of the original data, the data is first cleaned, and word2vec and GloVe word vectors are then used to build the text representation. Experimental comparison shows that GloVe word vectors express the meaning of NOTAM data more effectively than word2vec.

2) To address the unbalanced category distribution in the NOTAM message data set, the SMOTE algorithm is used for data augmentation (data enhancement) of the tail-class samples. In the experiments, this effectively improves the ability of the text classification model to recognize and process tail samples.

3) To further improve the ability of the text classification model to recognize and process tail samples, the training process is decoupled at the algorithm level and split into two stages: feature learning and classifier learning. The weights in the two stages are handled with different strategies in order to overcome the influence of head and tail data in the unbalanced data set on the classification model. In the experiments, decoupling features and classifiers effectively improves the model's ability to identify and process tail samples.

The experiments show that combining data augmentation with decoupled feature and classifier learning, on top of a neural network classification model, can complete multi-class text classification tasks in the civil aviation domain while improving recognition accuracy on the tail samples of the data set. The approach therefore has practical application value.
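The SMOTE step in point 2) can be illustrated with a minimal sketch. The abstract does not say how SMOTE was applied to text, so this sketch assumes each NOTAM has already been turned into a fixed-length numeric vector (for example, by averaging its word vectors). The function name `smote`, its parameters, and the toy data are hypothetical and are not taken from the thesis. The core idea is standard SMOTE: pick a tail-class sample, pick one of its k nearest tail-class neighbours, and create a new sample at a random point on the line segment between them.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Generate n_new synthetic vectors from a list of minority-class vectors.

    Illustrative pure-Python version of SMOTE; not the thesis implementation.
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    # A sample cannot have more neighbours than there are other samples.
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Indices of the k nearest minority neighbours of the base sample,
        # excluding the base sample itself (Euclidean distance).
        neighbours = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: math.dist(base, minority[j]),
        )[:k]
        other = minority[rng.choice(neighbours)]
        gap = rng.random()  # interpolation factor in [0, 1)
        # New point on the segment between the base sample and its neighbour.
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


# Toy tail class with three 2-D document vectors (made-up numbers).
tail_class = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
new_samples = smote(tail_class, n_new=4, k=2)
```

Because every synthetic vector lies between two real tail-class vectors, the tail class gains samples without exact duplication. In practice a maintained implementation such as the one in the imbalanced-learn package would likely be preferable to hand-written code.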

Keywords/Search Tags:

NOTAM, unbalanced data set, text categorization, word vector, data enhancement, decoupling features and classifiers

Related items

1	Model Of Domain Natural Language Text Categorization And Its Application On Requirement Analysis Of Mechanical
2	Research On Visualization Of NOTAM Based On Natural Language Processing
3	Research On Retrieval Technology Of Unstructured Text Data In Two-Ticket Training System
4	A Study On The Relationship Between Online Search Data,Online Word-of-Mouth And Chinese Automobile Sales
5	Data Analysis Of Air Traffic Management System Hazards Based On Natural Language Processing
6	Study On Text Mining Based Fault Classification For Turnout
7	Research On Text Categorization Methods And Its Application To Analysis Of Civil Aviation Safety Reports
8	Data Acquisition And Management For The J-text Tokomak Experiment
9	The Upgrade Of Data System For The J-TEXT Tokomak Experiment
10	Data Acquisition And Management For The J-TEXT Tokomak Experiment