Emerging information technologies such as the mobile Internet, the IoT (Internet of Things), AI (artificial intelligence), big data, and cloud computing are developing rapidly. The Internet is deeply integrated with many fields, and the resulting digital assets have become important assets in those fields. Digital assets are stored in massive volumes of electronic files that contain a wealth of valuable information waiting to be discovered. In civil aviation, the NOTAM carries key information for flight crews, ground control personnel, and maintenance and logistics support personnel. However, current manual processing involves a large amount of repetitive labor and is prone to human error. Because NOTAM types are relatively fixed and well suited to computer processing, using artificial intelligence methods to classify NOTAM text automatically can greatly reduce staff workload and errors. NOTAM data have several distinctive characteristics: Chinese and English are mixed, the text is poorly structured, and there are many categories, usually hundreds or even thousands, many of which have very few samples. Based on these characteristics of the NOTAM data set, this paper studies and implements multi-class text classification. The main research contents are as follows:

1) To address the mixed Chinese and English content and the poor structure of the original data, the data are first cleaned, and the word2vec and GloVe word vector methods are then used to produce the text representation. Experimental comparison shows that GloVe word vectors express the meaning of NOTAM data more effectively than word2vec.

2) To address the unbalanced category distribution in the NOTAM data set, the SMOTE algorithm is used for data augmentation of the tail samples. In the experiments, this effectively improves the ability of the text classification model to recognize and process tail samples.

3) To further improve the model's ability to recognize and process tail samples, the training process is decoupled at the algorithm level and split into two stages: feature learning and classifier learning. The two stages adopt different weighting strategies to overcome the influence of the head and tail data of the unbalanced data set on the classification model. In the experiments, decoupling features and classifier effectively improves the model's ability to identify and process tail samples.

The experiments show that applying data augmentation and the decoupling of features and classifier to a neural network classification model can complete multi-class text classification tasks in civil aviation while improving recognition accuracy on the tail samples of the data set, which gives the approach practical application value.
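
As an illustration of the SMOTE step in point 2), the following is a minimal sketch of the core idea: each synthetic sample is produced by interpolating between a tail-class sample and one of its nearest neighbours in the same class. It assumes the texts have already been converted to fixed-length vectors; the abstract does not say at which stage SMOTE is applied, and the function name `smote`, its parameters, and the toy vectors are hypothetical, not the paper's implementation.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Return n_new synthetic vectors interpolated from minority-class vectors."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than other samples
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base sample.
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: math.dist(base, minority[j]))
        neighbour = minority[rng.choice(others[:k])]
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Hypothetical 2-D embeddings of a tail category with only three samples.
tail_vectors = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
new_vectors = smote(tail_vectors, n_new=4, k=2)
```

Every generated vector lies on a line segment between two real tail samples, so the tail class becomes denser without exact duplicates being added. In practice a maintained implementation would normally be preferred over a hand-written one.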