| Information is growing in importance, and every industry now handles it in large volumes. This information is carried in many forms, such as text, video, pictures, and voice, and text is one of the most important of these carriers. In the shipping field, however, no reasonable set of classification procedures and methods has yet been established. Distinguishing highly specialized ship text information still requires manual comparison, which costs a great deal of labor and time. This thesis proposes a feasible solution to this problem and improves on the original algorithm. It studies the application of text classification techniques to the ship field, which can save much of the time and cost of manual comparison. Commonly used text classification algorithms are analyzed and compared, feature extraction methods are evaluated experimentally on shipping-domain text, and their shortcomings are addressed so that feature weights are calculated more reasonably. The fasttext algorithm, a fast text classification algorithm, is then analyzed in the context of the ship field and modified specifically to address the distorted keyword weights and inaccurate calculations caused by differences in ship equipment and regions. The result is a new algorithm model named C-fasttext. The classification results are displayed through a system with separated front end and back end, in line with product requirements. The goal of this thesis is that, when corpora with different naming rules arrive, the accuracy and recall of automatic classification both exceed 90%, the false alarm rate does not exceed 5%, and the corpus coverage rate exceeds 95%. The proposed C-fasttext model is compared with the traditional support vector machine algorithm, the naive Bayes algorithm, and the original fasttext algorithm, and the comparison results are analyzed. The experiments show that the improved C-fasttext algorithm improves the classification effect by 4% over the original fasttext algorithm, by 15.3% over the naive Bayes algorithm, and by 31.5% over the support vector machine algorithm. |