| As an important part of railway signaling equipment, track circuits are widely used on China’s rail transit lines. Because the causes of track circuit faults are complex and diverse, fault analysis based on manual experience is time-consuming and inefficient, which has a great impact on railway transportation. The large volume of track circuit fault text records accumulated during continuous railway operation faithfully reflects on-site fault conditions and contains a great deal of valuable fault information. At present, this fault text data is processed and analyzed mainly by hand, which is inefficient and leaves the data insufficiently mined and utilized, wasting a large amount of valuable fault data. Therefore, automatically and intelligently analyzing and mining historical fault text data to obtain valuable fault information and assist manual fault analysis and decision-making will help improve both the efficiency of track circuit fault analysis and the reliability of track circuit operation. To address these problems, this thesis studies fault text mining and knowledge graph construction for track circuit fault text data. The specific research contents are as follows: (1) From the perspective of fault cause types, and based on the characteristics of track circuit fault text and the distribution of feature words across cause categories, a track circuit fault text classification model based on an improved TF-IDF algorithm was studied and constructed. Using a variety of evaluation metrics, several text classification models were compared under fault text data balancing, different text feature representation methods, and different word segmentation modes. The experiments showed that the fault text classification model based on improved TF-IDF + Synthetic Minority Over-sampling 
Technique (SMOTE) + Support Vector Machine achieved the best classification performance, improved the extraction of feature words for each cause category, and realized accurate and efficient automatic classification of track circuit fault text. (2) To address the low utilization of track circuit fault text data, and considering the diversity of track circuit fault causes, a fine-grained fault cause clustering analysis method based on two-layer feature extraction was studied. In the semantic layer, the Laplacian matrix was calculated from the fault text feature matrix represented with Word2vec features, and fault cause topic clustering was realized with the spectral clustering algorithm. In the word-item layer, the fault cause phrases under each fault cause topic were extracted and described in a unified, standardized manner, realizing automatic statistical analysis and mining of the causes of high-frequency track circuit faults. This provided valuable information for on-site track circuit maintenance and fault analysis and offers auxiliary guidance for fault analysis and the formulation of preventive maintenance measures. (3) Based on the track circuit text mining results, a knowledge graph framework for the track circuit fault domain was proposed using the Neo4j graph database, and a historical fault case graph, a signal equipment concept and entity graph, and a fault disposal logic graph were constructed. The framework provided on-site maintenance personnel with efficient fault domain knowledge query and information management through visualization, and realized a recommendation application for track circuit faults based on statistical analysis, so as to provide intelligent auxiliary guidance for on-site track circuit fault analysis and disposal. |
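
As a rough illustration of the term weighting that contribution (1) builds on, the following minimal sketch computes standard (baseline) TF-IDF weights for tokenized fault records. It is not the improved TF-IDF variant of the thesis, whose details are not given in this abstract. The function name `tf_idf` and the sample records are hypothetical and serve only as an example.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document (baseline TF-IDF)."""
    n_docs = len(docs)
    # Document frequency: the number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        # Weight = relative term frequency * log(N / document frequency).
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical, pre-segmented fault records (illustrative only, not thesis data).
records = [
    ["rail", "insulation", "damaged", "red", "band"],
    ["receiver", "failure", "red", "band"],
    ["rail", "insulation", "aging"],
]
weights = tf_idf(records)
```

Terms that appear in few records, such as "damaged" here, receive larger weights than terms shared across records, which is the property a cause-category feature extractor would rely on. In a full pipeline these weights would feed a resampling step and a classifier.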