Research On Text Mining Of On-Board Signal Equipment Maintenance Log Based On Deep Learning

Posted on: 2023-03-08 | Degree: Master | Type: Thesis
Country: China | Candidate: L Xue
GTID: 2542307073483194 | Subject: Traffic Information Engineering & Control
Abstract/Summary:
As the core of the train operation control system, on-board equipment is critical to the safe and efficient operation of trains. Because of its important role, complex structure, and diverse fault characteristics, the fault content and handling process are recorded in detail as text logs after a fault occurs, and these logs contain rich experience and knowledge. However, this type of text has not been fully utilized because it lacks a unified form of expression and contains a large amount of redundant information. Mining the key information in these logs to guide fault maintenance is therefore of great significance. This thesis takes on-board equipment fault maintenance logs as its research object and aims to fully mine the fault information they contain. Based on an analysis of the fault data, and because structured and semi-structured data coexist in the logs, a targeted fault text data processing method is adopted. For the semi-structured text, which is difficult to use, a text mining scheme for on-board equipment faults is proposed, and a fault analysis system is built on it. The root-cause relationships and distribution of faults are displayed visually as charts. The specific research contents are as follows:

(1) Named entity recognition is used to extract the required key fault information by identifying the boundary and type of each fault entity. For the on-board equipment fault named entity recognition task, the fault dataset OBE-Fault-Corpus is constructed, and the HMM and CRF machine learning models and the BiLSTM and BiLSTM-CRF deep learning models are each applied to the task. On this basis, a fault entity recognition model based on multi-task learning is proposed, which improves fault entity recognition by integrating the boundary features of a part-of-speech tagging task. Experiments show that BiLSTM-CRF outperforms the machine learning models and that the multi-task learning model outperforms the single-task BiLSTM-CRF model, reaching an F1 value of 84.70%.

(2) To resolve the redundancy and errors caused by the homology and heterogeneity of on-board equipment logs, this thesis designs a named entity disambiguation scheme for the recognized entities. First, an ALBERT-CoSENT model is proposed to obtain vector representations of fault entities, which effectively handles polysemy. Then, an autoencoder further extracts features and reduces the dimensionality of the resulting high-dimensional entity vectors, which avoids the high complexity and low accuracy of clustering the vectors directly. Finally, the dimensionality-reduced data is fed into a DBSCAN clustering model to identify similar redundant entities and erroneous entities. Experiments show that the F1 value of this scheme is 14.03% higher than that of the benchmark scheme. In total, 183 fault phenomenon entities, 419 fault cause entities, and 56 fault processing entities are obtained.

(3) Based on the above models and combined with the structured data in the maintenance logs, an on-board fault analysis system is built to record fault content in real time. The system not only displays the root-cause relationships, distribution characteristics, and trends of faults visually in diagrams and tables, but also manages and stores fault data, which effectively improves knowledge discovery from, and use of, on-board equipment fault logs.
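To make the evaluation in (1) concrete, the following is a minimal sketch of how an entity-level F1 value is commonly computed for a named entity recognition task: an entity counts as correct only when both its boundary and its type match. It assumes BIO-style tags; the tag names `PHE` (fault phenomenon) and `CAU` (fault cause) and the toy sentences are hypothetical and are not taken from OBE-Fault-Corpus, and the thesis may score its models differently.

```python
from typing import List, Set, Tuple

Span = Tuple[int, int, str]  # (start index, end index exclusive, entity type)


def bio_to_spans(tags: List[str]) -> Set[Span]:
    """Convert one BIO tag sequence into a set of (start, end, type) spans."""
    spans: Set[Span] = set()
    start, etype = None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel "O" closes a trailing entity
        inside = tag.startswith("I-") and etype == tag[2:]
        if start is not None and not inside:
            spans.add((start, i, etype))  # the open entity ends before position i
            start, etype = None, None
        if tag.startswith("B-"):
            start, etype = i, tag[2:]  # a new entity begins here
    return spans


def entity_f1(gold: List[List[str]], pred: List[List[str]]) -> float:
    """Entity-level F1: a prediction is correct only if boundary and type both match."""
    gold_spans, pred_spans = set(), set()
    for sent_id, (g_tags, p_tags) in enumerate(zip(gold, pred)):
        gold_spans |= {(sent_id, *s) for s in bio_to_spans(g_tags)}
        pred_spans |= {(sent_id, *s) for s in bio_to_spans(p_tags)}
    true_pos = len(gold_spans & pred_spans)
    precision = true_pos / len(pred_spans) if pred_spans else 0.0
    recall = true_pos / len(gold_spans) if gold_spans else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical example: the second entity is predicted with the wrong right boundary.
gold = [["B-PHE", "I-PHE", "O", "B-CAU", "I-CAU"]]
pred = [["B-PHE", "I-PHE", "O", "B-CAU", "O"]]
score = entity_f1(gold, pred)
```

In this toy case one of the two predicted entities matches exactly, so precision and recall are both 0.5. A boundary error costs as much as a missed entity under this metric, which is one reason boundary features from an auxiliary task can matter.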
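The final clustering step in (2) can be illustrated with a small, dependency-free DBSCAN sketch. It is not the thesis implementation: the 2-D points stand in for autoencoder-reduced entity vectors, and the `eps` and `min_pts` values are arbitrary assumptions chosen for the toy data. How the thesis maps clusters and noise points to "redundant" and "erroneous" entities is not stated in the abstract, so the interpretation in the comments is only one plausible reading.

```python
import math
from typing import List, Optional, Sequence

NOISE = -1


def region_query(points: Sequence[Sequence[float]], i: int, eps: float) -> List[int]:
    """Indices of all points within distance eps of point i (including i itself)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def dbscan(points: Sequence[Sequence[float]], eps: float, min_pts: int) -> List[int]:
    """Label each point with a cluster id (0, 1, ...) or NOISE (-1)."""
    labels: List[Optional[int]] = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue  # already assigned to a cluster or marked as noise
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # not a core point; may later become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins the cluster, is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is a core point, so keep expanding
    return [label if label is not None else NOISE for label in labels]


# Hypothetical reduced entity vectors: two tight groups and one isolated point.
vectors = [
    (0.0, 0.0), (0.1, 0.0), (0.0, 0.1),   # e.g. near-duplicate mentions of one entity
    (5.0, 5.0), (5.1, 5.0), (5.0, 5.1),   # a second group of similar mentions
    (10.0, 0.0),                          # isolated point, labeled as noise
]
labels = dbscan(vectors, eps=0.3, min_pts=2)
```

Under this reading, mentions that share a cluster id would be candidates for merging into one canonical entity, and noise points would be candidates for manual review. Because DBSCAN does not need the number of clusters in advance, it suits a setting where the number of distinct fault entities is unknown.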
Keywords/Search Tags: On-board equipment, fault maintenance log, named entity recognition, named entity disambiguation, visualization