| With the development of online medical question answering communities, the public has begun to seek answers to health questions on these medical service platforms. At present, medical communities mainly provide question retrieval through search engines, which cannot fully understand the semantics of users' queries and have difficulty handling the diverse aliases of medical entities. Supervised learning is the common approach to semantic matching; however, because labeled data for question matching in the Chinese health domain is scarce, it is difficult to apply supervised semantic matching models to improve question retrieval in medical communities. This paper therefore focuses on supervised semantic matching approaches to the question retrieval problem in the Chinese medical domain, and alleviates the shortage of labeled data through corpus construction, model improvement, and transfer learning. The main contributions of this paper are as follows: · Construction of a semantic matching corpus in the Chinese medical domain. To address the lack of labeled data for the Chinese medical question matching task, we propose a semi-automatic method for corpus mining and dataset construction. Using QA data from online medical communities, we build a large-scale dataset of similar question pairs in the Chinese medical domain (the CMSQP dataset). To accurately understand and distinguish diverse medical aliases, we collect a large number of medical entities and their aliases from open-source knowledge bases and medical encyclopedia websites, and build a large Chinese medical terminology dictionary. · An improved semantic matching model based on Transformer. Most deep semantic matching models based on LSTM suffer from high complexity and slow computation. To improve the performance of deep semantic matching models, we propose the TMTransformer model for semantic matching tasks. Building on the Transformer model, we use Multi-Head Attention to learn both semantic representations and interaction features. Experiments on the CMSQP dataset and other open-source datasets show that TMTransformer is more effective and less complex than existing work, which verifies the effectiveness and efficiency of the proposed model. · Semantic matching methods based on transfer learning. We design two transfer learning methods to improve the performance of semantic matching tasks: a cross-domain transfer method and a cross-task transfer method. The cross-domain method transfers knowledge between different datasets, while the cross-task strategy improves the semantic matching model with the help of a multi-classification task. Experiments on multiple datasets demonstrate the effectiveness of both transfer strategies for the semantic matching task in the Chinese medical domain. We also compare the performance, convergence, and application scenarios of the two transfer strategies. |
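
To illustrate how a medical terminology dictionary could help with entity aliases, the following is a minimal sketch and not the paper's implementation. The dictionary entries, the function name `build_normalizer`, and the use of English text are all assumptions made for readability. Chinese text has no whitespace word boundaries, so a real system would likely rely on word segmentation or plain substring matching instead of `\b`.

```python
import re

# Hypothetical alias dictionary: alias -> canonical entity name.
# These entries are illustrative only, not drawn from the paper's dictionary.
ALIAS_TO_CANONICAL = {
    "high blood pressure": "hypertension",
    "htn": "hypertension",
    "sugar diabetes": "diabetes mellitus",
    "diabetes": "diabetes mellitus",
}


def build_normalizer(alias_map):
    """Return a function that rewrites known aliases to canonical names."""
    # Longest aliases first, so "sugar diabetes" is preferred over "diabetes".
    aliases = sorted(alias_map, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(a) for a in aliases) + r")\b"
    )

    def normalize(question):
        # A single regex pass: replaced text is never rescanned,
        # so canonical names are not rewritten a second time.
        return pattern.sub(lambda m: alias_map[m.group(1)], question.lower())

    return normalize


normalize = build_normalizer(ALIAS_TO_CANONICAL)
```

After normalization, two questions that differ only in the alias they use (for example `"HTN symptoms"` and `"high blood pressure symptoms"`) map to the same string, which a downstream matching model or retrieval step could then compare directly.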