With the rapid development of artificial intelligence and natural language processing, intelligent customer service systems have attracted increasing attention in both academia and industry. Among them, systems based on frequently asked questions (FAQ) are widely used in commercial services because they are simple, efficient, and accurate. The short text classification technology on which FAQ-based customer service systems rely has made great progress, particularly when large-scale, balanced training data is available. In practice, however, FAQ data is often imbalanced: some standard questions have many extended questions, while others have only a few or none. As a result, traditional intelligent customer service systems, whose short text classifiers assume large-scale balanced data, often perform poorly in real deployments.

This thesis proposes a short text classification technique based on common and differential transfer learning. It identifies the common and differential features between large samples and small samples and uses them to generate virtual small samples. This alleviates data imbalance, improves short text classification accuracy, and ultimately improves the performance of the intelligent customer service system. First, a text similarity measure is used to find large samples that can be transferred to each small sample. Next, a sample generator based on word templates is constructed to obtain base virtual samples. Then, with small samples and their similar large samples as input and the base virtual samples as the target output, an automatic virtual sample generator based on the encoder-decoder framework is trained and used to generate large numbers of virtual samples for many more small samples. Finally, the virtual samples are added to the training corpus as extended questions of the small samples, and the final short text classifier is trained on this corpus.

Experiments on English and Chinese datasets show that the proposed method improves short text classification performance, especially for small samples: classification accuracy on small samples increased from 7.46% to 59.34% in English and from 1.96% to 42.67% in Chinese. In addition, this thesis studies and compares the quality of virtual samples generated under different hypotheses and their impact on final classification performance, verifying the effectiveness and robustness of the method from multiple angles. Based on this research, this thesis implements an FAQ-based intelligent customer service system. Tests show that, through transfer learning, the system provides more accurate answers to small-sample questions.
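To make the first two steps of the pipeline (finding a transferable large sample, then building base virtual samples from a word template) more concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the thesis's implementation: the similarity measure is assumed to be bag-of-words cosine similarity, and the "word template" is assumed to be a one-to-one substitution of the words that differ between the two standard questions, applied to the large sample's extended questions. All function names (`cosine`, `find_transfer_source`, `word_template_samples`, `augment`) and the example FAQ data are hypothetical. The encoder-decoder generator and the final classifier are not covered.

```python
import math
from collections import Counter


def cosine(a: str, b: str) -> float:
    """Bag-of-words cosine similarity (assumed measure; the thesis does not specify one here)."""
    ca, cb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(ca[w] * cb[w] for w in ca)
    norm = math.sqrt(sum(v * v for v in ca.values())) * math.sqrt(sum(v * v for v in cb.values()))
    return dot / norm if norm else 0.0


def find_transfer_source(small_q: str, large_faq: dict[str, list[str]]) -> str:
    """Pick the large-sample standard question most similar to the small-sample question."""
    return max(large_faq, key=lambda std_q: cosine(small_q, std_q))


def word_template_samples(small_q: str, large_q: str, large_extended: list[str]) -> list[str]:
    """Hypothetical word-template generator: common words are kept, and the
    differential words of the large question are swapped for those of the small one."""
    small_toks, large_toks = small_q.lower().split(), large_q.lower().split()
    diff_large = [w for w in large_toks if w not in small_toks]
    diff_small = [w for w in small_toks if w not in large_toks]
    if not diff_large or len(diff_large) != len(diff_small):
        return []  # no clean one-to-one template for this pair; skip it
    mapping = dict(zip(diff_large, diff_small))
    virtual = []
    for ext in large_extended:
        toks = ext.lower().split()
        # Only keep extended questions that actually contain a differential word.
        if any(t in mapping for t in toks):
            virtual.append(" ".join(mapping.get(t, t) for t in toks))
    return virtual


def augment(small_q: str, large_faq: dict[str, list[str]]) -> list[str]:
    """Base virtual samples for one small-sample question."""
    source = find_transfer_source(small_q, large_faq)
    return word_template_samples(small_q, source, large_faq[source])


# Hypothetical FAQ data: standard question -> extended questions.
faq = {
    "how do i reset my password": [
        "i forgot my password how to reset it",
        "reset password steps",
        "cannot log in need to reset password",
    ],
    "how do i cancel my order": ["cancel order please", "i want to cancel my order"],
}

if __name__ == "__main__":
    print(augment("how do i reset my pin", faq))
```

Under these assumptions, the small-sample question "how do i reset my pin" is matched to the password-reset class (cosine 5/6 versus 4/6), and each password extended question becomes a pin extended question. In the full method, such base virtual samples would serve as target outputs for training the encoder-decoder generator rather than being used directly.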