A Chinese input method relies on an internal input method engine (IME) that analyzes and transforms the text entered by users and presents candidate options for selection. Common Chinese IME tasks include Pinyin-to-character conversion (P2C) and automatic completion of whole sentences (ACWS); P2C can be further divided into complete P2C and abbreviated P2C. The goal of P2C is to convert the user's input Pinyin sequence into the corresponding Chinese character string and recommend it to the user, while ACWS predicts and recommends candidate sentences based on the first part of the user's input. The development of deep learning has advanced IME research, but in previous work, neural-network-based P2C and ACWS models depended heavily on the composition of the training dataset, and a trained model could not maintain high performance when applied to different users and domains. To address this problem, this thesis proposes a method that dynamically stores and updates representation vectors and exploits the user's historical input to improve the efficiency and adaptability of neural network models. Two representation-vector-based algorithms are designed, for P2C and for ACWS respectively, to improve the models' adaptability to different data. The details are as follows:

(1) The representation-vector-based adaptive P2C algorithm uses a pre-trained Transformer model to generate representation vectors carrying semantic information for the Pinyin and Chinese characters in the training set, and stores them in a datastore. During actual use, the current input is encoded and the most similar stored vectors are retrieved; the retrieved results are normalized into a probability distribution and weighted into the output of the Transformer model to form the final distribution, from which candidates are recommended to the user with a beam search algorithm. Finally, the text the user confirms is converted into representation vectors and added to the datastore, achieving adaptation to the user. Experimental results on four domain datasets with different styles confirm that an IME built with this algorithm can track user behavior effectively without further training of the network model, shows strong domain adaptability, and outperforms traditional Pinyin conversion frameworks and commercial IMEs on multiple metrics.

(2) The representation-vector-based adaptive ACWS algorithm uses a pre-trained GPT model to convert the sentences in the training set into representation vectors and stores them. When the user enters the first half of a sentence, the IME feeds it to the GPT model to generate the corresponding representation vector, which is compared with the stored vectors through similarity retrieval. The retrieved results are normalized into a probability distribution and weighted into the output of the GPT model to form the final distribution, from which candidate sentences are recommended to the user through the language model's autoregressive decoding and beam search. Finally, the text the user confirms is converted into representation vectors and added to the datastore, enhancing the adaptability of the neural network model. Experimental results on four domain datasets with different styles show that the representation-vector-based ACWS algorithm can adapt to user behavior effectively while maintaining the performance of the neural network model and improving the user experience.
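A minimal sketch of the retrieve-normalize-interpolate step shared by both algorithms is given below. It assumes a datastore of (representation vector, token id) pairs, cosine-similarity retrieval, and a fixed interpolation weight lam; the class and function names and these specific choices are illustrative assumptions for exposition, not the exact design of the thesis.

```python
import numpy as np

class VectorDatastore:
    """Stores (representation vector, token id) pairs and supports
    similarity retrieval and incremental updates from confirmed input."""

    def __init__(self, dim):
        self.keys = np.empty((0, dim), dtype=np.float32)  # representation vectors
        self.values = np.empty((0,), dtype=np.int64)      # associated token ids

    def add(self, vectors, token_ids):
        # Called after the user confirms an input: store the new
        # representation vectors so the IME adapts without retraining.
        self.keys = np.vstack([self.keys, np.asarray(vectors, dtype=np.float32)])
        self.values = np.concatenate([self.values, np.asarray(token_ids, dtype=np.int64)])

    def retrieve(self, query, k=8):
        # Cosine similarity between the query vector and all stored keys.
        if self.keys.shape[0] == 0:
            return np.empty(0), np.empty(0, dtype=np.int64)
        keys = self.keys / (np.linalg.norm(self.keys, axis=1, keepdims=True) + 1e-8)
        q = query / (np.linalg.norm(query) + 1e-8)
        sims = keys @ q
        top = np.argsort(-sims)[:k]
        return sims[top], self.values[top]


def interpolate(model_probs, datastore, query, vocab_size, k=8, lam=0.3):
    """Normalize the retrieved similarities into a probability distribution
    and weight it into the neural model's output distribution.
    (lam is an assumed fixed interpolation weight; the thesis may tune
    or learn this weighting differently.)"""
    sims, token_ids = datastore.retrieve(query, k)
    if sims.size == 0:
        return model_probs
    weights = np.exp(sims - sims.max())
    weights /= weights.sum()                 # softmax over retrieved neighbours
    retrieval_probs = np.zeros(vocab_size)
    for w, t in zip(weights, token_ids):
        retrieval_probs[t] += w              # aggregate neighbours sharing a token
    return (1 - lam) * model_probs + lam * retrieval_probs
```

In this sketch, the distribution returned by interpolate would be consumed by beam search (for P2C) or by autoregressive decoding with beam search (for ACWS), and VectorDatastore.add would be called on the user's confirmed text to realize the dynamic storage and update of representation vectors described above.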