The goal of lexical simplification (LS) is to reduce the complexity of a text by replacing its complex words with simpler alternatives. This makes the text easier to understand for children, non-native speakers, and people with reading disabilities. Existing approaches focus on how to generate simpler alternatives for complex words without considering the user’s knowledge level. Because excessive simplification can cause semantic loss, an ideal simplification method should generate alternatives according to the user’s knowledge level. In addition, most lexical simplification research to date has targeted English, and studies of Chinese lexical simplification are scarce. We therefore study lexical simplification that accounts for the knowledge level of the target audience, as well as Chinese lexical simplification. In this paper, we propose an English lexical simplification scheme based on a neural language model that incorporates the target audience’s level of language knowledge. We also construct a benchmark corpus, ChineseLS, that enables automatic evaluation of Chinese lexical simplification schemes, and we design a web application that supports text annotation during the text simplification process. The main contributions of this paper are as follows:

(1) An English lexical simplification method based on the target audience’s level. If the target audience’s level is ignored, pursuing a text made up only of simple words can cause excessive loss of semantic information. Using information about users’ English vocabulary levels, we propose a lexical simplification method that combines the pre-trained language model BERT with existing lexical simplification methods. Compared with existing methods, it achieves a better balance among simplicity, fluency, and preservation of semantic information.

(2) A Chinese lexical simplification method based on pre-trained language models. First, to address the difficulty of obtaining annotations and of automatically evaluating lexical simplification methods, we manually create a Chinese lexical simplification benchmark corpus. Then, a pre-trained language model and a hybrid approach built on it are used to generate substitute candidates for complex words and to filter them, achieving Chinese lexical simplification. Finally, the experiments compare three types of substitute generation methods, based respectively on synonyms, word embeddings, and sememes (sense elements), and discuss the advantages and disadvantages of each.

(3) A system for constructing, annotating, and evaluating text simplification corpora. To better construct evaluation corpora for text simplification and to evaluate existing methods, we implemented an open-source web application for manually creating and evaluating parallel text simplification corpora. The system can be used for 1) sentence alignment, 2) rating aligned pairs (e.g., grammaticality, meaning preservation), 3) annotating the simplification transformations in aligned pairs (e.g., lexical substitution, sentence splitting), and 4) manually simplifying complex documents.
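The idea behind contribution (1) is that a complex word should be simplified only when it lies above the reader's level, and only into a substitute the reader is expected to know. The sketch below illustrates that selection step under stated assumptions. It is not the method proposed in this paper. Candidate generation (for example, BERT masked-word prediction) is assumed to have already run. The `Candidate` type, the `word_level` lexicon, the integer difficulty scale, and the linear `alpha` weighting of fluency against similarity are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Candidate:
    word: str
    lm_score: float    # fluency score, e.g. a BERT masked-LM probability (assumed precomputed)
    similarity: float  # semantic similarity to the complex word (assumed precomputed)


def simplify_word(complex_word: str, candidates: List[Candidate],
                  word_level: Dict[str, int], user_level: int,
                  alpha: float = 0.5) -> str:
    """Pick a substitute the user is expected to know, or keep the original word.

    word_level maps words to a hypothetical difficulty level; words missing
    from the lexicon are treated as arbitrarily difficult.
    """
    unknown = float("inf")
    # Leave words within the user's level unchanged to avoid needless semantic loss.
    if word_level.get(complex_word, unknown) <= user_level:
        return complex_word
    # Keep only candidates at or below the user's level.
    known = [c for c in candidates if word_level.get(c.word, unknown) <= user_level]
    if not known:
        return complex_word  # no suitable substitute: keep the meaning intact
    # Hypothetical linear trade-off between fluency and meaning preservation.
    best = max(known, key=lambda c: alpha * c.lm_score + (1 - alpha) * c.similarity)
    return best.word


# Toy usage with made-up levels and scores.
levels = {"ameliorate": 6, "enhance": 4, "improve": 2, "better": 1}
cands = [Candidate("improve", 0.40, 0.85),
         Candidate("enhance", 0.35, 0.80),
         Candidate("better", 0.10, 0.60)]
print(simplify_word("ameliorate", cands, levels, user_level=3))  # improve
print(simplify_word("ameliorate", cands, levels, user_level=1))  # better
print(simplify_word("ameliorate", cands, levels, user_level=6))  # ameliorate
```

In this toy example, the same complex word is handled differently for different readers. A reader at level 6 keeps "ameliorate". A reader at level 3 gets "improve". A reader at level 1 gets the cruder "better". This shows how tying substitution to the reader's level limits over-simplification.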