As the query language of databases, SQL plays a central role in querying and managing data. Natural language to SQL (NL2SQL) is a technology that lowers the cost of learning specialized database knowledge and improves query efficiency. Moreover, applying this technology in a question answering (QA) system lets users pose questions directly against a database, extending what the QA system can do. With the development of deep learning and the emergence of large-scale annotated datasets, NL2SQL technology has made great progress. However, most existing techniques are designed for English datasets, and transferring English models directly to the Chinese domain is not effective. In addition, current open-source Chinese models are relatively coarse in semantic extraction and feature fusion: they do not pay enough attention to structural information, which leads to poor performance. Building on deep learning models, this paper improves the network structure of an existing model, explores the influence of semantic matching and data enhancement methods on the Chinese NL2SQL task, and finally implements a QA system based on Chinese NL2SQL. The main research work of this paper is as follows.

First, this paper introduces the technologies and models related to NL2SQL, describing both the Chinese and English scenarios in detail and pointing out their respective characteristics and problems.

Second, the CX-SQL model is proposed to address the problems of existing Chinese open-source models. CX-SQL is an improved version of the X-SQL model. S-num (the number of SELECT columns) and W-CONN (the connector between WHERE conditions) subtasks are added to the overall structure to adapt it to the Chinese dataset. In the natural language encoding stage, multiple flag bits are added to distinguish field attributes and to represent a wider range of semantics. In the information enhancement stage, an attention mechanism fuses global information with flag-bit information to fully extract structural information. Data preprocessing and postprocessing are added at the model's input and output stages to handle the complex semantics of Chinese data. Experimental results show that, compared with the best open-source model, CX-SQL improves logical form accuracy and execution accuracy on the TableQA validation set by 2.2% and 0.8%, respectively, and on the test set by 2.5% and 0.9%, respectively.

Third, building on CX-SQL, this paper explores the influence of semantic matching and data enhancement. To address inaccurate condition value prediction in CX-SQL, condition value prediction is separated from the rest of the model, and the candidate condition value is selected by a semantic matching method. Experimental results show that, compared with the original model, CX-SQL with semantic matching improves logical form accuracy and execution accuracy on TableQA by 0.7% and 0.2% on the validation set and by 0.5% and 1.7% on the test set, while condition value prediction accuracy increases by 1.4%. For the TableQA dataset itself, this paper proposes enhancing the data in terms of both scale and quality: seven datasets of different scales and two datasets with different quality distributions are derived from TableQA, and the influence of this data enhancement on the CX-SQL model is then explored.

Finally, to put NL2SQL technology into practice in a QA system, this paper designs and implements a Chinese NL2SQL QA system based on the stated requirements.
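The multi-flag encoding step in CX-SQL can be illustrated with a minimal sketch. The abstract does not specify the flag vocabulary, so this example assumes one flag per TableQA column type (`text` or `real`), written as the hypothetical tokens `[TEXT]` and `[REAL]` and placed before each column name in an X-SQL-style `[CLS] question [SEP] column [SEP] ...` sequence. The names `Column`, `TYPE_FLAGS`, and `build_input` are illustrative rather than taken from CX-SQL, and tokens are single characters for simplicity.

```python
from dataclasses import dataclass

# Hypothetical flag tokens: the abstract does not name CX-SQL's actual flags.
TYPE_FLAGS = {"text": "[TEXT]", "real": "[REAL]"}


@dataclass
class Column:
    name: str
    col_type: str  # assumed to be "text" or "real", as in TableQA table schemas


def build_input(question: str, columns: list[Column]) -> tuple[list[str], list[int]]:
    """Build a character-level token sequence:
    [CLS] question [SEP] FLAG col_1 [SEP] FLAG col_2 [SEP] ...
    Also return the index of each column's flag token, so the encoder output
    at those positions can later be used as per-column features."""
    tokens = ["[CLS]", *question, "[SEP]"]
    flag_positions = []
    for col in columns:
        flag_positions.append(len(tokens))  # index where this column's flag goes
        tokens.append(TYPE_FLAGS[col.col_type])
        tokens.extend(col.name)  # one token per character
        tokens.append("[SEP]")
    return tokens, flag_positions


tokens, flags = build_input("销量大于100的车型", [Column("车型", "text"), Column("销量", "real")])
print(flags)  # [12, 16]
```

Recording the flag positions is the design point here: the flag tokens act as per-column anchors whose encoder outputs a later layer can read, which matches the idea of using flag bits to carry field-attribute semantics.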
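The information-enhancement step, in which an attention mechanism fuses global information with flag-bit information, can be sketched in plain Python. The exact form of the fusion is an assumption, not CX-SQL's published layer: this sketch treats the encoder's `[CLS]` output as the global query, the outputs at the flag positions as keys and values, and adds the attended context back to the global vector. A real model would add learned projections and operate on tensors; `fuse_global_with_flags` is a hypothetical name.

```python
import math


def softmax(xs: list[float]) -> list[float]:
    # Subtract the max for numerical stability.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_global_with_flags(global_vec: list[float], flag_vecs: list[list[float]]):
    """Scaled dot-product attention: the global ([CLS]) vector is the query and
    the flag-token vectors serve as both keys and values. The attended context
    is added back to the global vector (a residual-style fusion)."""
    d = len(global_vec)
    scores = [sum(g * f for g, f in zip(global_vec, fv)) / math.sqrt(d) for fv in flag_vecs]
    weights = softmax(scores)
    # Weighted sum of the flag vectors, one output dimension at a time.
    context = [sum(w * fv[i] for w, fv in zip(weights, flag_vecs)) for i in range(d)]
    fused = [g + c for g, c in zip(global_vec, context)]
    return fused, weights


fused, weights = fuse_global_with_flags([0.0, 0.0], [[1.0, 2.0], [3.0, 4.0]])
print(weights)  # [0.5, 0.5]: a zero query attends uniformly
print(fused)    # [2.0, 3.0]
```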
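Separating condition value prediction from the model turns it into a ranking problem: enumerate candidate values (assumed here to be the cell values of the predicted condition column) and choose the one that best matches the question. The abstract does not describe the architecture of the semantic matching model, so this sketch substitutes a simple character-bigram overlap score for it. All function names are hypothetical.

```python
from typing import Optional


def char_ngrams(text: str, n: int = 2) -> set[str]:
    """Character n-grams; strings shorter than n fall back to the whole string."""
    if len(text) < n:
        return {text} if text else set()
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def match_score(question: str, candidate: str) -> float:
    """Fraction of the candidate's bigrams that also occur in the question.
    Stand-in for a learned semantic matching model (assumption)."""
    cand = char_ngrams(candidate)
    if not cand:
        return 0.0
    return len(cand & char_ngrams(question)) / len(cand)


def select_condition_value(question: str, candidates: list[str]) -> tuple[Optional[str], float]:
    """Pick the candidate value that best matches the question.
    Ties prefer longer candidates, i.e. more specific matches."""
    if not candidates:
        return None, 0.0
    best = max(candidates, key=lambda c: (match_score(question, c), len(c)))
    return best, match_score(question, best)


print(select_condition_value("上海的门店有哪些", ["北京", "上海", "广州"]))  # ('上海', 1.0)
```

String overlap cannot recognize paraphrases or abbreviations such as 沪 for 上海. A learned semantic matching model would replace `match_score` in that role, while the surrounding candidate-ranking logic stays the same.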