| With the development of the information age, a large amount of personal and corporate digital data is generated every day and is usually stored in databases, where the barrier to use is high. The goal of the Natural Language to SQL (NL2SQL) task is to convert natural language questions into corresponding SQL statements that can be executed against the target database, which improves the efficiency of information retrieval and lowers the barrier to use. Existing NL2SQL models mainly target English text, while Chinese text raises additional problems such as column name reuse, inconsistent text representation, and pronoun coreference. This thesis takes the Chinese NL2SQL task as its research object and constructs several NL2SQL models that generate SQL queries from Chinese natural language, aiming to improve the accuracy of SQL query generation in two query scenarios: single-turn and multi-turn. The research work and innovations of the thesis are as follows: (1) For the Chinese NL2SQL task in single-turn, cross-domain scenarios, a single-turn NL2SQL model based on a relational graph attention network (RGAT-SQL) is proposed. RGAT-SQL consists of a context encoder, a question-schema interaction graph, a relation-aware graph encoder, and a decoder. Cross-domain generalization is addressed by constructing question-schema interaction graphs. The problem of oversized interaction graphs is addressed by a graph pruning module that prunes the question-schema interaction graph according to the correlation between nodes. The relational graph attention network incorporates the edge information of the interaction graph into the model to address the alignment between natural language questions and database schemas. RGAT-SQL achieves an exact matching rate of 66.2% on the CSpider dataset, a 1.7% improvement over the best baseline model, which validates the model. (2) For the Chinese NL2SQL task in multi-turn, cross-domain scenarios, a multi-turn NL2SQL model based on a historical information augmented network (HIAM-SQL) is proposed. HIAM-SQL consists of a multimodal encoder, a historical information augmentation network, and a decoder. The long-term dependency problem is addressed by introducing context information from the interaction history utterances and the last predicted SQL query. The multimodal encoder encodes natural language and SQL queries separately, and semantic and context information are incorporated into the word vectors to enhance the representation of structured data. A context-dependent schema linking graph is constructed to represent the relationships among the current utterance, the interaction history utterances, the last predicted SQL query, and the corresponding database schema. HIAM-SQL achieves 46.4% question matching accuracy and 24.2% interaction matching accuracy on the Chase dataset, which are 5% and 4.2% higher than the best baseline model, respectively, verifying the effectiveness of the HIAM-SQL model. |
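
The two multi-turn metrics reported above can be illustrated with a minimal sketch. It assumes the usual reading of these metrics: question matching accuracy is the share of individual questions whose predicted SQL matches the gold SQL, and interaction matching accuracy is the share of interactions in which every question is matched. The function names `normalize` and `matching_accuracy` are hypothetical, and the string comparison used here is a simplification; benchmark evaluation scripts typically compare parsed SQL components rather than raw strings.

```python
from typing import List, Tuple


def normalize(sql: str) -> str:
    """Crude normalization (an assumption): lowercase and collapse whitespace."""
    return " ".join(sql.lower().split())


def matching_accuracy(
    gold: List[List[str]], pred: List[List[str]]
) -> Tuple[float, float]:
    """Return (question_match, interaction_match).

    Each element of `gold` / `pred` is one interaction: a list of SQL
    strings, one per turn. A turn counts as correct when the normalized
    strings are equal; an interaction counts as correct only when all
    of its turns are correct.
    """
    if not gold or len(gold) != len(pred):
        raise ValueError("gold and pred must be non-empty and aligned")

    total_turns = correct_turns = correct_interactions = 0
    for gold_turns, pred_turns in zip(gold, pred):
        if not gold_turns or len(gold_turns) != len(pred_turns):
            raise ValueError("each interaction must be non-empty and aligned")
        hits = [normalize(g) == normalize(p) for g, p in zip(gold_turns, pred_turns)]
        total_turns += len(hits)
        correct_turns += sum(hits)
        correct_interactions += all(hits)

    return correct_turns / total_turns, correct_interactions / len(gold)


# Toy data, invented purely for illustration.
gold = [
    ["SELECT name FROM singer", "SELECT name FROM singer WHERE age > 30"],
    ["SELECT count(*) FROM song"],
]
pred = [
    ["select name from singer", "SELECT name FROM singer WHERE age > 20"],
    ["SELECT count(*) FROM song"],
]
question_match, interaction_match = matching_accuracy(gold, pred)
```

On this toy data, two of the three questions match but only one of the two interactions is fully correct, which shows why interaction matching accuracy is the stricter of the two figures.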