| Artificial intelligence is a discipline that studies how to endow computers with human-like intelligence, including image processing, speech processing, natural language processing and other technologies. Among them, natural language processing bridges communication between humans and machines. Enabling humans and machines to interact through language in a natural and immersive manner has been one of the key long-standing tasks in the field of artificial intelligence. Scientists are committed to building intelligent dialogue systems or social chatbots such as Apple Siri, Google Now, Microsoft Cortana, and Amazon Alexa. A fundamental problem for dialogue systems is to understand the semantics of the conversation context so that they can predict the next response reasonably and naturally. One of the main approaches is to select the most reasonable response from a given set of candidates; systems built this way are known as retrieval-based dialogue systems. Retrieval-based dialogue systems can be divided into multiple sub-tasks, such as personalized conversations, knowledge-grounded conversations and multi-party conversations, depending on the application scenario. In addition, with the rise of deep learning, the methods for constructing retrieval-based dialogue systems have shifted from traditional rule-based and statistical methods to those based on neural network models. However, current deep retrieval-based dialogue systems still have problems because specific application scenarios are insufficiently considered. For example, it is difficult to effectively capture the complex semantic matching information between conversation contexts and response candidates, there is no explicit long-term memory of an interlocutor's consistent personality, the background knowledge that a dialogue is grounded on is lacking, and it is difficult to model the complex interactions between utterances and interlocutors in multi-party conversations, all of which degrade
the performance of response selection. Therefore, this thesis focuses on deep retrieval-based dialogue systems and studies multi-turn conversations, personalized conversations, knowledge-grounded conversations and multi-party conversations in turn. Specifically: First, this thesis studies multi-turn conversation in retrieval-based dialogue systems. To address the lack of utterance-level semantic matching between conversation contexts and response candidates in existing work, this thesis proposes a fine-grained utterance-to-utterance interactive matching network for response selection. This model selects the information in conversation contexts that is most relevant to response candidates. In addition, a response is decomposed into a set of utterances, and prior information about the inter-utterance distance between any two utterances in a conversation context and in a response candidate is incorporated into the matching process. Furthermore, this thesis proposes a pre-training method for response selection that integrates speaker representation and domain adaptation into pre-trained language models. This method reflects the dialogue property of speaker alternation in pre-trained language models and improves their representation ability for dialogue. This work improves the recall of response selection on four public multi-turn response selection datasets and achieved state-of-the-art performance at the time. Secondly, this thesis studies interlocutor persona-based personalized dialogue systems. As existing research focuses only on the self persona of a respondent and ignores the partner persona, four persona fusion strategies for personalized response selection are proposed, distinguished by whether the interaction between personas and contexts and that between personas and responses are considered. These strategies are implemented in three different models to examine the effect of self and partner personas on personalized response selection on a public interlocutor persona-based
response selection dataset. In addition, to alleviate the cold-start problem in personalized dialogue systems caused by the lack of pre-defined interlocutor personas, a speaker persona detection method is proposed that searches for approximate personas based on early conversation contexts. A dataset for speaker persona detection is constructed and used to verify the effectiveness of retrieving approximate interlocutor personas by fine-grained matching when interlocutor personas are not pre-specified. Thirdly, this thesis studies knowledge-grounded conversation in retrieval-based dialogue systems. To directly model the semantic matching relationship between background knowledge and response candidates, a dually interactive matching network for response selection is proposed. It simultaneously conducts interactive matching of semantic representations between conversation contexts and response candidates and between background knowledge and response candidates. Since independence and redundancy exist between conversation contexts and background knowledge, and a single round of interaction can capture only shallow matching features, a method of filtering before iteratively referring is proposed for knowledge-grounded response selection. The method first establishes an interaction perception mechanism between conversation contexts and background knowledge and selects relevant background knowledge effectively. Then, iterative interactive matching is conducted to obtain deep matching information between conversation contexts and response candidates, as well as between background knowledge and response candidates. This work improves the recall of response selection on two public knowledge-grounded response selection datasets and achieved state-of-the-art performance at the time. Finally, this thesis studies multi-party conversation in retrieval-based dialogue systems. A multi-party conversation always contains complicated interactions between
utterances, between interlocutors, and between an interlocutor and an utterance. Meanwhile, the tasks of speaker identification, addressee recognition and response selection in multi-party conversations are complementary to one another. Therefore, this thesis proposes a method for modeling multi-party conversations based on multi-task self-supervised learning. Multiple self-supervised learning tasks are designed to tackle the core issue of "who says what to whom" in multi-party conversations. In this way, models can compute better interlocutor representations and utterance representations that contain richer semantics. Furthermore, this approach deepens the understanding of multi-party conversations and enhances generalization across multiple downstream tasks. This work improves the accuracy and recall of speaker identification, response selection and addressee recognition on two public multi-party conversation datasets and achieved state-of-the-art performance at the time. |
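
As a rough illustration of the retrieval-based setting summarized above, the following sketch ranks a set of response candidates against a conversation context and computes recall at k, the kind of metric the abstract reports. It is not the thesis's model: the function names (`rank_candidates`, `recall_at_k`, `overlap_score`) are hypothetical, and the word-overlap scorer is only an assumed stand-in for a learned neural matching network.

```python
from typing import Callable, List, Sequence, Set


def rank_candidates(context: Sequence[str], candidates: Sequence[str],
                    score: Callable[[Sequence[str], str], float]) -> List[int]:
    """Return candidate indices sorted from highest to lowest matching score."""
    scores = [score(context, c) for c in candidates]
    return sorted(range(len(candidates)), key=lambda i: scores[i], reverse=True)


def recall_at_k(ranking: Sequence[int], positives: Set[int], k: int) -> float:
    """Fraction of the correct responses that appear among the top-k candidates."""
    if not positives:
        raise ValueError("at least one correct response is required")
    hits = sum(1 for i in ranking[:k] if i in positives)
    return hits / len(positives)


def overlap_score(context: Sequence[str], response: str) -> float:
    """Toy scorer (assumption): share of response words that also occur in the context."""
    context_words = {w for utterance in context for w in utterance.lower().split()}
    response_words = set(response.lower().split())
    return len(context_words & response_words) / max(len(response_words), 1)


# Hypothetical example data: a two-turn context and three candidates,
# of which only the candidate at index 1 is the correct response.
context = ["my laptop will not boot", "did you try holding the power button"]
candidates = [
    "i like pizza",
    "yes i tried the power button and it still will not boot",
    "see you tomorrow",
]
ranking = rank_candidates(context, candidates, overlap_score)
top1_recall = recall_at_k(ranking, {1}, k=1)
```

In a deep retrieval-based system, `overlap_score` would be replaced by a trained matching model; the ranking and recall computation stay the same.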