Research On Knowledge Mining With Representation Learning

Posted on: 2018-10-27
Degree: Doctor
Type: Dissertation
Country: China
Candidate: M Fan
GTID: 1368330566987971
Subject: Computer Science and Technology
Abstract/Summary:
Faced with the explosive growth of text on the Web, people have become increasingly aware of the importance of automatic knowledge mining by computer. Throughout this thesis, knowledge refers to triplets of the form <head_entity, relation, tail_entity>, which conventionally express facts or beliefs about the real world. Research on knowledge mining mainly covers two tasks: extracting structured knowledge from unstructured or semi-structured free text, and using existing structured knowledge to infer undiscovered knowledge. Knowledge extraction and inference methods can help people automatically extract the core information from massive amounts of natural language text and store it as triplets. Structured knowledge therefore has broad prospects, both for information storage and representation and for enhancing many kinds of applications. For instance, productized knowledge graphs have greatly improved the user experience and platform performance of many Internet applications, such as search engines, QA systems, and even recommender systems. However, most large-scale knowledge graphs are built by crowdsourcing: they rely on the huge number of users on their platforms and obtain knowledge directly from massive manual editing. Research on automatic knowledge extraction and inference is limited by three problems: expensive annotated data, sparse text features, and heterogeneous data that is difficult to consolidate. Recent studies on representation learning suggest that some of its characteristics may help solve these three problems. This thesis therefore explores a series of representation learning methods for knowledge extraction and inference that aim at mining new knowledge automatically. The main work and contributions are:

1. Free text information extraction based on low-rank matrix representation learning. For unstructured and semi-structured text, we leverage the distant supervision assumption to carry out entity linking and to automatically construct weakly labeled training samples for relation extraction, and we propose a low-rank matrix completion method to extract relations between entities in text. The highlight of this method is that it uses the transductive learning method of low-rank matrix completion to reconstruct the large-scale sparse text features and to handle the noise caused by distantly supervised annotation, while the distant supervision paradigm automatically supplies a large amount of weakly annotated data. The principal components of the relationships between entities in text, represented by a low-rank matrix, are learned simultaneously with the noise stripping, which improves the performance of relation prediction under distant supervision.

2. Knowledge representation learning and belief inference based on low-dimensional vectors. In structured knowledge bases, low-dimensional vectors are used to represent the entities and relations and to reason over them. We propose geometrical and probabilistic models for belief inference on complete and imperfect knowledge bases, respectively. We also analyse the effect of multi-relational beliefs, which are widespread in knowledge bases, and take it into account by adapting the learning rate to further improve performance. This method compresses the representation of knowledge from the matrix level to the vector level; it not only makes incremental computation possible but also gives belief inference a mathematical explanation.

3. Joint low-dimensional representation learning of knowledge bases and free text. Using entity descriptions or relation mentions as links, we learn low-dimensional vector representations of words, entities, and relations, which bridges the gap between the two heterogeneous kinds of data, free text and knowledge bases. The proposed series of approaches not only captures the semantics of entities, relations, and words but also enhances the performance of tasks such as belief inference, entity classification, and relation prediction. Moreover, it sheds light on handling large amounts of heterogeneous data for Web applications such as search engines and QA systems.
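As an illustration of the second contribution, the sketch below shows one common way to represent triplets with low-dimensional vectors and to score how plausible a belief is. The abstract does not specify the thesis's geometrical model, so everything here is an assumption: the translation-style score ||h + r - t||^2, the margin-based update with corrupted tails, the function names `score`, `residual`, and `train`, the hyperparameters, and the toy triplets are all hypothetical and are not the author's method.

```python
import random


def residual(ent, rel, triplet):
    """Return the vector h + r - t for a <head, relation, tail> triplet."""
    h, r, t = triplet
    return [a + b - c for a, b, c in zip(ent[h], rel[r], ent[t])]


def score(ent, rel, triplet):
    """Squared L2 norm of h + r - t; a lower score means a more plausible belief."""
    return sum(x * x for x in residual(ent, rel, triplet))


def train(triplets, dim=8, epochs=200, lr=0.01, margin=1.0, seed=0):
    """Learn entity and relation vectors with a margin-based ranking loss (assumed setup)."""
    rng = random.Random(seed)
    entities = sorted({x for h, _, t in triplets for x in (h, t)})
    relations = sorted({r for _, r, _ in triplets})
    ent = {e: [rng.uniform(-0.5, 0.5) for _ in range(dim)] for e in entities}
    rel = {r: [rng.uniform(-0.5, 0.5) for _ in range(dim)] for r in relations}
    known = set(triplets)

    for _ in range(epochs):
        for h, r, t in triplets:
            # Build a negative example by replacing the tail with a random entity.
            t_neg = rng.choice(entities)
            if (h, r, t_neg) in known:
                continue
            pos, neg = (h, r, t), (h, r, t_neg)
            # Hinge loss: update only while the margin is violated.
            if margin + score(ent, rel, pos) - score(ent, rel, neg) <= 0:
                continue
            g_pos = residual(ent, rel, pos)
            g_neg = residual(ent, rel, neg)
            for i in range(dim):
                step_pos = 2 * lr * g_pos[i]
                step_neg = 2 * lr * g_neg[i]
                # Pull the true triplet together, push the corrupted one apart.
                ent[h][i] -= step_pos - step_neg
                rel[r][i] -= step_pos - step_neg
                ent[t][i] += step_pos
                ent[t_neg][i] -= step_neg
    return ent, rel


if __name__ == "__main__":
    # Hypothetical toy knowledge base, for illustration only.
    toy = [
        ("Paris", "capital_of", "France"),
        ("Berlin", "capital_of", "Germany"),
        ("Tokyo", "capital_of", "Japan"),
    ]
    ent, rel = train(toy)
    for candidate in ("France", "Germany", "Japan"):
        print(candidate, score(ent, rel, ("Paris", "capital_of", candidate)))
```

Because each entity and relation is a single vector, adding a new triplet only touches the vectors it mentions, which is one way to read the abstract's remark about incremental computation at the vector level.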
Keywords/Search Tags: Knowledge base, Free text, Representation learning, Relation extraction, Belief inference