Research On Social Network Character Attribute Extraction Method Based On Statistical Learning

Posted on: 2022-07-08
Degree: Master
Type: Thesis
Country: China
Candidate: C Liu
GTID: 2518306524992579
Subject: Master of Engineering
Abstract/Summary:
The development of the Internet has changed many aspects of human life. In social interaction, Social Network Services (SNS) have become one of the most important means of communication. People write, share, and exchange ideas on social networks, generating a large amount of high-value data. As a result, Person Portrait technology, which models people on social networks from the relevant data, has become an important research direction in the Internet field. It plays an essential role in targeted content push, personalized services, recommendation systems, and similar applications. Person attribute extraction is the basis of a person portrait. Its goal is to retrieve person-related documents from the various social network platforms based on known information, and then to extract person-related attributes (for example, birthday and occupation) from these documents.

Existing person attribute extraction methods have two problems. The first is insufficient use of information when associating a person's documents across platforms, which easily leads to mismatches, especially when names are duplicated or information is missing. The second is that the extraction algorithms place high requirements on web page structure, which makes it difficult to handle pages with uncertain structure when extracting person attributes from web documents. In response to these problems, this thesis studies person attribute extraction methods. The main innovations are summarized as follows.

First, for the problem of person document association, a cross-social-media account matching method that integrates multi-modal account characteristics is proposed. The method obtains three modalities of a person's accounts on multiple social media platforms (avatars, profiles, and search rankings), extracts features from each modality, and uses a random forest to match accounts. It applies the idea of ensemble learning to fuse the account modalities, makes full use of account information, and is robust to duplicate names and missing identity information. In comparison experiments, it achieved better results than single-feature matching methods and other multi-feature matching methods.

Second, for the problem of person attribute extraction from web pages, a Tree Conditional Random Fields model based on lexical and syntactic text features is proposed. The method models the extraction of person attributes from web pages as a text sequence labeling problem. The text of the web page is taken as input, and the lexical and syntactic features of its words are extracted to train a Tree Conditional Random Fields model that labels person attributes. The method does not rely on a specific web page structure and requires less training data. With the same training set in comparison experiments, it achieved higher labeling accuracy than other sequence labeling methods.
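To make the sequence labeling formulation above concrete, the sketch below shows how per-token labels could be turned back into attribute values. It does not implement the Tree Conditional Random Fields model itself. The BIO tagging scheme, the label names BIRTHDAY and OCCUPATION, the sample sentence, and the function name decode_bio are all assumptions made for illustration; the abstract does not state which label set the thesis uses.

```python
from typing import List, Tuple


def decode_bio(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Collect (attribute, text) spans from a BIO-tagged token sequence.

    Assumed (hypothetical) scheme: "B-X" opens a span of attribute X,
    "I-X" continues an open span of X, and "O" marks non-attribute tokens.
    """
    if len(tokens) != len(tags):
        raise ValueError("tokens and tags must have the same length")

    spans: List[Tuple[str, str]] = []
    label = None          # attribute name of the span currently open
    words: List[str] = []  # tokens collected for the open span

    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A new span starts; close the previous one if any.
            if label is not None:
                spans.append((label, " ".join(words)))
            label, words = tag[2:], [token]
        elif tag.startswith("I-") and label == tag[2:]:
            # Continuation of the open span.
            words.append(token)
        else:
            # "O", or an I- tag that does not continue the open span:
            # close the open span and skip this token.
            if label is not None:
                spans.append((label, " ".join(words)))
            label, words = None, []

    if label is not None:
        spans.append((label, " ".join(words)))
    return spans


# Hypothetical web page sentence with hand-written labels.
tokens = ["Alice", "was", "born", "on", "3", "May", "1990",
          "and", "works", "as", "a", "nurse"]
tags = ["O", "O", "O", "O", "B-BIRTHDAY", "I-BIRTHDAY", "I-BIRTHDAY",
        "O", "O", "O", "O", "B-OCCUPATION"]

attributes = decode_bio(tokens, tags)
print(attributes)
```

In a full pipeline, the tags would come from the trained labeling model rather than being written by hand; only the decoding step is shown here.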
Keywords/Search Tags: account matching, multi-modal fusion, Tree Conditional Random Fields, person attribute extraction