Multi-Label Classification Method For WMS Metadata Text Based On Semi-Supervised Learning

Posted on: 2020-07-07  Degree: Master  Type: Thesis
Country: China  Candidate: M Zhang
GTID: 2370330590976753  Subject: Cartography and Geographic Information System
Abstract/Summary:
With the development of networked geographic information sharing and Volunteered Geographic Information (VGI), a large number of Web Map Service (WMS) resources covering diverse topics have emerged, providing a wealth of data for geoscience research and applications. However, existing metadata standards lack explicit, fine-grained, domain-oriented mechanisms for describing content, which prevents domain experts and service users from quickly locating resources within a target topic. Service retrieval within a target domain therefore urgently requires multi-label classification of service metadata. WMS metadata, however, varies in length and language, has a complex feature vocabulary, and lacks training data labeled with domain topics, all of which make accurate multi-label classification of WMS metadata text highly challenging.

This paper proposes a multi-label classification method for WMS metadata text based on semi-supervised learning. Relying on only a small amount of labeled training data, the method matches WMS metadata to double-layer, multi-label topics. It comprises three main steps: feature selection, multi-label classification, and secondary topic extraction. First, the Societal Benefit Areas (SBAs) are taken as the coarse-grained domain topics, and knowledge bases are used to extract typical words closely related to the semantics of the SBAs. The distance between each feature and the typical words, computed with the Word2vec algorithm, serves as the measure for selecting an optimal domain feature subset. Second, this paper proposes a base multi-label classification model, ML-CSW, which defines feature weights by the semantic path between a feature and the SBAs in an ontology dictionary and trains the topic prediction model on that basis. Following the theory of semi-supervised learning, Multi-Label K-Nearest Neighbor (ML-KNN) and ML-CSW are then trained collaboratively to perform multi-label classification. Finally, the LDA algorithm is used for secondary topic extraction, building a double-layer multi-label topic catalog on top of the coarse-grained classification results.

To verify the feasibility of the method, this paper takes WMS and layer metadata as the research objects and runs experiments on feature selection accuracy, the accuracy of the collaborative-training base models, and the accuracy, semantic rationality, and applicable scenarios of SML-SWKNN. The results show that the proposed feature selection algorithm effectively improves classification performance. The SML-SWKNN algorithm improves considerably on the classic multi-label classification algorithm and is best suited to long English texts rich in domain information. The multi-label classification and double-layer topic matching results are also semantically reasonable. The method can therefore be applied in geographic information portals or service catalogues to help retrieve WMS resources within a domain topic.
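As an illustration only, the feature selection step described above might be sketched as follows. The abstract does not specify the distance measure or the selection rule, so cosine similarity, the `threshold` value, the function names, and the toy three-dimensional vectors are all assumptions made for this sketch; in practice the vectors would come from a trained Word2vec model rather than being written by hand.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def select_features(candidates, typical_words, vectors, threshold=0.7):
    """Keep candidate features whose best similarity to any typical word
    reaches the threshold. The threshold rule is an assumption, not the
    thesis's documented criterion."""
    selected = {}
    for word in candidates:
        if word not in vectors:
            continue  # skip out-of-vocabulary features
        sims = [
            cosine(vectors[word], vectors[t])
            for t in typical_words
            if t in vectors
        ]
        if sims and max(sims) >= threshold:
            selected[word] = max(sims)
    return selected


# Hypothetical toy embeddings standing in for Word2vec output.
toy_vectors = {
    "disaster": [1.0, 0.0, 0.0],   # typical word for a "Disasters" SBA
    "flood": [0.9, 0.1, 0.0],
    "rainfall": [0.8, 0.3, 0.1],
    "copyright": [0.0, 0.1, 0.9],  # metadata noise, unrelated to the topic
}

selected = select_features(
    candidates=["flood", "rainfall", "copyright"],
    typical_words=["disaster"],
    vectors=toy_vectors,
)
print(sorted(selected))
```

With these toy vectors, "flood" and "rainfall" lie close to the typical word and are kept, while "copyright" is orthogonal to it and is dropped. Taking the maximum similarity over typical words is one possible design choice; an average over the typical words of each SBA would be an equally plausible reading of the abstract.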
Keywords/Search Tags: OGC WMS, Multi-label Classification, Collaborative Training, Word2vec, ML-KNN, LDA