Tourism Text Classification And Tourist Destination Popularity Mining Based On Granular Computing

Posted on: 2021-01-03
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Y X Chi
GTID: 1360330620961958
Subject: Cartography and Geographic Information System
Abstract/Summary:
User-generated content (UGC) is an important data source for tourism geography research. However, no effective approach exists for identifying hidden spatiotemporal patterns within multi-scale unstructured UGC. Therefore, we developed an algorithm named Tourism Text Classification and Tourist Destination Popularity Mining based on Granular Computing (T&T-GrC). T&T-GrC addresses the problems that arise in tourism text mining based on granular computing (GrC): text data granulation, GrC model realization, dataset construction, text correlation feature selection, automatic data granule classification and tourist destination popularity (TDP) calculation. Taking the scenic areas of Jiuzhaigou, Taishan, Huangshan, Pingyaogucheng and Lijianggucheng as cases, this dissertation studies their text features and TDP in detail and makes a comprehensive comparison across the cases. The main work is as follows:

1. Design and implementation of the T&T-GrC model.

The T&T-GrC model introduces GrC to accurately locate and granulate the spatial and temporal information of tourism text. Tourism text data granules are used to represent landscape objects. These granules are unified objects that possess multiple attributes, such as spatial and temporal dimensions. The multiple spatiotemporal scales are characterized by the multi-hierarchical structure of GrC, and transformations of granular layers and data granule size are achieved by scale selection in the spatial and temporal dimensions. As a result, all scales in the spatial and temporal dimensions are related, which makes the data granules of all spatial-spatial, temporal-temporal and spatial-temporal layers comparable. This approach achieves a quantitative description and comparison of the popularity values of granules between adjacent scales and across scales, so the TDP at multiple spatiotemporal scales can be deduced and calculated in a systematic framework. The model consists of the following components:

(1) Tourism text data granulation method. The concept of the "information granule" is introduced into tourism geography, and each landscape object is represented by a tourism text data granule, so that the spatial and temporal information of tourism text can be accurately located and granulated.

(2) Tourism text GrC model based on inclusion degree. The GrC model is realized with inclusion degree theory, so that the TDP can be quantitatively inferred at different granularities under multiple spatiotemporal scales, and the coupling relationships and evolution of TDP between different spatiotemporal scales can be analyzed.

(3) Dataset construction method based on GrC. The effective integration and extensive sharing of tourism geospatial data is an important prerequisite for research on, and application of, such data. A dataset construction approach for the text GrC model is proposed to provide a feasible scheme for reorganizing large-scale unstructured text and constructing public spatiotemporal UGC tourism datasets.

(4) Text correlation feature selection and data granule classification method. First, the T&T-GrC model is used to construct training sets based on text content and spatial position coordinates. Then, a multi-scale tourism text feature selection method based on correlation degree is proposed, which uses the correlation degree between features and data granules to classify and weight features and thereby improve the classification of data granules. Finally, the selected features are fed to an SVM classifier to classify tourism data granules automatically at multiple spatiotemporal scales. The classification results are passed to the dataset construction framework, which allows the dataset to be built and updated automatically and improves the efficiency of TDP calculation in the T&T-GrC model.

(5) TDP calculation method based on GrC. The methods for calculating the TDP of landscape data granules at different spatial scales, and the rules for deducing TDP between adjacent scales and across scales, are described in detail.

2. Case study.

(1) A detailed study of a single case. Taking Jiuzhaigou as a case, this dissertation analyzes in depth the text data classification performance, the visualization of feature correlation degree and the TDP produced by T&T-GrC, and demonstrates the superior performance of T&T-GrC.

(a) Text data granule classification performance. Experiments on the public dataset Reuters-21578 and on the Jiuzhaigou tourism text dataset show that T&T-GrC classifies better than the classical and recent methods in common use.

(b) Landscape visualization based on feature correlation degree. Based on the feature correlation degree within each text granule, the landscape at every scale is visualized as a tag cloud, giving a quantitative, fine-grained and intuitive description of landscape features.

(c) Popularity analysis. The T&T-GrC model achieves a quantitative description and comparison of the popularity values of granules between adjacent scales and across scales. The results obtained with T&T-GrC are broadly consistent with existing research, which confirms the feasibility and effectiveness of the method. In addition, quantitative analysis of the contribution of lower-level data granules to the TDP of the upper-level granule reveals more detailed characteristics of TDP than existing studies and provides quantitative support for the driving forces behind some phenomena.

(2) Comprehensive comparison of multiple cases. The T&T-GrC model makes the text features and TDP of multiple scenic areas comparable at different spatiotemporal scales. The comparison across cases shows that:

(a) Toponymy plays an important role in how tourists identify their own location, and most tourists perceive tourist destinations at the scenic area scale. The popularity of different routes or sub-scenic areas within the same scenic area differs significantly. Most microblog users who attend to the route or sub-scenic area scale describe specific scenic spots directly; they mainly describe single scenic spots and less often mention the name of the route or sub-scenic area. Tourists are more likely to describe their complete tour in natural scenic spots and focus more on well-known scenic spots.

(b) The hot months of TDP fall mainly in summer and autumn, and popularity peaks usually occur in February, April, August and October. Natural scenic spots are affected by climate: their hot months are concentrated in the "warm spring-golden autumn" season, when the climate is suitable. The peaks of hydrologic and mountain scenic areas occur in October and April respectively. Cultural scenic areas are less affected by climate, and their hot months span a longer period. Monthly popularity peaks are usually associated with tourism incentives or holidays.

(c) There is no obvious "weekend effect" in the TDP of Jiuzhaigou and Lijianggucheng, while there is an obvious "weekend effect" in the TDP of Taishan, Huangshan and Pingyaogucheng.

(d) The daily variation patterns of the scenic areas show both common features and individual characteristics. The popularity of all scenic areas shows obvious monthly and seasonal differences, and popularity in the summer and autumn peak season is significantly higher than in winter and spring. Peak seasons show patterns of three peaks and three valleys or three peaks and two valleys, while off-seasons show patterns of double peaks, single peaks or no significant peaks and valleys. The peaks of mountain-type natural scenic areas appear in the early morning, while the peaks of water-landscape and cultural scenic areas appear late at night.

(e) The popularity change patterns of routes or sub-scenic areas are clearly influenced by the tourism guidance mode, tourism policy and scenic spot characteristics.

(f) The monthly changes of TDP show four patterns: single peak, double peaks, triple peaks and quadruple peaks. February, April, May, August and October are the months in which popularity peaks occur most often.

(g) The popularity contribution of routes or sub-scenic areas to the scenic area, and the numerical characteristics of the TDP of each scenic area, all conform to "Bartlett's law".

(h) The level, number and change pattern of popular scenic spots contribute to the popularity change pattern of the route, sub-scenic area and scenic area.

(i) The intraday variation patterns of scenic spots all show an undulating pattern with multiple peaks and multiple valleys. At the scenic area level, natural and cultural scenic spots differ significantly in peak pattern, peak starting time, rising mode, peak occurrence time and peak duration. The peak shapes of natural scenic areas are “n” or “w”, while that of cultural scenic areas is “?”. The popularity of natural scenic areas rises earlier than that of cultural scenic areas. The popularity of scenic spots in natural scenic areas rises rapidly and peaks at 12:00, while that of scenic spots in cultural scenic areas rises slowly and peaks at 22:00. The peak duration of natural scenic areas is longer than that of cultural scenic areas. The popularity pattern of each scenic spot has a significant influence on that of its route or scenic area.

The novel contributions of this work are as follows.

(1) We introduce the GrC model into tourism geography through T&T-GrC, which constructs a quantitative model of TDP at multiple spatiotemporal scales based on GrC using the inclusion degree. All scales in the spatial and temporal dimensions are related in T&T-GrC, which makes the data granules of all spatial-spatial, temporal-temporal and spatial-temporal layers comparable. The proposed T&T-GrC can describe the TDP at a single spatial or temporal scale as well as the patterns and processes of TDP at multiple spatiotemporal scales.

(2) A dataset construction approach for the text GrC model is proposed to provide a feasible scheme for reorganizing large-scale unstructured text and constructing public spatiotemporal UGC tourism datasets.

(3) A method for feature selection and data granule classification of tourism text is proposed, which provides an efficient solution for the multi-scale automatic classification of tourism text.
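
The abstract describes quantifying how much lower-scale data granules contribute to the popularity of an upper-scale granule, using the inclusion degree. The following is only a minimal sketch of that idea, not the dissertation's actual formulation. It assumes that a granule can be represented by the set of posts mentioning it and that the inclusion degree is the common set-overlap ratio |A ∩ B| / |B|. The `Granule` class, the function names, the route names and the post IDs are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Granule:
    """Hypothetical tourism text data granule: a landscape object plus its posts."""
    name: str
    scale: str  # e.g. "scenic_area", "route", "scenic_spot"
    post_ids: set = field(default_factory=set)
    children: list = field(default_factory=list)


def inclusion_degree(lower: Granule, upper: Granule) -> float:
    """Assumed definition: share of the upper granule's posts also in the lower granule."""
    if not upper.post_ids:
        return 0.0
    return len(lower.post_ids & upper.post_ids) / len(upper.post_ids)


def contributions(upper: Granule) -> dict:
    """Contribution of each child granule to the popularity of its parent granule."""
    return {child.name: inclusion_degree(child, upper) for child in upper.children}


# Made-up example: ten posts about a scenic area, nine of them tied to a route.
route_a = Granule("route_A", "route", post_ids={1, 2, 3, 4, 5, 6})
route_b = Granule("route_B", "route", post_ids={7, 8, 9})
area = Granule(
    "scenic_area_X",
    "scenic_area",
    post_ids=set(range(1, 11)),
    children=[route_a, route_b],
)

print(contributions(area))
```

Because every granule is compared against its parent on the same ratio scale, the same function could be applied between any two adjacent layers (spot to route, route to scenic area), which is one simple way to make popularity values comparable across scales.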
Keywords/Search Tags:granular computing, tourist destination popularity, dataset, text granule classification, correlation feature selection, multi-spatiotemporal scale, tourism text mining