| In the era of big data, information of all kinds is becoming an important basis and driving force for scientific research; information related to people is called character information. By analyzing the temporal and spatial information contained in character information, the life course and activity trajectory of a person can be reconstructed, and the characteristics of group activity and social evolution can then be explored. This has become a current research hotspot. The open Internet, the open, free and publicly accessible part of the Internet, contains a wealth of character information and is an important source of it. However, character information on the open Internet is dispersed, redundant and noisy, which hinders the acquisition of character information at scale. In addition, spatial locations on the open Internet are often described with text such as place names and addresses. Such text lacks an explicit spatial reference and is difficult for a GIS to analyze and use directly, which hampers the study of character information from a spatial perspective. To address the difficulty of acquiring character information from the open Internet and its lack of spatial reference, this paper takes scientific and technological figures as an example and studies methods for acquiring and spatializing character information on the open Internet. The main contents and results are as follows: (1) A method for acquiring character information from the open Internet is designed. An expert evaluation index is constructed, and high-quality data sources are selected to obtain the basic information of characters. Webpages related to the characters are obtained from the open Internet by 
querying a search engine. Based on the features of HTML tags and body text, a method for extracting the body text of webpages is designed. Based on natural language analysis and the relationship between part of speech and semantics, a structured extraction method is designed to extract character attributes, character relationships and character experiences. (2) A method for the spatiotemporal normalization and fusion of character information is designed. Based on an analysis of how time and space are expressed in character information, normalization methods are designed for time information, based on canonical matching patterns, and for spatial information, based on an address tree and spatial-relation matching, so that the various kinds of time and spatial information in character information are normalized. The multi-source, heterogeneous nature of character information is analyzed, and fusion methods are designed, including abnormal-information detection based on the life cycle, supplementation of character experiences based on rule reasoning, and deduplication of character experiences based on spatiotemporal-semantic similarity. (3) A spatialization method for character information is designed. A geocoding tool is used to locate registered place names and addresses precisely. Address classification is used to locate unregistered place names and addresses approximately. Based on spatial semantic analysis, the reference point-radius model is used to spatialize descriptions of spatial relations. Drawing on both a person's experience and geographical location, a place-name disambiguation method based on degree of correlation is designed to correct the spatialization results. Combining the characteristics of the open Internet and of character information, this paper designs a method for acquiring and spatializing character information from the open Internet, which helps 
promote the further development of the rich character information resources on the open Internet and provide scholars with large volumes of spatialized character information, and is of great significance for extending the application of GIS to research on people. |
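
As an illustration of the time normalization step in (2), the following is a minimal sketch, assuming that the "canonical matching patterns" amount to regular-expression patterns applied to date strings. The pattern list, the function name `normalize_time` and the choice of an ISO 8601 style output are assumptions made for this example; the abstract does not specify the actual rules or target format.

```python
import re
from datetime import date
from typing import Optional

# Hypothetical patterns for illustration only; the abstract does not
# list the matching rules that the method actually uses.
_PATTERNS = [
    # Full dates such as 2010-03-05, 2010/3/5, 2010.3.5
    re.compile(r"^(?P<y>\d{4})[-/.](?P<m>\d{1,2})[-/.](?P<d>\d{1,2})$"),
    # Chinese-style dates such as 2010年3月5日, 2010年3月, 2010年
    re.compile(r"^(?P<y>\d{4})年(?:(?P<m>\d{1,2})月(?:(?P<d>\d{1,2})日)?)?$"),
    # Year and month such as 2010-03 or 2010/3
    re.compile(r"^(?P<y>\d{4})[-/.](?P<m>\d{1,2})$"),
    # Bare year such as 1998
    re.compile(r"^(?P<y>\d{4})$"),
]


def normalize_time(text: str) -> Optional[str]:
    """Return YYYY, YYYY-MM or YYYY-MM-DD, or None if nothing matches."""
    text = text.strip()
    for pattern in _PATTERNS:
        match = pattern.match(text)
        if match is None:
            continue
        parts = match.groupdict()
        year = int(parts["y"])
        month = parts.get("m")
        day = parts.get("d")
        if month is None:
            return f"{year:04d}"
        month_n = int(month)
        if not 1 <= month_n <= 12:
            return None
        if day is None:
            return f"{year:04d}-{month_n:02d}"
        try:
            # date() rejects impossible days such as 30 February.
            return date(year, month_n, int(day)).isoformat()
        except ValueError:
            return None
    return None
```

Under these assumptions, `normalize_time("2010年3月5日")` returns `"2010-03-05"` and `normalize_time("2010/3")` returns `"2010-03"`. Keeping the precision of the input (year, month or day) rather than padding missing fields avoids inventing dates that the source text does not state.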