Over the years, foreign militaries have frequently carried out military activities in China’s offshore waters, including maritime surveillance, exercises and demonstrations, and even cross-border provocations, posing a serious threat to China’s sovereignty and interests. At the same time, a large number of reports on these offshore foreign military activities have been produced. Because these reports are unstructured data containing multiple types of objects and complex relationships between them, intelligent techniques are urgently needed to analyze and mine them. With the rapid development of information technology, heterogeneous information networks have become a research hotspot in data mining. As a semi-structured representation, a heterogeneous information network can abstract the different types of real-world objects and their relationships while retaining rich semantic information. This paper therefore used heterogeneous information networks to study offshore foreign military activities, following a "data collection and preprocessing" → "network modeling" → "analysis and mining" pipeline. First, news reports of offshore foreign military activities were collected and preprocessed into networked data. Then, the network of offshore foreign military activities was constructed. Finally, analysis and mining were conducted on this network. The specific work is as follows.

(1) We proposed a cross-lingual ship entity matching algorithm based on a hybrid similarity measure. Ship names are described inconsistently, little attribute information is available, and the standard ship databases are written in different languages. To address these challenges, we first extracted ship attribute information and designed matching rules. Second, machine translation tools were used to translate ship names into Chinese. Third, a hybrid similarity measure was designed to capture both pronunciation and character features, and entity matching was performed by computing the similarity between ship names. The proposed algorithm does not depend on a parallel corpus between languages and also overcomes the unstable quality of machine translation, providing an algorithmic basis for modeling offshore foreign military activities.

(2) We proposed a framework for modeling multi-source heterogeneous data as an attributed heterogeneous information network. Much current research builds on heterogeneous information networks that have already been constructed, but few studies address how such networks are constructed in the first place. In this paper, we refined the concept of an attributed heterogeneous information network, integrated the attribute information of objects into the network, and proposed a modeling framework for attributed heterogeneous information networks. Finally, news reports of offshore foreign military activities on the Internet and the standard ship database were used in an application study: an offshore foreign military activity network was constructed, verifying the feasibility of the modeling framework.

(3) We proposed a clustering algorithm for offshore foreign military activities based on ship ranking. Offshore foreign military activities are of many kinds, carry no labels, and contain a large amount of information yet to be mined. To address this, we first designed a ranking function for ships. Second, a probabilistic generative model of offshore foreign military activities was constructed. Third, reasonable clustering results were obtained through mutual iteration between ship ranking and activity clustering. We then applied the algorithm to the offshore foreign military activity network we had constructed and identified three categories of activities: freedom of navigation operations in the South China Sea, Taiwan Strait transits, and exercises and drills; a sub-network was constructed for each category. Finally, we analyzed each sub-network and the ship ranking results in combination with the attribute information in the network, and obtained valuable information.
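The hybrid similarity measure in (1) can be illustrated with a minimal sketch. It assumes the ship names have already been machine-translated into Chinese, measures character similarity as normalized edit distance over characters and pronunciation similarity as normalized edit distance over toneless pinyin syllables, and mixes the two with a weight `alpha`. The toy pinyin table, `alpha`, the 0.7 threshold, and the function names are illustrative assumptions, not the thesis's actual measure or parameters, and the attribute-based matching rules are omitted.

```python
from typing import Callable, Optional, Sequence

# HYPOTHETICAL toy character-to-pinyin table (tones dropped), only large
# enough for the example; a real system would use a full pinyin dictionary.
TOY_PINYIN = {"马": "ma", "斯": "si", "廷": "ting", "汀": "ting",
              "麦": "mai", "凯": "kai", "恩": "en"}


def to_pinyin(name: str) -> list[str]:
    """Map each Chinese character to a toneless pinyin syllable."""
    return [TOY_PINYIN.get(ch, ch) for ch in name]


def levenshtein(a: Sequence[str], b: Sequence[str]) -> int:
    """Edit distance between two token sequences (characters or syllables)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,               # delete x
                            curr[j - 1] + 1,           # insert y
                            prev[j - 1] + (x != y)))   # substitute x -> y
        prev = curr
    return prev[-1]


def normalized_similarity(a: Sequence[str], b: Sequence[str]) -> float:
    """1 - edit distance / length of the longer sequence, in [0, 1]."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def hybrid_similarity(name_a: str, name_b: str, alpha: float = 0.5,
                      phonetic: Callable[[str], list[str]] = to_pinyin) -> float:
    """Weighted mix of character-level and pronunciation-level similarity."""
    char_sim = normalized_similarity(list(name_a), list(name_b))
    phon_sim = normalized_similarity(phonetic(name_a), phonetic(name_b))
    return alpha * char_sim + (1 - alpha) * phon_sim


def best_match(query: str, candidates: Sequence[str], threshold: float = 0.7,
               alpha: float = 0.5) -> Optional[str]:
    """Return the best-scoring candidate, or None if none reaches threshold."""
    best, best_score = None, threshold
    for cand in candidates:
        score = hybrid_similarity(query, cand, alpha)
        if score >= best_score:
            best, best_score = cand, score
    return best


# Two hypothetical transliterations that differ in one character but sound alike.
print(hybrid_similarity("马斯廷", "马斯汀"))      # 0.5 * 2/3 + 0.5 * 1.0
print(best_match("马斯汀", ["马斯廷", "麦凯恩"]))  # 马斯廷
```

Combining the two signals is what lets the sketch tolerate translation instability: two renderings of the same name that use different but homophonous characters lose on character similarity yet score perfectly on pronunciation, so the mixed score still clears the threshold.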
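For (2), one possible in-memory representation of an attributed heterogeneous information network is sketched below: typed nodes carry attribute dictionaries, and every edge must match a (source type, relation, target type) triple in the network schema. The node types `activity`, `ship`, and `sea_area`, the relation names, and the placeholder data are hypothetical; the abstract does not give the thesis's actual schema.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    node_id: str
    node_type: str
    attributes: dict = field(default_factory=dict)  # object attribute information


class AttributedHIN:
    """Typed nodes with attribute dictionaries plus typed, schema-checked edges."""

    def __init__(self, schema: set[tuple[str, str, str]]):
        # Each schema entry is (source node type, relation, target node type).
        self.schema = schema
        self.nodes: dict[str, Node] = {}
        self.adj: dict[str, list[tuple[str, str]]] = defaultdict(list)

    def add_node(self, node_id: str, node_type: str, **attributes) -> None:
        self.nodes[node_id] = Node(node_id, node_type, dict(attributes))

    def add_edge(self, src: str, relation: str, dst: str) -> None:
        key = (self.nodes[src].node_type, relation, self.nodes[dst].node_type)
        if key not in self.schema:
            raise ValueError(f"edge type {key} is not in the network schema")
        # Store the link in both directions so it can be traversed either way.
        self.adj[src].append((relation, dst))
        self.adj[dst].append((relation, src))

    def neighbors(self, node_id: str, node_type: Optional[str] = None) -> list[str]:
        """Neighbors of node_id, optionally restricted to one node type."""
        return [other for _, other in self.adj[node_id]
                if node_type is None or self.nodes[other].node_type == node_type]


# Usage with HYPOTHETICAL placeholder types, relations, and attributes.
schema = {("activity", "involves", "ship"), ("activity", "occurs_in", "sea_area")}
net = AttributedHIN(schema)
net.add_node("ship:A", "ship", ship_class="destroyer")
net.add_node("act:1", "activity", kind="transit")
net.add_node("area:1", "sea_area")
net.add_edge("act:1", "involves", "ship:A")
net.add_edge("act:1", "occurs_in", "area:1")
print(net.neighbors("act:1", "ship"))  # ['ship:A']
```

Keeping attributes on the nodes rather than as extra nodes mirrors the idea of integrating object attribute information directly into the network, while the schema check keeps heterogeneous links from multiple sources consistent.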
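The mutual iteration of ship ranking and activity clustering in (3) can be sketched in the spirit of RankClus-style ranking-based clustering. This is a simplified stand-in, not the thesis's algorithm: each cluster's ship ranking is the smoothed share of link weight that its activities send to each ship; an EM step then estimates each activity's mixture over clusters under the generative model p(ship | activity) = Σ_c π_c · rank_c(ship); and activities are reassigned to their dominant cluster until the labels stop changing. The initial labels, smoothing constant `eps`, function names, and toy data are all assumptions.

```python
from typing import Sequence

# Each activity maps ship id -> link weight (e.g. how often the ship is
# mentioned in reports of that activity).
Activity = dict[str, float]


def rank_ships(activities: Sequence[Activity], labels: list[int], k: int,
               ships: list[str], eps: float = 1e-3) -> list[dict[str, float]]:
    """Conditional ship ranking per cluster: each ship's smoothed share of
    the link weight coming from that cluster's activities."""
    ranks = []
    for c in range(k):
        score = {s: eps for s in ships}
        for act, lab in zip(activities, labels):
            if lab == c:
                for s, w in act.items():
                    score[s] += w
        total = sum(score.values())
        ranks.append({s: v / total for s, v in score.items()})
    return ranks


def em_mixture(activities: Sequence[Activity], ranks: list[dict[str, float]],
               k: int, iters: int = 20) -> list[list[float]]:
    """EM estimate of each activity's cluster mixture pi under the generative
    model p(ship | activity) = sum_c pi[c] * rank_c(ship), ranks held fixed."""
    pis = []
    for act in activities:
        pi = [1.0 / k] * k
        total_w = sum(act.values())
        for _ in range(iters):
            if total_w == 0:
                break  # an activity without links keeps the uniform mixture
            new = [0.0] * k
            for s, w in act.items():
                joint = [pi[c] * ranks[c][s] for c in range(k)]
                norm = sum(joint)
                for c in range(k):
                    new[c] += w * joint[c] / norm  # E-step: responsibilities
            pi = [v / total_w for v in new]        # M-step: new mixture
        pis.append(pi)
    return pis


def rank_cluster(activities: Sequence[Activity], k: int,
                 init_labels: list[int], max_rounds: int = 10):
    """Alternate ship ranking and activity clustering until labels are stable."""
    ships = sorted({s for act in activities for s in act})
    labels = list(init_labels)
    for _ in range(max_rounds):
        ranks = rank_ships(activities, labels, k, ships)
        pis = em_mixture(activities, ranks, k)
        new_labels = [max(range(k), key=pi.__getitem__) for pi in pis]
        if new_labels == labels:
            break
        labels = new_labels
    return labels, rank_ships(activities, labels, k, ships)


# HYPOTHETICAL toy data: two groups of activities that involve disjoint ships.
example = [{"s1": 2, "s2": 1}, {"s1": 1, "s2": 2}, {"s1": 1, "s2": 1},
           {"s3": 2, "s4": 1}, {"s3": 1, "s4": 1}, {"s3": 1, "s4": 2}]
labels, ranks = rank_cluster(example, k=2, init_labels=[0, 0, 1, 1, 1, 1])
print(labels)  # [0, 0, 0, 1, 1, 1]
```

Because each round recomputes the rankings from the current assignment, a perfectly symmetric initialization can stall; RankClus itself reassigns objects by their distance to cluster centers in the mixture space rather than by the plain argmax used here, and random restarts are a common safeguard.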