| The Internet is one of the most important social media in the world, with a huge user base. Weibo, the WeChat public platform, and all kinds of official news websites contain a large amount of valuable but scattered travel text, most of which is often ignored. As a place where text data accumulates, the Internet is therefore an important source from which to extract valuable information and analyze it in real time. Research on Internet travel text big data first needs to transform unstructured text into structured data, which involves extracting key entities. Most text on the Internet is long and semantically complex, which makes it difficult to extract the key entities of time, place, and event. As a result, traditional text-based event extraction methods have low accuracy. After the structured triples are extracted, a travel emotion dictionary is constructed, and the dictionary and the extracted triples are processed further to mine the value of the text and provide travel suggestions to users. As the number of Internet users grows, so does the scale of travel text on the Internet. For travel information, timeliness is also an important indicator, so the real-time processing of travel text in a big data environment must be considered. In view of these problems, this paper makes the following contributions. (1) To address the difficulty of extracting triples from Internet travel text, this paper proposes first generating a summary of the complex text and then extracting triples from it. A generative text summarization model based on a document structure neural network, DSNN-GSM, is designed. The model introduces document structure, divides the text into a word encoding layer and a sentence encoding layer, and constructs a top-down hierarchical structure to avoid the back-propagation errors caused by overly long input sequences in the traditional encoder-decoder model. An attention mechanism is added to each layer, which refines the granularity of attention. Comparative experiments show that the proposed model achieves higher accuracy than other methods based on the encoder-decoder model. (2) This paper proposes an Internet travel text stream processing model based on Flink (ITTSPM). First, DSNN-GSM is embedded into the stream computing framework to process text in real time and extract structured triples of time, place, and event. Second, the travel emotion dictionary is constructed and then expanded using the pointwise mutual information (PMI) algorithm and word2vec. Then a downstream operator is designed in which complex event processing (CEP) is applied to the incoming triples according to the travel emotion dictionary, and travel suggestions are output in real time. Experiments show that DSNN-GSM runs more efficiently on a Flink cluster. A real-time scenario simulated with Kafka verifies the feasibility of the complete travel text big data stream processing model. |
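
The PMI-based dictionary expansion mentioned in the abstract can be illustrated with a small sketch. The abstract does not say which PMI variant or co-occurrence window is used, so the code below assumes the common SO-PMI scheme (PMI with positive seed words minus PMI with negative seed words) computed from document-level co-occurrence. The function names `pmi_table` and `so_pmi` and the toy corpus are hypothetical, and the word2vec step is not shown.

```python
import math
from collections import Counter
from itertools import combinations


def pmi_table(docs):
    """Build a pmi(a, b) function from document-level co-occurrence counts.

    Assumption: two words co-occur when they appear in the same document.
    """
    n = len(docs)
    word_df = Counter()  # number of documents containing each word
    pair_df = Counter()  # number of documents containing each word pair
    for doc in docs:
        words = set(doc)
        word_df.update(words)
        pair_df.update(frozenset(p) for p in combinations(sorted(words), 2))

    def pmi(a, b):
        joint = pair_df[frozenset((a, b))]
        if joint == 0:
            return 0.0  # simplification: unseen pairs count as neutral
        # PMI = log2( P(a, b) / (P(a) * P(b)) )
        return math.log2(joint * n / (word_df[a] * word_df[b]))

    return pmi


def so_pmi(word, pos_seeds, neg_seeds, pmi):
    """Sentiment orientation: association with positive minus negative seeds."""
    return sum(pmi(word, s) for s in pos_seeds) - sum(pmi(word, s) for s in neg_seeds)


# Toy tokenized travel texts (hypothetical data, for illustration only).
docs = [
    ["road", "smooth", "fast"],
    ["road", "smooth", "clear"],
    ["road", "jam", "slow"],
    ["bridge", "jam", "delay"],
]
pmi = pmi_table(docs)
for candidate in ["fast", "delay"]:
    print(candidate, so_pmi(candidate, ["smooth"], ["jam"], pmi))
```

In this sketch a candidate with a positive score would be added to the positive side of the dictionary and one with a negative score to the negative side; a real system would also apply a threshold and smoothing for rare words.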