
Traffic Session Identification Based On Statistical Language Model

Posted on: 2015-01-05
Degree: Master
Type: Thesis
Country: China
Candidate: X Y Lou
Full Text: PDF
GTID: 2268330431456958
Subject: E-commerce and information technology
Abstract/Summary:
A session is a sequence of queries or requests made by a user or an application to accomplish a specific task. Session identification has attracted considerable attention because it is important for discovering useful patterns. Web and database session identification have been studied extensively. The most commonly used approaches are the timeout method and the method based on statistical language models, both of which perform well in specific applications. A traffic session is the ordered sequence of camera locations that a vehicle passes while accomplishing a certain task. Traffic session identification has important implications for route prediction, congestion detection, and location-based services, yet research on it remains limited. In this thesis, we are the first to pose the problem of session identification in traffic applications.

We first apply the timeout method and the method based on a statistical language model to identify traffic sessions. The timeout method considers only the time intervals between neighboring locations: a session boundary is identified between two camera locations if the time interval between them is larger than a predefined threshold. The method based on a statistical language model does not rely on any time information when identifying session boundaries. Instead, it uses the change of information in the sequence of camera locations. Given a camera location sequence, the model estimates how likely the sequence is by calculating its probability of occurrence. The quality of the language model on that sequence is then measured by its information entropy. An increase in entropy caused by the addition of a new location serves as the signal for a session boundary. Both the timeout method and the statistical language model consider only one factor when identifying session boundaries.
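
The two baseline methods described above can be sketched as follows. This is a minimal illustration, not the thesis implementation. The abstract does not specify the model order, the smoothing scheme, or any threshold values, so the bigram model, the add-one smoothing, the record layout, and the names `split_by_timeout`, `BigramModel`, and `split_by_entropy` are all assumptions made for the example.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple

# Assumed record layout: (camera_location_id, timestamp_in_seconds).
Record = Tuple[str, float]


def split_by_timeout(records: List[Record], threshold: float) -> List[List[Record]]:
    """Timeout method: start a new session when the gap exceeds the threshold."""
    sessions: List[List[Record]] = []
    for rec in records:
        if sessions and rec[1] - sessions[-1][-1][1] <= threshold:
            sessions[-1].append(rec)
        else:
            sessions.append([rec])
    return sessions


class BigramModel:
    """Bigram model over camera locations with add-one smoothing (assumed choices)."""

    def __init__(self, sequences: Sequence[Sequence[str]]):
        self.context_counts: Counter = Counter()
        self.bigram_counts: Counter = Counter()
        vocab = set()
        for seq in sequences:
            vocab.update(seq)
            for prev, cur in zip(seq, seq[1:]):
                self.context_counts[prev] += 1
                self.bigram_counts[(prev, cur)] += 1
        self.vocab_size = len(vocab) + 1  # one extra slot for unseen locations

    def log2_prob(self, prev: str, cur: str) -> float:
        num = self.bigram_counts[(prev, cur)] + 1
        den = self.context_counts[prev] + self.vocab_size
        return math.log2(num / den)

    def entropy(self, seq: Sequence[str]) -> float:
        """Average negative log2-probability per transition (0.0 if no transition)."""
        if len(seq) < 2:
            return 0.0
        total = sum(self.log2_prob(p, c) for p, c in zip(seq, seq[1:]))
        return -total / (len(seq) - 1)


def split_by_entropy(model: BigramModel, locations: List[str],
                     max_increase: float) -> List[List[str]]:
    """Language-model method: start a new session when appending a location
    raises the entropy of the current session by more than max_increase."""
    sessions: List[List[str]] = [[locations[0]]] if locations else []
    for loc in locations[1:]:
        current = sessions[-1]
        increase = model.entropy(current + [loc]) - model.entropy(current)
        if increase > max_increase:
            sessions.append([loc])
        else:
            current.append(loc)
    return sessions


# Illustrative data only; the values are invented for the example.
trace = [("A", 0), ("B", 120), ("C", 300), ("D", 4000), ("E", 4100)]
timeout_sessions = split_by_timeout(trace, threshold=1800)

history = [["A", "B", "C"]] * 20 + [["X", "Y", "Z"]] * 20
model = BigramModel(history)
entropy_sessions = split_by_entropy(model, ["A", "B", "C", "X", "Y", "Z"], max_increase=0.5)
```

In this toy run the timeout method splits the trace at the long gap before `D`, while the entropy method splits the location sequence at the unfamiliar transition from `C` to `X` without looking at timestamps at all.
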
The timeout method relies only on the time factor, while the language model takes only the regularity of location sequences into account. Temporal information has a significant impact on people’s behavior, and driving routes usually display a high degree of time-related regularity. Intuitively, the larger the time interval between two camera locations, the smaller the probability that they belong to the same session. We therefore propose a time influence function to measure the impact of the time factor on session identification. In addition, most people prefer routes they are familiar with, so the camera location sequences passed by a user show a certain degree of regularity. Combining the time factor with location sequence regularity, we put forward an improved statistical language model that integrates both factors. If a session consists of camera locations that are frequently visited in order without a long stay in between, both the change in entropy and the value of the time influence function for this sequence will be relatively low. However, when the vehicle passes a new location that is unrelated to the current session, adding this location increases the entropy of the sequence. Likewise, when the vehicle stays for a long time between the previous sequence and the new location, the value of the time influence function is large. Both an increase in entropy and a rise in the time influence function value serve as signals for a session boundary.

Extensive experiments on a real traffic dataset show that our proposal is more effective than alternative methods, including the timeout method and the classic language model.
In addition, the comparison and analysis also show that the time factor affects session identification more than the regularity of location sequences does.
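
One way to picture how the two signals might be combined is sketched below. The abstract does not give the form of the time influence function or the rule for integrating it with the entropy change, so everything here is an assumption: the saturating exponential in `time_influence`, the weighted sum in `find_boundaries`, and the `weight`, `threshold`, and `scale` parameters are hypothetical choices for illustration only.

```python
import math
from typing import List, Sequence


def time_influence(gap_seconds: float, scale: float = 1800.0) -> float:
    """Assumed form: 0 for no gap, approaching 1 as the gap grows."""
    return 1.0 - math.exp(-gap_seconds / scale)


def find_boundaries(entropy_increases: Sequence[float], gaps: Sequence[float],
                    weight: float = 0.5, threshold: float = 0.4,
                    scale: float = 1800.0) -> List[int]:
    """Return positions i >= 1 at which a new session is assumed to start.

    entropy_increases[k] and gaps[k] describe the step from location k to
    location k + 1: the entropy change from appending location k + 1 and the
    time elapsed since location k. The weighted sum is an assumed rule.
    """
    boundaries: List[int] = []
    for i, (delta_h, gap) in enumerate(zip(entropy_increases, gaps), start=1):
        score = weight * delta_h + (1.0 - weight) * time_influence(gap, scale)
        if score > threshold:
            boundaries.append(i)
    return boundaries


# Invented values: step 3 has a large entropy jump, step 4 a two-hour gap.
boundaries = find_boundaries(
    entropy_increases=[0.1, 0.0, 0.9, 0.1],
    gaps=[120, 180, 200, 7200],
)
```

With these inputs, position 3 is flagged because of the entropy jump alone and position 4 because of the long stay alone, which mirrors the two boundary signals described in the abstract. Note that the entropy change is unbounded while this time influence value lies in [0, 1), so a real system would need to calibrate the weight or normalize the two terms.
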
Keywords/Search Tags: session identification, timeout method, statistical language model