| Information retrieval is an important research topic in computer science. With the continuous development of computing technology and hardware, traditional keyword-based search has gradually been replaced by a growing number of emerging search methods, such as interactive search, multi-modal search, personalized search, and recommendation. Within information retrieval, user query subtopic mining has become an important research direction and one of the foundations for realizing these new search methods. A user query subtopic is a set of terms that covers one sub-intent of a query, and it usually reveals what a user might want to search for next. Existing research typically uses rules or machine learning methods to extract subtopic items from search logs or search result documents, or uses the text of web pages to generate them. However, previous studies often overlook important information in the search result pages, which includes both structured information such as lists and tables and unstructured information, mainly plain text, and they fail to integrate the two kinds of information when generating query subtopics. This paper studies the generation of query subtopics using improved algorithms together with the unstructured and structured information in the search results. The work covers six aspects: (1) improving and optimizing the BM25 and CRAT algorithms; (2) generating subtopics from the unstructured information in web pages; (3) generating subtopics from the structured information in web pages; (4) generating subtopics from structured and unstructured information combined; (5) generating subtopics from various aspects of the web page; and (6) a multi-result subtopic aggregation optimization algorithm that combines the advantages of the individual methods. This paper uses the BART model as its basis, inputting
structured and unstructured information from the user query and the search results and having the model output query subtopics one by one. To incorporate structured information, this paper extracts HTML tree structures from the search result pages and incorporates them by modifying BART's position encoding. The approach can therefore be applied to online systems to generate the potential sub-intents of user queries, and can be extended to more complex search tasks. Accordingly, this paper implements a Socket-based system that displays the subtopics of a query when the user enters it. |
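
The abstract names BM25 as one of the algorithms that the work improves. As a point of reference only, the following is a minimal sketch of the standard Okapi BM25 ranking function, not the improved variant developed in this paper. The function name `bm25_scores`, the parameter defaults `k1=1.5` and `b=0.75`, and the toy documents are assumptions made for illustration.

```python
import math
from collections import Counter


def bm25_scores(query_terms, documents, k1=1.5, b=0.75):
    """Score tokenized documents against a query with standard Okapi BM25.

    Hypothetical helper for illustration; this is the textbook formula,
    not the improved algorithm described in the abstract.
    """
    if not documents:
        return []
    n_docs = len(documents)
    avg_len = sum(len(doc) for doc in documents) / n_docs

    # Document frequency: number of documents that contain each term.
    doc_freq = Counter()
    for doc in documents:
        doc_freq.update(set(doc))

    scores = []
    for doc in documents:
        term_freq = Counter(doc)
        score = 0.0
        for term in query_terms:
            if term not in term_freq:
                continue
            df = doc_freq[term]
            # Smoothed IDF; adding 1 inside the log keeps it non-negative.
            idf = math.log((n_docs - df + 0.5) / (df + 0.5) + 1)
            tf = term_freq[term]
            # Length normalization penalizes documents longer than average.
            norm = tf + k1 * (1 - b + b * len(doc) / avg_len)
            score += idf * tf * (k1 + 1) / norm
        scores.append(score)
    return scores


# Toy corpus (assumed data): an ambiguous query with two sub-intents.
docs = [
    "jaguar car price and dealer list".split(),
    "jaguar animal habitat and diet".split(),
    "used car price guide".split(),
]
print(bm25_scores(["jaguar", "price"], docs))
```

In this toy corpus the first document contains both query terms and therefore receives the highest score, while a query term that appears in no document contributes nothing to any score.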