| In today's big-data era, the rapid development of information technology across industries has caused an explosion of information in many vertical domains. At work and in daily life alike, people have become accustomed to relying on the Internet for useful information. However, massive data easily leads to information overload, and helping users find useful information quickly and efficiently has become a major problem for search engines. To address the poor scalability, low search efficiency, and limited performance of traditional search engines, this paper proposes a distributed search engine based on ElasticSearch. Beyond improving retrieval efficiency and accuracy, the system analyzes and exploits users' historical search records to make the search engine more intelligent and its interaction with users friendlier. Following an in-depth analysis of the system requirements, the system is divided into two main processes: offline data processing and real-time search and display. The offline data processing process mainly covers data preprocessing, data storage, index updating, and thesaurus extension. The real-time search and display process mainly covers search-term error correction, search-term suggestion, and the ranking and display of search results. The technologies used to design and implement the search engine include the ElasticSearch framework, text segmentation, message queues, a new word discovery algorithm, a ranking algorithm, the N-Gram language model, and the minimum edit distance algorithm. Among them, the N-Gram statistical language model and the minimum edit distance algorithm implement search-term error correction; the BM25 algorithm ranks the search results so that they better match users' actual needs; and a statistics-based new word discovery algorithm extends the thesaurus by periodically analyzing user behavior logs, thereby improving word segmentation accuracy. Multi-faceted testing and analysis verified the practicability, effectiveness, and real-time performance of the system. Extending the thesaurus improved word segmentation accuracy, and re-ranking the search results returns more satisfactory results to users, which improves the user experience and increases both the click-through rate of paid videos and turnover. The system has now passed testing, been delivered to the user, and received positive feedback, and it has not experienced any major anomalies.
This paper designs and implements a distributed search engine based on ElasticSearch for the video domain. First, it describes the research background and significance of the project and analyzes the state of search engine research both domestically and internationally. It then introduces the technologies involved in implementing the system. Next, it presents the functional and non-functional requirements of the system, followed by the high-level design and the detailed design and implementation. Finally, it describes the system testing and performance analysis. |
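The abstract states that search-term error correction combines an N-Gram language model with minimum edit distance, but does not give the model order, smoothing, or distance threshold. The following is a minimal sketch under assumed choices: queries are already segmented into tokens, only out-of-vocabulary tokens are corrected, candidates are known terms within edit distance 1, and candidate sequences are ranked by an add-one-smoothed bigram model. `edit_distance`, `BigramCorrector`, and the toy corpus (standing in for historical query logs) are hypothetical illustrations, not the system's actual code.

```python
import math
from collections import Counter
from itertools import product


def edit_distance(a: str, b: str) -> int:
    """Levenshtein (minimum edit) distance via dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # deletion
                curr[j - 1] + 1,           # insertion
                prev[j - 1] + (ca != cb),  # substitution (free if chars match)
            ))
        prev = curr
    return prev[-1]


class BigramCorrector:
    """Hypothetical corrector: edit-distance candidates ranked by a bigram LM."""

    def __init__(self, corpus: list[list[str]], max_dist: int = 1):
        self.unigrams: Counter = Counter()
        self.bigrams: Counter = Counter()
        for sent in corpus:
            tokens = ["<s>"] + sent  # sentence-start marker
            self.unigrams.update(tokens)
            self.bigrams.update(zip(tokens, tokens[1:]))
        self.vocab = {w for w in self.unigrams if w != "<s>"}
        self.max_dist = max_dist

    def log_prob(self, tokens: list[str]) -> float:
        """Add-one smoothed bigram log-probability of a token sequence."""
        v = len(self.vocab)
        seq = ["<s>"] + tokens
        return sum(
            math.log((self.bigrams[(p, w)] + 1) / (self.unigrams[p] + v))
            for p, w in zip(seq, seq[1:])
        )

    def candidates(self, word: str) -> list[str]:
        """Known words are kept; unknown ones map to nearby vocabulary terms."""
        if word in self.vocab:
            return [word]
        near = sorted(w for w in self.vocab
                      if edit_distance(word, w) <= self.max_dist)
        return near or [word]  # keep the original if nothing is close enough

    def correct(self, query: list[str]) -> list[str]:
        """Pick the candidate combination with the highest LM score."""
        options = [self.candidates(w) for w in query]
        best = max(product(*options), key=lambda seq: self.log_prob(list(seq)))
        return list(best)
```

For example, with `BigramCorrector([["sports", "car"], ["sports", "car", "race"], ["pet", "cat"], ["cat", "video"]])`, the misspelled token `cas` is one edit away from both `car` and `cat`; the bigram model resolves it to `car` after `sports` and to `cat` after `pet`. Enumerating all combinations is exponential in query length, so a real system would more likely use beam search or Viterbi decoding over the candidate lattice.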
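BM25 is the ranking function the abstract names for ordering search results; it is also ElasticSearch's default similarity, with defaults `k1 = 1.2` and `b = 0.75`. The abstract does not say how the system tunes these parameters or combines BM25 with other signals, so the sketch below only shows textbook Okapi BM25 over pre-segmented documents. It uses the smoothed IDF form `log(1 + (N - n + 0.5) / (n + 0.5))`, which stays positive for very common terms. The function name `bm25_scores` is hypothetical.

```python
import math
from collections import Counter


def bm25_scores(query: list[str], docs: list[list[str]],
                k1: float = 1.2, b: float = 0.75) -> list[float]:
    """Okapi BM25 score of each (tokenized) document for a tokenized query."""
    n_docs = len(docs)
    avgdl = sum(len(d) for d in docs) / n_docs
    # document frequency: how many documents contain each term
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        # length normalization: longer-than-average documents are penalized
        norm = k1 * (1 - b + b * len(d) / avgdl)
        score = 0.0
        for term in query:
            if tf[term] == 0:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # term frequency saturates: extra occurrences add less and less
            score += idf * tf[term] * (k1 + 1) / (tf[term] + norm)
        scores.append(score)
    return scores
```

Ranking is then a sort on the scores, e.g. `sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)`. `k1` controls how quickly repeated terms saturate, and `b` controls how strongly document length is normalized. For short video titles, these two parameters are the natural knobs to tune.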
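The abstract describes thesaurus extension through statistics-based new word discovery on user behavior logs but does not name the statistics used. One common statistics-based approach, assumed here, scores character n-grams on three criteria: frequency, internal cohesion measured as pointwise mutual information (PMI) at the weakest split point, and boundary freedom measured as the entropy of left and right neighboring characters. The thresholds, the `count / n` probability approximation, and the function names `entropy` and `discover_words` are illustrative assumptions. On Chinese log text the same code operates on Hanzi, since Python strings index by character.

```python
import math
from collections import Counter, defaultdict


def entropy(counter: Counter) -> float:
    """Shannon entropy (natural log) of a neighbor-character distribution."""
    total = sum(counter.values())
    return -sum(c / total * math.log(c / total) for c in counter.values())


def discover_words(text: str, max_len: int = 4, min_count: int = 2,
                   min_pmi: float = 1.0, min_entropy: float = 0.5) -> list[str]:
    """Return character n-grams that look like words: frequent, internally
    cohesive (high PMI at every split) and free at both boundaries
    (varied left and right neighbors). Thresholds are illustrative."""
    n = len(text)
    # frequency of every substring of 1..max_len characters
    counts = Counter(text[i:i + k] for k in range(1, max_len + 1)
                     for i in range(n - k + 1))
    # neighboring characters seen to the left / right of each n-gram
    left, right = defaultdict(Counter), defaultdict(Counter)
    for k in range(2, max_len + 1):
        for i in range(n - k + 1):
            gram = text[i:i + k]
            if i > 0:
                left[gram][text[i - 1]] += 1
            if i + k < n:
                right[gram][text[i + k]] += 1

    words = []
    for gram, c in counts.items():
        if len(gram) < 2 or c < min_count:
            continue
        # cohesion: PMI at the weakest split point (probabilities ~ count / n)
        pmi = min(
            math.log((c / n) / ((counts[gram[:s]] / n) * (counts[gram[s:]] / n)))
            for s in range(1, len(gram))
        )
        # freedom: the less varied of the two boundaries
        freedom = min(entropy(left[gram]), entropy(right[gram]))
        if pmi >= min_pmi and freedom >= min_entropy:
            words.append(gram)
    return words
```

For example, in `"aQZWbcQZWdeQZWf"` the fragment `QZ` is always followed by `W`, so its right-neighbor entropy is zero and it is rejected, while `QZW` appears between varied neighbors and is kept. In the described system, terms found this way would be added to the segmenter's extension dictionary; how that dictionary is loaded into ElasticSearch is not specified in the abstract.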