| With the development of Internet technology, the volume of news published on websites has grown rapidly, but overall page views often fail to grow at the same pace. This is largely because the content and release of news articles are not well targeted, so they fail to win the favor of netizens and do not bring in the corresponding traffic. Against this background, this thesis attempts to establish an association model between historical content and access behavior, based on the website's historical database and historical access data. The system analyzes the content features of a new news item, compares and matches them against historical content features, and, based on the matching results, uses time series analysis to predict the page view number of that content. Such predictions can serve as a reference for website staff. If a website editor can estimate the likely number of visits before a news item is published, revision and optimization of the content will have a clearer direction and goal, and the release effect will be better. If a website technician can build on the editor's work to evaluate the website's traffic trend in advance, there will be a sounder basis for deploying storage and bandwidth resources, and the website's service configuration can be optimized ahead of time in a targeted way. Referring to software engineering systems and methods, and combined with the author's engineering practice, this thesis describes the following work. For the extraction and filtering of Internet news samples, a large amount of historical news and historical access data must be analyzed, filtered and cleaned, and stored in a dedicated database. Because the raw data is too large for rapid analysis, statistical sampling principles are used to build a sample library that effectively reflects the overall characteristics and is refined for searching and use. A database model is designed in which each historical sample undergoes text classification and keyword extraction separately, and the results are stored as foreign keys of the sample, which makes retrieval and comparison convenient. A database engine suited to the characteristics of the system is selected, and an efficient cache layer is built on top of the database to improve retrieval speed. Before page views are predicted for a news item, text classification and keyword extraction are performed on it, and the results are each matched against the sample library. Based on the matching results for the sample data, time series analysis is used for prediction, and the system interface is designed to display the prediction results and related information quickly and concisely. Based on the above objectives and requirements analysis, the author follows the international HTML/CSS standards, selects PHP, Python, JavaScript and other programming languages, customizes the configuration of frameworks such as Bootstrap and CodeIgniter, and develops and implements the system, meeting the stated requirements. |
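
The match-then-predict flow described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the abstract does not name the keyword extraction, matching, or time series methods, so frequency-based keywords, Jaccard similarity, and simple exponential smoothing are used here as stand-ins. All function names, the `published`/`keywords`/`views` sample fields, the stop-word list, and the `threshold` and `alpha` values are hypothetical.

```python
import re
from collections import Counter

# Hypothetical stop-word list; a real system would use a fuller one.
STOP_WORDS = {"the", "a", "an", "of", "and", "to", "in", "for", "on", "is"}


def extract_keywords(text, top_n=5):
    """Return up to top_n most frequent non-stop-words.

    A frequency count is only a stand-in for the keyword extraction step.
    """
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(w for w in words if w not in STOP_WORDS)
    return {word for word, _ in counts.most_common(top_n)}


def jaccard(a, b):
    """Jaccard similarity between two keyword sets (0.0 if both are empty)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def match_samples(keywords, sample_library, threshold=0.2):
    """Return historical samples whose keyword overlap meets the threshold."""
    return [s for s in sample_library
            if jaccard(keywords, s["keywords"]) >= threshold]


def exponential_smoothing_forecast(series, alpha=0.5):
    """One-step-ahead forecast using simple exponential smoothing."""
    if not series:
        raise ValueError("series must not be empty")
    level = series[0]
    for value in series[1:]:
        level = alpha * value + (1 - alpha) * level
    return level


def predict_page_views(text, sample_library):
    """Estimate page views for a new article from matched historical samples.

    Matched samples are ordered by publication date so that their page view
    totals form a time series; the forecast is the next value of that series.
    Returns None when no historical sample matches.
    """
    matched = match_samples(extract_keywords(text), sample_library)
    if not matched:
        return None
    matched.sort(key=lambda s: s["published"])
    return exponential_smoothing_forecast([s["views"] for s in matched])
```

In this sketch each sample is assumed to be a dictionary holding an ISO-format `published` date string, a `keywords` set, and a total `views` count. Ordering the matched samples by date is one simple way to obtain a series to forecast; a production system would more likely use per-day access logs and a richer time series model.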