| As network editors design and build website content, the quality of their work directly affects a website's traffic and brand building. However, network editors typically approach website construction only from the perspective of draft selection, editing, revising, and writing, and attach little importance to website promotion. Most do not build content with search engine accessibility in mind, even though about 70% of website traffic comes from major search engines. This neglect reduces website traffic considerably and is not conducive to sales or to building the website's brand. SEO (search engine optimization) is the technology for improving website ranking, and it has three core aspects: keywords, content, and links. Research on SEO is still developing at home and abroad, and research on automatic keyword selection and analysis methods in particular needs to be enriched. How to select keywords automatically and how to analyze the competitive strength of the selected keywords are therefore the core issues this thesis addresses. Considering the important role SEO plays in website editing as well as the problems above, this thesis studies automatic keyword selection and analysis methods and, on this basis, develops assistant software for network editors. The main work of this thesis includes: (1) For keyword selection, this thesis proposes a keyword expansion method. Analysis of the structure of search engine result pages shows that the keywords listed under “related search” are generated based on historical search volume.
Moreover, the keywords in “related search” form a graph structure. Accordingly, a large number of keywords that meet user needs can be collected by using backward spider technology to traverse the “related search” entries in Baidu. (2) The C4.5 decision tree algorithm is applied to the analysis of keyword competition degree. This part analyzes the major factors that affect keyword competition degree and builds predictive models through data mining of historical optimization data. Once the accuracy of the models has been assessed, the classification rules generated by the decision tree can be applied to predict the degree of keyword competition. (3) An information collection subsystem is designed based on web crawler and text extraction technology. This part is divided into two components. The first is information capture from industry websites, achieved by a web crawler based on the HTTP protocol together with non-repetition and periodic re-visit strategies. The second is text extraction from the captured raw web pages, which exploits the property that variance reflects the degree of data dispersion. (4) The SEO-oriented assistant software for network editors is designed and implemented based on the work in the preceding parts of this thesis. In the early creation stage, the system analyzes theme keywords suitable for the piece being written. During creation, the system performs real-time analysis of SEO factors in the text. In addition, the information collection subsystem implemented in the thesis will greatly improve the efficiency of web editors in collecting industry information. |
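
To illustrate item (2), the following is a minimal sketch of the gain ratio criterion that C4.5 uses to choose a splitting attribute. It is not the thesis's implementation: the attribute names (`result_count`, `has_ads`), the class labels, and the sample records are hypothetical and stand in for whatever competition factors the thesis actually uses.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a list of discrete values."""
    total = len(values)
    return -sum((n / total) * log2(n / total) for n in Counter(values).values())


def gain_ratio(rows, labels, attribute):
    """C4.5 gain ratio of `attribute` with respect to the class labels."""
    total = len(rows)
    # Group the class labels by the value the attribute takes in each row.
    partitions = {}
    for row, label in zip(rows, labels):
        partitions.setdefault(row[attribute], []).append(label)
    # Information gain: entropy before the split minus weighted entropy after.
    remainder = sum(len(p) / total * entropy(p) for p in partitions.values())
    info_gain = entropy(labels) - remainder
    # Split information penalizes attributes that take many distinct values.
    split_info = entropy([row[attribute] for row in rows])
    return info_gain / split_info if split_info > 0 else 0.0


if __name__ == "__main__":
    # Hypothetical keyword records; the labels are the competition degree.
    rows = [
        {"result_count": "high", "has_ads": "yes"},
        {"result_count": "high", "has_ads": "no"},
        {"result_count": "low", "has_ads": "yes"},
        {"result_count": "low", "has_ads": "no"},
    ]
    labels = ["high", "high", "low", "low"]
    for attribute in ("result_count", "has_ads"):
        print(attribute, gain_ratio(rows, labels, attribute))
```

A full C4.5 learner would apply this criterion recursively, splitting on the attribute with the highest gain ratio at each node, and would also handle continuous attributes and pruning, which this sketch omits.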