| As websites are maintained and developed, and in particular as links are updated and web pages are deleted, more and more isolated web files (IWF) are generated. These files remain in the web service directories and have complete, valid URLs. In general, they cannot be reached through the normal hyperlinks between web pages or through site search results that do not rely on disk file traversal; they can be visited only by entering the exact URL. Isolated files not only waste server storage space but may also lead to breaches of confidentiality, copyright disputes, invasions of privacy, or other problems. Unexpected consequences may follow if misinformation in isolated files is used to guide practice; isolated Trojan-horse files, harmful information, and similar content threaten data and services and may have an adverse impact on the community. A file is judged to be isolated by the following criterion: neither the HTML source code of any static page nor the website's source database contains a reference to the file. An isolated file is a Relative Isolated Web File (RIWF) if the web access log contains a valid reference to it; otherwise it is an Absolute Isolated Web File (AIWF). To identify isolated files, the release directory is traversed first to obtain a list of all files, which is saved in the database. Next, the HTML source code of the static pages on the server and the database records are traversed, and all URLs found are stored in the database. The web file list is then compared with the URL records; web files not found in the URL records are isolated files. Finally, the isolated files are compared with the web log: an isolated file that has an access record in the log is an RIWF; otherwise it is an AIWF. 
The web page source code must be analyzed with regular expressions, so a regular expression suited to URLs is constructed and used in this study. Based on the above theory, experiments were carried out to identify isolated files in a Microsoft IIS server web environment. The most complex part of the solution has been solved, and the desired effect achieved, using Microsoft .NET development tools. The paper also touches on how to avoid the formation of isolated web files and on the various problems that may be encountered when IWF are handled. |
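
The identification procedure described above can be sketched roughly as follows. This is a minimal illustration in Python, not the .NET implementation used in the study. The regular expression is a simplified stand-in for the one constructed in the paper, and the names `URL_ATTR`, `SITE`, `norm_path`, `list_web_files`, `extract_urls`, and `classify` are hypothetical. The sketch assumes that paths are compared case-insensitively (as on a Windows/IIS host), that URLs stored in database records have been extracted in the same way as those in static pages, and that the requested paths have already been read from the access log into a set.

```python
import re
from pathlib import Path
from urllib.parse import unquote, urljoin, urlsplit

# Simplified pattern (an assumption): captures href/src attribute values.
URL_ATTR = re.compile(r"""(?:href|src)\s*=\s*["']([^"']+)""", re.IGNORECASE)

# Placeholder host used only to resolve relative links.
SITE = "http://site.invalid"


def norm_path(path):
    """Reduce a path to a lower-case, site-relative form such as '/dir/a.html'."""
    return "/" + path.replace("\\", "/").lstrip("/").lower()


def list_web_files(root):
    """Step 1: traverse the release directory and list every file."""
    root = Path(root)
    return {
        norm_path(p.relative_to(root).as_posix())
        for p in root.rglob("*")
        if p.is_file()
    }


def extract_urls(html, page_path):
    """Step 2: collect the site-internal paths referenced by one page."""
    found = set()
    for ref in URL_ATTR.findall(html):
        # Resolve relative links against the page that contains them.
        parts = urlsplit(urljoin(SITE + page_path, ref))
        if parts.scheme == "http" and parts.netloc == "site.invalid":
            found.add(norm_path(unquote(parts.path)))
    return found


def classify(web_files, referenced, logged):
    """Steps 3 and 4: find isolated files, then split them by log presence."""
    isolated = web_files - referenced   # files never referenced by any page
    riwf = isolated & logged            # isolated, but accessed per the log
    aiwf = isolated - logged            # isolated and never accessed
    return riwf, aiwf
```

For example, if `/old/c.html` and `/old/d.html` are on disk, neither is referenced by any page, and only `/old/c.html` appears in the log, then `classify` places `/old/c.html` in the RIWF set and `/old/d.html` in the AIWF set. Set difference and intersection are used here in place of the database comparison described in the text; a real deployment would also need to handle links generated by scripts, which this pattern does not detect.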