| Drug design has emerged alongside the development of medicinal chemistry and is mainly applied to structure-based drug discovery and lead compound optimization. Virtual screening is a drug design method that extends innovative drug research and represents a new technology for drug design. It uses computation, with three-dimensional pharmacophore models or molecular docking, to find potentially active compounds in a database. A screening run often requires several million ligands. As computational science, computational methods, and computer performance have advanced, the speed of virtual screening has continuously improved. At the same time, access to the compound database has become a major constraint on virtual screening. A database management tool designed for large-scale virtual screening can therefore improve the speed and efficiency of drug discovery. The number of small molecules in a virtual screening database is often enormous: a molecular database normally contains tens of millions of molecules. In addition, the complexity of the compounds' storage structures contributes to the huge size of the database. The traditional centralized approach to data has become the bottleneck of the whole system because of its poor scalability and low reliability, so processing the data in a distributed manner becomes inevitable. This work was carried out to address these issues. The paper first explains the background of the research. It then introduces typical chemical databases and chemical database management tools. Thirdly, it studies compound data processing techniques, including molecular property calculation, format conversion, molecular editing, and 2D/3D structure display.
Finally, the paper focuses on distributed database technology and the distributed database proxy framework Amoeba, and on this basis presents the design and implementation of compound retrieval, substructure queries, and the building and downloading of data subsets. |