| In recent years, rapid growth in computing power has driven rapid progress in intelligent algorithms, and artificial intelligence is now widely used across many fields. In drug research and development, as experimental data continue to accumulate, artificial intelligence techniques represented by deep learning (DL) have advanced considerably and are gradually being applied to many aspects of new drug research and development, giving new impetus to innovative drug research. De novo drug design is an important part of drug development; it uses molecular generation techniques to produce active molecules with specific physicochemical and pharmacological properties. In recent years, using DL to generate desired drug molecules has become a hot topic in new drug development and has attracted attention from both academia and pharmaceutical companies. Although DL-based molecular generation methods have made great progress, shortcomings remain. This dissertation carries out methodological research on current problems in molecular generation. The main research contents and results are as follows. (1) To address the low success rate and low novelty of multi-objective molecular generation, a multi-constraint molecular generation (MCMG) model is proposed. MCMG uses a conditional Transformer as its backbone and adjusts the model's probability distribution through knowledge distillation and reinforcement learning. Experimental results show that, on two multi-objective tasks, the real success rates of molecules generated by MCMG are 89.2% and 70.9%, respectively, which are 16.4% and 19.2% higher than those of existing models. The diversity of the generated molecules is also improved. (2) To address the problem that activity-label data for molecules are scarce and the
cost of activity labeling is high, a fragment-based multi-target molecule generation model called Frag-G/M is proposed. The model first trains a conditional Transformer conditioned on physicochemical properties and uses it to generate molecules with the desired physicochemical properties; these molecules are then fragmented, and the fragments are used to train a recurrent neural network-based fragment generation model; finally, reinforcement learning is used for fine-tuning to obtain molecules with multiple desired properties. Experimental results show that the number of activity labels used is reduced by nearly 50% compared with existing models, so the model is expected to be applicable in real drug discovery scenarios. (3) To address the low synthesizability of generated molecules, a new molecule generation algorithm, ChemistGA, and its improved variant, R-ChemistGA, are proposed by combining traditional heuristic algorithms with DL algorithms. ChemistGA replaces the traditional crossover operation with chemical reaction-based crossover and introduces a backcrossing strategy to increase the diversity of generated molecules. Experimental results show that, on two multi-objective tasks, the synthesizable rates of the generated molecules are 22.5% and 47.8% higher than those of existing genetic algorithm-based methods. Compared with DL-based methods, the novelty of the generated molecules is improved by 67.2% and 37.2%. (4) To overcome the low accuracy of atomic charge prediction for generated molecules, an atom-path descriptor (APD) algorithm is proposed for extracting three-dimensional atomic features, and APD-based atomic charge prediction models are developed. Experimental results show that the prediction error is reduced by 41% compared with existing methods. Furthermore, to reduce the dependence of atomic charge prediction methods on manually predefined atomic properties, a graph neural network-based charge prediction model, Deep Atomic Charge,
is designed. The model dynamically learns representations between atoms without manually defined input features; the average size of the trained model is only 1/300 of that of existing models, and the average prediction error of atomic charges is reduced by about 30%. The molecular generation and atomic charge prediction methods proposed in this dissertation provide new methods for drug design. |