Everyday living environments contain a large number of toxic substances that seriously harm human health and development. The study of compound toxicity is also essential to drug development: safety problems such as compound toxicity and side effects have become the main reasons drug development fails. However, compound molecular datasets contain few labeled samples, and toxic molecules are especially scarce, so the classes are imbalanced, which degrades classification performance. Molecular generation can expand the original data with new molecular structures, but most existing generation methods sample at random without specifying the category of the generated molecules, and their results leave room for improvement. Moreover, existing molecular toxicity classification methods do not extract features well suited to the characteristics of compound molecular data, nor do they make full use of unlabeled data, so their accuracy can still be improved.

First, this paper proposes a molecule generation method based on an improved auxiliary classifier generative adversarial network (ACGAN). The encoder of an autoencoder (AE) extracts feature vectors from the molecules, and the ACGAN is trained on these vectors, so it can focus on optimizing sampling without having to handle SMILES string syntax. The ACGAN is improved in two ways: its loss function is based on the Wasserstein distance to address mode collapse, and its discriminator structure is optimized. In addition, real unlabeled data is added during training to strengthen the discriminator, so that the unlabeled data is put to better use. Finally, the generator's output is decoded by the AE to obtain new compound molecular structures.

Second, a molecular toxicity classification method based on an improved capsule network (CapsNet) is proposed. The compound molecular data is first converted to a suitable data type, and an improved stacked sparse autoencoder (SSAE) then extracts features from the high-dimensional molecular data. Because the sparsity parameter affects classification performance, particle swarm optimization (PSO) is used to find its optimal value. The dynamic routing in CapsNet is also improved by optimizing how its parameters are set and updated, which improves CapsNet's performance and reduces its time overhead. Finally, the extracted feature vectors are fed into the improved CapsNet, whose output categories give the toxicity classification of each compound.

Finally, comparative experiments show that the proposed molecular generation method performs well, improves on existing methods, and can generate novel compound molecules. Adding these generated structures to the original dataset expands the minority categories and addresses the class imbalance. The proposed toxicity classification method achieves good results on several evaluation metrics, and its accuracy improves further when the expanded molecular data is used.
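The abstract does not say how SMILES strings are presented to the AE. One common choice for SMILES autoencoders, assumed in this minimal sketch, is to split each string into tokens and one-hot encode them to a fixed length. The regular expression (which treats only bracket atoms, `Cl`, and `Br` as multi-character tokens), the padding token, and all function names are hypothetical.

```python
import re

# Hypothetical, simplified tokenizer: bracket atoms such as [NH4+] and the
# two-letter halogens Cl/Br are single tokens; every other character is a token.
TOKEN_RE = re.compile(r"\[[^\]]+\]|Br|Cl|.")
PAD = " "  # hypothetical padding token


def tokenize(smiles):
    return TOKEN_RE.findall(smiles)


def build_vocab(smiles_list):
    """Padding token first, then every token seen in the data, sorted."""
    tokens = {tok for s in smiles_list for tok in tokenize(s)}
    return [PAD] + sorted(tokens)


def one_hot(smiles, vocab, max_len):
    """Return max_len rows of 0/1 values, one row per (padded) token."""
    tokens = tokenize(smiles)
    if len(tokens) > max_len:
        raise ValueError("SMILES string is longer than max_len")
    tokens += [PAD] * (max_len - len(tokens))
    index = {tok: i for i, tok in enumerate(vocab)}
    rows = []
    for tok in tokens:
        row = [0] * len(vocab)
        row[index[tok]] = 1
        rows.append(row)
    return rows
```

For example, `build_vocab(["CCO", "CC(Cl)Br"])` yields seven entries including the padding symbol, and `one_hot("CCO", vocab, 8)` returns eight rows, the last five of which encode padding.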
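The improved ACGAN objective described in the abstract can be sketched as follows, under stated assumptions: the adversarial term is the standard WGAN critic estimate of the Wasserstein distance, the Lipschitz constraint uses the original WGAN weight clipping (the abstract does not say which method is used), and unlabeled real molecules enter only the real-versus-fake term because they carry no class label. Critic scores and class probabilities are plain numbers here so the arithmetic is visible; all function names are hypothetical.

```python
import math


def mean(xs):
    return sum(xs) / len(xs)


def class_nll(class_probs, labels):
    """Auxiliary-classifier loss: mean negative log-probability of the true class."""
    return mean([-math.log(p[y]) for p, y in zip(class_probs, labels)])


def critic_loss(real_scores, fake_scores):
    """WGAN critic loss: minimizing it maximizes E[D(real)] - E[D(fake)]."""
    return mean(fake_scores) - mean(real_scores)


def discriminator_loss(labeled_scores, unlabeled_scores, fake_scores,
                       labeled_probs, labels, fake_probs, fake_labels):
    # Assumption: unlabeled real samples have no class, so they only enter the
    # Wasserstein (real vs. fake) term; labeled and generated samples also
    # train the auxiliary classifier, as in the original ACGAN.
    adversarial = critic_loss(labeled_scores + unlabeled_scores, fake_scores)
    auxiliary = class_nll(labeled_probs, labels) + class_nll(fake_probs, fake_labels)
    return adversarial + auxiliary


def generator_loss(fake_scores, fake_probs, fake_labels):
    # The generator wants high critic scores and samples that the auxiliary
    # classifier assigns to the requested class.
    return -mean(fake_scores) + class_nll(fake_probs, fake_labels)


def clip_weights(weights, c=0.01):
    """Original WGAN weight clipping (assumed here) to keep the critic Lipschitz."""
    return [max(-c, min(c, w)) for w in weights]
```

The discriminator and generator minimize their respective losses in alternation; conditioning the generator on a class label, such as the toxic class, is what lets generation target a chosen category.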
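In the standard sparse autoencoder formulation, assumed here, the sparsity parameter tuned in the classification method is the target average activation ρ of each hidden unit, and each layer of the stack adds a KL-divergence penalty β Σ_j KL(ρ ‖ ρ̂_j) that pulls each unit's observed average activation ρ̂_j toward ρ. The paper's improvement to the SSAE is not specified, so this minimal sketch shows only the baseline per-layer objective (without weight decay); `beta`, the default values, and the function names are assumptions.

```python
import math


def kl_sparsity(rho, rho_hat):
    """KL divergence between Bernoulli(rho) and Bernoulli(rho_hat)."""
    return (rho * math.log(rho / rho_hat)
            + (1 - rho) * math.log((1 - rho) / (1 - rho_hat)))


def sparsity_penalty(hidden_activations, rho=0.05, beta=3.0):
    """hidden_activations: one list per sample of hidden activations in (0, 1)."""
    n = len(hidden_activations)
    n_hidden = len(hidden_activations[0])
    # Average activation of each hidden unit over the batch.
    rho_hats = [sum(sample[j] for sample in hidden_activations) / n
                for j in range(n_hidden)]
    return beta * sum(kl_sparsity(rho, r) for r in rho_hats)


def ssae_layer_loss(inputs, reconstructions, hidden_activations, rho=0.05, beta=3.0):
    """Reconstruction error plus sparsity penalty for one layer of the stack."""
    mse = sum((x - y) ** 2
              for xs, ys in zip(inputs, reconstructions)
              for x, y in zip(xs, ys)) / (2 * len(inputs))
    return mse + sparsity_penalty(hidden_activations, rho, beta)
```

The penalty is zero when every unit's average activation equals ρ and grows as the activations drift away from it, which is why the choice of ρ influences the extracted features.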
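The PSO search for the sparsity parameter can be sketched as a one-dimensional, global-best particle swarm. In the paper the objective would presumably be the classification error obtained with a given sparsity value; here a stand-in quadratic with its minimum at 0.1 takes that role. The inertia weight `w`, acceleration coefficients `c1` and `c2`, swarm size, iteration count, and search bounds are illustrative assumptions, not values from the paper.

```python
import random


def pso_minimize(objective, lo, hi, n_particles=20, n_iters=100,
                 w=0.7, c1=1.5, c2=1.5, seed=0):
    """Global-best PSO over a single scalar parameter constrained to [lo, hi]."""
    rng = random.Random(seed)
    pos = [rng.uniform(lo, hi) for _ in range(n_particles)]
    vel = [0.0] * n_particles
    best_pos = pos[:]                          # each particle's personal best
    best_val = [objective(x) for x in pos]
    g = min(range(n_particles), key=lambda i: best_val[i])
    g_pos, g_val = best_pos[g], best_val[g]    # swarm's global best
    for _ in range(n_iters):
        for i in range(n_particles):
            r1, r2 = rng.random(), rng.random()
            # Inertia + pull toward personal best + pull toward global best.
            vel[i] = (w * vel[i]
                      + c1 * r1 * (best_pos[i] - pos[i])
                      + c2 * r2 * (g_pos - pos[i]))
            pos[i] = min(hi, max(lo, pos[i] + vel[i]))  # keep inside bounds
            val = objective(pos[i])
            if val < best_val[i]:
                best_pos[i], best_val[i] = pos[i], val
                if val < g_val:
                    g_pos, g_val = pos[i], val
    return g_pos, g_val


# Stand-in objective (hypothetical): pretend validation error is lowest at rho = 0.1.
best_rho, best_err = pso_minimize(lambda rho: (rho - 0.1) ** 2, 0.01, 0.5)
```

Because each objective evaluation in the real setting would mean training the feature extractor, the swarm size and iteration count dominate the cost of this search.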
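The abstract improves CapsNet's dynamic routing without detailing the change, so this sketch shows only the baseline routing-by-agreement procedure of the original CapsNet that such an improvement would modify. `u_hat[i][j]` is input capsule i's prediction vector for output capsule j; the default of three routing iterations and the helper names are assumptions. The predicted class is the output capsule with the longest vector.

```python
import math


def squash(s):
    """Capsule nonlinearity: keeps the direction, maps the length into [0, 1)."""
    norm_sq = sum(x * x for x in s)
    if norm_sq == 0.0:
        return [0.0] * len(s)
    scale = norm_sq / (1.0 + norm_sq) / math.sqrt(norm_sq)
    return [scale * x for x in s]


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dynamic_routing(u_hat, n_iters=3):
    """Route predictions u_hat[i][j] from input capsules i to output capsules j."""
    n_in, n_out, dim = len(u_hat), len(u_hat[0]), len(u_hat[0][0])
    b = [[0.0] * n_out for _ in range(n_in)]   # routing logits
    for it in range(n_iters):
        c = [softmax(row) for row in b]        # coupling coefficients, sum to 1 over j
        v = []
        for j in range(n_out):
            s = [sum(c[i][j] * u_hat[i][j][d] for i in range(n_in)) for d in range(dim)]
            v.append(squash(s))
        if it < n_iters - 1:
            for i in range(n_in):
                for j in range(n_out):
                    # Agreement (dot product) between prediction and output
                    # increases the coupling of input i to output j.
                    b[i][j] += sum(x * y for x, y in zip(u_hat[i][j], v[j]))
    return v


def capsule_lengths(v):
    return [math.sqrt(sum(x * x for x in vec)) for vec in v]
```

Each routing iteration costs a full pass over all input/output capsule pairs, which is why the number of iterations and the update of the routing parameters directly affect the time overhead mentioned in the abstract.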
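Because the ACGAN generates molecules of a requested class, the data expansion step mainly comes down to deciding how many generated molecules to add to each minority class. This minimal sketch assumes the goal is to bring every class up to the size of the largest one, which the abstract does not state explicitly; `samples_needed`, `expand_dataset`, and the 0/1 label encoding are hypothetical.

```python
from collections import Counter


def samples_needed(labels):
    """Number of generated molecules each class needs to match the largest class."""
    counts = Counter(labels)
    target = max(counts.values())
    return {cls: target - n for cls, n in counts.items()}


def expand_dataset(smiles, labels, generated):
    """Append class-conditioned generated molecules to the minority classes.

    generated: dict mapping a class label (e.g. 1 = toxic, a hypothetical
    encoding) to a list of generated SMILES strings for that class.
    """
    need = samples_needed(labels)
    new_smiles, new_labels = list(smiles), list(labels)
    for cls, k in need.items():
        extra = generated.get(cls, [])[:k]  # at most k; fewer if not enough generated
        new_smiles += extra
        new_labels += [cls] * len(extra)
    return new_smiles, new_labels
```

With four non-toxic and one toxic molecule, `samples_needed` reports that the toxic class needs three more samples, and `expand_dataset` appends the first three generated toxic molecules.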