
Research On Molecular Virtual Screening Algorithm Based On Block Structure And BERT Algorithm

Posted on: 2024-06-03
Degree: Master
Type: Thesis
Country: China
Candidate: H Z Li
GTID: 2530307058476154
Subject: Signal and Information Processing
Abstract/Summary:
Molecular virtual screening is a technique that uses computer simulation to screen a compound library for potentially active molecules. Its research background stems from the needs of modern life science and drug research. Traditional drug research screens biologically active compounds through compound synthesis and laboratory testing, which is time-consuming and expensive. In contrast, with the development of computer technology and bioinformatics, molecular virtual screening can rapidly identify biologically active compounds in compound libraries, which is of great importance for accelerating drug discovery. The technique is usually based on computer simulation of molecular structures. Combined with big data technology and machine learning methods, it predicts the biological activity and efficacy of compounds by calculating the interactions between molecules, thereby assisting the design and development of new drugs. Molecular virtual screening is now widely used in drug research and has brought new ideas and methods to the field.

As the complexity and volume of molecular virtual screening tasks increase, conventional machine learning models that map the relationship between molecular structure and target properties can no longer meet current needs, and many deep learning models are beginning to show their strengths in related tasks. Current molecular virtual screening experiments face the following improvement goals: (1) to use molecular information efficiently and make full use of the key information in it, while avoiding overfitting; (2) to reduce redundant information in the experiments and limit its impact on the results; (3) to improve the speed of molecular virtual screening experiments on large datasets and enhance the ability of deep learning models to adapt to complex datasets.

To address these issues, this thesis designs two deep learning models that build on existing work. A total of eight publicly available datasets are used for molecular property classification experiments. The two models achieve a 2%-7% improvement over the existing models used for comparison in most of the experiments, and their ability to classify molecules is significantly improved. The main work of this thesis is as follows.

(1) Among the models proposed so far, the traditional attention mechanism used by GAT cannot attend to the relationships among the global information involved in information aggregation, and with its atom-centered message passing, molecular information is collected repeatedly in loops, which increases redundant information and reduces the efficiency with which molecular information is used. SAMPN applies self-attention only when aggregating information across the nodes of a molecule and omits it from the aggregation performed at each individual node. To reduce redundant molecular information, this thesis adopts message passing centered on directed bonds, whose unidirectional transfer of molecular information avoids collecting the same information more than once. To give key information a greater role and improve the model's ability to predict molecular properties, the thesis incorporates a self-attention mechanism into the aggregation of molecular information. In particular, for directed-bond-centered message passing, self-attention is added to the single-node aggregation operation, which improves classification performance by adjusting the weights of different parts of the molecular information and, above all, increasing the weight of key information. The model is evaluated in classification experiments on eight publicly available datasets, including Tox21, and the results are compared with those of other methods to demonstrate the advantages of the approach and the effectiveness of the network design.

(2) BGNN uses a block structure of graph neural networks in its model design for molecular virtual screening, but it relies only on simple superposition and concatenation to transfer and aggregate molecular information during message passing, and the Set2Set operation it uses for molecular information readout can also be improved. Because neural network blocks increase the depth of a model and improve its capability, the model in this thesis takes the graph neural network block structure as its basis. Within each block, the model adds residual connections and applies edge-level or node-level batch normalization according to the molecular information aggregation operation involved. These operations alleviate the overfitting that model complexity may cause and keep the distribution of molecular information stable during information collection. To extract richer molecular features, the model adopts a multi-head attention mechanism in both the message-passing process and the molecular information aggregation process, and a BERT module in the molecular information readout. All three of these improvements use multiple attention heads to attend to molecular information from different directed chemical bonds and different nodes. Compared with a single attention head, multiple heads can each focus on different molecular features, which improves the feature extraction capability of the model. The model is evaluated in classification experiments on eight publicly available datasets, including Tox21, and the effectiveness of the framework design is demonstrated by comparing the results with those of other methods.
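
The directed-bond message passing and attention-weighted node aggregation described in item (1) can be illustrated with a small sketch. This is not the thesis's implementation: the function names (`directed_bonds`, `message_passing`, `attention_aggregate`), the toy one-hot atom features, and the parameter-free dot-product attention are all assumptions chosen for illustration, and a real model would use learned weight matrices and nonlinearities.

```python
import math


def directed_bonds(bonds):
    """Expand each undirected bond (u, v) into two directed bonds."""
    edges = []
    for u, v in bonds:
        edges.append((u, v))
        edges.append((v, u))
    return edges


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def message_passing(atom_feats, bonds, steps=2):
    """Hypothetical directed-bond message passing (no learned weights).

    The state of bond u->v starts as the features of atom u. At each step it
    receives the states of bonds w->u for every neighbour w of u except v,
    so information sent along u->v is never echoed straight back.
    """
    edges = directed_bonds(bonds)
    h0 = {(u, v): list(atom_feats[u]) for u, v in edges}
    h = dict(h0)
    for _ in range(steps):
        new_h = {}
        for (u, v) in edges:
            msg = list(h0[(u, v)])
            for (w, t) in edges:
                if t == u and w != v:  # incoming to u, excluding reverse bond
                    msg = [a + b for a, b in zip(msg, h[(w, t)])]
            new_h[(u, v)] = msg
        h = new_h
    return h


def attention_aggregate(atom_feats, h):
    """Aggregate incoming bond states per atom with dot-product attention.

    Assumption: the atom's own features act as the query and each incoming
    bond state as both key and value.
    """
    dim = len(atom_feats[0])
    pooled = []
    for v, x_v in enumerate(atom_feats):
        incoming = [vec for (u, t), vec in h.items() if t == v]
        if not incoming:
            pooled.append(list(x_v))
            continue
        scores = [sum(a * b for a, b in zip(vec, x_v)) / math.sqrt(dim)
                  for vec in incoming]
        weights = softmax(scores)
        pooled.append([sum(w * vec[i] for w, vec in zip(weights, incoming))
                       for i in range(dim)])
    return pooled


# Toy three-atom chain 0-1-2 with one-hot atom features.
atoms = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
bonds = [(0, 1), (1, 2)]
bond_states = message_passing(atoms, bonds, steps=2)
atom_states = attention_aggregate(atoms, bond_states)
```

In this toy chain, the state of bond 1->0 picks up information from atom 2 but never from atom 0 itself, which is the redundancy-avoiding property the abstract attributes to unidirectional transfer. The attention weights then decide how much each incoming bond contributes to an atom's representation.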
Keywords/Search Tags:Molecular virtual screening, Deep learning, Graph convolutional neural network, Multi-head attention mechanism