
Research Of Mining Algorithm For Novel Enzyme

Posted on: 2020-04-26 | Degree: Master | Type: Thesis
Country: China | Candidate: Y Y Pan
GTID: 2370330596976509 | Subject: Engineering
Abstract/Summary:
Halohydrin dehalogenases are an important class of enzymes. They not only catalyze the degradation of toxic contaminants through ring-opening reactions, but can also use nucleophiles to produce high-value pharmaceutical intermediates. Halohydrin dehalogenases are extremely rare in nature, and halohydrin dehalogenase activity has been found in only a few strains. Discovering them through biological experiments is feasible but costly and inefficient. It is therefore particularly urgent to mine novel halohydrin dehalogenases efficiently, starting from the known ones, and to enrich the existing halohydrin dehalogenase datasets. Meanwhile, deep generative models have achieved remarkable results in image processing, speech recognition and text generation, but their application to biological sequence generation still lags far behind.

To address these problems, this thesis proposes a new approach. Deep generative models generate new halohydrin dehalogenase sequences, and a discriminative model mines halohydrin dehalogenases from them. We first constructed a halohydrin dehalogenase dataset, then recognized the motifs of halohydrin dehalogenases, and then applied deep generative models to generate novel halohydrin dehalogenase sequences. Finally, a classifier was used to identify the 'true' halohydrin dehalogenase sequences. The details are as follows:

1) A motif recognition method for protein sequences that takes the discriminative power of each motif into account. The MEME algorithm was used to identify the motifs contained in the positive samples. The discriminative features (MSC, MOR and MRE) of the identified short motifs were then calculated and used to filter the motif set.

2) Deep generative models applied to halohydrin dehalogenase sequence generation. An LSTM neural network was used first, but the sequences it generated had poor diversity and were too short. The SeqGAN model was then applied to the task, with a feedback-loop architecture added to optimize the synthetic protein sequences. This model outperforms the LSTM in sequence diversity, but the lengths of its generated sequences still have a low mean and an excessive standard deviation. To solve this problem, this thesis implements the LeakGAN model with a feedback-loop architecture. The results demonstrate that LeakGAN with the feedback loop generates biologically meaningful halohydrin dehalogenase sequences efficiently and effectively.

3) A g-gap feature tree was constructed to extract features, which were then selected and discretized. Finally, a Multinomial Naive Bayes model was used to predict which of the sequences generated by the LeakGAN model with the feedback loop are halohydrin dehalogenases.
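Step 1) filters the motifs found by MEME according to how well they separate positive from negative sequences. The abstract names the discriminative features (MSC, MOR and MRE) but does not define them, so the sketch below does not reproduce them. It is a minimal illustration built on several assumptions:

* Each motif is written as a Python regular expression. This is a hypothetical conversion from MEME's output.
* A single hypothetical enrichment score stands in for the thesis's features. The score is the log ratio of the motif's occurrence rate in positive versus negative sequences.
* The toy sequences and the `min_score` cut-off are invented.

```python
import math
import re
from typing import Dict, List


def occurrence_rate(motif_regex: str, sequences: List[str]) -> float:
    """Fraction of sequences containing at least one match of the motif."""
    pattern = re.compile(motif_regex)
    return sum(1 for s in sequences if pattern.search(s)) / len(sequences)


def enrichment_score(motif_regex: str, positives: List[str],
                     negatives: List[str], pseudo: float = 1e-3) -> float:
    """Hypothetical discriminative score (NOT the thesis's MSC/MOR/MRE):
    log ratio of positive vs. negative occurrence rates, with a pseudocount."""
    p = occurrence_rate(motif_regex, positives)
    n = occurrence_rate(motif_regex, negatives)
    return math.log((p + pseudo) / (n + pseudo))


def filter_motifs(motifs: List[str], positives: List[str],
                  negatives: List[str], min_score: float = 1.0) -> Dict[str, float]:
    """Keep only motifs whose enrichment score reaches min_score."""
    scores = {m: enrichment_score(m, positives, negatives) for m in motifs}
    return {m: s for m, s in scores.items() if s >= min_score}


# Toy data, invented for illustration only.
positives = ["MKTSGNYLA", "AAKTSGNWLV", "KTSGNYLLL"]
negatives = ["MAAAAALLL", "GGGKTAAAA", "PPPPKTSGQ"]
kept = filter_motifs(["KTSGN", "LL", "A{4}"], positives, negatives)
print(sorted(kept))  # ['KTSGN']
```

With these toy sequences only "KTSGN" survives:

* "KTSGN" occurs in every positive sequence and in no negative one.
* "LL" is equally common in both sets, so its score is 0.
* "A{4}" appears only in negatives.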
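Step 2) adds a feedback loop to SeqGAN and LeakGAN, but the abstract does not give its update rule. The sketch below assumes a scheme like the one in Feedback GAN (FBGAN):

* After an epoch, an external analyzer scores the generated sequences.
* Sequences scoring above a threshold replace the oldest sequences in the discriminator's training set.
* Over time, the discriminator comes to treat high-scoring synthetic sequences as "real".

The function name, threshold, and toy analyzer are all hypothetical. The toy analyzer merely checks for a motif string. In practice the analyzer could be a classifier such as the one described in step 3).

```python
from collections import deque
from typing import Callable, Deque, List


def feedback_update(
    training_set: Deque[str],
    generated: List[str],
    analyzer: Callable[[str], float],
    threshold: float,
    max_replace: int,
) -> int:
    """Feed high-scoring generated sequences back into the training set.

    `training_set` must be a deque with a fixed `maxlen`, so each append
    evicts the oldest sequence. Returns how many sequences were fed back.
    """
    if training_set.maxlen is None:
        raise ValueError("training_set needs a maxlen")
    # Score once, then rank; sorting is stable, so ties keep generation order.
    scored = sorted(((analyzer(s), s) for s in generated),
                    key=lambda pair: pair[0], reverse=True)
    accepted = [s for score, s in scored if score >= threshold][:max_replace]
    for seq in accepted:
        training_set.append(seq)  # evicts the oldest entry at the left end
    return len(accepted)


# Hypothetical analyzer: 1.0 if a toy "conserved" motif is present.
def toy_analyzer(seq: str) -> float:
    return 1.0 if "KTSGN" in seq else 0.0


train = deque(["AAA", "BBB", "CCC"], maxlen=3)
fed_back = feedback_update(train, ["MKTSGNY", "GGGG", "KTSGNLL"], toy_analyzer, 0.5, 2)
print(fed_back, list(train))  # 2 ['CCC', 'MKTSGNY', 'KTSGNLL']
```

A `deque` with `maxlen` keeps the training-set size constant without explicit bookkeeping. The oldest entries are always the ones displaced.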
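Step 2) compares the generators by the diversity of their output and by the mean and standard deviation of sequence length. The abstract does not state which diversity measure was used. The sketch below assumes two simple ones:

* the fraction of unique sequences;
* the ratio of distinct to total k-mers (k = 3 here, a hypothetical choice).

These are reported alongside the length mean and the population standard deviation. The batch of sequences is invented.

```python
import statistics
from typing import Dict, List


def generation_stats(seqs: List[str], k: int = 3) -> Dict[str, float]:
    """Length and diversity summary for a batch of generated sequences."""
    lengths = [len(s) for s in seqs]
    # All overlapping k-mers across the whole batch.
    kmers = [s[i:i + k] for s in seqs for i in range(len(s) - k + 1)]
    return {
        "mean_length": statistics.mean(lengths),
        "std_length": statistics.pstdev(lengths),  # population standard deviation
        "unique_fraction": len(set(seqs)) / len(seqs),
        "distinct_kmer_ratio": len(set(kmers)) / len(kmers) if kmers else 0.0,
    }


stats = generation_stats(["MKTA", "MKTA", "MKTAGG"])
print({name: round(value, 2) for name, value in stats.items()})
# {'mean_length': 4.67, 'std_length': 0.94, 'unique_fraction': 0.67, 'distinct_kmer_ratio': 0.5}
```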
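Step 3) extracts g-gap features, but the abstract does not describe how the "g-gap feature tree" is organized. The sketch therefore assumes the common g-gap dipeptide composition:

* For each gap g, count every ordered amino-acid pair whose residues have exactly g residues between them. This gives 400 counts per gap.
* Concatenate the counts for g = 0 to `max_gap`. The `max_gap` parameter is a hypothetical name.

Raw counts are already non-negative integers, which is the input a Multinomial Naive Bayes model expects. The thesis's own feature-selection and discretization steps are not reproduced.

```python
from itertools import product
from typing import Dict, List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
PAIRS = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # 400 ordered pairs


def g_gap_counts(seq: str, g: int) -> Dict[str, int]:
    """Count residue pairs (seq[i], seq[i + g + 1]), i.e. g residues in between."""
    counts = dict.fromkeys(PAIRS, 0)
    for i in range(len(seq) - g - 1):
        pair = seq[i] + seq[i + g + 1]
        if pair in counts:  # skip pairs with non-standard residues such as X
            counts[pair] += 1
    return counts


def g_gap_features(seq: str, max_gap: int) -> List[int]:
    """Concatenate the 400 g-gap counts for every g from 0 to max_gap."""
    features: List[int] = []
    for g in range(max_gap + 1):
        counts = g_gap_counts(seq, g)
        features.extend(counts[p] for p in PAIRS)
    return features


vec = g_gap_features("ACDA", max_gap=1)
print(len(vec), sum(vec[:400]), sum(vec[400:]))  # 800 3 2
```

For "ACDA", g = 0 yields the pairs AC, CD and DA, and g = 1 yields AD and CA.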
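The final classifier in step 3) is a Multinomial Naive Bayes model. The thesis may well have used a library implementation such as scikit-learn's `MultinomialNB`. The self-contained class below is a hypothetical stand-in that only illustrates the textbook rule:

* per-class log priors;
* add-alpha (Laplace) smoothed per-feature log likelihoods;
* prediction by the largest unnormalized log posterior.

The toy count vectors and class labels are invented.

```python
import math
from typing import Dict, List, Sequence


class MultinomialNB:
    """Minimal multinomial Naive Bayes with add-alpha (Laplace) smoothing."""

    def __init__(self, alpha: float = 1.0) -> None:
        self.alpha = alpha
        self.classes_: List[str] = []
        self.log_prior_: Dict[str, float] = {}
        self.log_likelihood_: Dict[str, List[float]] = {}

    def fit(self, X: Sequence[Sequence[int]], y: Sequence[str]) -> "MultinomialNB":
        n_features = len(X[0])
        self.classes_ = sorted(set(y))
        for c in self.classes_:
            rows = [x for x, label in zip(X, y) if label == c]
            # log P(c): fraction of training samples with this label
            self.log_prior_[c] = math.log(len(rows) / len(X))
            # total count of each feature within class c
            totals = [sum(column) for column in zip(*rows)]
            denom = sum(totals) + self.alpha * n_features
            # log P(feature j | c) with add-alpha smoothing
            self.log_likelihood_[c] = [math.log((t + self.alpha) / denom) for t in totals]
        return self

    def predict(self, x: Sequence[int]) -> str:
        def log_posterior(c: str) -> float:
            # log P(c) + sum_j x_j * log P(j | c), up to a constant
            return self.log_prior_[c] + sum(
                count * w for count, w in zip(x, self.log_likelihood_[c])
            )
        return max(self.classes_, key=log_posterior)


X = [[3, 0, 1], [2, 0, 0], [0, 3, 1], [0, 2, 2]]
y = ["HHDH", "HHDH", "other", "other"]
model = MultinomialNB(alpha=1.0).fit(X, y)
print(model.predict([2, 0, 0]), model.predict([0, 2, 1]))  # HHDH other
```

In a real pipeline, X would hold discretized feature vectors, such as g-gap counts, for known halohydrin dehalogenases and negative sequences. `predict` would then be applied to each generated sequence.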
Keywords/Search Tags: Halohydrin dehalogenase, Motif recognition, Sequence generation, Sequence prediction