
Site-directed Integration By HIV-1 Integrase (S119D)/E2C Fusion Proteins And Prediction Of HIV-1 Integration Sites

Posted on: 2012-08-07
Degree: Doctor
Type: Dissertation
Country: China
Candidate: K K Su
Full Text: PDF
GTID: 1114330371956866
Subject: Biology
Abstract/Summary:
Our first aim was to construct fusion proteins of HIV-1 integrase (S119D) and the polydactyl zinc-finger protein E2C in order to achieve targeted integration. Our second aim was to establish a support vector machine model that predicts potential integration sites by learning from integration sites already obtained experimentally.

Methods
1. To study targeted integration, we introduced the S119D mutation into integrase and then fused the mutant to the polydactyl zinc-finger protein E2C, which binds a unique site in the human genome. The INS119D/E2C fusion proteins were packaged into virus in trans. After infection, we obtained the viral integration sites for these fusion proteins using biotin-labeled primers, streptavidin magnetic beads, ligation-mediated PCR and, finally, 454 sequencing.
2. To establish the HIV-1 integration site prediction model, we collected the most reliable sequences deposited in the Genome Survey Sequences database. Training and prediction were based on the support vector machine algorithm and carried out with LibSVM, Matlab and programs we wrote ourselves.

Results
1. Viral infectivity was not impaired by the integrase packaged in trans. Globally, the fusion proteins directed integration to certain sites (E2C sequence-like sites). In addition, the S119D mutation made the choice of local DNA sequence near integration sites less specific.
2. The HIV-1 integration prediction model achieved a prediction accuracy of 80% (AUC 0.8678). For a decent prediction, 500 to 1000 reliable integration sites are enough. Analysis of the distance to transcription start sites and of the GC content near the predicted integration sites showed that the predicted sites have characteristics similar to those obtained experimentally. Besides integration hot spots, we also predicted some integration cold spots, which cannot be obtained experimentally.
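To make the quantities reported in the results concrete, the following minimal sketch shows how accuracy, AUC and local GC content could be computed from classifier outputs. It is not the thesis code: the function names, the labels (+1 for integration site, -1 for control) and the decision threshold of 0 are assumptions chosen for illustration.

```python
def gc_content(seq):
    """Fraction of G or C bases in a DNA sequence (case-insensitive)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return sum(base in "GC" for base in seq) / len(seq)


def roc_auc(labels, scores):
    """AUC as the probability that a random positive example scores
    higher than a random negative one (ties count as 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y != 1]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def accuracy(labels, scores, threshold=0.0):
    """Fraction of examples whose thresholded score matches the label.
    The threshold of 0 is an assumption (sign of an SVM decision value)."""
    predicted = [1 if s > threshold else -1 for s in scores]
    return sum(p == y for p, y in zip(predicted, labels)) / len(labels)
```

The pairwise AUC above is quadratic in the number of examples, which is acceptable for a few thousand sites; a rank-based formulation would scale better on larger sets.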
Conclusions
Globally, targeted integration can be achieved by using fusion proteins of integrase and the polydactyl zinc-finger protein E2C. Locally, the S119D mutation in integrase slightly decreases its specificity for the DNA sequence at the integration site. The support vector machine method learned the known integration sites well and gave a good prediction. The support vector machine model can overcome some experimental obstacles and provides useful information for experimental design and verification.
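As an illustration of how integration-site sequences might be presented to LibSVM, the sketch below one-hot encodes a DNA window and writes it in LibSVM's sparse text format (`label index:value ...`, with 1-based indices in ascending order). The abstract does not describe the features actually used, so the one-hot encoding, the window contents and the helper names are hypothetical.

```python
BASES = "ACGT"


def one_hot_features(window):
    """Map a DNA window to {feature_index: 1}.
    Each position contributes four possible features, one per base;
    indices start at 1, as the LibSVM sparse format expects.
    Hypothetical encoding, not the thesis's feature set."""
    features = {}
    for pos, base in enumerate(window.upper()):
        if base in BASES:  # ambiguous bases such as N contribute no feature
            features[pos * 4 + BASES.index(base) + 1] = 1
    return features


def to_libsvm_line(label, features):
    """Format one example as a LibSVM line: '<label> <index>:<value> ...'."""
    parts = [f"{label:+d}"]
    parts += [f"{idx}:{val}" for idx, val in sorted(features.items())]
    return " ".join(parts)


# Example: a positive (integration site) window of four bases.
line = to_libsvm_line(1, one_hot_features("ACGT"))
```

Lines built this way, one per site, can be written to a text file and used as training input for LibSVM's command-line tools.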
Keywords/Search Tags:HIV-1, Integrase, E2C protein, targeted integration, Support vector machine