| Objective: HIV infection is a chronic viral infectious disease, and infected people usually have an asymptomatic period of 8-10 years. Identifying whether a newly diagnosed HIV-infected person was recently infected is important for condition assessment, prognosis, epidemic surveillance, and evaluation of intervention effects. The gold standard for diagnosing recent HIV infection is to obtain the exact time at which antibodies turned positive through close follow-up of HIV-negative high-risk groups. However, this approach cannot be widely used because it is time-consuming and labor-intensive and is subject to attrition bias and the effects of behavioural interventions. Serological methods are the most widely used methods for identifying recent HIV infection; commonly used techniques include the BED Capture Enzyme Immunoassay (BED) and the Limiting Antigen Avidity Enzyme Immunoassay (LAg-Avidity EIA). The BED method has been widely used in various countries, but its specific antibody levels are easily affected by disease progression, and the mean duration of recent infection (MDRI) is inconsistent across virus subtypes. These problems are partially mitigated by LAg-Avidity EIA, whose False Recent Rate (FRR) in chronic HIV infection is also lower than that of the BED method. However, its sensitivity and specificity are still not ideal. With the global application of HIV genotypic drug resistance testing, the Sequence Ambiguity Method (SAM) based on Sanger sequencing has also been used to determine recent HIV infection, but this method is highly susceptible to HIV-1 multiple infection. In addition, CD4+ T count and viral load (VL), as basic data on HIV-infected patients, are of some help in identifying recent HIV infection, but viral load results are less often available because the test is relatively expensive. In our team’s previous research, a combined model based on LAg-Avidity EIA and SAM was established, which improved the sensitivity and
specificity compared with LAg-Avidity EIA alone for identifying recent HIV-1 infection, but its applicability in real-world scenarios still needs to be improved. On this basis, through a systematic study of serial samples from an acute HIV infection research cohort, this study further optimized the recent HIV infection identification model and improved its diagnostic efficiency. In addition, a comprehensive judgment process for recent infection was established according to the completeness of the basic data on HIV-infected people, and a simulated application to city-scale data was carried out. Methods: 1. Study subjects: From 2009 to 2016, the First Affiliated Hospital of China Medical University recruited a research cohort of patients with acute HIV-1 infection. A total of 178 plasma samples from 88 people with acute HIV infection, together with laboratory data such as CD4+ T count, VL, drug resistance genotyping sequences, and next-generation sequencing sequences of the pol gene region (HXB2: 2868-3320 base pairs, bp), were selected to optimize the recent HIV infection identification model. In addition, this study collected the pol gene sequences, LAg-Avidity results, CD4+ T counts, and VL data of 2153 newly diagnosed HIV infections in Shenyang from 2016 to 2019, which were used to simulate the process and effect of identifying recent HIV infection. This study was approved by the Ethics Committee of the First Affiliated Hospital of China Medical University. 2. Determination of multiple HIV-1 infection: The viral quasispecies sequences of the pol gene region obtained by next-generation sequencing were analyzed. First, FastTree software was used to construct a Maximum Likelihood tree (ML tree) for preliminary screening; if the phylogenetic tree contained two or more clusters and the posterior probability value was greater than 0.8, the case was preliminarily classified as multiple infection between subtypes or suspected multiple infection within a subtype. If multiple infection within a subtype is suspected, the HIV sequence
database (https://www.hiv.lanl.gov/content/sequence/HIGHLIGHT/highlighter_top.html) online program can be used to visualize nucleotide polymorphisms and the evolutionary relationships between viruses, and to preliminarily determine whether they are multiple infections. Furthermore, a single phylogenetic tree can be constructed for the viral sequences obtained from samples at different time points; if the genetic distance between two strains exceeds 5% (the minimum genetic distance between the two strains is larger than the average genetic distance within one strain plus 2 times the standard deviation), the case is identified as a multiple infection. 3. Interpretation of HIV-1 pol gene mixed bases: For patients identified by multiple-infection screening as infected with a single strain, mixed bases were interpreted in the pol gene sequence (HXB2: 2253-3278 base pairs) obtained by drug resistance genotyping. Phylogenetic tree analysis, combined with reference sequences from the Los Alamos database, was used to exclude laboratory contamination. Sequencher 5.4 software was used to interpret mixed bases according to IUPAC rules. A base position in the chromatogram was interpreted as a mixed base if it satisfied any of the following conditions: (i) both the forward and reverse sequences clearly contain a secondary peak, and the secondary peak height exceeds twice the local noise peak height; (ii) only a unidirectional sequence fragment contains a secondary peak, and the secondary peak height is more than 30% of the main peak height and three times the local noise peak height; (iii) one sequence fragment contains a secondary peak that is more than 30% of the height of the primary peak, and the reverse sequence fragment also confirms the presence of this peak. Excel was used to calculate the number and proportion of mixed bases in each sequence. Proportion of Ambiguous Bases (PAB) = number of mixed bases / 1025 × 100%. 4. LAg-Avidity EIA: The LAg-Avidity EIA kit (Maxim Biomedical, Inc., USA) was used to obtain the Optical
Density (OD) value. The original OD value was then normalized to obtain the Normalized Optical Density (ODn), defined as the original OD value divided by the median OD value of the standard. If ODn ≤ 2.0 in the initial screening test, three confirmatory tests were performed, and ODn ≤ 1.5 was considered a recent HIV infection. 5. Statistical analysis: Logistic regression was used to construct joint models containing different combinations of the parameters PAB, ODn, CD4+ T count, and VL, so as to distinguish recent from long-term HIV infection. Receiver operating characteristic (ROC) curves were used to obtain the area under the ROC curve (AUC), sensitivity, specificity, and FRR of the different models and to evaluate their predictive performance. The chi-square test was used to compare the recent infection rates obtained by different methods of identifying recent infection. P<0.05 was considered statistically significant. Missing data were excluded from the analysis. All statistical analyses were performed using SPSS software (version 26.0). Results: 1. Basic information of the research subjects: As of September 2016, a total of 178 plasma samples collected at different time points from 88 patients with acute HIV infection were included in this study, together with the corresponding avidity assay ODn values, VL, and CD4+ T counts. The median ODn was 2.14 (interquartile range, IQR, 1.06, 2.79), the median VL was 4.38 log10 copies/mL (IQR, 3.86, 4.80 log10 copies/mL), and the median CD4+ T count was 441 cells/μL (IQR, 319, 587 cells/μL). After multiple infections were excluded by next-generation sequencing, a total of 42 pol gene sequences from single-strain infections were obtained, including 25 (59.5%) CRF01_AE, 9 (21.4%) subtype B, and 8 (19%) CRF07_BC. The test data from the 178 follow-up samples were used to establish the multiparameter models without the SAM method, and the 42 pol gene region sequences and corresponding test data were used to establish the multiparameter models containing the SAM method. 2. Obtain the SAM
method PAB cut-off value for distinguishing recent HIV infection: The pol gene sequences of 42 samples from single-strain infections were obtained in this study, with a median PAB of 0.48% (IQR, 0.19%, 1.12%). The 42 single-strain samples were divided into two groups, infection time > 1 year and ≤ 1 year, coded as 0 and 1, respectively, and ROC curve analysis was performed on PAB. The AUC was 0.832 (95% confidence interval, 95% CI: 0.687-0.977, P<0.001). At the maximum Youden index, the sensitivity and specificity were 88.9% and 66.7%, respectively, and the corresponding PAB threshold was 0.81%. That is, when PAB ≤ 0.81%, a multiparameter model including the PAB parameter can be used to identify recent HIV infection. 3. Construction of logistic regression models and establishment of the composite-model discrimination method for recent HIV infection. 3.1 LAg-Avidity EIA identifies the likelihood of recent HIV infection within 1 year: The ODn values and infection times of the 178 plasma samples included in this study were analyzed. Within one year of infection, the ODn value showed a linear upward trend with increasing infection time (P<0.05); an inflection point occurred at about 1 year, after which the increase became nonlinear. Therefore, this study tried to extend the detection window for recent HIV infection from six months to 1 year. The 178 plasma samples were divided into two groups, infection time > 1 year and ≤ 1 year, and ROC curve analysis was performed on the ODn values. The AUC was 0.901 (95% CI: 0.857-0.944, P<0.001). At the maximum Youden index, the sensitivity and specificity were 74.2% and 90.1%, the FRR was 9.9%, the false negative rate was 25.8%, and the corresponding ODn threshold was 1.995. We also defined recent HIV infection as infection within six months, according to the kit instructions. The AUC was 0.890 (95% CI: 0.842-0.938, P<0.001), and when the Youden index was maximized, the
sensitivity and specificity were 87.5% and 79.5%, respectively, the FRR was 20.5%, and the corresponding ODn threshold was 1.69. By comparison, when recent HIV infection was defined as infection within 1 year, the AUC was larger and the FRR was lower. Therefore, when only the ODn parameter is available, the ODn threshold of 1.995 is applied in this study to determine whether an infected person has a recent HIV infection (infection time ≤ 365 days). 3.2 Construction of binary logistic regression models for identifying recent HIV infection using different data combinations: In real-world application scenarios, it is often necessary to determine whether a newly diagnosed HIV-infected person was recently infected when the test information is incomplete. In this study, four parameters (PAB, ODn value, CD4+ T count, and VL) were combined as covariates, and five binary logistic regression models were constructed with recent infection (infection time ≤ 365 days) as the dependent variable to meet identification needs under different data conditions. Models A, B, and C contain the PAB parameter, whereas Models a and b do not. The parameters included in each model are: Model A: PAB, ODn, CD4+ T, VL; Model B: PAB, ODn, CD4+ T; Model C: PAB, ODn; Model a: ODn, CD4+ T, VL; Model b: ODn, CD4+ T. The corresponding AUCs were 0.973, 0.946, 0.938, 0.900, and 0.900, respectively. At the maximum Youden index, the sensitivity, specificity, and logit(P) threshold of each model were: Model A (100%, 93.3%, -0.174), Model B (88.9%, 93.3%, 0.211), Model C (92.6%, 86.7%, 0.031), Model a (72.2%, 91.4%, 0.871), and Model b (72.2%, 91.4%, 0.879). When only the ODn parameter is available, Model c (74.2%, 90.1%, ODn threshold 1.995) built in Results 3.1 is used to identify recent HIV infection. 3.3 Establishment of the discrimination process for recent HIV infection: In this study, a composite-model discrimination method was established. The process is as follows: if a newly diagnosed infected person has a genotypic resistance sequence and PAB ≤ 0.81%, Model A, B, or C can be
selected for identification according to the presence or absence of CD4+ T count and VL data. If PAB > 0.81% or no genotypic resistance sequence is available, Model a, b, or c can be selected according to the presence or absence of CD4+ T count and VL data. 4. Application effect of the multiparameter joint model: In this study, the data on 2153 newly diagnosed HIV infections in Shenyang from 2016 to 2019 were used to evaluate the application effect of the multiparameter joint model, including 670 cases in 2016, 374 cases in 2017, 418 cases in 2018, and 691 cases in 2019. The composite-model discrimination method was compared with LAg-Avidity EIA alone for determining recent infection in each year. From 2016 to 2019, the numbers of recent infections identified by the composite-model discrimination method, by LAg-Avidity EIA with a cut-off value of 1.5, and by LAg-Avidity EIA with a cut-off value of 1.995 were, respectively: 2016: 377 (56%) vs. 226 (34%) vs. 295 (44%); 2017: 244 (65%) vs. 128 (34%) vs. 194 (52%); 2018: 250 (60%) vs. 168 (40%) vs. 198 (47%); 2019: 320 (46%) vs. 196 (28%) vs. 232 (34%). The proportion of recent infections identified by the composite-model discrimination method was significantly higher than that identified with the LAg cut-off value of 1.5 (P<0.001). Conclusions: 1. SAM is affected by multiple infection when applied to the identification of recent infection; cases suitable for the composite discrimination process containing the PAB parameter can be screened by setting a PAB cut-off value, so that the recent infection identification model can be more accurate. 2. The construction of binary logistic joint models for recent HIV infection and the establishment of a discrimination process for recent HIV infection improved the diagnostic efficiency for recent HIV infection. 3. The composite-model discrimination process for recent HIV infection established in this study may reveal more recent HIV infections. |
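
The PAB formula in Methods 3 and the model-selection process in Results 3.3 can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names are hypothetical, and excluding N from the mixed-base count is an assumption. So is falling back to Model C or c when VL is available without a CD4+ T count, a case the abstract does not describe. The logistic regression coefficients are not given in the abstract, so the sketch only selects a model and applies the ODn-only rule of Model c. That rule assumes ODn at or below the threshold indicates recent infection, by analogy with the kit rule ODn ≤ 1.5.

```python
from typing import Optional

# IUPAC codes for two- or three-base mixtures. Excluding N is an assumption.
AMBIGUOUS_CODES = set("RYKMSWBDHV")
PAB_CUTOFF = 0.81   # percent, PAB threshold reported in Results 2
ODN_CUTOFF = 1.995  # ODn threshold of the ODn-only Model c (Results 3.1)


def proportion_ambiguous_bases(sequence: str, region_length: int = 1025) -> float:
    """PAB in percent: number of mixed bases / region_length * 100."""
    mixed = sum(1 for base in sequence.upper() if base in AMBIGUOUS_CODES)
    return mixed / region_length * 100


def select_model(pab: Optional[float], cd4: Optional[float], vl: Optional[float]) -> str:
    """Pick the model label from the data that are available.

    pab is None when no genotypic resistance sequence exists.
    Models A/B/C are used only when PAB <= 0.81%; otherwise a/b/c.
    """
    use_pab = pab is not None and pab <= PAB_CUTOFF
    if cd4 is not None and vl is not None:
        return "A" if use_pab else "a"
    if cd4 is not None:
        return "B" if use_pab else "b"
    # Assumption: VL without CD4 is treated like "neither available".
    return "C" if use_pab else "c"


def is_recent_by_odn(odn: float) -> bool:
    """Model c: ODn at or below the threshold is classified as recent (assumed direction)."""
    return odn <= ODN_CUTOFF


if __name__ == "__main__":
    pab = proportion_ambiguous_bases("ACGTR" * 5)  # 5 mixed bases (R)
    print(round(pab, 2), select_model(pab, cd4=441, vl=4.38))
```

Returning only a label keeps the sketch independent of the fitted coefficients, which would have to come from the full study before any logit(P) threshold could be applied.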