| Cohesin, a highly conserved protein complex, is ubiquitously found in eukaryotes and plays a pivotal role in regulating gene expression and shaping genomic topological structures. Cell type-specific chromatin loops mediated by cohesin are believed to be crucial in governing cell type-specific gene expression. Therefore, the investigation of these chromatin loops holds significant importance in comprehending the mechanisms underlying gene expression regulation and the development and maintenance of diverse cell types. In the field of biology, machine learning algorithms offer a cost-effective alternative to traditional biological identification methods. This article establishes a recognition model based on random forest, comparing its performance with various machine learning algorithms. Moreover, it utilizes the 15-state ChromHMM model to describe chromatin state information, facilitating the exploration of cohesin-mediated specific chromatin loops. To investigate specific chromatin loops between two cell types, the study divided extracted data on 12 types of cohesin-mediated loops into a training set and an independent test set. Subsequently, random forest, K-nearest neighbors, quadratic discriminant analysis, and naive Bayes algorithms were employed for prediction in the training set. Results revealed that the random forest model outperformed other models in the training set. The established random forest model was then applied to the test set. When utilizing only chromatin states as input features, the average AUROC in the test set was 0.844, with a range of 0.720 to 0.911. Additionally, by incorporating key factors (CTCF, RAD21, YY1, H3K27ac) and their frequencies between and within the anchor points of the chromatin loop as features, the model achieved an average AUROC of 0.911 in the test set, ranging from 0.850 to 0.960. In the study of specific chromatin loops across multiple cell types, a dataset comprising 12 different types of cohesin-mediated specific chromatin loops was divided into training and corresponding test sets for four cell types. The model trained on the training set was then applied to the test sets of the remaining two cell types. When utilizing only chromatin state features, the model achieved AUROC values in the corresponding test sets ranging from 0.918 to 0.960, with an average of 0.937. Upon incorporating the frequencies of key factors as additional features, the model’s AUROC values in the corresponding test sets ranged from 0.919 to 0.971, with an average of 0.951. These findings demonstrate the effectiveness of the model in accurately predicting specific chromatin loops in diverse cell types, and emphasize the importance of incorporating key factor frequencies to enhance predictive performance. The experimental findings indicate that the utilization of chromatin state features enables accurate prediction of cohesin-mediated specific chromatin loops among various cell types. Incorporating key factor frequencies further enhances the predictive accuracy. In the context of studying cohesin-mediated specific chromatin loops, the random forest model demonstrates superior stability and accuracy compared to other machine learning approaches. |
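
The abstract reports every result as an AUROC, averaged over several test sets. As a reminder of what that number measures, the following minimal sketch computes AUROC as the fraction of (positive, negative) pairs in which the positive example receives the higher score, with ties counted as one half. The function name `auroc`, the toy labels, and the toy scores are illustrative assumptions and are not taken from the study.

```python
from itertools import product


def auroc(labels, scores):
    """Area under the ROC curve via pairwise ranking.

    labels: iterable of 0/1 (1 = positive class, e.g. a loop specific to a cell type)
    scores: iterable of classifier scores, higher meaning "more likely positive"
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    # Count correctly ordered pairs; a tie contributes half a pair.
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p, n in product(pos, neg)
    )
    return wins / (len(pos) * len(neg))


# Hypothetical scores for six loops (not data from the study).
toy_labels = [1, 1, 1, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]
toy_auroc = auroc(toy_labels, toy_scores)

# Averaging per-test-set AUROCs, as the abstract does, is a plain mean.
per_set = [toy_auroc, auroc([1, 0], [0.6, 0.6])]
mean_auroc = sum(per_set) / len(per_set)
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect separation. In practice a library routine such as scikit-learn's `roc_auc_score` would normally be used; the pairwise form above is quadratic in the number of examples and is shown only because it makes the definition explicit.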