| Proteins fulfill their functions mainly by binding to ligands, so protein-ligand interactions play fundamental roles in biological activities. Many proteins that do not display global sequence or structure similarities can bind to ligands sharing the same or similar chemical fragments or functional groups (FGs) via conserved 3D binding motifs. Deciphering conserved 3D protein-ligand binding patterns on the basis of FGs shared by a variety of small molecules can not only provide important insights into biorecognition but also greatly benefit drug discovery and development. Current methods for predicting conserved binding motifs for the FGs of ligands fall into two categories. The first relies on experts who analyze in detail specific kinds of FGs shared by a variety of small ligands by comparing multiple binding sites in protein structures. The second applies different rules to split ligands into chemical fragments (or FGs) and then uses statistical models to discover conserved interaction patterns for the ligand-derived FGs across many protein-ligand complexes. However, these approaches have limitations. The first has limited applicability because few types of target FGs are available. Although conserved binding patterns for a few commonly used FGs have been reported in the literature, large-scale identification and evaluation of FG-based 3D binding motifs are still lacking. Moreover, the limited number of determined protein crystal structures in complex with ligands containing certain types of FGs seriously affects the quality of the conserved binding motifs for those FGs. The second depends heavily on predefined splitting schemes that attend solely to the chemical composition of a ligand and therefore undervalue the effect of the overall shape of the ligand-binding pocket on the FGs’ geometrical conformation. In addition, the similarity standards should be carefully evaluated because similarity measures are used extensively in deriving conserved binding motifs. To address these issues, we propose a new computational method, AFTME (Automatic FG-based Three-dimensional Motif Extractor), for automatically mapping 3D motifs to the different FGs of a specific ligand. AFTME splits a ligand into FGs and matches binding motifs to the corresponding FGs in one step, linking the fragment chemical space to binding environments in proteins. Unlike existing methods, AFTME was developed independently of both predefined cleavage schemes and similarity measures, so it can be applied on a large scale and easily integrated into machine learning workflows. Applying AFTME to 233 naturally occurring ligands, we defined 481 FG-binding motifs that are highly conserved across different ligand-binding pockets, and we showed that our FG-motif map can be used to nominate FGs that potentially bind to specific drug targets. Systematic analysis further revealed four main classes of binding motifs corresponding to distinct sets of FGs. Combinations of FG-binding motifs enable proteins to bind a wide spectrum of ligands with various binding affinities. Finally, the FG-motif map can not only provide useful insights and guidance for the rational design of small-molecule drugs but also facilitate a better understanding of protein-ligand recognition. |