| Zeolitic imidazolate frameworks (ZIFs) are a class of metal-organic framework (MOF) materials with zeolite topologies. ZIFs offer very high specific surface areas, exceptional structural diversity, customizable organic linkers, and tunable pore structures, together with relatively high thermal and chemical stability. They are therefore regarded as promising materials for gas adsorption and separation under practical industrial conditions. However, across the wide variety of adsorption and separation applications, high-performance ZIFs remain scarce, and developing more of them for specific applications is the goal of many researchers. Efficient computational methods and extensive structural databases are the foundation for developing new materials for specific applications. Many studies have reported the great potential of combining model building with machine learning (ML) screening in the design and exploration of new materials. ML methods can quickly evaluate the performance of structures and efficiently identify high-performance candidates in databases. However, an ML model must be trained on a large number of high-performance structures to predict them well. Because such structures usually make up a very small fraction of a database, traditional ML methods often need to compute the properties of most structures in the database to ensure that the training data contains enough of them, which leads to high training cost and low computational efficiency. When faced with large databases containing hundreds of thousands of structures, traditional ML methods cannot rapidly screen and identify high-performance structures because of the excessive amount of computation and
computational cost. To screen high-performance structures from massive data efficiently and accurately, it is essential to develop an approach that can build ML models inexpensively for efficient database screening. Evaluating and screening new structures in a database with an efficient ML method helps to quickly identify materials with application potential and provides a wealth of targets for experimental synthesis. Nevertheless, existing ZIF structure databases are relatively small and suffer from low topological diversity and limited structural functionalization, so they cannot meet growing application demands. Constructing a large number of topologically rich and structurally diverse ZIF structures is therefore of great significance for the discovery of high-performance materials, and a large-scale database of such structures is urgently needed. To address these problems, this work develops a new pattern for ML model construction, constructs a topology-rich ZIF structure database, and successfully applies the new pattern to identify a large number of new materials with excellent ethane/ethylene separation performance in the ZIF database. The main results are summarized as follows: 1. To address the high training cost, low computational efficiency, and inability to search large databases efficiently that characterize traditional ML methods, we propose a new general pattern for efficient ML model construction, the "iterative boosting" model. By repeatedly adding the high-performance structures identified by the ML model to the training data and retraining, the pattern yields an ML model that can quickly identify all the high-performance structures in large databases at very low computational cost. The validity and reliability of the pattern are demonstrated in the efficient screening of carbon capture materials. Using this
pattern, we construct an ML model with high predictive power at low computational cost. Almost all high-performance structures in the database can be obtained by performing property calculations on only the top 5% of structures as ranked by the ML model. The pattern reduces the computational effort of building the ML model and screening the database for high-performance structures by an order of magnitude, making it feasible to screen large search spaces that were previously difficult to handle. It thus provides a new solution for the fast identification of high-performance structures in large databases. 2. To address the shortcomings of existing ZIF structure databases, namely low topological diversity, poor structural functionalization, and small scale, we construct a database of more than 600 thousand hypothetical ZIF structures from more than 130 thousand topologies and 5 organic linkers bearing functional groups with different affinities. The database covers abundant topologies and functional groups at a very large scale, effectively compensating for the lack of structural diversity and data richness among existing ZIFs. We write software for the high-throughput construction of ZIF structures, for batch structure optimization and charge extraction based on density functional theory, and for batch structure optimization using molecular mechanics, and successfully apply it to the construction and optimization of hundreds of thousands of ZIF structures. These tools not only make large-scale structure construction and optimization feasible but also improve the efficiency with which researchers can construct and explore structures such as ZIFs. Furthermore, we analyze the properties of ZIFs with different topologies and organic linkers in our database and summarize the effects of different functional groups on the
structures, providing guidance for the theoretical design and experimental synthesis of new materials. 3. In response to the lack of high-performance ethane-selective adsorbents for industrial ethane/ethylene separation, we use the "iterative boosting" pattern to develop an ML model that predicts the applied adsorption performance score for ethane/ethylene separation from a very small amount of data. With this model, 186 high-performance ZIF structures with good application potential are identified from 136,670 ZIF structures. Among them, 12 structures outperform all already-realized ZIF materials, and 6 outperform all MOF materials without open metal sites, providing abundant targets for the experimental synthesis of novel ethane-selective adsorbents. In addition, we use the "iterative boosting" pattern to develop an ML model for predicting ethane adsorption capacity and identify several ZIFs with excellent ethane adsorption capacity through high-throughput screening. The successful application of the "iterative boosting" pattern to the screening of high-performance ethane/ethylene separation materials demonstrates its generality and shows that satisfactory results can be achieved with minimal computational resources. |
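
The "iterative boosting" loop described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the toy property function `expensive_property`, the nearest-neighbour surrogate `knn_predict`, the 2-D random "database", and the values of `n_init`, `batch`, and `rounds` are all assumptions made for illustration. The sketch only shows the general idea: train on a small labelled set, rank the remaining structures with the cheap model, compute the true property for the top-ranked ones, add them to the training data, and repeat.

```python
import random


def expensive_property(x):
    """Hypothetical stand-in for a costly simulation: a toy score peaking at (0.7, 0.2)."""
    return -((x[0] - 0.7) ** 2 + (x[1] - 0.2) ** 2)


def knn_predict(train, x, k=3):
    """Assumed cheap surrogate: mean property of the k nearest labelled structures."""
    nearest = sorted(
        train, key=lambda t: sum((a - b) ** 2 for a, b in zip(t[0], x))
    )[:k]
    return sum(y for _, y in nearest) / len(nearest)


def iterative_boosting(database, n_init=20, batch=20, rounds=5, seed=0):
    """Return {index: computed property} after several select-and-retrain rounds."""
    rng = random.Random(seed)
    # Start from a small random sample with computed properties.
    labelled = {
        i: expensive_property(database[i])
        for i in rng.sample(range(len(database)), n_init)
    }
    for _ in range(rounds):
        train = [(database[i], y) for i, y in labelled.items()]
        unlabelled = [i for i in range(len(database)) if i not in labelled]
        # Rank the unlabelled structures by predicted property, best first.
        ranked = sorted(
            unlabelled, key=lambda i: knn_predict(train, database[i]), reverse=True
        )
        # Compute the true property only for the top-ranked batch and add
        # those structures to the training data for the next round.
        for i in ranked[:batch]:
            labelled[i] = expensive_property(database[i])
    return labelled


# Usage on a synthetic "database" of 1000 two-feature structures.
rng = random.Random(1)
database = [(rng.random(), rng.random()) for _ in range(1000)]
labelled = iterative_boosting(database)

true_top = sorted(
    range(len(database)), key=lambda i: expensive_property(database[i]), reverse=True
)[:10]
recall = sum(i in labelled for i in true_top) / len(true_top)
print(f"computed {len(labelled)} of {len(database)} properties; top-10 recall = {recall:.2f}")
```

In this sketch the expensive calculation is run for only `n_init + rounds * batch` structures, which is where the cost saving would come from; the surrogate model, batch size, and number of rounds are arbitrary choices and would need to be tuned for a real database.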