With the rapid development of machine learning (ML) in materials science, the materials development paradigm is shifting toward a more efficient, faster and more flexible data-driven approach. Unlike traditional trial-and-error methods, ML methods can quickly predict the properties of new materials by building predictive models from existing data, and they have been widely used to accelerate materials simulation and discovery. Combined with first-principles approaches, this paradigm maintains high accuracy while significantly reducing computational time, thus greatly improving the efficiency and feasibility of materials research. Despite this success, a number of challenges remain.

(1) Material data: The quality and quantity of data are limited by the difficulty of acquiring and processing first-principles computational and experimental data. For example, owing to the diversity of material structures and chemical compositions, datasets are often highly imbalanced, so trained models may be biased toward particular structure types. Material data also suffer from quality issues, such as incorrect lattice parameters or chemical compositions, which can degrade the performance of ML models.

(2) ML algorithms: ML models for materials computation need to be both robust and interpretable. For robustness, datasets are often high-dimensional and noisy, so the generalization ability of the model must be ensured. For interpretability, scientists need to understand how the model works in order to further optimize the material design. How to build robust and interpretable models, and how to balance performance against interpretability, is therefore an important challenge for ML in materials science.

(3) Descriptors: Descriptors are the key to translating the physical and chemical properties of materials into a mathematical form that ML algorithms can process. Selecting and designing appropriate descriptors can improve both the predictive performance and the interpretability of a model. How to select and design suitable descriptors, and how to combine different types of descriptors effectively, is therefore an important challenge.

In this thesis, the three aforementioned problems in ML-aided materials design and simulation are explored in terms of photovoltaic perovskite materials discovery, two-dimensional materials discovery and crystal structure prediction, respectively. The main research content and conclusions of this thesis are summarized as follows:

(1) ML-aided discovery for a single property and a single prototype: hybrid organic-inorganic perovskites for solar cells. Hybrid organic-inorganic perovskites (HOIPs) have emerged as a leading contender among various types of solar cells, with the photoelectric conversion efficiency rapidly increasing from 3.8% to 22.1% in just a few years, attracting widespread attention from researchers. However, a series of HOIPs, such as MAPbI3, are toxic and prone to degradation in the environment, which hinders practical applications. To address this issue, we developed a target-driven method to predict undiscovered HOIPs for photovoltaics. This strategy, which combines ML techniques with density functional theory calculations, aims to screen HOIPs quickly by bandgap and to address their toxicity and poor environmental stability. Six orthorhombic lead-free HOIPs with a suitable bandgap for solar cells and room-temperature thermal stability were successfully screened out from 5158 unexplored HOIPs, and two of them stand out with direct bandgaps in the visible region and excellent environmental stability. In essence, a close structure-property relationship for the HOIP bandgap was established. We developed an ML-aided materials discovery framework by combining feature engineering and ML algorithms, and applied it to complex HOIPs for the first time. The method achieves high accuracy rapidly and is applicable to a broad class of functional material design problems.

(2) ML-aided discovery for multiple properties and a single prototype: inorganic ferroelectric perovskites for solar cells. The discovery of the bulk photovoltaic effect in ferroelectric materials provides a new way to overcome the disadvantage of the wide band gap of inorganic chalcogenides. The ferroelectric polarization in such systems has been shown to contribute to the separation of photogenerated carriers. Here, a multistep screening scheme was developed by combining high-throughput calculations and ML techniques. In total, 151 promising stable ferroelectric photovoltaic (FPV) perovskites with a suitable bandgap were successfully screened out from 19,841 candidate compositions. Two new descriptors were proposed through ML feature engineering to describe the formability of mixed inorganic perovskites. Additionally, the phase-transition energy difference was used as a criterion for directly judging whether a compound can exhibit spontaneous polarization. The ML prediction accuracy of both the energy-difference and the bandgap regressions was over 90%, and ML produced results comparable to density functional theory calculations. Moreover, the bandgaps of eight selected FPV perovskites were all close to the optimal value for single-junction solar cells. This scheme not only realizes ML acceleration for targeted multi-property material design and the expansion of materials databases, but also opens a way for descriptor development.

(3) ML-aided discovery for multiple properties and multiple prototypes: two-dimensional ferromagnetic semiconductors, half-metals and metals. Two-dimensional (2D) ferromagnetic (FM) semiconductors, half-metals and metals are key materials for next-generation spintronic devices. However, such materials are still rather rare, and the material search space is too large to explore exhaustively. Here, an adaptive framework to accelerate the discovery of 2D intrinsic FM materials was developed by combining advanced ML techniques with high-throughput density functional theory calculations. About 90 intrinsic FM materials with a desirable bandgap and excellent thermodynamic stability were successfully screened out, and a database containing 1459 2D magnetic materials was set up. To improve the performance of ML models on small datasets such as those of diverse 2D materials, a crystal graph multilayer descriptor based on elemental properties was proposed, with which ML models achieved prediction accuracy of over 90% on thermodynamic stability, magnetism, and bandgap. This study not only provides dozens of compelling FM candidates for future spintronics, but also paves a feasible route for ML-based rapid screening of diverse structures and/or complex properties. The approach overcomes limitations of traditional methods in the design of magnetic materials.

(4) Active-learning-aided discovery of two-dimensional ferromagnets with high Curie temperature. ML techniques have accelerated the discovery of new materials. However, challenges such as data scarcity, representations without deep physical insight, and uninterpretable models restrict the widespread application of ML to complex systems. Herein, in order to obtain optimal 2D FM materials, we developed an adaptive ML framework to search a chemical space containing over 2 × 10⁵ candidates. Two key technical advances drive the progress: (i) an iterative feedback loop that generates data on the fly, and (ii) an adaptive representation set that couples magnetism, crystal field theory, and atomic environments. Consequently, the ML models achieve a prediction accuracy of over 90% on the key FM properties. After screening, we found 9622 2D FM candidates in the chemical space. Among them, 722 compounds have a ferromagnetic-antiferromagnetic energy difference greater than 0.5 eV per cell, which gives them great potential for a high Curie temperature. From these, we discovered a class of room-temperature 2D FM semiconductors, CrMX2 (M = Ga, In; X = S, Se, Te). Furthermore, the "black box" of the ML models is opened and general design principles are extracted. Our framework offers an easy way to search chemical space efficiently under data scarcity and makes the models interpretable.

(5) The robustness of uncertainty estimates in ensembles of neural network potentials. Machine-learning potentials (MLPs) trained on first-principles datasets are becoming increasingly popular since they enable the treatment of larger system sizes and longer time scales compared to straight ab initio techniques. A key aspect of the use of these MLPs is the prospect of reliably assessing the accuracy (viz. uncertainty) of the predictions, e.g., by training an ensemble of models. Here, we critically examined the robustness of such uncertainty predictions using equivariant message-passing neural networks as an example. We trained an ensemble of models on liquid silicon simulated at the gradient-corrected density-functional-theory level and compared the predicted uncertainties with the prediction errors for various test sets, including liquid silicon at different temperatures and out-of-training-domain data such as solid phases with and without point defects as well as surfaces. These studies reveal that the predicted uncertainties are often overconfident. This is ascribed to insufficient diversity among the members of the ensemble, as measured via error correlations.
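The comparison described in item (5), between ensemble-predicted uncertainties and actual prediction errors, together with error correlations as a measure of ensemble diversity, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the thesis: the function name `ensemble_diagnostics`, the choice of RMSE divided by RMS ensemble spread as an overconfidence indicator, and the synthetic data, in which all members share a common error component to mimic low diversity.

```python
import numpy as np


def ensemble_diagnostics(preds, target):
    """Compare ensemble spread with actual error (hypothetical helper).

    preds  : array of shape (n_models, n_samples), one row per ensemble member
    target : array of shape (n_samples,), reference values
    """
    preds = np.asarray(preds, dtype=float)
    target = np.asarray(target, dtype=float)

    mean = preds.mean(axis=0)            # ensemble prediction
    sigma = preds.std(axis=0, ddof=1)    # ensemble spread used as uncertainty

    rmse = np.sqrt(np.mean((mean - target) ** 2))
    rms_sigma = np.sqrt(np.mean(sigma ** 2))

    # Pearson correlation between the error vectors of each pair of members.
    member_errors = preds - target
    corr = np.corrcoef(member_errors)
    off_diag = corr[~np.eye(corr.shape[0], dtype=bool)]

    return {
        "rmse": rmse,
        "rms_sigma": rms_sigma,
        # A ratio well above 1 means errors exceed the predicted spread.
        "overconfidence": rmse / rms_sigma,
        "mean_error_correlation": off_diag.mean(),
    }


# Synthetic example (assumed data model, not thesis data): every member
# makes the same "shared" error plus a small member-specific error.
rng = np.random.default_rng(0)
n_models, n_samples = 5, 2000
target = rng.normal(size=n_samples)
shared = rng.normal(scale=1.0, size=n_samples)
private = rng.normal(scale=0.3, size=(n_models, n_samples))
preds = target + shared + private

report = ensemble_diagnostics(preds, target)
for name, value in report.items():
    print(f"{name}: {value:.3f}")
```

In this toy setting the shared error component never shows up in the ensemble spread, so the spread understates the true error while the member errors are strongly correlated. This is the qualitative pattern the abstract attributes to insufficient ensemble diversity.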