Research On Source Code Level Parallelization Based On Machine Learning

Posted on: 2024-09-23 | Degree: Doctor | Type: Dissertation | Country: China | Candidate: Y Y Shen | GTID: 1528307334977729 | Subject: Computer Science and Technology

Abstract/Summary:

As multi-core processors continue to evolve, parallel programming is becoming popular and is used by a growing number of designers to improve the efficiency of application execution. OpenMP parallelization poses three successive problems: discovering parallelism, converting the parallelizable targets into OpenMP parallel programs, and detecting whether the resulting parallel programs contain data races. One way to guarantee performance is to parallelize programs by hand, but this requires specialist expertise, and analysis efficiency cannot be guaranteed once programs grow large. Automatic OpenMP parallelization and data race detection tools help researchers transform programs into parallel form and detect data races in them. However, these tools are built on explicit rules and are severely limited in the scope of programs they can analyze and in their time and memory overhead. This dissertation proposes a machine learning approach that discovers the inherent parallelism characteristics of programs at the source code level to solve the complex program parallelization problem. The main contributions of this dissertation are as follows:

Firstly, to cope with the narrow scope and high time overhead of program analysis by traditional methods, this dissertation proposes a parallelism discovery method based on graph neural networks. The method establishes a parallelism discovery framework based on code embedding and graph classification techniques, which includes a graph-based data structure for representing code semantics and a deep convolutional graph neural network model for learning the graph-based
data structure. Because existing datasets are insufficient for machine learning methods, a dataset generator is constructed, and a polyhedral program dataset is generated with it. The dataset covers open-source code repositories for several kinds of applications, including benchmark programs, mathematical libraries, and common programs. The feasibility of this machine learning method is verified by comparing its performance with that of conventional static and dynamic methods, where the static methods include neural network models and static parallelization tools, and the dynamic methods include manually extracted feature models and a dependency profiling tool. The experimental results show that the method has a higher discovery rate than the static methods and uses less time than the dynamic methods.

Secondly, to further improve parallelism discovery performance, this dissertation proposes a parallelism discovery method based on multi-graph learning. In the code representation phase, because a single graph structure cannot fully capture program semantics and syntax, a multi-graph representation strategy is proposed to represent the code in a complementary way. A program is represented using a control flow graph, data flow graph, contextual flow graph, and abstract syntax tree, and a separate graph vector representation is generated for each graph. In the discovery phase, because the deep convolutional graph neural network model accepts only one graph as input, a multi-graph learning framework is proposed to learn the different graph representations in a targeted manner. Decision fusion is performed over the multiple graph representations to avoid the irrelevant errors to which multiple classifiers are prone; the fusion follows the principle of the highest precision rate. In the dataset evaluation phase, datasets covering diverse programs are generated to demonstrate the generality of the machine learning methods. Experimental
results show that the parallelism discovery problem can be solved more accurately and efficiently using the multi-graph learning method.

Thirdly, to cope with the difficulty of identifying variable attributes during parallel program transformation, this dissertation proposes a variable classification method based on an attention mechanism. The variable classification problem in OpenMP parallelization is formulated as a type annotation inference task. The method constructs a neural network learning architecture that does not require costly rule design or feature production. The architecture uses an attention mechanism to focus on information about the semantics of data sharing and to understand the data environment attributes of variables in specific contexts and relationships. To take advantage of the machine learning method, an alignment corpus is proposed to predict the attributes of variables defined in the target loops. The corpus consists of the lexical and OpenMP attributes of the source code. The architecture also supports prediction of the reduction attribute. Experimental results show that the machine learning-based method can solve variable classification problems.

Fourthly, to deal with the difficulty of detecting data races in OpenMP parallel programs, this dissertation proposes a data race detection method based on contrastive learning. Considering the high structural similarity between parallel programs containing different OpenMP features, a unique positive sample is constructed for each program by combining label information, while different programs serve as negative samples for one another. Data race detection is achieved by exploiting the defining property of contrastive learning, namely minimizing the distance between positive samples and maximizing the distance between negative samples. Transfer learning is used to provide a robust vector representation for the original code. Different optimization rules are
established for positive and negative sample pairs using the contrastive loss function, which brings positive sample pairs closer together and pushes negative sample pairs farther apart. Experimental results show that the proposed contrastive learning method can effectively detect data races in programs and outperforms existing methods on different datasets.

Keywords/Search Tags: OpenMP parallelization, machine learning, graph neural network, multi-graph learning, variable classification, data race, contrastive learning
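
The contrastive objective described in the abstract (bring positive pairs closer, push negative pairs apart) can be illustrated with the classic margin-based pairwise contrastive loss. This is a minimal sketch under stated assumptions, not the dissertation's actual loss: the helper names `euclidean` and `contrastive_loss`, the margin value, and the toy embedding vectors are all hypothetical and chosen only for illustration.

```python
import math

def euclidean(u, v):
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def contrastive_loss(u, v, is_positive, margin=1.0):
    """Margin-based pairwise contrastive loss (hypothetical helper).

    Positive pairs are penalized by their squared distance, so minimizing
    the loss pulls them together. Negative pairs are penalized only while
    they are closer than `margin`, so minimizing the loss pushes them apart.
    """
    d = euclidean(u, v)
    if is_positive:
        return d ** 2
    return max(0.0, margin - d) ** 2

# Toy 2-D code embeddings (made up for illustration).
anchor = [0.0, 0.0]
near = [0.3, 0.4]   # roughly distance 0.5 from the anchor
far = [0.6, 0.8]    # roughly distance 1.0 from the anchor

loss_pos = contrastive_loss(anchor, near, is_positive=True)
loss_neg_near = contrastive_loss(anchor, near, is_positive=False)
loss_neg_far = contrastive_loss(anchor, far, is_positive=False)
```

The two branches correspond to the "different optimization rules" for positive and negative pairs: a negative pair that already sits at or beyond the margin contributes essentially no loss, while a negative pair inside the margin is still penalized.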
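
The abstract states that decision fusion over the graph representations "follows the principle of the highest precision rate" without giving the exact rule. One plausible reading is that, for each loop, the fused label is taken from the graph view whose classifier has the highest validation precision for the label it predicts. The sketch below illustrates that reading only; the helper `fuse_by_precision`, the view names (`cfg`, `dfg`, `ctx`, `ast`), and every precision value are hypothetical.

```python
def fuse_by_precision(predictions, precision):
    """Pick the label from the most precise view (hypothetical helper).

    predictions: dict mapping view name -> predicted label for one loop.
    precision:   dict mapping view name -> {label: validation precision}.
    Returns (label, view) for the view whose precision on its own
    predicted label is highest; ties go to the first view in dict order.
    """
    best_view = max(predictions, key=lambda v: precision[v][predictions[v]])
    return predictions[best_view], best_view

# Made-up validation precision per view and per class (1 = parallelizable).
precision = {
    "cfg": {0: 0.80, 1: 0.90},  # control flow graph
    "dfg": {0: 0.85, 1: 0.70},  # data flow graph
    "ctx": {0: 0.78, 1: 0.82},  # contextual flow graph
    "ast": {0: 0.75, 1: 0.88},  # abstract syntax tree
}

# Per-view predictions for a single loop.
predictions = {"cfg": 0, "dfg": 0, "ctx": 1, "ast": 1}

label, view = fuse_by_precision(predictions, precision)
```

Under this reading the views disagree two against two, and the fused decision follows the abstract syntax tree view because its precision on the label it predicts is the highest of the four.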