On heterogeneous platforms, three problems stand out: the growing overhead of data transfers, unbalanced utilization of system resources, and the difficulty of reaching peak system performance. Multiple streams (the streaming mechanism) address these problems by combining pipelining with spatial sharing. This hides data-transfer overhead and improves resource utilization. Prior work on multiple streams has focused mainly on GPUs. As MIC coprocessors have become more widespread, the need for the same support on MIC has grown increasingly urgent.

Moreover, multiple streams on MIC give programmers an interface for controlling system resources, whereas on GPUs this control is transparent to users. This raises two questions: how does resource partitioning affect system performance, and how should the best parameter values be chosen? Finally, many applications are not suited to multiple streams. Programmers therefore need a mechanism for judging whether an application is worth streaming before they rewrite the code, as well as a method for actually streaming it.

First, we evaluate the performance impact of multiple streams on a MIC-based heterogeneous platform at two levels: microbenchmarks and real-world applications. This systematic evaluation establishes the performance benefits of using multiple streams on MIC. We also quantify the performance factors and present a set of heuristics that reduce the search space when choosing proper values for them. Overall, the evaluation offers many insights for runtime and architecture designers who use multiple streams on the Xeon Phi.

Second, we present a judging mark that helps users estimate whether a program is worth streaming. With it, programmers can rule out nearly half of the applications before streaming the code. For heterogeneous codes, we further identify two types of non-streamable codes and three types of streamable codes, and we propose a streaming approach for the streamable ones. Experimental results on a CPU+MIC platform show that multiple streams improve performance by up to 90%. Our work can serve as a generic workflow for applying multiple streams on heterogeneous platforms.

Finally, we present a performance-tuning model based on machine learning that lets programmers obtain optimal parameter values quickly. Resource granularity and task granularity are the two main performance factors influencing multiple streams, and both span wide value ranges. Searching them manually, one value at a time, is inefficient and costs considerable time and effort. We therefore build an automatic tuning model trained offline in a supervised learning mode. With this model, programmers obtain the best values for both factors directly, which greatly reduces performance-tuning effort and improves productivity.
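The overlap argument behind multiple streams can be made concrete with a small model. Below is a minimal Python sketch of an idealized three-stage pipeline: host-to-device transfer, kernel, and device-to-host transfer. The sketch rests on three assumptions:

- Each stage owns a dedicated resource.
- The work splits evenly into `n_chunks` chunks.
- There is no per-chunk launch overhead.

The function names and the 1.1 threshold in `worth_streaming` are hypothetical. They illustrate the kind of question a judging mark answers; they are not the judging mark proposed in this work.

```python
from typing import Sequence


def serial_time(stages: Sequence[float]) -> float:
    """Single stream: copy in, compute, copy out, strictly one after another."""
    return sum(stages)


def pipelined_time(stages: Sequence[float], n_chunks: int) -> float:
    """Makespan of an idealized pipeline over n_chunks equal-sized chunks.

    Assumption: each stage (e.g. H2D copy, kernel, D2H copy) runs on its own
    resource, so stage s of chunk i starts once stage s of chunk i-1 and
    stage s-1 of chunk i have both finished. No launch overhead is modeled.
    """
    if n_chunks < 1:
        raise ValueError("n_chunks must be >= 1")
    per_chunk = [t / n_chunks for t in stages]
    finish = [0.0] * len(stages)  # finish time of each stage for the previous chunk
    for _ in range(n_chunks):
        ready = 0.0  # finish time of this chunk's preceding stage
        for s, t in enumerate(per_chunk):
            start = max(ready, finish[s])
            finish[s] = start + t
            ready = finish[s]
    return finish[-1]


def speedup_bound(stages: Sequence[float]) -> float:
    """Limit of serial/pipelined time as n_chunks grows: total / longest stage."""
    return serial_time(stages) / max(stages)


def worth_streaming(stages: Sequence[float], threshold: float = 1.1) -> bool:
    """Hypothetical rule of thumb: stream only if ideal overlap beats the threshold."""
    return speedup_bound(stages) >= threshold


if __name__ == "__main__":
    transfer_heavy = (4.0, 4.0, 2.0)  # toy stage times in ms, not measurements
    compute_bound = (0.2, 9.6, 0.2)
    print(serial_time(transfer_heavy))        # 10.0
    print(pipelined_time(transfer_heavy, 4))  # 5.5
    print(worth_streaming(compute_bound))     # False
```

For identical chunks, the simulated makespan equals the classic flow-shop result, $\sum_s t_s / n + (n-1)\max_s t_s / n$. This is why transfer-heavy programs benefit from streaming, while kernel-dominated ones gain almost nothing from overlap.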
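The offline, supervised tuning idea can be sketched in a few lines, assuming the following:

- Each training sample pairs a small feature vector describing a program with the (resource granularity, task granularity) pair found best by an exhaustive search run offline.
- A new program's configuration is predicted by k-nearest-neighbour voting.

This is a stand-in technique, not the machine-learning model built in this work. The features, the `(partitions, tasks)` encoding, and the names `Sample` and `predict_config` are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass
from math import dist
from typing import Sequence, Tuple

# Hypothetical encoding: (resource granularity as partitions, task granularity as tasks).
Config = Tuple[int, int]


@dataclass(frozen=True)
class Sample:
    features: Tuple[float, ...]  # e.g. (transfer/compute ratio, log2 input size)
    best: Config                 # best config found by an offline exhaustive search


def predict_config(train: Sequence[Sample], query: Sequence[float], k: int = 3) -> Config:
    """Predict a config by majority vote among the k nearest training samples."""
    if not train:
        raise ValueError("training set is empty")
    nearest = sorted(train, key=lambda s: dist(s.features, query))[:k]
    votes = Counter(s.best for s in nearest)
    # Ties go to the first-encountered config, i.e. the one from the closest sample.
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    # Toy training set for illustration only; not measured data.
    train = [
        Sample((0.1, 20.0), (1, 1)),
        Sample((0.2, 22.0), (1, 1)),
        Sample((1.0, 24.0), (4, 16)),
        Sample((1.2, 26.0), (4, 16)),
        Sample((0.9, 25.0), (8, 32)),
    ]
    print(predict_config(train, (1.1, 25.0)))  # (4, 16)
```

Once the training set exists, predicting a configuration for a new program costs only a lookup. This avoids rerunning the one-value-at-a-time search for each program.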