
Improving the reliability and power-efficiency characteristics of emerging many-core multiprocessors

Posted on: 2015-09-09
Degree: Ph.D.
Type: Thesis
University: The Pennsylvania State University
Candidate: Zhao, Hui
GTID: 2472390017993743
Subject: Computer Engineering
Abstract/Summary:
In recent years, many-core multiprocessors have become a focus of computer architecture design. Moore's Law permits designers to integrate a large number of cores on a single chip, and processors containing close to a hundred cores will appear in the near future. Using multiple cores on a single chip can significantly boost microprocessor performance. However, as the semiconductor industry has moved into the deep sub-micron era, power consumption has become the constraining factor on further performance gains. Meanwhile, reliability has become a concern as various power management techniques are applied to reduce power consumption. These trends call for designs that carefully balance the trade-offs among all design options to achieve high reliability and power-efficiency without harming performance. This dissertation is a step towards developing many-core multiprocessors that address reliability and power-efficiency issues.

In this thesis, we address two important multiprocessor characteristics: reliability and power-efficiency. Caches and interconnection networks are the two structures in a multicore that store and transmit program data, so their reliability directly affects the correctness of program execution. We first provide a microarchitectural solution, based on control theory, that protects a multicore's caches against transient errors. Transient errors are caused by particle strikes, such as neutrons or alpha particles. Caches are particularly susceptible to transient errors because of their large size. Our scheme takes two input parameters into account: a performance Quality of Service (QoS) requirement and a reliability QoS requirement. The performance QoS specifies the minimum acceptable cache hit rate, whereas the reliability QoS represents the desired level of reliability assurance.
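To make the feedback idea concrete, here is a minimal sketch of a control loop of this kind. It is not the thesis's controller: it assumes a plain integral (I) controller, a cache partitioned by ways, a reliability QoS encoded as a hypothetical maximum number of replica ways (`max_replica_ways`), and a made-up hit-rate curve (`toy_hit_rate`) standing in for a real workload. Each control interval, hit-rate headroom above the performance QoS lets the replica partition grow, and a QoS miss shrinks it.

```python
from dataclasses import dataclass, field


@dataclass
class ReplicaPartitionController:
    """Integral controller that sizes the replica partition of a cache (sketch).

    Hypothetical encoding of the two QoS inputs:
      hit_rate_target  -- performance QoS, minimum acceptable hit rate
      max_replica_ways -- reliability QoS, ways we would like to spend on replicas
    """
    total_ways: int
    hit_rate_target: float
    max_replica_ways: int
    gain: float = 20.0                            # ways per unit of hit-rate error
    _state: float = field(default=0.0, init=False)  # integrator state

    def __post_init__(self) -> None:
        # Start at the reliability target; feedback backs off if performance suffers.
        self._state = float(self._upper())

    def _upper(self) -> int:
        # Never exceed the reliability target; always keep at least one data way.
        return min(self.max_replica_ways, self.total_ways - 1)

    def update(self, measured_hit_rate: float) -> int:
        """Return the replica-way allocation for the next control interval."""
        error = measured_hit_rate - self.hit_rate_target
        # Positive error (headroom) grows replicas; negative error shrinks them.
        self._state += self.gain * error
        self._state = max(0.0, min(self._state, float(self._upper())))
        return round(self._state)


def toy_hit_rate(data_ways: int) -> float:
    """Made-up stand-in for a workload's hit rate as a function of data ways."""
    return 1.0 - 0.6 * 0.8 ** data_ways


if __name__ == "__main__":
    ctrl = ReplicaPartitionController(total_ways=16, hit_rate_target=0.94,
                                      max_replica_ways=8)
    replica_ways = 8
    for _ in range(200):
        replica_ways = ctrl.update(toy_hit_rate(16 - replica_ways))
    print(replica_ways)
```

With this toy curve the target hit rate falls between 10 and 11 data ways, so the quantized loop settles into alternating between 5 and 6 replica ways. A real design would also need to decide which lines get replicas and how the controller gain is tuned; those choices are left out here.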
By balancing the cache space partitioned between data and their replicas, our proposed scheme provides both performance and reliability guarantees.

To ensure correct data transmission, on-chip networks usually employ error correction codes to protect the data. However, prior work has mostly used fixed retransmission schemes when an error is detected. We propose a flexible scheme that dynamically chooses when to perform error checking and retransmission, taking advantage of both end-to-end and hop-by-hop retransmission in a many-core multiprocessor. Our scheme not only meets the network-on-chip (NoC) reliability requirements but also improves power-efficiency.

Beyond their reliability requirements, caches and networks also need careful design with respect to power. Aside from the cores, on-chip networks and caches are the two major power consumers in many-core multiprocessors. To reduce the power and area costs of on-chip networks, prior schemes have proposed networks built from bufferless routers. However, bufferless routers are beneficial only when network utilization is low. We provide a heterogeneous NoC design that employs both buffered and bufferless routers in the same network. We explore the design space by evaluating different router placements in order to achieve the best performance with maximum power-efficiency. To take full advantage of the heterogeneous NoC architecture, we also design algorithms for application mapping and packet routing.

Through our evaluation of parallel programs, we observe that program performance scales differently with the number of cores and with cache size. Running a program with more cores or larger caches does not always improve performance, but it does incur larger power overheads. Based on this observation, we propose a dynamic scheme that allocates power between cores and caches.
Our scheme first predicts the scalability of parallel programs and then selectively gates off some cores or caches to save power. By exploiting program scalability, our scheme achieves near-optimal performance within the available power budget.
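The predict-then-gate idea can be sketched as follows, under loudly stated assumptions: the thesis's actual predictor is not described here, so this sketch swaps in Amdahl's law, estimating a program's parallel fraction `p` from one measured speedup, and takes cache sensitivity as a hypothetical profiled table (`cache_speedup`, mapping active cache banks to a relative performance factor). Per-core and per-bank power values are illustrative parameters, not measurements. The selector then keeps the (cores, banks) configuration with the highest predicted performance that fits the budget; everything else is gated off.

```python
from itertools import product


def estimate_parallel_fraction(n_cores: int, measured_speedup: float) -> float:
    """Invert Amdahl's law, S = 1 / ((1 - p) + p / n), to estimate p."""
    return (1.0 - 1.0 / measured_speedup) / (1.0 - 1.0 / n_cores)


def amdahl_speedup(p: float, n_cores: int) -> float:
    """Predicted speedup on n_cores for parallel fraction p."""
    return 1.0 / ((1.0 - p) + p / n_cores)


def choose_config(p, cache_speedup, core_counts, core_power, bank_power, budget):
    """Return (cores, banks) with the best predicted performance under budget.

    cache_speedup: hypothetical profile, active banks -> relative perf factor.
    Ties in predicted performance go to the lower-power configuration.
    """
    best = None
    for cores, banks in product(core_counts, cache_speedup):
        power = cores * core_power + banks * bank_power
        if power > budget:
            continue  # configuration does not fit; skip it
        perf = amdahl_speedup(p, cores) * cache_speedup[banks]
        key = (perf, -power)
        if best is None or key > best[0]:
            best = (key, cores, banks)
    if best is None:
        raise ValueError("no configuration fits the power budget")
    return best[1], best[2]


if __name__ == "__main__":
    cache = {2: 0.8, 4: 1.0, 8: 1.05}
    cores = [4, 8, 16, 32]
    scalable = estimate_parallel_fraction(8, 4.0)  # speedup 4 on 8 cores
    poor = estimate_parallel_fraction(8, 1.5)      # speedup 1.5 on 8 cores
    print(choose_config(scalable, cache, cores, 1.0, 0.5, 19.0))  # (16, 4)
    print(choose_config(poor, cache, cores, 1.0, 0.5, 19.0))      # (8, 8)
```

Under the same 19-unit budget, the well-scaling program is given 16 cores and half its cache banks are gated off, while the poorly scaling one gains almost nothing from 16 cores and is better served by 8 cores plus the full cache. That contrast mirrors the observation above that more cores or larger caches do not always pay for their power.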
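For the NoC retransmission scheme summarized in the abstract, one way to reason about choosing between end-to-end and hop-by-hop retransmission is an expected-energy comparison. The sketch below is an illustrative model, not the thesis's policy: it assumes independent per-hop flit error probability `q` (for example, estimated at run time from error counts), hypothetical per-hop link energy `e_link` and per-check energy `e_check`, and it ignores ACK/NACK traffic and latency. End-to-end checking runs one check per attempt but resends the whole path on any error; hop-by-hop checks every hop but resends only the failing hop.

```python
def expected_energy_end_to_end(hops: int, q: float, e_link: float, e_check: float) -> float:
    """One check at the destination; any corrupted hop forces a full resend.

    An attempt succeeds with probability (1 - q) ** hops, so the expected
    number of attempts is its reciprocal (geometric distribution).
    """
    if not 0.0 <= q < 1.0:
        raise ValueError("q must be in [0, 1)")
    return (hops * e_link + e_check) / (1.0 - q) ** hops


def expected_energy_hop_by_hop(hops: int, q: float, e_link: float, e_check: float) -> float:
    """Check after every hop; a corrupted flit is resent over that hop only."""
    if not 0.0 <= q < 1.0:
        raise ValueError("q must be in [0, 1)")
    return hops * (e_link + e_check) / (1.0 - q)


def choose_policy(hops: int, q: float, e_link: float = 1.0, e_check: float = 0.5) -> str:
    """Pick the retransmission mode with lower expected energy for this path."""
    e2e = expected_energy_end_to_end(hops, q, e_link, e_check)
    hbh = expected_energy_hop_by_hop(hops, q, e_link, e_check)
    return "end_to_end" if e2e <= hbh else "hop_by_hop"


if __name__ == "__main__":
    print(choose_policy(6, 1e-4))  # rare errors: check once at the destination
    print(choose_policy(6, 0.1))   # frequent errors: check at every hop
```

The crossover point depends entirely on the assumed energy ratio and path length; the point of the sketch is only that the cheaper mode changes with the observed error rate, which is why a fixed choice can waste power.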
Keywords/Search Tags: Power, Many-core multiprocessors, Reliability, Performance, Scheme, Characteristics, Caches, Cores