| As hardware performance and dependability have improved dramatically over the past few decades, software dependability has become increasingly important. Unfortunately, many studies show that software bugs can greatly affect software dependability during production runs. To improve software dependability during production runs, this dissertation proposes to address software bugs at multiple levels by leveraging support from the hardware, OS, and runtime. The proposed multi-level defenses address software bugs and their effects at different stages of program execution. At the first level, this dissertation proposes a low-overhead tool, called SafeMem, that detects memory leaks and memory corruption bugs through a novel use of existing ECC memory. Experiments with seven real-world applications show that SafeMem detects all tested bugs with low overhead (only 1.6%-14.4%). Unfortunately, some bugs may still slip through the first-level defense and be exploited by security attacks. At the second level, this dissertation proposes a low-overhead, software-only information flow tracking system, called LIFT, that detects a wide range of security attacks that exploit software bugs. LIFT keeps its overhead low by exploiting dynamic binary translation and optimizations. The experiments show that LIFT effectively detects a wide range of security attacks with low overhead: only 6.2% for the server application, and a factor of 3.6 on average for seven SPEC INT2000 applications. The proposed dynamic optimizations reduce the overhead by a factor of 5-12. Without further action on the bugs or exploits detected at the first two levels, all the target system can do is shut itself down to prevent potential damage, which leaves it unavailable to users.
At the third level, this dissertation proposes an innovative technique, called Rx, to quickly recover programs from many types of software bugs, both deterministic and non-deterministic. The experiments show that Rx survives all tested software failures and provides fast, transparent recovery within 0.017-0.16 seconds, 21-51 times faster than the whole-program restart approach in all but one case (CVS). In summary, the proposed multi-level defenses can effectively improve software dependability during production runs by addressing software bugs at multiple levels with support from the hardware, OS, and runtime. |