| Mobile code is a kind of computing model that is widely used not only in computer network and Internet technique, but also in many new techniques, such as mobile agent and Ad hoc network. However, Mobile code poses threats to security, which had never happened in the stand-alone environment. The threats come from two aspects. In one aspect, when mobile code is downloaded on a host, a malicious host will reverse engineer the code to extract secret data and algorithm from it. In some special situations, such as in mobile agent, the mobile agent program and its data state which work as the computing entity to finish the computation are open to the host. But a malicious host might tamper the agents for the purpose of attacking the mobile code. In the other aspect, the widespread of Internet make it more convenient and easier to download and execute pieces of code. But if the code is malicious, it might ruin resource and security of the host. The two aspects are called as malicious host problem and malicious code problem.Code obfuscation is a technique of automatic program transformation for solving the malicious host problem. The different roles played by code obfuscation in these two aspects incur respective problems to be dealt with: In mobile host problem, code obfuscation is a useful method to guarantee the security in mobile agent and protect the program from reverse engineering. At present, the interests on code obfuscation mainly focus on the construction of code obfuscation algorithm. None of these algorithms has provided the efficiency proof. In mobile code problem, code obfuscation is a widely-used tool to construct malware variations, which is a hard problem to the research of malware detection. The metaphor and polymorphic virus constructed by code obfuscation have greatly reduced the protecting capability of current malware detectors.In this dissertation, the relationship is established between the code obfuscation and semantics. Thus, the existing formal semantics theory enlightens the problem which code obfuscation brings up. According to the two problems, our research consists of two parts:(1) As for malicious host problem, code obfuscation is an important method to protect client code, but the lack of a theoretical background is a major drawback of current research about code obfuscation. This dissertation proposes a general method for measuring the efficiency based on semantic foundation:The semantic-based comparable framework measuring obfuscation efficiency includes two parts: formalize the space of code obfuscation and formally define the metrics measuring obfuscation efficiency. In the framework, measuring obfuscation efficiency is based on semantic information, more refined metrics can be defined for measuring obfuscation efficiency and the syntax-based metrics also can be involved into it.The flattening algorithm is a very important code obfuscation method. This dissertation formally defines the flatting algorithm and proves its efficiency in the comparable framework.We find that the weak expressiveness is the limitation of the partial order based comparable framework. The comparable framework can't compare all obfuscation algorithm based on partial order metrics.This dissertation studies the Probabilistic Abstract Interpretation (PAI) and establishes the method to measure program uncertainty. The method can be used for quantitatively measuring obfuscation efficiency. This dissertation also presents the Compact Operator space to extend the PAI in finite-dimension state space to infinite-dimension state space. All the obfuscation algorithms can lift the same concrete domain to a measurable vector space. Thus, the distance defined in the vector space can be used as the metrics to measure obfuscation efficiency.(2) As for malicious code problem, code obfuscation is an important attack tool for effective construction of malware variations, which brings a big challenge to current technique of malware detection. This dissertation studies the semantics-based malware detection.It presents a new semantics-based malware detector (MOM). Code obfuscation is a program transformation satisfying semantics-equivalence. The semantic relationship is the key to detecting whether a program is variation of a malware. The variations of one malware can be detected by the MOM without updating the virus database.It designs a malware detection method based on program verification. The method reduces the problem of malware detection into an equivalent problem of judging the validity of the verification conditions (VC). The specification of obfuscation algorithm can be input to guide the generation and proof of VC.The verification conditions generation of MOM is on the basis of the symbolic execution. So it's a problem to efficiently collect the symbolic state. Different from the traditional symbolic execution, the method proposed in this dissertation tries to solve the low efficiency in state substitution problem, by utilizing the Static Single Assignment (SSA) as the intermediate representation and the dominator tree based symbolic state as collection strategy, and realizes high efficient collection of symbolic state.It provides a MOM prototype which implemented on the compiling framework LLVM. The experiment shows that the MOM is effective. |