The openness of the Android operating system has drawn numerous developers into Android application development, with apps ranging from productivity tools to entertainment games. As a result, Android is the mobile operating system with the largest market share. Mobile devices now influence every aspect of daily life and have become the hub for storing and processing personal information. Android's openness and the shortcomings of its security mechanisms mean that applications frequently steal users' private information, which threatens the private data stored on mobile devices. Information flow techniques, which aim to ensure information security, have been widely applied to analyzing and controlling the behavior of Android applications.

However, applications are growing more complex and diverse, so it has become harder to distinguish benign behavior from malicious behavior. Traditional malware detection based on single information flow features may produce false positives. For example, both benign and malicious applications may use the same network interface to send out users' private information. A benign application, however, may collect only the International Mobile Equipment Identity (IMEI) together with the International Mobile Subscriber Identity (IMSI). A malicious application, by contrast, collects multiple items such as the IMEI, IMSI, and geographic location. If the relationships between sensitive information flows are ignored, the two applications have identical feature representations and are difficult to tell apart. Research on analyzing information flow relationships, however, remains limited.

Application markets review apps and require developers to upload privacy policies that explain how sensitive information is collected. Regulation, however, is not strict, and users often ignore these policies when downloading applications. Even users who do read them find it difficult to judge whether an application's actual behavior matches its stated behavior. Research on the consistency between applications' sensitive behavior and their privacy policies is insufficient. This has led to three widespread problems: personal information is collected in violation of regulations, users receive inadequate notice of collection, and personal information is over-collected.

Furthermore, Android applications integrate a large number of third-party components, and some of these components obtain system resources beyond their required scope and abuse sensitive information. Information flows within these components are very complex. Because the components access system resources directly, efficient fine-grained control is difficult. In practice, developers and users also often have different functional and security requirements for different components. Sensitive information must be used and processed within a secure range without affecting basic functionality, so a flexible mechanism is required to control the components' access to system resources. Traditional access control techniques, however, struggle to provide fine-grained, dynamic information flow control over the sensitive behaviors of components.

To address these issues, the main contributions of this article are as follows.

(1) Traditional descriptions based on single information flow features cannot capture the different behavior patterns of benign and malicious applications, and fine-grained features of the relationships between information flows have received little study. To address this, a malware detection method based on information flow relationship features is proposed. In the sensitive information flow analysis stage, the method explores the relationships between information flows and gives a detailed formal description of the relationships between sensitive information flows. A dynamic programming method is then designed to analyze these relationships. This analysis identifies five types of relationship: convergence, divergence, inclusion, connection, and crossover. For example, the benign application mentioned earlier has a convergence relationship between the IMEI and IMSI flows toward the network interface. The malicious application has a convergence relationship among the IMEI, IMSI, and location flows toward the same interface. The method can therefore characterize the different behavior patterns and information flow relationship features of benign and malicious applications. In the feature construction stage, the relationship features are expressed as five-tuples. The APIs in contiguous common subsequences are classified and expressed as six-tuples, and the two sets of features are then fused. In the detection stage, a convolutional neural network is used to build the malware detection model. Experimental results show that the method achieves accuracies of 98.5% on the MalGenome dataset and 97.6% on the AndroZoo dataset. These results demonstrate that a finer-grained characterization of the relationships between sensitive information flows plays an important role in distinguishing benign from malicious applications.

(2) Many applications in app markets collect personal information in violation of regulations, give inadequate notice of that collection, and collect personal information beyond the scope of user consent. Targeted research on this issue is lacking. To address this, a method for analyzing the consistency between the sensitive behavior and the privacy policy of an Android application is proposed. In the privacy policy analysis stage, a Bi-GRU-CRF neural network extracts key information from the privacy policy statement and from the third-party information sharing list it contains. This information is transformed into privacy policy three-tuples consisting of entity, action, and data type. In the sensitive behavior analysis stage, the results must match the granularity of the privacy policy results. To achieve this, the IFDS framework is optimized in three ways: sensitive API calls are classified, already analyzed sensitive API calls are deleted from the input list of sensitive sources, and already extracted sensitive paths are marked. These optimizations reduce redundant results and improve analysis efficiency. The extracted sensitive information flows are then transformed into sensitive behavior two-tuples. In the consistency analysis stage, the semantic relationships between ontology concepts are defined as equivalence, subordination, and approximation, and a semantic similarity measure is defined for each. Consistency between sensitive behavior and privacy policy falls into two consistent categories (clear and fuzzy descriptions) and three inconsistent categories (omitted, incorrect, and ambiguous descriptions). Finally, the proposed semantic-similarity-based consistency analysis algorithm is applied to determine whether sensitive behavior agrees with the privacy policy. Experimental results show that 51.4% of the 928 applications analyzed exhibit inconsistencies between their sensitive behavior and their privacy policy statements.

(3) Traditional access control techniques have difficulty achieving fine-grained, dynamic information flow control over the sensitive behavior of components. To address this, a component-level dynamic control method for sensitive behavior based on a decentralized information flow control (DIFC) policy is proposed. The method first uses static analysis to extract the sensitive information flows of components. It identifies the components involved as untrusted third-party components and analyzes the system resources that these flows touch. Based on SELinux mandatory access control rules, security domains are added to both the components and the system resources. Security labels are then attached to the component domains according to the defined DIFC model. The component domains are granted the ability to access system resources, and their security labels are adjusted dynamically at runtime. This achieves system-level dynamic control over how components access and process sensitive information. Experimental results show that the method controls the sensitive behavior of untrusted third-party components with an accuracy of 98.7% and low performance overhead. It ensures that these components always use and process sensitive information within a secure range.
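The five relationship types in contribution (1) can be made concrete with a small sketch. The abstract does not give the formal definitions, so the ones below are assumptions. A flow is modeled as a tuple of node names running from source to sink. A dynamic-programming table finds the longest contiguous run of nodes that two flows share, and a fixed precedence order picks one relationship for each pair. The helper names (`longest_common_segment`, `classify`, `convergence_groups`) are hypothetical, and the sketch does not reproduce the author's algorithm.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, Iterable, Optional, Tuple

# Hypothetical flow model: the nodes a flow traverses, source first, sink last.
Flow = Tuple[str, ...]


def longest_common_segment(a: Flow, b: Flow) -> Tuple[int, int]:
    """Dynamic programming over prefixes: cur[j] holds the length of the common
    contiguous run ending at a[i-1] and b[j-1]. Returns (length of the longest
    run, index in `a` just past that run)."""
    best_len, best_end = 0, 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                cur[j] = prev[j - 1] + 1
                if cur[j] > best_len:
                    best_len, best_end = cur[j], i
        prev = cur
    return best_len, best_end


def classify(a: Flow, b: Flow) -> Optional[str]:
    """Assign one relationship type to a pair of flows.
    The definitions and their precedence order are illustrative assumptions."""
    common, _ = longest_common_segment(a, b)
    if common == min(len(a), len(b)):
        return "inclusion"      # one flow is a contiguous sub-path of the other
    if a[-1] == b[0] or b[-1] == a[0]:
        return "connection"     # one flow's sink feeds the other's source
    if a[-1] == b[-1] and a[0] != b[0]:
        return "convergence"    # different sources, same sink
    if a[0] == b[0] and a[-1] != b[-1]:
        return "divergence"     # same source, different sinks
    if common > 0:
        return "crossover"      # the paths share only intermediate nodes
    return None


def convergence_groups(flows: Iterable[Flow]) -> Dict[str, FrozenSet[str]]:
    """Group sources by shared sink, keeping only sinks reached by 2+ sources."""
    groups = defaultdict(set)
    for flow in flows:
        groups[flow[-1]].add(flow[0])
    return {sink: frozenset(srcs) for sink, srcs in groups.items() if len(srcs) > 1}
```

Applied to the abstract's example, `convergence_groups` maps the network sink to {IMEI, IMSI} for the benign application and to {IMEI, IMSI, location} for the malicious one. The two feature sets therefore differ even though both applications use the same sink.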
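Contribution (3) attaches DIFC security labels to component domains and adjusts those labels at runtime. The sketch below illustrates the general idea with a generic secrecy-label rule in the style of DIFC systems such as Flume: a flow is permitted only if the source's secrecy tags are a subset of the destination's tags. Reading a sensitive resource taints the component with that resource's tags, and each component has a set of tags it may declassify. The `Label` and `Component` classes, the tag names, and the `declassify` capability are hypothetical. Integrity labels are omitted, and the sketch does not reproduce the thesis's label model or its enforcement through SELinux security domains.

```python
from dataclasses import dataclass, field
from typing import FrozenSet


@dataclass(frozen=True)
class Label:
    """Hypothetical secrecy label: the set of sensitive-data tags an entity carries."""
    secrecy: FrozenSet[str] = frozenset()


def can_flow(src: Label, dst: Label) -> bool:
    # Information may flow only to an entity at least as secret as its source.
    return src.secrecy <= dst.secrecy


@dataclass
class Component:
    """A third-party component whose label changes at runtime (illustrative only)."""
    name: str
    label: Label = field(default_factory=Label)
    # Hypothetical capability: tags this component is trusted to remove.
    declassify: FrozenSet[str] = frozenset()

    def read(self, resource: Label) -> None:
        # Reading a sensitive resource adds the resource's tags to the component.
        self.label = Label(self.label.secrecy | resource.secrecy)

    def write(self, sink: Label) -> bool:
        # Strip the tags this component may declassify, then check the flow rule.
        effective = Label(self.label.secrecy - self.declassify)
        return can_flow(effective, sink)
```

Consider an advertising component that reads location data, where the location data carries a `location` tag. Once tainted, that component cannot write to an unlabeled network sink. A component granted the right to declassify `location` can still perform the write. This separation lets the label adjustment happen at runtime without changing the sink's label.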