| In recent years, the growth of communication operators' business has slowed noticeably, and the focus of sustaining user numbers has shifted from acquiring new users to retaining existing ones. One strategy is to identify, within the massive volume of existing-user data, those users who show a tendency to go off-net. The method is to analyze the data of users who have already gone off-net (i.e., users in a suspended or dismantled state), find the connections between these users and other transactions, and thereby find the reasons for their departure from the network. In the industry this is known as mining the data for applicable association rules. The classical association rule algorithm can mine only positive frequent itemsets and positive association rules, so it cannot express negative relationships such as ¬A ⇒ B (transaction A does not promote transaction B) or A ⇒ ¬B (transaction A suppresses transaction B). Operators' data contains a large number of negative items, such as unrealized, unbound packages, and non-subsidized data. At the same time, it is necessary to study the impact of various transactions on negative items such as inactive users, no accounts, and no receipts. It is therefore necessary to optimize the classical association rule algorithm so that it mines positive and negative frequent itemsets at the same time and generates more general association rules in which both the antecedent and the consequent may contain positive and negative items, and then to apply these generalized association rules to mining off-net user data. This paper first analyzes the key ideas of the classical association rule mining algorithm. It then introduces the so-called "generalized association rules" and, building on the traditional association rule algorithm, proposes three optimization schemes tailored to the characteristics of communication operators' off-net user data: "generalization of association rules 
algorithm", "value attribute discretization", and "optimization of the support-confidence framework". Together these yield GAPI, an algorithm suited to mining the off-net user data of communication operators. The paper then uses the GAPI algorithm to run experiments on a communication operator's off-net user data from 2017 and analyzes the algorithm's efficiency. Drawing on the frequent pattern tree structure of the FP_GROWTH algorithm, the algorithm is further optimized and the GFP algorithm is proposed. Finally, the experiments and analysis are repeated with the GFP algorithm, and satisfactory results are obtained. The research focuses of this paper are: first, how to extend traditional association rules to the characteristics of communication operators' off-net user data and design a corresponding data mining algorithm, GAPI, suitable for communication operators; second, how to realize the optimization schemes and propose an algorithm suitable for communication operators' data mining; third, how to run experiments on a communication operator's real off-net user data while ensuring data security; and fourth, how to further optimize the algorithm where the experimental time efficiency is unsatisfactory. The experiments show that the GAPI and GFP algorithms are applicable to operators' off-net data mining and can effectively extract the features of off-net users; both generalized association rule algorithms reduce the candidate data set space and improve mining efficiency. They also effectively suppress the generation of redundant association rules and perform well in practice. |
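
To make the notion of rules with negative items concrete, the following is a minimal sketch of how support and confidence might be computed for rules of the form ¬A ⇒ B and A ⇒ ¬B. It is not the GAPI or GFP algorithm described in the paper; it only illustrates the standard support-confidence measures extended to negated items. The item names (`bound_package`, `active`, `off_net`), the toy records, and the helper functions are all hypothetical.

```python
from typing import FrozenSet, Iterable, List, Tuple

# A literal is (item, present). present=False denotes the negative item "not item".
Literal = Tuple[str, bool]


def matches(transaction: FrozenSet[str], literals: Iterable[Literal]) -> bool:
    """Return True if the transaction satisfies every positive and negative literal."""
    return all((item in transaction) == present for item, present in literals)


def support(transactions: List[FrozenSet[str]], literals: Iterable[Literal]) -> float:
    """Fraction of transactions that satisfy all the given literals."""
    literals = list(literals)
    if not transactions:
        return 0.0
    return sum(matches(t, literals) for t in transactions) / len(transactions)


def confidence(
    transactions: List[FrozenSet[str]],
    antecedent: Iterable[Literal],
    consequent: Iterable[Literal],
) -> float:
    """support(antecedent and consequent) / support(antecedent), or 0.0 if undefined."""
    antecedent = list(antecedent)
    consequent = list(consequent)
    base = support(transactions, antecedent)
    if base == 0.0:
        return 0.0
    return support(transactions, antecedent + consequent) / base


# Hypothetical toy user records (illustrative only, not operator data).
records = [
    frozenset({"bound_package", "active"}),
    frozenset({"bound_package", "active"}),
    frozenset({"off_net"}),
    frozenset({"off_net"}),
    frozenset({"bound_package", "off_net"}),
]

# Rule of the form "not A => B": users without a bound package who are off-net.
no_package_to_off_net = confidence(
    records, [("bound_package", False)], [("off_net", True)]
)

# Rule of the form "A => not B": users with a bound package who are not off-net.
package_to_not_off_net = confidence(
    records, [("bound_package", True)], [("off_net", False)]
)
```

Representing a negated item as an `(item, False)` pair keeps positive and negative items in one uniform structure, so the same support and confidence functions cover all four rule forms (A ⇒ B, ¬A ⇒ B, A ⇒ ¬B, ¬A ⇒ ¬B) without special cases.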