| With the development of e-commerce, online shopping has become one of the major ways of consumption. Compared with offline shopping, online shopping offers consumers lower costs and a wider variety of goods, with fewer limits on opening hours or shopping locations. However, it is precisely this abundance of information and choice that costs consumers more time and energy in finding suitable merchandise. Meanwhile, fierce competition between e-commerce platforms forces merchants to refine their products to meet customers’ needs better, which also narrows the range of target users. To prepare for future competition and growth, it is therefore important for e-commerce businesses to identify target consumers in the crowd quickly and efficiently and to design more targeted marketing programs. The large volume of customer behavior data on e-commerce platforms makes it possible to analyze purchase intentions and consumption habits, and thus to achieve precise one-to-one product recommendation. In this study, real data from the Tianchi big data platform are used to predict whether a user will purchase an item with which that user has already interacted. Model building consists of four steps. The first step is data preprocessing: exploring the basic distribution of the data and cleaning it. This step provides the reference and basis for the feature extraction methods and for the choice of algorithm. The second step is sample selection. This step is needed because the sample data are highly imbalanced: the number of positive samples is far exceeded by the number of negative samples. The problem is addressed with three data-processing measures. First, increase the number of positive samples with a moving window. 
Second, based on an analysis of interaction timeliness, shorten the time window of interactions considered before the prediction day, which reduces the imbalance between negative and positive samples. Third, randomly sample negative samples without replacement while keeping all positive samples. The third step is feature engineering. Features are constructed along multiple dimensions for the user, the item, the item category and the user-item pair. The feature groups are then processed and extended in two ways: applying different transformations to simple features yields second-level features that are better suited to the prediction model, and combining single features in different ways yields derived features that better reflect the characteristics of the data and the business requirements. Features are the independent variables of the model and determine the upper limit of its predictive performance; trying different algorithms and tuning parameters is how a model approaches that limit. The fourth step is model training and prediction. Logistic regression and GBDT are used to build prediction models. On the test set, GBDT predicts better than logistic regression. To improve the prediction further, the output of logistic regression is added as a new feature and the prediction is rerun with the GBDT model. This raises the prediction performance above that of either single model, which is attributed to GBDT itself being a strong classifier built on regression trees. |
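The moving window and the compressed time window from the sample-selection step can be sketched together as follows. This is a minimal illustration, not the study's implementation: the log layout `(user, item, behavior, day)`, the behavior code `4` for a purchase, and the names `moving_window_samples` and `window` are assumptions. Sliding the label day produces a fresh set of labelled samples per day, which multiplies the positives, while a small `window` keeps only user-item pairs that interacted shortly before the label day.

```python
# Hypothetical log layout: (user_id, item_id, behavior, day), days as integers.
PURCHASE = 4  # assumption: behavior code 4 marks a purchase


def moving_window_samples(logs, first_label_day, last_label_day, window):
    """Slide the label day from first_label_day to last_label_day.

    For each label day d, every (user, item) pair that interacted during
    [d - window, d) becomes one sample, labelled 1 if that user bought that
    item on day d and 0 otherwise.
    """
    samples = []
    for label_day in range(first_label_day, last_label_day + 1):
        start = label_day - window
        # Time-window compression: only recent interactions become candidates.
        candidates = {(u, i) for u, i, _, day in logs if start <= day < label_day}
        bought = {(u, i) for u, i, b, day in logs
                  if day == label_day and b == PURCHASE}
        for user, item in sorted(candidates):
            samples.append((label_day, user, item, int((user, item) in bought)))
    return samples


logs = [("u1", "i1", 1, 1), ("u1", "i1", 4, 3),
        ("u2", "i2", 1, 2), ("u2", "i2", 1, 3)]
samples = moving_window_samples(logs, first_label_day=3, last_label_day=4, window=2)
```

Each window reuses the same log, so the window length trades the number of candidate pairs against how stale their interactions may be.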
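The third sample-selection measure, keeping every positive sample and drawing negatives at random without replacement, can be sketched as below. The target ratio `neg_per_pos` and the function name `undersample_negatives` are hypothetical; the abstract does not state which ratio was used.

```python
import random


def undersample_negatives(samples, neg_per_pos, seed=0):
    """Keep every positive sample and draw negatives without replacement.

    Samples are tuples whose last field is the 0/1 label. neg_per_pos is a
    hypothetical target ratio of negatives to positives.
    """
    positives = [s for s in samples if s[-1] == 1]
    negatives = [s for s in samples if s[-1] == 0]
    k = min(len(negatives), neg_per_pos * len(positives))
    rng = random.Random(seed)
    chosen = rng.sample(negatives, k)  # random.sample never repeats an element
    balanced = positives + chosen
    rng.shuffle(balanced)
    return balanced


data = [(n, 1) for n in range(5)] + [(n, 0) for n in range(5, 105)]
balanced = undersample_negatives(data, neg_per_pos=3)
```

A fixed `seed` makes the draw reproducible, which helps when comparing models trained on the same balanced set.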
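The feature-engineering step distinguishes simple features, second-level features obtained by transforming them, and derived features obtained by combining them. The sketch below shows one example of each for the user, item, category and user-item dimensions. The specific features (`log_ui_count`, `ui_share`, `item_conversion`), the log layout `(user, item, category, behavior)` and the purchase code `4` are illustrative assumptions, not the study's feature set.

```python
import math
from collections import Counter

PURCHASE = 4  # assumption: behavior code 4 marks a purchase


def build_features(logs, pairs):
    """logs: (user, item, category, behavior) tuples; pairs: (user, item) samples."""
    user_cnt, item_cnt, cat_cnt = Counter(), Counter(), Counter()
    ui_cnt, item_buy = Counter(), Counter()
    item_cat = {}
    for user, item, cat, behavior in logs:
        user_cnt[user] += 1
        item_cnt[item] += 1
        cat_cnt[cat] += 1
        ui_cnt[(user, item)] += 1
        item_cat[item] = cat
        if behavior == PURCHASE:
            item_buy[item] += 1

    rows = []
    for user, item in pairs:
        ui = ui_cnt[(user, item)]
        rows.append({
            # Simple single features, one per dimension.
            "ui_count": ui,
            "user_count": user_cnt[user],
            "category_count": cat_cnt[item_cat.get(item)],
            # Second-level feature: a transformation of a simple feature.
            "log_ui_count": math.log1p(ui),
            # Derived features: combinations of single features.
            "ui_share": ui / user_cnt[user] if user_cnt[user] else 0.0,
            "item_conversion": item_buy[item] / item_cnt[item] if item_cnt[item] else 0.0,
        })
    return rows


logs = [("u1", "i1", "c1", 1), ("u1", "i1", "c1", 4),
        ("u1", "i2", "c1", 1), ("u2", "i1", "c1", 1)]
row = build_features(logs, [("u1", "i1")])[0]
```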
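GBDT builds an additive model of regression trees, each tree fitted to the negative gradient of the loss with respect to the current scores. The sketch below shows that mechanism for binary purchase labels under simplifying assumptions: depth-one trees (stumps), log loss, and leaf values equal to the mean residual rather than the Newton step that production libraries typically use. The names `train_gbdt`, `n_trees` and `learning_rate` are hypothetical; the study's own GBDT implementation and parameters are not stated here.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_stump(X, residuals):
    """Depth-1 regression tree: the split minimising squared error on residuals."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left = [r for row, r in zip(X, residuals) if row[j] <= t]
            right = [r for row, r in zip(X, residuals) if row[j] > t]
            if not left or not right:
                continue
            lv, rv = sum(left) / len(left), sum(right) / len(right)
            sse = sum((r - lv) ** 2 for r in left) + sum((r - rv) ** 2 for r in right)
            if best is None or sse < best[0]:
                best = (sse, j, t, lv, rv)
    if best is None:  # no valid split: every row has identical features
        mean = sum(residuals) / len(residuals)
        return (0, math.inf, mean, mean)
    return best[1:]


def stump_value(stump, row):
    j, t, left_value, right_value = stump
    return left_value if row[j] <= t else right_value


def train_gbdt(X, y, n_trees=50, learning_rate=0.3):
    # Start from the log-odds of the base purchase rate (clipped to avoid log(0)).
    rate = min(max(sum(y) / len(y), 1e-6), 1 - 1e-6)
    f0 = math.log(rate / (1 - rate))
    scores = [f0] * len(X)
    stumps = []
    for _ in range(n_trees):
        # Negative gradient of log loss with respect to the raw score.
        residuals = [target - sigmoid(s) for target, s in zip(y, scores)]
        stump = fit_stump(X, residuals)
        stumps.append(stump)
        scores = [s + learning_rate * stump_value(stump, row) for s, row in zip(scores, X)]
    return f0, learning_rate, stumps


def gbdt_predict_proba(model, X):
    f0, learning_rate, stumps = model
    return [sigmoid(f0 + learning_rate * sum(stump_value(st, row) for st in stumps))
            for row in X]


X = [[0.0], [1.0], [2.0], [3.0]]
y = [0, 0, 1, 1]
model = train_gbdt(X, y)
probs = gbdt_predict_proba(model, X)
```

Real experiments would rely on a tuned library implementation with deeper trees; the stumps here only make the boosting loop easy to follow.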
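The stacking step, in which the logistic-regression output becomes a new input feature for GBDT, can be sketched as below. The small gradient-descent logistic regression and the names `train_logistic_regression` and `add_lr_feature` are illustrative assumptions; the augmented feature matrix would then be passed to whatever GBDT implementation is used.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic_regression(X, y, learning_rate=0.1, epochs=500):
    """Batch gradient descent on the mean log loss."""
    n_features = len(X[0])
    w, b = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for row, target in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, row)) + b) - target
            for j, xj in enumerate(row):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - learning_rate * g / len(X) for wj, g in zip(w, grad_w)]
        b -= learning_rate * grad_b / len(X)
    return w, b


def predict_proba(model, X):
    w, b = model
    return [sigmoid(sum(wj * xj for wj, xj in zip(w, row)) + b) for row in X]


def add_lr_feature(model, X):
    """Return a copy of X with the LR purchase probability appended to each row."""
    return [row + [p] for row, p in zip(X, predict_proba(model, X))]


X = [[0.0], [1.0], [2.0], [3.0]]
y = [0, 0, 1, 1]
lr_model = train_logistic_regression(X, y)
X_aug = add_lr_feature(lr_model, X)
```

Computing the new column from out-of-fold predictions, rather than from a model fitted on the same rows, is a common way to limit label leakage in stacking; that refinement is omitted here for brevity.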