| With the rapid development of computer and Internet technology, users increasingly rely on search engines to find the information they need. Helping users retrieve that information quickly and accurately from the vast amount of information on the Internet is the primary problem of information retrieval. Data fusion combines the results of different information retrieval systems in a certain way to form a final result. This thesis studies several weight allocation strategies, together with linear and nonlinear fusion methods, to improve the effectiveness and efficiency of data fusion. The main work of the thesis is as follows: (1) Linear combination is a very flexible and effective fusion method, and weight allocation is the key to its success. A performance-based heuristic weight allocation strategy is therefore studied, in which the performance of each member system is reflected in its weight. Data fusion is carried out and evaluated with different values of member system performance selected as weights. Experiments on Text Retrieval Conference (TREC) datasets show that the proposed method is slightly better than three other similar methods, improving not only comprehensive measures such as MAP but also P@10. (2) Building on (1), this thesis studies a data fusion method whose heuristic weight allocation strategy is based on both performance and difference. In this method, MAP, P@10, RP, RR and other measures are used to calculate the performance values of the member systems, a ranking-based method is used to calculate the differences between member systems, and the two are combined to form the final weight. Experiments on TREC datasets show that the proposed weight allocation method effectively improves fusion performance compared with existing algorithms. (3) Judging the relevance of documents in the experimental dataset is the most expensive part of the process. This thesis
explores a more economical and effective method that judges the relevance of only the top 10 documents in each member system's result list and uses multivariate regression to calculate the weights. Experiments show that, compared with judging the relevance of all documents, judging only a small number of documents not only improves the efficiency of the retrieval system but also improves the effectiveness of the retrieval results, especially P@10. (4) A nonlinear combination method based on multivariate regression is studied. Multivariate regression is used to assign weights, and both linear and nonlinear terms are introduced. TREC datasets are used in the experiments. The experimental results show that both linear and nonlinear terms have positive effects on retrieval performance. |
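
To make the idea in (1) concrete, the following is a minimal sketch of linear combination with performance-based weights. It is an illustration under stated assumptions, not the thesis's implementation: the function names, the toy scores, the min-max score normalization, and the choice of raising a performance value (such as MAP) to a `power` are all hypothetical, since the abstract does not specify which performance values are used as weights.

```python
from collections import defaultdict


def min_max_normalize(scores):
    """Rescale one system's raw scores to [0, 1] so systems are comparable."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All scores equal: treat every retrieved document as equally good.
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def performance_weights(performance, power=1.0):
    """Assumed heuristic: weight = performance value (e.g. MAP) ** power."""
    return {name: p ** power for name, p in performance.items()}


def linear_combination(results, weights):
    """Fused score of a document = sum over systems of weight * normalized score.

    Documents missing from a system's list contribute nothing for that system.
    Returns (doc, score) pairs sorted by descending score, ties broken by doc id.
    """
    fused = defaultdict(float)
    for name, scores in results.items():
        for doc, s in min_max_normalize(scores).items():
            fused[doc] += weights[name] * s
    return sorted(fused.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical toy data: two member systems and their MAP on training queries.
results = {
    "sys_a": {"d1": 9.0, "d2": 5.0, "d3": 1.0},
    "sys_b": {"d2": 0.8, "d3": 0.6, "d4": 0.2},
}
performance = {"sys_a": 0.30, "sys_b": 0.20}

weights = performance_weights(performance, power=2.0)
ranking = linear_combination(results, weights)
print([doc for doc, _ in ranking])
```

Setting `power=0` gives every system the same weight, which reduces the sketch to an unweighted sum of normalized scores; larger values of `power` shift more influence toward the better-performing member systems.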