Linear And Nonlinear Combination-based Data Fusion Technology In Information Retrieval

Posted on:2023-12-10

Degree:Master

Type:Thesis

Country:China

Candidate:W Yan

Full Text:PDF

GTID:2568307025961819

Subject:Computer Science and Technology

Abstract/Summary:

With the rapid development of computer and Internet technology, users increasingly rely on search engines to find the information they need. Helping users retrieve that information quickly and accurately from the vast amount of content on the Internet is the primary problem of information retrieval. Data fusion combines the results of different information retrieval systems into a single final result list. This thesis studies several weight allocation strategies together with linear and nonlinear fusion methods, with the aim of improving the effectiveness and efficiency of data fusion. The main work of the thesis is as follows:

(1) Linear combination is a very flexible and effective fusion method, and weight allocation is the key to its success. The thesis therefore studies a performance-based heuristic weight allocation strategy, in which the weight of each member system reflects that system's own performance. Data fusion is carried out and evaluated using different functions of the member systems' performance values as weights. Experiments on Text Retrieval Conference (TREC) datasets show that the proposed method is slightly better than three other similar methods. It improves not only comprehensive measures such as MAP, but also P@10.

(2) Building on (1), the thesis studies a data fusion method whose heuristic weight allocation strategy is based on both performance and dissimilarity. In this method, measures such as MAP, P@10, RP and RR are used to calculate the performance values of the member systems, a ranking-based method is used to calculate the differences between member systems, and the two are combined to form the final weights. Experiments on TREC datasets show that the proposed weight allocation method effectively improves fusion performance compared with existing algorithms.

(3) Judging document relevance is the most expensive part of building an experimental dataset. The thesis explores a more economical and effective approach: only the top 10 documents in each member system's result list are judged for relevance, and multivariate regression is used to calculate the weights. Experiments show that, compared with judging the relevance of all documents, judging only a small number of documents improves not only the efficiency of the retrieval system but also the effectiveness of the retrieval results, especially P@10.

(4) A nonlinear combination method based on multivariate regression is studied. Multivariate regression is used to assign weights, and both linear and nonlinear terms are introduced. Experiments on TREC datasets show that both the linear and the nonlinear terms have a positive effect on retrieval performance.
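The performance-based linear combination in (1) and the regression-based weighting in (3) and (4) can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the function names, the min-max score normalization, the `power` parameter, and the toy numbers are all assumptions made for illustration, and the abstract does not say which normalization or which function of performance the author used.

```python
def min_max_normalize(scores):
    """Scale one system's raw scores for one query into [0, 1] (assumed normalization)."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


def performance_weights(performance, power=1.0):
    """Heuristic weights: each system's training performance (e.g. MAP) raised to a power.
    The `power` knob is a hypothetical way to try different functions of performance."""
    return [p ** power for p in performance]


def linear_fusion(score_rows, weights):
    """Fused score of each document: weighted sum of its scores from the member systems.
    score_rows[d][s] is the normalized score that system s gave document d."""
    return [sum(w * s for w, s in zip(weights, row)) for row in score_rows]


def regression_weights(score_rows, relevance):
    """Least-squares weights w minimizing sum over d of (score_rows[d] . w - relevance[d])^2,
    obtained by solving the normal equations with Gauss-Jordan elimination."""
    n = len(score_rows[0])
    # Augmented matrix [X^T X | X^T y].
    a = [[sum(r[i] * r[j] for r in score_rows) for j in range(n)]
         + [sum(r[i] * y for r, y in zip(score_rows, relevance))]
         for i in range(n)]
    for col in range(n):
        # Partial pivoting: use the row with the largest entry in this column.
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("member systems are linearly dependent")
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(n):
            if r != col:
                f = a[r][col] / a[col][col]
                a[r] = [x - f * p for x, p in zip(a[r], a[col])]
    return [a[i][n] / a[i][i] for i in range(n)]


# Toy data, invented for illustration: 4 documents scored by 3 member systems.
scores = [[0.9, 0.2, 0.5],
          [0.1, 0.8, 0.4],
          [0.5, 0.5, 0.9],
          [0.0, 0.1, 0.0]]
train_map = [0.30, 0.20, 0.25]  # hypothetical MAP of each system on training queries

fused = linear_fusion(scores, performance_weights(train_map))
ranking = sorted(range(len(fused)), key=lambda d: -fused[d])  # best document first
```

For the economical variant in (3), the rows passed to `regression_weights` would cover only the judged top-10 documents of each member system. For (4), one possible way to add nonlinear terms is to append extra columns (for example squared scores) to each row before fitting; the abstract does not state which nonlinear terms the thesis uses.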

Keywords/Search Tags:

data fusion, linear combination, nonlinear combination, similarity measurement, weight assignment

Related items

1	Research On Fusion-Based Methods For Search Result Diversification
2	Research On Data Fusion In Information Retrieval By Using Intelligent Optimization Methods
3	Research On Application Of Data Fusion Methods In Electronic Medical Record Retrieval
4	Research On Basic Probability Assignment Derivation And Combination In Evidence Theory
5	Research On IP Network Traffic Variable Weight Combination Forecasting Model
6	Research And Implementation Of Frequent Itemsets Mining Algorithm In Linear Table Based On Bit Combination
7	Research On Attitude Calculation Algorithm Of Moving Body Based On Inertial-Geomagnetic Combination
8	Study On Reliable Evidence Combination Methods In High-level Information Fusion
9	Research And Implementation Of Personalized Recommendation Algorithm Based On User Segmentation And Combination Similarity
10	The Design And Implementation Of Multi-features Combination In Sentence Similarity Computation