| Quantitative representation of NSFC proposal forms enables fine-grained grouping of fund projects and helps managers select experts in the right fields. A fund proposal form records the applicant, the supporting organization, and the subject area, as well as the research content, research methodology, technical route, working basis, and other information closely related to the research work. However, current mainstream text representation techniques have difficulty extracting core content and are inefficient for large-scale retrieval. Building on text representation, automatic text summarization, and deep hashing, the thesis investigates quantitative representation techniques for fund proposal forms and proposes an automatic text summarization model and a multi-index hashing quantitative representation model. The thesis achieves hierarchical representation and efficient quantitative indexing of large-scale fund proposal forms, offering a new approach to precisely matching fund evaluation experts, a task that has long relied on subject codes or keywords. The main contributions of the thesis are as follows: (1) Hierarchical representation of fund proposal forms. To address the difficulty of extracting the core content of fund proposal forms, the thesis establishes a hierarchical representation model for fund proposal forms based on hierarchical information extraction and automatic summary generation. First, the thesis uses the logical structure of the fund proposal form to divide it into multiple levels. Then, using a summary generation model that integrates keywords and grammatical knowledge, it generates a summary for each level. Finally, the level summaries are combined into a complete summary, yielding a hierarchical representation of the fund proposal form. The
experiments on the LCSTS and NSFCS datasets show that this method outperforms the RNN-Context and CopyNet models on various ROUGE metrics. (2) A quantitative representation method for fund proposal forms based on multi-index hashing. To address the inefficiency of large-scale retrieval of fund proposal forms, the thesis combines deep neural networks with multi-index hashing and proposes a quantitative representation model based on multi-index deep hashing. The thesis first generates semantic hash codes suited to a multi-index structure with the multi-index deep hashing method and then constructs multi-index hash tables based on the idea of segmented coding. Finally, the thesis designs a kNN-based quantitative retrieval mechanism to achieve quantitative representation and efficient retrieval of large-scale fund proposal forms. Experiments on the THUCNews and NSFCH datasets show that this method outperforms mainstream, classical hashing methods such as BRE and DLBCH on key metrics such as Precision@100 and Accuracy. (3) Quantitative retrieval system for fund proposal forms. To meet the practical need for quantitative retrieval of NSFC fund proposal forms, the thesis builds a quantitative retrieval system that uses the hierarchical representation of fund proposal forms and the multi-index hashing quantization method as the underlying technology for large-scale retrieval. The system includes an automatic text summary generation module that takes fund proposal forms as its underlying data, and it adopts the multi-index deep hashing method to quantize the fund proposal forms. Finally, the system implements core content extraction and efficient retrieval for a large number of fund proposal forms and is applied to grouping similar fund proposal forms and selecting experts. |
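
The multi-index hash tables and kNN retrieval described in contribution (2) can be illustrated with a minimal sketch. It is not the thesis's implementation: it assumes the binary codes have already been produced (the deep hashing network is omitted), it stores codes as Python integers, and the names `MultiIndexHash`, `split_code`, and `neighbours` are hypothetical. It shows the standard multi-index hashing lookup: each code is split into equal-width segments, each segment is indexed in its own table, and a query probes every table before candidates are ranked by full Hamming distance.

```python
from collections import defaultdict
from itertools import combinations


def split_code(code, n_bits, n_tables):
    """Split an n_bits-wide integer code into n_tables equal-width segments."""
    width = n_bits // n_tables
    mask = (1 << width) - 1
    return [(code >> (i * width)) & mask for i in range(n_tables)]


def neighbours(value, width, radius):
    """Yield every width-bit value within Hamming distance `radius` of value."""
    for r in range(radius + 1):
        for positions in combinations(range(width), r):
            flipped = value
            for p in positions:
                flipped ^= 1 << p
            yield flipped


class MultiIndexHash:
    """Hypothetical multi-index hash structure over fixed-width binary codes."""

    def __init__(self, n_bits, n_tables):
        assert n_bits % n_tables == 0, "segments must have equal width"
        self.n_bits, self.n_tables = n_bits, n_tables
        self.width = n_bits // n_tables
        self.tables = [defaultdict(set) for _ in range(n_tables)]
        self.codes = {}

    def add(self, doc_id, code):
        """Index one document under each segment of its code."""
        self.codes[doc_id] = code
        segments = split_code(code, self.n_bits, self.n_tables)
        for table, segment in zip(self.tables, segments):
            table[segment].add(doc_id)

    def knn(self, query, k, radius):
        """Return up to k (distance, doc_id) pairs within `radius` of query."""
        # Pigeonhole: a code within `radius` of the query must match at least
        # one segment within radius // n_tables.
        sub_radius = radius // self.n_tables
        candidates = set()
        segments = split_code(query, self.n_bits, self.n_tables)
        for table, segment in zip(self.tables, segments):
            for probe in neighbours(segment, self.width, sub_radius):
                candidates |= table.get(probe, set())
        # Verify candidates with the full Hamming distance, then rank.
        scored = [(bin(self.codes[d] ^ query).count("1"), d) for d in candidates]
        return sorted(pair for pair in scored if pair[0] <= radius)[:k]


index = MultiIndexHash(n_bits=8, n_tables=2)
index.add("a", 0b10110100)
index.add("b", 0b10110111)
index.add("c", 0b01001011)
print(index.knn(0b10110101, k=2, radius=2))
```

Only documents that share a near-matching segment with the query are compared in full, so most of the collection is never touched. That property is what makes this kind of index attractive for large-scale retrieval.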