
Semantic Primitives Extraction Method For XBRL Domain Ontology

Posted on: 2021-04-29
Degree: Master
Type: Thesis
Country: China
Candidate: D Ye
Full Text: PDF
GTID: 2439330647460367
Subject: Management Science and Engineering
Abstract/Summary:
XBRL (Extensible Business Reporting Language) has been adopted by a growing number of countries and organizations. Although it has achieved a certain degree of success, its application and development have encountered bottlenecks. On the one hand, there is currently no professional concept system in the XBRL field to guide the use of tags. On the other hand, the semantics of the concepts in XBRL financial reports are weak, which hampers report production and data sharing. Therefore, to make XBRL financial information more readable by computers, a set of semantic primitives for financial reports is needed to explain the XBRL conceptual system.

This paper draws on the theories of semantic primitives, graph theory, and domain ontology as its research basis. First, the research status and deficiencies of semantic primitive extraction methods are analyzed by reviewing the relevant literature. Second, a relationship network graph of accounting terms is constructed from the perspective of graph theory, using an accounting dictionary and introducing the PageRank algorithm. Since the original PageRank algorithm does not take the characteristics of the text domain into account, this paper proposes an improved PageRank algorithm (the PRFR algorithm) that extracts semantic primitives based on the text features of financial reports and element lists. Then the superiority of the model is analyzed qualitatively by comparison with word-frequency and TF-IDF baselines, and its effectiveness is evaluated quantitatively through blind selection experiments. Finally, the knowledge in the element list and in financial reports is expressed and verified using the extracted semantic primitives.

The innovations of this article are as follows:

(1) This article analyzes the language characteristics of financial reports and element lists and summarizes the structural characteristics of the terms in the element lists. First, it combines qualitative and quantitative methods to explain the characteristics of financial reports in terms of structure and terminology. Then, using the element list as the core corpus and dividing its terms manually, it obtains the structural regularity of those terms: a term consists of core words that carry the main information and additional modifiers that express the term's relevant attributes. This structural feature provides guidance and a basis for the extraction of semantic primitives.

(2) This article takes both the comprehensiveness and the scale of semantic primitive extraction into account. First, by constructing a directed graph of the accounting dictionary, it shows that each node falls into one of only two cases: it either lies on a loop or it does not. Nodes on a loop are extracted according to their PRFR value; among nodes not on a loop, those of degree 0 are selected. This ensures that the extraction of semantic primitives is comprehensive and scientifically sound. In addition, the preliminarily extracted semantic primitives are merged based on the synonym forest, which improves the expressive efficiency of the semantic primitives. The strategy aims to represent the largest range of domain knowledge with the smallest set of semantic primitives.

This article regulates the extraction of semantic primitives in XBRL financial reports from the perspective of semantics. Solving this problem can help computers better understand XBRL financial reports and take the applications of XBRL to a higher level.
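
The graph-based selection step described in innovation (2) can be illustrated with a minimal sketch. It rests on several assumptions that the abstract does not confirm: the toy dictionary is invented, an edge is assumed to run from a defined term to each word used in its definition, "degree 0" is read as out-degree 0, and plain PageRank stands in for PRFR, whose domain-specific weighting is not specified here. The function names `pagerank`, `reachable`, and `select_candidates` are hypothetical.

```python
def pagerank(graph, damping=0.85, iterations=100):
    """Plain PageRank by power iteration (a stand-in for PRFR).

    graph maps each node to the list of its successors.
    """
    nodes = sorted(set(graph) | {v for succ in graph.values() for v in succ})
    n = len(nodes)
    rank = {u: 1.0 / n for u in nodes}
    for _ in range(iterations):
        # Rank held by nodes without outgoing edges is spread uniformly.
        dangling = sum(rank[u] for u in nodes if not graph.get(u))
        new_rank = {u: (1.0 - damping) / n + damping * dangling / n
                    for u in nodes}
        for u in nodes:
            succ = graph.get(u, [])
            for v in succ:
                new_rank[v] += damping * rank[u] / len(succ)
        rank = new_rank
    return rank


def reachable(graph, start):
    """Nodes reachable from start by following at least one edge."""
    seen, stack = set(), list(graph.get(start, []))
    while stack:
        u = stack.pop()
        if u not in seen:
            seen.add(u)
            stack.extend(graph.get(u, []))
    return seen


def select_candidates(graph, top_k=2):
    """Split nodes into the two cases from the abstract.

    Nodes on a loop are ranked by score and the top_k are kept;
    nodes with out-degree 0 (an assumed reading of "degree 0")
    are kept as they are.
    """
    nodes = set(graph) | {v for succ in graph.values() for v in succ}
    rank = pagerank(graph)
    # A node lies on a loop exactly when it can reach itself.
    on_loop = {u for u in nodes if u in reachable(graph, u)}
    sinks = sorted(u for u in nodes if not graph.get(u))
    loop_terms = sorted(on_loop, key=lambda u: (-rank[u], u))[:top_k]
    return loop_terms, sinks, rank


# Invented toy dictionary: term -> words used in its definition.
graph = {
    "asset": ["resource", "entity"],
    "resource": ["asset", "value"],
    "liability": ["obligation", "entity"],
    "obligation": ["liability"],
    "entity": [],
    "value": [],
}

loop_terms, sink_terms, rank = select_candidates(graph, top_k=4)
print("on a loop, by score:", loop_terms)
print("out-degree 0:", sink_terms)
```

In this toy graph the circular definitions (asset and resource, liability and obligation) are the nodes that need a score to choose among, while the undefined words are kept directly. A synonym-merging pass, as the abstract describes, would then reduce the combined candidate set.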
Keywords/Search Tags: XBRL domain ontology, Semantic primitives, Financial report, PageRank