Demonstration Research On Ontology-based Intelligent Search System Of Medical Information And TCM Literature

Posted on: 2013-03-06  Degree: Doctor  Type: Dissertation
Country: China  Candidate: J Wang  Full Text: PDF
GTID: 1224330374491817  Subject: Basic Theory of TCM
Abstract/Summary:
Clinical researchers are paying increasing attention to the large body of clinical research literature and to effective literature search as a source of experience and guidance. This study aims to design an ontology of Traditional Chinese Medicine (TCM) literature information and, on top of it, an intelligent search model that uses current semantic web technologies to improve the recall ratio and the precision ratio. The output is sorted by relevance.

1. Background

TCM literature databases are mostly built on the external characteristics of the literature and can provide full-text search services. TCM researchers urgently need to understand the overall development of the relevant literature and the regular patterns of clinical treatment. Traditionally, full-text search is carried out mainly through keywords, keyword strings, or subject headings. Users have difficulty expressing their real search intent, and it is hard to extract common rules from a large number of search results. Because this inflexible search process does not understand the meaning of a query, the results contain a lot of useless information and miss many records that use synonyms of the keywords.

2. Content

The objective of this research is to implement an ontology-based intelligent retrieval system that supports the treatment-related content of the TCM clinical research literature.

Target

Based on the search requirements that doctors have for the clinical research literature during daily clinical study and treatment, create an intelligent search model through semantic analysis of the content and ontology construction for part of the TCM clinical research literature.
The model should improve the recall ratio and precision, and its output is sorted by relevance.

Requirement investigation

Examine the content of the TCM clinical research literature from 2006-2007 to find the frequency of the diagnosis and treatment information it contains and the relationships among those items. Summarize which information in the literature can serve doctors. Use a questionnaire to investigate what TCM clinical doctors and researchers need when searching the database of TCM clinical research literature. Summarize the commonly used search types.

Data collection

The research resource is the database of TCM clinical research literature built by the Institute of Information on TCM. The study uses literature on clinical research that was published in journals in 2005-2007 and whose database entry was essentially complete. It covers the diseases, syndromes, symptoms, therapeutic principles, formulas, Chinese medicinals, acupuncture points, and pharmacological classifications involved in the course of diagnosis and treatment.

Data processing

The research is performed on existing resources, and the data in the resource center must first be sorted.

Reference standards: Data processing must follow certain principles. We selected the MeSH keyword list produced by the U.S. National Library of Medicine and the Thesaurus of Traditional Chinese Medicine, and used part of the national standard and the People's Republic of TCM dictionary as supplements.

Standardization process: For records that do not exactly match the standard word list, a similarity algorithm is run against the standard words and their aliases. The similarity is calculated with the Jaccard formula.
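The Jaccard-based matching step can be sketched as follows. This is a minimal illustration, not the system's actual implementation: the abstract does not say which sets the Jaccard coefficient is computed over, so the sketch assumes character sets, and the function names, the example vocabulary, and the 0.5 threshold are all hypothetical.

```python
def jaccard(a, b):
    """Jaccard coefficient |A ∩ B| / |A ∪ B| over the character sets of two strings.

    Assumption: characters are the set elements; the source does not specify this.
    """
    set_a, set_b = set(a), set(b)
    if not set_a and not set_b:
        return 1.0  # two empty strings are treated as identical
    return len(set_a & set_b) / len(set_a | set_b)


def best_standard_term(record, vocabulary, threshold=0.5):
    """Suggest the standard term whose name or alias is most similar to `record`.

    `vocabulary` maps each standard term to a list of its aliases.
    `threshold` is a hypothetical cut-off; below it no suggestion is made.
    """
    best_term, best_score = None, 0.0
    for standard, aliases in vocabulary.items():
        for candidate in [standard, *aliases]:
            score = jaccard(record, candidate)
            if score > best_score:
                best_term, best_score = standard, score
    if best_score >= threshold:
        return best_term, best_score
    return None, best_score


# Hypothetical standard word list with aliases.
vocabulary = {
    "angina pectoris": ["angina"],
    "blood stasis syndrome": ["blood stasis"],
}
suggestion = best_standard_term("angina pectris", vocabulary)
```

A suggestion produced this way is only a candidate; as the abstract notes, the final normalization also relies on analysis by researchers.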
Preliminary terms are normalized according to the results of the similarity calculation and analysis by researchers. Because of differences in how the data were collected and in the level of participation and difficulty, the terms for diseases, Chinese medicinals, acupuncture points, and formulas are relatively tidy, while those for syndromes, symptoms, and therapeutic principles are relatively complicated.

Establishing the ontology

The ontology built by the researcher mainly reflects the actual content of the literature and respects the real content of clinical research; unlike earlier work, it is not imported from textbooks and dictionaries. Determine the relationship properties among concepts and the associations between entities. Because the research data are spread across different database tables with complex structures (28 of them are related), one-to-one relationships between two fields need to be matched many times. Set up the object properties and the data properties between the tables, using the columns as data properties. Establish the ontology of the different records, adding a semantic association for every specific entity. We chose Protege 4.1, developed by Stanford University in the USA, as the ontology engineering tool.

The design of the intelligent retrieval model and relevance ranking algorithm

Search page: provides the search entrance, divided into simple search and advanced search.

Results page: shows the results and statistics on their frequency.

User intent analysis system

Natural language segmentation: with the help of the ontology concepts and the standard word table, the keywords and natural language of the query entered by the user are preprocessed to improve the search accuracy for concepts and combinations of concepts.

Ontology-based index library: a three-dimensional group of the ontology. Based on the word segmentation results, the user's keywords are matched to the corresponding ontology and then turned into a SPARQL statement.

The semantic query system is the core of the whole system. It helps expand and query the semantic query vector, and its key modules are implemented on the basis of the concepts, and the semantic relations between concepts, held by the ontology server.

DartQuery: receives SPARQL queries and turns them into SQL queries using the ontology and the ontology-database mapping files.

DartMapping: creates the mapping between the ontology and the database and produces the mapping files that DartQuery uses.

Log system

Log recording and log analysis.

The backend databases include the navigation information database, the information resource database, and the storage information database.

Result optimization module: sorts the search results and puts the most relevant result first, so the similarity algorithm is especially important.

Relevance ranking algorithm

This paper considers the ordering of the output results from two aspects: the similarity between the query language and the ontology itself, and the importance, within the literature, of the terms the user queried in the corresponding ontology. The final similarity rank is determined from both calculations. The similarity between the query language and the ontology itself is calculated with a formula; the importance of the queried terms in the literature is calculated with the TF/IDF method. The final ranking formula combines the two results:

SIM = d * Sim + (1 - d) * mat(O, t)

Here d is a weighting factor whose value can be adjusted and optimized through testing.
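The weighted combination SIM = d * Sim + (1 - d) * mat(O, t) can be illustrated with a short sketch. Several things here are assumptions rather than details from the abstract: mat(O, t) is taken to be the TF/IDF score of the queried term in a document, the TF/IDF variant (relative term frequency times the natural log of inverse document frequency) is one common choice, the Sim values are hypothetical precomputed numbers, and all function names are made up for the example.

```python
import math


def tf_idf(term, doc_tokens, corpus):
    """TF-IDF of `term` in one tokenized document (assumed variant, see lead-in)."""
    if not doc_tokens:
        return 0.0
    tf = doc_tokens.count(term) / len(doc_tokens)
    df = sum(1 for tokens in corpus if term in tokens)
    if df == 0:
        return 0.0
    return tf * math.log(len(corpus) / df)


def combined_score(sim, mat, d=0.5):
    """SIM = d * Sim + (1 - d) * mat, with the weighting factor d in [0, 1]."""
    if not 0.0 <= d <= 1.0:
        raise ValueError("d must lie in [0, 1]")
    return d * sim + (1.0 - d) * mat


def rank(term, corpus, concept_sim, d=0.5):
    """Return document indices ordered by descending combined score."""
    scores = [
        combined_score(concept_sim[i], tf_idf(term, doc, corpus), d)
        for i, doc in enumerate(corpus)
    ]
    return sorted(range(len(corpus)), key=lambda i: scores[i], reverse=True)


# Toy tokenized corpus and hypothetical query-to-ontology similarity values.
corpus = [
    ["acupuncture", "angina", "pectoris", "acupuncture"],
    ["blood", "stasis", "syndrome", "formula"],
    ["angina", "pectoris", "formula"],
]
concept_sim = [0.9, 0.1, 0.6]

order_balanced = rank("angina", corpus, concept_sim, d=0.5)
order_tfidf_only = rank("angina", corpus, concept_sim, d=0.0)
```

Changing d shifts the ranking between the two signals: with d = 0 only the TF/IDF term matters, and with d = 1 only the ontology similarity does, which is why the value is left to be tuned by testing.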
Intelligent retrieval model of the TCM clinical research literature

The model includes the following functions.

Keyword search: covers disease, syndrome, symptoms, therapeutic principle, formula, single herbal medicine, acupuncture point, etc.

Navigation search: supports Boolean operators and truncation wildcard symbols. A suggestion box opens to propose the most accurate terms for the keywords the user entered.

Natural language search: the semantic analysis built on the ontology makes natural language search possible, because the search process is based on concepts rather than keywords.

Professional answer: the user can ask the administrator questions. Once a question has been answered, the answer is automatically sent back to the user by email.

User's log: records and shows all of a user's previous queries. Continuous analysis of the log makes personalized search better match each user's needs.

Sorted by relevance: search results are ordered by relevance. The user can also customize the ordering, choosing between relevance and publication date.

Statistical functions: provide direct statistics on the contents of the literature information.

System test

For example, searching for "Acupuncture for angina pectoris" returns 22 relevant articles.
This shows that the intelligent search model can search natural language according to semantic relationships. Taking a search for literature on "blood stasis syndrome" as an example, the intelligent search system retrieved 1313 articles, 485 more than the traditional search system, which shows that the intelligent search model can improve the recall ratio. Because knowledge (concept) search technology clarifies and narrows the scope of the search and reduces the amount of useless information returned, precision improves. The intelligent search system combines the two algorithms, so search results can be ordered by relevance and also by year of publication. The research therefore achieves its goal of improving the recall ratio and precision, and of ordering results by relevance, for the TCM clinical research literature.

3. Conclusion

The contributions and innovations mainly include the following aspects.

The study adopts semantic web technology, establishes an ontology data model on top of the relational database, and then creates an intelligent search model. This provides a reference for ideas and methods of sharing heterogeneous resources across heterogeneous databases in the same field, and improves the recall ratio and precision ratio when searching the same repository.

The similarity algorithm in this research combines the results of two algorithms, so it can order search results by relevance more accurately and appropriately; it is feasible and useful.

Research on the metadata standard for TCM literature information: formulate a method for the core metadata standard and principles for its extension, regulate the description of TCM literature information, and guide the construction of relational databases.

Research on the ontology concept system of the TCM clinical research literature.
Using the ontology of principles, methods, formulas, and medicinals found in the clinical research literature, the study establishes the concept system of the TCM clinical research literature and defines the properties of, and semantic relations between, all kinds of concepts. This helps organize the information and knowledge in TCM literature and helps achieve intelligent search. With direct statistics on the retrieval results, the user can understand the state of a topic well without browsing a huge number of documents.

4. Prospect

Increased investment is required for standardizing information on syndromes, symptoms, and therapeutic principles.

Establishing an ontology of TCM diagnosis and treatment takes a lot of effort. Protege is suitable for small-sample research; ontology building for TCM should consider a tool that supports batch import.

This research carries out detailed ontology construction, so related information can be obtained more directly and more quickly. The intelligent search model built on these ontologies still needs more testing to prove its significance and efficacy.

Provide a reference for TCM doctors: because the intelligent search model is based on the results of a questionnaire about doctors' demands on the clinical research literature, it better matches their search needs and can show the internal information of the literature more completely and directly.

We hope this research establishes a reasonable research method or operating process for structuring and constructing the TCM clinical research literature, which can serve as a reference standard for collection and operation when databases are established.

5. Epilogue

Information services for TCM are gradually shifting from resource services toward deeper information and knowledge services, and are beginning to value users' actual demands in order to improve how information is provided. The introduction of ontology concepts and semantic technologies makes search services more accurate and rapid.
At the same time, it allows different systems or different structures in the same field to share resources, achieving the sharing of theoretical data resources.
Keywords/Search Tags: Traditional Chinese Medicine, literature of clinical research, Ontology, Intelligent search, similarity, Information on Traditional Chinese Medicine