
Research And Application Of Medical Data Retrieval Technology Based On Hadoop

Posted on: 2024-07-28
Degree: Master
Type: Thesis
Country: China
Candidate: J C Zhang
Full Text: PDF
GTID: 2544307184455824
Subject: Master of Electronic Information (Professional Degree)
Abstract/Summary:
The rapid development of computer network technology has ushered in the era of the internet. The volume of medical data, including data on diseases, medications, and medical records, keeps growing. Retrieving useful medical data accurately and quickly, and using big data technology to improve the efficiency and accuracy of that retrieval, brings new opportunities to the future medical industry. If the computer can accurately locate the user’s needs from the user’s input statement and uncover the meaning behind that statement, it can return the data the user most needs. Conducting medical data retrieval based on the user’s semantics therefore makes the retrieval results more accurate.

This thesis first examines the key technologies of the Hadoop platform, the core of big data technology, including HDFS, the MapReduce distributed computing framework, and HBase. Hadoop serves as the overall system infrastructure, while HBase serves as the database for storing medical data. For data preprocessing, distributed web crawlers collect the required medical data, which is then cleaned using Hadoop. The cleaned medical text data is first stored in HDFS and then parsed and stored in HBase. Two problems are addressed: a query whose content does not match the primary key forces a full scan of the first-level index in HBase, and querying a secondary index on a single table with a large data volume costs four I/O operations. This thesis establishes a secondary index in HBase based on the coprocessor mechanism and improves it by setting the primary key of the index table to match that of the data table, ensuring that the data table and the index table are in the same Region and keeping I/O operations to a minimum of two. Additionally, when data is written, it is synchronized to Lucene and indexed to improve the efficiency of medical data retrieval.

This thesis uses a latent semantic retrieval model to perform medical data retrieval based on semantic retrieval needs. An initial term-document matrix is established. The weight calculation algorithm of the latent semantic retrieval model is improved in two ways: term position factors and term frequency factors are proposed to improve the local weights of terms, and information entropy is combined with the IDF weight calculation algorithm to improve the global weights of terms. Additionally, global and local weights of documents are introduced. The weight of each term is taken as the value of the corresponding element in the term-document matrix, further improving the precision and recall of retrieval. Through singular value decomposition based on Hadoop, the term-document matrix is decomposed into three matrices, and the retrieval results are sorted by calculating similarity relationships. Comparative experiments show that the latent semantic retrieval model achieves higher accuracy than the vector space retrieval model based on keyword matching for medical data retrieval, that the improved weight calculation algorithm enhances the precision and recall of the retrieval process, and that the improved HBase secondary index further increases the efficiency of medical data retrieval.
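
The abstract does not give the weighting formulas, so the following is only a minimal sketch of how a term-document matrix with position-aware local weights and entropy-adjusted IDF global weights might be built. Everything specific here is an assumption made for illustration and not the thesis's own method: the function names, the title boost of 2.0 as the position factor, the log-damped term frequency, and the product `idf * (1 - normalized entropy)` as the way to combine entropy with IDF.

```python
import math
from collections import Counter


def local_weight(tf, in_title, title_boost=2.0):
    """Local weight of a term in one document (assumed form).

    Term frequency factor: log(1 + tf), so repeated terms count less and less.
    Term position factor: terms that appear in the title are boosted.
    """
    position_factor = title_boost if in_title else 1.0
    return position_factor * math.log(1.0 + tf)


def global_weights(doc_counts):
    """Global weight of each term across the collection (assumed form).

    Combines IDF with normalized information entropy: a term spread evenly
    over all documents has entropy close to 1 and a weight close to 0.
    """
    n_docs = len(doc_counts)
    vocab = sorted({term for counts in doc_counts for term in counts})
    weights = {}
    for term in vocab:
        tfs = [counts.get(term, 0) for counts in doc_counts]
        tfs = [tf for tf in tfs if tf > 0]
        total = sum(tfs)
        entropy = -sum((tf / total) * math.log(tf / total) for tf in tfs)
        norm_entropy = entropy / math.log(n_docs) if n_docs > 1 else 0.0
        idf = math.log(n_docs / len(tfs)) + 1.0
        # Clamp to guard against tiny negative values from rounding.
        weights[term] = idf * max(0.0, 1.0 - norm_entropy)
    return weights


def weighted_matrix(docs):
    """Build the weighted term-document matrix.

    docs: list of (title_tokens, body_tokens) pairs.
    Returns (vocab, matrix) where matrix[i][j] is the weight of
    term vocab[i] in document j.
    """
    counts = [Counter(title + body) for title, body in docs]
    title_terms = [set(title) for title, _ in docs]
    g = global_weights(counts)
    vocab = sorted(g)
    matrix = [
        [local_weight(c.get(term, 0), term in titles) * g[term]
         for c, titles in zip(counts, title_terms)]
        for term in vocab
    ]
    return vocab, matrix


# Toy collection: three tokenized documents (hypothetical data).
docs = [
    (["diabetes"], ["insulin", "treatment", "patient"]),
    (["hypertension"], ["treatment", "patient"]),
    (["asthma"], ["inhaler", "treatment", "patient"]),
]
vocab, matrix = weighted_matrix(docs)
for term, row in zip(vocab, matrix):
    print(term, [round(w, 3) for w in row])
```

In this toy collection, "patient" and "treatment" occur once in every document, so their rows come out as zero, while the title term "diabetes" outweighs the body term "insulin" in the first document. The resulting matrix is what a singular value decomposition step would take as input.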
Keywords/Search Tags: Medical data retrieval, Hadoop, HBase, Secondary index of HBase, Latent semantic retrieval model