
Identification And Elimination Of Duplicate Reports In Signal Detection Of Adverse Drug Reaction

Posted on: 2019-11-12
Degree: Doctor
Type: Dissertation
Country: China
Candidate: W T Shi
GTID: 1364330542492011
Subject: Epidemiology and Health Statistics
Abstract/Summary:
Background: The database of the spontaneous reporting system is an important basis for adverse drug reaction (ADR) monitoring and the cornerstone of effective pharmacovigilance. With the passage of time, the accumulation of data, and the country's growing attention to drug safety, the number of ADR reports submitted to the National Center for ADR Monitoring exceeded 11 million by the end of 2017. Duplicate reports are difficult to avoid, because the "Drug Adverse Reaction Reporting and Monitoring Management Measures" require drug manufacturers, pharmaceutical companies, and medical institutions all to report ADRs, and because follow-up reports are sometimes entered without being linked to the previous report. Recently, the China Food and Drug Administration promulgated the "Announcement on Direct Reporting of Adverse Reactions by Drug Licensing Holders", which requires drug licensing holders to report ADRs; this can also produce duplicate reports. Duplicate reports can cause both false-positive and false-negative ADR signals and thus reduce the accuracy of signal detection. It is therefore important to identify and eliminate duplicate reports from massive ADR data effectively using statistical methods, so as to provide reliable data for ADR signal detection and to discover drugs that are harmful to humans accurately and in a timely manner.

Objective: Based on the database of China's ADR spontaneous reporting system, our study had two parts. First, we analyzed the status of duplicate reports in China's ADR database and built a variable matching model, a Hit-miss model, and a Levenshtein distance model applicable to this database; after comparing them, we selected the optimal model for removing duplicate reports from China's ADR database. Second, we used the optimal model to identify and eliminate duplicate reports from China's ADR database and then repeated signal detection to explore the effect of duplicate reports on it, so as to provide high-quality data for ADR signal detection.

Methods:
Methodological study: First, we randomly selected one month of data by report date and used the variable matching method to find suspected duplicate reports. Two people then independently identified the true duplicates by comparing other variables, which gave us a gold-standard database of duplicate reports for model evaluation. Second, the three methods were applied to the gold-standard database. From six variables (name, gender, birth date, drug name, ADR, and ADR date) we formed four variable combinations, or situations (situation one: name, gender, birth date, drug name, ADR, and ADR date; situation two: name, birth date, drug name, ADR, and ADR date; situation three: name, gender, drug name, ADR, and ADR date; situation four: name, drug name, ADR, and ADR date). Using the F1-measure, a composite index of recall and precision, as the evaluation criterion, we constructed the optimal variable matching model, Hit-miss model, and Levenshtein distance model. To improve computational efficiency, we used multiple lookup techniques in the Hit-miss and Levenshtein distance models.
Case study: We applied the three models to the 2014 national ADR data to identify duplicate reports and re-ran ADR signal detection after eliminating them. By comparing the results with signal detection on the data before deduplication, we obtained the new signals and the lost signals, and we interpreted them by checking them against known adverse reaction databases.

Results:
1. Results of the methodological study
(1) Gold-standard database of duplicate reports: From the 2014 database we selected the 86,882 reports with a report date in March. Variable matching on several variable combinations (birth date, drug name, ADR, and ADR date; name, sex, birth date, and ADR date; name, drug name, and ADR) found 1,280 suspected duplicate reports. After manual comparison of other variables such as nationality, weight, telephone number, disease history, medical record number, reporter, and medical unit, a total of 359 duplicate reports were confirmed.
(2) Results of the three models: The variable matching model performed best in situation four, with the four variables name, drug name, ADR, and ADR date; its F1-measure reached 58.82%, with a recall of 57.10% and a precision of 60.65%. The Hit-miss model performed best in situation two, with the five variables name, birth date, drug name, ADR, and ADR date and a threshold of 38.5; its F1-measure reached 74.93%, with a recall of 71.59% and a precision of 78.59%. The Levenshtein distance model performed best in situation four, with the four variables name, drug name, ADR, and ADR date and a threshold of 3.85; its F1-measure reached 75.96%, with a recall of 74.37% and a precision of 77.62%. The variable matching, Hit-miss, and Levenshtein distance models detected 205, 257, and 267 true-positive duplicate combinations, respectively.
2. Results of the case study: The case study used the 2014 data of the National Adverse Drug Reaction spontaneous reporting system, comprising 1,232,641 reports, and applied the variable matching, Hit-miss, and Levenshtein distance models to identify duplicate reports. The variable matching model found a total of 4,191 duplicate reports, a duplicate rate of 0.35%; however, the authenticity of the flagged reports with missing names was doubtful. The Hit-miss model found a total of 5,230 duplicate reports, a rate of 0.36%; however, it could not reliably distinguish highly similar reports that differed only in ADR date, such as reports of leukopenia and myelosuppression. The Levenshtein distance model found 4,309 sets of duplicate reports, a rate of 0.32%. Compared with the variable matching model, it filtered out not only exactly identical reports but also reports with minor differences; compared with the Hit-miss model, its results were more accurate and trustworthy. Before deduplication, the reporting odds ratio (ROR), proportional reporting ratio (PRR), and information component (IC) methods detected 29,921, 32,428, and 21,994 ADR signals, respectively. After deduplication with the variable matching, Hit-miss, and Levenshtein distance models, ROR detected 28,803, 28,612, and 28,739 signals; PRR detected 31,248, 31,086, and 31,201; and IC detected 21,242, 21,050, and 21,155, respectively. The number of signals decreased somewhat, but the change was small, indicating that duplicate reports currently have a limited impact on ADR signal detection. Comparing the signals obtained after deduplication with those obtained before, we found that more than 90% of the lost signals were false positives.

Conclusions: In summary, our study suggests that either the variable matching model (name, drug name, ADR, and ADR date) or the Levenshtein distance model (name, drug name, ADR, and ADR date, with a threshold of 3.85) should be used to eliminate duplicate reports from the database of China's spontaneous reporting system, and that the duplicates flagged by the model still require manual confirmation. Although duplicate reports currently make up less than 1% of ADR reports in China, their incidence is bound to increase because of the "Announcement on Direct Reporting of Adverse Reactions by Drug Licensing Holders", so duplicate reports in the database deserve close attention.
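In its best configuration (situation four), the variable matching model treats reports as duplicates when name, drug name, ADR, and ADR date all agree exactly. The Python sketch below illustrates that idea under stated assumptions: the record layout (dicts with the hypothetical keys `name`, `drug`, `adr`, `adr_date`), the whitespace stripping, and the choice to skip reports with an empty name (motivated by the doubtful missing-name matches described in the case study) are illustrative choices, not details given in the abstract.

```python
from collections import defaultdict
from itertools import combinations

# Situation-four variables; these key names are hypothetical.
KEY_FIELDS = ("name", "drug", "adr", "adr_date")


def variable_matching(reports: list[dict]) -> list[tuple[int, int]]:
    """Return index pairs of reports that agree exactly on all key fields."""
    groups = defaultdict(list)  # key tuple -> indices of reports sharing it
    for idx, report in enumerate(reports):
        key = tuple(report[f].strip() for f in KEY_FIELDS)
        if not key[0]:
            # Assumption: blank names would all match each other, so skip them.
            continue
        groups[key].append(idx)
    # Every pair inside a group is a suspected duplicate; singleton groups yield nothing.
    return [pair for idxs in groups.values() for pair in combinations(idxs, 2)]
```

Because exact matching ignores near-misses such as a misspelled drug name, a sketch like this can only find identical reports, which is consistent with the lower recall reported for this model.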
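The abstract does not say how the Levenshtein distance model combines per-field distances into the score compared against the 3.85 threshold. The Python sketch below assumes an unweighted sum of character-level edit distances over the situation-four variables and flags a pair when that sum does not exceed the threshold. The field names and the aggregation rule are hypothetical, and the sketch compares all pairs, leaving out the lookup techniques the study used for efficiency.

```python
from itertools import combinations

# Situation-four variables; these key names are hypothetical.
FIELDS = ("name", "drug", "adr", "adr_date")


def levenshtein(s: str, t: str) -> int:
    """Edit distance: minimum number of insertions, deletions and substitutions."""
    prev = list(range(len(t) + 1))  # distances from "" to each prefix of t
    for i, cs in enumerate(s, start=1):
        curr = [i]
        for j, ct in enumerate(t, start=1):
            curr.append(min(
                prev[j] + 1,               # delete cs
                curr[j - 1] + 1,           # insert ct
                prev[j - 1] + (cs != ct),  # substitute (free when equal)
            ))
        prev = curr
    return prev[-1]


def report_distance(r1: dict, r2: dict) -> int:
    # Assumption: unweighted sum of per-field edit distances.
    return sum(levenshtein(r1[f], r2[f]) for f in FIELDS)


def find_duplicates(reports: list[dict], threshold: float = 3.85) -> list[tuple[int, int]]:
    """Return index pairs whose summed distance does not exceed the threshold."""
    return [
        (i, j)
        for (i, r1), (j, r2) in combinations(enumerate(reports), 2)
        if report_distance(r1, r2) <= threshold
    ]
```

Unlike exact matching, this flags a pair such as "cefuroxime" versus "cefuroxim" with otherwise identical fields (total distance 1). With an unweighted integer sum, a threshold of 3.85 behaves like "at most 3 edits in total"; the non-integer value suggests the original model may weight or normalize distances, which this sketch does not try to reproduce. All-pairs comparison is quadratic, so it is only practical on small samples, not on 1.2 million reports.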
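The F1-measure used to compare the three models is the harmonic mean of precision P and recall R, F1 = 2PR / (P + R). The short Python check below recomputes the reported figures from the abstract's own numbers. Recall is the number of true-positive duplicates a model found divided by the 359 confirmed duplicates in the gold standard (for example, 205/359 for variable matching), and F1 then follows from each model's reported precision.

```python
def recall(true_positives: int, gold_total: int) -> float:
    """Share of gold-standard duplicates that the model found."""
    return true_positives / gold_total


def f1_measure(precision: float, recall_: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall_ / (precision + recall_)


GOLD_DUPLICATES = 359  # duplicates confirmed in the March 2014 gold standard

# (true positives, reported precision) per model, as stated in the abstract.
models = {
    "variable matching": (205, 0.6065),
    "Hit-miss": (257, 0.7859),
    "Levenshtein distance": (267, 0.7762),
}

for name, (tp, precision) in models.items():
    r = recall(tp, GOLD_DUPLICATES)
    print(f"{name}: recall={r:.2%}, F1={f1_measure(precision, r):.2%}")
```

Rounded to two decimals, the recomputed values match the reported recall rates (57.10%, 71.59%, 74.37%) and F1-measures (58.82%, 74.93%, 75.96%). The harmonic mean penalizes a model that trades one rate for the other, which is why it suits a task where both missed duplicates and false alarms matter.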
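ROR, PRR, and IC are all computed from a 2×2 table of report counts for one drug-ADR pair, so removing even one duplicate can move a pair across a signal threshold. The Python sketch below uses the standard point-estimate formulas. Two assumptions apply: the IC here is the plain log2 ratio of observed to expected counts, without the Bayesian shrinkage of the BCPNN implementation, and the signal rule (at least 3 reports and a lower 95% confidence limit of ROR above 1) is a common convention, because the abstract does not state the criteria the study used.

```python
import math
from dataclasses import dataclass


@dataclass
class Table2x2:
    a: int  # reports with the drug and the ADR
    b: int  # reports with the drug and other ADRs
    c: int  # reports with other drugs and the ADR
    d: int  # reports with other drugs and other ADRs

    @property
    def n(self) -> int:
        return self.a + self.b + self.c + self.d


def ror(t: Table2x2) -> float:
    """Reporting odds ratio: (a/c) / (b/d) = ad / bc."""
    return (t.a * t.d) / (t.b * t.c)


def ror_lower95(t: Table2x2) -> float:
    """Lower 95% confidence limit of the ROR (log-normal approximation)."""
    se = math.sqrt(1 / t.a + 1 / t.b + 1 / t.c + 1 / t.d)
    return math.exp(math.log(ror(t)) - 1.96 * se)


def prr(t: Table2x2) -> float:
    """Proportional reporting ratio: [a/(a+b)] / [c/(c+d)]."""
    return (t.a / (t.a + t.b)) / (t.c / (t.c + t.d))


def ic(t: Table2x2) -> float:
    """Information component as log2(observed / expected), without shrinkage."""
    expected = (t.a + t.b) * (t.a + t.c) / t.n
    return math.log2(t.a / expected)


def is_ror_signal(t: Table2x2, min_reports: int = 3) -> bool:
    # Assumed rule: at least `min_reports` cases and ROR lower limit above 1.
    return t.a >= min_reports and ror_lower95(t) > 1


# Hypothetical counts: one of the three cases is a duplicate of another.
with_duplicate = Table2x2(a=3, b=20, c=100, d=10_000)
deduplicated = Table2x2(a=2, b=20, c=100, d=10_000)
print(is_ror_signal(with_duplicate), is_ror_signal(deduplicated))
```

In this made-up example the pair counts as a signal only while the duplicate is present, which is the mechanism behind signals lost after deduplication.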
Keywords/Search Tags: Adverse drug reactions, duplicate reports, variable matching model, Hit-miss matching model, Levenshtein distance model, signal detection