Testing the face validity and inter-rater agreement of a simple approach to drug-drug interaction evidence assessment.
Grizzle AJ, Hines LE, Malone DC, Kravchenko O, Hochheiser H, Boyce RD. Testing the face validity and inter-rater agreement of a simple approach to drug-drug interaction evidence assessment. J Biomed Inform. 2020 Jan;101:103355. doi: 10.1016/j.jbi.2019.103355. Epub 2019 Dec 12. PMID: 31838211; DOI: 10.1016/j.jbi.2019.103355.
Low concordance between drug-drug interaction (DDI) knowledge bases is a well-documented concern. One potential cause of inconsistency is variability between drug experts in approach to assessing evidence about potential DDIs. In this study, we examined the face validity and inter-rater reliability of a novel DDI evidence evaluation instrument designed to be simple and easy to use.
A convenience sample of participants with professional experience evaluating DDI evidence was recruited. Participants independently evaluated pre-selected evidence items for 5 drug pairs using the new instrument. For each drug pair, participants labeled each evidence item as sufficient or insufficient to establish the existence of a DDI based on the evidence categories provided by the instrument. Participants also decided if the overall body of evidence supported a DDI involving the drug pair. Agreement was computed both at the evidence item and drug pair levels. A cut-off of ≥ 70% was chosen as the agreement threshold for percent agreement, while a coefficient > 0.6 was used as the cut-off for chance-corrected agreement. Open ended comments were collected and coded to identify themes related to the participants' experience using the novel approach.
The face validity of the new instrument was established by two rounds of evaluation involving a total of 6 experts. Fifteen experts agreed to participate in the reliability assessment, and 14 completed the study. Participant agreement on the sufficiency of 22 of the 34 evidence items (65%) did not exceed the a priori agreement threshold. Similarly, agreement on the sufficiency of evidence for 3 of the 5 drug pairs (60%) was poor. Chance-corrected agreement at the drug pair level further confirmed the poor interrater reliability of the instrument (Gwet's AC1 = 0.24, Conger's Kappa = 0.24). Participant comments suggested several possible reasons for the disagreements including unaddressed subjectivity in assessing an evidence item's type and study design, an infeasible separation of evidence evaluation from the consideration of clinical relevance, and potential issues related to the evaluation of DDI case reports.
Even though the key findings were negative, the study's results shed light on how experts approach DDI evidence assessment, including the importance situating evidence assessment within the context of consideration of clinical relevance. Analysis of participant comments within the context of the negative findings identified several promising future research directions including: novel computer-based support for evidence assessment; formal evaluation of a more comprehensive evidence assessment approach that requires consideration of specific, explicitly stated, clinical consequences; and more formal investigation of DDI case report assessment instruments.