Building a gold standard to construct search filters: A case study with biomarkers for oral cancer.
John J. Frazier, Corey D. Stein, Eugene Tseytlin, Tanja Bekhuis. Building a gold standard to construct search filters: A case study with biomarkers for oral cancer. J Med Libr Assoc. 2015 Jan;103(1):22-30. doi: 10.3163/1536-5050.103.1.005.
OBJECTIVE: Librarians and informationists who support clinical researchers may need search filters tailored to particular tasks. Development of filters typically depends on a "gold standard" dataset. This paper describes generalizable methods for creating a gold standard to support future filter development and evaluation, using oral squamous cell carcinoma (OSCC) as a case study. OSCC is the most common malignancy affecting the oral cavity, and investigation of biomarkers with potential prognostic utility is an active area of OSCC research. The methods discussed here should be useful for designing high-quality search filters in similar domains.
METHODS: We searched MEDLINE for prognostic studies of OSCC, developed annotation guidelines for screeners, ran three calibration trials before annotating the remaining citations, and measured inter-annotator agreement (IAA).
RESULTS: We retrieved 1,818 citations. After calibration, we screened the remaining citations (n = 1,767; 97.2%); IAA was substantial (kappa = 0.76). The dataset has 497 (27.3%) citations representing OSCC studies of potential prognostic biomarkers.
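As an illustration of how an agreement statistic of this kind can be computed, the following is a minimal sketch that assumes Cohen's kappa for two annotators making include/exclude decisions. The abstract does not state which kappa variant was used, and the function name `cohen_kappa` and the ten toy labels are hypothetical; they are not data from the study.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labeled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and of equal length")
    n = len(labels_a)

    # Observed agreement: share of items given the same label by both.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Chance agreement: for each category, the product of the two
    # annotators' marginal proportions, summed over categories.
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    categories = set(count_a) | set(count_b)
    expected = sum(count_a[c] * count_b[c] for c in categories) / (n * n)

    if expected == 1.0:
        # Both annotators used a single identical category throughout.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical toy decisions (1 = include, 0 = exclude); not study data.
annotator_1 = [1, 1, 0, 0, 1, 0, 0, 0, 1, 0]
annotator_2 = [1, 0, 0, 0, 1, 0, 0, 1, 1, 0]
kappa = cohen_kappa(annotator_1, annotator_2)
print(f"kappa = {kappa:.2f}")
```

In this toy case the annotators agree on 8 of 10 items, but half of that agreement is expected by chance given their marginal rates, so kappa is lower than the raw agreement of 0.80.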
CONCLUSIONS: The gold standard dataset is likely to be of high quality and useful for future development and evaluation of filters for OSCC studies of potential prognostic biomarkers.
IMPLICATIONS: The methodology we used is generalizable to other domains requiring a reference standard to evaluate the performance of search filters. A gold standard is essential because the labels regarding relevance enable computation of diagnostic metrics, such as sensitivity and specificity. Librarians and informationists with data analysis skills could contribute to developing gold standard datasets and subsequent filters tuned for their patrons' domains of interest.
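To make the role of the relevance labels concrete, here is a minimal sketch of scoring a candidate filter against a labeled set. It assumes the gold standard is available as a mapping from citation identifier to a relevant/not-relevant label and that the filter's output is a set of retrieved identifiers. The function name `filter_metrics`, the identifiers, and the toy labels are hypothetical and do not come from the study.

```python
def filter_metrics(gold, retrieved):
    """Score a search filter against gold standard relevance labels.

    gold: dict mapping citation id -> True (relevant) or False (not relevant)
    retrieved: set of citation ids returned by the filter
    """
    tp = sum(1 for cid, rel in gold.items() if rel and cid in retrieved)
    fn = sum(1 for cid, rel in gold.items() if rel and cid not in retrieved)
    fp = sum(1 for cid, rel in gold.items() if not rel and cid in retrieved)
    tn = sum(1 for cid, rel in gold.items() if not rel and cid not in retrieved)

    def ratio(num, den):
        # Return None when a metric is undefined (empty denominator).
        return num / den if den else None

    return {
        "sensitivity": ratio(tp, tp + fn),  # relevant citations retrieved
        "specificity": ratio(tn, tn + fp),  # irrelevant citations excluded
        "precision": ratio(tp, tp + fp),    # retrieved citations that are relevant
    }


# Hypothetical toy gold standard: 4 relevant and 6 irrelevant citations.
gold = {f"c{i}": (i <= 4) for i in range(1, 11)}
# Hypothetical filter output: three relevant hits and one false positive.
retrieved = {"c1", "c2", "c3", "c5"}

metrics = filter_metrics(gold, retrieved)
print(metrics)
```

Without the labels in `gold`, none of the four cells of the confusion table can be counted, which is why a reference standard has to exist before a filter can be evaluated.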