Lecture Series
Towards Automating the Initial Screening Phase of a Systematic Review
Tanja Bekhuis, PhD
Postdoctoral Fellow
Friday, September 4, 2009
11:00 AM to 12:00 PM
Parkvale Building (200 Meyran Avenue)
Classroom M-184 (on the mezzanine level), or via video conference at the UPMC Cancer Pavilion, Room 341
Abstract: At the beginning of the colloquium, Tanja will briefly describe life as a summer intern at the Lister Hill National Center for Biomedical Communications, the R&D division of NLM. A presentation will follow on research conducted with a computer scientist in the Communications Engineering Branch.
Systematic review authors synthesize research to guide clinicians in their practice of evidence-based medicine. Ideally, two or more teammates independently identify provisionally eligible studies by reading the same set of hundreds and sometimes thousands of citations during an initial screening phase. This bottleneck slows the production of quality systematic reviews. To address the problem, we investigated whether supervised machine learning methods can reduce workload during this phase. We also extended earlier research by including observational studies of treatments. To build training and test sets (n_train = 300; n_test = 100), we used a subset of citations from a search conducted for an in-progress Cochrane review. Citations were annotated in accordance with review team decisions regarding potential eligibility. We extracted features from titles, abstracts, and metadata using a bag-of-words approach as a baseline. After processing the features, we trained several classifiers and compared their mean performance under 10-fold cross-validation. The evolutionary support vector machine (EvoSVM) with an Epanechnikov kernel achieves the best mean recall (100%), although its mean precision (48%) is modest. When parameters are tuned with an iterative grid optimization algorithm, EvoSVM with a radial kernel performs best (mean recall = 92%; mean precision = 75%). As expected, given the very small training set and bag-of-words approach, performance for EvoSVM with an Epanechnikov kernel drops in the held-out test condition (recall = 77%; precision = 29%). Because near-perfect recall is essential in this context, we conclude that supervised machine learning methods may be useful for reducing workload. Future research will need to extract richer features, particularly phrases for observational studies, and combine them with meta-classifiers to attain very high recall and boost precision.
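For readers unfamiliar with the terms in the abstract, the following is a minimal Python sketch of three of the ideas it mentions: bag-of-words features, 10-fold splitting, and the recall and precision measures. It is not the authors' pipeline. The function names (bag_of_words, k_fold_indices, precision_recall), the tokenization rule, the interleaved fold assignment, and the toy labels are all assumptions made for illustration, and no EvoSVM classifier is included.

```python
import re
from collections import Counter


def bag_of_words(text):
    """Count word occurrences, ignoring case and word order.

    Tokenizing on runs of letters is an assumption for this sketch;
    the abstract does not say how the text was tokenized.
    """
    return Counter(re.findall(r"[a-z]+", text.lower()))


def k_fold_indices(n, k=10):
    """Yield (train, test) index lists for k folds over n items.

    Item i goes to fold i % k, so every item is tested exactly once.
    """
    for fold in range(k):
        test = [i for i in range(n) if i % k == fold]
        train = [i for i in range(n) if i % k != fold]
        yield train, test


def precision_recall(y_true, y_pred):
    """Return (precision, recall) for binary labels, 1 = eligible."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


if __name__ == "__main__":
    # Invented citation title, used only to show the feature format.
    print(bag_of_words("Statins and stroke: a cohort study of statins"))

    # Hypothetical screening decisions: no eligible citation is missed,
    # but two ineligible ones are flagged for manual review.
    y_true = [1, 1, 1, 0, 0, 0, 0, 1]
    y_pred = [1, 1, 1, 1, 1, 0, 0, 1]
    print(precision_recall(y_true, y_pred))
```

In the toy example every eligible citation is retrieved, so recall is perfect, while the two false alarms lower precision. This is the same kind of trade-off the abstract reports: a screener can tolerate extra citations to read but cannot afford to miss an eligible study.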
For more information: www.dbmi.pitt.edu or 412.647.7113