FAIR is foul and foul is FAIR: The challenge of ascertaining and ensuring the quality of online data

Seminar Date:
Seminar Time: 11am - 12pm
Seminar Location: Zoom
Presenter: Mark Musen, MD, PhD
Presenter's Institution: Stanford University

Since the publication of the FAIR principles in 2016, the scientific community has been awash with efforts to make its data findable, accessible, interoperable, and reusable. There is enormous peer pressure to assert that one’s online experimental data are already FAIR, and that one’s data are more FAIR than those of the next person. Alas, most online data are not FAIR, and attempts to measure FAIRness systematically have gone nowhere. The problem persists because the only contributor to data FAIRness over which individual investigators have direct control is the quality of the metadata they use to annotate their datasets. Unfortunately, metadata quality cannot be assessed programmatically as a way to measure data FAIRness. The way to guarantee that data are FAIR is to ensure that datasets are archived with comprehensive metadata in the first place: metadata that describe not only the data themselves, but also the experimental conditions under which the data were collected. The CEDAR Workbench provides a means of helping investigators use standard reporting guidelines and standard ontologies to ensure the quality of their metadata at the time the data are archived. Scientific data will not be truly FAIR until tools such as CEDAR are used to ensure that metadata are of high quality from the very beginning.