Recent developments in scalable Bayesian inference have enabled fast learning of complex probabilistic models using massive data sets. We will discuss these developments in the context of topic models of discrete grouped data, focusing on text. We will review our recent collaborations on scalable model learning using stochastic variational inference, and discuss new applications to structured hierarchical topic models.
Eye-tracking is a valuable research tool that is used in laboratory and limited field environments. We take steps toward developing methods that enable widespread adoption of eye-tracking and its real-time application in clinical decision support. Eye-tracking will enhance awareness and enable intelligent views, more precise alerts, and other forms of decision support in the Electronic Medical Record (EMR). We evaluated a low-cost eye-tracking device and found the device’s accuracy to be non-inferior to a more expensive device.
The FDA and EPA have identified pharmacogenomics and toxicogenomics as key opportunities for personalized medicine and environmental risk assessment. The first decade of the 21st century witnessed breakthroughs in molecular diagnosis and prognosis in cancer treatment. It has now become increasingly important to transform environmental health protection by using high-throughput assays to identify genomic biomarkers for personalized risk assessment. This talk will introduce two projects in Dr.
Developmental defects occur in 100,000 to 200,000 children born each year in the United States of America. 97% of these defects are from unidentified causes. Many fetal outcomes (e.g., developmental defects) result from interactions between genetic and environmental factors. The lifetime effects of low-impact prenatal exposures (e.g., air pollution) are often understudied. Even when these exposures are studied, the focus is often placed on immediate effects of the exposure (e.g., fetal anomalies, miscarriage rates), leaving lifetime effects largely unexplored.
Public databases such as the NIH Sequence Read Archive (SRA) now contain hundreds of thousands of short-read sequencing experiments. A major challenge now is making that raw data accessible and useful for biological analysis: researchers must be able to find the relevant and related experiments on which to perform their analyses. A fundamental computational problem in that effort is searching for short-read experiments by sequence.
Learning of classification models often relies on data that are labeled or annotated by a human expert. In general, the more expertise and time the labeling process requires, the more costly it is to label the data. In addition, there may be constraints on how many data instances one expert can feasibly label. Our goal is to find ways of reducing the number of labels while preserving or improving the quality of the models based on such labels. In this talk, I present two solutions we have developed to address these problems.
Biomarkers are objectively measured characteristics that are commonly used across a range of scientific disciplines for diagnosis, prognosis, and prediction, and potentially as surrogate measures for the actual clinical outcome. The utility of biomarker studies, however, is typically limited to evaluating associations rather than causal relationships.