The FDA and EPA have identified pharmacogenomics and toxicogenomics as key opportunities for personalized medicine and environmental risk assessment. The first decade of the 21st century witnessed breakthroughs in molecular diagnosis and prognosis in cancer treatment. It has become increasingly important to transform environmental health protection by using high-throughput assays to identify genomic biomarkers for personalized risk assessment. This talk will introduce two projects in Dr.
Developmental defects occur in 100,000 to 200,000 children born each year in the United States of America. Of these defects, 97% are from unidentified causes. Many fetal outcomes (e.g., developmental defects) result from interactions between genetic and environmental factors. The lifetime effects of low-impact prenatal exposures (e.g., air pollution) are often understudied. Even when these exposures are studied, the focus is often placed on the immediate effects of the exposure (e.g., fetal anomalies, miscarriage rates), leaving lifetime effects largely unexplored.
Public databases such as the NIH Sequence Read Archive (SRA) now contain hundreds of thousands of short-read sequencing experiments. A major challenge now is making that raw data accessible and useful for biological analysis: researchers must be able to find the relevant and related experiments on which to perform their analyses. A fundamental computational problem in that effort is searching for short-read experiments by sequence.
Learning of classification models often relies on data that are labeled or annotated by a human expert. In general, the more expertise and time the labeling process requires, the more costly it is to label the data. In addition, there may be constraints on how many data instances one expert can feasibly label. Our goal is to find ways of reducing the number of labels while preserving or improving the quality of the models based on such labels. In this talk, I present two solutions we have developed to address the above problems.
Biomarkers are objectively measured characteristics that are commonly used across a range of scientific disciplines for diagnosis, prognosis, and prediction, and potentially as surrogate measures for the actual clinical outcome. The utility of biomarker studies, however, is typically limited to evaluating associations rather than causal relationships.
Lung cancer is the leading cause of cancer death in the United States. Cigarette smoking causes 85% of lung cancer deaths; however, only about 15% of smokers will develop lung cancer in their lifetime. Genetic variations can modify the effect of exposure to cigarette smoke. Our lab studies genetic variations in enzymes that detoxify potent carcinogens from cigarette smoke (including nitrosamines such as NNK and NNAL). We have identified a whole-gene deletion polymorphism in a carcinogen-metabolizing gene (UGT2B17) that is associated with decreased carcinogen metab
Learning health care systems (LHCS) propose to advance health care by taking advantage of increasing efficiencies in data collection and analysis and information dissemination. A major premise of LHCS is broad and continuous access to patient data, extracted at the point of care, and stored and used primarily in de-identified form.
Now more than ever, electronic health records (EHRs) are generated in large quantities and with diverse content. This explosion of information has naturally enabled powerful data analyses that could improve healthcare.