Department of Biomedical Informatics

Recent Publications

National Mesothelioma Virtual Bank: A Standard Based Biospecimen and Clinical Data Resource to Enhance Translational Research.

Amin, W., A. V. Parwani, L. Schmandt, S. K. Mohanty, G. Farhat, A. K. Pople, S. B. Winters, N. B. Whelan, A. M. Schneider, J. T. Milnes, F. A. Valdivieso, M. Feldman, H. I. Pass, R. Dhir, J. Melamed, and M. J. Becich. BMC Cancer 8 (2008): 236.

BACKGROUND: Advances in translational research have led to the need for well characterized biospecimens for research. The National Mesothelioma Virtual Bank is an initiative which collects annotated datasets relevant to human mesothelioma to develop an enterprising biospecimen resource to fulfill researchers' need.

PMID: 18700971 [PubMed - indexed for MEDLINE]

Discontinuing Medications: A Novel Approach for Revising the Prescribing Stage of the Medication-Use Process.

Bain, K. T., H. M. Holmes, M. H. Beers, V. Maio, S. M. Handler, and S. G. Pauker. J Am Geriatr Soc (2008).

BACKGROUND: Thousands of Americans are injured or die each year from adverse drug reactions, many of which are preventable. The burden of harm conveyed by the use of medications is a significant public health problem, and therefore, improving the medication-use process is a priority. Recent and ongoing efforts to improve the medication-use process have focused primarily on improving medication prescribing, and not much emphasis has been put on improving medication discontinuation.

PMID: 18771457 [PubMed - indexed for MEDLINE]

Improving Peptide Identification via Validation with Intensity-based Modeling of Tandem Mass Spectra.

Grover, H., Lustgarten, J., Visweswaran, S., Gopalakrishnan, V. In Proceedings of the International Conference on Bioinformatics, Computational Biology, Genomics and Chemoinformatics (BCBGC-08). (2008). pp. 56-63.

Consensus guidelines for dosing primarily renally cleared medications in older outpatients.

Hanlon, J. T., S. Aspinall, S. M. Handler, M. Rossi, L. F. Fried, S. Weisbord, C. B. Good, M. Fine, R. Stone, M. Pugh and T. P. Semla. Journal of the American Geriatrics Society 2008;56(4):S190-S190

An Evaluation of Discretization Methods for Learning Rules from Biomedical Datasets.

Lustgarten, J. L., Grover, H., Visweswaran, S., Gopalakrishnan, V. In Proceedings of the 2008 International Conference on Bioinformatics and Computational Biology (BIOCOMP'08). (2008). Editors Hamid R. Arabnia, Mary Qu Yang, Jack Y. Yang. pp. 527-532.

Whole Genome SNP Arrays as a Potential Diagnostic Tool for the Detection of Characteristic Chromosomal Aberrations in Renal Epithelial Tumors.

Monzon, F. A., J. M. Hagenkord, M. A. Lyons-Weiler, J. P. Balani, A. V. Parwani, C. M. Sciulli, J. Li, U. R. Chandran, S. I. Bastacky, and R. Dhir. Mod Pathol 21, no. 5 (2008): 599-608.

BACKGROUND: Renal tumors with complex or unusual morphology require extensive workup for accurate classification. Chromosomal aberrations that define subtypes of renal epithelial neoplasms have been reported. We explored whether whole-genome chromosome copy number and loss-of-heterozygosity analysis with single nucleotide polymorphism (SNP) arrays can be used to identify these aberrations and classify renal epithelial tumors.

PMID: 18246049 [PubMed - indexed for MEDLINE]

Towards a Data Sharing Culture: Recommendations for Leadership from Academic Health Centers.

Piwowar, H. A., M. J. Becich, H. Bilofsky, and R. S. Crowley. PLoS Med 5, no. 9 (2008): e183.

PMID: 18767901 [PubMed - indexed for MEDLINE]

Facebook for Scientists: Requirements and Services for Optimizing How Scientific Collaborations Are Established

Schleyer, T., H. Spallek, B. S. Butler, S. Subramanian, D. Weiss, M. L. Poythress, P. Rattanathikun, and G. Mueller. J Med Internet Res 10, no. 3 (2008): e24.

OBJECTIVE: This study was conducted to answer the following questions: (1) Which requirements should systems for finding collaborators in biomedical science fulfill? and (2) Which information technology services can address these requirements?

PMID: 18701421 [PubMed - indexed for MEDLINE]

Supporting Emerging Disciplines with E-Communities: Needs and Benefits.

Spallek, H., B. S. Butler, T. K. Schleyer, P. M. Weiss, X. Wang, T. P. Thyvalikakath, C. L. Hatala, and R. A. Naderi. J Med Internet Res 10, no. 2 (2008): e19.

BACKGROUND: Science has developed from a solitary pursuit into a team-based collaborative activity and, more recently, into a multidisciplinary research enterprise. The increasingly collaborative character of science, mandated by complex research questions and problems that require many competencies, requires that researchers lower the barriers to the creation of collaborative networks of experts, such as communities of practice (CoPs).

PMID: 18653443 [PubMed - indexed for MEDLINE]

An implementation of Bayesian adaptive regression splines (BARS) in C with S and R wrappers.

Wallstrom, G., J. Liebner and R. E. Kass. Journal of Statistical Software 2008:26(1).

A multifactor approach to student model evaluation.

Yudelson, M. V., O. P. Medvedeva and R. S. Crowley. User Modeling and User-Adapted Interaction 2008;18(4):349-382

ABSTRACT: Creating student models for Intelligent Tutoring Systems (ITS) in novel domains is often a difficult task. In this study, we outline a multifactor approach to evaluating models that we developed in order to select an appropriate student model for our medical ITS. The combination of areas under the receiver-operator and precision-recall curves, with residual analysis, proved to be a useful and valid method for model selection. We improved on Bayesian Knowledge Tracing with models that treat help differently from mistakes, model all attempts, differentiate skill classes, and model forgetting. We discuss both the methodology we used and the insights we derived regarding student modeling in this novel domain.

Interactive Computer-Aided Diagnosis of Breast Masses: Computerized Selection of Visually Similar Image Sets from a Reference Library.

Zheng, B., C. Mello-Thoms, X. H. Wang, G. S. Abrams, J. H. Sumkin, D. M. Chough, M. A. Ganott, A. Lu, and D. Gur. Acad Radiol 14, no. 8 (2007): 917-27.

RATIONALE AND OBJECTIVES: The clinical utility of interactive computer-aided diagnosis (ICAD) systems depends on clinical relevance and visual similarity between the queried breast lesions and the ICAD-selected reference regions. The objective of this study is to develop and test a new ICAD scheme that aims to improve visual similarity of ICAD-selected reference regions.

PMID: 17659237 [PubMed - indexed for MEDLINE]

Evaluation of preprocessing techniques for chief complaint classification.

Dara J, Dowling JN, Travers D, Cooper GF, Chapman WW. J Biomed Inform. 2008 Aug;41(4):613-23. Epub 2007 Nov 29.

OBJECTIVE: To determine whether preprocessing chief complaints before automatically classifying them into syndromic categories improves classification performance. METHODS: We preprocessed chief complaints using two preprocessors (CCP and EMT-P) and evaluated whether classification performance increased for a probabilistic classifier (CoCo) or for a keyword-based classifier (modification of the NYC Department of Health and Mental Hygiene chief complaint coder (KC)). RESULTS: CCP exhibited high accuracy (85%) in preprocessing chief complaints but only slightly improved CoCo's classification performance for a few syndromes. EMT-P, which splits chief complaints into multiple problems, substantially increased CoCo's sensitivity for all syndromes. Preprocessing with CCP or EMT-P only improved KC's sensitivity for the Constitutional syndrome. CONCLUSION: Evaluation of preprocessing systems should not be limited to accuracy of the preprocessor but should include the effect of preprocessing on syndromic classification. Splitting chief complaints into multiple problems before classification is important for CoCo, but other preprocessing steps only slightly improved classification performance for CoCo and a keyword-based classifier.

PMID: 18166502 [PubMed - indexed for MEDLINE]

Making a mark — taking assessment to technology

Cox MJ, Schleyer T, Johnson LA, Eaton KA, Reynolds PA. Br Dent J. 2008 Jul 12;205(1):33-9.

During any course of study, students are assessed usually through a range of methods which may include written examinations, coursework assignments, professional practice, oral tests and practical examinations. This article considers the various forms of assessment in dental education and how information and communication technology is being applied to them. As innovative teaching and learning methods such as computer simulations are introduced, the assessment of results, successes and failures is taking on new forms in many traditional courses. The web is also spreading its tentacles into assessment, with the benefits of offering almost instant feedback and support. However, technology brings its own problems, not least by making ever more ingenious methods of plagiarism easier. Educational establishments, therefore, must be aware of such problems and have policies in place to counteract them.

PMID: 18617943 [PubMed - indexed for MEDLINE]

Using Gaze-tracking Data and Mixture Distribution Analysis to Support a Holistic Model for the Detection of Cancers on Mammograms.

Kundel HL, Nodine CF, Krupinski EA, Mello-Thoms C. Acad Radiol. 2008 Jul;15(7):881-6.

RATIONALE AND OBJECTIVES: Use data collected independently at three institutions to compare time to first fixate the true lesion in searching for cancers on mammograms. Examine the fit of the results to a holistic model of visual perception. MATERIALS AND METHODS: The time required to first fixate a cancer on a mammogram was extracted from 400 eye-tracking records collected independently from three institutions. The time was used as an indicator of the initial perception of cancer. The distribution of first fixation times was partitioned into two normally distributed components using mixture distribution analysis. The true-positive fraction of each component was calculated. RESULTS: About 57% of the cancers had a 95% chance of being fixated in the first second of viewing. The remainder took longer (range, 1.0 to 15.2 seconds). The true-positive fraction was larger for the lesions hit immediately (TPF = 0.63 vs. 0.52, F = 5.88, P = .02) in 68% (13/19) of the readers. CONCLUSIONS: The initial detection occurs before visual scanning and, therefore, must be the result of a parallel "global" analysis of the image resulting in an initial holistic, gestalt-like perception. The development of expertise in medical image analysis may consist of a shift in the recognition mechanism from scan-look-detect to look-detect-scan.

PMID: 18572124 [PubMed - indexed for MEDLINE]

Reducing morphological variability of the cervical carotid artery in serial magnetic resonance imaging using a head and neck immobilization device.

Chapman BE, Minalga ES, Brown C, Roberts JA, Hadley JR. J Magn Reson Imaging. 2008 Jul;28(1):258-62.

PURPOSE: To evaluate how well a head and neck immobilization device performed in reducing lumen morphology variability in repeated MR imaging of the carotid artery. MATERIALS AND METHODS: Quantitative measures of lumen and plaque characteristics may be important for longitudinal management of carotid atherosclerotic disease. However, quantitative measurements of the carotid artery are limited by their dependence on patient positioning, which can be quite variable. We created a head and neck immobilization device to reduce the variability of patient positioning during MR imaging of the carotid artery. In this article we describe the design and use of the immobilization device and assess how well its use reduced variability in vascular orientation and measurements of the carotid lumen cross-sectional area. Evaluation was based on 15 subjects who were repeatedly imaged without the immobilization device and 14 subjects who were repeatedly imaged with the device. RESULTS: Use of the immobilization device decreased the orientation variability from 9.1 degrees to 5.3 degrees (P = 0.0006) and the variability (defined as the standard deviation divided by the mean) of the cross-sectional area decreased from 0.24 to 0.18 (P = 0.04). CONCLUSION: Using the immobilization device effectively reduces variability in repeated imaging of the carotid arteries.

PMID: 18581389 [PubMed - indexed for MEDLINE]
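
The abstract above by Chapman et al. defines variability of the cross-sectional area as the standard deviation divided by the mean. A minimal Python sketch of that ratio follows. The function name and the area values are hypothetical, and the use of the sample (n-1) standard deviation is an assumption, since the abstract does not say which form the authors used.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Return the sample standard deviation divided by the mean.

    Assumption: the sample (n-1) standard deviation is used; the
    abstract does not specify sample versus population form.
    """
    m = mean(values)
    if m == 0:
        raise ValueError("mean is zero; the ratio is undefined")
    return stdev(values) / m


# Hypothetical repeated lumen-area measurements (mm^2) for one subject.
areas_mm2 = [20.0, 25.0, 30.0]
cv = coefficient_of_variation(areas_mm2)
print(round(cv, 3))
```

A lower value indicates more repeatable measurements, which is the direction of change the abstract reports.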

Bayesian prediction of an epidemic curve.

Jiang X, Wallstrom G, Cooper GF, Wagner MM. J Biomed Inform. 2008 Jun 13.

An epidemic curve is a graph in which the number of new cases of an outbreak disease is plotted against time. Epidemic curves are ordinarily constructed after the disease outbreak is over. However, a good estimate of the epidemic curve early in an outbreak would be invaluable to health care officials. Currently, techniques for predicting the severity of an outbreak are very limited. As far as predicting the number of future cases, ordinarily epidemiologists simply make an educated guess as to how many people might become affected. We develop a model for estimating an epidemic curve early in an outbreak, and we show results of experiments testing its accuracy.

PMID: 18593605 [PubMed - indexed for MEDLINE]
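
The abstract above by Jiang et al. describes an epidemic curve as the number of new cases plotted against time. As a small illustration of that definition only, the Python sketch below tabulates daily new-case counts from a list of onset dates. It does not reproduce the authors' Bayesian prediction model, and the function name and dates are hypothetical.

```python
from collections import Counter
from datetime import date, timedelta


def epidemic_curve(onset_dates):
    """Return (day, new_case_count) pairs for every day from the
    first to the last onset date, including days with zero cases."""
    counts = Counter(onset_dates)
    if not counts:
        return []
    day, last = min(counts), max(counts)
    curve = []
    while day <= last:
        curve.append((day, counts.get(day, 0)))
        day += timedelta(days=1)
    return curve


# Hypothetical onset dates for three cases.
onsets = [date(2008, 6, 1), date(2008, 6, 1), date(2008, 6, 3)]
curve = epidemic_curve(onsets)
for day, n in curve:
    print(day.isoformat(), n)
```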

Evaluation of SOVAT: an OLAP-GIS decision support system for community health assessment data analysis.

Scotch M, Parmanto B, Monaco V. BMC Med Inform Decis Mak. 2008 Jun 9;8:22.

BACKGROUND: Data analysis in community health assessment (CHA) involves the collection, integration, and analysis of large numerical and spatial data sets in order to identify health priorities. Geographic Information Systems (GIS) enable management and analysis of spatial data, but have limitations in performing analysis of numerical data because of their traditional database architecture. On-Line Analytical Processing (OLAP) is a multidimensional data warehouse designed to facilitate querying of large numerical data. Coupling the spatial capabilities of GIS with the numerical analysis of OLAP might enhance CHA data analysis. OLAP-GIS systems have been developed by university researchers and corporations, yet their potential for CHA data analysis is not well understood. To evaluate the potential of an OLAP-GIS decision support system for CHA problem solving, we compared OLAP-GIS to the standard information technology (IT) currently used by many public health professionals. METHODS: SOVAT, an OLAP-GIS decision support system developed at the University of Pittsburgh, was compared against current IT for data analysis for CHA. For this study, current IT was considered the combined use of SPSS and GIS ("SPSS-GIS"). Graduate students, researchers, and faculty in the health sciences at the University of Pittsburgh were recruited. Each round consisted of: an instructional video of the system being evaluated, two practice tasks, five assessment tasks, and one post-study questionnaire. Objective and subjective measurement included: task completion time, success in answering the tasks, and system satisfaction. RESULTS: Thirteen individuals participated. Inferential statistics were analyzed using linear mixed model analysis. SOVAT differed significantly (alpha = .01) from SPSS-GIS for satisfaction and time (p < .002). Descriptive results indicated that participants had greater success in answering the tasks when using SOVAT as compared to SPSS-GIS. CONCLUSION: Using SOVAT, tasks were completed more efficiently, with a higher rate of success, and with greater satisfaction, than the combined use of SPSS and GIS. The results from this study indicate a potential for OLAP-GIS decision support systems as a valuable tool for CHA data analysis.

PMID: 18541037 [PubMed - indexed for MEDLINE]

EPO-KB: a searchable knowledge base of biomarker to protein links.

Lustgarten JL, Kimmel C, Ryberg H, Hogan W. Bioinformatics. 2008 Jun 1;24(11):1418-9. Epub 2008 Apr 9.

The knowledge base EPO-KB (Empirical Proteomic Ontology Knowledge Base) is based on an OWL ontology that represents current knowledge linking mass-to-charge (m/z) ratios to proteins on multiple platforms including Matrix Assisted Laser/Desorption Ionization (MALDI) and Surface Enhanced Laser/Desorption Ionization (SELDI)--Time of Flight (TOF). At present, it contains information on m/z ratio to protein links that were extracted from 120 published research papers. It has a web interface that allows researchers to query and retrieve putative proteins that correspond to a user-specified m/z ratio. EPO-KB also allows automated entry of additional m/z ratio to protein links and is expandable to the addition of gene to protein and protein to disease links. AVAILABILITY: http://www.dbmi.pitt.edu/EPO-KB

PMID: 18400772 [PubMed - indexed for MEDLINE]

Frequency of laboratory monitoring of chronic medications administered to nursing facility residents: results of a national internet-based study.

Handler SM, Shirts BH, Perera S, Becich MJ, Castle NG, Hanlon JT. Consult Pharm. 2008 May;23(5):387-95.

OBJECTIVE: To determine the minimal frequency of laboratory monitoring of 30 types of chronic medications or classes that are administered to nursing facility residents and are either listed under pharmacy services tag F329 (the tag for unnecessary medications), or have a narrow therapeutic index. DESIGN AND SETTING: Cross-sectional, Internet-based survey. PARTICIPANTS: National sample of 500 pharmacists, 500 nurse practitioners, and 327 physicians. MAIN OUTCOME MEASURE: Minimal frequency of monitoring, recorded as an interval of 1, 3, 6, 9, or 12 months, for each of 35 laboratory parameters (e.g., serum drug level, complete blood count, liver function tests) for the 30 types of chronic medications or classes. Agreement was defined as having two or more of the three professional groups select the same minimal monitoring interval. RESULTS: Overall, 116 professionals (20 pharmacists, 48 physicians, and 48 nurse practitioners) completed the survey. Most respondents were women (58.6% [68/116]), and most had worked in nursing facilities for > 5 years (66.4% [77/116]). Regarding minimal laboratory monitoring intervals, respondents reached agreement concerning 33 of 35 parameters. They selected three or six months as the minimum interval for 30 of 35 parameters (85.7%), one month as the minimum for two parameters, and 12 months as the minimum for one parameter. CONCLUSION: The multidisciplinary panel agreed that most medications that were listed under the F329 tag or have a narrow therapeutic index should have laboratory monitoring every three or six months. The results can be used by nursing facility professionals to establish minimal laboratory monitoring parameters for chronic medications, which may potentially reduce the occurrence of adverse drug reactions.

PMID: 18540792 [PubMed - indexed for MEDLINE]

Consensus list of signals to detect potential adverse drug reactions in nursing homes.

Handler SM, Hanlon JT, Perera S, Roumani YF, Nace DA, Fridsma DB, Saul MI, Castle NG, Studenski SA. J Am Geriatr Soc. 2008 May;56(5):808-15. Epub 2008 Mar 21.

OBJECTIVES: To develop a consensus list of agreed-upon laboratory, pharmacy, and Minimum Data Set signals that a computer system can use in the nursing home to detect potential adverse drug reactions (ADRs). DESIGN: Literature search for potential ADR signals, followed by an Internet-based, two-round, modified Delphi survey. SETTING: A nationally representative survey of experts in geriatrics. PARTICIPANTS: Panel of 13 physicians, 10 pharmacists, and 13 advanced practitioners. MEASUREMENTS: Mean score and 95% confidence interval (CI) for each of 80 signals rated on a 5-point Likert scale (5=strong agreement with likelihood of indicating potential ADRs). Consensus agreement indicated by a lower-limit 95% CI of 4.0 or greater. RESULTS: Panelists reached consensus agreement on 40 signals: 15 laboratory and medication combinations, 12 medication concentrations, 10 antidotes, and three Resident Assessment Protocols (RAPs). Highest consensus scores (4.6, 95% CI=4.4-4.9 or 4.4-4.8) were for naloxone when taking opioid analgesics; phytonadione when taking warfarin; dextrose, glucagon, or liquid glucose when taking hypoglycemic agents; medication-induced hypoglycemia; supratherapeutic international normalized ratio when taking warfarin; and triggering the Falls RAP when taking certain medications. CONCLUSION: A multidisciplinary expert panel was able to reach consensus agreement on a list of signals to detect potential ADRs in nursing home residents. The results of this study can be used to prioritize an initial list of signals to be included in paper- or computer-based methods for potential ADR detection.

PMID: 18363678 [PubMed - indexed for MEDLINE]
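
The abstract above by Handler et al. counts a signal as reaching consensus when the lower limit of the 95% confidence interval of its mean Likert rating is 4.0 or greater. A rough Python sketch of that decision rule follows. The abstract does not state how the interval was computed, so the normal approximation (mean plus or minus 1.96 standard errors) is an assumption, and the function names and ratings are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def ci95(ratings, z=1.96):
    """Return (lower, upper) limits of an approximate 95% CI of the mean.

    Assumption: normal approximation with z = 1.96; the study may
    have used a different method (for example a t-based interval).
    """
    m = mean(ratings)
    se = stdev(ratings) / sqrt(len(ratings))
    return m - z * se, m + z * se


def reaches_consensus(ratings, threshold=4.0):
    """Apply the rule from the abstract: lower 95% CI limit >= 4.0."""
    lower, _ = ci95(ratings)
    return lower >= threshold


# Hypothetical ratings from a 36-member panel on a 5-point scale.
strong_signal = [5] * 30 + [4] * 6
mixed_signal = [3, 4, 5] * 12

print(reaches_consensus(strong_signal))
print(reaches_consensus(mixed_signal))
```

Using the lower limit rather than the mean itself makes the rule stricter: a signal with a mean of exactly 4.0 and any spread in ratings does not qualify.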

A salient problem in informatics?

Schleyer T. J Am Med Inform Assoc. 2008 Apr 24.

The Jan/Feb issue of JAMIA contained an interesting series of articles about the automated identification of smoking status from medical discharge records. It profiled the comparative performance of 11 different systems for the classification of patient records into five general categories for smoking status. The various classification approaches used, such as Bayesian classifiers, natural language processing, support vector machines and neural networks, illustrated the rich and diverse set of algorithms used in automated text processing and classification today. Even more impressive was the performance of some of these systems, which, in certain aspects, approximated the gold standard.

PMID: 18436894 [PubMed - indexed for MEDLINE]

The development and deployment of Common Data Elements for tissue banks for translational research in cancer - an emerging standard based approach for the Mesothelioma Virtual Tissue Bank.

Mohanty SK, Mistry AT, Amin W, Parwani AV, Pople AK, Schmandt L, Winters SB, Milliken E, Kim P, Whelan NB, Farhat G, Melamed J, Taioli E, Dhir R, Pass HI, Becich MJ. BMC Cancer. 2008 Apr 8;8:91.

BACKGROUND: Recent advances in genomics, proteomics, and the increasing demands for biomarker validation studies have catalyzed changes in the landscape of cancer research, fueling the development of tissue banks for translational research. A result of this transformation is the need for sufficient quantities of clinically annotated and well-characterized biospecimens to support the growing needs of the cancer research community. Clinical annotation allows samples to be better matched to the research question at hand and ensures that experimental results are better understood and can be verified. To facilitate and standardize such annotation in bio-repositories, we have combined three accepted and complementary sets of data standards: the College of American Pathologists (CAP) Cancer Checklists, the protocols recommended by the Association of Directors of Anatomic and Surgical Pathology (ADASP) for pathology data, and the North American Association of Central Cancer Registries (NAACCR) elements for epidemiology, therapy and follow-up data. Combining these approaches creates a set of International Organization for Standardization (ISO)-compliant Common Data Elements (CDEs) for the mesothelioma tissue banking initiative supported by the National Institute for Occupational Safety and Health (NIOSH) of the Centers for Disease Control and Prevention (CDC). METHODS: The purpose of the project is to develop a core set of data elements for annotating mesothelioma specimens, following standards established by the CAP checklist, ADASP cancer protocols, and the NAACCR elements. We have associated these elements with modeling architecture to enhance both syntactic and semantic interoperability. The system has a Java-based multi-tiered architecture based on Unified Modeling Language (UML). RESULTS: Common Data Elements were developed using controlled vocabulary, ontology and semantic modeling methodology. The CDEs for each case are of different types: demographic, epidemiologic data, clinical history, pathology data including block level annotation, and follow-up data including treatment, recurrence and vital status. The end result of such an effort would eventually provide an increased sample set to researchers and make the system interoperable between institutions. CONCLUSION: The CAP, ADASP and the NAACCR elements represent widely established data elements that are utilized in many cancer centers. Herein, we have shown these representations can be combined and formalized to create a core set of annotations for banked mesothelioma specimens. Because these data elements are collected as part of the normal workflow of a medical center, data sets developed on the basis of these elements can be easily implemented and maintained.

PMID: 18397527 [PubMed - indexed for MEDLINE]

Estimating the joint disease outbreak-detection time when an automated biosurveillance system is augmenting traditional clinical case finding.

Shen Y, Adamou C, Dowling JN, Cooper GF. J Biomed Inform. 2008 Apr;41(2):224-31. Epub 2007 Nov 21.

The goals of automated biosurveillance systems are to detect disease outbreaks early, while exhibiting few false positives. Evaluation measures currently exist to estimate the expected detection time of biosurveillance systems. Researchers also have developed models that estimate clinician detection of cases of outbreak diseases, which is a process known as clinical case finding. However, little research has been done on estimating how well biosurveillance systems augment traditional outbreak detection that is carried out by clinicians. In this paper, we introduce a general approach for doing so for non-endemic disease outbreaks, which are characteristic of bioterrorist-induced diseases, such as respiratory anthrax. We first lay out the basic framework, which makes minimal assumptions, and then we specialize it in several ways. We illustrate the method using a Bayesian outbreak detection algorithm called PANDA, a model of clinician outbreak detection, and simulated cases of a windborne anthrax release. This analysis derives a bound on how well we would expect PANDA to augment clinician detection of an anthrax outbreak. The results support that such analyses are useful in assessing the extent to which computer-based outbreak detection systems are expected to augment traditional clinician outbreak detection.

PMID: 18194876 [PubMed - indexed for MEDLINE]

Development of an instrument for measuring clinicians' power perceptions in the workplace.

Bartos CE, Fridsma DB, Butler BS, Penrod LE, Becich MJ, Crowley RS. J Biomed Inform. 2008 Mar 4.

We report on the development of an instrument to measure clinicians' perceptions of their personal power in the workplace in relation to resistance to computerized physician order entry (CPOE). The instrument is based on French and Raven's six bases of social power and uses a semantic differential methodology. A measurement study was conducted to determine the reliability and validity of the survey. The survey was administered online and distributed via a URL by email to 19 physicians, nurses, and health unit coordinators from a university hospital. Acceptable reliability was achieved by removing or moving some semantic differential word pairs used to represent the six power bases (alpha range from 0.76 to 0.89). The Semantic Differential Power Perception (SDPP) survey validity was tested against an already validated instrument and found to be acceptable (correlation range from 0.51 to 0.81). The SDPP survey instrument was determined to be both reliable and valid.

PMID: 18375189 [PubMed - indexed for MEDLINE]

Prevalence of incidental prostate cancer in the general population: a study of healthy organ donors.

Yin M, Bastacky S, Chandran U, Becich MJ, Dhir R. J Urol. 2008 Mar;179(3):892-5; discussion 895. Epub 2008 Jan 22.

PURPOSE: The incidence of prostate cancer has surged dramatically in recent years due to improved cancer screening and detection mechanisms. There has also been significant interest specifically pertaining to the increased incidence of prostate cancer in younger males, which might be due to increased screening. We analyzed our data set of incidental prostate cancer, derived from a project accruing prostate tissues for research from normal organ donors, who are a predominantly white population. MATERIALS AND METHODS: Information about any prior prostate cancer screening in this cohort was not available. In addition, this population had no history of intervention related to benign or malignant prostate disease. The case cohort consisted of 340 prostates harvested for research from organ donors who died suddenly from August 1994 to April 2007. Stroke, motor vehicle accident, homicidal and suicidal gunshot wound to the head, cardiorespiratory arrest and trauma accounted for more than 90% of the causes of death in donors. RESULTS: Evaluation of serially sectioned prostate tissues revealed adenocarcinoma with or without high grade prostate intraepithelial neoplasia in 12% of cases. High grade prostate intraepithelial neoplasia alone occurred in 10.6% of donors. There was an age dependent increase in high grade prostate intraepithelial neoplasia starting from the 4th decade of life. Prostate adenocarcinoma escalated from the 5th decade and thereafter with a 1 in 3 chance of carrying incidental cancer in the 60 to 69-year-old age group and with 46% of 70 to 81-year-old men harboring prostate cancer. CONCLUSIONS: This study provides insight into the prevalence of prostate adenocarcinoma and high grade prostate intraepithelial neoplasia in the general healthy population. Associated issues, such as the age at which to start screening for prostate cancer and donor transmitted malignancy, were also discussed.

PMID: 18207193 [PubMed - indexed for MEDLINE]

Evaluation of training with an annotation schema for manual annotation of clinical conditions from emergency department reports.

Chapman WW, Dowling JN, Hripcsak G. Int J Med Inform. 2008 Feb;77(2):107-13.

OBJECTIVE: Determine whether agreement among annotators improves after being trained to use an annotation schema that specifies: what types of clinical conditions to annotate, the linguistic form of the annotations, and which modifiers to include. METHODS: Three physicians and 3 lay people individually annotated all clinical conditions in 23 emergency department reports. For annotations made using a Baseline Schema and annotations made after training on a detailed annotation schema, we compared: (1) variability of annotation length and number and (2) annotator agreement, using the F-measure. RESULTS: Physicians showed higher agreement and lower variability after training on the detailed annotation schema than when applying the Baseline Schema. Lay people agreed with physicians almost as well as other physicians did but showed a slower learning curve. CONCLUSION: Training annotators on the annotation schema we developed increased agreement among annotators and should be useful in generating reference standard sets for natural language processing studies. The methodology we used to evaluate the schema could be applied to other types of annotation or classification tasks in biomedical informatics.

PMID: 17317291 [PubMed - indexed for MEDLINE]
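
The abstract above by Chapman et al. measures annotator agreement with the F-measure. One common way to compute it for a pair of annotators, sketched below in Python, treats one annotator's annotations as the reference and the other's as the response, then takes the harmonic mean of precision and recall. Exact string matching between annotations is an assumption made here for simplicity (the study's matching criteria are not given in the abstract), and the function name and example annotations are hypothetical.

```python
def f_measure(annotations_a, annotations_b):
    """Return the F-measure (harmonic mean of precision and recall)
    between two annotators' sets of annotations.

    Assumption: two annotations match only if they are identical.
    """
    a, b = set(annotations_a), set(annotations_b)
    if not a and not b:
        return 1.0  # both annotators marked nothing: full agreement
    overlap = len(a & b)
    if overlap == 0:
        return 0.0
    precision = overlap / len(b)  # share of B's annotations also made by A
    recall = overlap / len(a)     # share of A's annotations also made by B
    return 2 * precision * recall / (precision + recall)


# Hypothetical annotations of one emergency department report.
annotator_1 = {"chest pain", "fever", "cough"}
annotator_2 = {"fever", "cough", "nausea", "headache"}
agreement = f_measure(annotator_1, annotator_2)
print(round(agreement, 3))
```

The measure is symmetric, so swapping the two annotators gives the same value. It also needs no count of "true negatives", which is convenient when annotators mark free-text spans.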

Establishing a nationwide emergency department-based syndromic surveillance system for better public health responses in Taiwan.

Wu TS, Shih FY, Yen MY, Wu JS, Lu SW, Chang KC, Hsiung C, Chou JH, Chu YT, Chang H, Chiu CH, Tsui FC, Wagner MM, Su IJ, King CC. BMC Public Health. 2008 Jan 18;8:18.

BACKGROUND: With international concern over emerging infectious diseases (EID) and bioterrorist attacks, public health is being required to have early outbreak detection systems. A disease surveillance team was organized to establish a hospital emergency department-based syndromic surveillance system (ED-SSS) capable of automatically transmitting patient data electronically from the hospitals responsible for emergency care throughout the country to the Centers for Disease Control in Taiwan (Taiwan-CDC) starting in March 2004. This report describes the challenges and steps involved in developing ED-SSS and the timely information it provides to improve public health decision-making. METHODS: Between June 2003 and March 2004, after comparing various surveillance systems used around the world and consulting with ED physicians, pediatricians and internal medicine physicians involved in infectious disease control, the Syndromic Surveillance Research Team in Taiwan worked with the Real-time Outbreak and Disease Surveillance (RODS) Laboratory at the University of Pittsburgh to create Taiwan's ED-SSS. The system was evaluated by analyzing daily electronic ED data received in real time from the 189 hospitals participating in this system between April 1, 2004 and March 31, 2005. RESULTS: Taiwan's ED-SSS identified winter and summer spikes in two syndrome groups: influenza-like illnesses and respiratory syndrome illnesses, while total numbers of ED visits were significantly higher on weekends, national holidays and the days of Chinese lunar new year than weekdays (p < 0.001). It also identified increases in the upper, lower, and total gastrointestinal (GI) syndrome groups starting in November 2004 and two clear spikes in enterovirus-like infections coinciding with the two school semesters. Using ED-SSS for surveillance of influenza-like illnesses and enterovirus-related infections has improved Taiwan's pandemic flu preparedness and disease control capabilities. CONCLUSION: Taiwan's ED-SSS represents the first nationwide real-time syndromic surveillance system ever established in Asia. The experiences reported herein can encourage other countries to develop their own surveillance systems. The system can be adapted to other cultural and language environments for better global surveillance of infectious diseases and international collaboration.

PMID: 18201388 [PubMed - indexed for MEDLINE]

Transmembrane helix prediction using amino acid property features and latent semantic analysis.

Ganapathiraju M, Balakrishnan N, Reddy R, Klein-Seetharaman J. BMC Bioinformatics. 2008;9 Suppl 1:S4.

BACKGROUND: Prediction of transmembrane (TM) helices by statistical methods suffers from a lack of sufficient training data. Current best methods use hundreds or even thousands of free parameters in their models which are tuned to fit the little data available for training. Further, they are often restricted to the generally accepted topology "cytoplasmic-transmembrane-extracellular" and cannot adapt to membrane proteins that do not conform to this topology. Recent crystal structures of channel proteins have revealed novel architectures showing that the above topology may not be as universal as previously believed. Thus, there is a need for methods that can better predict TM helices even in novel topologies and families. RESULTS: Here, we describe a new method "TMpro" to predict TM helices with high accuracy. To avoid overfitting to existing topologies, we have collapsed cytoplasmic and extracellular labels to a single state, non-TM. TMpro is a binary classifier which predicts TM or non-TM using multiple amino acid properties (charge, polarity, aromaticity, size and electronic properties) as features. The features are extracted from sequence information by applying the framework used for latent semantic analysis of text documents and are input to neural networks that learn the distinction between TM and non-TM segments. The model uses only 25 free parameters. In benchmark analysis TMpro achieves 95% segment F-score corresponding to 50% reduction in error rate compared to the best methods not requiring an evolutionary profile of a protein to be known. Performance is also improved when applied to more recent and larger high resolution datasets PDBTM and MPtopo. TMpro predictions in membrane proteins with unusual or disputed TM structure (K+ channel, aquaporin and HIV envelope glycoprotein) are discussed. CONCLUSION: TMpro uses very few free parameters in modeling TM segments as opposed to the very large number of free parameters used in state-of-the-art membrane prediction methods, yet achieves very high segment accuracies. This is highly advantageous considering that high resolution transmembrane information is available only for very few proteins. The greatest impact of TMpro is therefore expected in the prediction of TM segments in proteins with novel topologies. Further, the paper introduces a novel method of extracting features from protein sequence, namely a latent semantic analysis model. The success of this approach in the current context suggests that it can find potential applications in other sequence-based analysis problems. AVAILABILITY: http://linzer.blm.cs.cmu.edu/tmpro/ and http://flan.blm.cs.cmu.edu/tmpro/

PMID: 18315857 [PubMed - indexed for MEDLINE]
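
The feature extraction idea in the TMpro abstract above (amino acid properties treated like words, sequence windows treated like documents, then latent semantic analysis) can be sketched roughly as follows. This is not the authors' implementation: the property groupings, the window length of 15, the rank of 4, and all function names are assumptions chosen for illustration, only three of the five properties are encoded, and the neural network stage is omitted.

```python
import numpy as np

# Illustrative property groupings (assumptions, not the paper's tables).
CHARGE = {"K": "pos", "R": "pos", "H": "pos", "D": "neg", "E": "neg"}
AROMATIC = set("FWYH")
POLAR = set("STNQYCHKRDE")

# The "vocabulary" of property words, one row per word in the count matrix.
VOCAB = ["pos", "neg", "neutral", "aromatic", "nonaromatic", "polar", "nonpolar"]


def property_words(residue):
    """Map one amino acid to its three property words."""
    return [
        CHARGE.get(residue, "neutral"),
        "aromatic" if residue in AROMATIC else "nonaromatic",
        "polar" if residue in POLAR else "nonpolar",
    ]


def window_property_matrix(sequence, window=15):
    """Count property words in each sliding window (word-by-document matrix)."""
    if len(sequence) < window:
        raise ValueError("sequence is shorter than the window")
    index = {word: row for row, word in enumerate(VOCAB)}
    starts = range(len(sequence) - window + 1)
    counts = np.zeros((len(VOCAB), len(starts)))
    for col, start in enumerate(starts):
        for residue in sequence[start:start + window]:
            for word in property_words(residue):
                counts[index[word], col] += 1
    return counts


def lsa_features(counts, rank=4):
    """Project each window onto the top singular directions (the LSA step)."""
    _, s, vt = np.linalg.svd(counts, full_matrices=False)
    # One row per window, one column per retained latent dimension.
    return (np.diag(s[:rank]) @ vt[:rank]).T


# Hypothetical 30-residue sequence: a hydrophobic stretch flanked by charged residues.
seq = "MKKRDE" + "LLIVFALLGVLAIWF" + "KRDEEKSTN"
features = lsa_features(window_property_matrix(seq))
print(features.shape)  # (16, 4): 16 windows, 4 latent dimensions
```

In a pipeline of the kind the abstract describes, rows of `features` would be the inputs to a classifier that labels each window as TM or non-TM.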