BLULab NLP Repository
University of Pittsburgh NLP Repository
The University of Pittsburgh NLP Repository is a repository of de-identified clinical reports available to the community for NLP research purposes. The Repository contains all reports of the following types generated from University of Pittsburgh Medical Center (UPMC) hospitals during 2008:
You may request a specific number of randomly selected reports, or you may request reports filtered by ICD-9 discharge diagnoses. Reports will be provided in XML format with one report per file.
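Because reports arrive as one XML file per report, a downloaded set can be read into memory with a short script. The following is only a minimal sketch: the Repository's XML schema is not described on this page, so the code makes no assumption about element names and simply collects all text inside each file. The function name `load_reports` and the directory layout (a folder of `.xml` files) are hypothetical.

```python
from pathlib import Path
import xml.etree.ElementTree as ET


def load_reports(report_dir):
    """Return {file stem: text content} for every .xml file in report_dir."""
    reports = {}
    for path in sorted(Path(report_dir).glob("*.xml")):
        root = ET.parse(path).getroot()
        # The actual schema is unknown here, so all text nodes are
        # concatenated rather than reading specific named elements.
        reports[path.stem] = "".join(root.itertext()).strip()
    return reports
```

Once the actual element names are known from the delivered files, the `itertext()` call could be replaced with lookups of the specific elements of interest.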
Obtaining Access to Reports
You may access reports for NLP research projects that have been approved by your IRB. To apply for access, you will need to submit an Online Application Form and upload the following supplementary material:
The approval process takes 1-2 weeks. We will notify you of the decision by email and, if your application is approved, will direct you to an FTP site where you can download the requested reports. You must submit a separate application for each individual project.
Annotations on NLP Repository
Anyone who annotates reports from the NLP Repository is required to deposit those annotations back into the Repository. For every applicant who has received reports from the Repository, we list below (a) the types of reports requested by the applicant and (b) the types of annotations the applicant plans to perform on the reports. Once the annotations have been deposited, new applicants with IRB approval can apply for access to the annotated reports.
Annotations Being Performed by Applicants:
Contact Us
For questions, email Wendy Chapman at nlp-repository@list.pitt.edu