Current and Past Projects

Current Research

The National Center for Data to Health (CD2H) and NCATS are leading the creation of a centralized, secure portal for hosting row-level COVID-19 clinical data—called the National COVID Cohort Collaborative (N3C). This initiative is a partnership among several HHS agencies, the CTSA program, and the distributed clinical data networks PCORnet, OHDSI, ACT/i2b2, and TriNetX. The N3C will accept data via multiple data models and transform them into a common analytic model. The cloud-based collaborative portal will allow for the development of machine learning and other informatics tools that require a large row-level dataset, and will be overseen by a data access committee. We believe this portal will provide additional assets needed to rapidly develop the analytics that clinical centers and physicians need now. Contact: Melissa Haendel.

Funding: National Institutes of Health's National Center for Advancing Translational Sciences, Grant Number U24TR002306.

Model systems are the cornerstone of biomedical research to investigate biological processes, test gene-based disease hypotheses, and develop and test disease treatments. The vast knowledge that we have about model systems can be better utilized if semantically aggregated and made queryable based on any number of facets, such as phenotypic similarity, network analysis, gene expression and function, and genomics. The Monarch Initiative aims to provide easy-to-use tools to navigate this data landscape, services for other resources, and educational outreach regarding the production of structured data for biomedical discovery. Monarch is a collaboration between members at Oregon Health & Science University, Lawrence Berkeley National Laboratory, The Jackson Laboratory for Genomic Medicine, RTI International, Genomics England/Queen Mary, Charité - Universitätsmedizin Berlin, EBI, and Garvan Institute of Medical Research. Contact: Julie McMurry.

Funding: Monarch is funded by NIH grant # 1R24OD011883-01.

Center for Cancer Data Harmonization

The Center for Cancer Data Harmonization (CCDH) aims to facilitate retrospective and prospective semantic harmonization of data across the various nodes of NCI’s Cancer Research Data Commons (CRDC). Contact: Nicole Vasilevsky.

Funding: CCDH is funded by NCI / Leidos contract # HHSN261201500003I.

The Kids First Data Resource Portal will bring together heterogeneous data from childhood cancers and structural abnormalities to support research, study, and collaboration built on top of an unprecedented collection of genetic and phenotypic data from pediatric patients. Contact: Nicole Vasilevsky.

Funding: Kids First is funded by NIH grant # 5U2CHL138346.

Gabriella Miller Kids First Pediatric Data Resource Center INCLUDE supplement

This supplement, in collaboration with the INCLUDE project, will augment the Kids First Data Resource Center’s ability to integrate phenotype data from survey and performance-based instruments with data from instruments that assess Down syndrome (DS)-related clinical data and neurobehavior. Contact: Nicole Vasilevsky.

Funding: The Kids First supplement is funded by NIH grant # 3200670520-03S1.

The CD2H supports a vibrant and evolving collaborative informatics ecosystem for the CTSA Program and beyond. The CD2H harnesses and expands an ecosystem for translational scientists to discover and share their software, data, and other research resources within the CTSA Program network. The CD2H also creates a social coding environment for translational science institutions, leveraging the community-driven DREAM challenges as a mechanism to stimulate innovation. Collaborative innovation also serves as a strong foundation to support mechanisms to facilitate training, engagement, scholarly dissemination, and impact in translational science. Contact: Julie McMurry.

Funding: CD2H is supported by the National Center for Advancing Translational Sciences (NCATS) at the National Institutes of Health (Grant U24TR002306).

The Translator program was launched by NCATS. It is a multiyear, iterative effort that will culminate in the development of a relational, N-dimensional Biomedical Data Translator that integrates multiple types of existing data sources, including objective signs and symptoms of disease, drug effects, and intervening types of biological data relevant to understanding pathophysiology. Contact: Julie McMurry.

Funding: This project is funded by NCATS grant # 3 OT3 TR002019 01S2.

Forums for Integrative Phenomics

The goal is to bring together a diverse range of research scientists and clinicians to harmonize these data towards a better understanding of relationships between human disease conditions and their underlying genetic basis. Contact: Julie McMurry.

Funding: This project is funded by NIH grant # 1 U13 CA221044-01.

Converging genomics, phenomics, and environments using interpretable machine learning models

Predicting phenotype from genotype and environment is the holy grail of predicting the effects of climate change on public health, conservation, and agriculture. This research aims to speed the discovery of gene/phenotype relationships by an order of magnitude, leading to a new computational discipline that will augment our current taxon-centric view of ecosystems with a more gene-phenotype-centric view. The project is also expected to mark a tipping point in our understanding of the organism-environment relationship. Contact: Anne Thessen.

Funding: This project is funded by the Office of Advanced Cyberinfrastructure (OAC), NSF Award #1940062.

Flatten the Curve

We can all slow the spread of COVID-19 and “Flatten the Curve” together. To help, Flatten the Curve provides reliable, scientist-reviewed information in as many languages as possible. Contact: Julie McMurry.

Funding: This project is funded by NIH grant # U24TR002306.

Past Projects

We are part of the Metabolomics Core of the Undiagnosed Diseases Network (UDN). Our goals are to integrate metabolomics, lipidomics, glycomics, and genomics data with patient clinical phenotypes to provide mechanistic insight and aid diagnoses of rare and undiagnosed diseases. We are particularly involved in the integration of metabolites using existing pathway tools, reaction databases, and the integrated corpus of genotype-phenotype data within the Monarch platform for biological interpretation of disease etiology and biomarker signatures. We also started representing changes in the glycomics signatures of patients with genetic and undiagnosed diseases using the Molecular Glyco-Phenotype Ontology (MGPO), to enhance the Human Phenotype Ontology and the model data corpus so that the Exomiser tool can best leverage these phenotypic changes. Contact: Julie McMurry.

Phenotypr is a free educational tool for people who believe they may have a disorder and want to learn more about their condition. This tool aims to provide additional information about what you are experiencing. We recommend you discuss this information with your healthcare provider to assist in your diagnosis. Contact: Julie McMurry.

Funding: This project was funded by PCORI grant # 1R24OD011883-01.

This project aims to develop an intelligent concept assistant that will allow researchers to generate and share sets of metadata elements relevant to their project, and will use machine learning techniques to automatically apply them to data. Contact: Melissa Haendel.

Funding: This project was funded by NHGRI grant # 5 U01 HG009453 02.

Researchers at the Harvard University, Oregon Health & Science University, and The Ohio State University CTSA Program hubs are developing educational resources, tools, and technologies and making them available online to trainees, investigators, and other members of the translational scientific team. Contact: Marijane White.

National Cancer Institute Thesaurus (NCIt)

The NCI Thesaurus (NCIt) is a widely used cancer reference taxonomy that covers over 100,000 terms and has been developed by the National Cancer Institute (NCI) as a standalone ontology since 2003. The NCI partnered with members of the Monarch Initiative to enhance the ontology for interoperability with OBO ontologies. Contact: Nicole Vasilevsky.

Funding: This project was funded by Leidos contract #15X143.

OpenRIF, the Open Research Information Framework, is an open source community devoted to representing expertise ecosystems - all the things we do and all the things we contribute. The community works on developing and promoting interoperable and extensible semantic infrastructure, such as the VIVO Integrated Semantic Framework (VIVO-ISF), an ontology for representing people, works, and the relationships between them; federated databases modeled on PARDI, the Portfolio Analysis and Reporting Data Infrastructure, for research impact and evaluation; and eagle-i, which aims to make research resources discoverable via a semantic search interface and represents their relationships to scholarly activities. Contact: Robin Champieux.

The Colorado Richly Annotated Full-Text (CRAFT) Corpus is a collection of 97 full-length, open-access journal articles from the biomedical literature that are manually annotated, for use as gold-standard resources for the training and testing of biomedical Natural Language Processing (NLP) systems. Within these articles, each mention of the concepts explicitly represented in eight prominent Open Biomedical Ontologies (OBOs) has been annotated, resulting in gold-standard markup of genes and gene products, chemicals and molecular entities, biomacromolecular sequence features, cells and cellular and extracellular components and locations, organisms, biological processes and molecular functionalities. With these ~100,000 concept annotations among the ~800,000 words in the 67 articles of the 1.0 release, it is one of the largest gold-standard biomedical semantically annotated corpora. In addition to this substantial conceptual markup, the corpus is fully annotated along a number of syntactic and other axes, notably by sentence segmentation, tokenization, part-of-speech tagging, syntactic parsing, text formatting, and document sectioning. Current efforts are underway to add new annotations. Contact: Nicole Vasilevsky.

Funding: This project was funded by NIH grants 5R01LM008111 and 5R01LM009254, and DARPA-BAA-14-14.

Web Taxology project

The Web Taxology project is a collaboration between the OHSU Library, Digital Strategy, and the Marketing team to create a data model of all the people, places, and things at OHSU, which is the first step towards improving OHSU's local search results in third-party search engines like Google. The initial goal of the project is to improve the experience of patients finding their way to and around OHSU's campuses and clinic locations; future goals include using the data model in other contexts and projects throughout the institution. Contact: Marijane White.

Funding: This project was supported by the OHSU Library.

Open Insight

Open Insight is an education and outreach project designed to stimulate early career researchers' engagement with open science practices through hands-on learning and conversations with leaders in the field. The Open Insight team brings together doctoral students, scientists, and OHSU Library staff with expertise in scholarly communications and data management to explore and promote the practice of open science activities and workflows. Contact: Robin Champieux.


The Clinical and Translational Activity Reporting (CTAR) tool was a collaboration between the Oregon Clinical and Translational Research Institute and the OHSU Library's Ontology Development Group to prototype a tool that would collocate and analyze data about research activities across a disparate set of internal and external databases (e.g. IRB, grants and contracts, PubMed). Leveraging MeSH, other terminologies, and simple Natural Language Processing (NLP) techniques, the CTAR prototype identified research activity topics and trends, and classified them as clinical or translational. The tool was intended to increase OHSU's and the Oregon Clinical and Translational Research Institute's ability to strategically contribute to research outcomes and human health. Contact: Melissa Haendel.

The CTSAconnect project aimed to integrate information about research activities, clinical activities, and scientific resources by creating an Integrated Semantic Framework (ontology). This new framework facilitated the production and consumption of Linked Open Data (a Semantic Web method of sharing data) about investigators, physicians, biomedical research resources, services, and clinical activities. The goal was to enable software to consume data from multiple sources and allow the broadest possible representation of researchers' and clinicians' activities and research products. Current research tracking and networking systems rely largely on publications, but clinical encounters, reagents, techniques, specimens, model organisms, etc., are equally valuable for representing expertise. CTSAconnect was a collaboration between members at OHSU, Stony Brook University, Cornell University, Harvard University, University at Buffalo, and the University of Florida, and leveraged the work of eagle-i, VIVO, and ShareCenter. Contact: Nicole Vasilevsky.

Funding: CTSAconnect was funded by Booz Allen Hamilton grant #CTSA 10-001: 100928SB23.

eagle-i is a free application that makes it easy to discover biomedical research resources at a growing network of universities; more than 50,000 resources are listed and more are added every week. Resource types include model organisms, reagents, core laboratory services, instrumentation, and biospecimens. Contact: Julie McMurry.

Funding: eagle-i was funded by Booz Allen Hamilton (Grant # 90177520).

The Resource Identification Initiative (#RII) was designed to help researchers sufficiently cite the key resources used to produce the scientific findings reported in the biomedical literature. The project aimed to enable resource identification within the biomedical literature through a pilot study promoting the use of unique Research Resource Identifiers (RRIDs). In addition to being unique, RRIDs meet three key criteria: they are 1) machine readable; 2) free to generate and access; and 3) consistent across publishers and journals. A diverse group of collaborators led the project, including the Neuroscience Information Framework and the OHSU Library. Contact: Nicole Vasilevsky.

Funding: The Resource Identification Initiative was supported by the NIH and the INCF.

Biospecimen Query

This project explored options for enhancing the search capabilities of an existing biospecimen search application. Text processing tools were used to map anatomy, pathology, and disease concepts from existing terminologies and ontologies to pathology reports that are currently represented as unstructured natural-language text. The concepts identified in the text were also organized in a relational structure to enable taxonomic and parthood-based searches. This was a small exploratory project with a goal of integrating these capabilities into an ongoing effort to expand and integrate OHSU's biospecimen databases. Contact: Melissa Haendel.

Funding: This work was funded by OHSU's Medical Research Foundation.