PhenotypeR
Overview
PhenotypeR is a comprehensive diagnostic system for assessing the research-readiness of study cohorts defined within OMOP Common Data Model (CDM) databases. The system performs multi-level analysis across database, codelist, cohort, and population dimensions to validate phenotype definitions and ensure cohorts meet quality standards for observational health research.
Key Features:
- Multi-level Assessment: Database, codelist, cohort, and population-level diagnostics.
- OMOP CDM Integration: Native integration with OMOP standardized vocabularies and data structures.
- AI-Powered Expectations: LLM-generated expected values for comparison with actual results.
- Interactive Visualization: Web-based Shiny application for exploration and reporting.
- Scalable Processing: Sampling strategies and preprocessing for large healthcare datasets.
- Research-Ready Output: Standardized
summarised_resultformat compatible with OHDSI tools.
Installation
To install PhenotypeR, you can use the following command in R:
# Install from GitHub
remotes::install_github("OHDSI/PhenotypeR")
Getting Started
Here is a simple example of how to get started with PhenotypeR:
library([PhenotypeR](https://ohdsi.github.io/PhenotypeR/))
library([CDMConnector](https://darwin-eu.github.io/CDMConnector/))
library([CohortConstructor](https://ohdsi.github.io/CohortConstructor/))
library(dplyr)
# 1. Connect to a mock CDM
cdm <- [mockPhenotypeR](https://ohdsi.github.io/PhenotypeR/)()
# 2. Create a cohort
cdm$my_cohort <- conceptCohort(
cdm = cdm,
conceptSet = list("acetaminophen" = 1125315),
name = "my_cohort"
)
# 3. Run diagnostics
result <- phenotypeDiagnostics(cohort = cdm$my_cohort)
# 4. View results in an interactive Shiny app
shinyDiagnostics(result)
Core Concepts
PhenotypeR’s architecture is modular, with four primary subsystems working together.
graph TB
subgraph "OMOP_CDM_Database"
CDM_Tables["CDM Tables<br/>person, observation_period<br/>condition_occurrence, drug_exposure<br/>measurement, procedure_occurrence"]
Achilles_Results["Achilles Results<br/>Pre-computed statistics"]
end
subgraph "Core_Diagnostic_Engine"
phenotypeDiagnostics["phenotypeDiagnostics()<br/>Main orchestrator function"]
databaseDiagnostics["databaseDiagnostics()<br/>Database-level analysis"]
codelistDiagnostics["codelistDiagnostics()<br/>Concept usage analysis"]
cohortDiagnostics["cohortDiagnostics()<br/>Cohort validation"]
populationDiagnostics["populationDiagnostics()<br/>Incidence/prevalence"]
end
subgraph "AI_Expectations_System"
getCohortExpectations["getCohortExpectations()<br/>LLM-powered analysis"]
tableCohortExpectations["tableCohortExpectations()<br/>Expectation formatting"]
ellmer_integration["ellmer package<br/>LLM interface"]
end
subgraph "Interactive_Visualization"
shinyDiagnostics["shinyDiagnostics()<br/>App launcher"]
shiny_ui["ui.R<br/>User interface"]
shiny_server["server.R<br/>Business logic"]
preprocess["preprocess.R<br/>Data preparation"]
end
subgraph "Output_Formats"
summarised_result["summarised_result<br/>Standardized results"]
interactive_app["Interactive Shiny App"]
static_reports["Static Reports"]
end
CDM_Tables --> phenotypeDiagnostics
Achilles_Results --> phenotypeDiagnostics
phenotypeDiagnostics --> databaseDiagnostics
phenotypeDiagnostics --> codelistDiagnostics
phenotypeDiagnostics --> cohortDiagnostics
phenotypeDiagnostics --> populationDiagnostics
ellmer_integration --> getCohortExpectations
getCohortExpectations --> tableCohortExpectations
databaseDiagnostics --> summarised_result
codelistDiagnostics --> summarised_result
cohortDiagnostics --> summarised_result
populationDiagnostics --> summarised_result
summarised_result --> shinyDiagnostics
tableCohortExpectations --> shinyDiagnostics
shinyDiagnostics --> shiny_ui
shinyDiagnostics --> shiny_server
preprocess --> shiny_ui
shiny_server --> interactive_app
summarised_result --> static_reports
API Reference / Advanced Usage
phenotypeDiagnostics()
This is the main orchestrator function.
Parameters:
cohort: A cohort table in the CDM.diagnostics: A character vector of diagnostics to run.survival: A boolean to indicate if survival analysis should be run.cohortSample: The number of people to sample from the cohort.matchedSample: The number of people to sample for the matched cohort.populationSample: The number of people to sample for the population.populationDateRange: The date range for the population.
getCohortExpectations()
This function generates AI-powered expectations for a cohort.
Parameters:
chat: Anellmerchat object.phenotypes: A character vector of phenotype names or asummarised_resultobject.
shinyDiagnostics()
This function launches the interactive Shiny application.
Parameters:
result: Asummarised_resultobject.directory: The directory to create the Shiny app in.minCellCount: The minimum cell count to display.open: A boolean to indicate if the app should be opened automatically.expectations: An optional data frame of expectations.
Examples
For more detailed examples and tutorials, please see the package vignettes.