ICU Data Science Lab

Matthew Churpek, MD, MPH, PhD, Majid Afshar, MD, MSCR, and Anoop Mayampurath, PhD, are pulmonary and critical care physicians and clinical informaticians.

The ICU Data Science Lab is an integrated data science laboratory, with a research focus on using electronic health record data combined with epidemiology, biostatistics and machine learning (ML) methods to improve the care of hospitalized patients.


Machine Learning at the Bedside

Up to 5% of hospitalized patients on the medical-surgical wards develop clinical deterioration requiring intensive care. To prevent clinical deterioration and keep patients out of intensive care, the Churpek-Afshar Lab uses machine learning techniques, such as natural language processing (NLP) and deep learning, to identify patients at risk for sepsis, acute kidney injury (AKI), acute respiratory distress syndrome (ARDS) and other syndromes of critical illness, as well as substance misuse.

The long-term goal of each of our initiatives is to develop and implement clinically useful algorithms and decision support tools to assist in the delivery of early, personalized care to decrease preventable death.


Research Staff

Data Engineer

Data Engineer

Research Associate

Sheriff Mohammed Issaka

Data Scientist

Data Scientist

Research Program Coordinator

Data Scientist

Programmer Analyst

Clinical Research Coordinator


Join Us!

To learn more about medical fellow or resident positions, please contact Madeline Oguss.

Active Projects

Sepsis Early Prediction and Subphenotype Illumination Study (SEPSIS)

This project combines detailed multicenter electronic health record (EHR), clinical trial, and biomarker data with machine learning approaches to improve the identification and risk stratification of sepsis and to discover important subphenotypes, with the aim of decreasing preventable death from infection. In the future, our models will be implemented to identify sepsis earlier, stratify risk accurately, and deliver personalized care at the bedside. SEPSIS is funded by the National Institutes of Health. Dr. Churpek is the PI.

The goal of the next five years is to build upon our successful research on sepsis and address key gaps in the field through three future directions:

  1. Using natural language processing and deep learning to improve the identification and risk stratification of infected patients
  2. Identifying important subphenotypes using research biomarkers
  3. Using machine learning to develop personalized treatment algorithms


  • NIH/NIGMS, R01GM123193 (PI: Matthew Churpek) (04/15/2017 – 03/31/2022)
  • NIH/NIGMS, R35GM145330 (PI: Matthew Churpek) (05/01/2022 – 02/28/2027)

Using Machine Learning to Identify, Risk Stratify, and Guide Personalized Treatment of COVID-19 Patients

To provide life-saving interventions for patients with COVID-19, this project develops novel machine learning models to improve early identification and risk stratification and to provide personalized treatment recommendations.

  • Aim 1: Develop novel machine learning models to identify, risk stratify, and provide personalized treatment recommendations for patients with COVID-19.
  • Aim 2: Develop a clinical decision support (CDS) tool with a graphical user interface (GUI) and test it in a simulation study.

Funding: DOD USAMRAA W81XWH-21-1-0009, PRMRP-Technology/Therapeutic Development Award-COVID (PI: Matthew Churpek) (01/31/2021 – 12/31/2024)

Developing a Clinical Decision Support Tool for the Identification, Diagnosis, and Treatment of Critical Illness in Hospitalized Patients

Clinical deterioration is the physiological decompensation that occurs when a patient's condition worsens or a serious physiological disturbance begins acutely. Our goal is to develop and implement a clinical decision support tool for the identification, diagnosis, and treatment of patients at high risk of deterioration to decrease preventable death.

  • Aim 1: Develop machine learning models to identify patients at high risk of deterioration using both structured data and unstructured clinical notes.
  • Aim 2: Develop models to predict the diagnosis that is causing the deterioration event and the potentially life-saving treatments that should be provided to high-risk patients.
  • Aim 3: Develop a clinical decision support tool with a graphical user interface incorporating the models from Aims 1 and 2 via user-centered design principles and then test its effectiveness, efficiency, and user satisfaction in a case-based simulation study.

Funding: NIH/NHLBI, 1R01HL157262 (PI: Matthew Churpek) (08/01/2021 – 07/31/2025)

Using Machine Learning for Early Recognition and Personalized Treatment of Acute Kidney Injury 

Up to 20% of hospitalized patients develop acute kidney injury, which is associated with an increased risk of readmission, morbidity, and mortality. This project will use advanced machine learning methods and biomarkers to improve the identification and treatment of patients at risk of acute kidney injury, and will result in novel tools and personalized treatment algorithms that can be implemented to improve patient outcomes. 

  • Aim 1: To develop and validate a novel AKI risk stratification tool using NLP and deep learning.
  • Aim 2: To identify the gaps in care for patients at high risk of severe AKI and their association with patient outcomes.
  • Aim 3: To determine the additive value of renal biomarkers to EHR-based machine learning algorithms for detecting patients at high risk for severe AKI.

Funding: NIH/NIDDK, 1R01DK126933-01A1 (MPIs: Jay Koyner and Matthew Churpek) (08/01/2021 – 07/31/2026)

Data-Driven Strategies for Substance Misuse Identification in Hospitalized Patients (SMART-AI)

Substance misuse screening in hospitalized patients is inconsistent and not standardized. The goal of this project is to provide novel and critically important tools in artificial intelligence for the detection of substance misuse from the electronic health record (EHR), which would enable daily substance misuse screenings. 

  • Aim 1: Using clinical notes from the EHR, we will train and test a natural language processing (NLP) substance misuse classifier in a cohort of adult hospitalized patients.
  • Aim 2: Externally validate our NLP substance misuse classifier from Aim 1 in an independent cohort of hospitalized patients at a separate health system (UW) without current screening for substance misuse.
  • Aim 3: Evaluate the effectiveness of the substance misuse classifier in increasing the proportion of patients who receive an intervention compared to usual care (Rush), which involves interviewer-administered screening.

Funding: NIH/NIDA, 1R01DA051464 (PI: Majid Afshar) (09/30/2020 – 07/31/2025)

Using Machine Learning to Predict Clinical Deterioration in Hospitalized Children

Children who are admitted to the hospital and experience deterioration have a high risk of mortality and poor long-term health. Current early warning scores indicating risk of deterioration are subjectively derived and have not reduced in-hospital mortality. The long-term goal is to implement a validated risk-prediction algorithm in hospitals for better detection of clinical deterioration in admitted children. This holds the promise of improving their survival and preventing long-term complications.

  • Aim 1: Develop and validate models to improve the prediction of clinical deterioration in hospitalized children using structured EHR data.
  • Aim 2: Develop and validate models predicting clinical deterioration in hospitalized children using unstructured pediatric clinical notes.
  • Aim 3: Determine the impact of hospital-level environmental factors on the prediction of clinical deterioration in hospitalized children.

Building a Substance Use Data Commons for Public Health Informatics 

We aim to foster an academic-public-private collaboration to build a data ecosystem that will, for the first time, harmonize data across a Wisconsin regional hospital, pre-hospital agencies such as fire departments, and public health agencies. We will build a cohort of patients with substance misuse whose linked data are engineered as an AI/ML-ready data commons. During our one-year timeline, we will train and test an AI/ML model that can prioritize those at the highest risk for poor outcomes and, with input from health equity experts, uncover important biases in our data sources.

  • Aim 1: Build a Substance Misuse Data Commons across a major hospital system and Wisconsin agencies.
  • Aim 2: Develop and validate a machine learning tool for substance use-related health outcomes.
  • Aim 3: Examine model performance across groups affected by health disparities (racial/ethnic groups as well as neighborhoods).

Developing and Evaluating Multi-Modal Clinical Diagnostic Reasoning Models for Automated Diagnosis Generation

Busy hospital settings and information overload from the electronic health record (EHR) contribute to decisional shortcuts and biases by clinicians that lead to missed opportunities for timely and accurate diagnoses. We will develop novel clinical natural language processing (NLP) models that learn and integrate multi-modal EHR data, conduct reasoning over a large-scale medical knowledge base, and test them in a clinical decision support system to mitigate diagnostic error. Completion of the aims will inform the development of NLP-driven clinical decision support tools that generate accurate diagnoses to overcome provider heuristics and improve patient outcomes.

  • Aim 1: Develop a multi-modal generative model that reads in both structured and unstructured EHR data to output diagnoses using a two-stage training process.
  • Aim 2: Construct a knowledge base using a neural symbolic approach from medical concepts and relations sourced from the National Library of Medicine's Unified Medical Language System (UMLS). The knowledge base will be incorporated into the model to generate diagnoses from the information in a daily care note collected in the EHR.
  • Aim 3: Design and pilot a clinical diagnostic decision support system using human-centered design principles. The best models from Aims 1 and 2 will be evaluated for diagnostic accuracy by clinicians in the system using previously validated instruments for patient safety and diagnostic error.

Funding: NIH/NLM 1K99LM014308-01 (PI: Yanjun Gao) (9/1/2023–8/31/2024)

Publications and Software

Select Publications

View Dr. Matthew Churpek's publications on NCBI My Bibliography

View Dr. Majid Afshar's publications on NCBI My Bibliography

View Dr. Anoop Mayampurath’s publications on NCBI My Bibliography

GitHub Repositories

Help Us Transform Medicine

You can help support research by making a gift to the Department of Medicine's Pulmonary Research and Education Fund.