
Visualising Treatment Pathways with Sankey Diagrams in R

By: Yayehirad A Melsew

March 2025

Objective

Understanding chemotherapy sequencing is essential in oncology research. Visualising treatment transitions gives insight into patient care pathways, reveals trends, and can support decision-making in cancer treatment. Sankey diagrams provide an intuitive way to map these transitions and to highlight end states such as ongoing treatment (where there has been no treatment failure) or death.

Introduction

Tracking and analysing treatment pathways is crucial in oncology. Sankey diagrams allow researchers and clinicians to visualise sequential treatments and better understand how patients progress through different lines of therapy. Unlike traditional flowcharts or timelines, Sankey diagrams emphasise the volume of transitions between stages, which makes them well suited to complex treatment data. To make such diagrams easy to create, I have developed an R function, create_sankey_diagram. This blog post walks you through its usage and applications.

Installation and Setup

First, install the necessary R packages:

install.packages(c("ggalluvial", "dplyr", "ggplot2", 
                   "readxl", "RCurl", "RColorBrewer"))
install.packages("devtools")
devtools::install_github("davidsjoberg/ggsankey")

Then, source the function directly from GitHub:

source("https://raw.githubusercontent.com/Yayehirad/Sankey/master/create_sankey_diagram.R")

Note: If you encounter issues installing ggsankey (e.g., GitHub access errors), ensure devtools is installed correctly and check your internet connection. Alternatively, consult the ggsankey GitHub page for manual installation instructions.
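Before sourcing the function, it can help to confirm that every package actually installed. The following is a minimal sketch using base R only; the variable names are illustrative and not part of create_sankey_diagram.

```r
# Packages named in the installation step above, plus ggsankey from GitHub
required_pkgs <- c("ggalluvial", "dplyr", "ggplot2",
                   "readxl", "RCurl", "RColorBrewer", "ggsankey")

# requireNamespace() returns TRUE when a package can be loaded, FALSE otherwise
is_installed <- vapply(required_pkgs, requireNamespace,
                       logical(1), quietly = TRUE)

missing_pkgs <- required_pkgs[!is_installed]

if (length(missing_pkgs) > 0) {
  message("Still missing: ", paste(missing_pkgs, collapse = ", "))
} else {
  message("All required packages are available.")
}
```

If ggsankey is the only package reported as missing, the GitHub installation step is the one to retry.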

Example of the data layout, with one row per patient and one column per line of therapy:

Id  FirstLine  SecondLine  ThirdLine  FourthLine
P1  ChemoG     ChemoD      ChemoA     ChemoG
P2  ChemoG     Ongoing     NA         NA
P3  ChemoC     ChemoD      ChemoD     Ongoing
P4  ChemoF     ChemoF      ChemoF     Ongoing
P5  ChemoC     ChemoG      ChemoC     ChemoF
P6  ChemoB     ChemoB      ChemoB     ChemoB
P7  ChemoB     ChemoB      Ongoing    NA
P8  ChemoF     ChemoF      ChemoF     ChemoF
P9  ChemoC     ChemoD      ChemoB     ChemoE

Note: In this table, terms like “ChemoG” or “ChemoD” are placeholders representing different chemotherapy regimens. “Ongoing” indicates that the patient remains on their current treatment, and “NA” means that no further treatment was recorded.
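To make the idea of a transition concrete, the sketch below rebuilds the first three treatment columns of the nine rows shown above as a data frame and counts how many patients follow each FirstLine to SecondLine pair. It uses dplyr::count; the object names pathways and transitions are illustrative, and this is not necessarily how create_sankey_diagram computes its flows internally.

```r
library(dplyr)

# The nine example patients from the table above (first three lines only)
pathways <- data.frame(
  Id         = paste0("P", 1:9),
  FirstLine  = c("ChemoG", "ChemoG", "ChemoC", "ChemoF", "ChemoC",
                 "ChemoB", "ChemoB", "ChemoF", "ChemoC"),
  SecondLine = c("ChemoD", "Ongoing", "ChemoD", "ChemoF", "ChemoG",
                 "ChemoB", "ChemoB", "ChemoF", "ChemoD"),
  ThirdLine  = c("ChemoA", NA, "ChemoD", "ChemoF", "ChemoC",
                 "ChemoB", "Ongoing", "ChemoF", "ChemoB"),
  stringsAsFactors = FALSE
)

# One row per distinct first-to-second-line transition, with patient counts
transitions <- pathways %>%
  count(FirstLine, SecondLine, name = "n_patients")

print(transitions)
```

On these nine rows there are six distinct transitions. ChemoB to ChemoB, ChemoC to ChemoD, and ChemoF to ChemoF are each followed by two patients, and these counts are what the flow widths in a Sankey diagram represent.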

Loading Simulated Treatment Data

We will use a simulated dataset that represents treatment pathways for 100 patients across multiple lines of therapy.

simulated_data <- read.csv("https://raw.githubusercontent.com/Yayehirad/Sankey/master/simulated_data.csv")
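Before plotting, it is worth checking that the data contain the identifier and treatment-line columns you intend to pass to the function. The helper below, check_pathway_columns, is a hypothetical convenience written for this post, not part of the sourced script. It is demonstrated on a small inline data frame so that it runs without a network connection.

```r
# Hypothetical helper: verify that the expected columns are present
check_pathway_columns <- function(data, id_col, lot_cols) {
  missing_cols <- setdiff(c(id_col, lot_cols), names(data))
  if (length(missing_cols) > 0) {
    stop("Missing column(s): ", paste(missing_cols, collapse = ", "))
  }
  # Each patient is expected to appear on a single row
  if (anyDuplicated(data[[id_col]]) > 0) {
    warning("Duplicate patient identifiers found in '", id_col, "'")
  }
  invisible(TRUE)
}

# Small stand-in with the same layout as the example table
toy_data <- data.frame(
  Id         = c("P1", "P2"),
  FirstLine  = c("ChemoG", "ChemoG"),
  SecondLine = c("ChemoD", "Ongoing"),
  stringsAsFactors = FALSE
)

check_pathway_columns(toy_data, "Id", c("FirstLine", "SecondLine"))
```

The same call can be made on simulated_data once it has loaded, with lot_cols set to whichever lines you plan to plot.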

Generating Sankey Diagrams

Now, let’s use create_sankey_diagram to visualise treatment transitions.

Example 1: Two Lines of Treatment with Legend

Figure: Sankey diagram showing patient transitions from FirstLine to SecondLine treatments, with coloured flows labelled by treatment type (e.g., ChemoG to ChemoD), a legend identifying each treatment, and numbers indicating patient counts per transition.

plot12_with_legend <- create_sankey_diagram(
  data = simulated_data,              # Dataset containing treatment data
  id_col = "Id",                      # Column identifying unique patients
  lot_cols = c("FirstLine", "SecondLine"), # Columns for treatment lines to plot
  show_legend = TRUE,                 # Include a legend for treatment types
  show_numbers = TRUE                 # Display patient counts on flows
)
print(plot12_with_legend)
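For readers curious about what a two-line diagram involves, here is a rough sketch built directly with ggsankey on a small inline data frame. It illustrates the general technique (reshape to long format with make_long, then draw with geom_sankey) and should not be read as the implementation inside create_sankey_diagram, which may differ. The object names toy, long_data, and p are illustrative.

```r
library(dplyr)
library(ggplot2)
library(ggsankey)

# First two treatment lines for the nine example patients
toy <- data.frame(
  FirstLine  = c("ChemoG", "ChemoG", "ChemoC", "ChemoF", "ChemoC",
                 "ChemoB", "ChemoB", "ChemoF", "ChemoC"),
  SecondLine = c("ChemoD", "Ongoing", "ChemoD", "ChemoF", "ChemoG",
                 "ChemoB", "ChemoB", "ChemoF", "ChemoD"),
  stringsAsFactors = FALSE
)

# make_long() reshapes wide columns into x / node / next_x / next_node
long_data <- toy %>% make_long(FirstLine, SecondLine)

p <- ggplot(long_data,
            aes(x = x, next_x = next_x,
                node = node, next_node = next_node,
                fill = factor(node), label = node)) +
  geom_sankey(flow.alpha = 0.6, node.color = "grey30") +  # flows and nodes
  geom_sankey_label(size = 3, color = "white", fill = "grey40") +
  theme_sankey(base_size = 12) +
  labs(x = NULL, fill = "Treatment")

print(p)
```

Wrapping these steps in a single function call, as create_sankey_diagram does, saves repeating the reshaping and styling code for every combination of treatment lines.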

Example 2: Three Lines of Treatment Without Legend

Figure: Sankey diagram displaying patient transitions across FirstLine, SecondLine, and ThirdLine treatments, with coloured flows showing progression (e.g., ChemoC to ChemoD to ChemoB) and numbers indicating patient counts, without a legend.

plot123_no_legend <- create_sankey_diagram(
  data = simulated_data,              # Dataset containing treatment data
  id_col = "Id",                      # Column identifying unique patients
  lot_cols = c("FirstLine", "SecondLine", "ThirdLine"), # Columns for treatment lines
  show_legend = FALSE,                # Exclude the legend
  show_numbers = TRUE                 # Display patient counts on flows
)
print(plot123_no_legend)

Example 3: Four Lines of Treatment

Figure: Sankey diagram illustrating patient transitions across FirstLine, SecondLine, ThirdLine, and FourthLine treatments, with coloured flows showing complex pathways (e.g., ChemoF to ChemoF to ChemoF to Ongoing), without a legend or patient count numbers.

plot1234 <- create_sankey_diagram(
  data = simulated_data,              # Dataset containing treatment data
  id_col = "Id",                      # Column identifying unique patients
  lot_cols = c("FirstLine", "SecondLine", "ThirdLine", "FourthLine"), # Columns for treatment lines
  show_legend = FALSE,                # Exclude the legend
  show_numbers = FALSE                # Hide patient counts on flows
)
print(plot1234)
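Once a diagram looks right, you will probably want to save it for a report or manuscript. The sketch below assumes that create_sankey_diagram returns a ggplot object, which has not been verified here; if it does, ggplot2::ggsave can write it to disk. A simple stand-in plot is used so the snippet runs on its own, and the file path is a temporary one chosen for illustration.

```r
library(ggplot2)

# Stand-in for a plot such as plot1234 (assumed to be a ggplot object)
stand_in_plot <- ggplot(
  data.frame(line = c("FirstLine", "SecondLine"), n = c(9, 9)),
  aes(x = line, y = n)
) +
  geom_col()

# Write the figure to a temporary PNG; swap in your own path and plot object
out_file <- tempfile(fileext = ".png")
ggsave(filename = out_file, plot = stand_in_plot,
       width = 8, height = 5, dpi = 300)

file.exists(out_file)
```

Four-line diagrams tend to be wide, so a larger width value may be needed to keep the flows legible.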

Conclusion

The create_sankey_diagram function offers a compact way to visualise complex treatment pathways in cancer research. Incorporating these diagrams into your workflow can make patient journeys, treatment choices, and outcomes easier to understand, which in turn can support data-driven decision-making in oncology research.

Explore Health Data Science with Yaye

Welcome to Yaye Melsew’s Website!

I’m Yayehirad (Yaye) Melsew, a clinical data scientist with a passion for transforming health data into meaningful insights. This space serves as my professional portfolio, where I share my work, insights, and experiences in health data management, analysis, and visualisation.

Through my blog, I’ll explore key topics in clinical data science, ranging from ensuring health data quality to advanced statistical analyses and visual storytelling. Whether you’re a fellow researcher, a data enthusiast, or simply curious about how health data shapes medical decisions, I hope you find valuable insights here.

Feel free to explore, connect, and engage—I’d love to hear your thoughts!
