Visualising Treatment Pathways with Sankey Diagrams in R
By: Yayehirad A Melsew
March 2025
Objective
Understanding chemotherapy sequencing is essential in oncology research. By visualising treatment transitions, we can gain insights into patient care pathways, identify trends, and improve decision-making in cancer treatment. Sankey diagrams provide an intuitive way to map these transitions and to highlight end states such as ongoing treatment (where there has been no treatment failure) or death.
Introduction
Tracking and analysing treatment pathways is crucial in oncology. Sankey diagrams allow researchers and clinicians to visualise sequential treatments and better understand how patients progress through different lines of therapy. Unlike traditional flowcharts or timelines, Sankey diagrams emphasise the volume of transitions between stages, making them well suited to complex treatment data. To make this easier, I have developed an R function, create_sankey_diagram, that builds such diagrams from a patient-level dataset. This blog post walks you through its usage and applications.
Installation and Setup
First, install the necessary R packages:
install.packages(c("ggalluvial", "dplyr", "ggplot2",
                   "readxl", "RCurl", "RColorBrewer"))
install.packages("devtools")
devtools::install_github("davidsjoberg/ggsankey")
Then, source the function directly from GitHub:
source("https://raw.githubusercontent.com/Yayehirad/Sankey/master/create_sankey_diagram.R")
Note: If you encounter issues installing ggsankey (e.g., GitHub access errors), ensure devtools is installed correctly and check your internet connection. Alternatively, consult the ggsankey GitHub page for manual installation instructions.
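If some of these packages are already installed, it can be convenient to install only the missing ones. The following is a minimal sketch of that idea; `missing_packages` is a hypothetical helper name introduced here for illustration, and the install step is wrapped in `interactive()` so that nothing is downloaded when the script runs non-interactively.

```r
# Hypothetical helper: return the packages in `pkgs` that cannot be loaded.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

# CRAN packages listed above, plus devtools for the GitHub install.
cran_pkgs <- c("ggalluvial", "dplyr", "ggplot2",
               "readxl", "RCurl", "RColorBrewer", "devtools")

# Only attempt installation in an interactive session.
if (interactive()) {
  to_install <- missing_packages(cran_pkgs)
  if (length(to_install) > 0) install.packages(to_install)
  if (length(missing_packages("ggsankey")) > 0) {
    devtools::install_github("davidsjoberg/ggsankey")
  }
}
```

Calling `missing_packages(cran_pkgs)` on its own simply lists what is still missing, which can help when diagnosing a failed installation.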
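The examples in this post use data in wide format, with one row per patient and one column per line of therapy, as in the following illustrative rows:

| Id | FirstLine | SecondLine | ThirdLine | FourthLine |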
|---|---|---|---|---|
| P1 | ChemoG | ChemoD | ChemoA | ChemoG |
| P2 | ChemoG | Ongoing | NA | NA |
| P3 | ChemoC | ChemoD | ChemoD | Ongoing |
| P4 | ChemoF | ChemoF | ChemoF | Ongoing |
| P5 | ChemoC | ChemoG | ChemoC | ChemoF |
| P6 | ChemoB | ChemoB | ChemoB | ChemoB |
| P7 | ChemoB | ChemoB | Ongoing | NA |
| P8 | ChemoF | ChemoF | ChemoF | ChemoF |
| P9 | ChemoC | ChemoD | ChemoB | ChemoE |
Note: In this table, terms like “ChemoG” or “ChemoD” are placeholders representing different chemotherapy regimens, while “Ongoing” indicates the patient remains on that treatment, and “NA” means no further treatment was recorded.
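The volume of each transition, which is what a Sankey diagram draws as flow width, can be tabulated directly from a table like the one above. The following is a minimal base R sketch: `sample_data` re-enters the nine example rows by hand, and `count_transitions` is a hypothetical helper written for this illustration, not part of create_sankey_diagram.

```r
# The nine example patients from the table above, entered by hand.
sample_data <- data.frame(
  Id         = paste0("P", 1:9),
  FirstLine  = c("ChemoG", "ChemoG", "ChemoC", "ChemoF", "ChemoC",
                 "ChemoB", "ChemoB", "ChemoF", "ChemoC"),
  SecondLine = c("ChemoD", "Ongoing", "ChemoD", "ChemoF", "ChemoG",
                 "ChemoB", "ChemoB", "ChemoF", "ChemoD"),
  ThirdLine  = c("ChemoA", NA, "ChemoD", "ChemoF", "ChemoC",
                 "ChemoB", "Ongoing", "ChemoF", "ChemoB"),
  FourthLine = c("ChemoG", NA, "Ongoing", "Ongoing", "ChemoF",
                 "ChemoB", NA, "ChemoF", "ChemoE"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: count patients moving from one line of therapy to the next.
# Patients with NA in either column are left out of the count.
count_transitions <- function(data, from_col, to_col) {
  keep <- !is.na(data[[from_col]]) & !is.na(data[[to_col]])
  pairs <- data[keep, c(from_col, to_col)]
  counts <- as.data.frame(
    table(from = pairs[[from_col]], to = pairs[[to_col]]),
    stringsAsFactors = FALSE
  )
  counts <- counts[counts$Freq > 0, ]   # drop combinations that never occur
  names(counts)[3] <- "n"
  counts[order(-counts$n, counts$from, counts$to), ]
}

first_to_second <- count_transitions(sample_data, "FirstLine", "SecondLine")
print(first_to_second)
```

For these nine patients, the most frequent first-to-second-line transitions are ChemoB to ChemoB, ChemoC to ChemoD, and ChemoF to ChemoF, with two patients each.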
Loading Simulated Treatment Data
We will use a simulated dataset that represents treatment pathways for 100 patients across multiple lines of therapy.
simulated_data <- read.csv("https://raw.githubusercontent.com/Yayehirad/Sankey/master/simulated_data.csv")
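Before plotting, it can be worth confirming that the loaded data has the expected columns and one row per patient. The following is a minimal sketch of such a check; `check_pathway_data` is a hypothetical helper, and the small `toy` data frame stands in for the downloaded file so the example runs offline. Note that `read.csv` treats the string "NA" as a missing value by default.

```r
# Hypothetical helper: return a character vector describing any problems found.
check_pathway_data <- function(data, id_col, lot_cols) {
  problems <- character(0)
  missing_cols <- setdiff(c(id_col, lot_cols), names(data))
  if (length(missing_cols) > 0) {
    problems <- c(problems,
                  paste("Missing columns:", paste(missing_cols, collapse = ", ")))
  } else if (anyDuplicated(data[[id_col]]) > 0) {
    problems <- c(problems, "Duplicate patient IDs found")
  }
  problems
}

# Toy data with a deliberately duplicated patient ID.
toy <- data.frame(
  Id        = c("P1", "P2", "P2"),
  FirstLine = c("ChemoG", "ChemoC", "ChemoB"),
  stringsAsFactors = FALSE
)

print(check_pathway_data(toy, "Id", "FirstLine"))
```

The same call could be applied to `simulated_data` with `id_col = "Id"` and the four line-of-therapy columns; an empty result means neither problem was detected.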
Generating Sankey Diagrams
Now, let’s use create_sankey_diagram to visualise treatment transitions.
Example 1: Two Lines of Treatment with Legend
plot12_with_legend <- create_sankey_diagram(
  data = simulated_data,                    # Dataset containing treatment data
  id_col = "Id",                            # Column identifying unique patients
  lot_cols = c("FirstLine", "SecondLine"),  # Columns for treatment lines to plot
  show_legend = TRUE,                       # Include a legend for treatment types
  show_numbers = TRUE                       # Display patient counts on flows
)
print(plot12_with_legend)
Example 2: Three Lines of Treatment Without Legend
plot123_no_legend <- create_sankey_diagram(
  data = simulated_data,                                 # Dataset containing treatment data
  id_col = "Id",                                         # Column identifying unique patients
  lot_cols = c("FirstLine", "SecondLine", "ThirdLine"),  # Columns for treatment lines
  show_legend = FALSE,                                   # Exclude the legend
  show_numbers = TRUE                                    # Display patient counts on flows
)
print(plot123_no_legend)
Example 3: Four Lines of Treatment
plot1234 <- create_sankey_diagram(
  data = simulated_data,                                               # Dataset containing treatment data
  id_col = "Id",                                                       # Column identifying unique patients
  lot_cols = c("FirstLine", "SecondLine", "ThirdLine", "FourthLine"),  # Columns for treatment lines
  show_legend = FALSE,                                                 # Exclude the legend
  show_numbers = FALSE                                                 # Hide patient counts on flows
)
print(plot1234)
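For readers curious about the data shape behind a Sankey plot, the wide table (one column per line of therapy) generally has to be reshaped so that each row describes one stage and the stage that follows it. The sketch below shows that reshaping in base R. It is an illustration only and is not taken from the source of create_sankey_diagram; `pathways_to_long` is a hypothetical helper, and the column names x, node, next_x, and next_node are chosen to mirror those produced by ggsankey's `make_long()`.

```r
# Hypothetical helper: one row per patient per stage, paired with the next stage.
pathways_to_long <- function(data, lot_cols) {
  n_stage <- length(lot_cols)
  rows <- lapply(seq_len(n_stage), function(i) {
    has_next <- i < n_stage
    data.frame(
      x         = lot_cols[i],                                    # current stage
      node      = data[[lot_cols[i]]],                            # treatment at this stage
      next_x    = if (has_next) lot_cols[i + 1] else NA_character_,
      next_node = if (has_next) data[[lot_cols[i + 1]]] else NA_character_,
      stringsAsFactors = FALSE
    )
  })
  long <- do.call(rbind, rows)
  long[!is.na(long$node), ]   # drop stages where no treatment was recorded
}

# Three toy patients followed across three lines of therapy.
toy <- data.frame(
  FirstLine  = c("ChemoG", "ChemoG", "ChemoC"),
  SecondLine = c("ChemoD", "Ongoing", "ChemoD"),
  ThirdLine  = c("ChemoA", NA, "ChemoD"),
  stringsAsFactors = FALSE
)

long <- pathways_to_long(toy, c("FirstLine", "SecondLine", "ThirdLine"))
print(long)
```

Dropping rows with a missing node is a design choice made for this sketch: it lets a pathway end at "Ongoing" without drawing a flow into an NA category.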
Conclusion
The create_sankey_diagram function is a powerful tool for visualising complex treatment pathways in cancer research. By incorporating these diagrams into your workflow, you can better understand patient journeys, treatment choices, and outcomes. This can support data-driven decision-making and enhance oncology research.