Plotting a Sankey diagram with multiple stages but the same node labels in R

I would like to plot a Sankey diagram showing how observations migrate from one risk level to another over multiple stages (in this case, years), so the risk level labels are the same in each year. The x axis should show the years and the y axis the proportion, as illustrated in the picture. Below is the code I attempted. Thanks!

# Sample data frame
library(ggsankeyfier)
library(dplyr)
library(ggplot2)

df <- data.frame(
  ID = c(1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5, 5),
  risk_level = c("High", "High", "High", 
                 "Low", "Low", "Very low", 
                 "Low", "Low", "Low", 
                 "Low", "Moderate", "Low", 
                 "Moderate", "High", "High"),
  Year = c(2022, 2023, 2024, 
           2022, 2023, 2024, 
           2022, 2023, 2024, 
           2022, 2023, 2024, 
           2022, 2023, 2024))

df1 <- df %>% 
  group_by(risk_level, Year) %>%
  summarise(count = n(), .groups = "drop_last") %>% 
  group_by(Year) %>%
  mutate(proportion = count / sum(count)) %>%
  ungroup() 

# Converting the data for the Sankey diagram
df_pivot <-  pivot_stages_longer(df1, stages_from = c("Year",
                                                        "risk_level"),
                                    ## the column that represents the size of the flows:
                                    values_from = "proportion")

#attempting to plot the sankey diagram
ggplot(df_pivot, aes(x = stage, y = proportion, group = node,
           connector = connector, edge_id = edge_id, fill = node)) +
  geom_sankeyedge(v_space = "auto") +
  geom_sankeynode(v_space = "auto")
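For reference, the migrations described above can only be recovered by following each `ID` from one year to the next; the per-year counts in `df1` say how many IDs sit at each level but not where they came from. A minimal base R sketch, assuming the question's `df` (repeated so the block runs on its own); `wide`, `lv`, `y22_23` and `y23_24` are illustrative names, not part of ggsankeyfier:

```r
# Sample data from the question: each ID's risk level in each year
df <- data.frame(
  ID = rep(1:5, each = 3),
  risk_level = c("High", "High", "High",
                 "Low", "Low", "Very low",
                 "Low", "Low", "Low",
                 "Low", "Moderate", "Low",
                 "Moderate", "High", "High"),
  Year = rep(c(2022, 2023, 2024), times = 5)
)

# One row per ID, one column per year (risk_level.2022, risk_level.2023, ...)
wide <- reshape(df, idvar = "ID", timevar = "Year", direction = "wide")

# Fixed level order so both tables have the same rows and columns
lv <- c("Very low", "Low", "Moderate", "High")

# Cross-tabulate each ID's level in one year (from) against the next (to)
y22_23 <- table(from = factor(wide$risk_level.2022, levels = lv),
                to   = factor(wide$risk_level.2023, levels = lv))
y23_24 <- table(from = factor(wide$risk_level.2023, levels = lv),
                to   = factor(wide$risk_level.2024, levels = lv))

print(y22_23)
print(y23_24)
```

With the sample data, two IDs stay at Low from 2022 to 2023, and two IDs are at High in both 2023 and 2024; these from/to counts are what the flows of a Sankey diagram should add up to between adjacent years.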

Asked by ccc; edited by CJ Yetman.


1 Answer


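The issue is the way the data are set up. Counting risk levels per year only gives the size of each node; it loses which ID moved from which level to which, so there is nothing to connect the stages. To get the desired result, reshape to wide format (one column per year), then compute the count and the proportion for each unique path of risk levels across the stages. Afterwards, use pivot_stages_longer() to reshape the data into the long format required by ggsankeyfier: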

library(ggsankeyfier)
library(ggplot2)
library(dplyr)
library(tidyr)

df_pivot <- df |>
  # give the risk levels a meaningful order
  mutate(
    risk_level = factor(
      risk_level, c("Very low", "Low", "Moderate", "High")
    )
  ) |> 
  # one row per ID, one column per year
  tidyr::pivot_wider(names_from = Year, values_from = risk_level) |> 
  # number of IDs following each unique path of risk levels
  count(across(-ID)) |> 
  # share of all IDs on each path
  mutate(prop = n / sum(n)) |> 
  # long format expected by ggsankeyfier
  pivot_stages_longer(
    stages_from = c("2022", "2023", "2024"),
    values_from = c("prop", "n")
  )

# plot the Sankey diagram
ggplot(df_pivot, aes(
  x = stage, y = prop, group = node,
  connector = connector, edge_id = edge_id, fill = node
)) +
  geom_sankeyedge(v_space = "auto") +
  geom_sankeynode(v_space = "auto", order = "as_is")
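To see what the wide reshape and path counting produce before pivot_stages_longer() takes over, here is a base R sketch of just those two steps, using reshape() and aggregate() in place of pivot_wider() and count(). It assumes the question's sample data (repeated so the block runs on its own) and stops before the ggsankeyfier part; `wide`, `stage_cols` and `paths` are illustrative names.

```r
# Sample data from the question: each ID's risk level in each year
df <- data.frame(
  ID = rep(1:5, each = 3),
  risk_level = c("High", "High", "High",
                 "Low", "Low", "Very low",
                 "Low", "Low", "Low",
                 "Low", "Moderate", "Low",
                 "Moderate", "High", "High"),
  Year = rep(c(2022, 2023, 2024), times = 5)
)

# Wide layout: one row per ID, one risk-level column per year
wide <- reshape(df, idvar = "ID", timevar = "Year", direction = "wide")
stage_cols <- paste0("risk_level.", c(2022, 2023, 2024))

# Number of IDs following each unique path of risk levels
paths <- aggregate(wide["ID"], by = wide[stage_cols], FUN = length)
names(paths)[names(paths) == "ID"] <- "n"

# Share of all IDs on each path (the y value of the Sankey)
paths$prop <- paths$n / sum(paths$n)
print(paths)
```

With the sample data every ID follows a different path, so there are five paths, each with n = 1 and prop = 0.2.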

Source: Stack Overflow