
Plotting Sankey diagram with multiple stages but same node labels in R - Stack Overflow


I would like to plot a Sankey diagram showing how observations migrate from one risk level to another over multiple stages (in this case, years), so the risk level labels are the same in each year. The x-axis should show the years and the y-axis the proportion, as illustrated in the picture. Below is the code I attempted. Thanks!

library(ggsankeyfier)
library(dplyr)
library(ggplot2)

# Sample data frame
df <- data.frame(
  ID = c(1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5, 5),
  risk_level = c("High", "High", "High", 
                 "Low", "Low", "Very low", 
                 "Low", "Low", "Low", 
                 "Low", "Moderate", "Low", 
                 "Moderate", "High", "High"),
  Year = c(2022, 2023, 2024, 
           2022, 2023, 2024, 
           2022, 2023, 2024, 
           2022, 2023, 2024, 
           2022, 2023, 2024))

df1 <- df %>% 
  group_by(risk_level, Year) %>%
  summarise(count = n(), .groups = "drop_last") %>% 
  group_by(Year) %>%
  mutate(proportion = count / sum(count)) %>%
  ungroup() 

# Converting the data for the Sankey diagram
df_pivot <-  pivot_stages_longer(df1, stages_from = c("Year",
                                                        "risk_level"),
                                    ## the column that represents the size of the flows:
                                    values_from = "proportion")

# Attempting to plot the Sankey diagram
ggplot(df_pivot, aes(x = stage, y = proportion, group = node,
           connector = connector, edge_id = edge_id, fill = node)) +
  geom_sankeyedge(v_space = "auto") +
  geom_sankeynode(v_space = "auto")


1 Answer

The issue is how the data is set up. To achieve your desired result, reshape the data to wide format (one row per ID, one column per year), then compute the count and the proportion for each unique path of risk levels across the stages. Afterwards, use pivot_stages_longer to reshape the data to the long format required by ggsankeyfier:

library(ggsankeyfier)
library(ggplot2)
library(dplyr)
library(tidyr)

df_pivot <- df |>
  mutate(
    risk_level = factor(
      risk_level, c("Very low", "Low", "Moderate", "High")
    )
  ) |> 
  tidyr::pivot_wider(names_from = Year, values_from = risk_level) |> 
  count(across(-ID)) |> 
  mutate(prop = n / sum(n)) |> 
  pivot_stages_longer(
    stages_from = c("2022", "2023", "2024"),
    values_from = c("prop", "n")
  )

# Plot the Sankey diagram
ggplot(df_pivot, aes(
  x = stage, y = prop, group = node,
  connector = connector, edge_id = edge_id, fill = node
)) +
  geom_sankeyedge(v_space = "auto") +
  geom_sankeynode(v_space = "auto", order = "as_is")
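
To see what "each unique path of risk levels" means, it can help to inspect the intermediate table before it is passed to pivot_stages_longer. The following is a minimal sketch using only dplyr and tidyr on the sample data from the question; the name `paths` is introduced here for illustration and does not appear in the original code.

```r
library(dplyr)
library(tidyr)

# Sample data from the question
df <- data.frame(
  ID = rep(1:5, each = 3),
  risk_level = c("High", "High", "High",
                 "Low", "Low", "Very low",
                 "Low", "Low", "Low",
                 "Low", "Moderate", "Low",
                 "Moderate", "High", "High"),
  Year = rep(2022:2024, times = 5)
)

# `paths` is a hypothetical name for the intermediate result:
# one row per distinct sequence of risk levels across the years
paths <- df |>
  pivot_wider(names_from = Year, values_from = risk_level) |>  # one row per ID
  count(`2022`, `2023`, `2024`) |>                             # IDs per path
  mutate(prop = n / sum(n))                                    # share of all IDs

print(paths)
```

With this sample data every ID follows a different path, so each row has a count of one and the proportions sum to one. Each row then becomes a chain of edges (2022 to 2023, 2023 to 2024) once the table is reshaped to long format, which is what lets the diagram show migration between risk levels rather than isolated per-year totals.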
