r - How can I layer an outlined bar graph on top of a colored bar graph in ggplot? - Stack Overflow

I have data that looks like this:

expected_data

   resp_migration_status kmcluster percentage expected
1            Non-migrant         1       21.9     30.5
2            Non-migrant         2       30.1     27.4
3            Non-migrant         3       24.7     19.9
4            Non-migrant         4       23.3     22.3
5                Migrant         1       41.9     30.5
6                Migrant         2       22.6     27.4
7                Migrant         3       19.4     19.9
8                Migrant         4       16.1     22.3
9              Displaced         1       36.9     30.5
10             Displaced         2       26.2     27.4
11             Displaced         3       11.9     19.9
12             Displaced         4       25.0     22.3

I'd like to construct a bar graph that shows percentage by kmcluster within each resp_migration_status. I've done this successfully using this code:

ggplot(expected_data, aes(x = resp_migration_status, y = percentage, fill = kmcluster)) +
  geom_bar(stat = "identity", position = "dodge") +  # Use stat = "identity" for pre-computed values
  labs(
    title = "Percentage distribution of network cluster by migration status",
    x = "Migration Status",
    y = "Percentage",
    fill = "Cluster"
  ) +
  scale_y_continuous(labels = scales::percent_format(scale = 1)) +  # Format y-axis as percentages
  theme_minimal() +
  theme(
    axis.text.x = element_text(angle = 45, hjust = 1)
  )

Overlaid on this bar graph, I'd like to draw a second set of bars with black outlines showing the expected percentage by kmcluster within each resp_migration_status. Essentially, it's a graphical representation of a chi-square test: it shows what the distribution of clusters within each migration status would be if cluster membership were unrelated to migration status, compared with the actual distribution, where some migration statuses are disproportionately concentrated in one cluster.
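
If the expected column comes from a chi-square test of independence, it can be computed from the raw counts (not the percentages) with chisq.test(). Under independence each expected count is row total × column total / grand total, so every migration status gets the same expected cluster shares, which is consistent with the repeated 30.5 / 27.4 / 19.9 / 22.3 pattern in the table. A minimal sketch; the counts matrix below is hypothetical and is not the question's data.

```r
# Hypothetical counts, NOT the question's data:
# rows = migration status, columns = cluster
counts <- matrix(
  c(20, 30, 25, 25,
    40, 20, 20, 20,
    30, 25, 15, 30),
  nrow = 3, byrow = TRUE,
  dimnames = list(
    resp_migration_status = c("Non-migrant", "Migrant", "Displaced"),
    kmcluster = c("1", "2", "3", "4")
  )
)

chi <- chisq.test(counts)

# Expected counts under independence: row total * column total / grand total
expected_counts <- chi$expected
# Keep the named dimensions so the reshaped columns get the right names
dimnames(expected_counts) <- dimnames(counts)

# Row percentages, the same scale as the 'percentage' and 'expected' columns
observed_pct <- 100 * prop.table(counts, margin = 1)
expected_pct <- 100 * prop.table(expected_counts, margin = 1)

# Long format with the same column names as expected_data, ready for ggplot()
plot_data <- merge(
  as.data.frame(as.table(observed_pct), responseName = "percentage"),
  as.data.frame(as.table(expected_pct), responseName = "expected")
)

print(chi)
print(plot_data)
```

Every row of expected_pct is identical, which is exactly the "perfectly random" baseline the outlined bars are meant to show.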

How do I overlay a very basic (black-outlined) bar graph on the original graph to represent this? I have this code:

ggplot(expected_data, aes(x = resp_migration_status, y = expected, fill = kmcluster)) +
  geom_bar(stat = "identity", position = "dodge", color = "black", fill = NA) +  # Use stat = "identity" for pre-computed values, bars with black outlines
  labs(
    title = "Expected percentage distribution of network cluster by migration status",
    x = "Migration Status",
    y = "Percentage"
  ) +
  scale_y_continuous(labels = scales::percent_format(scale = 1)) +  # Format y-axis as percentages
  theme_minimal() +
  theme(
    axis.text.x = element_text(angle = 45, hjust = 1)
  )

But setting fill = NA inside geom_bar overrides the fill = kmcluster mapping in aes(), so the data are no longer split by cluster, and the result looks like a strange stacked bar (see image).
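
The collapse comes from how ggplot2 forms groups: an aesthetic set as a fixed value in a layer (here fill = NA) replaces the inherited mapping for that layer, so the only discrete variable left to split rows into groups is x. With one group per migration status, position = "dodge" has nothing to dodge and all four bars are drawn in the same slot. A minimal sketch, assuming ggplot2 is installed; expected_demo is a hypothetical stand-in built from the question's expected values, and layer_data() is used to inspect the computed bar positions.

```r
library(ggplot2)

# Hypothetical stand-in: the question's expected values, kmcluster as a factor
expected_demo <- data.frame(
  resp_migration_status = rep(c("Non-migrant", "Migrant", "Displaced"), each = 4),
  kmcluster = factor(rep(1:4, times = 3)),
  expected = rep(c(30.5, 27.4, 19.9, 22.3), times = 3)
)

# fill = NA is a fixed setting, so this layer drops the inherited fill mapping;
# rows are then grouped only by the discrete x variable
p_broken <- ggplot(expected_demo,
                   aes(x = resp_migration_status, y = expected, fill = kmcluster)) +
  geom_col(position = "dodge", colour = "black", fill = NA)

# Mapping group keeps one group per cluster, which position = "dodge" needs
p_fixed <- ggplot(expected_demo,
                  aes(x = resp_migration_status, y = expected, group = kmcluster)) +
  geom_col(position = "dodge", colour = "black", fill = NA)

# layer_data() returns the computed data for a layer, including bar edges
broken <- layer_data(p_broken)
fixed <- layer_data(p_fixed)

# Broken: the four bars of each status share one left edge, so they overlap
length(unique(broken$xmin))
# Fixed: each status/cluster combination gets its own slot
length(unique(fixed$xmin))
```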

So the first question is:

  1. How do I divide the data by migration type and cluster, without coloring in each bar and instead just outlining them in black?

Secondly:

  2. How do I overlay this bar graph on top of the original one?
Asked Mar 19 at 14:51 by Kristen; edited Mar 19 at 14:56 by MrFlick (recognized by R Language Collective).
  • Apologies: the data table looked perfect in the preview but didn't post as expected :( – Kristen, Mar 19 at 14:54
  • Rather than including data in a table, it's better to share it as dput() output so we can copy/paste it directly into R for testing. See how to create a reproducible example. – MrFlick, Mar 19 at 14:57
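
A minimal illustration of the dput() suggestion in the comments. The data frame excerpt here is just a three-row excerpt of the question's table; dput() prints R code that rebuilds it, and parsing that code back gives an identical object.

```r
# Three-row excerpt of the question's table (cluster 1 for each status)
excerpt <- data.frame(
  resp_migration_status = c("Non-migrant", "Migrant", "Displaced"),
  kmcluster = c(1L, 1L, 1L),
  percentage = c(21.9, 41.9, 36.9),
  expected = c(30.5, 30.5, 30.5)
)

# Prints R code that recreates the object; this is the text to paste into a question
dput(excerpt)

# Round trip: parse the printed code back and check that nothing was lost
code <- paste(capture.output(dput(excerpt)), collapse = "\n")
rebuilt <- eval(parse(text = code))
identical(excerpt, rebuilt)
```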

1 Answer

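To add your second set of bars on top of the first, you have to explicitly map the group aesthetic in the second layer to still get a dodged bar chart. Setting fill = NA as a fixed value drops the inherited fill mapping for that layer, so without group nothing tells position = "dodge" to split the bars by cluster.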

library(ggplot2)

ggplot(expected_data, aes(
  x = resp_migration_status,
  y = percentage, fill = factor(kmcluster)
)) +
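  # filled bars: observed percentages
  geom_col(position = "dodge") +
  # outlined bars: expected percentages; group keeps them dodged by cluster
  geom_col(aes(y = expected, group = factor(kmcluster)),
    color = "black", fill = NA, position = "dodge"
  ) +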
  labs(
    title = "Percentage distribution of network cluster by migration status",
    x = "Migration Status",
    y = "Percentage",
    fill = "Cluster"
  ) +
  scale_y_continuous(labels = scales::percent_format(scale = 1)) + # Format y-axis as percentages
  theme_minimal() +
  theme(
    axis.text.x = element_text(angle = 45, hjust = 1)
  )
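
To check that each outlined bar sits exactly on top of its filled counterpart, you can compare the computed bar edges of the two layers with ggplot2's layer_data(). A minimal sketch, assuming ggplot2 is installed and re-creating the values from the DATA block below so it runs on its own:

```r
library(ggplot2)

# Same values as the DATA block, kmcluster stored as an integer
expected_data <- data.frame(
  resp_migration_status = rep(c("Non-migrant", "Migrant", "Displaced"), each = 4),
  kmcluster = rep(1:4, times = 3),
  percentage = c(21.9, 30.1, 24.7, 23.3, 41.9, 22.6,
                 19.4, 16.1, 36.9, 26.2, 11.9, 25),
  expected = rep(c(30.5, 27.4, 19.9, 22.3), times = 3)
)

p <- ggplot(expected_data, aes(x = resp_migration_status, y = percentage,
                               fill = factor(kmcluster))) +
  geom_col(position = "dodge") +
  geom_col(aes(y = expected, group = factor(kmcluster)),
           colour = "black", fill = NA, position = "dodge")

filled <- layer_data(p, 1)    # observed bars
outlined <- layer_data(p, 2)  # expected bars

# Both layers should use the same 12 horizontal slots
same_slots <- isTRUE(all.equal(sort(filled$xmin), sort(outlined$xmin))) &&
  isTRUE(all.equal(sort(filled$xmax), sort(outlined$xmax)))
same_slots
```

Both layers use the default geom_col width and position = "dodge", which is why their slots line up; changing the width or dodge width in only one layer would break the overlay.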

DATA

expected_data <- data.frame(
  resp_migration_status = c(
    "Non-migrant", "Non-migrant", "Non-migrant", "Non-migrant",
    "Migrant", "Migrant", "Migrant", "Migrant",
    "Displaced", "Displaced", "Displaced", "Displaced"
  ),
  kmcluster = c(1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L),
  percentage = c(
    21.9, 30.1, 24.7, 23.3,
    41.9, 22.6, 19.4, 16.1,
    36.9, 26.2, 11.9, 25
  ),
  expected = c(
    30.5, 27.4, 19.9, 22.3,
    30.5, 27.4, 19.9, 22.3,
    30.5, 27.4, 19.9, 22.3
  )
)
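
The same data can also be written more compactly with rep(). Storing resp_migration_status as a factor with explicit levels keeps the x-axis in table order (ggplot2 sorts a character x alphabetically), and storing kmcluster as a factor gives a discrete fill scale, so the factor(kmcluster) calls in the plotting code become unnecessary (though harmless). A sketch; the name expected_data2 is only for illustration.

```r
statuses <- c("Non-migrant", "Migrant", "Displaced")

expected_data2 <- data.frame(
  # Factor levels fix the x-axis order; a character x would be sorted alphabetically
  resp_migration_status = factor(rep(statuses, each = 4), levels = statuses),
  # A factor cluster gives a discrete fill scale without wrapping it in factor()
  kmcluster = factor(rep(1:4, times = 3)),
  percentage = c(
    21.9, 30.1, 24.7, 23.3,
    41.9, 22.6, 19.4, 16.1,
    36.9, 26.2, 11.9, 25
  ),
  # The expected cluster shares repeat for every migration status
  expected = rep(c(30.5, 27.4, 19.9, 22.3), times = 3)
)

str(expected_data2)
```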