r - Stacked bar chart of relative abundance with one colour for higher taxonomic rank and gradient colour - Stack Overflow

This question is very similar to some other questions on here but I can't crack it. The issue comes as I need to reshape my data.

I have count data from microbiome data and want to make a stacked bar chart. The charts are then grouped according to a qualitative variable. I would like the higher taxonomic groups to be a certain colour and there be a continuous gradient within those groups. Similar to this:

I have been following these two questions: How to creat a bar graph of microbiota data with one color for higher taxonomic rank and gradient color and Stacked barplot with colour gradients for each bar

Here is an example of my data:

  ID Group Family3 Family4 Family5 Family6 Family7 Family8 Family9 Family10
1  1     1      38      73      60      20      33      71      83      40
2  2     1      96      16      88      23      19      70      44      77
3  3     2      69      99      80      60      55      76      99      92
4  4     2      82      91      91      71      79      79      12      38
5  5     3      41      83      77      84      70      37      79      92

I have IDs, a group and then the various families. My real dataset has more columns and rows. Script to make the example data:

# Set seed for reproducibility
set.seed(123)

# Create the data frame
df <- data.frame(
  ID = 1:5,
  Group = c(1, 1, 2, 2, 3)
)

# Add columns Family3 to Family10 with random values between 0 and 100
for (i in 3:10) {
  df[[paste0("Family", i)]] <- sample(0:100, nrow(df), replace = TRUE)
}

# Print the resulting data frame
print(df)

I have a separate dataframe with the Phylum and Family information:

df_taxa <- data.frame(Phylum=c("Phyla1", "Phyla2", "Phyla3", "Phyla2", "Phyla2", "Phyla2", "Phyla1", "Phyla3", "Phyla1", "Phyla1"), Family=c("Family1", "Family8", "Family9", "Family2", "Family7", "Family6", "Family10", "Family3", "Family5", "Family4"))

There are some steps I do beforehand to remove columns with low counts and filter the df_taxa so it only contains the Phylum/Family info from the columns that remain after removing the low count columns.
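
A minimal sketch of what such a pre-filtering step might look like, using base R only. The toy data, the object names (`toy_counts`, `toy_taxa`) and the threshold `min_total` are assumptions for illustration and are not taken from the real dataset:

```r
# Toy data with the same layout as the example: ID, Group, then one column per family
toy_counts <- data.frame(
  ID = 1:3,
  Group = c(1, 1, 2),
  Family3 = c(38, 96, 69),
  Family4 = c(0, 1, 2),
  Family5 = c(60, 88, 80)
)
toy_taxa <- data.frame(
  Phylum = c("Phyla3", "Phyla1", "Phyla1", "Phyla2"),
  Family = c("Family3", "Family4", "Family5", "Family8")
)

min_total <- 10  # hypothetical threshold on the total count per family

id_cols <- c("ID", "Group")
family_cols <- setdiff(names(toy_counts), id_cols)

# Keep only the family columns whose total count reaches the threshold
keep <- family_cols[colSums(toy_counts[family_cols]) >= min_total]
counts_filtered <- toy_counts[c(id_cols, keep)]

# Keep only the taxonomy rows for the families that survived the filter
taxa_filtered <- toy_taxa[toy_taxa$Family %in% keep, ]
```

Filtering the taxonomy table with `%in%` keeps it in step with the count table, so a palette built from it should only contain families that are actually plotted.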

This is the script I have been using to generate my stacked bar charts:

library(reshape2)
library(ggplot2)
library(ggforce) # provides facet_grid_paginate()

# Reshape the data frame to long format for ggplot
df_melt <- reshape2::melt(df, id.vars = c("ID", "Group"))
# Generate colours. This function is found in the second link.
df_cols <- ColourPalleteMulti(df_taxa, "Phylum", "Family")

# Plot with ggplot
ggplot(df_melt, aes(ID, value, fill = variable)) +
  geom_bar(position = "fill", stat = "identity") +
  scale_fill_manual("", values = df_cols) +
  facet_grid_paginate(. ~ Group, scales = "free")

This is what the plot looks like:

The issue is that the colours are not split according to the Phyla. The other questions suggest that it is easier to add an additional column called group to the original dataframe and then use it as the fill option:

#Example given from second link
df$group <- paste0(df$color, "-", df$clarity)

# Build the colour palette
colours <-ColourPalleteMulti(df, "color", "clarity")

# Plot results
ggplot(df, aes(color)) + 
  geom_bar(aes(fill = group), colour = "grey") +
  scale_fill_manual("Subject", values=colours, guide = "none")

I don't see how I can do this as I melt the data and I use the count data variable as the fill option for ggplot.

Any help would be greatly appreciated. Thanks

asked Mar 21 at 12:20 by btredcup

1 Answer

I understand you want to create sub-palettes for your data based on the Phylum group, and colour the families within each Phylum with a separate palette.

For this you could

  1. Create a column phylum_family that combines Phylum and Family in your df_melt
  2. Do the same in your df_taxa
  3. Order df_taxa by phylum_family
  4. Plot df_melt and fill by phylum_family
  5. scale_fill_manual by the custom color palette created with the combinations in df_taxa

and this will give

Code

library(reshape2)
library(ggplot2)
library(ggforce)
library(dplyr)

set.seed(123) # same seed as the question's example data
df <- data.frame(
  ID = 1:5,
  Group = c(1, 1, 2, 2, 3)
)
# Add columns Family3 to Family10 with random values between 0 and 100
for (i in 3:10) {
  df[[paste0("Family", i)]] <- sample(0:100, nrow(df), replace = TRUE)
}
df_taxa <- data.frame(Phylum=c("Phyla1", "Phyla2", "Phyla3", "Phyla2", "Phyla2", "Phyla2", "Phyla1", "Phyla3", "Phyla1", "Phyla1"), 
                      Family=c("Family1", "Family8", "Family9", "Family2", "Family7", "Family6", "Family10", "Family3", "Family5", "Family4"))

df_melt <- reshape2::melt(df, id.vars=c("ID", "Group")) %>%
  left_join(df_taxa[,c("Family", "Phylum")], by = c("variable" = "Family")) %>%
  mutate(phylum_family = paste(Phylum, variable, sep = "-"))


# Multi-group colour palette: one hue per group, with a light-to-dark gradient across its subgroups

ColourPalleteMulti <- function(df, group, subgroup){
  
  # Find how many colour categories to create and the number of colours in each
  categories <- aggregate(as.formula(paste(subgroup, group, sep="~" )), df, function(x) length(unique(x)))
  category.start <- (scales::hue_pal(l = 100)(nrow(categories))) # Set the top of the colour palette
  category.end  <- (scales::hue_pal(l = 40)(nrow(categories))) # Set the bottom
  
  # Build the colour palette: one gradient per category, concatenated in category order
  colours <- unlist(lapply(1:nrow(categories),
                           function(i){
                             colorRampPalette(colors = c(category.start[i], category.end[i]))(categories[i,2])}))
  colours # return the vector visibly
}


# We'll still use ColourPalleteMulti but now on our mapping dataframe
df_taxa$phylum_family <- paste(df_taxa$Phylum, df_taxa$Family, sep = "-")

# Order the rows so they line up with the palette, which is built phylum by phylum
df_taxa <- arrange(df_taxa, phylum_family)
df_cols <- setNames(ColourPalleteMulti(df_taxa, "Phylum", "Family"), df_taxa$phylum_family)

# Now plot with the combined phylum-family as the fill
ggplot(df_melt, aes(ID, value, fill = phylum_family)) +
  geom_bar(position = "fill", stat = "identity") +
  scale_fill_manual("", values = df_cols) +
  facet_grid_paginate(. ~ Group, scales = "free")
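
When `scale_fill_manual()` receives a named vector, colours are matched to fill levels by name, and a level without a matching name is drawn in the scale's `na.value` (grey by default). A quick sanity check along these lines can catch such mismatches before plotting. This is only a sketch: `fill_levels` is typed out by hand to mirror the example data, and `toy_palette` uses placeholder colours with one name deliberately left out:

```r
# Fill levels as they would appear in df_melt$phylum_family for the example data
fill_levels <- c(
  "Phyla3-Family3", "Phyla1-Family4", "Phyla1-Family5", "Phyla2-Family6",
  "Phyla2-Family7", "Phyla2-Family8", "Phyla3-Family9", "Phyla1-Family10"
)

# Stand-in palette (placeholder colours); "Phyla3-Family9" is omitted on purpose
palette_names <- setdiff(fill_levels, "Phyla3-Family9")
toy_palette <- setNames(grDevices::rainbow(length(palette_names)), palette_names)

# Levels that would fall back to na.value because no colour is named after them
missing_levels <- setdiff(fill_levels, names(toy_palette))

# Palette entries that no fill level uses (harmless, but worth knowing about)
unused_colours <- setdiff(names(toy_palette), fill_levels)

print(missing_levels)
```

With the real objects, the same two `setdiff()` calls could be run on `unique(df_melt$phylum_family)` and `names(df_cols)`.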

Let me know if any of this needs further explanation or if I misunderstood you.

Adding brackets

I found this for grouping legends in ggplot. You can also improvise some brackets using cowplot. This can be improved, as it is very manual at the moment.

p <- ggplot(df_melt, aes(ID, value, fill = phylum_family)) +
  geom_bar(position = "fill", stat = "identity") +
  scale_fill_manual("", values = df_cols) +
  facet_grid_paginate(. ~ Group, scales = "free") +
  theme(plot.margin = margin(5, 80, 5, 5, "pt")) 

library(cowplot)
p <- ggdraw(p)

add_taxonomic_bracket <- function(plot, label, color, y_min, y_max, 
                                  x_bracket = 0.91, bracket_width = 0.02, 
                                  label_offset = 0.03, size = 1) {
  x_end <- x_bracket - bracket_width
  y_mid <- (y_min + y_max) / 2
  
  plot + 
    draw_line(
      x = c(x_bracket, x_bracket), 
      y = c(y_min, y_max),
      color = color, 
      size = size
    ) +
    draw_line(
      x = c(x_bracket, x_end), 
      y = c(y_max, y_max),
      color = color, 
      size = size
    ) +
    draw_line(
      x = c(x_bracket, x_end), 
      y = c(y_min, y_min),
      color = color, 
      size = size
    ) +
    draw_label(
      label, 
      x = x_end, 
      y = y_mid, 
      color = color,
      hjust = -0.6,
      fontface = "italic"
    )
}

p <- add_taxonomic_bracket(p, "Phyla 1", "#E65100", 0.53, 0.62)
p <- add_taxonomic_bracket(p, "Phyla 2", "#388E3C", 0.42, 0.53)
p <- add_taxonomic_bracket(p, "Phyla 3", "#1565C0", 0.36, 0.42)
p
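
One way to make the bracket positions a little less manual is to split the legend's vertical span in proportion to the number of families in each phylum. The sketch below assumes that all legend keys have the same height and are listed phylum by phylum from top to bottom. The helper name `bracket_ranges` is hypothetical, the top and bottom positions (0.62 and 0.36) are simply the outermost values chosen by eye above, and the counts 3, 3 and 2 are the families per phylum present in the example data:

```r
# Hypothetical helper: one (y_min, y_max) range per phylum, proportional to its key count
bracket_ranges <- function(n_keys, legend_top, legend_bottom) {
  # n_keys: named vector of key counts, in legend order from top to bottom
  span <- legend_top - legend_bottom
  edges <- unname(legend_top - cumsum(c(0, n_keys)) / sum(n_keys) * span)
  data.frame(
    label = names(n_keys),
    y_min = edges[-1],
    y_max = edges[-length(edges)]
  )
}

n_keys <- c("Phyla 1" = 3, "Phyla 2" = 3, "Phyla 3" = 2)
ranges <- bracket_ranges(n_keys, legend_top = 0.62, legend_bottom = 0.36)
print(ranges)
```

Each row of `ranges` could then be passed to `add_taxonomic_bracket()` in place of the hard-coded numbers. The legend's top and bottom still have to be read off the plot by hand.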
