最新消息:雨落星辰是一个专注网站SEO优化、网站SEO诊断、搜索引擎研究、网络营销推广、网站策划运营及站长类的自媒体原创博客

r - How to label points selectively in a biplot with factoextra::fviz_pac_biplot or ggbiplot - Stack Overflow

programmeradmin1浏览0评论

I want to demonstrate multivariate outliers with a PCA biplot, by labeling points with large Malahanobis D^2, but only label those points. The idea is that outliers usually appear on the smallest PCA dimensions.

I'm using a cleaned up version of the palmerpenguins data. I want to label the 3 points identified as noteworthy.

library(heplots)
library(dplyr)
library(ggplot2)
library(ggbiplot)
library(factoextra)

data(peng, package="heplots")

# find potential multivariate outliers
DSQ <- heplots::Mahalanobis(peng[, 3:6])
noteworthy <- order(DSQ, decreasing = TRUE)[1:3] |> print()

[1] 283  10  35

ggbiplot doesn't allow points and labels. Here is an attempt using factoextra::fviz_pca_biplot. I create a vector lab with just those 3 case numbers non-blank. But when I tried using that in the call, there are no labels whatsoever in the plot.

peng.pca <- prcomp (~ bill_length + bill_depth + flipper_length + body_mass,
                    data=peng, scale. = TRUE)

# create vector of labels, blank except for the noteworthy
lab <- 1:nrow(peng)
lab <- ifelse(lab %in% noteworthy, lab, "")


fviz_pca_biplot(
  peng.pca,
  axes = 3:4,
  habillage = peng$species,
  addEllipses = TRUE, ellipse.level = 0.68,
  palette = peng.colors("dark"),
  arrowsize = 1.5, col.var = "black", labelsize=4,
#  label = lab
) +
  theme(legend.position = "top")

I want to demonstrate multivariate outliers with a PCA biplot, by labeling points with large Malahanobis D^2, but only label those points. The idea is that outliers usually appear on the smallest PCA dimensions.

I'm using a cleaned up version of the palmerpenguins data. I want to label the 3 points identified as noteworthy.

library(heplots)
library(dplyr)
library(ggplot2)
library(ggbiplot)
library(factoextra)

data(peng, package="heplots")

# find potential multivariate outliers
DSQ <- heplots::Mahalanobis(peng[, 3:6])
noteworthy <- order(DSQ, decreasing = TRUE)[1:3] |> print()

[1] 283  10  35

ggbiplot doesn't allow points and labels. Here is an attempt using factoextra::fviz_pca_biplot. I create a vector lab with just those 3 case numbers non-blank. But when I tried using that in the call, there are no labels whatsoever in the plot.

peng.pca <- prcomp (~ bill_length + bill_depth + flipper_length + body_mass,
                    data=peng, scale. = TRUE)

# create vector of labels, blank except for the noteworthy
lab <- 1:nrow(peng)
lab <- ifelse(lab %in% noteworthy, lab, "")


fviz_pca_biplot(
  peng.pca,
  axes = 3:4,
  habillage = peng$species,
  addEllipses = TRUE, ellipse.level = 0.68,
  palette = peng.colors("dark"),
  arrowsize = 1.5, col.var = "black", labelsize=4,
#  label = lab
) +
  theme(legend.position = "top")

Share Improve this question asked Feb 7 at 16:44 user101089user101089 3,9921 gold badge31 silver badges60 bronze badges
Add a comment  | 

2 Answers 2

Reset to default 1

You can use a geom_text layer where you add your label vector as a column to the global data using e.g. dplyr::bind_cols:

library(heplots)
library(dplyr, warn = FALSE)
library(ggplot2)
library(factoextra)

fviz_pca_biplot(
  peng.pca,
  axes = 3:4,
  habillage = peng$species,
  addEllipses = TRUE,
  ellipse.level = 0.68,
  label = "none",
  # palette = peng.colors("dark"),
  arrowsize = 1.5,
  col.var = "black",
) +
  geom_text(
    data = ~ dplyr::bind_cols(., label),
    aes(label = label),
    vjust = 0,
    nudge_y = .05
  ) +
  theme(legend.position = "top")

I'm the maintainer of the ggbiplot pkg, so I dug in and modified the code for ggbiplot() to do what I want here. I added a geom.ind argument that allows geom.ind = c("point", "text"). For testing purposes, this is still on a new geoms branch in the package.

#remotes::install_github("friendly/ggbiplot", ref = "geoms")
# adjust variable names to fold at '_'
vn <- rownames(peng.pca$rotation)
vn <- gsub("_", "\n", vn)
rownames(peng.pca$rotation) <- vn

ggbiplot(peng.pca, 
         choices = 3:4,
         groups = peng$species, 
         ellipse = TRUE, ellipse.alpha = 0.1,
         circle = TRUE,
         var.factor = 4.5,
         geom.ind = c("point", "text"),
         point.size = 2,
         labels = lab, labels.size = 6,
         varname.size = 5,
         clip = "off") +
  theme_minimal(base_size = 14) +
#  theme_penguins("dark") +
  theme(legend.direction = 'horizontal', legend.position = 'top') 

This gives:

发布评论

评论列表(0)

  1. 暂无评论