
r - generatePartialDependenceData function returns Error when used for multiclass classification model - Stack Overflow


I have built an XGBoost multiclass classification model using mlr and I want to visualize the partial dependence for some features. However, when I try to do so using generatePartialDependenceData(), I get the following error:

Error in melt.data.table(as.data.table(out), measure.vars = target, variable.name = if (td$type == : One or more values in 'measure.vars' is invalid.

I have checked for discrepancies between the task.desc in the Task object and the factor.levels in the WrappedModel object, but everything seems fine. Additionally, I have no trouble generating the data for a regression XGBoost model with a different target variable using the same function. Is there a problem on my end, or is this a bug?

Here is an example using the palmerpenguins dataset:

# library
library(tidyverse)
library(caret)
library(mlr)

peng <- palmerpenguins::penguins

# data partition
set.seed(1234)
inTrain <- createDataPartition(
  y = peng$species,
  p = 0.7,
  list = FALSE
)

# build task
train_class <- peng[inTrain,] %>% select(-sex, -year) %>% 
  createDummyFeatures(target = "species", cols = "island") %>% 
  makeClassifTask(data = ., target = "species")

# build learners
xgb_class_learner <- makeLearner(
  "classif.xgboost",
  predict.type = "response"
)

# build model
xgb_class <- train(xgb_class_learner, train_class)

# generate partial dependence
generatePartialDependenceData(xgb_class, train_class)

Asked Mar 10 at 18:27 by ChickenTartR (edited Mar 12 at 11:39). 3 comments:
  • mlr has been deprecated for a few years now -- please use the successor package mlr3 instead. You can find information on partial dependence plots in the mlr3 book. – Lars Kotthoff Commented Mar 10 at 21:12
  • The function generatePartialDependenceData() from the mlr package does not handle probability matrices in multi-class classification well. I would suggest changing predict.type = "prob" in makeLearner and specifying a particular column (target) in partial dependence. – KacZdr Commented Mar 11 at 10:09
  • Thank you both for the quick responses. I'll try using the mlr3 package instead. – ChickenTartR Commented Mar 12 at 11:28

1 Answer


As mentioned by KacZdr, setting the predict.type argument to "prob" works fine.

# build learners
xgb_class_learner <- makeLearner(
  "classif.xgboost",
  predict.type = "prob"
)
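
The error text itself comes from data.table's melt(), which stops when measure.vars names columns that the table does not contain. The sketch below only illustrates that mechanism on made-up tables. It is an assumption about why "prob" helps, not a trace of mlr's internals, and the names out_prob, out_response and class_cols are hypothetical.

```r
library(data.table)

# Class levels used as measure.vars (hypothetical stand-in for `target`)
class_cols <- c("Adelie", "Chinstrap", "Gentoo")

# Made-up output with one probability column per class,
# roughly what a "prob" prediction provides
out_prob <- data.table(
  bill_length_mm = c(35, 45, 55),
  Adelie = c(0.8, 0.3, 0.1),
  Chinstrap = c(0.1, 0.5, 0.2),
  Gentoo = c(0.1, 0.2, 0.7)
)

# Made-up output with a single label column,
# roughly what a "response" prediction provides
out_response <- data.table(
  bill_length_mm = c(35, 45, 55),
  response = c("Adelie", "Chinstrap", "Gentoo")
)

# Works: every name in measure.vars is a column of the table
long_prob <- melt(out_prob, measure.vars = class_cols,
                  variable.name = "Class", value.name = "Probability")

# Fails: none of the class names is a column, so melt() raises an error
msg <- tryCatch(
  {
    melt(out_response, measure.vars = class_cols)
    NA_character_
  },
  error = function(e) conditionMessage(e)
)
print(msg)
```

With per-class probability columns the melt succeeds and yields one row per feature value and class, which is the long format a multiclass partial dependence plot needs.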

However, since Lars Kotthoff mentioned that the mlr package is deprecated, here is an alternative using mlr3 together with iml. There seems to be an issue with ggplot in the $plot() method for FeatureEffects objects; when I call effect$plot() I get:

Error in `geom_rug()`:
! problem while computing position.
i Error occurred in the 2nd layer.
Caused by error in `if (params$width > 0) ...`:
! Missing value, where TRUE/FALSE is required

So I just generate the data and plot it myself.

# library
library(tidyverse)
library(mlr3)
library(mlr3learners)
library(mlr3pipelines)
library(iml)

peng <- palmerpenguins::penguins

# build task
tsk_peng <- peng %>% select(-sex, -year) %>% 
  as_task_classif(target = "species")

# data partition
splits <- partition(tsk_peng)

# build learner
lrn_classif <- as_learner(po("encode", method = "one-hot") %>>% lrn("classif.xgboost"))

# train model
lrn_classif$train(tsk_peng, row_ids = splits$train)

# partial dependence
predictor <- Predictor$new(
  lrn_classif, 
  data = tsk_peng$data(rows = splits$test, cols = tsk_peng$feature_names),
  y = tsk_peng$data(rows = splits$test, cols = tsk_peng$target_names)
  )

effect <- FeatureEffects$new(predictor, method = "pdp")

# plot
## continuous
effect$results %>% 
  keep(names(.) %in% effect$features[1:4]) %>% 
  bind_rows() %>% 
  ggplot(aes(x = .borders, y = .value, col = .class))+
  geom_line()+
  facet_grid(~.feature, scales = "free")

## factor
effect$results$island %>% 
  ggplot(aes(x = .borders, y = .value, fill = .class))+
  geom_bar(stat = "identity", position = "dodge")
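
For readers who want to see what the partial dependence values represent, here is a minimal from-scratch sketch. It is not the iml implementation: it uses MASS::lda on the built-in iris data as a stand-in for the XGBoost model and the penguins data, and the helper name partial_dependence is hypothetical. For each grid value, the feature is set to that value in every row, class probabilities are predicted, and the probabilities are averaged per class.

```r
library(MASS)

# Stand-in multiclass model (assumption: any model returning class probabilities works)
fit <- lda(Species ~ ., data = iris)

# Hypothetical helper: average predicted class probabilities over the data
# while one feature is held fixed at each grid value
partial_dependence <- function(model, data, feature, grid) {
  rows <- lapply(grid, function(v) {
    d <- data
    d[[feature]] <- v                               # overwrite the feature in every row
    probs <- predict(model, newdata = d)$posterior  # n x classes probability matrix
    data.frame(
      feature = feature,
      value = v,
      class = colnames(probs),
      prob = colMeans(probs),                       # average over all observations
      row.names = NULL
    )
  })
  do.call(rbind, rows)
}

# Evenly spaced grid over the observed range of one feature
grid <- seq(min(iris$Petal.Length), max(iris$Petal.Length), length.out = 10)
pd <- partial_dependence(fit, iris, "Petal.Length", grid)
head(pd)
```

The result has one row per grid value and class, so it can be plotted the same way as effect$results above, with value on the x axis, prob on the y axis and class as the colour.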
