r - generatePartialDependenceData function returns Error when used for multiclass classification model

I have built an XGBoost multiclass classification model using mlr and I want to visualize the partial dependence for some features. However, when I try to do so using generatePartialDependenceData(), I get the following error:

Error in melt.data.table(as.data.table(out), measure.vars = target, variable.name = if (td$type == : One or more values in 'measure.vars' is invalid.

I have checked for discrepancies between the task.desc in the Task object and the factor.levels in the WrappedModel object, but everything seems fine. Additionally, I have no trouble generating the data for a regression XGBoost model with a different target variable using the same function. Is there a problem on my end, or is this a bug?

Here is an example using the palmerpenguins dataset:

# library
library(tidyverse)
library(caret)
library(mlr)

peng <- palmerpenguins::penguins

# data partition
set.seed(1234)
inTrain <- createDataPartition(
  y = peng$species,
  p = 0.7,
  list = FALSE
)

# build task
train_class <- peng[inTrain,] %>% select(-sex, -year) %>% 
  createDummyFeatures(target = "species", cols = "island") %>% 
  makeClassifTask(data = ., target = "species")

# build learners
xgb_class_learner <- makeLearner(
  "classif.xgboost",
  predict.type = "response"
)

# build model
xgb_class <- train(xgb_class_learner, train_class)

# generate partial dependence
generatePartialDependenceData(xgb_class, train_class)

Error in melt.data.table(as.data.table(out), measure.vars = target, variable.name = if (td$type == : One or more values in 'measure.vars' is invalid.

asked Mar 10 at 18:27, edited Mar 12 at 11:39, by ChickenTartR

mlr has been deprecated for a few years now -- please use the successor package mlr3 instead. You can find information on partial dependence plots in the mlr3 book. – Lars Kotthoff Commented Mar 10 at 21:12
The function generatePartialDependenceData() from the mlr package does not handle probability matrices in multi-class classification well. I would suggest changing predict.type = "prob" in makeLearner and specifying a particular column (target) in partial dependence. – KacZdr Commented Mar 11 at 10:09
Thank you both for the quick responses. I'll try using the mlr3 package instead. – ChickenTartR Commented Mar 12 at 11:28

1 Answer

As mentioned by KacZdr, setting the predict.type argument to "prob" works fine.

# build learners
xgb_class_learner <- makeLearner(
  "classif.xgboost",
  predict.type = "prob"
)
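
A plausible reason the "prob" setting matters: partial dependence averages the model's predictions over the data for each grid value of a feature, and class labels cannot be averaged while per-class probabilities can. The following is a minimal base-R sketch of that computation, not mlr's actual implementation. The nearest-centroid "model" and the helper names predict_prob, predict_label, and partial_dependence are hypothetical, and iris stands in for the penguins data so that nothing beyond base R is needed.

```r
# Hypothetical stand-in model: nearest class centroid with softmax-style scores.
features <- c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width")

# one row per class, one column per feature
centroids <- t(sapply(split(iris[features], iris$Species), colMeans))

# returns an n x 3 matrix of class probabilities (analogous to predict.type = "prob")
predict_prob <- function(newdata) {
  x <- as.matrix(newdata[features])
  # squared Euclidean distance from every row to every class centroid
  d2 <- sapply(rownames(centroids), function(k) {
    rowSums(sweep(x, 2, centroids[k, ])^2)
  })
  score <- exp(-d2)
  score / rowSums(score)
}

# returns a factor of predicted labels (analogous to predict.type = "response")
predict_label <- function(newdata) {
  p <- predict_prob(newdata)
  factor(colnames(p)[max.col(p, ties.method = "first")], levels = colnames(p))
}

# partial dependence: fix the feature at each grid value for all rows,
# predict, and average the predictions per class
partial_dependence <- function(data, feature, grid) {
  pd <- t(sapply(grid, function(v) {
    data[[feature]] <- v
    colMeans(predict_prob(data))
  }))
  data.frame(value = grid, pd, check.names = FALSE)
}

grid <- seq(min(iris$Petal.Length), max(iris$Petal.Length), length.out = 10)
pd <- partial_dependence(iris, "Petal.Length", grid)
print(pd)

# averaging labels instead of probabilities is not defined: mean() of a factor is NA
labels_mean <- suppressWarnings(mean(predict_label(iris)))
```

Each row of pd holds one grid value and one averaged probability per class, so there is one column per class level to reshape and plot.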

However, since Lars Kotthoff mentioned that the mlr package is deprecated, here is alternative code using mlr3. There seems to be an issue with ggplot in the $plot() function for FeatureEffects objects: when I call effect$plot(), I get:

Error in `geom_rug()`:
! problem while computing position.
i Error occured in the 2nd layer.
Caused by error in `if (params$width > 0) ...`:
! Missing value, where TRUE/FALSE is required

So I just generate the data and plot it myself.

# library
library(tidyverse)
library(mlr3)
library(mlr3learners)
library(mlr3pipelines)
library(iml)

peng <- palmerpenguins::penguins

# build task
tsk_peng <- peng %>% select(-sex, -year) %>% 
  as_task_classif(target = "species")

# data partition
splits <- partition(tsk_peng)

# build learner
lrn_classif <- as_learner(po("encode", method = "one-hot") %>>% lrn("classif.xgboost"))

# train model
lrn_classif$train(tsk_peng, row_ids = splits$train)

# partial dependence
predictor <- Predictor$new(
  lrn_classif, 
  data = tsk_peng$data(rows = splits$test, cols = tsk_peng$feature_names),
  y = tsk_peng$data(rows = splits$test, cols = tsk_peng$target_names)
  )

effect <- FeatureEffects$new(predictor, method = "pdp")

# plot
## continuous
effect$results %>% 
  keep(names(.) %in% effect$features[1:4]) %>% 
  bind_rows() %>% 
  ggplot(aes(x = .borders, y = .value, col = .class))+
  geom_line()+
  facet_grid(~.feature, scales = "free")

## factor
effect$results$island %>% 
  ggplot(aes(x = .borders, y = .value, fill = .class))+
  geom_bar(stat = "identity", position = "dodge")
