I have a multi-class problem where I want to consider the most important features the model (in this case a LightGBM model) has learned when predicting individual classes, but also when considering a subset of classes. Does this amount to simply summing the SHAP values of the classes of interest?
Finally, what if we want to use our multi-class model to perform a binary prediction? How best could we formulate the SHAP values to reflect this?
As an example, let's consider a multi-class model with three labels predicting a future outcome: hospital_emergency, hospital_appointment, healthy. Assuming these outcomes are mutually exclusive and exhaustive, I can train a multi-class model. The probabilities (perhaps not well calibrated) will sum to one, i.e.,
1 = P(healthy) + P(hospital_emergency) + P(hospital_appointment)
In this case, I should be able to find the probability of a subset of these outcomes by summing them. For example, I can find:
P(hospital) = P(hospital_emergency) + P(hospital_appointment)
Now, using the SHAP package, we can easily take our LightGBM model and produce SHAP values on our hold-out test set. Using these, I can create SHAP beeswarm plots for each class, and also compute the mean absolute SHAP value to rank the features by importance for each class.
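For context, this is roughly what I am doing (a minimal sketch; `lgb_model` and `X_test` are placeholder names for my fitted multi-class LightGBM model and hold-out features, and I'm assuming a recent SHAP version where calling the explainer returns an `Explanation` of shape `(n_samples, n_features, n_classes)`; older versions return a list of per-class arrays from `explainer.shap_values(...)` instead):

```python
import numpy as np
import shap

# Hypothetical objects: `lgb_model` is the fitted multi-class LightGBM model,
# `X_test` is the hold-out feature DataFrame.
explainer = shap.TreeExplainer(lgb_model)
explanation = explainer(X_test)  # Explanation, shape (n_samples, n_features, n_classes)

class_names = ["hospital_emergency", "hospital_appointment", "healthy"]  # assumed class order

for k, name in enumerate(class_names):
    # Beeswarm of per-feature contributions to this class's raw (log-odds) score
    shap.plots.beeswarm(explanation[:, :, k])

    # Mean |SHAP| per feature as a simple importance ranking for this class
    mean_abs = np.abs(explanation.values[:, :, k]).mean(axis=0)
    top = sorted(zip(X_test.columns, mean_abs), key=lambda t: -t[1])[:10]
    print(name, top)
```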
My question is: how do I produce the SHAP values associated with a subset of the outcome classes?
For example, what happens if I want the top features which lead to any hospital outcome?
Seeing as we are just summing the probabilities, my assumption is that we can simply sum the underlying SHAP values and base_values associated with our subset of classes under consideration?
Written crudely, for my hospital example, this would conceptually equate to:
SHAP(hospital) = SHAP(hospital_emergency) + SHAP(hospital_appointment)
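To make this concrete, here is a rough sketch of what I mean by summing, reusing the hypothetical `lgb_model` / `X_test` names from above and assuming the two hospital classes sit at indices 0 and 1 in the model's class ordering (note that with the default TreeExplainer these are contributions to each class's raw log-odds score, not directly to the probabilities):

```python
import shap

# Same hypothetical objects as in the earlier sketch.
explanation = shap.TreeExplainer(lgb_model)(X_test)

# Assumed class ordering in the model's output (check the label encoding used).
emergency_idx, appointment_idx = 0, 1

hospital = shap.Explanation(
    # element-wise sum of the two classes' SHAP values ...
    values=(explanation.values[:, :, emergency_idx]
            + explanation.values[:, :, appointment_idx]),
    # ... and of their base values
    base_values=(explanation.base_values[:, emergency_idx]
                 + explanation.base_values[:, appointment_idx]),
    data=explanation.data,
    feature_names=explanation.feature_names,
)

# The usual plots would then run on the combined attribution.
shap.plots.beeswarm(hospital)
```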
My final question is whether we can somehow 'binarize' the SHAP values. Say I wanted to use this multi-class model to now predict going to hospital:
1 = P(healthy) + P(hospital) = P(not_hospital) + P(hospital)
Does anyone have a suggestion about how we could formulate the SHAP values to reflect this?
Obviously we already have the SHAP values associated with the multi-class model's 'healthy' label prediction (crudely denoted as SHAP(healthy)), but this doesn't include the 'information' from the other two classes, which are predicting the opposite outcome.
I am imagining something like:
SHAP(not_hospital) = SHAP(healthy) - SHAP(hospital_emergency) - SHAP(hospital_appointment)
But I am unsure whether this is appropriate, especially as we end up with SHAP values which are larger than in the binary case.
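In code, the formula above would look something like the following sketch (again just expressing my guess, with the same hypothetical names and assumed class ordering as before):

```python
import shap

# Same hypothetical objects as in the earlier sketches.
explanation = shap.TreeExplainer(lgb_model)(X_test)
emergency_idx, appointment_idx, healthy_idx = 0, 1, 2  # assumed class order

# The 'binarized' attribution described above: healthy minus the two hospital
# classes, applied to both the SHAP values and the base values.
not_hospital = shap.Explanation(
    values=(explanation.values[:, :, healthy_idx]
            - explanation.values[:, :, emergency_idx]
            - explanation.values[:, :, appointment_idx]),
    base_values=(explanation.base_values[:, healthy_idx]
                 - explanation.base_values[:, emergency_idx]
                 - explanation.base_values[:, appointment_idx]),
    data=explanation.data,
    feature_names=explanation.feature_names,
)

shap.plots.beeswarm(not_hospital)
```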
Any help or insights would be greatly appreciated!