
scikit learn - How is each tree within DecisionTreeClassifier calculating probability of a class? - Stack Overflow


According to the sklearn docs, if you apply predict_proba to DecisionTreeClassifier:

The predicted class probability is the fraction of samples of the same class in a leaf.

Let's say that the rows where class = 1 in my training dataset look like this:

feature_1 | feature_2 | class
----------|-----------|------
A         | C         | 1
A         | C         | 1
A         | D         | 1
B         | C         | 1
B         | D         | 1

I'm interpreting the docs to mean that if I trained a model on this data, predict_proba would tell me that a row where feature_1 = A and feature_2 = C would have a 40% chance of falling under class 1. This is because there are five rows total where class = 1, two of which also have feature_1 = A and feature_2 = C. Two is 40% of five.

Obviously this is a very simple example, but I'm just trying to understand the general methodology predict_proba uses.

Is my interpretation correct? I would have thought that in this case, the probability of class being 1 would be at least partially affected by any rows in the training dataset where feature_1 = A, feature_2 = C, and class != 1?
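
(To make this concrete, the setup above can be reproduced with something like the following sketch. The one-hot encoding, the column names, and the two added class = 0 rows are assumptions for illustration only, since DecisionTreeClassifier needs numerical input and a training set containing a single class would be degenerate.)

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

# The class = 1 rows from the table above, plus two hypothetical
# class = 0 rows (the real training set presumably has other classes).
df = pd.DataFrame({
    "feature_1": ["A", "A", "A", "B", "B", "A", "B"],
    "feature_2": ["C", "C", "D", "C", "D", "C", "D"],
    "class":     [1, 1, 1, 1, 1, 0, 0],
})

# DecisionTreeClassifier needs numeric input, so one-hot encode.
X = pd.get_dummies(df[["feature_1", "feature_2"]]).astype(int)
y = df["class"]

clf = DecisionTreeClassifier(random_state=0).fit(X, y)

# Probability for a row with feature_1 = A and feature_2 = C.
query = pd.get_dummies(pd.DataFrame({"feature_1": ["A"], "feature_2": ["C"]}))
query = query.reindex(columns=X.columns, fill_value=0)
print(clf.predict_proba(query))  # class fractions of the leaf this row lands in
```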

1 Answer


First of all, scikit-learn's decision trees operate on numerical features, not categorical ones. The learning algorithm then searches, greedily, for decision thresholds that optimize the given criterion (Gini impurity by default), and keeps splitting until a stopping criterion (like max_depth) is hit. Each node where splitting stops is a leaf node. At that leaf, once all the conditions on the path have been applied (feature_1 < threshold_1 && feature_2 > threshold_2, etc.), we may be left with training samples from several different classes. This is where the cited rule comes in: the predicted probability for each class, in that specific leaf node, is set to the observed class proportions among those samples.

So in your example the denominator is not the five class = 1 rows; it is the total number of training samples, of any class, that end up in the same leaf as the query row. Rows with feature_1 = A, feature_2 = C, and class != 1 land in that same leaf and do lower the predicted probability of class 1.
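
To see this concretely, here is a minimal sketch (the synthetic data, max_depth, and random_state are arbitrary choices for demonstration) that recomputes predict_proba by hand from the per-leaf class totals exposed by the fitted tree:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Synthetic two-feature data; values chosen only for illustration.
rng = np.random.RandomState(0)
X = rng.rand(200, 2)
y = (X[:, 0] + X[:, 1] > 1).astype(int)

# A shallow tree keeps some leaves impure, so the fractions are visible.
clf = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)

# clf.apply gives the leaf id each sample falls into; tree_.value holds
# the per-class totals for every node (raw counts in older scikit-learn
# versions, normalized fractions in newer ones; normalizing covers both).
leaf_ids = clf.apply(X)
per_leaf = clf.tree_.value[leaf_ids, 0, :]
manual_proba = per_leaf / per_leaf.sum(axis=1, keepdims=True)

# predict_proba is exactly the class proportion within each sample's leaf.
assert np.allclose(manual_proba, clf.predict_proba(X))
print(clf.predict_proba(X[:3]).round(3))
```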
