According to the sklearn docs, if you apply predict_proba to DecisionTreeClassifier:
The predicted class probability is the fraction of samples of the same class in a leaf.
Let's say that the rows where class = 1 in my training dataset look like this:
feature_1 | feature_2 | class
----------|-----------|------
A | C | 1
A | C | 1
A | D | 1
B | C | 1
B | D | 1
I'm interpreting the docs to mean that if I trained a model on this data, predict_proba would tell me that a row where feature_1 = A and feature_2 = C would have a 40% chance of falling under class 1. This is because there are five rows total where class = 1, two of which also have feature_1 = A and feature_2 = C. Two is 40% of five.
Obviously this is a very simple example, but I'm just trying to understand the general methodology predict_proba uses.
Is my interpretation correct? I would have thought that, in this case, the probability of class 1 would be at least partially affected by any rows in the training dataset where feature_1 = A, feature_2 = C, and class != 1.
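To make this concrete, here is roughly the experiment I have in mind (the label encoding and the class-0 rows are my own additions, just so there is a second class to fit on, since sklearn trees need numeric input):

```python
from sklearn.tree import DecisionTreeClassifier

# feature_1: A -> 0, B -> 1; feature_2: C -> 0, D -> 1 (my own encoding)
X = [
    [0, 0],  # A, C, class 1
    [0, 0],  # A, C, class 1
    [0, 1],  # A, D, class 1
    [1, 0],  # B, C, class 1
    [1, 1],  # B, D, class 1
    [0, 0],  # A, C, class 0 (hypothetical row, not in the table above)
    [1, 1],  # B, D, class 0 (hypothetical row, not in the table above)
]
y = [1, 1, 1, 1, 1, 0, 0]

clf = DecisionTreeClassifier(random_state=0).fit(X, y)
# Is the second entry here (the class-1 probability) 0.4, per my reading of the docs?
print(clf.predict_proba([[0, 0]]))
```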
1 Answer
First of all, scikit-learn's decision trees operate on numerical features, not categorical ones. The learning algorithm then tries to find decision thresholds that optimize the given criterion (Gini impurity by default). This happens in a greedy fashion until a stopping criterion (like max_depth) is hit, at which point we have a leaf node.

At that leaf node (after all the feature_A < threshold_1 && feature_E > threshold_2 conditions on the path have been applied), we may be left with samples coming from different classes. This is where the cited rule comes in: the probability for each class, in that specific leaf node, is set to the observed class proportion. So the denominator is the number of training samples in that leaf, not the number of class-1 samples overall, which means rows with class != 1 that end up in the same leaf do affect the predicted probability.
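As a quick illustration (the toy data and encoding are made up for this example), you can inspect the leaf a query row lands in and check that predict_proba is just the class proportion in that leaf:

```python
from sklearn.tree import DecisionTreeClassifier

# Toy data: two label-encoded features, two classes. max_depth=1 forces an
# impure leaf so the proportions are visible.
X = [[0, 0], [0, 0], [0, 1], [1, 0], [1, 1], [0, 0], [1, 1]]
y = [1, 1, 1, 1, 1, 0, 0]

clf = DecisionTreeClassifier(max_depth=1, random_state=0).fit(X, y)

leaf = clf.apply([[0, 0]])[0]        # index of the leaf the query row falls into
values = clf.tree_.value[leaf][0]    # per-class counts or fractions (depends on sklearn version)
print(values / values.sum())         # class proportions in that leaf ...
print(clf.predict_proba([[0, 0]]))   # ... which is exactly what predict_proba returns
```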