Decision Tree and Random Forest IN SHORT

Decision Tree

Sandipan Paul
2 min read · Oct 6, 2022

A decision tree is a non-parametric supervised learning method used for classification and regression. In classification, for example, it predicts whether a customer will invest in a fixed deposit or not (yes or no). In regression, it predicts a continuous value, for example the probability that a customer will invest in a fixed deposit.

The decision tree creates simple rules based on homogeneity. Homogeneity means similarity: if a variable contains only one level, it is 100% homogeneous. The decision tree splits the data so that each resulting group is as homogeneous as possible.

Three common measures of homogeneity are Entropy, Information Gain & the Gini Index.

Entropy

Entropy is a measure of randomness: the higher the entropy, the harder it is to draw a conclusion. For example, flipping a fair coin has the highest possible entropy.

Entropy = sum[-p * log2(p)], where p is the probability of each event

For the coin flip, p(head) = 0.5 and p(tail) = 0.5

Entropy = -p(head) * log2(p(head)) - p(tail) * log2(p(tail))

= -0.5 * log2(0.5) - 0.5 * log2(0.5) = -0.5 * (-1) - 0.5 * (-1) = 0.5 + 0.5 = 1

So entropy(flipping a fair coin) = 1, the maximum for a binary outcome; for two classes, entropy ranges from 0 to 1. When splitting, we prefer the split whose resulting groups have the lowest entropy.

The entropy of a homogeneous variable (a variable that contains only one value) is 0.
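To make the numbers concrete, here is a minimal Python sketch (my own helper, assuming base-2 logarithms as in the coin example above) that computes entropy from a list of probabilities:

```python
import numpy as np

def entropy(probabilities):
    """Entropy = -sum(p * log2(p)), ignoring zero probabilities."""
    p = np.array(probabilities, dtype=float)
    p = p[p > 0]  # log(0) is undefined; zero-probability terms contribute 0
    return -np.sum(p * np.log2(p))

print(entropy([0.5, 0.5]))  # fair coin -> 1.0 (maximum for two outcomes)
print(entropy([1.0]))       # homogeneous variable -> 0.0 (may print as -0.0)
```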

Information gain

Information gain: when we split on a column, the data is divided into several subsets, each with its own entropy. We then compute the weighted sum of these entropies from top to bottom and compare it with the entropy before the split.

Information Gain = Overall Entropy before the split - Weighted Entropy of the subsets after the split

Information gain equals the overall entropy before the split minus the weighted entropy of the subsets after the split. So we select the variable that maximises the information gain, which in turn minimises the entropy. In other words, a lower entropy after the split means a higher information gain.
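Here is a small Python sketch of that calculation (the labels and the two-way split are a made-up example, not real data): parent entropy minus the size-weighted entropy of the child subsets.

```python
import numpy as np
from collections import Counter

def entropy_of_labels(labels):
    """Entropy of a list of class labels."""
    counts = np.array(list(Counter(labels).values()), dtype=float)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def information_gain(parent_labels, child_label_groups):
    """Parent entropy minus the size-weighted entropy of the children."""
    n = len(parent_labels)
    weighted_child_entropy = sum(
        (len(group) / n) * entropy_of_labels(group)
        for group in child_label_groups
    )
    return entropy_of_labels(parent_labels) - weighted_child_entropy

# Hypothetical split of eight "invest in fixed deposit?" labels into two branches
parent = ["yes", "yes", "yes", "yes", "no", "no", "no", "no"]
left, right = ["yes", "yes", "yes", "no"], ["yes", "no", "no", "no"]
print(information_gain(parent, [left, right]))  # ~0.19
```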

Gini Index

The Gini index is based on the Gini impurity.

Gini = 1 - sum(p²), where p is the probability of each class

For example, flipping a coin:

Gini = 1 - [P(head)² + P(tail)²] = 1 - [0.5² + 0.5²] = 1 - 0.5 = 0.5

So for a binary outcome the Gini index ranges from 0 to 0.5, and the split with the lowest Gini index is selected.

The Gini index of a homogeneous variable (a variable that contains only one value) is 0.
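A small Python sketch of the Gini impurity, mirroring the coin example above:

```python
import numpy as np

def gini(probabilities):
    """Gini impurity = 1 - sum(p^2)."""
    p = np.array(probabilities, dtype=float)
    return 1.0 - np.sum(p ** 2)

print(gini([0.5, 0.5]))  # fair coin -> 0.5 (maximum for two classes)
print(gini([1.0]))       # homogeneous variable -> 0.0
```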

The Gini index is often preferred for large datasets because of time complexity: it takes less time to compute than entropy, since entropy involves a logarithm.

Advantages of decision tree

It is easy to visualise, requires little data preparation (no feature scaling), handles multi-class outputs & easily captures non-linear relationships.
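As an illustration of how little preparation is needed, here is a short sketch using scikit-learn on the Iris dataset (assuming scikit-learn and matplotlib are installed): the tree is fitted on raw, unscaled features and then visualised.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, plot_tree
import matplotlib.pyplot as plt

# Raw, unscaled features are fine for a decision tree
X, y = load_iris(return_X_y=True)

clf = DecisionTreeClassifier(criterion="gini", max_depth=3, random_state=0)
clf.fit(X, y)

plot_tree(clf, filled=True)  # each node shows its split rule and impurity
plt.show()
```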

Disadvantages of decision tree

A decision tree overfits easily and becomes biased when one output class dominates; the usual remedies are pruning the tree or using a random forest. A decision tree also does not guarantee a global optimum: its greedy, split-by-split training on the data at hand typically returns a local optimum.
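As a sketch of those two remedies, assuming scikit-learn is available, the snippet below limits tree depth and applies cost-complexity pruning, and, as an alternative, trains a random forest:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Pruned tree: limit depth and apply cost-complexity pruning
pruned = DecisionTreeClassifier(max_depth=3, ccp_alpha=0.01, random_state=0)
pruned.fit(X_train, y_train)

# Random forest: many trees on bootstrap samples, averaged to reduce overfitting
forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X_train, y_train)

print("pruned tree accuracy:", pruned.score(X_test, y_test))
print("random forest accuracy:", forest.score(X_test, y_test))
```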
