Describing Information Gain
- Information gain is the amount of information gained about a random variable from observing another random variable
- Information gain is a metric used to evaluate the splitting criteria for a decision tree
- In other words, we evaluate the information gain of an attribute (or split) to tell us how important the attribute is
- The following are some example attributes we may evaluate using information gain:
- Should we split on balance > 50k
- Should we split on applicant = employed
- Should we split on weather = sunny
- Information gain uses the entropy metric in its calculation
- Our goal is to find a split that maximizes the information gain, which happens when we minimize the weighted average entropy of the groups created by the split
- Specifically, information gain can be defined as the following equation:

$$ IG = E_{parent} - \bar{E}_{children} $$

- Where $IG$ is the information gain
- Where $E_{parent}$ is the total entropy of the parent node
- Where $\bar{E}_{children}$ is the weighted average entropy of the proposed child nodes
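As a rough sketch of what this formula looks like in code (the function name and argument layout here are purely illustrative, not from any particular library), the parent's entropy is compared against the size-weighted average entropy of the proposed children:

```python
def information_gain(parent_entropy, child_entropies, child_sizes):
    """IG = parent entropy minus the size-weighted average entropy of the children."""
    total = sum(child_sizes)
    weighted_child_entropy = sum(
        (size / total) * e for size, e in zip(child_sizes, child_entropies)
    )
    return parent_entropy - weighted_child_entropy

# e.g. a maximally impure parent split into two perfectly pure children gains a full bit:
print(information_gain(1.0, [0.0, 0.0], [10, 10]))  # 1.0
```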
Describing Entropy
- Entropy measures the level of impurity in a group created by a proposed split
- We should only think about splitting if our group is very impure (i.e. entropy close to 1)
- Our goal is to find a split that minimizes the entropy for each group created by the split
- Minimizing the entropy is the same as minimizing the impurity
- In other words, the best split will be the one that makes sure each group contains data with the same value (i.e. the least impure)
- Entropy is defined by the following equation:

$$ E = -\sum_{i} p_{i} \log_{2}(p_{i}) $$

- Where $p_{i}$ is the probability of class (or group) $i$
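A minimal Python sketch of this formula (the function name is just for this example), working from the per-class counts of a group:

```python
import math

def entropy(counts):
    """Entropy of a group, given the count of each class in it.

    E = -sum(p_i * log2(p_i)), where p_i is the proportion of class i.
    Classes with a count of 0 are skipped, since 0 * log2(0) is taken to be 0.
    """
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

print(entropy([15, 15]))  # 1.0  -> maximally impure (a 50/50 mix of two classes)
print(entropy([30, 0]))   # -0.0 -> perfectly pure (only one class present)
```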
Describing Gini Impurity
- Gini impurity and entropy are different types of measures of impurity
- Specifically, in CART analysis the two measures behave almost the same and usually lead to very similar splits
- Given a choice, some prefer the gini impurity
- This is because it doesn't require logarithmic functions to be computed
- The gini impurity can be defined as the following:

$$ G = 1 - \sum_{i} p_{i}^{2} $$

- Where $p_{i}$ is the probability of class (or group) $i$
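For comparison, a minimal sketch of the gini impurity (again, the function name is purely illustrative); note that it only needs squaring, not logarithms:

```python
def gini_impurity(counts):
    """Gini impurity of a group: G = 1 - sum(p_i^2), from per-class counts."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)

print(gini_impurity([15, 15]))  # 0.5 -> maximally impure for two classes
print(gini_impurity([30, 0]))   # 0.0 -> perfectly pure
```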
Example of Calculating Entropy
- Let's say we are trying to evaluate the purity of a group, where we have 16 males and 14 females in our sample
- Then, we could define our entropy as the following:

$$ E = -\frac{16}{30}\log_{2}\left(\frac{16}{30}\right) - \frac{14}{30}\log_{2}\left(\frac{14}{30}\right) \approx 0.997 $$

- In this case, we should think about splitting, since the group is extremely impure
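A quick way to check this number in Python:

```python
import math

# Entropy of the 16-male / 14-female group: close to the maximum value of 1
p_male, p_female = 16 / 30, 14 / 30
print(-p_male * math.log2(p_male) - p_female * math.log2(p_female))  # ~0.997
```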
Another Example of Calculating Entropy
- Let's say we are trying to evaluate the purity of a group, where we have only 16 males in our sample
- Then, we could define our entropy as the following:

$$ E = -\frac{16}{16}\log_{2}\left(\frac{16}{16}\right) = 0 $$
- In this case, we shouldn't think about splitting, since the group is extremely pure
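The same check for the all-male group, where the single class has a probability of 1:

```python
import math

# Only one class is present, so its probability is 1 and -1 * log2(1) = 0
p_male = 16 / 16
print(-p_male * math.log2(p_male))  # -0.0, i.e. an entropy of 0
```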
Example of Information Gain Calculation
- Let's say we are trying to evaluate a split of a group, where we initially have 16 males and 14 females in our sample
- For illustration, suppose the proposed split sends 17 of the 30 people to one group (4 males and 13 females) and the remaining 13 people to the other group (12 males and 1 female)
Calculate the parent entropy

$$ E_{parent} = -\frac{16}{30}\log_{2}\left(\frac{16}{30}\right) - \frac{14}{30}\log_{2}\left(\frac{14}{30}\right) \approx 0.997 $$

- In this case, the impurity is large, so we should split
Calculate one child's entropy
- Here, there are 17 data points in this group after the split
- Also, 13 of those data points are female, and 4 of those data points are male
- So, this child's entropy is the following:

$$ E_{child_{1}} = -\frac{13}{17}\log_{2}\left(\frac{13}{17}\right) - \frac{4}{17}\log_{2}\left(\frac{4}{17}\right) \approx 0.787 $$

- In this case, the impurity is fairly high for this group after the split
Calculate the other child's entropy
- Here, there are 13 data points in this group after the split
- Also, 1 of those data points is female, and 12 of those data points are male
- So, this child's entropy is the following:

$$ E_{child_{2}} = -\frac{1}{13}\log_{2}\left(\frac{1}{13}\right) - \frac{12}{13}\log_{2}\left(\frac{12}{13}\right) \approx 0.391 $$

- In this case, the impurity is fairly small for this group after the split
Calculate the weighted average entropy of the children

$$ \bar{E}_{children} = \frac{17}{30}(0.787) + \frac{13}{30}(0.391) \approx 0.615 $$
Calculate the information gain

$$ IG = E_{parent} - \bar{E}_{children} \approx 0.997 - 0.615 \approx 0.38 $$

- Therefore, this split gives us about 0.38 bits of additional information
- We should evaluate other splits, and choose this one if there aren't any other splits with an information gain greater than 0.38
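To double-check the arithmetic, here is a short Python sketch that reproduces the worked example end to end (the entropy helper and the 17/13 split used here are the same illustrative assumptions as above):

```python
import math

def entropy(counts):
    """Entropy of a group from its per-class counts: -sum(p_i * log2(p_i))."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

parent = [16, 14]                 # 16 males, 14 females
children = [[4, 13], [12, 1]]     # assumed split: [males, females] in each child

parent_entropy = entropy(parent)  # ~0.997
total = sum(sum(child) for child in children)
weighted_child_entropy = sum(
    (sum(child) / total) * entropy(child) for child in children
)                                 # ~0.616 (slightly different from 0.615 above due to rounding)

info_gain = parent_entropy - weighted_child_entropy
print(f"information gain = {info_gain:.2f}")  # information gain = 0.38
```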