Motivating Ensemble Methods
- An ensemble method is a technique that combines several base models to produce a single, stronger predictive model
- Ensemble methods are used to improve predictions by decreasing variance or bias
- Many ensemble methods produce their base learners by generating new training datasets during the training stage
- These new training datasets are produced by random sampling with replacement from the original dataset (a bootstrap sample; see the sketch below)
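As a tiny illustration of this resampling idea, here is one bootstrap sample drawn with NumPy (the library choice is an assumption; these notes don't prescribe one). Sampling with replacement means some observations appear several times and others not at all.

```python
import numpy as np

data = np.arange(10)          # stand-in for a small training set
rng = np.random.default_rng(42)

# A bootstrap sample: same size as the original, drawn with replacement
bootstrap = rng.choice(data, size=len(data), replace=True)
print(bootstrap)              # some observations repeat, others are left out
```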
Describing Bagging
- In the case of bagging, any observation has the same probability of appearing in the new dataset
- The training stage is parallel for bagging
- Meaning, each model is built independently
- In bagging, the result is obtained by averaging the responses of the learners (or majority vote)
Bagging Algorithm
- Draw a random sample with replacement from the training set
- Train a model on that random sample
- Keep repeating the above steps until we're satisfied with the number of models we have
- To classify a new observation, collect a prediction from every model and choose the class with the highest number of votes (i.e. a majority vote); see the sketch below
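A minimal from-scratch sketch of the steps above, assuming scikit-learn and NumPy are available (scikit-learn's `BaggingClassifier` packages the same idea):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, random_state=0)
rng = np.random.default_rng(0)

n_models = 25
models = []
for _ in range(n_models):
    # Step 1: draw a bootstrap sample (random sample with replacement)
    idx = rng.integers(0, len(X), size=len(X))
    # Step 2: train a model on that sample
    models.append(DecisionTreeClassifier(random_state=0).fit(X[idx], y[idx]))

# Step 3: majority vote across all models
votes = np.stack([m.predict(X) for m in models])   # shape: (n_models, n_samples)
majority = np.apply_along_axis(lambda v: np.bincount(v).argmax(), 0, votes)
print("training accuracy:", np.mean(majority == y))
```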
Advantages of Bagging
- Attempts to reduce variance
- Reduces overfitting
- Handles higher dimensionality of data well
- Maintains accuracy reasonably well even when some data is missing
Disadvantages of Bagging
- Bagging rarely reduces bias
- Since the final prediction is the mean (or majority vote) of the predictions from the individual models, it may not give precise values for classification and regression
Describing Boosting
- In the case of boosting, observations have a distinct probability of appearing in the new dataset
- Boosting builds the new learners sequentially
- Boosting is an iterative ensemble technique that adjusts the weight of an observation based on its previous classification's success
- Specifically, boosting will increase the weight of an observation if the observation was incorrectly classified
- In boosting, the result is obtained by taking a weighted average of the learners
- Specifically, the algorithm allocates weights to each resulting model
- A learner with a good classification result on the training data is assigned a higher weight than learners with poor classification results (see the weighted-vote formula below)
- So, boosting needs to keep track of learners' errors, too
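One concrete form of this weighted combination is AdaBoost's final classifier (an example; these notes don't name a specific boosting algorithm):

$$
H(x) = \operatorname{sign}\!\Big(\sum_{t=1}^{T} \alpha_t\, h_t(x)\Big),
\qquad
\alpha_t = \tfrac{1}{2}\ln\frac{1-\epsilon_t}{\epsilon_t}
$$

where $h_t$ is the $t$-th weak learner, $\epsilon_t$ is its weighted training error, and more accurate learners receive larger weights $\alpha_t$.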
Boosting Algorithm
- Draw a random sample without replacement from the training set
- Train a weak learner on that random sample
- Draw another random sample without replacement from the training set, and add the observations that were incorrectly classified in the previous sample
- Train a weak learner on our new random sample
- Draw another random sample without replacement from the training set, and add the observations that were incorrectly classified in the previous two samples
- Keep repeating the above steps until we're satisfied with the number of weak learners we have
- Perform classification with all of the models, and combine their predictions with a weighted vote, where learners with better training accuracy receive higher weights (as sketched below)
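A minimal sketch of the weighted-boosting idea, assuming scikit-learn and NumPy are available. It follows the classic AdaBoost update (re-weight misclassified observations, weight learners by accuracy) rather than the exact sampling procedure listed above:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, random_state=0)
y_signed = np.where(y == 1, 1, -1)           # AdaBoost uses labels in {-1, +1}

n_rounds = 25
weights = np.full(len(X), 1 / len(X))        # start with uniform observation weights
learners, alphas = [], []

for _ in range(n_rounds):
    stump = DecisionTreeClassifier(max_depth=1)   # a weak learner
    stump.fit(X, y_signed, sample_weight=weights)
    pred = stump.predict(X)

    # Weighted training error of this learner (weights already sum to 1)
    err = np.clip(np.sum(weights * (pred != y_signed)), 1e-10, 1 - 1e-10)
    alpha = 0.5 * np.log((1 - err) / err)    # better learners get larger weights

    # Increase the weight of misclassified observations, decrease the rest
    weights *= np.exp(-alpha * y_signed * pred)
    weights /= weights.sum()

    learners.append(stump)
    alphas.append(alpha)

# Final prediction: sign of the weighted vote over all weak learners
scores = sum(a * m.predict(X) for a, m in zip(alphas, learners))
print("training accuracy:", np.mean(np.sign(scores) == y_signed))
```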
Advantages of Boosting
- Supports different loss functions
- For binary classification, the binary:logistic loss function is a typical default (e.g., in XGBoost; see the example after this list)
- Works well with interactions
- Attempts to reduce bias
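A short example of choosing a loss in XGBoost, mentioned above as one library where binary:logistic is the default objective for binary classification (assumes the xgboost package is installed):

```python
import xgboost as xgb
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=500, random_state=0)

# "objective" selects the loss function; binary:logistic is the binary-classification default
model = xgb.XGBClassifier(objective="binary:logistic", n_estimators=50)
model.fit(X, y)
print("training accuracy:", model.score(X, y))
```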
Disadvantages of Boosting
- Prone to overfitting
- Requires careful tuning of different hyper-parameters