Motivating Ensemble Methods
- An ensemble method is a technique that combines several base models to produce a single, stronger predictive model
- Ensemble methods are used to improve predictions by decreasing variance or bias
- Many ensemble methods produce their base learners by generating new training datasets during the training stage
- These new training datasets are produced by random sampling with replacement from the original dataset (a bootstrap sample; see the sketch below)
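As a tiny illustration of this resampling idea, here is one bootstrap sample drawn with NumPy (the library choice is an assumption; these notes don't prescribe one). Sampling with replacement means some observations appear several times and others not at all.

```python
import numpy as np

data = np.arange(10)          # stand-in for a small training set
rng = np.random.default_rng(42)

# A bootstrap sample: same size as the original, drawn with replacement
bootstrap = rng.choice(data, size=len(data), replace=True)
print(bootstrap)              # some observations repeat, others are left out
```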
Describing Bagging
- In the case of bagging, any observation has the same probability of appearing in the new dataset
- The training stage is parallel for bagging
- Meaning, each model is built independently
- In bagging, the result is obtained by averaging the responses of the learners (or majority vote)
Bagging Algorithm
- Draw a random sample with replacement from the training set
- Train a model on that random sample
- Keep repeating the above steps until we're satisfied with the number of models we have
- To classify a new observation, collect a prediction from every model and choose the class with the highest number of votes (i.e. a majority vote); see the sketch below
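A minimal from-scratch sketch of the steps above, assuming scikit-learn and NumPy are available (scikit-learn's `BaggingClassifier` packages the same idea):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, random_state=0)
rng = np.random.default_rng(0)

n_models = 25
models = []
for _ in range(n_models):
    # Step 1: draw a bootstrap sample (random sample with replacement)
    idx = rng.integers(0, len(X), size=len(X))
    # Step 2: train a model on that sample
    models.append(DecisionTreeClassifier(random_state=0).fit(X[idx], y[idx]))

# Step 3: majority vote across all models
votes = np.stack([m.predict(X) for m in models])   # shape: (n_models, n_samples)
majority = np.apply_along_axis(lambda v: np.bincount(v).argmax(), 0, votes)
print("training accuracy:", np.mean(majority == y))
```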
Advantages of Bagging
- Attempts to reduce variance
- Reduces overfitting
- Handles higher dimensionality of data well
- Maintains accuracy reasonably well even when some data is missing
Disadvantages of Bagging
- Bagging rarely reduces bias
- Since the final prediction is the mean (or majority vote) of the predictions from the individual models, it may not give precise values for classification and regression
Describing Boosting
- In the case of boosting, observations have a distinct probability of appearing in the new dataset
- Boosting builds the new learners sequentially
- Boosting is an iterative ensemble technique that adjusts the weight of an observation based on its previous classification's success
- Specifically, boosting will increase the weight of an observation if the observation was incorrectly classified
- In boosting, the result is obtained by taking a weighted average of the learners
- Specifically, the algorithm allocates weights to each resulting model
- A learner with a good classification result on the training data is assigned a higher weight than learners with poor classification results (see the weighted-vote formula below)
- So, boosting needs to keep track of learners' errors, too
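One concrete form of this weighted combination is AdaBoost's final classifier (an example; these notes don't name a specific boosting algorithm):

$$
H(x) = \operatorname{sign}\!\Big(\sum_{t=1}^{T} \alpha_t\, h_t(x)\Big),
\qquad
\alpha_t = \tfrac{1}{2}\ln\frac{1-\epsilon_t}{\epsilon_t}
$$

where $h_t$ is the $t$-th weak learner, $\epsilon_t$ is its weighted training error, and more accurate learners receive larger weights $\alpha_t$.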
Boosting Algorithm
- Draw a random sample without replacement from the training set
- Train a weak learner on that random sample
- Draw another random sample without replacement from the training set, and add the observations that were incorrectly classified in the previous sample
- Train a weak learner on our new random sample
- Draw another random sample without replacement from the training set, and add the observations that were incorrectly classified in the previous two samples
- Keep repeating the above steps until we're satisfied with the number of weak learners we have
- Perform classification with all of the models, and combine their predictions with a weighted vote, where learners with better training accuracy receive higher weights (as sketched below)
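A minimal sketch of the weighted-boosting idea, assuming scikit-learn and NumPy are available. It follows the classic AdaBoost update (re-weight misclassified observations, weight learners by accuracy) rather than the exact sampling procedure listed above:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, random_state=0)
y_signed = np.where(y == 1, 1, -1)           # AdaBoost uses labels in {-1, +1}

n_rounds = 25
weights = np.full(len(X), 1 / len(X))        # start with uniform observation weights
learners, alphas = [], []

for _ in range(n_rounds):
    stump = DecisionTreeClassifier(max_depth=1)   # a weak learner
    stump.fit(X, y_signed, sample_weight=weights)
    pred = stump.predict(X)

    # Weighted training error of this learner (weights already sum to 1)
    err = np.clip(np.sum(weights * (pred != y_signed)), 1e-10, 1 - 1e-10)
    alpha = 0.5 * np.log((1 - err) / err)    # better learners get larger weights

    # Increase the weight of misclassified observations, decrease the rest
    weights *= np.exp(-alpha * y_signed * pred)
    weights /= weights.sum()

    learners.append(stump)
    alphas.append(alpha)

# Final prediction: sign of the weighted vote over all weak learners
scores = sum(a * m.predict(X) for a, m in zip(alphas, learners))
print("training accuracy:", np.mean(np.sign(scores) == y_signed))
```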
Advantages of Boosting
- Supports different loss functions
- For binary classification, the binary:logistic loss function is a typical default (e.g., in XGBoost; see the example after this list)
- Works well with interactions
- Attempts to reduce bias
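A short example of choosing a loss in XGBoost, mentioned above as one library where binary:logistic is the default objective for binary classification (assumes the xgboost package is installed):

```python
import xgboost as xgb
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=500, random_state=0)

# "objective" selects the loss function; binary:logistic is the binary-classification default
model = xgb.XGBClassifier(objective="binary:logistic", n_estimators=50)
model.fit(X, y)
print("training accuracy:", model.score(X, y))
```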
Disadvantages of Boosting
- Prone to overfitting
- Requires careful tuning of different hyper-parameters