Describing Logistic Regression
- The most common form of logistic regression is binary logistic regression, which is used for establishing a relationship between one or more independent variables and a dependent categorical variable with 2 categories (i.e. binary variable)
- In other words, logistic regression is a learning method that uses a logistic function (or sigmoid function) to model the probability of success (for a binary dependent variable)
- Essentially, logistic regression takes the linear predictor used in linear regression (i.e. a linear combination of the independent variables) and applies the sigmoid function to it
- The outputs of a linear regression model are the conditional means (i.e. $E[Y \mid X]$), whereas the outputs of a logistic regression model are the conditional probabilities (i.e. $P(Y = 1 \mid X)$)
- This is an effect of the sigmoid function
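As a minimal sketch of the idea above, the snippet below builds a linear combination of the independent variables and then passes it through the sigmoid function. The function names, the coefficients, and the observation are hypothetical values chosen only for illustration; they do not come from a fitted model.

```python
import math


def sigmoid(z):
    """Map any real number z to a value strictly between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def linear_predictor(x, intercept, slopes):
    """Linear combination of the independent variables (the linear regression part)."""
    return intercept + sum(b * xi for b, xi in zip(slopes, x))


def predict_probability(x, intercept, slopes):
    """Logistic regression prediction: the sigmoid of the linear predictor."""
    return sigmoid(linear_predictor(x, intercept, slopes))


# Hypothetical coefficients and a single hypothetical observation
intercept = -1.0
slopes = [0.5, 2.0]
x = [2.0, 0.0]

eta = linear_predictor(x, intercept, slopes)   # -1.0 + 0.5 * 2.0 + 2.0 * 0.0 = 0.0
p = predict_probability(x, intercept, slopes)  # sigmoid(0.0) = 0.5
print(eta, p)
```

The linear predictor can take any real value, while the sigmoid squeezes it into the interval between 0 and 1, which is why the result can be read as a conditional probability.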
Probability
- Probabilities of success are defined as the number of successes divided by the total number of observations (i.e. successes and failures)
- We typically define probabilities as the following: $p = \frac{\text{successes}}{\text{successes} + \text{failures}}$
- Probabilities are not linearly related to the covariates
Odds and Log-Odds
- The odds, log-odds, and probability convey the same concept, but in different formats
- Odds of success are defined as the ratio of the probability of success over the probability of failure
- Log-odds of success are defined as the log of the odds of success
- We will sometimes apply the logit function to our conditional probabilities (logistic regression model's predictions), which will give us the log-odds of success
- This is useful because the log-odds re-express our probabilities of success on a scale that can be modeled directly as a function of our covariates
- In other words, the log-odds are linearly related to our covariates
- We can define the odds function as the following: $\text{odds}(p) = \frac{p}{1 - p}$
- We can define the log-odds (logit) function as the following: $\text{logit}(p) = \log\left(\frac{p}{1 - p}\right)$
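The relationship between probability, odds, and log-odds can be sketched in a few lines. The function names below are illustrative, and the natural logarithm is assumed for the log-odds, as is usual in logistic regression.

```python
import math


def odds(p):
    """Odds of success: probability of success over probability of failure."""
    return p / (1.0 - p)


def log_odds(p):
    """Log-odds (logit) of success: natural log of the odds."""
    return math.log(odds(p))


def probability_from_log_odds(z):
    """Inverse of the logit: maps log-odds back to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


# The same quantity expressed in three formats
for p in (0.1, 0.5, 0.9):
    z = log_odds(p)
    print(p, odds(p), z, probability_from_log_odds(z))
```

A probability of 0.5 corresponds to odds of 1 and log-odds of 0; probabilities below 0.5 give negative log-odds and probabilities above 0.5 give positive log-odds.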
Example of Calculating Odds
- Let's say an average of $k$ out of every $n$ people will default on their loans
- Then, the probability of a person defaulting on their loans is the following: $p = \frac{k}{n}$
- And, the odds of a person defaulting are the following: $\frac{p}{1 - p} = \frac{k}{n - k}$
Example of Calculating Log-Odds
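- Let's say an average of $k$ out of every $n$ people will default on their loans
- Then, the log-odds of a person defaulting on their loans is the following: $\log\left(\frac{p}{1 - p}\right) = \log\left(\frac{k}{n - k}\right)$, where $p = \frac{k}{n}$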
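Both loan-default examples can be worked through numerically. The counts below (1 default out of every 5 borrowers) are hypothetical and chosen only for illustration, and the natural logarithm is assumed.

```python
import math

# Hypothetical counts, for illustration only
defaults = 1
borrowers = 5
non_defaults = borrowers - defaults

probability = defaults / borrowers  # 1 / 5 = 0.2
odds = defaults / non_defaults      # 1 / 4 = 0.25
log_odds = math.log(odds)           # log(0.25), roughly -1.386

print(probability, odds, log_odds)
```

With these counts, a borrower is one quarter as likely to default as not to default, and the negative log-odds reflect that the probability of default is below 0.5.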
Logistic Function
- In terms of logistic regression, the logistic function is typically synonymous with the sigmoid function
- The logistic function models the probabilities of success
- The logistic function will always produce an S-shaped curve
- Meaning, the amount that $p(X)$ changes due to a one-unit change in $X$ will depend on the current value of $X$
- The logistic function is defined as the following: $p(X) = \frac{e^{\beta_0 + \beta_1 X}}{1 + e^{\beta_0 + \beta_1 X}}$
- Here, the coefficients $\beta_0$ and $\beta_1$ are the same logistic regression coefficients that appear in the logit function
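The S-shape can be checked numerically: the sketch below evaluates a logistic function with a single covariate and compares the change in probability caused by a one-unit increase at three different starting points. The coefficients are hypothetical, not estimates from data.

```python
import math


def logistic(x, beta0, beta1):
    """Probability of success for a single covariate x."""
    z = beta0 + beta1 * x
    return math.exp(z) / (1.0 + math.exp(z))


# Hypothetical coefficients, for illustration only
beta0, beta1 = 0.0, 1.0

# Change in probability for a one-unit increase, at different starting values
changes = {
    x: logistic(x + 1.0, beta0, beta1) - logistic(x, beta0, beta1)
    for x in (-4.0, 0.0, 4.0)
}

for x, change in changes.items():
    print(x, round(change, 4))
```

The change is largest near the middle of the curve and shrinks toward both tails, so the same one-unit step moves the probability by different amounts depending on where it starts.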
Logit Function
- The logit link function models the log-odds of success
- Said another way, the logit link function transforms the probabilities of success so that they can be modeled as a linear function of the covariates
- Meaning, the logit link function will always produce a straight line on the log-odds scale
- The beta coefficients, given by the glm output in R, relate to the change in log-odds: $\log\left(\frac{p(X)}{1 - p(X)}\right) = \beta_0 + \beta_1 X$
- We can interpret the beta coefficients as the following: increasing $X$ by one unit will change the log-odds of success by $\beta_1$
- We can also interpret the beta coefficients as the following: increasing $X$ by one unit will multiply the odds of success by $e^{\beta_1}$
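Both interpretations of a coefficient can be verified with a short sketch. The intercept and slope below are hypothetical stand-ins for the values a fitted model would report, and a single covariate is assumed.

```python
import math


def log_odds(x, beta0, beta1):
    """Log-odds of success: linear in the covariate x."""
    return beta0 + beta1 * x


def odds(x, beta0, beta1):
    """Odds of success: exponential of the log-odds."""
    return math.exp(log_odds(x, beta0, beta1))


# Hypothetical coefficients, for illustration only
beta0, beta1 = -2.0, 0.7

results = []
for x in (0.0, 1.0, 5.0):
    # Additive change on the log-odds scale for a one-unit increase in x
    log_odds_change = log_odds(x + 1.0, beta0, beta1) - log_odds(x, beta0, beta1)
    # Multiplicative change on the odds scale for the same increase
    odds_ratio = odds(x + 1.0, beta0, beta1) / odds(x, beta0, beta1)
    results.append((x, log_odds_change, odds_ratio))
    print(x, log_odds_change, odds_ratio)

print(beta1, math.exp(beta1))
```

Whatever the starting value of the covariate, the log-odds move by the slope and the odds are multiplied by the exponential of the slope, which is why the exponentiated coefficient is often reported as an odds ratio.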