Motivating Softmax Regression
- Up until now, the only type of classification we've gone over is binary classification
- In other words, we're only able to classify two possible labels
- Sometimes, we may want to classify an observation where multiple classifications exist
- Said another way, we may be interested in multi-classification
- For example, we could predict if an image is a cat, dog, or neither
- We wouldn't use standard regression or binary classification
- Instead, we would want to use softmax regression
Describing Softmax Regression
- Softmax regression refers to a network whose output layer uses a softmax activation function
- Up until now, our output layer has only included a single neuron
- We do this so we can output a single number, instead of a vector
- For softmax regression, our network's output layer includes $c$ neurons
- Here, $c$ represents the number of classes
- For example, we would set $c = 3$ if we're trying to predict whether an image is a cat, dog, or neither
- In summary, a network whose output layer uses the softmax activation function will typically return a vector of predictions, instead of a single-number prediction
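As a concrete illustration (a minimal sketch with made-up numbers, not values from these notes), the output layer for the cat/dog/neither example would have $c = 3$ neurons and return a vector of three probabilities, from which we can read off the predicted class:

```python
import numpy as np

# Hypothetical output of a softmax layer with c = 3 classes
# (the probabilities below are made up for illustration).
classes = ["cat", "dog", "neither"]
a = np.array([0.70, 0.20, 0.10])  # one probability per class, summing to 1

predicted_class = classes[int(np.argmax(a))]
print(predicted_class)  # cat
```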
Defining Softmax Regression
- Receive some input $x$
    - Typically, the input of our output layer is the activations from the previous layer
    - Here, $x$ is a vector
- Compute our weighted input $z$: $z = Wx + b$
    - Here, $W$ is a matrix
    - And, $b$ is a vector
    - Then, $z$ is a vector
- Compute the softmax activations $a$: $a_i = \frac{e^{z_i}}{\sum_{j=1}^{c} e^{z_j}}$ for $i = 1, \dots, c$
    - Here, $a$ is a vector
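Below is a minimal NumPy sketch of these three steps. The shapes (4 activations from the previous layer, $c = 3$ classes) and the values of $x$, $W$, and $b$ are assumptions chosen for illustration, not values from the notes:

```python
import numpy as np

def softmax(z):
    """Softmax activation: exponentiate each entry and normalize by the sum."""
    exp_z = np.exp(z - np.max(z))  # subtracting the max is mathematically equivalent, just more stable
    return exp_z / np.sum(exp_z)

c, n_prev = 3, 4                   # assumed: 3 classes, 4 activations from the previous layer
rng = np.random.default_rng(0)

x = rng.normal(size=n_prev)        # step 1: input (activations from the previous layer)
W = rng.normal(size=(c, n_prev))   # output-layer weights (c x n_prev matrix)
b = rng.normal(size=c)             # output-layer biases (c-vector)

z = W @ x + b                      # step 2: weighted input (c-vector)
a = softmax(z)                     # step 3: softmax activations (c-vector)

print(a, a.sum())                  # one probability per class; the entries sum to 1
```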
Example of using a Softmax Activation Function
- Let's keep using our example network listed above
- Let's start from the weighted input $z$ produced by that network
- Before computing the softmax activations, we should compute each individual exponential $e^{z_i}$
- We still need to sum those values together: $\sum_{j=1}^{c} e^{z_j}$
- Now, let's compute each individual softmax activation $a_i = \frac{e^{z_i}}{\sum_{j=1}^{c} e^{z_j}}$
- A worked version of these steps, using assumed numbers, appears in the sketch below
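Since the concrete numbers from the example network aren't reproduced here, the sketch below walks through the same steps with an assumed weighted input, say $z = [5, 2, -1]$:

```python
import numpy as np

z = np.array([5.0, 2.0, -1.0])   # assumed weighted input, for illustration only

exp_z = np.exp(z)                # each individual e^{z_i}
total = exp_z.sum()              # sum of the exponentials
a = exp_z / total                # each individual softmax activation a_i

print(np.round(exp_z, 3))  # [148.413   7.389   0.368]
print(round(total, 3))     # 156.17
print(np.round(a, 3))      # [0.95  0.047 0.002]
```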
Intuition behind Softmax Learning
- In a network without any hidden layers, an output layer using a softmax activation function creates linear decision boundaries
- This is because the boundary between any two classes $i$ and $j$ lies where $z_i = z_j$, which is a linear equation in the input
- There would be $c$ linear decision boundaries
- As we add hidden layers to our network, we would not have linear decision boundaries anymore
- Instead, our network would use non-linear decision boundaries
- These non-linear decision boundaries are useful for capturing non-linear relationships between the $c$ classes
- Also, we should notice that each value in our softmax output vector represents a probability
- Specifically, we could add up these values and they'd add up to $1$
- Essentially, softmax converts the values given by $z$ into probabilities represented by $a$
- Therefore, softmax becomes logistic regression when $c = 2$ (see the check in the sketch below)
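As a quick check of these last two points (a sketch with an assumed two-class weighted input), the softmax outputs sum to $1$, and for $c = 2$ the second softmax output equals the sigmoid of the difference $z_2 - z_1$, which is exactly the logistic-regression form:

```python
import numpy as np

def softmax(z):
    exp_z = np.exp(z)
    return exp_z / exp_z.sum()

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

z = np.array([1.5, -0.5])             # assumed weighted input with c = 2 classes

a = softmax(z)
print(a.sum())                        # 1.0 -- the outputs form a probability distribution
print(a[1], sigmoid(z[1] - z[0]))     # equal: softmax with c = 2 reduces to the sigmoid
```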
tldr
- Softmax regression is used when we're interested in multi-classification
- Softmax regression refers to a form of regression performed by a network whose output layer uses the softmax activation function
- A network whose output layer uses the softmax activation function will typically return a vector of predictions, instead of a single-number prediction
- The number of neurons in the output layer will be equal to the number of classes that we're predicting on