Introducing the XOR Function
- We can model the XOR function using a feedforward network
- The XOR function takes in two binary values, $x_1$ and $x_2$
- The XOR function returns 1 when exactly one of its binary inputs is equal to 1
- Otherwise, the function returns 0
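As a quick sanity check, here is a minimal Python sketch of this truth table:

```python
# Minimal sketch of the XOR function on binary inputs.
def xor(x1: int, x2: int) -> int:
    """Return 1 when exactly one input is 1, otherwise 0."""
    return 1 if x1 != x2 else 0

for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, "->", xor(x1, x2))
# 0 0 -> 0
# 0 1 -> 1
# 1 0 -> 1
# 1 1 -> 0
```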
Illustrating an XOR Problem
- The classic example of a linearly inseparable pattern is the logical exclusive-or (XOR) function
- In these situations, there are two distinct classes of data points that no line (or hyperplane) in the original input space can separate
- For example, we may have data generated by an XOR gate: the points $(0, 0)$ and $(1, 1)$ belong to one class, while $(0, 1)$ and $(1, 0)$ belong to the other
- A linear model applied directly to the original input cannot implement the XOR function
- However, a linear model can solve the problem in our transformed space, where the transformed space is given by the features a neural network extracts (see the sketch after this list)
- In our sample solution, the two points that must have output 1, $(0, 1)$ and $(1, 0)$, collapse onto a single point in feature space
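To make the collapse concrete, the sketch below maps the four inputs into feature space with one ReLU hidden layer. The particular weights are the well-known hand-picked solution from Goodfellow et al.'s *Deep Learning*; the notes do not specify them, so treat the exact values as an assumption:

```python
# Sketch: a ReLU hidden layer collapses the two "output 1" points
# onto a single point in feature space (weights are assumed, not
# derived in these notes).
import numpy as np

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])  # the four XOR inputs
W = np.array([[1, 1], [1, 1]])                  # hidden-layer weights
c = np.array([0, -1])                           # hidden-layer biases

H = np.maximum(0, X @ W + c)  # h = max(0, W^T x + c), applied row-wise
print(H)
# [[0 0]
#  [1 0]   <- (0,1) and (1,0), which must output 1,
#  [1 0]   <- land on the same feature-space point
#  [2 1]]

# In feature space, a single linear unit now suffices:
w, b = np.array([1, -2]), 0
print(H @ w + b)  # [0 1 1 0], exactly XOR
```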
Representing XOR as a Feedforward Network
- In most cases, the true target function is unknown
- However, we know the target function is the XOR function in this case: $y = f^{*}(x)$
- In this example, there are only four possible inputs: $\mathbb{X} = \{[0, 0]^{\top}, [0, 1]^{\top}, [1, 0]^{\top}, [1, 1]^{\top}\}$
- Therefore, we will train the network on all four of these points
- To introduce the concept of loss functions, we'll treat this problem as a regression problem
- Since we're treating this problem as a regression problem, we can use the mean squared error (MSE) as our loss function
- However, in practical applications, MSE is usually not an appropriate loss function for modeling binary data
- The MSE loss function in our case is the following: $J(\theta) = \frac{1}{4} \sum_{x \in \mathbb{X}} \left( f^{*}(x) - f(x; \theta) \right)^{2}$
- If we assume our model is a linear model, then we could define the form of our model as the following: $f(x; w, b) = x^{\top} w + b$
- We can minimize $J(\theta)$ with respect to $w$ and $b$ iteratively using gradient descent, or in closed form using the normal equations (see the sketch after this list)
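As an illustration of the closed-form route, the sketch below solves the normal equations for the XOR data (a sanity check, not part of the notes). The best linear fit is $w \approx 0$, $b = 0.5$, so the model outputs 0.5 for every input, confirming that no linear model implements XOR:

```python
# Sketch: fit f(x; w, b) = x^T w + b to the XOR data via the
# normal equations and inspect the (failed) fit.
import numpy as np

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 1, 1, 0])

A = np.hstack([X, np.ones((4, 1))])        # append a 1s column for the bias
theta = np.linalg.solve(A.T @ A, A.T @ y)  # closed-form least squares
w, b = theta[:2], theta[2]

print(w, b)                            # w ~ [0, 0], b = 0.5
print(A @ theta)                       # [0.5 0.5 0.5 0.5] for every input
print(np.mean((y - A @ theta) ** 2))   # MSE = 0.25
```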
Defining the XOR Architecture
- We may decide to model the XOR function using a three-layer neural network
- Our network could have two inputs, a hidden layer with two nodes, and an output layer with one node
- In this case, the network composes two affine transformations: one from the input vector $x$ to the hidden vector $h$ (followed by a nonlinear activation), and one from $h$ to the output $y$
- We need a matrix of weights $W$ for our inputs and a vector of weights $w$ for our hidden layer outputs
- We need a vector of biases $c$ for our hidden layer and a bias $b$ for our output layer, giving the complete model: $f(x; W, c, w, b) = w^{\top} g(W^{\top} x + c) + b$, where $g$ is an elementwise nonlinearity
- Since we only have four data points, the network's inputs, hidden activations, and output look like the following:
| $x_1$ | $x_2$ | $h_1$ | $h_2$ | $y$ |
|-------|-------|-------|-------|-----|
| 0 | 0 | 0 | 0 | 0 |
| 0 | 1 | 1 | 0 | 1 |
| 1 | 0 | 1 | 0 | 1 |
| 1 | 1 | 1 | 1 | 0 |
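The sketch below reproduces this table with threshold (step) hidden units. The specific weights and thresholds are assumptions chosen to match the table, where $h_1$ behaves like OR and $h_2$ like AND:

```python
# Sketch: a two-unit threshold network matching the table above
# (weights and thresholds are assumed, chosen to fit the table).
import numpy as np

def step(z):
    return (z > 0).astype(int)

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])

h1 = step(X[:, 0] + X[:, 1] - 0.5)  # fires like OR(x1, x2)
h2 = step(X[:, 0] + X[:, 1] - 1.5)  # fires like AND(x1, x2)
y  = step(h1 - 2 * h2 - 0.5)        # 1 iff h1 = 1 and h2 = 0, i.e. XOR

for row in zip(X[:, 0], X[:, 1], h1, h2, y):
    print(*row)
# 0 0 0 0 0
# 0 1 1 0 1
# 1 0 1 0 1
# 1 1 1 1 0
```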
tldr
- Feedforward neural networks can model many functions that linear models cannot
- These functions include logical functions such as AND, OR, and XOR; XOR in particular is not linearly separable, yet a network with a single hidden layer represents it exactly