Introducing Feedforward Networks
- Up to now, we've been discussing neural networks where the output from one layer is used as input to the next layer
- These multilayer perceptrons are feedforward neural networks
- These models are called feedforward because information from $x$ flows through the intermediate computations defined by $f$ to produce the output $y$
- In other words, information is only fed forward and never fed back
- Networks that include feedback connections are called recurrent neural networks
Motivating Feedforward Networks
- We assume that data is created by a data-generating process
- A data-generating process is the unknown, underlying phenomenon that creates the data
- This data-generating process is modeled by a true function
- We'll refer to this true function as $f^*$
Defining Feedforward Networks
- The goal of a feedforward network is to approximate some unknown function $f^*$ with $f$
- In this case, $f^*$ is the unknown, optimal classifier that maps an input $x$ to a category $y$
- In this case, $f$ is a known, approximate classifier that maps an input $x$ to a category $y$
- In other words, we estimate $f^*$ with $f$
- A feedforward network defines a mapping $y = f(x; \theta)$ and learns the value of the parameters $\theta$ that result in the best function approximation
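As a minimal, hypothetical illustration of a parameterized mapping $y = f(x; \theta)$ (a single sigmoid unit, not a full feedforward network; the function and parameter values below are assumptions for illustration):

```python
import numpy as np

# A minimal, hypothetical example of a parameterized mapping y = f(x; theta):
# here f is just a single sigmoid unit, and theta is the pair (weights, bias).
# Training would adjust theta so that f approximates f* as closely as possible.
def f(x, weights, bias):
    """Map an input vector x to the probability of the positive category."""
    z = np.dot(weights, x) + bias
    return 1.0 / (1.0 + np.exp(-z))  # sigmoid squashes z into (0, 1)

# Example usage with arbitrary, untrained parameters.
x = np.array([0.5, -1.2, 3.0])
weights = np.array([0.1, -0.3, 0.2])
bias = 0.05
print(f(x, weights, bias))
```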
Representing Feedforward Networks
- Feedforward networks are typically represented by composing together many different functions
- For example, the output values of our feedforward network may be defined as a chain of three functions: $f(x) = f^{(3)}(f^{(2)}(f^{(1)}(x)))$
- Where $f^{(1)}$ is called the first hidden layer
- Where $f^{(2)}$ is called the second hidden layer
- Where $f^{(3)}$ is called the output layer
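As a minimal sketch of this composition (the layer sizes and the ReLU/identity activation choices are assumptions for illustration, not part of the notes):

```python
import numpy as np

# A network defined as a chain of three functions: f(x) = f3(f2(f1(x))).
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)   # first hidden layer parameters
W2, b2 = rng.normal(size=(4, 4)), np.zeros(4)   # second hidden layer parameters
W3, b3 = rng.normal(size=(1, 4)), np.zeros(1)   # output layer parameters

def f1(x): return np.maximum(0.0, W1 @ x + b1)  # first hidden layer
def f2(h): return np.maximum(0.0, W2 @ h + b2)  # second hidden layer
def f3(h): return W3 @ h + b3                   # output layer

def f(x):
    # Information is only fed forward: x -> f1 -> f2 -> f3 -> output
    return f3(f2(f1(x)))

print(f(np.array([1.0, -2.0, 0.5])))
```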
Training Feedforward Networks
- $f$ models the training data
- $f$ does not model the data-generating process
- $f^*$ models the data-generating process
- We want $f$ to match $f^*$
- The training data provides us with noisy, approximate examples of $f^*(x)$ evaluated at different training points
- Each example $x$ is accompanied by a label $y \approx f^*(x)$
- In other words, we hope that the training data produce training labels $y$ that are close to $f^*(x)$
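As a toy illustration (the particular $f^*$ and the noise level are assumptions), the training data can be thought of as noisy evaluations of the true function at sampled points:

```python
import numpy as np

# A toy data-generating process: the learner never sees f_star itself,
# only noisy evaluations of it at sampled training points.
rng = np.random.default_rng(1)

def f_star(x):
    return np.sin(2.0 * x) + 0.5 * x  # stands in for the unknown true function

x_train = rng.uniform(-3.0, 3.0, size=100)                      # training points
y_train = f_star(x_train) + rng.normal(scale=0.1, size=100)     # noisy labels y ~ f*(x)
```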
Training Layers of Feedforward Networks
- The training data directly specifies what the output layer must do at each point $x$
- The goal of the output layer is to produce a value that is close to $y$
- We don't want the output layer to produce a value that is always equal to $y$, since we would be overfitting the noise
- The behavior of the other layers (i.e. hidden layers) is not directly specified by the training data
- The goal of the hidden layers is not to produce a value that is close to $y$
- Instead, the goal of the hidden layers is to help the output layer produce a value that is close to $y$
- In other words, the learning algorithm must decide how to use these hidden layers to best implement an approximation of $f^*$
- These layers are called hidden layers because the training data does not show the desired output for each of these layers
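A hedged sketch of this division of labor, assuming a one-hidden-layer network and a squared-error loss (none of which is prescribed by the notes): the loss compares only the output to $y$, and the hidden layer receives its learning signal indirectly, through the chain rule.

```python
import numpy as np

rng = np.random.default_rng(2)

# One training example (x, y): y is a noisy sample of f*(x).
x = np.array([[0.5, -1.0]])
y = np.array([[0.3]])

# A one-hidden-layer network (sizes chosen arbitrarily for illustration).
W1, b1 = rng.normal(size=(2, 4)), np.zeros(4)   # hidden layer parameters
W2, b2 = rng.normal(size=(4, 1)), np.zeros(1)   # output layer parameters

h = np.tanh(x @ W1 + b1)          # hidden layer: the data never says what h should be
y_hat = h @ W2 + b2               # output layer: should end up close to y
loss = np.mean((y_hat - y) ** 2)  # the loss compares ONLY y_hat to y

# The hidden layer's learning signal arrives indirectly, via the chain rule:
grad_y_hat = 2.0 * (y_hat - y)
grad_h = grad_y_hat @ W2.T                   # how changing h would change the loss
grad_W1 = x.T @ (grad_h * (1.0 - h ** 2))    # update direction for the hidden layer
print(loss, grad_W1.shape)
```

Note that no term in the loss mentions `h` directly; the hidden layer is adjusted solely because doing so helps the output layer.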
Learning Nonlinear Functions
- Sometimes our $y$ is a nonlinear function of $x$
- In this case, we will want to transform $x$ so that $y$ becomes a linear function of $\phi(x)$
- We will usually want to apply the linear model to a transformed input $\phi(x)$, instead of applying a linear model to $x$ itself
- Here, $\phi$ is a nonlinear transformation
- We can think of $\phi(x)$ as a new representation of $x$
- We can choose the mapping $\phi$ by using one of the following:
    - A generic feature mapping $\phi$, implicitly used in kernel functions
        - These generic feature mappings are usually based only on very general principles
        - As a result, they usually produce poor predictions on a test set
    - Manually engineered functions $\phi$
        - Until the advent of deep learning, this was the dominant approach
        - It requires decades of human effort for each separate task
    - Activation functions used in deep learning to learn $\phi$
        - The strategy of deep learning is to learn $\phi$: $y = f(x; \theta, w) = \phi(x; \theta)^\top w$
        - In this approach, we now have the following:
            - Parameters $\theta$ that we use to learn $\phi$
            - Parameters $w$ that map from $\phi(x)$ to the desired output
        - This is an example of a deep feedforward network (a sketch follows below)
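A minimal sketch of the formula $y = f(x; \theta, w) = \phi(x; \theta)^\top w$, assuming a ReLU feature mapping and arbitrary sizes (both are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(4)

# theta = (Theta, b) parameterizes the learned representation phi;
# w maps from phi(x) to the desired output.
Theta, b = rng.normal(size=(5, 3)), np.zeros(5)
w = rng.normal(size=5)

def phi(x):
    return np.maximum(0.0, Theta @ x + b)   # an assumed ReLU feature mapping phi(x; theta)

def f(x):
    return phi(x) @ w                        # a linear model applied to phi(x)

print(f(np.array([1.0, 0.5, -2.0])))
```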
Feature Mapping using Activation Functions
- This approach is the only one of the three that gives up on the convexity of the training problem, but the benefits outweigh the harms
- In this approach, we parametrize the representation as $\phi(x; \theta)$
- And, we use the optimization algorithm to find the $\theta$ that corresponds to a good representation
- If we wish, this approach can capture the benefit of the first approach by being highly generic
- We do this by using a very broad family $\phi(x; \theta)$
- Deep learning can also capture the benefit of the second approach by providing model customization
- Human practitioners can encode their knowledge to help generalization by designing families $\phi(x; \theta)$ that they expect will perform well
- The advantage is that the human designer only needs to find the right general function family, rather than precisely the right function
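As a hedged end-to-end sketch (the architecture, loss, and hyperparameters are all assumptions), plain gradient descent can find a $\theta$ such that $\phi(x; \theta)$ makes the classic XOR problem linearly separable, even though the training problem is non-convex:

```python
import numpy as np

rng = np.random.default_rng(3)

# XOR: not linearly separable in x, but separable in a learned phi(x; theta).
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
Y = np.array([[0.], [1.], [1.], [0.]])

# theta = (W, b) parameterizes phi; (w, c) maps phi(x) to the output.
W, b = rng.normal(size=(2, 8)), np.zeros(8)
w, c = rng.normal(size=(8, 1)), np.zeros(1)
lr = 0.5

for step in range(5000):
    phi = np.tanh(X @ W + b)                    # learned representation phi(x; theta)
    p = 1.0 / (1.0 + np.exp(-(phi @ w + c)))    # predicted probability of class 1

    # Gradients of the cross-entropy loss; non-convex in (W, b),
    # yet gradient descent typically finds a good theta here.
    grad_z = (p - Y) / len(X)
    grad_w = phi.T @ grad_z
    grad_c = grad_z.sum(axis=0)
    grad_pre = (grad_z @ w.T) * (1.0 - phi ** 2)   # backprop through tanh
    grad_W = X.T @ grad_pre
    grad_b = grad_pre.sum(axis=0)

    W -= lr * grad_W; b -= lr * grad_b
    w -= lr * grad_w; c -= lr * grad_c

print(np.round(p.ravel(), 2))  # should be close to [0, 1, 1, 0]
```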
tldr
- When we want to find nonlinear decision boundaries, we transform $x$ so that $y$ becomes a linear function of $\phi(x)$
- We will usually want to apply the linear model to a transformed input $\phi(x)$, instead of applying a linear model to $x$ itself
- Feedforward networks (i.e. multilayer perceptrons) are neural networks where input data flows through a chain of functions to produce the output
- Data in feedforward networks are never fed backwards