Overview of Steps for Neural Networks
- Obtain some response data and predictor data
- Initialize parameter values
- Propagate forwards to get predictions
- Compute the cost function to get the error
- Propagate backwards to get the gradient
- Use gradient descent to repeatedly update the parameters until the cost is minimized
- This refers to repeating steps 3-5
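A minimal numerical sketch of these steps, assuming NumPy, a single sigmoid neuron standing in for a full network, and made-up data:

```python
import numpy as np

np.random.seed(0)

# Step 1: obtain some response data (y) and predictor data (x)
x = np.random.rand(3, 20)                               # 3 predictors, 20 samples
y = (x.sum(axis=0, keepdims=True) > 1.5).astype(float)  # made-up binary response

# Step 2: initialize parameter values
w = np.zeros((1, 3))
b = np.zeros((1, 1))

lr = 0.5  # learning rate (an assumed value)
costs = []
for step in range(500):
    # Step 3: propagate forwards to get predictions
    a = 1 / (1 + np.exp(-(np.dot(w, x) + b)))
    # Step 4: compute the cost function to get the error
    costs.append(-np.mean(y * np.log(a) + (1 - y) * np.log(1 - a)))
    # Step 5: propagate backwards to get the gradient
    dz = a - y
    dw = np.dot(dz, x.T) / x.shape[1]
    db = dz.mean(keepdims=True)
    # Step 6: gradient descent update; steps 3-5 repeat on every iteration
    w -= lr * dw
    b -= lr * db
```

The cost shrinks across iterations, which is exactly the "repeat until minimized" loop described above.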
Defining Notation
- The output of the activation function of the $j^{th}$ neuron in the $l^{th}$ layer is directly related to the outputs of the activation functions in the $(l-1)^{th}$ layer
- We'll refer to the output of an activation function as:
$$a_j^{[l]} = \sigma\Big(\sum_k w_{jk}^{[l]} a_k^{[l-1]} + b_j^{[l]}\Big)$$
- In other words, the output of an activation function is based on the outputs of the activation functions of its previous layer, the weights of the current layer, and the bias of the current layer
- We can also refer to the weighted input of the $j^{th}$ neuron in the $l^{th}$ layer as the following:
$$z_j^{[l]} = \sum_k w_{jk}^{[l]} a_k^{[l-1]} + b_j^{[l]}$$
- In other words, a $z_j^{[l]}$ term refers to the input going into the activation function of the $j^{th}$ neuron in the $l^{th}$ layer
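A small sketch of this notation in vectorized form (layer sizes and values are made up, and $\sigma$ is taken to be the sigmoid):

```python
import numpy as np

np.random.seed(1)

a_prev = np.random.rand(4, 1)  # a^[l-1]: activations of the previous layer (4 neurons)
W = np.random.rand(3, 4)       # w^[l]:   weights of the current layer (3 neurons)
b = np.random.rand(3, 1)       # b^[l]:   biases of the current layer

# Weighted input of the current layer: z^[l] = W^[l] a^[l-1] + b^[l]
z = np.dot(W, a_prev) + b

# Output of the activation function: a^[l] = sigma(z^[l])
a = 1 / (1 + np.exp(-z))
```

Each row of `z` is one neuron's weighted input, and `a` is the corresponding vector of activations.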
Describing Forward Propagation
- The goal of forward propagation is to calculate layer-by-layer neuron activations until we get to the output $\hat{y}$
- Specifically, forward propagation is a layer-level function that receives the input $a^{[l-1]}$ and outputs $a^{[l]}$
- We can repeatedly forward propagate to get the output $\hat{y}$ and compare it with the real value $y$ to get the error
- In other words, we can determine how well our neural network is behaving by calculating the error
- That error is found by determining our prediction $\hat{y}$
- We determine our $\hat{y}$ using forward propagation
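The layer-by-layer computation can be sketched as follows (layer sizes and values are made up):

```python
import numpy as np

np.random.seed(2)
sigmoid = lambda x: 1 / (1 + np.exp(-x))

# Layer sizes: 3 inputs -> 4 hidden -> 4 hidden -> 1 output
sizes = [3, 4, 4, 1]
weights = [np.random.rand(m, n) for n, m in zip(sizes[:-1], sizes[1:])]
biases = [np.random.rand(m, 1) for m in sizes[1:]]

a = np.random.rand(3, 1)           # input activations
for W, b in zip(weights, biases):
    a = sigmoid(np.dot(W, a) + b)  # each layer receives a^[l-1] and outputs a^[l]

y_hat = a  # the final activation is the prediction
y = 1.0    # the real value
# Cross-entropy error between the prediction and the real value
error = (-(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))).item()
```

The same layer-level function is applied repeatedly, and the error at the end tells us how well the network is behaving.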
Describing Backward Propagation
- The goal of backward propagation is to estimate the parameters $w$ and $b$ by computing the partial derivatives $\frac{\partial C}{\partial w}$ and $\frac{\partial C}{\partial b}$ of a cost function $C$
- Specifically, backward propagation is a layer-level function that receives the input $da^{[l]}$ and outputs $da^{[l-1]}$, $dW^{[l]}$, and $db^{[l]}$
- We can repeatedly backward propagate to find the partial derivative of the cost function with respect to each weight and bias
- Then, we'll use gradient descent to update $w$ and $b$ by evaluating those partial derivatives from our backward propagation step
- In summary, we propagate forwards to see how well our neural network is behaving and to find the error
- After we find out that our network has error, we propagate backwards and use a form of gradient descent to compute new values of the weights and biases
- Then, we propagate forwards again to see how well those weights are performing, and then propagate backwards and use gradient descent again to update the weights
- This goes on until we reach some minimum for the error value
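The repeated update itself is just a gradient descent step on each parameter; a sketch with an assumed learning rate `alpha` and stand-in gradient values:

```python
import numpy as np

np.random.seed(3)
alpha = 0.1  # learning rate (an assumed value)

# Current parameters, and their partial derivatives as backward
# propagation would return them (stand-in random values here)
W, b = np.random.rand(2, 2), np.random.rand(2, 1)
dW, db = np.random.rand(2, 2), np.random.rand(2, 1)

# Move each parameter a small step against its gradient
W_new = W - alpha * dW
b_new = b - alpha * db
```

Repeating this step with fresh gradients from each backward pass is what drives the error toward a minimum.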
Summarizing Steps of Neural Network
- The goal of forward propagation is to understand how changes in the weights and biases lead to changes in the predictions
- The goal of backpropagation is to understand how changes in the weights and biases lead to changes in the error (i.e. cost function)
- The goal of gradient descent is to minimize the error by updating our weights and biases
Illustrating Forward and Backward Propagation
| Color | Representation |
| --- | --- |
|  | Function |
|  | Input of backward and forward propagation |
|  | Output of inner activation and chain rule functions |
|  | Output of backward and forward propagation |
- Where dashed lines indicate cached values
- Where $da^{[2]}$ represents the initialized partial derivative
- Since we're using logistic regression: $da^{[2]} = -\frac{y}{a^{[2]}} + \frac{1-y}{1-a^{[2]}}$
- Where $dz^{[l]}$ represents the following: $dz^{[l]} = da^{[l]} \, \sigma'(z^{[l]})$
- Where $dW^{[l]}$ represents the following: $dW^{[l]} = dz^{[l]} \, (a^{[l-1]})^T$
- Where $db^{[l]}$ represents the following: $db^{[l]} = dz^{[l]}$
- Where $da^{[l-1]}$ represents the following: $da^{[l-1]} = (W^{[l]})^T dz^{[l]}$
- Although $da^{[0]}$ (the gradient with respect to the input $x$) is not really used, it falls out of the same formula
Computing Components of Forward Propagation
- In forward propagation, our output is: $a^{[l]} = \sigma(z^{[l]})$, where $z^{[l]} = W^{[l]} a^{[l-1]} + b^{[l]}$
- We cache and output the following: $z^{[l]}$, $a^{[l]}$, $W^{[l]}$, and $b^{[l]}$
- These components look like the following for our two-layer network:
$$z^{[1]} = W^{[1]} x + b^{[1]}, \quad a^{[1]} = \sigma(z^{[1]}), \quad z^{[2]} = W^{[2]} a^{[1]} + b^{[2]}, \quad a^{[2]} = \sigma(z^{[2]})$$
Computing Components of Backward Propagation
- In backward propagation, our input is: $da^{[l]}$, along with the cached $z^{[l]}$, $a^{[l-1]}$, and $W^{[l]}$
- Our output is: $da^{[l-1]}$, $dW^{[l]}$, and $db^{[l]}$
- These components look like the following for our two-layer network:
$$dz^{[2]} = a^{[2]} - y, \quad dW^{[2]} = dz^{[2]} (a^{[1]})^T, \quad db^{[2]} = dz^{[2]}$$
$$dz^{[1]} = (W^{[2]})^T dz^{[2]} \, \sigma'(z^{[1]}), \quad dW^{[1]} = dz^{[1]} x^T, \quad db^{[1]} = dz^{[1]}$$
Coding Backward Propagation
```python
# Binary classification with 2-layer
# neural network (single hidden layer)
import numpy as np
from copy import copy

sigmoid = lambda x: 1 / (1 + np.exp(-x))

def fprop(x, y, params):
    # Follows procedure given in notes
    W1, b1, W2, b2 = [params[key] for key
                      in ('W1', 'b1', 'W2', 'b2')]
    z1 = np.dot(W1, x) + b1
    a1 = sigmoid(z1)
    z2 = np.dot(W2, a1) + b2
    a2 = sigmoid(z2)
    loss = -(y * np.log(a2) + (1 - y) * np.log(1 - a2))
    ret = {
        'x': x, 'y': y, 'z1': z1, 'a1': a1,
        'z2': z2, 'a2': a2, 'loss': loss
    }
    for key in params:
        ret[key] = params[key]
    return ret

def bprop(fprop_cache):
    # Follows procedure given in notes
    x, y, z1, a1, z2, a2, loss = [fprop_cache[key] for key
                                  in ('x', 'y', 'z1', 'a1', 'z2', 'a2', 'loss')]
    dz2 = (a2 - y)
    dW2 = np.dot(dz2, a1.T)
    db2 = dz2
    dz1 = np.dot(fprop_cache['W2'].T, dz2)
    dz1 = dz1 * sigmoid(z1) * (1 - sigmoid(z1))
    dW1 = np.dot(dz1, x.T)
    db1 = dz1
    return {'b1': db1, 'W1': dW1, 'b2': db2, 'W2': dW2}

# Gradient checking
if __name__ == '__main__':
    # Initialize random parameters and inputs
    W1 = np.random.rand(2, 2)
    b1 = np.random.rand(2, 1)
    W2 = np.random.rand(1, 2)
    b2 = np.random.rand(1, 1)
    params = {'W1': W1, 'b1': b1, 'W2': W2, 'b2': b2}
    x = np.random.rand(2, 1)
    y = np.random.randint(0, 2)  # Returns 0/1

    fprop_cache = fprop(x, y, params)
    bprop_cache = bprop(fprop_cache)

    # Numerical gradient checking
    # Note how slow this is! Thus we want to use
    # the backpropagation algorithm instead.
    eps = 1e-6
    ng_cache = {}
    # For every single parameter (W, b)
    for key in params:
        param = params[key]
        # This will be our numerical gradient
        ng = np.zeros(param.shape)
        for j in range(ng.shape[0]):
            for k in range(ng.shape[1]):
                # For every element of the parameter matrix,
                # compute the gradient of the loss wrt that
                # element numerically using finite differences
                add_eps = np.copy(param)
                min_eps = np.copy(param)
                add_eps[j, k] += eps
                min_eps[j, k] -= eps
                add_params = copy(params)
                min_params = copy(params)
                add_params[key] = add_eps
                min_params[key] = min_eps
                fprop_new = fprop(x, y, add_params)['loss']
                fprop_min = fprop(x, y, min_params)['loss']
                ng[j, k] = (fprop_new - fprop_min) / (2 * eps)
        ng_cache[key] = ng

    # Compare numerical gradients to those
    # computed using backpropagation algorithm
    for key in params:
        print(key)
        # These should be the same
        print(bprop_cache[key])
        print(ng_cache[key])
```
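The `fprop`/`bprop` pair above can be dropped straight into the gradient descent loop described earlier; a self-contained sketch (the two functions are repeated in condensed form, and the learning rate of 0.5 is an assumed value):

```python
import numpy as np

np.random.seed(4)
sigmoid = lambda x: 1 / (1 + np.exp(-x))

def fprop(x, y, p):
    # Condensed version of the fprop above
    z1 = np.dot(p['W1'], x) + p['b1']; a1 = sigmoid(z1)
    z2 = np.dot(p['W2'], a1) + p['b2']; a2 = sigmoid(z2)
    loss = (-(y * np.log(a2) + (1 - y) * np.log(1 - a2))).item()
    return {'x': x, 'y': y, 'z1': z1, 'a1': a1, 'a2': a2, 'loss': loss, **p}

def bprop(c):
    # Condensed version of the bprop above
    dz2 = c['a2'] - c['y']
    dz1 = np.dot(c['W2'].T, dz2) * sigmoid(c['z1']) * (1 - sigmoid(c['z1']))
    return {'W2': np.dot(dz2, c['a1'].T), 'b2': dz2,
            'W1': np.dot(dz1, c['x'].T), 'b1': dz1}

params = {'W1': np.random.rand(2, 2), 'b1': np.random.rand(2, 1),
          'W2': np.random.rand(1, 2), 'b2': np.random.rand(1, 1)}
x, y = np.random.rand(2, 1), 1

losses = []
for step in range(100):
    cache = fprop(x, y, params)   # forward: predictions and error
    grads = bprop(cache)          # backward: gradients of the loss
    for key in params:            # gradient descent update
        params[key] = params[key] - 0.5 * grads[key]
    losses.append(cache['loss'])
```

After the loop, the loss on this single example is far lower than at the start, which is the forward-backward-update cycle from the summary in action.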
tldr
- Forward propagation is a layer-level function that receives the input $a^{[l-1]}$ and outputs $a^{[l]}$
- Backward propagation is a layer-level function that receives the input $da^{[l]}$ and outputs $da^{[l-1]}$, $dW^{[l]}$, and $db^{[l]}$
- The goal of forward propagation is to understand how changes in the weights and biases lead to changes in the predictions
- The goal of backpropagation is to understand how changes in the weights and biases lead to changes in the error (i.e. cost function)
- The goal of gradient descent is to minimize the error by updating our weights and biases based on those changes