Motivating Derivatives
- Before diving straight into the backpropagation algorithm, we should understand why we care about partial derivatives
- Drawing the computation as a graph helps us see how partial derivatives are used
- Specifically, we'll use a graph to determine the partial derivatives of an output with respect to each of its inputs
- Essentially, a partial derivative says that a small increase in the bottom value (the denominator) will cause the top value (the numerator) to increase by that amount times the partial derivative
Example of Computing Derivatives
- Suppose we have the following functions and inputs:
- Let's focus on a single parameter for now
- Sometimes, we'll want to increase or decrease this parameter
- Changing the parameter will also change the output as a result
- Often, we want to know how the output will change before we change the parameter
- If we plug in a slightly larger value for the parameter, the output changes by three times that small increase
- If we change the parameter by a large number instead, the output again changes by three times that number
- If we keep changing the parameter by some value, then we'll notice that the output always increases by three times that value
- In other words, a one-unit increase in the parameter will lead to a three-unit increase in the output
- We call this rate a partial derivative; here, the partial derivative of the output with respect to the parameter equals 3
- Usually, we don't want to manually adjust the parameter and observe changes in the output to find the partial derivative
- Luckily, we can use calculus to determine the partial derivative of a function with respect to a parameter directly
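The nudging experiment described above can be sketched numerically. The functions and inputs of the original example are not preserved in this text, so the sketch below assumes hypothetical ones (`u = b * c`, `v = a + u`, `J = 3 * v`, with `a = 5`, `b = 3`, `c = 2`), chosen only so that the output moves three units for every unit of change in the parameter `a`.

```python
# Hypothetical functions, assumed only for illustration:
#   u = b * c,  v = a + u,  J = 3 * v
def compute_J(a, b, c):
    u = b * c
    v = a + u
    return 3 * v

a, b, c = 5.0, 3.0, 2.0
base = compute_J(a, b, c)

# Nudge a by a small and then a large amount, holding b and c fixed,
# and compare the change in J to the size of the nudge.
slopes = []
for step in (0.001, 10.0):
    change = compute_J(a + step, b, c) - base
    slopes.append(change / step)
    print(f"step={step}: J changes by {change:.3f}, ratio={change / step:.3f}")
```

Because the assumed function is linear in `a`, the ratio is the same for both step sizes. For a nonlinear function the ratio would only approach the partial derivative as the step shrinks.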
An Assumption about Derivatives
- As we saw previously, changing the parameter by some amount will lead to a change in the output of three times that amount
- Stated another way, the partial derivative of the output with respect to that parameter is 3
- We can also determine partial derivatives of other values with respect to the same parameter
- For example, if we change the parameter by the same small amount again, then we'll notice that an intermediate value that depends on it changes by exactly that amount
- In other words, they change by the same amount
- Therefore, the partial derivative of the intermediate value with respect to the parameter equals 1
- Notice, the intermediate value can also change if we change either of two other inputs
- Therefore, we're making an assumption when we attribute a change in the intermediate value to a change in the parameter
- Specifically, we're assuming that both of the other inputs remain fixed
- The partial derivative makes this assumption as well
- In other words, all partial derivatives nudge the input associated with the denominator, observe changes in the function associated with the numerator, and hold all other parameters and functions fixed
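The "nudge one input, hold everything else fixed" assumption can be made explicit with a small finite-difference helper. This is a minimal sketch: the helper `partial` and the function `v = a + b * c` are hypothetical names and forms assumed for illustration, not taken from the original example.

```python
def partial(f, inputs, name, eps=1e-6):
    """Estimate the partial derivative of f with respect to inputs[name]."""
    nudged = dict(inputs)              # copy, so every other input stays fixed
    nudged[name] = inputs[name] + eps  # nudge only the chosen input
    return (f(**nudged) - f(**inputs)) / eps

# Hypothetical intermediate function, assumed for illustration: v = a + b * c
def v(a, b, c):
    return a + b * c

inputs = {"a": 5.0, "b": 3.0, "c": 2.0}
dv_da = partial(v, inputs, "a")  # close to 1: v moves by the same amount as a
dv_db = partial(v, inputs, "b")  # close to the fixed value of c
print(dv_da, dv_db)
```

Copying the input dictionary before nudging is what enforces the assumption: only the input named in the denominator differs between the two evaluations.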
Observing the Chain Rule
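- We've already noticed the relationship between the parameter, the intermediate value, and the output when we made a change to the parameter and observed the change in the output
- From this, we've seen that the parameter affects the output because the parameter influences the intermediate value, and the intermediate value influences the output
- In other words, partial derivatives depend on both the direct and indirect effects of parameters
- This concept is captured by the chain rule: the partial derivative of the output with respect to the parameter is the product of the partial derivative of the output with respect to the intermediate value and the partial derivative of the intermediate value with respect to the parameter
- The chain rule is the calculus we used for computing our partial derivatives previously
- Applying the chain rule is a major step in the backpropagation algorithm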
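As a rough numerical check of the chain rule, the sketch below compares the end-to-end effect of a parameter on the output with the product of the two local effects along the path. The functions `v = a + b * c` and `J = 3 * v` are hypothetical and assumed only for illustration.

```python
# Hypothetical functions, assumed for illustration: v = a + b * c and J = 3 * v
def v_of(a, b, c):
    return a + b * c

def J_of(v):
    return 3 * v

a, b, c, eps = 5.0, 3.0, 2.0, 1e-6
v0 = v_of(a, b, c)

# Local effect of a on v, with b and c held fixed
dv_da = (v_of(a + eps, b, c) - v0) / eps
# Local effect of v on J
dJ_dv = (J_of(v0 + eps) - J_of(v0)) / eps
# End-to-end effect of a on J, passing through v
dJ_da = (J_of(v_of(a + eps, b, c)) - J_of(v0)) / eps

print(dJ_da, dJ_dv * dv_da)
```

The two printed numbers should agree up to floating-point error, which is the chain rule stated numerically: the indirect effect of the parameter on the output is the product of the effects along the path.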
tldr
- A partial derivative says that a small increase in the bottom value (the denominator) will cause the top value (the numerator) to increase by that amount times the partial derivative
- All partial derivatives are an observed change in the function associated with the numerator
- All partial derivatives find this change by:
- Nudging the input associated with the denominator
- Holding all other parameters and functions fixed
- The chain rule is a formula to compute the partial derivatives
- Its formula shows that partial derivatives are influenced by the direct and indirect changes of dependent parameters