Computing Derivatives

Motivating Derivatives

  • Before diving into the backpropagation algorithm, we should understand why we care about the partial derivatives $\frac{\partial J(w,b)}{\partial w}$ and $\frac{\partial J(w,b)}{\partial b}$
  • Graphing out the computation of derivatives can help us understand how partial derivatives are used
  • Specifically, we'll use a graph to determine the partial derivatives of $J(w,b)$ with respect to $b$ and $w$
  • Essentially, a partial derivative $\frac{\partial J}{\partial v}$ says a small increase of $c$ in the bottom value $v$ will cause the top value $J$ to increase by approximately $c \times \frac{\partial J}{\partial v}$
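
The last point can be written as a first-order approximation. This is a sketch in standard calculus notation rather than a formula taken from these notes, and $c$ here is the size of the nudge, not the input $c$ that appears in the example further down:

```latex
J(v + c) \approx J(v) + c \, \frac{\partial J}{\partial v}
```

The approximation becomes exact when $J$ is linear in $v$, as it is for $J = 3v$.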

Example of Computing Derivatives

  • Suppose we have the following functions and inputs:

[Figure: computation graph for $v = a + bc$ and $J = 3v$, with $a = 5$, $v = 11$, and $J = 33$]

  • Let's focus on the parameter $a$ for now
  • Sometimes, we'll want to increase or decrease $a$
  • Changing $a$ will also change $J$ as a result
  • Sometimes, we want to know how $J$ will change before we change $a$
  • If we plug $5.001$ in for $a$ instead of $5$, then we'll notice:
$$a = 5 \to 5.001 \qquad v = 11 \to 11.001 \qquad J = 33 \to 33.003$$
  • So, increasing $a$ by $0.001$ leads to a change in $J$ of $0.003$
  • If we change $a$ by a larger amount, then we'll notice:
$$a = 5 \to 15 \qquad v = 11 \to 21 \qquad J = 33 \to 63$$
  • So, increasing $a$ by $10$ leads to a change in $J$ of $30$
  • If we keep changing $a$ by some value, then we'll notice that $J$ will always increase by $3$ times that value (this holds for a change of any size here because $J$ is linear in $a$)
  • In other words, a one-unit increase in $a$ will lead to a three-unit increase in $J$
  • We call this rate of change the partial derivative of $J$ with respect to $a$:
$$\frac{\partial J}{\partial a} = 3$$
  • Usually, we don't want to manually adjust $a$ and observe changes in $J$ to find the partial derivative
  • Luckily, we can use calculus to determine the partial derivative of some function $J$ with respect to parameter $a$:
$$J = 3v = 3a + 3bc \qquad \frac{\partial J}{\partial a} = 3$$
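
The nudging experiment above can be sketched in a few lines of Python. This is a minimal illustration, not code from the original notes: the function name `forward` and the helper `nudge_a` are hypothetical, and the values $b = 3$ and $c = 2$ are assumptions chosen only so that $bc = 6$ and $v = 11$, matching the numbers in the example.

```python
def forward(a, b, c):
    """Evaluate the example's two functions: v = a + b*c, then J = 3*v."""
    v = a + b * c
    J = 3 * v
    return v, J


# a = 5 comes from the example; b = 3 and c = 2 are assumed values
# (any pair with b*c = 6 reproduces v = 11 and J = 33).
a, b, c = 5.0, 3.0, 2.0


def nudge_a(delta):
    """Return (change in J) / (change in a) when a is nudged by delta."""
    _, J_before = forward(a, b, c)
    _, J_after = forward(a + delta, b, c)
    return (J_after - J_before) / delta


slope_small = nudge_a(0.001)  # the small nudge from the example
slope_large = nudge_a(10.0)   # the large nudge from the example

# Both ratios equal 3 up to floating-point rounding.
print(slope_small, slope_large)
```

Dividing the change in $J$ by the change in $a$ gives the same ratio for both nudges, which is the value that calculus gives directly.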

An Assumption about Derivatives

  • As we saw previously, changing $a$ by some amount will lead to a change in $J$ of $3$ times that amount
  • Stated another way, the partial derivative $\frac{\partial J}{\partial a} = 3$
  • We can also determine partial derivatives of other values with respect to $a$
  • For example, if we change $a$ by $0.001$ again, then we'll notice:
$$a = 5 \to 5.001 \qquad v = 11 \to 11.001$$
  • So, increasing $a$ by $0.001$ leads to a change in $v$ of $0.001$
  • In other words, they change by the same amount
  • Therefore, our partial derivative looks like the following:
$$\frac{\partial v}{\partial a} = 1$$
  • Notice, $v$ can also change if we change $b$ or $c$:
$$v = a + bc$$
  • Therefore, we're making an assumption when we attribute a $0.001$ change in $v$ to a $0.001$ change in $a$
  • Specifically, we're assuming that both $b$ and $c$ remain fixed
  • The partial derivative $\frac{\partial v}{\partial a}$ makes this assumption as well
  • In other words, all partial derivatives nudge the input associated with the denominator, observe changes in the function associated with the numerator, and hold all other parameters and functions fixed
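
The "nudge one input, hold the rest fixed" idea can be sketched as a small numerical helper. This is an illustrative sketch under stated assumptions: `partial` is a hypothetical helper (a central-difference estimate, not a method described in these notes), and $b = 3$, $c = 2$ are assumed values consistent with $v = 11$.

```python
def v(a, b, c):
    """The intermediate value from the example: v = a + b*c."""
    return a + b * c


def partial(f, args, index, h=1e-6):
    """Estimate the partial derivative of f with respect to args[index].

    Only the chosen argument is nudged (up and down by h);
    every other argument is held fixed.
    """
    up = list(args)
    down = list(args)
    up[index] += h
    down[index] -= h
    return (f(*up) - f(*down)) / (2 * h)


# a = 5 is from the example; b = 3 and c = 2 are assumed values.
point = (5.0, 3.0, 2.0)

dv_da = partial(v, point, 0)  # close to 1
dv_db = partial(v, point, 1)  # close to c = 2
dv_dc = partial(v, point, 2)  # close to b = 3
print(dv_da, dv_db, dv_dc)
```

Nudging $b$ or $c$ instead of $a$ gives different slopes, which is why the partial derivative must state which input is being changed.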

Observing the Chain Rule

  • We've already noticed the relationship between $a$, $v$, and $J$ when we made a change to $a$ and observed the change in $J$:
$$a = 5 \to 5.001 \qquad v = 11 \to 11.001 \qquad J = 33 \to 33.003$$
  • From this, we've seen that $\frac{\partial J}{\partial a} = 3$ because $a$ influences $v$, and $v$ influences $J$
$$a \to v \to J$$
  • In other words, partial derivatives depend on both the direct and indirect effects of parameters
  • This concept is captured by the chain rule:
$$\frac{\partial J}{\partial a} = \frac{\partial J}{\partial v} \frac{\partial v}{\partial a} = 3 \times 1 = 3$$
  • The chain rule is the calculus behind the partial derivatives we computed previously
  • Applying the chain rule is a major step in the backpropagation algorithm
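
As a rough sketch, the chain rule for this example amounts to multiplying local derivatives along the path $a \to v \to J$. The code below is an assumption-laden illustration, not the notes' own implementation: the variable names are hypothetical, $b = 3$ and $c = 2$ are assumed values, and the derivatives with respect to $b$ and $c$ are included only to show the same pattern applied to the other inputs.

```python
# a = 5 is from the example; b = 3 and c = 2 are assumed values.
a, b, c = 5.0, 3.0, 2.0

# Forward pass through the two functions of the example.
v = a + b * c   # 11.0
J = 3 * v       # 33.0

# Local derivatives of each step, taken one function at a time.
dJ_dv = 3.0     # J = 3v
dv_da = 1.0     # v = a + bc, with b and c held fixed
dv_db = c       # v = a + bc, with a and c held fixed
dv_dc = b       # v = a + bc, with a and b held fixed

# Chain rule: multiply the local derivatives along each path to J.
dJ_da = dJ_dv * dv_da   # 3.0
dJ_db = dJ_dv * dv_db   # 6.0
dJ_dc = dJ_dv * dv_dc   # 9.0
print(dJ_da, dJ_db, dJ_dc)
```

Computing `dJ_dv` once and reusing it for every input is the same reuse of intermediate derivatives that backpropagation relies on.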

tldr

  • A partial derivative $\frac{\partial J}{\partial v}$ says a small increase of $c$ in the bottom value $v$ will cause the top value $J$ to increase by approximately $c \times \frac{\partial J}{\partial v}$
  • Every partial derivative measures the change in the function associated with the numerator
  • All partial derivatives find this change by:

    • Nudging the input associated with the denominator
    • Holding all other parameters and functions fixed
  • The chain rule is a formula for computing partial derivatives through intermediate values
  • Its formula shows that partial derivatives are influenced by both the direct and indirect effects of a parameter on the output
