Orthogonalization

Introducing a Conceptual Framework for Deep Learning

  • We can represent the deep learning process in a few general steps

    1. Optimize a cost function J using any of the following:

      • Gradient descent
      • Momentum
      • RMSprop
      • Adam
      • etc.
    2. Reduce overfitting using any of the following:

      • Regularization
      • Getting more data
      • etc.
  • Most of the complication around deep learning stems from tuning the hyperparameters for each algorithm
  • We split the process into these two steps because our goal is to find two algorithms:

    • One algorithm that best optimizes the cost function
    • Another algorithm that best prevents overfitting

Optimizing a Cost Function

  • As stated previously, optimizing a cost function J is the first step in our framework
  • For this step, our only goal is to find the parameters w and b that minimize J
  • Common methods for optimizing a cost function include gradient descent and Adam
  • However, we can use other approaches, such as RMSprop, Momentum, etc.
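  • As a concrete illustration, the sketch below uses PyTorch (an assumption; the notes do not name a framework) to minimize a mean-squared-error cost J over parameters w and b with plain gradient descent; the data and hyperparameter values are made up, and the commented-out optimizers are drop-in alternatives

```python
# Minimal sketch of step 1 (PyTorch assumed): find w and b that minimize J.
# The data and hyperparameters here are synthetic and purely illustrative.
import torch

torch.manual_seed(0)
X = torch.randn(100, 3)                        # 100 examples, 3 features
y = X @ torch.tensor([1.0, -2.0, 0.5]) + 3.0   # synthetic targets

w = torch.zeros(3, requires_grad=True)         # parameters to learn
b = torch.zeros(1, requires_grad=True)

# Any of these optimizers works for this step; only the update rule changes.
optimizer = torch.optim.SGD([w, b], lr=0.1)                  # gradient descent
# optimizer = torch.optim.SGD([w, b], lr=0.1, momentum=0.9)  # Momentum
# optimizer = torch.optim.RMSprop([w, b], lr=0.01)           # RMSprop
# optimizer = torch.optim.Adam([w, b], lr=0.01)              # Adam

for step in range(200):
    y_hat = X @ w + b
    J = torch.mean((y_hat - y) ** 2)  # cost function J(w, b)
    optimizer.zero_grad()
    J.backward()                      # compute dJ/dw and dJ/db
    optimizer.step()                  # update w and b

print(f"final cost J = {J.item():.4f}")
```

  • Swapping in Momentum, RMSprop, or Adam only changes how the gradients are used to update w and b; the goal of this step, driving J down, stays the same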

Reducing Overfitting

  • As stated previously, reducing overfitting is the second step in our framework
  • For this step, our only goal is to reduce variance
  • Common methods for reducing overfitting include regularization and getting more data
  • However, we can use other approaches, such as early stopping, data augmentation, etc., as sketched below
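  • The sketch below (PyTorch assumed; the model, data, and values are illustrative) combines two such approaches: L2 regularization via the optimizer's weight_decay term, and early stopping that keeps the parameters from the step with the lowest validation cost

```python
# Minimal sketch of step 2 (PyTorch assumed): reduce overfitting with
# L2 regularization and early stopping on synthetic data.
import torch
import torch.nn as nn

torch.manual_seed(0)
X = torch.randn(200, 3)
y = (X @ torch.tensor([1.0, -2.0, 0.5]) + 0.3 * torch.randn(200)).unsqueeze(1)
X_train, y_train, X_val, y_val = X[:150], y[:150], X[150:], y[150:]

model = nn.Sequential(nn.Linear(3, 16), nn.ReLU(), nn.Linear(16, 1))

# L2 regularization: weight_decay penalizes large weights in the cost.
optimizer = torch.optim.Adam(model.parameters(), lr=0.01, weight_decay=1e-3)
cost_fn = nn.MSELoss()

# Early stopping: remember the parameters with the lowest validation cost.
best_val, best_state = float("inf"), None
for step in range(300):
    optimizer.zero_grad()
    cost_fn(model(X_train), y_train).backward()
    optimizer.step()
    with torch.no_grad():
        val_cost = cost_fn(model(X_val), y_val).item()
    if val_cost < best_val:
        best_val = val_cost
        best_state = {k: v.clone() for k, v in model.state_dict().items()}

model.load_state_dict(best_state)
print(f"best validation cost = {best_val:.4f}")
```

  • Both knobs target variance: weight_decay discourages large weights, and early stopping halts before the model starts fitting noise in the training set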

Defining Orthogonalization

  • Orthogonalization is not specific to deep learning
  • Orthogonalization is a general concept that carries a different meaning in mathematics, computer science, and debate
  • Roughly speaking, orthogonalization refers to components acting independently of each other
  • Said another way, two things are orthogonal if they act in isolation of each other
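  • In linear algebra, for instance, this independence shows up as a zero dot product: moving along one orthogonal direction leaves the component along the other unchanged

```latex
% Orthogonality in the linear-algebra sense (zero dot product):
u \perp v \quad\Longleftrightarrow\quad u \cdot v = \sum_i u_i v_i = 0
```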

Relating Orthogonalization to our Framework

  • Here, orthogonalization refers to thinking of one task at a time
  • Generally speaking, organizing the deep learning process into these two orthogonal tasks removes much of the complication around tuning, helping both our development time and our general intuition
  • In other words, we should execute these two tasks independently when designing a neural network, as the sketch below illustrates
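  • One way to make the separation explicit in code is to keep the two groups of hyperparameters apart and tune them one at a time: first the knobs that drive J down, then the knobs that fight overfitting; the sketch below (PyTorch assumed, with made-up names and values) is just one possible arrangement, not a prescribed recipe

```python
# Sketch of keeping the two tasks orthogonal (PyTorch assumed; names and
# values are illustrative).
import torch
import torch.nn as nn

def build(optimization_knobs, regularization_knobs):
    model = nn.Sequential(
        nn.Linear(3, 16),
        nn.ReLU(),
        nn.Dropout(p=regularization_knobs["dropout"]),       # overfitting knob
        nn.Linear(16, 1),
    )
    optimizer = torch.optim.Adam(
        model.parameters(),
        lr=optimization_knobs["learning_rate"],               # optimization knob
        weight_decay=regularization_knobs["weight_decay"],    # overfitting knob
    )
    return model, optimizer

# Task 1: tune these until the training cost J is low enough.
optimization_knobs = {"learning_rate": 1e-3}
# Task 2: with task 1 settled, tune these until the train/validation gap shrinks.
regularization_knobs = {"dropout": 0.2, "weight_decay": 1e-4}

model, optimizer = build(optimization_knobs, regularization_knobs)
```

  • The point is the separation itself: we should not reach for the learning rate to fight overfitting, nor for the dropout rate to fix an optimizer that cannot drive J down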

tldr

  • Orthogonalization is not specific to deep learning
  • Roughly speaking, orthogonalization refers to components acting independently of each other
  • Said another way, two things are orthogonal if they act in isolation of each other
  • The deep learning process can be generalized into two orthogonal tasks:

    1. Optimizing a cost function J
    2. Reducing overfitting
