Pooling Layer

Describing Max Pooling

  • Max pooling attempts to create an abstract representation of an image using fewer dimensions
  • A filter-sized window slides over the input as in convolution, but at each position max pooling simply outputs the maximum of the values the window covers
  • Taking the max over a region summarizes whether some feature (detected by an earlier filter) is present anywhere in that region of the input
  • A high value tends to indicate that a feature exists in that region
  • This reduces the height and width of the input, while the number of channels stays the same
  • As a result, max pooling layers offer the following benefits:

    • They help make the network less prone to overfitting (i.e. accuracy benefit)
    • They don't require parameter learning (i.e. speed benefit)
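
As a minimal sketch of the idea above, the following pure-Python snippet applies non-overlapping 2×2 max pooling to a small 4×4 feature map. The function name `max_pool_2x2` and the sample values are made up for illustration, and the sketch assumes the height and width are even. Note that nothing here is learned: the output depends only on the input values.

```python
def max_pool_2x2(feature_map):
    """Non-overlapping 2x2 max pooling over a 2D list.

    Assumes (for this sketch) that height and width are both even.
    """
    pooled = []
    for i in range(0, len(feature_map), 2):
        row = []
        for j in range(0, len(feature_map[0]), 2):
            # Collect the four values covered by the 2x2 window
            window = [
                feature_map[i][j], feature_map[i][j + 1],
                feature_map[i + 1][j], feature_map[i + 1][j + 1],
            ]
            row.append(max(window))
        pooled.append(row)
    return pooled


# Hypothetical 4x4 feature map; high values mark a strong feature response
feature_map = [
    [1, 3, 2, 1],
    [2, 9, 1, 1],
    [1, 3, 2, 3],
    [5, 6, 1, 2],
]

pooled = max_pool_2x2(feature_map)
print(pooled)  # [[9, 2], [6, 3]]
```

The 4×4 input becomes a 2×2 output, and the strongest response in each region (for example the 9 in the top-left block) is what survives.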

[Figure: downsampling]

Introducing Common Pooling Methods

  • There are two common forms of pooling:

    • Max pooling
    • Average pooling
  • Average pooling is used far less often than max pooling
  • When it is used, it reduces the dimensions of an image by taking the mean of each region, in an attempt to best capture the image's overall properties
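
To make the difference between the two methods concrete, here is a small sketch that pools a single 2×2 window both ways. The helper `pool_window` and its `mode` argument are hypothetical names chosen for this example, not part of any library.

```python
def pool_window(window, mode="max"):
    """Reduce one 2D window (list of rows) to a single value."""
    values = [v for row in window for v in row]
    if mode == "max":
        return max(values)
    if mode == "average":
        return sum(values) / len(values)
    raise ValueError("mode must be 'max' or 'average'")


# Hypothetical 2x2 region of a feature map
window = [[1, 3],
          [2, 9]]

max_value = pool_window(window, "max")
avg_value = pool_window(window, "average")
print(max_value)  # 9
print(avg_value)  # 3.75
```

Max pooling keeps only the strongest response in the region, whereas average pooling blends all four values, so one strong activation is diluted by its weaker neighbours.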

Implementing Max Pooling

  • Max pooling doesn't rely on any learnable parameters
  • Instead, max pooling only relies on the following hyperparameters:

    • $f^{[l]}$: The size of the filter in the $l^{th}$ layer
    • $p^{[l]}$: The amount of padding in the $l^{th}$ layer
    • $s^{[l]}$: The stride in the $l^{th}$ layer
  • We almost always set $p=0$
  • The most common choices of hyperparameters are the following:

    • $f=2$ and $s=2$ (and $p=0$)
    • $f=3$ and $s=2$ (and $p=0$)
  • An input image will have the following dimensions:
$$n_{h}^{[l]} \times n_{w}^{[l]} \times n_{c}^{[l]}$$
  • An output image will have the following dimensions:
$$\left\lfloor \frac{n_{h}^{[l]}-f}{s} + 1 \right\rfloor \times \left\lfloor \frac{n_{w}^{[l]}-f}{s} + 1 \right\rfloor \times n_{c}^{[l]}$$
  • The following is an example of max pooling:

[Figure: maxpool]
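
The hyperparameters and output-size formula above can be sketched in plain Python as follows. This is an illustrative implementation under stated assumptions, not a reference one: it assumes $p=0$, represents the image as a nested list indexed `[height][width][channel]`, and the names `pooled_size` and `max_pool` are made up for this example.

```python
def pooled_size(n, f, s):
    """Output length along one axis: floor((n - f) / s + 1), assuming p = 0."""
    return (n - f) // s + 1


def max_pool(image, f=2, s=2):
    """Max pooling over a nested list shaped [height][width][channels].

    Each channel is pooled independently, so the channel count is unchanged.
    """
    n_h, n_w, n_c = len(image), len(image[0]), len(image[0][0])
    out_h, out_w = pooled_size(n_h, f, s), pooled_size(n_w, f, s)

    output = []
    for i in range(out_h):
        out_row = []
        for j in range(out_w):
            pixel = []
            for c in range(n_c):
                # Values covered by the f x f window whose top-left corner
                # sits at (i * s, j * s) in channel c
                window = [
                    image[i * s + di][j * s + dj][c]
                    for di in range(f)
                    for dj in range(f)
                ]
                pixel.append(max(window))
            out_row.append(pixel)
        output.append(out_row)
    return output


# Hypothetical 5x5 single-channel image whose value at (i, j) is 5*i + j
image = [[[5 * i + j] for j in range(5)] for i in range(5)]

result = max_pool(image, f=3, s=2)
print(pooled_size(5, 3, 2))  # 2
print(result)                # [[[12], [14]], [[22], [24]]]
```

With $f=3$ and $s=2$ the 5×5×1 input shrinks to 2×2×1, matching the formula. With the more common $f=2$, $s=2$, a 28×28 input would shrink to 14×14, since `pooled_size(28, 2, 2)` evaluates to 14.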


tldr

  • Max pooling attempts to create an abstract representation of an image using fewer dimensions
  • A filter-sized window slides over the input as in convolution, but at each position max pooling simply outputs the maximum of the values the window covers
  • Taking the max over a region summarizes whether some feature (detected by an earlier filter) is present anywhere in that region of the input
  • A high value tends to indicate that a feature exists in that region
  • This reduces the height and width of the input, while the number of channels stays the same
  • As a result, max pooling layers offer the following benefits:

    • They help make the network less prone to overfitting (i.e. accuracy benefit)
    • They don't require parameter learning (i.e. speed benefit)
