Introducing Matrix Filters
- A matrix filter is just a matrix of numbers
- These numbers could be positive or negative numbers
- These numbers are usually small, whole numbers
- Some filters may use decimal fractions or much larger numbers
- New images are created by transforming some old image via a matrix filter
- Thus, a matrix filter represents a transformation of an old image
- A new image represents the output of that transformation
- The transformation takes the following into account:
- What is a given pixel value from the old image?
- What are the surrounding pixel values of that given pixel from the old image?
- We will denote a matrix filter as the following:
Introducing the Convolution Operation
- In order to perform the transformation represented by our matrix filter, we need to perform a special mathematical operation on the old image
- In machine learning, we refer to this operation as convolution
- Mathematicians refer to this operation as cross-correlation, since a strict mathematical convolution also flips the filter before sliding it over the image
- Roughly, we can think of convolution as a measure of similarity
- Specifically, it measures the similarity of groups of pixels
- The following is an example of convolution:
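As a minimal sketch of the operation described above, the following slides a filter over an image and sums the elementwise products at each position. The function name `cross_correlate2d`, the 4x4 image, and the 2x2 filter are illustrative assumptions rather than the original example, and the sketch assumes no padding and a step of one pixel.

```python
import numpy as np

def cross_correlate2d(image, kernel):
    """Slide `kernel` over `image` and sum the elementwise products.

    Assumes no padding and a step of one pixel (hypothetical helper).
    """
    n_rows, n_cols = image.shape
    f_rows, f_cols = kernel.shape
    out = np.zeros((n_rows - f_rows + 1, n_cols - f_cols + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # Group of pixels currently covered by the filter
            patch = image[i:i + f_rows, j:j + f_cols]
            out[i, j] = np.sum(patch * kernel)
    return out

# Illustrative image: bright left half, dark right half
image = np.array([
    [10, 10, 0, 0],
    [10, 10, 0, 0],
    [10, 10, 0, 0],
    [10, 10, 0, 0],
], dtype=float)

# Illustrative filter that responds to a bright-to-dark change from left to right
kernel = np.array([
    [1, -1],
    [1, -1],
], dtype=float)

result = cross_correlate2d(image, kernel)
print(result)  # each row is [0, 20, 0]
```

The output is largest where the group of pixels under the filter looks most like the filter itself, which is the sense in which the operation measures similarity.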
Using Convolution in Edge Detection
- We can use the convolution operator and matrix filters for many different purposes
- For example, we can convolve an image with a filter to perform edge detection on that image
- Different filters will lead to different ways to detect edges
- Specifically, we can detect vertical edges in an image using the following filter:
- We can detect horizontal edges in an image using the filter:
- There are a few other filters that allow us to better detect vertical edges in more complicated images:
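The filter figures did not survive in these notes, so the sketch below uses the values commonly quoted for a simple vertical edge filter, its horizontal counterpart, and the Sobel and Scharr vertical filters. Treat the specific matrices, the helper `cross_correlate2d`, and the 6x6 test image as assumptions for illustration.

```python
import numpy as np

def cross_correlate2d(image, kernel):
    """Unpadded, step-of-one sliding sum of elementwise products (hypothetical helper)."""
    f_rows, f_cols = kernel.shape
    out = np.zeros((image.shape[0] - f_rows + 1, image.shape[1] - f_cols + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + f_rows, j:j + f_cols] * kernel)
    return out

# Commonly used filter values (assumed here, not taken from the original figures)
vertical = np.array([[1, 0, -1],
                     [1, 0, -1],
                     [1, 0, -1]], dtype=float)
horizontal = vertical.T  # rows of 1, 0, -1
sobel_vertical = np.array([[1, 0, -1],
                           [2, 0, -2],
                           [1, 0, -1]], dtype=float)
scharr_vertical = np.array([[3, 0, -3],
                            [10, 0, -10],
                            [3, 0, -3]], dtype=float)

# Illustrative 6x6 image with one vertical edge down the middle
image = np.zeros((6, 6))
image[:, :3] = 10

v_edges = cross_correlate2d(image, vertical)
h_edges = cross_correlate2d(image, horizontal)
sobel_edges = cross_correlate2d(image, sobel_vertical)
scharr_edges = cross_correlate2d(image, scharr_vertical)

print(v_edges[0])      # strong response in the middle columns
print(h_edges[0])      # no response: the image has no horizontal edge
print(sobel_edges[0])
print(scharr_edges[0])
```

The Sobel and Scharr variants put more weight on the middle row, which tends to make the response less sensitive to noise in more complicated images.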
Convolution in Deep Learning
- With the rise of deep learning, we don't need to have computer vision researchers handpick the numbers within our matrix filter
- If we treat those numbers as parameters, then we can learn them using backward propagation
- Then, our filter can become the standard vertical, Sobel, or Scharr filter depending on which filter best detects edges in our image
- More importantly, our filter could become a different matrix that captures edges in our complicated image even better than any of those three hand-coded filters
- In this case, our filter could learn to detect edges at any orientation, not only vertical or horizontal ones
- Fundamentally, the convolution operation allows backward propagation to learn the best filter for our image and use-case
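To make the idea of learning a filter concrete, here is a rough sketch that treats the nine filter entries as parameters and fits them with plain gradient descent. The gradient is derived by hand for a mean squared error loss, standing in for what backward propagation would compute in a full network. The random image, the target filter, the learning rate, and the step count are all assumptions chosen for illustration.

```python
import numpy as np

def cross_correlate2d(image, kernel):
    """Unpadded, step-of-one sliding sum of elementwise products (hypothetical helper)."""
    f_rows, f_cols = kernel.shape
    out = np.zeros((image.shape[0] - f_rows + 1, image.shape[1] - f_cols + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + f_rows, j:j + f_cols] * kernel)
    return out

rng = np.random.default_rng(0)
image = rng.random((8, 8))  # illustrative image with pixel values in [0, 1)

# Pretend the "best" filter for this task is a vertical edge filter (assumption)
target_filter = np.array([[1, 0, -1]] * 3, dtype=float)
target = cross_correlate2d(image, target_filter)

learned = np.zeros((3, 3))  # filter entries treated as learnable parameters
learning_rate = 0.05
losses = []

for step in range(500):
    error = cross_correlate2d(image, learned) - target
    losses.append(np.mean(error ** 2))

    # d(loss)/d(learned[i, j]) = 2 * mean(error * image shifted by (i, j))
    grad = np.zeros_like(learned)
    rows, cols = error.shape
    for i in range(3):
        for j in range(3):
            grad[i, j] = 2 * np.mean(error * image[i:i + rows, j:j + cols])

    learned -= learning_rate * grad

print(losses[0], losses[-1])
print(np.round(learned, 2))
```

The loss falls as the learned entries move toward a filter that reproduces the target output; nothing in the update rule knows in advance that the answer is an edge detector.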
Motivating Issues with our Example
- Let's return to the previous example involving convolution
- We essentially convolved the following matrices:
- From this, we can gather two issues:
- Our resulting image shrinks
- This is a bigger problem when we apply convolution on the convolved image many times
- In this case, our image could shrink down to a single pixel
- Pixels in the corners and along the edges have less influence on the output
- Our filter is applied to the pixels in the center of the image more often than the pixels along the edges
- In other words, the information from pixels on the corners and edges of the image is thrown out, roughly speaking
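Both issues can be checked numerically. The sketch below assumes a 7x7 image and a 3x3 filter (illustrative sizes, not the ones from the original example), with no padding and a step of one pixel. It tracks how the image size shrinks under repeated convolution and counts how many filter positions touch each input pixel.

```python
import numpy as np

n, f = 7, 3  # illustrative image and filter sizes (assumed)

# Issue 1: each unpadded convolution shrinks an n x n image to (n - f + 1) x (n - f + 1)
size = n
sizes = [size]
while size - f + 1 >= 1:
    size = size - f + 1
    sizes.append(size)
print(sizes)  # [7, 5, 3, 1]

# Issue 2: count how many filter positions cover each input pixel
counts = np.zeros((n, n), dtype=int)
out = n - f + 1
for i in range(out):
    for j in range(out):
        counts[i:i + f, j:j + f] += 1
print(counts)
```

A corner pixel is covered by a single filter position, while a pixel near the center is covered by nine, so the center contributes to far more output values.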
Introducing Padding
- Padding can solve these two issues
- Padding refers to adding a border around an input image so we can get the desired dimensions in our output image
- Specifically, the border has a width of $p$ pixels
- Typically, the values of the padded border are set to $0$
- Using the previous example from above, padding the input image and convolving the padded image with our filter would look like:
- We can see that the output image has the same dimensions as our original input image
- Here, the padding is just large enough to return an image of the original size
- However, the amount of padding will need to change depending on the dimensions of our filter matrix
- To return an image with the same dimensions as our input image (with a stride of 1), we should set the padding $p$ to be $p = \frac{f - 1}{2}$, where $f$ is the dimension of our filter matrix
- Note, the dimensions of our filter matrix are typically odd
- There are two reasons that $f$ is usually odd:
- The padding is always symmetrical
- In other words, we don't need to pad more on the left or pad more on the right when $f$ is odd
- The filter matrix has a central position
- Then we can refer to the position of a filter by its center
- It's nice to have a central position or a central pixel that we can refer to
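The padding rule can be sketched with NumPy's `np.pad`. The helper names `same_padding` and `cross_correlate2d`, the 6x6 image, and the 3x3 filter of ones are assumptions for illustration; the sketch assumes zero padding, an odd filter size, and a step of one pixel.

```python
import numpy as np

def same_padding(f):
    """Padding that preserves the input size for an odd f x f filter (hypothetical helper)."""
    if f % 2 == 0:
        raise ValueError("this sketch assumes an odd filter size")
    return (f - 1) // 2

def cross_correlate2d(image, kernel):
    """Unpadded, step-of-one sliding sum of elementwise products (hypothetical helper)."""
    f_rows, f_cols = kernel.shape
    out = np.zeros((image.shape[0] - f_rows + 1, image.shape[1] - f_cols + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + f_rows, j:j + f_cols] * kernel)
    return out

image = np.arange(36, dtype=float).reshape(6, 6)  # illustrative 6x6 image
kernel = np.ones((3, 3))                          # illustrative 3x3 filter

p = same_padding(kernel.shape[0])
# Add a border of p zero-valued pixels on every side
padded = np.pad(image, pad_width=p, mode="constant", constant_values=0)
output = cross_correlate2d(padded, kernel)

print(p, padded.shape, output.shape)
```

Because the border is the same width on every side, the filter's central position lines up with each original pixel in turn, including those in the corners.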
Introducing Strides
- There is one more operation that is often used in convolutional neural networks
- That operation is called stride
- The stride is the number of pixels the filter moves between successive steps of the convolution
- The dimensions of our output image are governed by the following:
- $n$: the dimension of our input matrix
- $f$: the dimension of our filter matrix
- $p$: the amount of padding applied to the input matrix
- $s$: the stride taken during our convolution
- In other words, our output image has dimensions $\left( \left\lfloor \frac{n + 2p - f}{s} \right\rfloor + 1 \right) \times \left( \left\lfloor \frac{n + 2p - f}{s} \right\rfloor + 1 \right)$
- The symbols $\lfloor$ and $\rfloor$ refer to the floor operation
- In other words, $\frac{n + 2p - f}{s}$ is always rounded down if it isn't an integer
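The output-size rule can be checked against an actual strided convolution. In this sketch the function names `conv_output_size` and `strided_cross_correlate2d` are hypothetical, the image and filter are assumed to be square, and padding is assumed to use zeros.

```python
import numpy as np

def conv_output_size(n, f, p=0, s=1):
    """Output dimension for input size n, filter size f, padding p, stride s."""
    # Integer division performs the floor operation
    return (n + 2 * p - f) // s + 1

def strided_cross_correlate2d(image, kernel, p=0, s=1):
    """Zero-pad a square image by p, then slide a square kernel in steps of s."""
    padded = np.pad(image, pad_width=p, mode="constant", constant_values=0)
    f = kernel.shape[0]
    out_size = conv_output_size(image.shape[0], f, p, s)
    out = np.zeros((out_size, out_size))
    for i in range(out_size):
        for j in range(out_size):
            # The top-left corner of each patch moves s pixels per step
            patch = padded[i * s:i * s + f, j * s:j * s + f]
            out[i, j] = np.sum(patch * kernel)
    return out

image = np.arange(49, dtype=float).reshape(7, 7)  # illustrative 7x7 image
kernel = np.ones((3, 3))                          # illustrative 3x3 filter

print(conv_output_size(6, 3))            # 4: no padding, stride 1
print(conv_output_size(6, 3, p=1))       # 6: padding restores the input size
print(conv_output_size(7, 3, s=2))       # 3: stride 2
print(conv_output_size(6, 3, s=2))       # 2: 3 / 2 is rounded down before adding 1
print(strided_cross_correlate2d(image, kernel, s=2).shape)
```

Rounding down corresponds to skipping any final filter position that would hang over the edge of the (padded) image.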
Motivating Convolutions over Volumes
- We can convolve over a stack of images known as a volume
- For example, we can convolve over an RGB image
- In this case, we would create a volume, where each of the layers represents a red, green, or blue channel
- Notice, the number of channels in our image must match the number of channels in our filter
- We can apply multiple filters to a single volume
- This will lead to our output image having multiple channels:
- In theory, we can have a filter that only looks at certain channels
- In other words, we can have a filter that only looks at the red channel, blue channel, or green channel
- For example, a filter only looking for vertical edges in the red channel could look like the following:
- On the other hand, a filter that looks at vertical edges in each channel could look like the following:
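The original filter figures are missing here, so the following is a sketch under assumptions: a 6x6x3 image, a 3x3x3 filter built from the commonly used vertical edge values, and a hypothetical helper `convolve_volume`. It shows a filter that only looks at the red channel, a filter that looks at all three channels, and how applying both filters yields an output with two channels.

```python
import numpy as np

def convolve_volume(volume, filt):
    """Slide an (f, f, channels) filter over an (n, n, channels) volume.

    Returns a 2-D map; assumes no padding and a step of one pixel.
    """
    if volume.shape[2] != filt.shape[2]:
        raise ValueError("image and filter must have the same number of channels")
    f_rows, f_cols, _ = filt.shape
    out = np.zeros((volume.shape[0] - f_rows + 1, volume.shape[1] - f_cols + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # Sum over height, width, and all channels at once
            out[i, j] = np.sum(volume[i:i + f_rows, j:j + f_cols, :] * filt)
    return out

vertical = np.array([[1, 0, -1]] * 3, dtype=float)  # assumed vertical edge values
zeros = np.zeros((3, 3))

# Channel order assumed to be red, green, blue along the last axis
red_only = np.stack([vertical, zeros, zeros], axis=-1)
all_channels = np.stack([vertical, vertical, vertical], axis=-1)

# Illustrative RGB image: bright left half in every channel
rgb = np.zeros((6, 6, 3))
rgb[:, :3, :] = 10

red_map = convolve_volume(rgb, red_only)
all_map = convolve_volume(rgb, all_channels)

# Applying two filters gives an output volume with two channels
output = np.stack([red_map, all_map], axis=-1)
print(red_map[0], all_map[0], output.shape)
```

Each filter collapses all input channels into a single 2-D map, so the number of output channels equals the number of filters rather than the number of input channels.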
Example of Convolution over Volumes
Alternative Visualization of Volumes
- We can also visualize images and filters as cubes
- Each pixel can be represented as a small block
- The following is an example of our stack represented using cubes:
tldr
- A matrix filter is just a matrix of numbers
- These numbers could be positive or negative numbers
- These numbers are usually small, whole numbers
- Some filters may use decimal fractions or much larger numbers
- New images are created by transforming some old image via a matrix filter
- Thus, a matrix filter represents a transformation of an old image
- A new image represents the output of that transformation
- Fundamentally, the convolution operation allows backward propagation to learn the best filter for our image and use-case