
What Is a Convolutional Neural Network and How Does It Work?

A convolutional neural network! It sounds like a strange combination of words, one that draws on several sciences at once: mathematics, biology, and computer science. These networks are considered one of the most important innovations in computer vision. The term neural network became very popular in 2012. That year, Alex Krizhevsky used a neural network to win the ImageNet competition (the annual Olympics of computer vision). Krizhevsky reduced the classification error from 26 percent to 15 percent.

Figure: convolutional neural network

This reduction was very significant and was considered a great success. Since then, many companies have used deep learning at the core of their products. Facebook uses neural networks to tag images automatically, and Google uses the technology for image search. Companies such as Amazon, Instagram, and Pinterest also use convolutional neural networks (CNNs) to show relevant recommendations to their users. Still, image processing is the most common use of these networks.

Why do we need a convolutional neural network?

Image classification is the task of taking an image as input and producing as output its class (dog, car, house, etc.) or the probability that it belongs to each class. For humans this is almost second nature: from the time we are born until we become adults, we learn it gradually and naturally. We can recognize nearly everything around us with little effort. More precisely, whenever we look at our surroundings, we recognize the objects in them and assign a label to each one. Identifying and naming the objects in a scene is not nearly so easy for a computer!

Input and output of a convolutional neural network (CNN)

When a computer receives an image as input, it sees an array of numbers. The size of the array depends on the size of the image in pixels. For example, if we give the computer a color image in JPG format with a size of 480 x 480 pixels, the corresponding array will have 480 x 480 x 3 cells (the 3 refers to the RGB channels). Each cell holds a number between 0 and 255.
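To make the array view concrete, here is a minimal sketch in Python with NumPy. It assumes a randomly generated array as a stand-in for a decoded 480 x 480 JPG, so the pixel values are made up; only the shape and value range matter here.

```python
import numpy as np

# Hypothetical stand-in for a decoded 480 x 480 color image.
# Random values replace a real JPG so that no file is needed.
rng = np.random.default_rng(seed=0)
image = rng.integers(low=0, high=256, size=(480, 480, 3), dtype=np.uint8)

print(image.shape)  # (height, width, channels)
print(image.size)   # total number of cells: 480 * 480 * 3
print(image[0, 0])  # the three values (R, G, B) of the top-left pixel
```

Every cell is an integer from 0 to 255, and this array is all the network gets to work with.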

This number indicates the pixel intensity. Although these numbers seem meaningless to us, they are the only tools we have when classifying images with a convolutional neural network. The basic idea is to give the computer an array of numbers like the one we described. The computer then produces an output such as: this image is a cat with 80% probability, a dog with 15% probability, and a bird with 5% probability.

What does a convolutional neural network do?

So far, we have become familiar with the problem and with its input and output. Now let's think about how to solve it. We want the computer to look at pictures, learn the unique features of a particular object, such as a book, and decide whether that object appears in a picture. We humans do this subconsciously when we recognize objects. For example, when we see a dog, we identify it by first noticing its distinguishing features, such as its paws and legs.

As these features match the patterns in our minds, we realize that we are looking at a dog. In a similar way, to understand a complex image such as a dog, a computer first recognizes simpler features, such as edges and curves. A neural network has several layers; each layer detects specific features, and in the last layer the image is fully identified. This is, in outline, how a convolutional neural network works. We will now go into more detail.

The connection between convolutional neural networks and biology!

We want to address some of the more basic concepts in this section. The first time you hear the term convolutional neural network, you probably think of biology and neuroscience. You would not be far off! The structure of a convolutional neural network (CNN) is inspired by the brain's visual cortex. In 1962, two scientists, Hubel and Wiesel, conducted an interesting experiment. They showed that certain cells in the visual cortex are stimulated by edges of particular orientations. For example, seeing horizontal lines stimulates certain cells.

Vertical lines, in turn, stimulate different cells. Hubel and Wiesel found that these cells are arranged neatly in columns and that, working together, they give us our visual perception of our surroundings. A convolutional neural network is built on the same idea as the visual cortex of our brain! A CNN has several layers, each specialized in detecting specific features, and together they produce the model's final interpretation of the image.

What is the structure of a convolutional neural network?

As mentioned, a convolutional neural network takes an image as input. The image then passes through a network made up of several convolutional and nonlinear layers. Each layer performs its own operations, and at the end the output is either a single class or a probability for each of several classes. The hard part is the middle layers and how they work! Below, we examine the most important layers.

The first layer in the convolutional neural network: mathematical concepts

The first layer in a convolutional neural network is always a convolutional layer. As mentioned earlier, the input of this layer is an array of numbers. The first layer works like a flashlight! In a dark room, imagine shining a flashlight on the upper left corner of the image: a region of the image lights up, and we see that part. We then move the flashlight over the other parts until we have seen the whole image.

Now let's retell this story in machine learning language! The flashlight is called a filter (or neuron, or kernel). The region of the image that the flashlight shines on is called the receptive field. A filter is itself an array of numbers, and the numbers in it are called weights or parameters. The depth of the filter must equal the depth of the image. For example, if the image is a 32 x 32 x 3 array (depth 3), the filter must also have depth 3, for instance 5 x 5 x 3.

At each position, the filter sees one part of the image. It then moves across the image to scan the other areas as well. This sliding of the filter over the image is called convolving.

As the filter passes over the image, the numbers in the filter are multiplied by the corresponding pixel values of the image. All of these products are then added together, giving a single number. Suppose we want to scan an image with dimensions 32 x 32 x 3 using a 5 x 5 x 3 filter. With the operation we described, this filter produces a numeric array with dimensions 28 x 28 x 1 (it is 28 x 28 because a 5 x 5 filter fits onto a 32 x 32 image in 32 - 5 + 1 = 28 positions along each side, or 784 positions in total). The resulting 28 x 28 x 1 array is called the activation map or feature map. If we use two filters instead of one, we end up with an array of 28 x 28 x 2. Using more filters lets the network capture more information about the image.
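The sliding, multiplying, and summing described above can be sketched in a few lines of Python with NumPy. This is a minimal illustration, assuming a stride of 1, no padding, and random numbers for both the image and the filters; the function name `convolve_valid` is made up for this example. Strictly speaking, the operation is cross-correlation (the filter is not flipped), which matches the description in the text.

```python
import numpy as np

def convolve_valid(image, kernel):
    """Slide one filter over the image (stride 1, no padding)."""
    img_h, img_w, img_d = image.shape
    k_h, k_w, k_d = kernel.shape
    assert img_d == k_d, "filter depth must equal image depth"
    out_h = img_h - k_h + 1
    out_w = img_w - k_w + 1
    feature_map = np.zeros((out_h, out_w))
    for row in range(out_h):
        for col in range(out_w):
            # The part of the image currently under the "flashlight".
            receptive_field = image[row:row + k_h, col:col + k_w, :]
            # Multiply cell by cell, then add everything into one number.
            feature_map[row, col] = np.sum(receptive_field * kernel)
    return feature_map

rng = np.random.default_rng(seed=0)
image = rng.random((32, 32, 3))                               # made-up image
filters = [rng.standard_normal((5, 5, 3)) for _ in range(2)]  # made-up filters

one_map = convolve_valid(image, filters[0])
two_maps = np.stack([convolve_valid(image, f) for f in filters], axis=-1)

print(one_map.shape)   # (28, 28)
print(two_maps.shape)  # (28, 28, 2)
```

Each filter contributes one 28 x 28 slice, so the depth of the output equals the number of filters.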

The first layer in the convolutional neural network: practical concepts

Let's step back and look at what a CNN does from a high level. Each of the filters mentioned in the previous section can be thought of as a feature identifier. A feature here means something like a straight line, a simple color, or a curve. Suppose the first filter is a 7 x 7 x 3 filter that acts as a curve detector. This filter is a matrix of numbers whose values are higher in the cells that lie along the curve. Now we place this filter on one part of our image. Then we multiply the values in the filter by the corresponding pixel values and add up the products.

When the region under the filter contains a similar curve, the result is a large number. A large value indicates that this area of the image contains a curve like the one the filter detects.

When the filter sits on a part of the image that does not match it, the result is a small number. As we mentioned, we are looking for an activation map, that is, an array of numbers with dimensions 26 x 26 x 1 (assuming we use only one curve-detector filter). The top left of this activation map will be 660. This large number indicates that there is likely a curve in that area of the image. Note that we used only one filter here. To extract more information from the image, we need more filters, and more filters mean a deeper activation map.
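The difference between a matching and a non-matching region can be sketched as follows. All numbers here are made up for illustration (a single-channel 7 x 7 filter with weight 30 along a hypothetical curve, and pixel intensity 50), so they are not meant to reproduce the 660 mentioned in the text.

```python
import numpy as np

# Hypothetical curve: the cells (row, column) that the detector responds to.
curve_cells = [(0, 4), (1, 4), (2, 3), (3, 3), (4, 2), (5, 1), (6, 0)]

# Curve-detector filter: high weights along the curve, zero elsewhere.
curve_filter = np.zeros((7, 7))
for row, col in curve_cells:
    curve_filter[row, col] = 30

# Image region that contains the same curve.
matching_patch = np.zeros((7, 7))
for row, col in curve_cells:
    matching_patch[row, col] = 50

# Image region with a short horizontal line that misses the curve entirely.
other_patch = np.zeros((7, 7))
other_patch[6, 3:7] = 50

matching_response = np.sum(curve_filter * matching_patch)
other_response = np.sum(curve_filter * other_patch)

print(matching_response)  # 7 cells * 30 * 50 = 10500
print(other_response)     # no overlap with the curve, so 0
```

The filter responds strongly only where the bright pixels line up with its high-weight cells.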

Deeper layers of a convolutional neural network

In addition to the layer just described, a neural network has other layers. These layers have different tasks and functions. In general, the inner layers are responsible for introducing nonlinearity and for preserving dimensions. The last layer in a convolutional neural network is also of particular importance.
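The text does not name the specific operations used in these inner layers. As one common illustration, and purely as an assumption, the sketch below uses ReLU for the nonlinearity and zero padding to preserve dimensions; the sizes (a 32 x 32 x 3 image and a 5 x 5 filter) reuse the earlier example.

```python
import numpy as np

def relu(feature_map):
    """Assumed nonlinearity: replace negative values with zero."""
    return np.maximum(feature_map, 0)

feature_map = np.array([[-3.0, 2.0],
                        [0.5, -1.0]])
activated = relu(feature_map)
print(activated)

# Assumed way of preserving dimensions: surround the image with zeros.
image = np.ones((32, 32, 3))
padded = np.pad(image, pad_width=((2, 2), (2, 2), (0, 0)), mode="constant")

# A 5 x 5 filter on the padded image fits in as many positions as the
# original image has pixels along each side.
output_side = padded.shape[0] - 5 + 1
print(padded.shape)  # (36, 36, 3)
print(output_side)   # 32
```

With two rows and columns of zeros on every side, a 5 x 5 filter yields an output as wide and tall as the original image.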

The last layer in the convolutional neural network

The last layer of a convolutional neural network receives the output of the other layers as its input. Its own output is an N-dimensional vector, where N is the number of classes. For example, if your network identifies digits, the number of classes is ten, because there are ten digits. Each component of the N-dimensional vector represents the probability of one class. The job of the last layer is to look at the high-level features produced by the preceding layers and measure how strongly they correspond to each class; the stronger the match, the higher the probability of that class.
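One common way to turn features into an N-dimensional probability vector is a fully connected layer followed by softmax. The text does not say which method is used, so treat the sketch below as an assumption; the feature vector length (64), the weights, and the bias are all made-up values.

```python
import numpy as np

def softmax(scores):
    """Turn arbitrary scores into probabilities that sum to 1."""
    shifted = scores - np.max(scores)  # subtract the max for numerical stability
    exps = np.exp(shifted)
    return exps / np.sum(exps)

rng = np.random.default_rng(seed=0)
num_classes = 10                    # the ten digits, 0 through 9

# Hypothetical inputs: flattened features from the earlier layers and
# the (untrained, random) parameters of the last layer.
features = rng.random(64)
weights = rng.standard_normal((num_classes, 64)) * 0.1
bias = np.zeros(num_classes)

scores = weights @ features + bias  # one score per class
probabilities = softmax(scores)
predicted_digit = int(np.argmax(probabilities))

print(probabilities.shape)  # (10,)
print(probabilities.sum())  # sums to 1, up to rounding
print(predicted_digit)
```

Because the weights here are random, the predicted digit is meaningless; only after training do the probabilities reflect what is in the image.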

How does a convolutional neural network work?

So far, you have learned a lot about convolutional neural networks, but you probably still have many questions, such as how the filters are created or how the computer assigns appropriate values to them. The computer does this during a training process called backpropagation. We humans did not understand the things around us when we were born. Over time, we saw different objects, the people around us told us their names, and we learned. Computers work in a similar way: at the beginning, the numbers in the filter matrix are random. Over time, as the computer is shown different images, the numbers in the filters are corrected until the network reaches acceptable performance.
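The idea of starting from random numbers and correcting them step by step can be sketched with a single 3 x 3 filter. This is a simplified illustration using plain gradient descent on made-up data. In a full network, backpropagation computes the same kind of gradient for every layer using the chain rule. The target filter, learning rate, and number of steps are assumptions chosen for the example.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Hypothetical "correct" filter (a vertical edge detector) that the
# training process should discover from examples.
true_filter = np.array([[1.0, 0.0, -1.0],
                        [1.0, 0.0, -1.0],
                        [1.0, 0.0, -1.0]])

# Made-up training data: image patches and the response we want for each.
patches = rng.random((200, 3, 3))
targets = np.sum(patches * true_filter, axis=(1, 2))

def mean_squared_error(candidate):
    predictions = np.sum(patches * candidate, axis=(1, 2))
    return np.mean((predictions - targets) ** 2)

learned_filter = rng.standard_normal((3, 3))  # random starting weights
learning_rate = 0.1
initial_loss = mean_squared_error(learned_filter)

for step in range(500):
    predictions = np.sum(patches * learned_filter, axis=(1, 2))
    errors = predictions - targets
    # Gradient of the mean squared error with respect to each weight.
    gradient = 2 * np.mean(errors[:, None, None] * patches, axis=0)
    learned_filter -= learning_rate * gradient  # correct the weights

final_loss = mean_squared_error(learned_filter)
print(initial_loss, final_loss)
print(np.round(learned_filter, 2))
```

After enough corrections, the random starting weights end up close to the filter that generated the targets.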

Testing a CNN

Once our model is trained, it's time to test it. For testing, we use a set of images whose contents we already know and that the model did not see during training. We feed each image to the model, look at the output, and check whether it matches what we know is in the image.
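A simple way to summarize such a test is accuracy: the fraction of test images for which the model's output matches the known label. The sketch below assumes the model's predictions are already available as a list; the labels and predictions are made up.

```python
def accuracy(predicted_labels, true_labels):
    """Fraction of predictions that match the known labels."""
    if len(predicted_labels) != len(true_labels):
        raise ValueError("both lists must have the same length")
    correct = sum(p == t for p, t in zip(predicted_labels, true_labels))
    return correct / len(true_labels)

# Hypothetical test set: what is really in each image, and what the model said.
true_labels = ["cat", "dog", "bird", "cat", "dog"]
predicted_labels = ["cat", "dog", "cat", "cat", "bird"]

test_accuracy = accuracy(predicted_labels, true_labels)
print(test_accuracy)  # 3 correct out of 5
```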

How do companies use convolutional neural networks?

Today, companies that collect more data have a strong advantage over their competitors. Data is the gold of the new world! The more data we feed a deep learning model, the more it can refine its weights and, in general, the better it performs. Companies like Facebook (with Instagram) and Pinterest can use huge amounts of visual data to build unique models.