Get started quickly with Keras: create your first "Santa Claus" image classification model

The last month of 2017 is here, so Christmas cannot be far behind. I do not know what plans or fond memories you have for Christmas. What I remember most clearly are the packaged apples handed out every Christmas Eve, which stand for peace and safety. And no Christmas is complete without its protagonist: "Santa Claus" appears on every street and in every picture album. This article leads readers through using Keras to classify "Santa Claus" images, as a warm-up activity before Christmas.

Before getting to the main content, readers may want to read this article first:

How to use Google Images to obtain training data

This tutorial has three parts. First, I introduce the data set used in this article. Second, I use Python and Keras to train a convolutional neural network that detects whether Santa Claus is present in an image; the network structure is similar to LeNet. Finally, I evaluate the model on a series of images, then discuss the limitations of the method and how to extend it.

"Santa Claus" and "Non-Santa Claus" data sets

To train the model, we need two sets of images:

  • Images that contain Santa Claus ("Santa Claus");
  • Images that do not contain Santa Claus ("Not Santa Claus").

Last week, we used Google Images to quickly build a training image data set. It contains 461 images of Santa Claus, as shown in Figure 1 (left). In addition, 461 images that do not contain Santa Claus were randomly sampled from the UKBench data set, as shown in Figure 1 (right).

A first image classifier based on a convolutional neural network and Keras

Figure 2 shows a typical LeNet network structure. It was originally used to classify handwritten digits, but it extends to other types of images. This tutorial focuses on applying deep learning to image classification, so Keras and Python syntax are not covered in detail. Interested readers can read the book Deep Learning for Computer Vision with Python. First, define the network architecture. Create a new file named lenet.py and insert the following code:

Lines 2-8 import the required Python packages: Conv2D performs convolution, MaxPooling2D performs max pooling, Activation applies a specific activation function, Flatten "flattens" the output of the convolutional layers so that it can be passed to the fully connected layers, and Dense is a fully connected layer. The LeNet network itself is defined on lines 10-12 of the code. Whenever I define a new convolutional neural network structure, I like to:

  • Put it in its own class (for namespace and ease of organization)
  • Create a static build function that constructs the entire model

The build function takes four parameters:

  • width: the width of the input image
  • height: the height of the input image
  • depth: the number of channels of the input image (1 for a single-channel grayscale image, 3 for a standard RGB image)
  • classes: the total number of classes that we want to recognize

Line 14 defines our model, line 15 initializes inputShape, and lines 18-19 update inputShape when the backend uses channels-first ordering.

Now that the model is initialized, we can start adding the other layers. The code is as follows:

Lines 21-25 create the first CONV->RELU->POOL block. The convolutional layer uses 20 filters of size 5x5, followed by a ReLU activation function and a max pooling operation with a 2x2 window.

Next, define the second CONV->RELU->POOL block. This time the convolutional layer uses 50 filters; it is common for the number of filters to increase in the deeper layers of a network.

The final code block "flattens" the data and connects the fully connected layers:

Line 33 flattens the output of the MaxPooling2D layer into a single vector. Line 34 defines a fully connected layer with 500 nodes, followed by a ReLU activation function. Line 38 defines another fully connected layer whose number of nodes equals the number of classes; its output is passed to the softmax classifier, which outputs a probability for each class.

Line 42 returns the constructed model to the calling function.
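The architecture described above can be sketched roughly as follows. This is a reconstruction from the prose rather than the original lenet.py listing, so the line numbers quoted in the text will not match it. The padding mode, the pooling stride, the 5x5 size of the second convolution, and the use of an explicit Input layer are assumptions.

```python
# A LeNet-style builder written as a sketch; it is not the original lenet.py.
from keras import backend as K
from keras.layers import Activation, Conv2D, Dense, Flatten, Input, MaxPooling2D
from keras.models import Sequential


class LeNet:
    @staticmethod
    def build(width, height, depth, classes):
        # Channels-last ordering by default; swap if the backend is channels-first.
        input_shape = (height, width, depth)
        if K.image_data_format() == "channels_first":
            input_shape = (depth, height, width)

        model = Sequential()
        model.add(Input(shape=input_shape))

        # First CONV -> RELU -> POOL block: 20 filters of size 5x5.
        # padding="same" and the pooling stride are assumptions.
        model.add(Conv2D(20, (5, 5), padding="same"))
        model.add(Activation("relu"))
        model.add(MaxPooling2D(pool_size=(2, 2), strides=(2, 2)))

        # Second CONV -> RELU -> POOL block: 50 filters (5x5 size assumed).
        model.add(Conv2D(50, (5, 5), padding="same"))
        model.add(Activation("relu"))
        model.add(MaxPooling2D(pool_size=(2, 2), strides=(2, 2)))

        # Flatten, then a 500-node fully connected layer with ReLU.
        model.add(Flatten())
        model.add(Dense(500))
        model.add(Activation("relu"))

        # One output node per class, followed by softmax.
        model.add(Dense(classes))
        model.add(Activation("softmax"))
        return model


model = LeNet.build(width=28, height=28, depth=3, classes=2)
print(model.output_shape)
```

Keeping the builder in a static method means a script can call LeNet.build(...) without instantiating the class.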

Use Keras to train a convolutional neural network image classifier

Open a new file, name it train_network.py, and insert the following code:

Lines 2-18 import the packages the program needs.

Next, parse the command line arguments:

The script takes two required command line arguments, --dataset and --model, and an optional --plot argument for the path of the accuracy/loss plot. --dataset is the path to the training data, and --model is the path where the trained classifier is saved. If --plot is not specified, it defaults to plot.png.
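A minimal sketch of such an argument parser, using the standard argparse module, might look like this. The help strings and the example values passed to parse_args are made up for illustration.

```python
import argparse


def build_parser():
    """Build a parser with the three arguments described in the text."""
    ap = argparse.ArgumentParser(description="Train a Santa / Not Santa classifier")
    ap.add_argument("--dataset", required=True, help="path to the input image data set")
    ap.add_argument("--model", required=True, help="path where the trained model is saved")
    ap.add_argument("--plot", default="plot.png", help="path to the accuracy/loss plot")
    return ap


# Parse an explicit list so the example runs without a real command line.
# "images" and "santa_not_santa.model" are hypothetical values.
args = vars(build_parser().parse_args(["--dataset", "images", "--model", "santa_not_santa.model"]))
print(args["plot"])  # plot.png
```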

Next, set some training variables, initialize the list, and set the image path:

Lines 32-34 define the number of training epochs, the initial learning rate, and the batch size.

Lines 38 and 39 initialize the data and labels lists, which hold the preprocessed images and their class labels.

Lines 42-44 collect the paths of the input images and shuffle them randomly.
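Collecting and shuffling the paths can be sketched with the standard library alone. The helper names, the list of file extensions, the seed value, and the demo paths below are all assumptions made for illustration; the original script may use a different helper.

```python
import os
import random

IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png", ".bmp")  # assumed set of extensions


def list_image_paths(root):
    """Hypothetical helper: collect image file paths under `root`, sorted."""
    paths = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.lower().endswith(IMAGE_EXTENSIONS):
                paths.append(os.path.join(dirpath, name))
    return sorted(paths)


def shuffled(paths, seed=42):
    """Return a shuffled copy; a fixed seed makes the order reproducible."""
    paths = list(paths)
    random.Random(seed).shuffle(paths)
    return paths


# Made-up paths standing in for a real data set directory.
demo = ["images/santa/a.jpg", "images/santa/b.jpg",
        "images/not_santa/c.jpg", "images/not_santa/d.jpg"]
print(shuffled(demo))
```

Sorting before shuffling with a fixed seed gives the same order on every run, which keeps the later train/test split reproducible.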

Now preprocess the images:

This loop simply resizes each image to 28x28 pixels (the spatial dimensions required by LeNet).

The label can be extracted from the path because our data directory is structured as follows:

Therefore, an example of imagePath is:

Extracting the label from imagePath gives:
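As an illustration of the idea, the class label can be read from the name of the directory that contains the image. The directory and file names below are hypothetical, as is the choice of encoding the Santa class as 1.

```python
import os


def label_from_path(image_path, positive_class="santa"):
    """Return 1 if the image's parent directory is the positive class, else 0."""
    class_dir = os.path.normpath(image_path).split(os.path.sep)[-2]
    return 1 if class_dir == positive_class else 0


santa_path = os.path.join("images", "santa", "example_001.jpg")      # hypothetical path
other_path = os.path.join("images", "not_santa", "example_002.jpg")  # hypothetical path
print(label_from_path(santa_path), label_from_path(other_path))  # 1 0
```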

Next, divide the data set into training data set and test data set:

Line 61 further preprocesses the input data by scaling the pixel values from the range [0, 255] to the range [0, 1].

Lines 66-67 then use 75% of the data as the training set and 25% as the test set, and lines 70-71 one-hot encode the labels. Next, augment the data with the following operations:

Lines 74-76 create an image generator that randomly rotates, shifts, flips, and shears the images in the data set. This lets us achieve good results with a smaller data set.
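The scaling, splitting, and one-hot encoding steps can be sketched with NumPy alone. The data here is random stand-in data and the seed is arbitrary. Libraries such as scikit-learn (train_test_split) and Keras (to_categorical) provide helpers for the last two steps, and the original script may well use them instead.

```python
import numpy as np

rng = np.random.default_rng(42)  # arbitrary seed

# Stand-in data: 8 fake 28x28 RGB "images" with pixel values in [0, 255].
data = rng.integers(0, 256, size=(8, 28, 28, 3)).astype("float32")
labels = np.array([0, 1, 0, 1, 1, 0, 1, 0])

# Scale pixel intensities from [0, 255] to [0, 1].
data = data / 255.0

# Hold out 25% of the samples for testing after a random permutation.
order = rng.permutation(len(data))
n_test = int(round(0.25 * len(data)))
test_idx, train_idx = order[:n_test], order[n_test:]
train_x, test_x = data[train_idx], data[test_idx]

# One-hot encode the integer labels: 0 -> [1, 0], 1 -> [0, 1].
one_hot = np.eye(2, dtype="float32")[labels]
train_y, test_y = one_hot[train_idx], one_hot[test_idx]

print(train_x.shape, test_x.shape)  # (6, 28, 28, 3) (2, 28, 28, 3)
```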

Now move on to training the image classifier with Keras:

Lines 80-83 use the Adam optimizer. Because this is a two-class problem, the binary cross-entropy loss (binary_crossentropy) can be used. For a classification task with more than two classes, replace the loss function with categorical cross-entropy (categorical_crossentropy).
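To make the relationship between the two losses concrete, the sketch below computes both by hand for a single sample. With two classes and probabilities that sum to 1, the values coincide; the probability 0.9 is an arbitrary example, and the function names are illustrative rather than Keras APIs.

```python
import math


def binary_cross_entropy(y_true, p):
    """Loss for one sample, where p is the predicted probability of class 1."""
    return -(y_true * math.log(p) + (1 - y_true) * math.log(1 - p))


def categorical_cross_entropy(one_hot, probs):
    """Loss for one sample given a one-hot label and per-class probabilities."""
    return -sum(t * math.log(q) for t, q in zip(one_hot, probs))


p_santa = 0.9  # arbitrary example probability for the "Santa" class
bce = binary_cross_entropy(1, p_santa)
cce = categorical_cross_entropy([0, 1], [1 - p_santa, p_santa])
print(round(bce, 4), round(cce, 4))  # 0.1054 0.1054
```

With three or more classes only the categorical form applies, because the label is no longer a single yes/no value.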

Lines 87-89 call model.fit_generator to start training the network, and line 93 saves the model. Finally, plot the performance of the image classifier:

To train the network model, open a terminal and execute the following command:

After 25 epochs of training, the model reaches a test accuracy of 97.40% and the loss is also very low, as shown in the following figure:

Evaluation of Convolutional Neural Network Image Classifier

Open a new file, name it test_network.py, and start the evaluation:

Lines 2-7 import the required packages. Note that load_model is imported so that we can load the model saved during training.

Next, parse the command line arguments:

Two command line arguments are required: --model and the input --image. Then load and preprocess the image:

The preprocessing is almost the same as before and needs no further explanation, except that line 25 adds an extra dimension to the data with np.expand_dims. If you forget to add this dimension, calling model.predict raises an error. Now load the image classifier model and make a prediction:

Line 29 loads the model and line 32 makes a prediction. Finally, draw the predicted label on the image:
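The effect of np.expand_dims is easy to see on a stand-in array. A model trained on batches expects input of shape (batch, height, width, channels), so a single image needs a leading batch axis of size 1. The zero-filled array below is only a placeholder for a preprocessed image, and channels-last ordering is assumed.

```python
import numpy as np

image = np.zeros((28, 28, 3), dtype="float32")  # placeholder for a preprocessed image
batch = np.expand_dims(image, axis=0)           # add the batch axis at position 0
print(image.shape, batch.shape)  # (28, 28, 3) (1, 28, 28, 3)
```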

Line 35 builds the label, line 36 selects the corresponding probability, and line 37 draws the label text in the upper-left corner of the image. Lines 40-42 resize the image to a standard width so that it fits on the screen. Finally, line 45 displays the output image, and line 46 waits for a key press before closing the display.
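Building the label and picking its probability can be sketched as a small pure-Python function. It assumes the model outputs two probabilities in the order (not Santa, Santa); the function name and the exact text format are illustrative choices.

```python
def format_prediction(not_santa, santa):
    """Pick the more probable class and format it with its probability."""
    label = "Santa" if santa > not_santa else "Not Santa"
    proba = max(santa, not_santa)
    return "{}: {:.2f}%".format(label, proba * 100)


print(format_prediction(0.03, 0.97))  # Santa: 97.00%
```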

The following is the result for an image that contains Santa Claus:

The following is the result for an image without Santa Claus:

Limitations of the image classification model in this article

The image classifier in this article has some limitations:

The first is that the 28x28 input size is very small. In some sample images Santa Claus already occupies only a small part of the frame, so resizing them to 28x28 makes Santa Claus smaller still.
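To see how little spatial detail survives, the tensor shapes can be traced by hand. The sketch below assumes a 28x28 RGB input, "same"-padded convolutions (the padding mode is not stated in this article), and the two 2x2 pooling steps described earlier. The helper names are illustrative.

```python
def conv_same(shape, filters):
    """A 'same'-padded convolution keeps height/width and sets the depth."""
    height, width, _ = shape
    return (height, width, filters)


def max_pool(shape, size=2):
    """Non-overlapping max pooling divides height and width by `size`."""
    height, width, depth = shape
    return (height // size, width // size, depth)


shape = (28, 28, 3)                            # 28x28 RGB input (assumed)
first_block = max_pool(conv_same(shape, 20))   # first CONV -> RELU -> POOL
second_block = max_pool(conv_same(first_block, 50))  # second CONV -> RELU -> POOL
flattened = second_block[0] * second_block[1] * second_block[2]

print(first_block)   # (14, 14, 20)
print(second_block)  # (7, 7, 50)
print(flattened)     # 2450
```

Under these assumptions only a 7x7 grid of positions is left after the second pooling step, which illustrates why a small Santa Claus is easy to lose.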

State-of-the-art convolutional neural networks typically accept input images of 200-300 pixels on a side, so larger images would help us build a more powerful image classifier. However, higher-resolution images call for a deeper and more complex network model, which in turn means collecting more training data and paying for a more expensive training process.
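A rough calculation shows how quickly the cost grows even without adding layers. The sketch counts the parameters of the first fully connected layer for a square input, assuming the same two pooling steps, 50 filters in the last convolution, and 500 dense nodes. The function name is illustrative.

```python
def first_dense_params(side, conv_filters=50, dense_units=500, pools=2):
    """Weights plus biases of the first fully connected layer for a square input."""
    reduced = side // (2 ** pools)  # each 2x2 pooling step halves the side length
    flattened = reduced * reduced * conv_filters
    return flattened * dense_units + dense_units


for side in (28, 64, 128):
    print(side, first_dense_params(side))
# 28 1225500
# 64 6400500
# 128 25600500
```

Going from 28x28 to 128x128 multiplies this layer's parameter count by roughly 21, which is one reason larger inputs tend to need more data and more compute.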

Therefore, to improve the accuracy of the model in this article, consider the following four suggestions:

  • Collect more training data (ideally more than 5,000 "Santa Claus" images);
  • Use higher-resolution images in training; images of 64x64 or 128x128 pixels may give better results;
  • Use a deeper network architecture during training;
  • Read Deep Learning for Computer Vision with Python, which has more details on custom data sets and other topics.


Summary

  • This article shows how to use Keras and Python to train the LeNet model and use it to classify images of Santa Claus. The ultimate goal could be an application similar to Not Hotdog;
  • The "Santa Claus" image data set (about 460 images) was collected by following the previous tutorial on gathering deep learning images through Google Images, while the "Not Santa" image data set was sampled from the UKBench data set;
  • The network model built in this article was evaluated on a series of test images. In each case, the model classified the input image correctly.

Author information

Adrian Rosebrock, entrepreneur, PhD, specializes in image search engines.

This article was recommended by teacher @ - and translated by the Alibaba Cloud Community.

The article was originally titled "Image classification with Keras and deep learning". Author: Adrian Rosebrock. Translator: Begonia. Review:

This article is a simplified translation; for more detail, please see the original text.