# Jitter, Convolutional Neural Networks, and a Kaggle Framework

In this post, we’re going to look at the Digit Recognizer challenge from Kaggle. This challenge uses the MNIST dataset of handwritten digits. We’re going to train a Convolutional Neural Network with Keras to recognize the digits. We will employ some preprocessing steps to improve the generalization of our model.

In our analysis, we’ll use pandas, NumPy, and even a tool from scikit-learn. A side goal is to build a framework that could be reused in other Kaggle competitions. The key idea is to carve a validation set out of the training set and test your ideas against it, rather than submitting to the leaderboard each time.

## Prepare the Training, Validation, and Test Set

When we take a look at the training set, we see that the label column holds a single value per image: the digit from 0 to 9. Our eventual goal is to use a softmax layer as the output of our network, so we need to convert that column into ten columns of binary values (one-hot encoding). Because we are going to modify the images as we train, we need a separate, fixed validation set to track the progress of the neural network, as opposed to using Keras’ built-in validation split. For convenience, we will use sklearn’s train_test_split function. We will use pandas’ get_dummies to convert the label column into multiple columns.

   0  1  2  3  4  5  6  7  8  9
0  0  1  0  0  0  0  0  0  0  0
1  1  0  0  0  0  0  0  0  0  0
2  0  1  0  0  0  0  0  0  0  0
3  0  0  0  0  1  0  0  0  0  0
4  1  0  0  0  0  0  0  0  0  0
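
The split and encoding described above can be sketched roughly as follows. This is a minimal sketch, not the original code: the data frame is a small synthetic stand-in for Kaggle's `train.csv` (a `label` column plus 784 pixel columns), the 10% validation fraction is inferred from the 37800/4200 split in the training log further down, and scaling pixels to [0, 1] is an assumption.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic stand-in for train.csv: "label" plus pixel0..pixel783 (assumed layout).
rng = np.random.default_rng(0)
n_rows = 100
train = pd.DataFrame(
    rng.integers(0, 256, size=(n_rows, 784)),
    columns=[f"pixel{i}" for i in range(784)],
)
train.insert(0, "label", np.arange(n_rows) % 10)

# One-hot encode the label column: one binary column per digit.
labels = pd.get_dummies(train["label"]).to_numpy(dtype="float32")

# Pixel values as floats; dividing by 255 is an assumption, not stated in the post.
pixels = train.drop(columns="label").to_numpy(dtype="float32") / 255.0

# Hold out 10% of the training data as a fixed validation set.
x_train, x_val, y_train, y_val = train_test_split(
    pixels, labels, test_size=0.1, random_state=0
)
print(x_train.shape, x_val.shape, y_train.shape, y_val.shape)
```

Fixing `random_state` keeps the validation set identical between runs, which makes validation scores comparable when you try different ideas.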


## Model Definition

We are going to set up a simple CNN with two convolutional layers. The architecture will be:

• Convolutional Layer (32 filters, 3x3) -> ReLU Activation (x2)
• MaxPooling (2,2) -> Dropout (.25)
• Fully Connected Layer (128 Units) -> ReLU Activation -> Dropout (.5)
• Fully Connected Layer (10 Units) -> Softmax

This is not a particularly creative architecture, and in fact, it’s the same architecture used in Keras’ example.
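
To get a feel for the size of this network, here is a small plain-Python walk through the layer shapes and parameter counts. It assumes 28x28 single-channel inputs, stride 1, and no padding on the convolutions ("valid" mode); those settings are not stated in this post, so treat the numbers as illustrative. The helper functions are hypothetical and are not part of Keras.

```python
def conv2d(shape, filters, kernel=3):
    """Output shape and parameter count of an unpadded, stride-1 convolution."""
    h, w, c = shape
    params = filters * (kernel * kernel * c + 1)  # weights + one bias per filter
    return (h - kernel + 1, w - kernel + 1, filters), params

def max_pool(shape, size=2):
    """Output shape of non-overlapping max pooling (no parameters)."""
    h, w, c = shape
    return (h // size, w // size, c)

def dense(n_in, units):
    """Output width and parameter count of a fully connected layer."""
    return units, units * (n_in + 1)  # weights + one bias per unit

total = 0
shape = (28, 28, 1)                      # assumed MNIST input shape
shape, p = conv2d(shape, 32); total += p
shape, p = conv2d(shape, 32); total += p
shape = max_pool(shape)
flat = shape[0] * shape[1] * shape[2]    # flatten before the dense layers
units, p = dense(flat, 128); total += p
units, p = dense(units, 10); total += p

print("shape after pooling:", shape)
print("flattened size:", flat)
print("trainable parameters:", total)
```

Under these assumptions almost all of the parameters sit in the first fully connected layer, which is one reason the dropout rate is highest there.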

## Preprocessing

An important piece of this post is the preprocessing we will do to the images. We are going to introduce random noise (jargon: jitter) to the image. We’re going to randomly apply the jitter in the following ways:

• Deleting a column
• Deleting a row
• Shifting the image
• Rotating the image

We will use NumPy for all of these effects. We perform each action independently with probability $$p=0.7$$. The probability that at least one action is applied to an image is $$1-(1-0.7)^4 = 1-0.3^4 \approx 0.992$$. With the 37,800 training images used below, we expect roughly 300 images to stay untouched in a given epoch. We repeat this process every epoch.
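
The four perturbations listed above could look something like the sketch below. The post does not show its implementation, so every detail here is an assumption: "deleting" a row or column is taken to mean zeroing it (so the image stays 28x28), shifting uses `np.roll` by up to two pixels, and rotation is a small random angle with nearest-neighbour sampling. All function names and defaults are hypothetical.

```python
import numpy as np

def delete_column(img, rng):
    """Zero out one randomly chosen column (assumed meaning of 'deleting')."""
    out = img.copy()
    out[:, rng.integers(img.shape[1])] = 0
    return out

def delete_row(img, rng):
    """Zero out one randomly chosen row."""
    out = img.copy()
    out[rng.integers(img.shape[0]), :] = 0
    return out

def shift(img, rng, max_shift=2):
    """Shift by a few pixels; np.roll wraps around, which is mostly harmless
    for MNIST because the borders are blank."""
    dy, dx = (int(v) for v in rng.integers(-max_shift, max_shift + 1, size=2))
    return np.roll(img, (dy, dx), axis=(0, 1))

def rotate(img, rng, max_degrees=10.0):
    """Rotate about the image centre using nearest-neighbour sampling."""
    theta = np.deg2rad(rng.uniform(-max_degrees, max_degrees))
    h, w = img.shape
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    ys, xs = np.mgrid[0:h, 0:w]
    # For every output pixel, look up the source pixel it came from.
    src_y = np.rint(cy + (ys - cy) * np.cos(theta) - (xs - cx) * np.sin(theta)).astype(int)
    src_x = np.rint(cx + (ys - cy) * np.sin(theta) + (xs - cx) * np.cos(theta)).astype(int)
    valid = (src_y >= 0) & (src_y < h) & (src_x >= 0) & (src_x < w)
    out = np.zeros_like(img)
    out[valid] = img[src_y[valid], src_x[valid]]
    return out

def jitter(img, rng, p=0.7):
    """Apply each perturbation independently with probability p."""
    for op in (delete_column, delete_row, shift, rotate):
        if rng.random() < p:
            img = op(img, rng)
    return img

rng = np.random.default_rng(0)
digit = np.zeros((28, 28), dtype="float32")
digit[6:22, 12:16] = 1.0          # a crude vertical stroke standing in for a "1"
perturbed = jitter(digit, rng)
print(perturbed.shape)
```

Each helper returns a new array, so the original image is never modified by `jitter` itself.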

## Training

In the interest of time, I will only train for two epochs. At the start of each epoch we copy the training set, so that the jitter can modify the copy in place without touching the original images. The first epoch uses the original training set; every epoch after that uses a perturbed copy.

Train on 37800 samples, validate on 4200 samples
Epoch 1/1
37800/37800 [==============================] - 508s - loss: 0.2920 - acc: 0.9100 - val_loss: 0.0816 - val_acc: 0.9710
Epoch 00000: val_loss improved from inf to 0.08161, saving model to auto_save_weights.hdf5
Train on 37800 samples, validate on 4200 samples
Epoch 1/1
37800/37800 [==============================] - 506s - loss: 0.2615 - acc: 0.9199 - val_loss: 0.0529 - val_acc: 0.9807
Epoch 00000: val_loss improved from 0.08161 to 0.05290, saving model to auto_save_weights.hdf5
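
The per-epoch loop described above can be sketched independently of Keras. In the sketch below, `fit_one_epoch` and `perturb` are hypothetical callables supplied by the caller. In a Keras setup, `fit_one_epoch` would presumably wrap a single-epoch `model.fit` call with the validation set and a checkpoint callback, which would be consistent with the "Epoch 1/1" lines and the weight-saving messages in the log, but that wiring is an assumption.

```python
import numpy as np

def train_with_jitter(x_train, y_train, fit_one_epoch, perturb, n_epochs=2, seed=0):
    """Train for n_epochs, perturbing a fresh copy of the data after the first epoch."""
    rng = np.random.default_rng(seed)
    for epoch in range(n_epochs):
        x_epoch = x_train.copy()          # never modify the original images
        if epoch > 0:                     # the first epoch sees the clean data
            for i in range(len(x_epoch)):
                x_epoch[i] = perturb(x_epoch[i], rng)
        fit_one_epoch(x_epoch, y_train)   # e.g. one pass of model fitting

# Usage with stand-ins: record the mean pixel value seen in each epoch.
seen = []
x = np.zeros((4, 28, 28), dtype="float32")
y = np.eye(10, dtype="float32")[:4]
train_with_jitter(
    x, y,
    fit_one_epoch=lambda x_epoch, y_epoch: seen.append(float(x_epoch.mean())),
    perturb=lambda img, rng: img + 1.0,   # dummy perturbation for illustration
)
print(seen)
```

Keeping the perturbation and the fitting step as parameters is what makes the loop reusable for other competitions: only those two callables need to change.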


## Predicting on the Test Set

If we were going to submit this model to the competition, we’d want to retrain it on the full training set once we were happy with our validation score. If we were using a less time-consuming approach, we could use cross-validation rather than a single validation set. Either way, it’s important to train the final model on the full data set.

That will not make a difference for our demonstration, though, so we’ll simply make our predictions on the test set with the model we have. As a sanity check, we will validate one prediction by visualizing the input image and comparing it to the predicted digit.

Predicted value: 0
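
Turning the softmax output into a digit, and a flat row of pixels back into something you can look at, takes only a couple of NumPy calls. The sketch below uses made-up probabilities and a blank image; the helper names are hypothetical, and the 28x28 layout of the flattened rows is assumed from the MNIST format.

```python
import numpy as np

def predicted_labels(probabilities):
    """Pick the most probable class for each row of softmax outputs."""
    return np.argmax(probabilities, axis=1)

def as_image(flat_pixels, side=28):
    """Reshape a flat row of pixels into a square image for plotting."""
    return np.asarray(flat_pixels).reshape(side, side)

# Made-up softmax outputs for two images (each row sums to 1).
probs = np.array([
    [0.91, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01],
    [0.02, 0.02, 0.02, 0.82, 0.02, 0.02, 0.02, 0.02, 0.02, 0.02],
])
labels = predicted_labels(probs)
image = as_image(np.zeros(784))   # pass this to your plotting library of choice

print("Predicted values:", labels.tolist())
print("Image shape:", image.shape)
```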


Written on March 12, 2016