VGG

TensorFlow implementation of the model described in "Very Deep Convolutional Networks for Large Scale Image Recognition"

Model adapted for CIFAR-10.

Running the script

  1. Clone this repo:

    $ git clone https://github.com/eltonlaw/vgg-cifar10.git
    $ cd vgg-cifar10
  2. Run the setup script. Here's what it does: 1) creates a virtual environment and installs dependencies, 2) downloads and unzips the dataset, 3) creates a logs directory to collect all model output.

    $ sh setup.sh
  3. Run the model(s):

    $ python3 vgg_original.py

    Default parameters are the ones I found to work best after fine-tuning. To change them, just pass values through the command line (a sketch of the flag parsing follows these steps):

    • learning_rate
    • batch_size
    • epochs
    $ python3 vgg_original.py --learning_rate 1e-5 --batch_size 256 --epochs 100
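
Flags like these are usually wired up with argparse; the repo's actual flag handling may differ, but a minimal sketch looks like this (the defaults below are placeholders, not the tuned values):

import argparse

# Hypothetical flag parsing for vgg_original.py; defaults are illustrative.
parser = argparse.ArgumentParser()
parser.add_argument("--learning_rate", type=float, default=1e-5)
parser.add_argument("--batch_size", type=int, default=256)
parser.add_argument("--epochs", type=int, default=100)
args = parser.parse_args()
print(args.learning_rate, args.batch_size, args.epochs)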

About this Implementation

Note: The experiments in the original paper were run on scaled-down ImageNet images (following the AlexNet pipeline). I first experimented with scaling each CIFAR-10 image up to (224, 224, 3) and using the original parameters from the paper, followed by a round of attempts at fine-tuning those parameters. Another experiment was run on the original, non-scaled images in the same spirit as the original paper (small filters and a deep architecture). The results are below.
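
For the upscaling step, one option (an assumption about the implementation, not necessarily what vgg_original.py does) is TensorFlow's built-in bilinear resize:

import tensorflow as tf

images = tf.random.uniform([8, 32, 32, 3])  # stand-in for a CIFAR-10 batch
# tf.image.resize is the TF 2.x name; TF 1.x called it tf.image.resize_images
images_224 = tf.image.resize(images, [224, 224], method="bilinear")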

Accessing the Data

The dataset used is CIFAR-10, which consists of 60,000 32x32 RGB images in 10 classes: airplane, automobile, bird, cat, deer, dog, frog, horse, ship, truck.
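
The Python version of the dataset ships as pickled batch files. The loader below follows the standard unpickling recipe from the CIFAR-10 page; the directory name assumes the usual cifar-10-batches-py extraction:

import pickle

def unpickle(file):
    # Load one CIFAR-10 batch file into a dict keyed by bytes.
    with open(file, "rb") as fo:
        return pickle.load(fo, encoding="bytes")

batch = unpickle("cifar-10-batches-py/data_batch_1")
data = batch[b"data"]      # uint8 array of shape (10000, 3072)
labels = batch[b"labels"]  # list of 10,000 ints in [0, 9]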

After unpickling a batch, you get a 10,000 x 3072 numpy array: each image has been flattened into a 1D vector of length 3072. The standard format for an image is 32x32x3, so I reshaped each vector and plotted it.

...
img = np.reshape(img, [32, 32, 3])
...

The plotted image comes out tiled, but luckily someone on StackOverflow knows why. Basically, it has to do with the order in which the data is read during the reshape. The default for a numpy reshape is "C", which reads/writes elements in C-like (row-major) index order; using Fortran-like index order, "F", solves the problem.

...
img = np.reshape(img, [32, 32, 3], order="F")
...
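
A tiny example makes the order difference concrete:

import numpy as np

a = np.arange(6)
a.reshape(2, 3)             # C order fills rows first:    [[0, 1, 2], [3, 4, 5]]
a.reshape(2, 3, order="F")  # F order fills columns first: [[0, 2, 4], [1, 3, 5]]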

Awesome, they actually look like images now. For some reason everything's rotated 90 degrees counterclockwise. This won't affect classification accuracy, so it's not a big problem unless we want to view the images or do transfer learning. Let's say we do (it's not too hard anyway).

...
img = np.reshape(img, [32, 32, 3], order="F")
img = np.rot90(img, k=3)
...
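
To check the result visually, here's a minimal sketch using matplotlib and the unpickle helper from above (the plotting is just for inspection, not part of the training scripts):

import numpy as np
import matplotlib.pyplot as plt

batch = unpickle("cifar-10-batches-py/data_batch_1")  # helper sketched earlier
img = np.reshape(batch[b"data"][0], [32, 32, 3], order="F")
img = np.rot90(img, k=3)  # undo the 90-degree rotation
plt.imshow(img)
plt.show()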

Perfect. The labels correspond correctly and everything else looks fine. We can move on to the machine learning now.

Data Preprocessing

Data preprocessing consists of just a mean-subtraction step, quoting the paper:

"The only pre- processing we do is subtracting the mean RGB value, computed on the training set, from each pixel."

Experiments

1 - Rescaling to (224, 224, 3) and using the original VGG model

Results

...

References

A. Krizhevsky. Learning multiple layers of features from tiny images. Master’s thesis, Department of Computer Science, University of Toronto, 2009.

A. Krizhevsky. cuda-convnet. https://code.google.com/p/cuda-convnet/, 2012.

A. Krizhevsky, I. Sutskever, and G. Hinton. ImageNet classification with deep convolutional neural networks. In NIPS, 2012.

I. Sutskever, J. Martens, G. E. Dahl, and G. E. Hinton. On the importance of initialization and momentum in deep learning. In ICML, volume 28 of JMLR Proceedings, pages 1139–1147. JMLR.org, 2013.

N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov. Dropout: A simple way to prevent neural networks from overfitting. The Journal of Machine Learning Research, pages 1929–1958, 2014.

V. Nair and G. Hinton. Rectified linear units improve restricted Boltzmann machines. In Proc. 27th International Conference on Machine Learning, 2010.

Y. Boureau, J. Ponce, and Y. LeCun. A Theoretical Analysis of Feature Pooling in Visual Recognition. In International Conference on Machine Learning, 2010.
