I have been looking at training a Siamese network to predict whether two images are similar or different. Siamese networks are a type of neural network containing a pair of identical sub-networks that share the same parameters and weights. During training, the parameters are updated identically across both sub-networks. Siamese networks were first proposed in 1993 by Bromley et al. in their paper Signature Verification using a Siamese Time Delay Neural Network. Keras provides an example of a Siamese network as part of its distribution.
My dataset is the INRIA Holidays Dataset, a set of 1491 photos from 500 different vacations. The photos follow a naming convention from which the groups can be derived. Each photo name has six digits - the first four identify the vacation and the last two are a unique sequence number within the vacation. For example, a photo named 100301.jpg is from vacation 1003 and has sequence number 01 within that group.
The input to my network consists of image pairs and the output is either 1 (similar) or 0 (different). Similar image pairs are from the same vacation group. For example, the code snippet below displays three photos - the first two are from the same group and the last one is from a different group.
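The naming convention can be expressed in a few lines. This is a minimal sketch rather than code from the project; the function name parse_photo_name is hypothetical, and it assumes every file name is exactly six digits followed by an extension.

```python
import os

def parse_photo_name(filename):
    """Split a Holidays-style file name into (vacation_id, sequence_number)."""
    stem, _ = os.path.splitext(os.path.basename(filename))
    # assumption: names are always six digits, e.g. "100301.jpg"
    if len(stem) != 6 or not stem.isdigit():
        raise ValueError("expected a six-digit name, got {!r}".format(filename))
    return stem[:4], int(stem[4:])

print(parse_photo_name("100301.jpg"))  # ('1003', 1)
```

Two photos belong to the same vacation exactly when their vacation IDs match, which is all the labeling step needs.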
from __future__ import division, print_function
from keras.preprocessing.image import ImageDataGenerator
from keras.utils import np_utils
from scipy.misc import imresize  # requires SciPy < 1.3, where imresize still exists
import itertools
import matplotlib.pyplot as plt
import numpy as np
import random
import os

DATA_DIR = "../data"
IMAGE_DIR = os.path.join(DATA_DIR, "holiday-photos")

# two photos from vacation 1003 and one from vacation 1272
ref_image = plt.imread(os.path.join(IMAGE_DIR, "100301.jpg"))
sim_image = plt.imread(os.path.join(IMAGE_DIR, "100302.jpg"))
dif_image = plt.imread(os.path.join(IMAGE_DIR, "127202.jpg"))

def draw_image(subplot, image, title):
    # draw one image in the given subplot position, without axis ticks
    plt.subplot(subplot)
    plt.imshow(image)
    plt.title(title)
    plt.xticks([])
    plt.yticks([])

draw_image(131, ref_image, "reference")
draw_image(132, sim_image, "similar")
draw_image(133, dif_image, "different")
plt.tight_layout()
plt.show()
The pair-generation code loops through the image directory and uses the file naming convention to create all pairs of similar images and a corresponding pair of different images. Similar image pairs are generated by considering all combinations of image pairs within a group. Dissimilar image pairs are generated by pairing the left hand image of the similar pair with a random image from some other group. This gives us 2072 similar image pairs and 2072 different image pairs, i.e., a total of 4144 image pairs for our training data. Each pair is stored together with its label as a triple of (left image name, right image name, label).
Fearing that this might not be nearly enough to train my network adequately, I decided to use the Keras ImageDataGenerator to augment the dataset. Before Keras, when I was working with Caffe, I would manually augment my input with a fixed number of standard transformations, such as rotation, flipping, zooming and other affine transforms (these are all just matrix transforms). The Keras ImageDataGenerator is much more sophisticated: you instantiate it with the range of transformations you will allow on your dataset, and it returns a generator that yields transformed versions of your input images from a directory.
I have used the ImageDataGenerator previously to augment my dataset to train a simple classification CNN, where the input was an image and the output was a label. This is the default case the component is built to handle, so it's actually very simple to use. My problem this time was a little different - my input is a pair of image names from a triple, and I wanted the identical transformation to be applied to both images. (This is not strictly necessary in my case, but can't hurt, and in any case I wanted to learn how to do this for another upcoming project.)
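The pairing logic described above could be sketched roughly as follows. This is an illustration under stated assumptions, not the code used for the post: the function name make_image_triples is hypothetical, it takes a list of file names instead of reading the directory, and it assumes there are at least two vacation groups.

```python
import itertools
import random
from collections import defaultdict

def make_image_triples(image_names, rng=None):
    """Return (left, right, label) triples: 1 = same vacation, 0 = different."""
    rng = rng or random.Random()
    groups = defaultdict(list)
    for name in sorted(image_names):
        groups[name[:4]].append(name)  # first four digits are the vacation ID
    group_ids = sorted(groups)
    triples = []
    for gid in group_ids:
        # every combination of two photos within a vacation is a similar pair
        for left, right in itertools.combinations(groups[gid], 2):
            triples.append((left, right, 1))
            # negative sample: same left image, random image from another vacation
            # (assumes at least two vacation groups exist)
            other_gid = rng.choice([g for g in group_ids if g != gid])
            triples.append((left, rng.choice(groups[other_gid]), 0))
    return triples

names = ["100300.jpg", "100301.jpg", "100302.jpg", "127200.jpg", "127201.jpg"]
for triple in make_image_triples(names, random.Random(0)):
    print(triple)
```

Because one dissimilar pair is drawn for each similar pair, the two classes come out balanced by construction.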
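To make the remark about matrix transforms concrete, here is a small sketch that composes a flip, a rotation and a zoom into a single 3x3 homogeneous matrix and applies it to one point. All function names are hypothetical, and it works on a coordinate and not on image pixels; it is only meant to show why a sequence of such augmentations collapses into one matrix.

```python
import math

def matmul(a, b):
    """Multiply two 3x3 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def rotation(degrees):
    t = math.radians(degrees)
    return [[math.cos(t), -math.sin(t), 0.0],
            [math.sin(t), math.cos(t), 0.0],
            [0.0, 0.0, 1.0]]

def zoom(factor):
    return [[factor, 0.0, 0.0], [0.0, factor, 0.0], [0.0, 0.0, 1.0]]

def horizontal_flip():
    return [[-1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

def apply_to_point(matrix, x, y):
    """Map the point (x, y) through a 3x3 homogeneous transform."""
    return (matrix[0][0] * x + matrix[0][1] * y + matrix[0][2],
            matrix[1][0] * x + matrix[1][1] * y + matrix[1][2])

# flip first, then rotate by 90 degrees, then zoom by 2
# (the rightmost matrix in the product is applied first)
combined = matmul(zoom(2.0), matmul(rotation(90), horizontal_flip()))
print(apply_to_point(combined, 1.0, 0.0))  # approximately (0, -2)
```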
It seems to be something that others have been looking for as well, and there is some discussion in Keras Issue 3059. In addition, the ImageDataGenerator documentation covers some cases where this can be done, using a pair of ImageDataGenerator instances that are instantiated with the same parameters. However, all these seem to require that you either enumerate the LHS and RHS images in the pair as 4-dimensional tensors (using flow()) or store them in two parallel directories with identical names (using flow_from_directory()). The first seems a bit wasteful, and the second seems incredibly complicated for my use case.
So I went digging into the code and found a private (in the sense of undocumented) method called random_transform(). It applies a random sequence of the transformations you have specified in the ImageDataGenerator constructor to your input image. In this post, I will describe an image generator that I built for my Siamese network using the random_transform() method.
We start with a basic generator that returns a batch of image triples per invocation. The generator is instantiated once; internally it reshuffles the triples at the start of each epoch, and next() is called on it to get the next batch of triples.
def image_triple_generator(image_triples, batch_size):
    while True:
        # loop once per epoch, reshuffling the triples each time
        num_recs = len(image_triples)
        indices = np.random.permutation(np.arange(num_recs))
        num_batches = num_recs // batch_size
        for bid in range(num_batches):
            # loop once per batch
            batch_indices = indices[bid * batch_size : (bid + 1) * batch_size]
            yield [image_triples[i] for i in batch_indices]

triples_batch_gen = image_triple_generator(image_triples, 4)
next(triples_batch_gen)
This gives us a batch of 4 triples as shown:
[('149601.jpg', '149604.jpg', 1), ('144700.jpg', '106201.jpg', 0), ('103304.jpg', '111701.jpg', 0), ('133200.jpg', '128100.jpg', 0)]
Calling next() again returns the next 4 triples, which is what happens for each batch during training.
next(triples_batch_gen)
[('135104.jpg', '122601.jpg', 0), ('137700.jpg', '137701.jpg', 1), ('136005.jpg', '105501.jpg', 0), ('132500.jpg', '132511.jpg', 1)]
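Because the generator uses floor division to count batches, any records left over at the end of an epoch are skipped for that epoch. The short sketch below works through the arithmetic, assuming all 4144 triples are fed to the generator; the helper name batches_per_epoch is hypothetical.

```python
def batches_per_epoch(num_records, batch_size):
    """Return (full batches per epoch, records left out of that epoch)."""
    num_batches = num_records // batch_size
    leftover = num_records - num_batches * batch_size
    return num_batches, leftover

print(batches_per_epoch(4144, 4))   # (1036, 0)
print(batches_per_epoch(4144, 32))  # (129, 16)
```

Since the indices are reshuffled every epoch, a different set of 16 triples is left out each time when the batch size is 32.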
Next, we apply ImageDataGenerator.random_transform() to a single image to see if it does indeed do what I think it does. My fear was that there might need to be some upstream initialization before I could call the random_transform() method. As you can see from the output, random_transform() augments the original image into variants that are quite close and could legitimately have been real photos.
datagen_args = dict(rotation_range=10,
                    width_shift_range=0.2,
                    height_shift_range=0.2,
                    shear_range=0.2,
                    zoom_range=0.2,
                    horizontal_flip=True)
datagen = ImageDataGenerator(**datagen_args)

sid = 150
np.random.seed(42)
image = plt.imread(os.path.join(IMAGE_DIR, "115201.jpg"))
sid += 1
draw_image(sid, image, "orig")
# draw four random augmentations next to the original
for j in range(4):
    augmented = datagen.random_transform(image)
    sid += 1
    draw_image(sid, augmented, "aug#{:d}".format(j + 1))
plt.tight_layout()
plt.show()
Next I wanted to see if I could take two images and apply the same transformation to both of them. I now take a pair of ImageDataGenerators configured the same way. The individual transformations that are applied to the image in the random_transform() method are all driven by numpy random number generators, so one way to make them do the same thing is to initialize the random number seed to the same random value for each ImageDataGenerator at the start of each batch. As you can see from the photos below, this strategy seems to be working.
image_pair = ["108103.jpg", "112003.jpg"]
datagens = [ImageDataGenerator(**datagen_args),
            ImageDataGenerator(**datagen_args)]

sid = 240
for i, image_name in enumerate(image_pair):
    image = plt.imread(os.path.join(IMAGE_DIR, image_name))
    sid += 1
    draw_image(sid, image, "orig")
    # make sure the two image data generators generate same transformations
    np.random.seed(42)
    for j in range(3):
        augmented = datagens[i].random_transform(image)
        sid += 1
        draw_image(sid, augmented, "aug#{:d}".format(j + 1))
plt.tight_layout()
plt.show()
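The reason reseeding works can be shown without Keras. The sketch below is a stand-in, not the library's implementation: draw_transform_params is a hypothetical function that mimics the random draws made inside random_transform(), and it uses Python's random module in place of numpy. It assumes each image consumes the same number of random draws in the same order.

```python
import random

def draw_transform_params():
    """Hypothetical stand-in for the random draws inside random_transform()."""
    return {
        "rotation": random.uniform(-10, 10),
        "width_shift": random.uniform(-0.2, 0.2),
        "height_shift": random.uniform(-0.2, 0.2),
        "flip": random.random() < 0.5,
    }

def draw_sequence(seed, count):
    # reseeding the global generator restarts the stream of random numbers
    random.seed(seed)
    return [draw_transform_params() for _ in range(count)]

left = draw_sequence(42, 3)
right = draw_sequence(42, 3)
print(left == right)  # True
```

The j-th set of parameters for the left image matches the j-th set for the right image, while successive sets within one sequence still differ from each other. If one side consumed an extra random number, the two streams would fall out of step, which is the main fragility of this approach.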
Finally, we are ready to build our final generator that can be plugged in to the Siamese network. I haven't built the network yet, so there might be some changes once I try to integrate the generator, but here is the first cut. The caching is because I noticed that it takes a while to generate the batches, so caching is hopefully going to speed it up.
RESIZE_WIDTH = 300
RESIZE_HEIGHT = 300

def cached_imread(image_path, image_cache):
    # read and resize each image only once; later requests hit the cache
    if image_path not in image_cache:
        image = plt.imread(image_path)
        image = imresize(image, (RESIZE_WIDTH, RESIZE_HEIGHT))
        image_cache[image_path] = image
    return image_cache[image_path]

def preprocess_images(image_names, seed, datagen, image_cache):
    # reseed so the left and right batches get the same transformations
    np.random.seed(seed)
    X = np.zeros((len(image_names), RESIZE_WIDTH, RESIZE_HEIGHT, 3))
    for i, image_name in enumerate(image_names):
        image = cached_imread(os.path.join(IMAGE_DIR, image_name), image_cache)
        X[i] = datagen.random_transform(image)
    return X

def image_triple_generator(image_triples, batch_size):
    datagen_args = dict(rotation_range=10,
                        width_shift_range=0.2,
                        height_shift_range=0.2,
                        shear_range=0.2,
                        zoom_range=0.2,
                        horizontal_flip=True)
    datagen_left = ImageDataGenerator(**datagen_args)
    datagen_right = ImageDataGenerator(**datagen_args)
    image_cache = {}

    while True:
        # loop once per epoch
        num_recs = len(image_triples)
        indices = np.random.permutation(np.arange(num_recs))
        num_batches = num_recs // batch_size
        for bid in range(num_batches):
            # loop once per batch
            batch_indices = indices[bid * batch_size : (bid + 1) * batch_size]
            batch = [image_triples[i] for i in batch_indices]
            # make sure image data generators generate same transformations
            seed = np.random.randint(low=0, high=1000, size=1)[0]
            Xleft = preprocess_images([b[0] for b in batch], seed,
                                      datagen_left, image_cache)
            Xright = preprocess_images([b[1] for b in batch], seed,
                                       datagen_right, image_cache)
            Y = np_utils.to_categorical(np.array([b[2] for b in batch]))
            yield Xleft, Xright, Y
Here is a little snippet to call my data generator and verify that it returns the right shaped data.
triples_batch_gen = image_triple_generator(image_triples, 32)
Xleft, Xright, Y = next(triples_batch_gen)
print(Xleft.shape, Xright.shape, Y.shape)
which returns the expected shapes.
(32, 300, 300, 3) (32, 300, 300, 3) (32, 2)
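One change I expect to need at integration time: Keras generator-based training takes each item as an (inputs, targets) tuple, with a list of arrays as inputs for a multi-input model, so a bare three-element tuple may not be read as intended. A minimal sketch of a wrapper, assuming the Siamese model ends up with two inputs; as_keras_batches and fake_batches are hypothetical names, and the strings are placeholders for the real arrays.

```python
def as_keras_batches(triple_batches):
    """Repackage (Xleft, Xright, Y) batches as ([Xleft, Xright], Y) pairs."""
    for Xleft, Xright, Y in triple_batches:
        yield [Xleft, Xright], Y

def fake_batches():
    # hypothetical placeholder data standing in for image arrays and labels
    while True:
        yield "left-batch", "right-batch", "labels"

inputs, targets = next(as_keras_batches(fake_batches()))
print(inputs, targets)  # ['left-batch', 'right-batch'] labels
```

Keeping the wrapper separate leaves the original generator unchanged, so the shape check above still applies to it.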
So anyway, this is all I have so far. Once I have my Siamese network coded up and running, I will talk about it in a subsequent post. I haven't heard about anyone using ImageDataGenerator.random_transform() directly before, so I thought that it might be interesting to describe my experience. Currently the enhancements seem to be aimed at trying to continue to allow folks to use the flow() and flow_from_directory() methods. I am not sure if more specialized requirements will come up in the future, but I think using the random_transform() method instead might be a good choice for many situations. Of course, it is quite likely that I may be missing something, so in case you know of problems with this approach, please let me know.
Have you made any more progress on this project? I am keen to see final results and full code.
Hey! It's really interesting! waiting for the next part. By the way, where can I get the code? please link to gist.
@JamesB - progress has been kind of slow because of competing priorities (work and personal). I will post in a few weeks hopefully.
@Saideep - thanks! I have a github project here for storing the code but all the code is in my local git currently, I will cleanup and push to github once I post the next part (hopefully soon). Also Keras2 has come out with many deprecations but the one that worries me is undocumented functions, not sure if that changes anything currently - need to check that as well.
How did u label the data? It seems image trigger is making random.
Can you explain?
Hi Pelin, not sure what you meant when you said image trigger is making random, but here is the way I labeled the data. I used the file naming scheme. There are 1491 photographs from 500 "vacations". A typical filename is 149601.jpg, here 1496 refers to a "vacation ID" and 01 is the sequence number of the photograph within vacation 1496. My target labels are 1 for same vacation and 0 for different vacation. So I just group by the first 4 digits in the file name and then generate all pairs of combinations of photographs in the same group. I then use a sort of negative sampling - I take the left hand image from the pair, then pair it up with a random image from another group.