Salmon Run: Autoencoder as a Feature Detector for Complex Classifiers

Using Autoencoders as anomaly detectors is not a new idea. For examples, see here, here and here (results of a google search "autoencoder anomaly detection"). However, one other possibility could be to use an autoencoder as a feature detector for specific features, for example, to look for specific kinds of objects in images. So assuming you wanted to detect domestic animals in images, you could train a set of autoencoders, each trained on a single animal, then run the image through the set. The intuition is that, assuming the image has a cat in it, an autoencoder trained on cat images should be able to reconstruct the image with lower reconstruction error than it would for other animals. In this post, I describe an experiment where I train an autoencoder on cat images from the Kaggle Cats and Dogs Dataset, and compare reconstruction errors on a held-out set of dog and cat images.

The Kaggle Cats and Dogs dataset contains images of 12,500 cats and 12,500 dogs, both color and monochrome. I picked a random sample of 2,500 images each of cats and dogs as my held-out test set, and trained my cat autoencoder on the remaining 10,000 cat images. The autoencoder was trained on monochrome images for 50 epochs. I then compared a few random images and the reconstructed image (input and output of the autoencoder) and it looks like the autoencoder has learned to reconstruct the images quite nicely, as you can see below.

As expected, the autoencoder seems somewhat better at reconstructing cat images compared to dog images (Root Mean Square Error (RMSE) between original and reconstructed image for the cat and the dog image above is 0.04304 and 0.07650 respectively). I then computed reconstruction errors (RMSE again) for the held out set of 2,500 cat and dog images and plotted the distribution of reconstruction errors, which is shown below.

As you can see, while there is quite a bit of overlap between the two distributions, the distribution of reconstruction errors for cat images peaks earlier and is skewed left a little more than the distribution of reconstruction errors for dog images. It also looks like the distributions are approximately normal, so we can use the t-test to detect if there is statistically significant difference between them, as explained on this page. Turns out that the t-value is 16.94 ≫ 3.291, so we can conclude that the distributions are different at 99% confidence level.

Since the distributions of reconstruction errors are approximately normal, and since we can represent a normal distribution by the mean and standard deviation, we can now compute the probability of an image being that of a cat based on its reconstruction error. For example, our cat and dog exampleis have a 80.3% and 0.15% chance of being a cat respectively (code for calculating these probabilities is provided later in the post). This gives us a way to use the autoencoder as a binary cat classifier.

So that was a brief overview of the idea and the results from my experiments. Now I will describe the experiment in more detail, along with the code, so you can replicate the results for yourself, should you find it interesting enough to do so.

Data cleaning

The images are provided in a directory structure that looks like this. The Cat and Dog directories each contain 12,500 images of varying sizes, both monochrome and color. We randomly sample 2500 images each from each collection as our held-out test set and write the filenames to flat files test-cats.txt and test-dogs.txt. The remaining 10,000 cat images are written into train-cats.txt. We use these files as inputs to a custom image generator.

PetImages/
|
+-- Cat/
|   |
|   +-- cat images
+-- Dog/
|   |
|   +-- dog images

We standardize our input images to all be monochrome and size 128 x 128 pixels. Most images have 3 channels (Red, Green, Blue) which we coalesce to a single channel using the Luminosity formula defined here. Some images had 4 channels (RGB + Alpha channel), for which we first discarded the Alpha channel and applied the same Luminosity formula to the remaining channels. A few files could not be read and were discarded. A few others had issues with its EXIF headers, which seems a spurious problem without a solution in the Pillow package according to this issue, which I ended up discarding as well. Because we are referencing the image files via text files, "discarding" an image is just commenting out its file name. In all, 24 filenames in all were discarded in this manner, 13 from the training set and 5 and 6 from the cat and dog test sets respectively.

The code to pre-process incoming images into a (128, 128) monochrome images is shown below. One thing to note is that the skimage.transform.resize() method not only resizes the image, but also silently rescales the pixel intensities from the [0, 256] range to the [0, 1] range.

from skimage.transform import resize

import os
import matplotlib.pyplot as plt
import numpy as np

IMAGE_HEIGHT, IMAGE_WIDTH, NUM_CHANNELS = 128, 128, 1

def get_processed_image(image):
    image_r = resize(image, (IMAGE_HEIGHT, IMAGE_WIDTH))
    if len(image_r.shape) == 2:  # 2D image
        image_r = np.expand_dims(image_r, axis=2)
    else:
        if image_r.shape[2] == 3:
            image_r = (image_r[:, :, 0] * 0.21 + 
                       image_r[:, :, 1] * 0.72 + 
                       image_r[:, :, 2] * 0.07)
        elif image_r.shape[2] == 4:
            image_r = image_r[:, :, 0:3]
            image_r = (image_r[:, :, 0] * 0.21 + 
                       image_r[:, :, 1] * 0.72 + 
                       image_r[:, :, 2] * 0.07)
        else:
            pass
        image_r = np.expand_dims(image_r, axis=2)
    assert(image_r.shape == (IMAGE_HEIGHT, IMAGE_WIDTH, NUM_CHANNELS))
    return image_r

Discarding the problem images was semi-manual. I would repeatedly loop through the three lists of files, looking for exceptions, using the following code. Once I found one, I would comment it out so it would get ignored in the next run. To save time, I did a little trick with global variables, so I could start with the file I left off, that bit is commented out. The code below should run without errors once all the offending files are commented out of the list.

import warnings

warnings.filterwarnings("ignore", ".*will be changed.*")

DATA_DIR = "../data"

CAT_IMAGE_DIR = os.path.join(DATA_DIR, "PetImages", "Cat")
DOG_IMAGE_DIR = os.path.join(DATA_DIR, "PetImages", "Dog")

CAT_TRAIN_FILE = os.path.join(DATA_DIR, "train-cats.txt")
CAT_TEST_FILE = os.path.join(DATA_DIR, "test-cats.txt")
DOG_TEST_FILE = os.path.join(DATA_DIR, "test-dogs.txt")

#LAST_FOUND_JPG = "573.jpg"
#start_processing = False

def get_source_image(imagedir, imagefile):
    image = plt.imread(os.path.join(imagedir, imagefile))
    return image

def check_images(imagedir, imagelist):
#    global start_processing
    num_processed = 0
    legend = os.path.basename(imagelist)
    flist = open(imagelist, "r")
    for line in flist:
        if line.startswith("#"):
#            if line.strip() == "#" + LAST_FOUND_JPG:
#                start_processing = True
            continue
        if num_processed % 100 == 0:
            print("{:s}: {:d} lines loaded...".format(legend, num_processed))
#        if not start_processing:
#            num_processed += 1
#            continue
        imagefile = line.strip()
        try:
            image = get_source_image(imagedir, imagefile)
            image_p = get_processed_image(image)
            if len(image_p.shape) != 3:
                print(imagefile, image.shape, image_p.shape)
        except IndexError as e1:
            print(imagefile, image.shape)
            raise e1 
        except OSError as e2:
            print(imagefile)
            raise e2
        except UserWarning as e3:
            print(imagefile)
            raise e3
        except AssertionError as e4:
            print(imagefile)
            raise e4
        num_processed += 1
    
    flist.close()
    print("{:s}: {:d} lines loaded, COMPLETE".format(legend, num_processed))

check_images(CAT_IMAGE_DIR, CAT_TRAIN_FILE)
check_images(CAT_IMAGE_DIR, CAT_TEST_FILE)
check_images(DOG_IMAGE_DIR, DOG_TEST_FILE)

Model Training

The autoencoder design is based on the one Aditya Sharma describes in his blog post on Reconstructing Brain MRI Images using Deep Learning (Convolutional Autoencoder). The network consists of 2 blocks of convolution and maxpooling layers for the encoder, followed by a final convolutional block, followed by 2 blocks of convolution and upsampling for the decoder. Input to the network is mini-batches (of 128 images each), each image having the shape (128, 128, 1). Here is the code to build the autoencoder. The loss function is Mean Square Error, since we want to minimize pixel-wise reconstruction error, and the optimizer we use is RMSProp with a learning rate of 0.0003.

import tensorflow as tf

def build_model_2():
    input_img = tf.keras.Input(shape=(128, 128, 1))
    # encoder
    conv_1 = tf.keras.layers.Conv2D(32, (3, 3), activation="relu",
                                    padding="same")(input_img)
    conv_1 = tf.keras.layers.BatchNormalization()(conv_1)
    conv_1 = tf.keras.layers.Conv2D(32, (3, 3), activation="relu",
                                    padding="same")(conv_1)
    conv_1 = tf.keras.layers.BatchNormalization()(conv_1)
    pool_1 = tf.keras.layers.MaxPool2D((2, 2))(conv_1) # (64, 64, 32)
    
    conv_2 = tf.keras.layers.Conv2D(64, (3, 3), activation="relu",
                                    padding="same")(pool_1)
    conv_2 = tf.keras.layers.BatchNormalization()(conv_2)
    conv_2 = tf.keras.layers.Conv2D(64, (3, 3), activation="relu",
                                    padding="same")(conv_2)
    conv_2 = tf.keras.layers.BatchNormalization()(conv_2)
    pool_2 = tf.keras.layers.MaxPool2D((2, 2))(conv_2) # (32, 32, 64)
    
    conv_3 = tf.keras.layers.Conv2D(128, (3, 3), activation="relu",
                                    padding="same")(pool_2)
    conv_3 = tf.keras.layers.BatchNormalization()(conv_3)
    conv_3 = tf.keras.layers.Conv2D(128, (3, 3), activation="relu",
                                    padding="same")(conv_3)
    conv_3 = tf.keras.layers.BatchNormalization()(conv_3) # (32, 32, 128)
    
    # decoder
    conv_4 = tf.keras.layers.Conv2D(64, (3, 3), activation="relu",
                                    padding="same")(conv_3)
    conv_4 = tf.keras.layers.BatchNormalization()(conv_4)
    conv_4 = tf.keras.layers.Conv2D(64, (3, 3), activation="relu",
                                    padding="same")(conv_4)
    conv_4 = tf.keras.layers.BatchNormalization()(conv_4)
    up_1 = tf.keras.layers.UpSampling2D((2, 2))(conv_4) # (64, 64, 64)
    
    conv_5 = tf.keras.layers.Conv2D(32, (3, 3), activation="relu",
                                    padding="same")(up_1)
    conv_5 = tf.keras.layers.BatchNormalization()(conv_5)
    conv_5 = tf.keras.layers.Conv2D(32, (3, 3), activation="relu",
                                    padding="same")(conv_5)
    conv_5 = tf.keras.layers.BatchNormalization()(conv_5)
    up_2 = tf.keras.layers.UpSampling2D((2, 2))(conv_5) # (128, 128, 32)
    
    decoded = tf.keras.layers.Conv2D(1, (3, 3), activation="sigmoid",
                                     padding="same")(up_2) # (128, 128, 1)
    
    autoencoder = tf.keras.Model(inputs=[input_img], outputs=[decoded])
    rmsprop = tf.keras.optimizers.RMSprop(lr=3e-4)
    autoencoder.compile(optimizer=rmsprop, loss="mse")
    return autoencoder

Or in tabular form.

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
input_1 (InputLayer)         (None, 128, 128, 1)       0         
_________________________________________________________________
conv2d (Conv2D)              (None, 128, 128, 32)      320       
_________________________________________________________________
batch_normalization (BatchNo (None, 128, 128, 32)      128       
_________________________________________________________________
conv2d_1 (Conv2D)            (None, 128, 128, 32)      9248      
_________________________________________________________________
batch_normalization_1 (Batch (None, 128, 128, 32)      128       
_________________________________________________________________
max_pooling2d (MaxPooling2D) (None, 64, 64, 32)        0         
_________________________________________________________________
conv2d_2 (Conv2D)            (None, 64, 64, 64)        18496     
_________________________________________________________________
batch_normalization_2 (Batch (None, 64, 64, 64)        256       
_________________________________________________________________
conv2d_3 (Conv2D)            (None, 64, 64, 64)        36928     
_________________________________________________________________
batch_normalization_3 (Batch (None, 64, 64, 64)        256       
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 32, 32, 64)        0         
_________________________________________________________________
conv2d_4 (Conv2D)            (None, 32, 32, 128)       73856     
_________________________________________________________________
batch_normalization_4 (Batch (None, 32, 32, 128)       512       
_________________________________________________________________
conv2d_5 (Conv2D)            (None, 32, 32, 128)       147584    
_________________________________________________________________
batch_normalization_5 (Batch (None, 32, 32, 128)       512       
_________________________________________________________________
conv2d_6 (Conv2D)            (None, 32, 32, 64)        73792     
_________________________________________________________________
batch_normalization_6 (Batch (None, 32, 32, 64)        256       
_________________________________________________________________
conv2d_7 (Conv2D)            (None, 32, 32, 64)        36928     
_________________________________________________________________
batch_normalization_7 (Batch (None, 32, 32, 64)        256       
_________________________________________________________________
up_sampling2d (UpSampling2D) (None, 64, 64, 64)        0         
_________________________________________________________________
conv2d_8 (Conv2D)            (None, 64, 64, 32)        18464     
_________________________________________________________________
batch_normalization_8 (Batch (None, 64, 64, 32)        128       
_________________________________________________________________
conv2d_9 (Conv2D)            (None, 64, 64, 32)        9248      
_________________________________________________________________
batch_normalization_9 (Batch (None, 64, 64, 32)        128       
_________________________________________________________________
up_sampling2d_1 (UpSampling2 (None, 128, 128, 32)      0         
_________________________________________________________________
conv2d_10 (Conv2D)           (None, 128, 128, 1)       289       
=================================================================
Total params: 427,713
Trainable params: 426,433
Non-trainable params: 1,280

As I had mentioned earlier, I use a custom image file generator based on my 3 text files, which point to the actual image files. To build it, I combined ideas from Arindam Baidya's blog post on How to increase your small image dataset using Keras ImageDataGenerator and Nilesh's post on Writing Custom Keras Generators. Essentially the idea is to build a generator function that reads the flat files to yield batches of images, which the Keras ImageDataGenerator calls using its flow() method. To be clear, I didn't end up using any image augmentation (which is probably the biggest reason to use the ImageDataGenerator). I did try augmenting initially, but my training was not converging, which I later found to be because I was mistakenly augmenting my validation set as well. In any case, I turned off augmentation for both, since I had 10,000 images, which I figured would be more than enough to train the autoencoder. Here is the code for the generator function images_generator() and the wrapper function custom_generator().

def images_generator(image_list):
    while True:
        indices = np.random.permutation(len(image_list))
        num_batches = len(image_list) // BATCH_SIZE
        for batch in range(num_batches):
            curr_batch = indices[batch*BATCH_SIZE:(batch + 1)*BATCH_SIZE]
            xtrain = np.empty([0, IMAGE_HEIGHT, IMAGE_WIDTH, NUM_CHANNELS],
                             dtype=np.float32)
            ytrain = np.empty([0], dtype=np.int32)
            for id_batch in curr_batch:
                image = get_source_image(IMAGE_DIR, image_list[id_batch])
                image = get_processed_image(image)
                xtrain = np.append(xtrain, [image], axis=0)
                ytrain = np.append(ytrain, [0])
            yield (xtrain, ytrain)

def custom_generator(gen, augment_images):
    if augment_images:
        keras_image_generator = tf.keras.preprocessing.image.ImageDataGenerator(
            rotation_range=15,
            width_shift_range=0.1,
            height_shift_range=0.1,
            shear_range=0.01,
            zoom_range=[0.9, 1.25],
            horizontal_flip=True,
            vertical_flip=False,
            fill_mode="reflect",
            brightness_range=[0.5, 1.5]
        )
    else:
        keras_image_generator = tf.keras.preprocessing.image.ImageDataGenerator()
    for images, labels in gen:
        kigen_batch = keras_image_generator.flow(images, labels,
                batch_size=images.shape[0])
        aug_images, labels = next(kigen_batch)
        yield (aug_images, aug_images)

In the code snippet below, the training set of 10,000 cat images is split 90/10 into a training set and validation set. We then declare custom generators on top of these image lists, and call the autoencoder's fit_generator method, training it for 50 epochs on a AWS p2.xlarge EC2 instance.

from sklearn.model_selection import train_test_split

train_set_images = get_train_images(TRAIN_CATS)
train_images, val_images = train_test_split(train_set_images,
                                            train_size=0.9,
                                            test_size=0.1)

train_gen = images_generator(train_images)
cust_gen = custom_generator(train_gen, False)

val_gen = images_generator(val_images)
cust_val_gen = custom_generator(val_gen, False)

autoencoder = build_model_2()
print(autoencoder.summary())

checkpoint = tf.keras.callbacks.ModelCheckpoint(MODEL_FILE,
                                                save_best_only=True)
tensorboard = tf.keras.callbacks.TensorBoard(log_dir=LOG_DIR, 
                                             batch_size=BATCH_SIZE)
history = autoencoder.fit_generator(cust_gen, 
    steps_per_epoch=len(train_images) // BATCH_SIZE,
    epochs=NUM_EPOCHS,
    callbacks=[checkpoint, tensorboard],
    validation_data=cust_val_gen,
    validation_steps=len(val_images) // BATCH_SIZE)

The learning curve for the autoencoder training shown below shows that both the training and validation losses are decreasing and converging nicely.

Model Evaluation

The ModelCheckpoint callback ensures that the best model (based on the validation loss) is saved, so we can use this saved model to do our evaluations. The first evaluation I did was a few spot checks, where I rendered the input and reconstructed image (by running the input image through the trained autoencoder) side by side and computed the reconstruction error. Code for that is shown below. This code was used to produce the first two images of the cat and dog earlier in this post.

def compute_reconstruction_error(image, image_r):
    assert(image.shape == (IMAGE_WIDTH, IMAGE_HEIGHT))
    assert(image_r.shape == (IMAGE_WIDTH, IMAGE_HEIGHT))
    return np.sqrt(np.mean(np.power(image - image_r, 2)))

test_image = os.path.join(DOG_IMAGE_DIR, "4905.jpg")
    
image_p = get_processed_image(plt.imread(test_image)

model = tf.keras.models.load_model(MODEL_FILE)

image_r = model.predict(np.expand_dims(image_p, axis=0))

image_r = np.squeeze(image_r, axis=(0, 3))
image_p = np.squeeze(image_p, axis=2)
recon_error = compute_reconstruction_error(image_p, image_r)

plt.subplot(121)
image_p = (image_p * 255).astype(np.int32)
plt.xticks([])
plt.yticks([])
plt.title("original")
plt.imshow(image_p, cmap="gray")

plt.subplot(122)
image_r = (image_r * 255).astype(np.int32)
plt.xticks([])
plt.yticks([])
plt.title("reconstructed")
plt.imshow(image_r, cmap="gray")

plt.tight_layout()
plt.show()

print("reconstruction error: {:.7f}".format(recon_error))

I followed this up by computing the reconstruction error across 2,500 cats and 2,500 dogs in the held-out test set, and plotting the results in a pair of histograms. I then computed the t-value to determine if the two distributions were different with some reasonable confidence level. The code for this is shown below. This code was used to produce the overlapping histograms earlier in the post.

def compute_recon_errs(model, testfile, image_dir):
    num_images = 0
    images = []
    ftest = open(testfile, "r")
    for line in ftest:
        if line.startswith("#"):
            continue
        if num_images % 1000 == 0:
            print("{:d} images read".format(num_images))
        image_file = os.path.join(image_dir, line.strip())
        image = plt.imread(image_file)
        images.append(get_processed_image(image))
        num_images += 1
    
    print("{:d} images read, COMPLETE".format(num_images))
    ftest.close()
    
    print("Running model...")
    images = np.array(images)
    images_r = model.predict(images, batch_size=BATCH_SIZE)
    
    print("Computing errors...")
    errors = np.zeros((num_images,), dtype=np.float32)
    for i in range(num_images):
        image = np.squeeze(images[i], axis=2)
        image_r = np.squeeze(images_r[i], axis=2)
        errors[i] = _compute_recon_err(image, image_r)
    return errors

def compute_stats(errors):
    mu = np.mean(errors)
    sd = np.std(errors)
    return mu, sd, errors.shape[0]

def compute_t(cat_stats, dog_stats):
    cat_mu, cat_sig, cat_n = cat_stats
    dog_mu, dog_sig, dog_n = dog_stats
    mu_diff = dog_mu - cat_mu
    sd_sum = math.sqrt(((dog_sig ** 2) / dog_n) + ((cat_sig ** 2) / cat_n))
    return mu_diff / sd_sum


cat_recon_errs = compute_recon_errs(model, TEST_CATS_FILE, CAT_IMAGE_DIR)
dog_recon_errs = compute_recon_errs(model, TEST_DOGS_FILE, DOG_IMAGE_DIR)

plt.hist(cat_recon_errs, bins=50, color="b", alpha=0.3, label="cats")
plt.hist(dog_recon_errs, bins=50, color="r", alpha=0.3, label="dogs")
plt.legend(loc="best")
plt.xlabel("reconstruction error")
plt.ylabel("# of images")
plt.show()

cat_stats = compute_stats(cat_recon_errs)
dog_stats = compute_stats(dog_recon_errs)
print("cat stats:", cat_stats)
print("dog stats:", dog_stats)

t_value = compute_t(cat_stats, dog_stats)
print("t_value:", t_value)
print("lookup pvalue from: https://www.biologyforlife.com/t-test.html")

And finally, here is the snippet for computing the probability of an image being that of a cat, given the reconstruction error from the cat encoder. Essentially, I compute the value of the normal distribution at the reconstruction error point, normalized by the (maximum) value at the mean.

import scipy.stats

cat_mean, cat_sd, cat_n = (0.035493888, 0.011421803, 2495)
dog_mean, dog_sd, dog_n = (0.042051364, 0.0155974245, 2494)

max_cat = scipy.stats.norm(cat_mean, cat_sd).pdf(cat_mean)

cat_recon_err = 0.04304386727035
dog_recon_err = 0.07650210412358

cat_score = scipy.stats.norm(cat_mean, cat_sd).pdf(cat_recon_error) / max_cat
dog_score = scipy.stats.norm(cat_mean, cat_sd).pdf(dog_recon_error) / max_cat

To conclude, in addition to being good at feature compression and anomaly detection, I believe autoencoders can also be useful to build complex classifiers (or feature detectors for more complex classifiers) by building on existing collections of pre-classified data. I did this experiment as a proof of concept to convince myself that this was indeed a viable approach, hopefully I have managed to convince you too.