I was looking for some help on the Internet around Convolutional Neural Networks (ConvNets) recently, and stumbled upon this interesting page on how GIMP uses convolutions for image transformations. Of course, the process described there is the opposite of what happens with ConvNets. With GIMP, you take a known filter and apply it to an image to get a known effect. In the case of ConvNets, you start with a random filter and let the magic of backpropagation nudge the filter values so that the training objective is optimized.
Anyway, I thought it might be fun to (mis)use the Tensorflow API to compute convolutions on an image with the filters listed in the GIMP documentation page above. For the image, I chose this photo of a Red Macaw from FreeDigitalPhotos.net.
Here is the code to apply various convolutions to this image, first using the Scipy API and then using the Tensorflow API.
# -*- coding: utf-8 -*-
from scipy import signal
import matplotlib.pyplot as plt
import numpy as np
import tensorflow as tf

IMG_PATH = "/path/to/ID-100137333.jpg"

# convolution kernels from the GIMP documentation page
FILTERS = {
    "Sharpen"     : np.array([[ 0,  0,  0,  0,  0],
                              [ 0,  0, -1,  0,  0],
                              [ 0, -1,  5, -1,  0],
                              [ 0,  0, -1,  0,  0],
                              [ 0,  0,  0,  0,  0]]),
    "Blur"        : np.array([[0, 0, 0, 0, 0],
                              [0, 1, 1, 1, 0],
                              [0, 1, 1, 1, 0],
                              [0, 1, 1, 1, 0],
                              [0, 0, 0, 0, 0]]),
    "Edge Enhance": np.array([[ 0, 0, 0],
                              [-1, 1, 0],
                              [ 0, 0, 0]]),
    "Edge Detect" : np.array([[0,  1, 0],
                              [1, -4, 1],
                              [0,  1, 0]]),
    "Emboss"      : np.array([[-2, -1, 0],
                              [-1,  1, 1],
                              [ 0,  1, 2]])}

def plot_image(label, image):
    print(label, image.shape)
    plt.imshow(image)
    plt.xticks([])
    plt.yticks([])
    plt.show()

def do_scipy_convolution(image, fltr):
    # convolve each channel independently, then restack to (H, W, 3)
    layers = []
    for z in range(image.shape[2]):
        layers.append(signal.convolve2d(image[:, :, z], fltr, mode="same"))
    return np.array(layers, dtype="uint8").swapaxes(0, 2).swapaxes(0, 1)

def do_tensorflow_convolution(image, fltr, do_pooling=False):
    # treat the 3 color channels as a batch of 3 single-channel
    # images, shaped (3, H, W, 1)
    batched_image = np.array([image[:, :, 0],
                              image[:, :, 1],
                              image[:, :, 2]], dtype="float32")
    batched_image = batched_image.reshape((batched_image.shape[0],
                                           batched_image.shape[1],
                                           batched_image.shape[2], 1))
    # conv2d expects the filter as (height, width, in_channels, out_channels)
    conv_fltr = fltr.reshape((fltr.shape[0],
                              fltr.shape[1], 1, 1)).astype("float32")
    conv_image = tf.nn.conv2d(batched_image, conv_fltr,
                              [1, 1, 1, 1], padding="SAME")
    if do_pooling:
        conv_image = tf.nn.max_pool(conv_image, [1, 3, 3, 1], [1, 3, 3, 1],
                                    padding="SAME")
    with tf.Session() as sess:
        output_image = sess.run(conv_image)
    # move the batch axis back into the channel position: (H, W, 3)
    output_image = output_image.swapaxes(0, 2).swapaxes(0, 1)
    output_image = output_image.reshape((output_image.shape[0],
                                         output_image.shape[1],
                                         output_image.shape[2]))
    output_image = output_image.astype("uint8")
    return output_image

def plot_images(label, image_sp, image_tf, image_tfp):
    print(label, image_sp.shape, image_tf.shape, image_tfp.shape)
    plt.subplot(131)
    plt.imshow(image_sp)
    plt.xticks([])
    plt.yticks([])
    plt.subplot(132)
    plt.imshow(image_tf)
    plt.xticks([])
    plt.yticks([])
    plt.subplot(133)
    plt.imshow(image_tfp)
    plt.xticks([])
    plt.yticks([])
    plt.tight_layout()
    plt.show()

# show original image
image = plt.imread(IMG_PATH)
plot_image("Original", image)

# apply each filter with both APIs, with and without pooling
for f in FILTERS.keys():
    conv_output_sp = do_scipy_convolution(image, FILTERS[f])
    conv_output_tf = do_tensorflow_convolution(image, FILTERS[f])
    conv_output_tfp = do_tensorflow_convolution(image, FILTERS[f], True)
    plot_images(f, conv_output_sp, conv_output_tf, conv_output_tfp)
I used Scipy's convolve2d API as a baseline because the results of Tensorflow's conv2d seemed somewhat non-intuitive compared to the results in the GIMP docs. I tried the depthwise_conv2d call instead but did not get noticeably better results. Finally, I simulated the way I was calling Scipy's convolve2d layer by layer, by treating the three color channels as single-channel images in a batch of 3, and with that I got identical results from Scipy and Tensorflow.
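In hindsight, the non-intuitive results make sense: with a filter of shape (height, width, 3, 1), conv2d sums over the input channel axis, so the three color channels get blended into one output channel instead of being filtered independently. (One other caveat, for what it's worth: convolve2d flips the kernel, i.e., computes a true convolution, while conv2d computes a cross-correlation, so the two agree exactly only for symmetric kernels.) Here is a minimal sketch of the channel-mixing behavior, assuming the same TF 1.x Session API as above, with a toy random image and an all-ones kernel:

import numpy as np
import tensorflow as tf

np.random.seed(42)
image = np.random.rand(1, 8, 8, 3).astype("float32")  # (N, H, W, C) toy image
kernel = np.ones((3, 3), dtype="float32")             # toy blur-like kernel

# filter shaped (3, 3, in_channels=3, out_channels=1): channels get summed
mixed_fltr = np.repeat(kernel[:, :, np.newaxis, np.newaxis], 3, axis=2)
mixed = tf.nn.conv2d(image, mixed_fltr, [1, 1, 1, 1], padding="SAME")

# the batch trick from the post: 3 single-channel images, shaped (3, H, W, 1)
batched = image[0].transpose(2, 0, 1)[:, :, :, np.newaxis]
per_channel_fltr = kernel[:, :, np.newaxis, np.newaxis]
per_channel = tf.nn.conv2d(batched, per_channel_fltr, [1, 1, 1, 1],
                           padding="SAME")

with tf.Session() as sess:
    mixed_out, per_channel_out = sess.run([mixed, per_channel])

print(mixed_out.shape)        # (1, 8, 8, 1) -- one channel, colors summed
print(per_channel_out.shape)  # (3, 8, 8, 1) -- each channel filtered alone
# the mixed output is exactly the sum of the three per-channel outputs
print(np.allclose(mixed_out[0, :, :, 0],
                  per_channel_out.sum(axis=0)[:, :, 0]))  # True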
I also tried max pooling just to see what happens. The images below show how the image is transformed by each filter: the leftmost image is the result of Scipy's convolve2d, the center one is the result of Tensorflow's conv2d, and the rightmost one is the result of Tensorflow's conv2d followed by max_pool. As you can see, in the case of edge detection, the edges seem to show up better after max pooling (a small sketch of what the pooling call actually does follows the results below). Here are the results for the various filters.
[Figures: side-by-side results for the Sharpen, Blur, Edge Enhance, Edge Detect, and Emboss filters]
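To make the pooling step concrete, here is a minimal sketch (again assuming the TF 1.x API used above) of what the 3x3, stride-3 max_pool call does: it keeps only the largest activation in each 3x3 block and shrinks the image by 3x along each side, which is presumably why the strong edge responses stand out while weaker neighboring responses get dropped.

import numpy as np
import tensorflow as tf

# a 6x6 single-channel "image" with one strong response per 3x3 block
x = np.zeros((1, 6, 6, 1), dtype="float32")
x[0, 1, 1, 0] = 5.0   # strong response in the top-left 3x3 block
x[0, 4, 5, 0] = 7.0   # strong response in the bottom-right 3x3 block

pooled = tf.nn.max_pool(x, ksize=[1, 3, 3, 1], strides=[1, 3, 3, 1],
                        padding="SAME")
with tf.Session() as sess:
    out = sess.run(pooled)

print(out.shape)        # (1, 2, 2, 1): each side shrinks by 3x
print(out[0, :, :, 0])
# [[5. 0.]
#  [0. 7.]]  -- only the strongest value in each 3x3 block survives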
I realize this was probably a bit pointless, but I found the idea of doing image transformation using convolutions quite fascinating. I guess this idea might be fairly well known among image processing folks, and that's probably the reason convolutions are used in ConvNets. In any case, it was fun to see yet another area where linear algebra proves useful.