At ElasticON 2016 earlier this year, Doug Turnbull spoke about Image Search in his talk The Ghost in The Search Machine. The idea is simple - treating an image as a bag of pixel intensity "words". Thus a 640 x 480 color image of size is represented by 307,200 words, each word representing a single pixel. Since the image is in color, each pixel has 3 numbers, one for red, green and blue respectively. In our representation, an example word might be 251_253_248 representing (251, 253, 248) for RGB respectively, and each image would be represented by 307,200 such words. Now that the image is just a bunch of words, one can use Lucene (Doug's example was ElasticSearch) to look up similar images given an image.
A similar approach was demonstrated by Adrian Rosenbrock on his blog post Hobbits and Histograms - A How-To Guide to Building your First Image Search Engine in Python, where he uses histograms to detect similar images. Instead of a Lucene index, he uses Chi-squared distance between the histogram for the query image and the other images in the index.
I have been thinking of a hybrid approach where I use a Lucene index to store images as a sequence of pixel words, but where I increase the recall somewhat by doing a bit of feature reduction to have fewer words. So, looking at my original example, since my pixel word is composed of three values in the range 0-255, it can have 16,777,216 possible values. However if I bin them into 25 possible values per channel, my vocabulary size reduces to just 625. This post explores a couple of ideas around binning the pixels.
For my test, I used this image of a butterfly I got from the free stock photos at PhotoRack. The charts on the right represent the histograms for the pixel counts for different intensities for each of the RGB channels.
The code to generate the two images are as follows:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
import matplotlib.pyplot as plt img = plt.imread("/path/to/1132868439-12104.jpg") # original image plt.imshow(img) plt.show() # channel histograms channels = ["red", "green", "blue"] for ix, ch in enumerate(channels): plt.subplot(3, 1, ix+1) plt.hist(img[:, :, ix].reshape(img.shape * img.shape,), color=ch) plt.grid(True) plt.tight_layout() plt.show()
One way to reduce the number of features is to bin the pixel intensities for each channel in equally spaced blocks. Since I want only 25 values in each channel, I create 25 bins. I then create a new image where each pixel is replaced with the mid-point of the range for the bin it belongs to. The code to do this is shown below.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
import numpy as np # binning each matrix into 25 bins and replacing with binned value bins = np.linspace(0, 255, 25) binned_img = ((10 * np.digitize(img, bins)) + 5).astype("uint8") plt.imshow(binned_img) plt.show() # channel histograms (binned image) for ix, ch in enumerate(channels): plt.subplot(3, 1, ix+1) plt.hist(binned_img[:, :, ix].reshape(img.shape * img.shape,), color=ch) plt.grid(True) plt.tight_layout() plt.show()
And here is the resulting image and the histogram. Notice that parts of the red and green channels are slightly taller and parts of the blue channel slightly shorter than the original histogram.
Finally, instead of equal sized bins, I use K-Means to cluster the pixel intensities in each channel into 25 clusters, and replace the pixel intensities with the value of the centroid of the cluster it belongs to. Here is the code to do this:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22
from sklearn.cluster import KMeans temp_img = np.zeros((img.shape * img.shape, 3), dtype="uint8") for ix, ch in enumerate(channels): kmeans = KMeans(n_clusters=25) kmeans.fit(img[:, :, ix].reshape(img.shape * img.shape, 1)) centroids = kmeans.cluster_centers_ labels = kmeans.labels_ for lx, label in enumerate(labels): temp_img[lx, ix] = int(centroids[label]) clustered_img = temp_img.reshape((img.shape, img.shape, img.shape)) plt.imshow(clustered_img) plt.show() # channel histogram (clustered image) for ix, ch in enumerate(channels): plt.subplot(3, 1, ix+1) plt.hist(clustered_img[:, :, ix].reshape(img.shape * img.shape,), color=ch) plt.grid(True) plt.tight_layout() plt.show()
And here are the corresponding image and channel histograms. The histograms seem to be closer to the original than the equal sized bins above.
The nice thing about the two approaches is that the change in image quality seems to be quite minor to the human eye (at least my human eye), but these images can be represented by much smaller vectors. Hopefully this will translate to faster performance and/or smaller Lucene term indexes.