Some time back, I was asked if there was a simple way to automatically classify images as either photographs or drawings. I had initially thought this would involve some complex image processing, but the idea presented in this paper - A Statistical Combined Classifier and its Application to Region and Image Classification (PDF) by Steven J Simske - shows that the problem can be reduced to something similar to a bag of words model commonly used in text classification.
Consider the two images shown below. Clearly (to a human), the one on the left is a photograph, and the second is a chart (or drawing). The main thing that jumps out is how "soft" the color gradations are in the first one compared to the second. For ease of computation, we reduce the RGB values for each pixels to 256 grayscale values using the formula on this page. The corresponding black and white version of the images are shown on the second row.
The third row shows the corresponding histogram plots for the percentage distribution of the pixels across all the 256 gray scale values. This is the reduction suggested by the paper referenced above. Effectively, the image has turned into a "document" which uses a vocabulary of 256 "terms". Since the images can have different sizes (and hence different number of pixels), we normalize the counts to percentages so histograms for different images are comparable to each other.
Notice that the histogram for the photo is shorter and wider and in general smoother than the one for the drawing (the y-axis scale for the first is 0-3 and that for the second is 0-30). The paper describes three features that the authors used for their experiments:
- Pct0.5 - percent of the histogram bins with >0.5% of the pixels.
- Pct2Pk - percent of the histogram range within the largest 2 peaks.
- Bimodal - average of various sums of nearest neighbor pixel differences.
The paper has more detailed about each feature (including details of how to compute them). Since all of these values are derived from the histogram itself, I initially decided to build a classifier that just used the grayscale percentage counts as features instead. My thought was that the features would be mutually independent and thus perform better with something like Naive Bayes, or at least no worse than using three derived features as suggested in the paper.
Turns out I was wrong, but since I used Weka for most of the experimentation, the mistake wasn't overwhelmingly expensive :-).
So, anyway, for my classifier, I decided to use the first two features suggested by the model, plus another one that reflected the "choppiness" of the histograms - the sum of absolute differences between successive grayscale percentage counts which I call AbsDiff. I used AbsDiff instead of Bimodal because the calculation of the Bimodal feature is quite compute intensive (see the paper for details).
I tested out both models using the Weka Explorer with the built-in Naive Bayes classifier. There are (many) other classifiers available in Weka, but its been a while since I read the Weka book, and I chose Naive Bayes just because I was familiar with it.
The training data for both models was a manually annotated training set of approximately 200 images. While tedious, such annotation does not require any specialized domain knowledge (unlike medical text for example). The images were then analyzed and their data (the grayscale percentage counts in the case of the first model and the feature values in the case of the second model) were written out to a file in Weka's ARFF format. Here is a snippet (the rest of it is just more training data in the same format as the first four rows) from the ARFF file for the model I ended up using.
1 2 3 4 5 6 7 8 9 10 11 12 13 | @relation pdc2
@attribute pct05pc real
@attribute pct2pk real
@attribute absdiff real
@attribute klass {photo,drawing}
@data
30.46875,32.77869002006933,0.35458615684431966,photo
20.703125,73.67603966544412,0.45621152596647285,photo
14.453125,67.73598566997008,0.18333175409590724,drawing
3.90625,52.661171146208545,0.15240938475387075,drawing
...
|
Results for both models using Naive Bayes with 10-fold cross validation are shown below:
1 2 3 4 5 6 7 | # Using raw grayscale percentage counts (pdc1)
Correctly Classified Instances 172 88.2051 %
Incorrectly Classified Instances 23 11.7949 %
# Using computed features (pdc2)
Correctly Classified Instances 176 90.2564 %
Incorrectly Classified Instances 19 9.7436 %
|
For testing, I once again manually annotated another set of 200 images, embedded the Weka NaiveBayes classifier inside my Java code (shown below), trained it with the ARFF file for the training data, then passed each of the manually annotated images through the classifier. Here is the code for the classifier, based on the example classifier code provided on this Weka Wiki page.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 | package com.mycompany.classifiers;
import java.awt.Color;
import java.awt.image.BufferedImage;
import java.io.BufferedReader;
import java.io.File;
import java.io.FileOutputStream;
import java.io.FileReader;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.URL;
import java.net.URLConnection;
import javax.imageio.ImageIO;
import org.apache.commons.collections15.Bag;
import org.apache.commons.collections15.bag.HashBag;
import weka.classifiers.Classifier;
import weka.classifiers.bayes.NaiveBayes;
import weka.core.Instance;
import weka.core.Instances;
public class PhotoOrDrawingClassifier {
private Classifier classifier;
private Instances dataset;
//////////////// train /////////////////
public void train(String arff) throws Exception {
classifier = new NaiveBayes();
dataset = new Instances(
new BufferedReader(new FileReader(arff)));
// our class is the last attribute
dataset.setClassIndex(dataset.numAttributes() - 1);
classifier.buildClassifier(dataset);
}
/////////////// test ////////////////////
public String test(String imgfile) throws Exception {
double[] histogram = buildHistogram(new File(imgfile));
Instance instance = new Instance(dataset.numAttributes());
instance.setDataset(dataset);
instance.setValue(dataset.attribute(0), getPct05Pc(histogram));
instance.setValue(dataset.attribute(1), getPct2Pk(histogram));
instance.setValue(dataset.attribute(2), getAbsDiff(histogram));
double[] distribution = classifier.distributionForInstance(instance);
return distribution[0] > distribution[1] ? "photo" : "drawing";
}
//////////////// helper code /////////////////////////
private static final double LUMINANCE_RED = 0.299D;
private static final double LUMINANCE_GREEN = 0.587D;
private static final double LUMINANCE_BLUE = 0.114;
private static final int HIST_WIDTH = 256;
private static final int HIST_HEIGHT = 100;
private static final int HIST_5_PCT = 10;
/**
* Parses pixels out of an image file, converts the RGB values to
* its equivalent grayscale value (0-255), then constructs a
* histogram of the percentage of counts of grayscale values.
* @param infile - the image file.
* @return - a histogram of grayscale percentage counts.
*/
protected double[] buildHistogram(File infile) throws Exception {
BufferedImage input = ImageIO.read(infile);
int width = input.getWidth();
int height = input.getHeight();
Bag<Integer> graylevels = new HashBag<Integer>();
double maxWidth = 0.0D;
double maxHeight = 0.0D;
for (int row = 0; row < width; row++) {
for (int col = 0; col < height; col++) {
Color c = new Color(input.getRGB(row, col));
int graylevel = (int) (LUMINANCE_RED * c.getRed() +
LUMINANCE_GREEN * c.getGreen() +
LUMINANCE_BLUE * c.getBlue());
graylevels.add(graylevel);
maxHeight++;
if (graylevel > maxWidth) {
maxWidth = graylevel;
}
}
}
double[] histogram = new double[HIST_WIDTH];
for (Integer graylevel : graylevels.uniqueSet()) {
int idx = graylevel;
histogram[idx] +=
graylevels.getCount(graylevel) * HIST_HEIGHT / maxHeight;
}
return histogram;
}
protected double getPct05Pc(double[] histogram) {
double numBins = 0.0D;
for (int gl = 0; gl < histogram.length; gl++) {
if (histogram[gl] > 0.5D) {
numBins++;
}
}
return numBins * 100 / HIST_WIDTH;
}
protected double getPct2Pk(double[] histogram) {
double pct2pk = 0.0D;
// find the maximum entry (first peak)
int maxima1 = getMaxima(histogram, new int[][] {{0, histogram.length}});
// navigate left until an inflection point is reached
int lminima1 = getMinima(histogram, new int[] {maxima1, 0});
int rminima1 = getMinima(histogram, new int[] {maxima1, histogram.length});
for (int gl = lminima1; gl <= rminima1; gl++) {
pct2pk += histogram[gl];
}
// find the second peak
int maxima2 = getMaxima(histogram, new int[][] {
{0, lminima1 - 1}, {rminima1 + 1, histogram.length}});
int lminima2 = 0;
int rminima2 = 0;
if (maxima2 > maxima1) {
// new maxima is to the right of previous on
lminima2 = getMinima(histogram, new int[] {maxima2, rminima1 + 1});
rminima2 = getMinima(histogram, new int[] {maxima2, histogram.length});
} else {
// new maxima is to the left of previous one
lminima2 = getMinima(histogram, new int[] {maxima2, 0});
rminima2 = getMinima(histogram, new int[] {maxima2, lminima1 - 1});
}
for (int gl = lminima2; gl < rminima2; gl++) {
pct2pk += histogram[gl];
}
return pct2pk;
}
protected double getAbsDiff(double[] histogram) {
double absdiff = 0.0D;
int diffSteps = 0;
for (int i = 1; i < histogram.length; i++) {
if (histogram[i-1] != histogram[i]) {
absdiff += Math.abs(histogram[i] - histogram[i-1]);
diffSteps++;
}
}
return absdiff / diffSteps;
}
private int getMaxima(double[] histogram, int[][] ranges) {
int maxima = 0;
double maxY = 0.0D;
for (int i = 0; i < ranges.length; i++) {
for (int gl = ranges[i][0]; gl < ranges[i][1]; gl++) {
if (histogram[gl] > maxY) {
maxY = histogram[gl];
maxima = gl;
}
}
}
return maxima;
}
private int getMinima(double[] histogram, int[] range) {
int start = range[0];
int end = range[1];
if (start == end) {
return start;
}
boolean forward = start < end;
double prevY = histogram[start];
double dy = 0.0D;
double prevDy = 0.0D;
if (forward) {
// avoid getting trapped in local minima
int minlookahead = start + HIST_5_PCT;
for (int pos = start + 1; pos < end; pos++) {
dy = histogram[pos] - prevY;
if (signdiff(dy, prevDy) && pos >= minlookahead) {
return pos;
}
prevY = histogram[pos];
prevDy = dy;
}
} else {
// avoid getting trapped in local minima
int minlookbehind = start - HIST_5_PCT;
for (int pos = start - 1; pos >= end; pos--) {
dy = histogram[pos] - prevY;
if (signdiff(dy, prevDy) && pos <= minlookbehind) {
return pos;
}
prevY = histogram[pos];
prevDy = dy;
}
}
return start;
}
private boolean signdiff(double dy, double prevDy) {
return ((dy < 0.0D && prevDy > 0.0D) ||
(dy > 0.0 && prevDy < 0.0D));
}
/**
* Downloads image file referenced by URL into a local file specified
* by filename for processing.
* @param url - URL for the image file.
* @param filename - the local filename for the file.
*/
protected void download(String url, String filename)
throws Exception {
URL u = new URL(url);
URLConnection conn = u.openConnection();
InputStream is = conn.getInputStream();
byte[] buf = new byte[4096];
int len = -1;
OutputStream os = new FileOutputStream(new File(filename));
while ((len = is.read(buf)) != -1) {
os.write(buf, 0, len);
}
os.close();
}
}
|
I tried to make it so the model would be dumped out into a serialized file after training, and the classification code (test) would only look at the serialized file, similar to how most other toolkits do it. Weka does provide a way to serialize the model, but the serialized model does not contain the data format (the header information in the training ARFF file) in order to classify unseen images, as explained in this thread. So since the training doesn't take that long, rather than supplying an empty ARFF file, I built it so that the classifier and the header from the training data (dataset in the code) are stored in memory and used for testing.
Client code to train the classifer and then use it in a streaming manner on unseen images is shown in the JUnit snippet below.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 | @Test
public void testModel() throws Exception {
PhotoOrDrawingClassifier classifier = new PhotoOrDrawingClassifier();
classifier.train("/path/to/pdc2.arff");
String filename = null;
// a file containing paths to the test set images
BufferedReader reader = new BufferedReader(
new FileReader(new File("/path/to/test.txt")));
while ((line = reader.readLine()) != null) {
String klass = classifier.test(filename);
System.out.println(filename + " => " + klass);
}
reader.close();
}
|
Running this code classifies the images in my test set, and comparing with my annotations gave me the following results (formatted similarly to how the Weka explorer produces its results, for ease of comparison).
1 2 | Correctly Classified Instances 177 88.5000 %
Incorrectly Classified Instances 23 11.5000 %
|
88.5% seems a bit low for such a simple task, but it can probably be improved by using some other classifier. I guess I need to go through the Weka book again to find candidate classifier algorithms that would work well with the dataset.
Another way to make the classification perform better may be to rethink it a bit. While collecting training and test instances, I found several images for which the classification is not clearcut. These are either image groups which consist of line drawings and photographs together, or labelled shaded color drawings, whose histograms would not have the choppiness associated with charts. When annotating my training and test sets, I used my (programmer's) judgement to classify them as one or the other based on "how close" it was to a photo or drawing. Perhaps we may get more accurate results if we considered classifying the images into more than just two groups.