Monday, November 28, 2016

Image Preprocessing with OpenCV

In my last post, I mentioned that I presented at the Demystifying Deep Learning and Artificial Intelligence event at Oakland. My talk was about using Transfer Learning from and Fine tuning a Deep Convolutional Network (DCNN) trained on ImageNet to classify images in a different domain. The domain I chose was the images of the retina to detect varying stages of Diabetic Retinopathy (DR). The images came from the Diabetic Retinopathy competition on Kaggle.

In order to demonstrate the ideas mentioned in the presentation, I trained a few simple networks with a sample (1,000/35,000) of the data provided. My results were nowhere close to the competition winner, who achieved a Kappa score of 0.85 (a metric indicating agreement of predictions with labels), which is better than human performance (0.83 between a General Physicial and an Opthalmologist and 0.72 between an Optometrist and an Opthalmologist according to this forum post). Although my best model did achieve a Kappa score of 0.75 on my validation set, which puts me at around the 25-26 position on the public leaderboard.

The competition winner Benjamin Graham (min-pooling) posted his a description of his algorithm after the competition. One of the things he did was to preprocess the images so they had more uniformity in terms of brightness and shape. This made sense, since the images vary quite a bit along these dimensions, as you can see below.

I have been recently playing around with OpenCV, so I figured it would be interesting to apply some of these techniques to preprocess the images so they were more similar to each other. This post describes what I did.

I first tried to standardize on the size. As you can see, some images are more rectangular, with more empty space on the left and right, and some are more square. In fact, if you group loosely by aspect ratio, it turns out that there are three major size groups.

My first attempt at standardization was to find the edge of the circle representing the retina, then crop on the vertical tangent to the edge. I ended up not using this approach, but I include it here because I think it is interesting and maybe if I had more time and patience I might have figured out a way to use this approach instead of what I did.

The code to do so is shown below. The image is first read in as a grayscale image and converted to a matrix, then vertical and horizontal Sobel filters are applied to extract edges. Finally, we find the edge farthest from the center (approximated by the vertical center of the image) and crop vertically along this.

import cv2

def compute_edges(image):
    image = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    image = cv2.GaussianBlur(image, (11, 11), 0)
    sobel_x = cv2.Sobel(image, cv2.CV_64F, 1, 0)
    sobel_x = np.uint8(np.absolute(sobel_x))
    sobel_y = cv2.Sobel(image, cv2.CV_64F, 0, 1)
    sobel_y = np.uint8(np.absolute(sobel_y))
    edged = cv2.bitwise_or(sobel_x, sobel_y)
    return edged    

def crop_image_to_edge(edged, threshold=10, margin=0.2):
    # find edge along center and crop
    mid_y = edged.shape[0] // 2
    notblack_x = np.where(edged[mid_y, :] >= threshold)[0]
    if notblack_x.shape[0] == 0:
        lb_x = 0
        ub_x = edged.shape[1]
        lb_x = notblack_x[0]
        ub_x = notblack_x[-1]
    if lb_x > margin * edged.shape[1]:
        lb_x = 0
    if (edged.shape[1] - ub_x) > margin * edged.shape[1]:
        ub_x = edged.shape[1]        
    mid_x = edged.shape[1] // 2
    notblack_y = np.where(edged[:, mid_x] >= threshold)[0]
    if notblack_y.shape[0] == 0:
        lb_y = 0
        ub_y = edged.shape[0]
        lb_y = notblack_y[0]
        ub_y = notblack_y[-1]
    if lb_y > margin * edged.shape[0]:
        lb_y = 0
    if (edged.shape[0] - ub_y) > margin * edged.shape[0]:
        ub_y = edged.shape[0]
    cropped = edged[lb_y:ub_y, lb_x:ub_x, :]
    return cropped

image = cv2.imread(image_name)
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY) # left image
edged = compute_edges(gray)                    # middle image
cropped = crop_image_to_edge(gray)             # right image

Although in this (and lots of other) cases, this gave me good results, but it failed on some where the edge could not be detected because the image was so dark. Also, as you can see from the histogram on the left below, aspect ratios of the original uncropped images had two distinct clusters, but after the cropping operation, the distribution is all over the place. Our objective was to standardize on the aspect ratio after the cropping operation, the kind of scenario shown on the histogram on the right.

The approach I came up with was to eyeball the aspect ratios. Most of them were around 1.3 and 1.5, so I decided based on some manual cropping that the best aspect ratio is around 1.2. The resulting histogram of aspect ratios is the one on the right above.

def crop_image_to_aspect(image, tar=1.2):
    # load image
    image_bw = cv2.cvtColor(image, cv2.COLOR_RGB2GRAY)
    # compute aspect ratio
    h, w = image_bw.shape[0], image_bw.shape[1]
    sar = h / w if h > w else w / h
    if sar < tar:
        return image
        k = 0.5 * (1.0 - (tar / sar))
        if h > w:
            lb = int(k * h)
            ub = h - lb
            cropped = image[lb:ub, :, :]
            lb = int(k * w)
            ub = w - lb
            cropped = image[:, lb:ub, :]
        return cropped

cropped = crop_image_to_aspect(image)

This is what the random sample of 9 retina images looks like after the cropping operation.

Next I tried looking at standardizing the brightnesses. Benjamin Graham's report suggests just subtracting the mean pixel value from each RGB channel, but I decided to do something a little fancier. First I converted each image to the HSV (Hue, Saturation, Value) color space and computed the mean value of V across all images in my sample. The value of V is a measure of the brightness of the image. I then computed the mean V per image. I then added the global V mean and subtracted the local V mean from each V, and converted it back to RGB.

def brighten_image_hsv(image, global_mean_v):
    image_hsv = cv2.cvtColor(image, cv2.COLOR_RGB2HSV)
    h, s, v = cv2.split(image_hsv)
    mean_v = int(np.mean(v))
    v = v - mean_v + global_mean_v
    image_hsv = cv2.merge((h, s, v))
    image_bright = cv2.cvtColor(image_hsv, cv2.COLOR_HSV2RGB)
    return image_bright

vs = []
for image_dir, image_name in get_next_image_loc(DATA_DIR):
    image = cv2.imread(os.path.join(DATA_DIR, image_dir, image_name))
    image_hsv = cv2.cvtColor(image, cv2.COLOR_BGR2HSV)
    h, s, v = cv2.split(image_hsv)
global_mean_v = int(np.mean(np.array(vs)))

brightened = brighten_image_hsv(resized, global_mean_v)

As expected, this mean centering operation converts a somewhat skewed distribution of brightnesses to a more balanced one.

After mean centering the brightness values and converting back to RGB, our sample of 9 retina images looks like this. The resulting images are not as clean as the examples shown in the winner's competition report, where he mean centered directly on RGB. But the brightness does look roughly equal now.

In order to mean center by RGB, we compute the global mean of R, G and B channels across all the images, then subtract the individual R, G, and B channel means from the image. Code to do this is shown below:

def brighten_image_rgb(image, global_mean_rgb):
    r, g, b = cv2.split(image)
    m = np.array([np.mean(r), np.mean(g), np.mean(b)])
    brightened = image + global_mean_v - m
    return brightened
mean_rgbs = []
for image_dir, image_name in get_next_image_loc(DATA_DIR):
    image = cv2.imread(os.path.join(DATA_DIR, image_dir, image_name))
    image_rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
    r, g, b = cv2.split(image_rgb)
    mean_rgbs.append(np.array([np.mean(r), np.mean(g), np.mean(b)]))
global_mean_rgbs = np.mean(mean_rgbs, axis=0)

brightened = brighten_image_rgb(resized, global_mean_rgbs)

The set of sample images, brightened by RGB channel, looks like this:

Sadly, the preprocessing does not actually translate to higher accuracy or Kappa scores. In fact, resizing and brightening the image using HSV results in a Kappa score of 0.68. Resizing and brightening using RGB results in a Kappa score of 0.61. Kappa score without pre-processing was 0.75. So preprocessing images actually had a negative effect in my case. However, knowing how to do this is good to know for the future, so I think it was time well spent.

The entire code for preprocessing the sample images, as well as printing a random sample of 9 images at each step, is available here in my project on GitHub.

4 comments (moderated to prevent spam):

Naveed Afzal said...


Great post. Have you posted anything related to computing tree edit distance algorithm. I am trying to compute tree edit distance between dependency tree of sentences and there exist a zss library in python to do that but there is not much information available regarding it. If you can please explain the working of that library and it's use that would be really great as you explain each step in very detail and that would really help me in understanding that library.

Thanks again and keep up the good work.

Sujit Pal said...

Thanks for the kind words, Naveed. Regarding your question about whether I have posted anything about computing tree based edit distances, here is one I wrote about couple of months back. I haven't used zss, thanks for the pointer, will look at it. There seems to be some decent documentation on zss here, not sure if this is detailed enough for what you are looking at though.

Naveed Afzal said...

Thanks for reply. I am trying to write a wrapper in python that convert spacy dependency tree to the format required by zss library in order to compute tree edit distance. I am not able to accomplish this goal so far and there is not much help available for either of these libraries. Can you please write a post regarding this issue. As most of your post are self-explanatory and easy to understand and follow. Thanks

Sujit Pal said...

Might be something interesting to do, I will put it in my queue, thanks for the suggestion.