Monday, November 28, 2016

Image Preprocessing with OpenCV

In my last post, I mentioned that I presented at the Demystifying Deep Learning and Artificial Intelligence event in Oakland. My talk was about Transfer Learning from, and Fine Tuning of, a Deep Convolutional Network (DCNN) trained on ImageNet to classify images in a different domain. The domain I chose was images of the retina, with the goal of detecting varying stages of Diabetic Retinopathy (DR). The images came from the Diabetic Retinopathy competition on Kaggle.

In order to demonstrate the ideas mentioned in the presentation, I trained a few simple networks with a sample (1,000/35,000) of the data provided. My results were nowhere close to those of the competition winner, who achieved a Kappa score of 0.85 (a metric indicating agreement of predictions with labels), which is better than human performance (0.83 between a General Physician and an Ophthalmologist and 0.72 between an Optometrist and an Ophthalmologist, according to this forum post). My best model did, however, achieve a Kappa score of 0.75 on my validation set, which would put me at around position 25-26 on the public leaderboard.
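
For readers unfamiliar with the metric, here is a minimal sketch of how a Kappa score can be computed. It assumes the quadratic weighted variant of Cohen's Kappa and the five DR grades (0 to 4) used in the competition, as far as I understand it. The function name and the sample labels are made up for illustration.

```python
import numpy as np

def quadratic_weighted_kappa(y_true, y_pred, num_classes=5):
    """Cohen's kappa with quadratic disagreement weights (illustrative sketch)."""
    y_true = np.asarray(y_true, dtype=int)
    y_pred = np.asarray(y_pred, dtype=int)
    # observed[i, j] counts samples with true class i and predicted class j
    observed = np.zeros((num_classes, num_classes))
    for t, p in zip(y_true, y_pred):
        observed[t, p] += 1
    # counts expected under chance agreement: outer product of the marginals
    expected = np.outer(observed.sum(axis=1), observed.sum(axis=0)) / len(y_true)
    # penalty grows with the squared distance between the two classes
    idx = np.arange(num_classes)
    weights = (idx[:, None] - idx[None, :]) ** 2 / float((num_classes - 1) ** 2)
    return 1.0 - (weights * observed).sum() / (weights * expected).sum()

# made-up labels: one prediction is off by a single grade
print(quadratic_weighted_kappa([0, 1, 2, 3, 4], [0, 1, 2, 3, 3]))
```

A score of 1 means perfect agreement, 0 means agreement no better than chance, and being off by several grades is penalized more heavily than being off by one.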

The competition winner Benjamin Graham (min-pooling) posted a description of his algorithm after the competition. One of the things he did was to preprocess the images so they had more uniformity in terms of brightness and shape. This made sense, since the images vary quite a bit along these dimensions, as you can see below.

I have recently been playing around with OpenCV, so I figured it would be interesting to apply some of these techniques to preprocess the images so they were more similar to each other. This post describes what I did.

I first tried to standardize on the size. As you can see, some images are more rectangular, with more empty space on the left and right, and some are more square. In fact, if you group loosely by aspect ratio, it turns out that there are three major size groups.
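
A loose grouping like this can be sketched in a few lines. The helper names and the image sizes below are hypothetical, not measured from the dataset; the aspect ratio is taken as long side over short side, so it is always at least 1.

```python
def aspect_ratio(height, width):
    # long side over short side, so the ratio is always >= 1.0
    return max(height, width) / float(min(height, width))

def group_by_aspect_ratio(shapes, precision=1):
    # shapes: iterable of (height, width); returns {rounded ratio: count}
    groups = {}
    for h, w in shapes:
        key = round(aspect_ratio(h, w), precision)
        groups[key] = groups.get(key, 0) + 1
    return groups

# hypothetical (height, width) pairs, for illustration only
shapes = [(2000, 3000), (1944, 2592), (2048, 3072), (1000, 1000)]
print(group_by_aspect_ratio(shapes))
```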

My first attempt at standardization was to find the edge of the circle representing the retina, then crop on the vertical tangent to the edge. I ended up not using this approach, but I include it here because I think it is interesting and maybe if I had more time and patience I might have figured out a way to use this approach instead of what I did.

The code to do so is shown below. The image is converted to grayscale and blurred, then horizontal and vertical Sobel filters are applied to extract edges. Finally, we scan the middle row and the middle column of the edge image (an approximation to lines through the center of the retina) for the outermost edge pixels, and crop the image to those bounds.

import cv2
import numpy as np

def compute_edges(image):
    image = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    image = cv2.GaussianBlur(image, (11, 11), 0)
    sobel_x = cv2.Sobel(image, cv2.CV_64F, 1, 0)
    sobel_x = np.uint8(np.absolute(sobel_x))
    sobel_y = cv2.Sobel(image, cv2.CV_64F, 0, 1)
    sobel_y = np.uint8(np.absolute(sobel_y))
    edged = cv2.bitwise_or(sobel_x, sobel_y)
    return edged    

def crop_image_to_edge(image, threshold=10, margin=0.2):
    edged = compute_edges(image)
    # find the outermost edge pixels along the middle row
    mid_y = edged.shape[0] // 2
    notblack_x = np.where(edged[mid_y, :] >= threshold)[0]
    if notblack_x.shape[0] == 0:
        lb_x = 0
        ub_x = edged.shape[1]
    else:
        lb_x = notblack_x[0]
        ub_x = notblack_x[-1]
    # ignore an edge that lies too far inside the image
    if lb_x > margin * edged.shape[1]:
        lb_x = 0
    if (edged.shape[1] - ub_x) > margin * edged.shape[1]:
        ub_x = edged.shape[1]
    # repeat along the middle column
    mid_x = edged.shape[1] // 2
    notblack_y = np.where(edged[:, mid_x] >= threshold)[0]
    if notblack_y.shape[0] == 0:
        lb_y = 0
        ub_y = edged.shape[0]
    else:
        lb_y = notblack_y[0]
        ub_y = notblack_y[-1]
    if lb_y > margin * edged.shape[0]:
        lb_y = 0
    if (edged.shape[0] - ub_y) > margin * edged.shape[0]:
        ub_y = edged.shape[0]
    # crop the original color image to the detected bounds
    cropped = image[lb_y:ub_y, lb_x:ub_x, :]
    return cropped

image = cv2.imread(image_name)
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY) # left image
edged = compute_edges(image)                   # middle image
cropped = crop_image_to_edge(image)            # right image

Although this gave me good results in this (and lots of other) cases, it failed on some images where the edge could not be detected because the image was too dark. Also, as you can see from the histogram on the left below, the aspect ratios of the original uncropped images had two distinct clusters, but after the cropping operation, the distribution is all over the place. My objective was to standardize the aspect ratio after the cropping operation, the kind of scenario shown in the histogram on the right.

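The approach I came up with was to eyeball the aspect ratios. Most of them were around 1.3 and 1.5, so I decided, based on some manual cropping, that the best target aspect ratio was around 1.2. Images with a larger aspect ratio are trimmed equally from both ends of their longer side until they reach the target, and the rest are left alone. The resulting histogram of aspect ratios is the one on the right above.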

def crop_image_to_aspect(image, tar=1.2):
    # image dimensions
    h, w = image.shape[0], image.shape[1]
    # source aspect ratio: long side over short side
    sar = float(h) / w if h > w else float(w) / h
    if sar < tar:
        return image
    else:
        # fraction to trim from each end of the longer side
        k = 0.5 * (1.0 - (tar / sar))
        if h > w:
            lb = int(k * h)
            ub = h - lb
            cropped = image[lb:ub, :, :]
        else:
            lb = int(k * w)
            ub = w - lb
            cropped = image[:, lb:ub, :]
        return cropped

cropped = crop_image_to_aspect(image)
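
To make the arithmetic concrete, here is a small sketch that applies the same trimming rule to image dimensions only, without touching any pixels. The function name `cropped_size` and the image size are hypothetical.

```python
def cropped_size(h, w, tar=1.2):
    # source aspect ratio: long side over short side
    sar = max(h, w) / float(min(h, w))
    if sar < tar:
        return h, w
    # fraction to trim from each end of the longer side
    k = 0.5 * (1.0 - tar / sar)
    if h > w:
        lb = int(k * h)
        return h - 2 * lb, w
    lb = int(k * w)
    return h, w - 2 * lb

# hypothetical 2592 x 1944 (width x height) image
print(cropped_size(1944, 2592))
```

For this image the source aspect ratio is about 1.33, so k is 0.05 and 129 pixels are trimmed from each side, leaving a width of 2334 and an aspect ratio of roughly 1.2.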

This is what the random sample of 9 retina images looks like after the cropping operation.

Next I tried looking at standardizing the brightnesses. Benjamin Graham's report suggests just subtracting the mean pixel value from each RGB channel, but I decided to do something a little fancier. First I converted each image to the HSV (Hue, Saturation, Value) color space and computed the mean value of V across all images in my sample. The value of V is a measure of the brightness of the image. I then computed the mean V per image. I then added the global V mean and subtracted the local V mean from each V, and converted it back to RGB.

def brighten_image_hsv(image, global_mean_v):
    image_hsv = cv2.cvtColor(image, cv2.COLOR_RGB2HSV)
    h, s, v = cv2.split(image_hsv)
    mean_v = int(np.mean(v))
    # shift in a signed type to avoid uint8 wraparound, then clip to [0, 255]
    v = np.clip(v.astype(np.int32) - mean_v + global_mean_v, 0, 255)
    v = v.astype(np.uint8)
    image_hsv = cv2.merge((h, s, v))
    image_bright = cv2.cvtColor(image_hsv, cv2.COLOR_HSV2RGB)
    return image_bright

import os

vs = []
for image_dir, image_name in get_next_image_loc(DATA_DIR):
    image = cv2.imread(os.path.join(DATA_DIR, image_dir, image_name))
    image_hsv = cv2.cvtColor(image, cv2.COLOR_BGR2HSV)
    h, s, v = cv2.split(image_hsv)
    # collect the mean brightness of each image
    vs.append(np.mean(v))
global_mean_v = int(np.mean(np.array(vs)))

brightened = brighten_image_hsv(resized, global_mean_v)

As expected, this mean centering operation converts a somewhat skewed distribution of brightnesses to a more balanced one.
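
The effect is easy to check on synthetic data. The sketch below uses two made-up V channels, one dark and one bright, in place of real retina images, and a hypothetical helper `center_brightness` that applies the same shift. After centering, both end up with a mean close to the global mean.

```python
import numpy as np

def center_brightness(v, global_mean_v):
    # shift the V channel so that its mean moves to the global mean,
    # using a signed type for the arithmetic to avoid uint8 wraparound
    shifted = v.astype(np.int32) - int(np.mean(v)) + global_mean_v
    return np.clip(shifted, 0, 255).astype(np.uint8)

rng = np.random.RandomState(42)
# synthetic V channels standing in for a dark and a bright image
dark = rng.randint(20, 60, size=(8, 8)).astype(np.uint8)
bright = rng.randint(150, 200, size=(8, 8)).astype(np.uint8)
global_mean_v = int(np.mean([dark.mean(), bright.mean()]))

for name, v in [("dark", dark), ("bright", bright)]:
    centered = center_brightness(v, global_mean_v)
    print(name, v.mean(), centered.mean())
```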

After mean centering the brightness values and converting back to RGB, our sample of 9 retina images looks like this. The resulting images are not as clean as the examples shown in the winner's competition report, where he mean centered directly on RGB. But the brightness does look roughly equal now.

In order to mean center by RGB, we compute the global mean of the R, G and B channels across all the images, then for each image subtract its own R, G and B channel means and add the global means. Code to do this is shown below:

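def brighten_image_rgb(image, global_mean_rgb):
    # image is assumed to be in RGB channel order, like global_mean_rgb
    r, g, b = cv2.split(image)
    m = np.array([np.mean(r), np.mean(g), np.mean(b)])
    # shift each channel mean to the global mean, keeping values in [0, 255]
    brightened = np.clip(image + global_mean_rgb - m, 0, 255).astype(np.uint8)
    return brightened

mean_rgbs = []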
for image_dir, image_name in get_next_image_loc(DATA_DIR):
    image = cv2.imread(os.path.join(DATA_DIR, image_dir, image_name))
    image_rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
    r, g, b = cv2.split(image_rgb)
    mean_rgbs.append(np.array([np.mean(r), np.mean(g), np.mean(b)]))
global_mean_rgbs = np.mean(mean_rgbs, axis=0)

brightened = brighten_image_rgb(resized, global_mean_rgbs)

The set of sample images, brightened by RGB channel, looks like this:

Sadly, the preprocessing did not translate to higher accuracy or Kappa scores. Resizing and brightening the images using HSV resulted in a Kappa score of 0.68, and resizing and brightening using RGB resulted in a Kappa score of 0.61, compared to 0.75 without preprocessing. So preprocessing the images actually had a negative effect in my case. However, knowing how to do this will be useful in the future, so I think it was time well spent.

The entire code for preprocessing the sample images, as well as printing a random sample of 9 images at each step, is available here in my project on GitHub.

Sunday, November 20, 2016

Trip Report: Demystifying Deep Learning and Artificial Intelligence @ Oakland

This weekend, I was in Oakland, attending the Demystifying Deep Learning and Artificial Intelligence Workshop. The workshop was organized by Accel.AI. The goal of the workshop was to bring together people who are looking to get into Artificial Intelligence and Deep Learning with people who are a little further along in this journey. Some of us from the Deep Learning Enthusiasts meetup group in San Francisco, including myself, presented at this workshop. This is my trip report of the event.

There was an Introductory and Advanced track that ran in parallel. I attended all the advanced tracks and one introductory track. I came away with the humbling realization that my knowledge is filled with more holes than a slice of Swiss cheese. While I understand Deep Learning well enough to build models and get results, each time I hear someone speak, I invariably come away with a fresh perspective about something I hadn't thought about before.

Below I provide a summary of the talks I attended. I don't have the slides and/or github repositories, but once they are made available I will update the post.

Day 1 (Saturday)

Internal workings of a convnet and the process of implementing it on Spark - by Jeremy Nixon

I got in late, and missed the first 10-15 minutes of the talk. It was a good introduction to Convolutional Neural Networks (CNN). One thing I got from this talk was a different way of thinking about weight sharing in CNNs. Instead of each neuron in a layer connecting to all the neurons in the next layer as happens for Fully Connected Networks, the talk described neurons in a CNN as connecting to their corresponding neuron and its immediate neighbors in the next layer. I thought that was quite insightful, compared to my prior mental model of alternating convolutions and pooling. Jeremy also briefly touched upon how a DL model would be implemented on Spark (using a parameter server). I spoke to him briefly after the talk and it turns out that a Spark based CNN is under development and should be available shortly as part of Spark.
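
This way of looking at it can be illustrated with a toy 1-D example. The sketch below is my own illustration, not code from the talk: each output unit is connected only to a small neighbourhood of the input, and every position reuses the same three weights, where a fully connected layer mapping 5 inputs to 3 outputs would need 15.

```python
import numpy as np

def conv1d_shared(x, kernel):
    # each output unit sees only a small neighbourhood of the input,
    # and every position reuses the same kernel weights
    k = len(kernel)
    return np.array([np.dot(x[i:i + k], kernel) for i in range(len(x) - k + 1)])

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
kernel = np.array([1.0, 0.0, -1.0])   # 3 shared weights
print(conv1d_shared(x, kernel))
```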

Interactive Group Presentations

Attendees from both tracks got together and were broken up into 10 groups. Each group was given a topic and 5 minutes to come up with a short group presentation. We got Convolutional Neural Networks. Most groups presented their topics in non-ML terms, such as K-Nearest Neighbors (KNN) in terms of asking what your friends are getting for lunch and ordering based on that. Our presentation was a bit more specialized and computer science-y, mainly because we didn't anticipate that other groups would do it that way. Also, we couldn't think of a way to present CNNs in that way.

Overfitting and regularization in Machine Learning - by Dmitry Lituev

Dmitry presented examples of overfitting and subsequent regularization for different Machine Learning (including Deep Learning) models. He demoed overfitting and how to fix it with Regularization and Dropout on Scikit-Learn and Keras models. You can find his Jupyter Notebooks here. While I knew about regularization and dropout, Dmitry's presentation helped me truly appreciate what happens during the regularization process.
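
As a generic illustration of the idea (not taken from Dmitry's notebooks), the sketch below fits a degree-9 polynomial to 10 noisy synthetic points, once without regularization and once with a small L2 penalty. The data, the polynomial degree and the penalty strength `lam` are all arbitrary choices of mine.

```python
import numpy as np

rng = np.random.RandomState(0)
x = np.linspace(0.0, 1.0, 10)
y = np.sin(2 * np.pi * x) + 0.2 * rng.randn(10)   # noisy synthetic data

# degree-9 polynomial features: enough capacity to fit the noise
X = np.vander(x, 10)
w_plain = np.linalg.lstsq(X, y, rcond=None)[0]

# ridge (L2-regularized) solution: (X^T X + lam * I)^-1 X^T y
lam = 1e-3   # arbitrary penalty strength
w_ridge = np.linalg.solve(np.dot(X.T, X) + lam * np.eye(10), np.dot(X.T, y))

# compare the sizes of the two weight vectors
print(np.linalg.norm(w_plain), np.linalg.norm(w_ridge))
```

The unregularized fit needs very large coefficients to pass through every noisy point, while the penalty keeps the coefficients small, which is what makes the regularized curve smoother.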

Transfer Learning and Fine Tuning for Cross-Domain Image Classification with Keras - by Sujit Pal

This was my presentation. I had done some work with Transfer Learning using Caffe pretrained models in the past, so I decided to try it again (including Fine Tuning) using Keras pretrained models and (a sample of) the Diabetic Retinopathy Competition data from Kaggle. I covered transfer learning and fine tuning with a pretrained VGG-16 network trained on ImageNet data. Here are the slides for the talk, and the GitHub repository for the code.

Day 2 (Sunday)

Deep Learning for Recommendation systems - by Rumman Chowdry

Once again, I missed the first 10-15 minutes. Fortunately, the first part of Rumman's talk had quite an in-depth coverage of Recommendation System basics, so I got in before she started on the Deep Learning part, which was the reason I wanted to attend her talk in the first place. The Deep Learning models she discussed in her talk were Google's Wide and Deep Network, Spotify's DL based Music Recommender and YouTube's Deep Neural Network based Recommender. Very interesting approaches, definitely something to look into in the coming months.

Introduction to Deep Learning for Images in Keras - by Stéphane Egly and Malaikannan Sankarasubbu

One of the speakers in the Advanced track had to reschedule, so I got to go to the second part of this talk. Malaikannan showed a very good visualization for convolutions which I liked very much. He has been working with Deep Learning full time for at least the last two years. Thanks to him, I learned that his company has open-sourced Recurrent Shop, a framework for building complex recurrent neural networks with Keras. I also had a nice conversation with Stéphane Egly after the presentation.

Lightning Talks

This section had 4 lightning talks, each about 10-15 minutes long.

Did Big Data Fail us in the Presidential Elections? - by Rumman Chowdry

Rumman pointed out various errors in polling from the point of view of a Data Scientist. The discussion was mostly around error margins and how they didn't carry over into the media reports to the public. She concluded that we as Data Scientists failed Big Data by failing to educate the media and subsequently the general public.

Using Convolutional Neural Networks to classify Monet paintings - by Samuel Bozek

Samuel described his CNN that classifies Monet paintings with 85% accuracy. It is inspired by A Neural Algorithm of Artistic Style. In the talk I learned that Monet suffered from Macular Degeneration over his lifetime and his painting style reflects that, and can be subdivided into 3 distinct periods. The classifier has better performance on the early and middle periods than on the late period.

Developing Chatbots with AI - by Masha Kubyshina

Masha is an experienced chatbot consultant who decided to experiment with a different way to build chatbots. Rather than be driven by the development team or client, she decided to let users build their own chatbot. The participants in her experiment were her school-age daughter and her classmates. She demoed an ice-cream recipe chatbot, imagined and created by her daughter and friends, built on the Recast.AI platform.

Incorporating ML into Robotics and Computer Vision - by Carlos Uranga

Carlos is from Singularity University, and he spoke of the Singularity, the point at which artificial intelligence will be able to learn by itself without help from humans. He showed a wearable EEG device (a helmet) that can be used to train an artificial hand to move with the power of thought. He also described an AI bartender that can learn how to mix drinks based on your personality.

The lightning talks were then followed by the following 2 regular talks.

In depth look at Word2Vec - by Andy Zhang

Andy's presentation was an attempt to describe Word2Vec from first principles. Andy led us through a bunch of simple examples and described how different pairs of words would align differently in the Word2Vec vector space, and how these alignments match up with our intuitions as demonstrated by word analogies. He also spoke very briefly about extending the idea to images (and images jointly with text).
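
The word analogy idea can be sketched with a handful of hand-made vectors. These are toy 2-d vectors I made up for illustration (one axis for royalty, one for gender), not real Word2Vec output, and the `analogy` helper is likewise hypothetical.

```python
import numpy as np

# hand-made 2-d toy vectors (axes: royalty, gender); not real Word2Vec output
vectors = {
    "king":   np.array([1.0, 1.0]),
    "queen":  np.array([1.0, -1.0]),
    "man":    np.array([0.0, 1.0]),
    "woman":  np.array([0.0, -1.0]),
    "prince": np.array([0.8, 0.9]),
    "apple":  np.array([-1.0, 0.0]),
}

def cosine(a, b):
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

def analogy(a, b, c):
    # "a is to b as c is to ?": nearest word to (b - a + c), excluding the inputs
    target = vectors[b] - vectors[a] + vectors[c]
    candidates = [w for w in vectors if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(vectors[w], target))

print(analogy("man", "king", "woman"))
```

With real embeddings the vectors have hundreds of dimensions and the match is only approximate, but the arithmetic is the same.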

Exploding / Vanishing Gradient Problem - by Alex Shim

Alex described why exploding or vanishing gradients occur in Recurrent Neural Networks (RNN), and described the internals of Long Short Term Memory (LSTM) and the modification for Gated Recurrent Unit (GRU). So far, I had taken the forget gate on faith (i.e., accepted that it does what it does without thinking too much about it), but Alex's talk gave me a good idea of how the forget gate operates to keep exploding and vanishing gradients in check.
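
The core of the problem can be seen in a deliberately simplified setting. The sketch below is my own toy example, not from Alex's talk: it assumes a linear scalar RNN, h_t = w * h_{t-1}, where the gradient of the last state with respect to the first is just w multiplied by itself once per time step.

```python
def gradient_through_time(recurrent_weight, steps):
    # for a linear scalar RNN h_t = w * h_{t-1}, d h_T / d h_0 = w ** T
    grad = 1.0
    for _ in range(steps):
        grad *= recurrent_weight
    return grad

for w in (0.5, 1.0, 1.5):
    print(w, gradient_through_time(w, 50))
```

A weight below 1 shrinks the gradient towards zero over 50 steps and a weight above 1 blows it up. In an LSTM the cell state is carried forward as c_t = f_t * c_{t-1} + i_t * g_t, so the forget gate f_t, which lies between 0 and 1, plays the role of this multiplier along the cell state path and can be learned to stay close to 1 when information needs to be kept.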

Overall, I thought the event went very well. I learned quite a few things and made a number of new friends with whom I can compare notes in the future. Unlike more traditional conference/workshop settings, the presentations had lots of time for questions and interactions between the speaker and audience. Congratulations to the organizer Laura Montoya and the volunteers for such a great job!