Saturday, September 30, 2017

Serving Keras models using Tensorflow Serving


One of the reasons I have been optimistic about the addition of Keras as an API to Tensorflow is the possibility of using Tensorflow Serving (TF Serving), described by its creators as a flexible, high-performance serving system for machine learning models, designed for production environments. There are also some instances of TF Serving being used in production outside Google, as described in Large Scale deployment of TF Serving at Zendesk. In the past I have built custom microservices that wrapped my machine learning models, which could then be consumed by client code in a language agnostic manner. But this is a repetitive task that has to be done for each new model being deployed, so the promise of a generic application into which I could just drop my trained model and have it be immediately available for use was too good to pass up, and I decided to check out TF Serving. In this post I describe my experiences; hopefully they are helpful.

I installed TF Serving from source on my Ubuntu 16.04 Linux notebook, following the instructions on the TF Serving Installation page. This requires you to download the Bazel build tool and install the gRPC Python module. Compiling takes a while but is uneventful if you have all the prerequisites (listed in the instructions) set up correctly. Once done, the executables are available in the bazel-bin subdirectory under the TF Serving project root.

My initial thought was to create my model using Tensorflow and the embedded Keras API, so that the model would be serialized into the Tensorflow format rather than the HDF5 format that Keras uses. However, it turns out that TF Serving uses yet another format to serialize and export trained models, so you have to convert to it from either format. Hence there is no advantage to the hybrid Keras/TF approach over the pure Keras approach.

In fact, the hybrid Keras/TF approach has the problem of having to explicitly specify the learning_phase. Certain layers such as Dropout and BatchNormalization behave differently during training and testing. In Keras, training goes through fit() and inference through predict(), so the framework knows which behavior to apply. Tensorflow, however, uses session.run() for both training and inference, so the learning phase has to be supplied as an additional boolean placeholder tensor in that call for it to distinguish between the two steps.
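
To make this concrete, here is a minimal sketch of feeding the learning phase explicitly when evaluating such a hybrid model. The graph tensors (x, y_, accuracy) and the test arrays are hypothetical names, assumed to come from a model built with the Keras API embedded in TF; they are not the exact names from my notebook.

```python
# Hedged sketch: supplying the Keras learning phase to session.run() for a
# hybrid Keras/TF model. x, y_, accuracy, X_test and y_test are assumed to be
# defined by the (hypothetical) model-building code.
import tensorflow as tf
from tensorflow.contrib.keras import backend as K

with tf.Session() as sess:
    K.set_session(sess)
    acc = sess.run(accuracy,
                   feed_dict={x: X_test,
                              y_: y_test,
                              K.learning_phase(): 0})   # 0 = test, 1 = train
    print("test accuracy: {:.3f}".format(acc))
```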

I was able to build and train a hybrid CNN Keras/TF model to predict MNIST digits using the Keras API embedded in TF, and save it in a format that TF Serving recognizes and can serve up through gRPC, but I was unable to consume the service successfully to do predictions. The error message indicates that the model expects an additional input parameter, which I suspect is the learning_phase. Another issue is that it forces me to input both the image and the label, an artefact of how I built the model to begin with: the labels need to be passed in because the graph computes training accuracy. I didn't end up refactoring this code because I found a way to serve native Keras models directly using TF Serving, which I describe below. For completeness, the links below point to notebooks to build and train the hybrid CNN Keras/TF model, to serialize the resulting TF model into a form suitable for TF Serving, and to the client code that consumes the service offered by TF Serving.


In case you want to investigate this approach further, there are two open source projects that attempt to build on top of TF Serving: keras-serving and Amir Abdi's keras-to-tensorflow. Both start from native Keras models and convert them to TF graphs, so their approach is not identical to the hybrid one described above, but their code may give you ideas on how to get around the issues I described.

Next I tried using a native Keras FCN model that was trained using an existing notebook. For what it is worth, this approach finds support in Francois Chollet's (slightly outdated) blog post Keras as a simplified interface to TF, as well as his TF Dev Summit 2017 presentation Integrating Keras and Tensorflow: the Keras workflow, expanded. In addition, there are articles such as Exporting deep learning models from Keras to TF Serving that also advocate this approach.

I was able to adapt some code from TF Serving Issue #310, specifically the suggestions from @tspthomas, in order to read the trained Keras model in HDF5 format and save it to a format usable by TF Serving. The code to consume the service was adapted from a combination of the mnist_client.py example in the TF Serving distribution and some online sources. Links for the two notebooks are shown below.
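
In the same spirit, here is a minimal sketch of the export step, assuming a trained Keras model saved in HDF5. The file paths and the signature input/output names are illustrative assumptions, not the exact values from my notebook.

```python
# Hedged sketch: converting a trained Keras HDF5 model into the SavedModel
# format that TF Serving loads, along the lines of the suggestions in Issue #310.
# Paths and the "images"/"scores" signature names are assumptions.
import tensorflow as tf
from keras import backend as K
from keras.models import load_model

K.set_learning_phase(0)                           # build the graph in inference mode
model = load_model("/path/to/keras-fcn-model.h5")

export_path = "/path/to/export/keras_fcn/1"       # numeric version directory for TF Serving
builder = tf.saved_model.builder.SavedModelBuilder(export_path)
signature = tf.saved_model.signature_def_utils.predict_signature_def(
    inputs={"images": model.input},
    outputs={"scores": model.output})

sess = K.get_session()
builder.add_meta_graph_and_variables(
    sess,
    tags=[tf.saved_model.tag_constants.SERVING],
    signature_def_map={
        tf.saved_model.signature_constants.DEFAULT_SERVING_SIGNATURE_DEF_KEY: signature
    })
builder.save()
```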


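On the client side, a minimal sketch of the gRPC call, adapted in spirit from mnist_client.py, might look like the following. The host, port, model name and input shape are assumptions for illustration; the random array stands in for a real test record.

```python
# Hedged sketch of a gRPC client for the exported model above.
import numpy as np
import tensorflow as tf
from grpc.beta import implementations
from tensorflow_serving.apis import predict_pb2, prediction_service_pb2

channel = implementations.insecure_channel("localhost", 9000)
stub = prediction_service_pb2.beta_create_PredictionService_stub(channel)

request = predict_pb2.PredictRequest()
request.model_spec.name = "keras_fcn"            # must match --model_name given to the server
request.model_spec.signature_name = "serving_default"

image = np.random.random((1, 28, 28, 1)).astype(np.float32)   # stand-in for one test record
request.inputs["images"].CopyFrom(
    tf.contrib.util.make_tensor_proto(image, shape=image.shape))

result = stub.Predict(request, 10.0)             # 10 second timeout
print(result.outputs["scores"])
```
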
TF Serving allows asynchronous operation, where requests do not have to block while the model computes predictions, as well as batched prediction payloads, where the client can send a batch of records per request. However, I was only able to make it work synchronously and with one test record at a time. I feel that examples and better documentation would go a long way towards increasing the usability (and production use outside Google) of this tool.
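
For reference, the only change a batched payload would presumably need over the single-record client above is stacking several records into one tensor proto, along the lines of the sketch below, though as I said I was not able to get this variant to work. The batch size and shapes are assumptions.

```python
# Hedged sketch: composing a batched request payload (untested, as noted above).
import numpy as np
import tensorflow as tf
from tensorflow_serving.apis import predict_pb2

batch = np.random.random((32, 28, 28, 1)).astype(np.float32)  # 32 records in one request
request = predict_pb2.PredictRequest()
request.model_spec.name = "keras_fcn"
request.inputs["images"].CopyFrom(
    tf.contrib.util.make_tensor_proto(batch, shape=batch.shape))
```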

Also, as I learn more about TF, I am beginning to question the logic of the Keras move to tf.contrib.keras. Although, to give credit where it is due, my own effort to learn more TF is driven in large part by this move. TF already has a Layers API which is very similar to the Keras abstraction. More in line with the TF way of doing things, these layers take explicit parameters that indicate the learning phase, instead of a magic learning phase that is handled internally. Also, it appears that pure TF and pure Keras models are both handled well by TF Serving, so I don't see a need for a hybrid model anymore.
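
As a small illustration of what I mean, the TF Layers API takes an explicit training flag on exactly the layers that care about the learning phase. The placeholder names and layer sizes below are assumptions.

```python
# Hedged sketch: the TF Layers API makes the learning phase an explicit
# parameter, instead of the internal learning phase that Keras manages.
import tensorflow as tf

x = tf.placeholder(tf.float32, shape=[None, 784])
is_training = tf.placeholder(tf.bool, name="is_training")

hidden = tf.layers.dense(x, 128, activation=tf.nn.relu)
hidden = tf.layers.dropout(hidden, rate=0.5, training=is_training)     # explicit phase
hidden = tf.layers.batch_normalization(hidden, training=is_training)   # explicit phase
logits = tf.layers.dense(hidden, 10)
```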

Overall, TF Serving appears to be powerful, at least for Keras and TF models. At some point, I hope the TF Serving team decides to make it more accessible to casual users by providing better instructions and examples, or possibly higher level APIs. But until then, I think I will continue with my custom microservices approach.


4 comments (moderated to prevent spam):

Anonymous said...

Hello, I enjoy reading all of your posts. I wanted to write a little comment to support you.

Sujit Pal said...

Hi Anonymous, glad you enjoy reading my posts, and thank you for your support.

Anonymous said...

Hi Sujit, thanks for your post.

Do you also have an idea whether it is possible to avoid saving the model to disk, and instead transfer the model (as a JSON string) and its weights (as NumPy arrays) to a process, set up the model there, and run inference on it?

Kindly advise.

Sujit Pal said...

That is a cool idea; your use case is probably to POST the model to a serving application? I don't know if there is anything out of the box, but Keras saves its models using HDF5, and there is an HDF5-to-JSON converter available that would probably do what you want.
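
For what it is worth, here is a minimal sketch of that idea using Keras's own to_json() and get_weights() calls; how the two objects are actually transported between processes (HTTP POST, message queue, etc.) is left out, and the model and data variables are assumptions.

```python
# Hedged sketch: moving a Keras model between processes without touching disk,
# as an architecture JSON string plus a list of NumPy weight arrays.
from keras.models import model_from_json

# --- sending side (assumes `model` is an already trained Keras model) ---
arch_json = model.to_json()          # architecture as a JSON string
weights = model.get_weights()        # list of NumPy arrays

# --- receiving side ---
new_model = model_from_json(arch_json)
new_model.set_weights(weights)
predictions = new_model.predict(X_test)   # X_test is a hypothetical input batch
```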