Machine Learning: serving models with Kubeflow on Ubuntu, Part 1

Carmine Rimi

on 8 July 2019

Tags: AI/ML , Kubeflow , machine learning , MLOps

This article is the first in a series of machine learning articles focusing on model serving. I assume you’re reading this article because you’re excited about machine learning and quite possibly Kubeflow as well. You might have done some model training and are now trying to understand how to serve those models in production. There are many ways to serve a trained model, both inside and outside of Kubeflow. This post should help you explore some of the alternatives and what to consider when choosing between them.

Here’s a summary of what we’ll explore in this article:

What is model serving?
How do applications interact with models?
What is the Kubeflow approach to model serving?
Model serving examples
Developer Setup

As the title suggests, this article is only the first part in a series of posts. Sign up for the newsletter to be notified of the next post in this series, as well as technical posts discussing:

TensorFlow Serving
TensorRT Serving
TensorFlow.js
Seldon Core
Kubeflow Serving

What is model serving?

In simple terms, model serving is making a trained model available to other software components. How you arrived at a trained model, including which framework you used to produce it, will play a role in which options are available to you. You may not have trained the model yourself, either: there are open source, pre-trained models available today that were trained on data you may not have access to. BERT is a well-known example of a pre-trained model that can be used as-is or fine-tuned for your own task. We’ll discuss BERT in more detail in a future article.

How do applications interact with models?

Probably the most immediate concern is determining how you want to integrate the model into your application. Should it be embedded? Should other systems be able to access it? Is scaling a concern?

For embedded model serving, the model can be compiled into the application and accessed via native function calls. This could be done within a Python application, or it could be done from within a JavaScript application in a browser.
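
As a minimal sketch of embedded serving in Python, the example below uses a hypothetical `EmbeddedModel`, a toy logistic-regression model. Its weights are hard-coded and stand in for a real trained artifact. The application calls `predict` directly as an ordinary function, with no network hop in between. In practice the parameters would be loaded from your framework's saved-model format, which this sketch does not assume.

```python
import math
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class EmbeddedModel:
    """Toy logistic-regression model standing in for a trained artifact (hypothetical)."""
    weights: tuple[float, ...]
    bias: float

    def predict(self, features: Sequence[float]) -> float:
        """Return the probability of the positive class for one input vector."""
        if len(features) != len(self.weights):
            raise ValueError(f"expected {len(self.weights)} features, got {len(features)}")
        z = sum(w * x for w, x in zip(self.weights, features)) + self.bias
        return 1.0 / (1.0 + math.exp(-z))


# Illustrative parameters only; these are not the output of real training.
MODEL = EmbeddedModel(weights=(0.8, -0.4), bias=0.1)


def classify(features: Sequence[float], threshold: float = 0.5) -> bool:
    """Application code calling the embedded model through a native function call."""
    return MODEL.predict(features) >= threshold


if __name__ == "__main__":
    print(classify([1.0, 0.5]))
```

Because the model lives in the same process, there is no serialization or network latency. The trade-off is that every application that needs the model must ship its own copy.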

For API model serving, where other systems access your model dynamically, the most common approach is to put a REST API in front of the model. Most popular frameworks, such as TensorFlow, come with native mechanisms for this; see the Resources section below for links. API model serving raises another concern, though: does it need to scale? For instance, assume your model can handle 100 requests per second. Is that enough? Could there be a spike of 5,000 requests per second? If so, you need to think about scaling the model.
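
To make "a REST API in front of the model" concrete, here is a minimal sketch that uses only Python's standard library. The model is a hypothetical fixed linear function. The `/predict` path, the `{"features": [...]}` request shape, and port 8080 are assumptions for illustration; they are not the convention of any particular serving framework.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

WEIGHTS = (0.8, -0.4)  # hypothetical "trained" parameters
BIAS = 0.1


def predict(features: list[float]) -> float:
    """Stand-in for a trained model: a fixed linear function."""
    if len(features) != len(WEIGHTS):
        raise ValueError(f"expected {len(WEIGHTS)} features, got {len(features)}")
    return sum(w * x for w, x in zip(WEIGHTS, features)) + BIAS


class PredictHandler(BaseHTTPRequestHandler):
    """Exposes the model at POST /predict, accepting {"features": [...]}."""

    def do_POST(self) -> None:
        if self.path != "/predict":
            self._send(404, {"error": "not found"})
            return
        try:
            length = int(self.headers.get("Content-Length", 0))
            body = json.loads(self.rfile.read(length))
            features = [float(x) for x in body["features"]]
            result = predict(features)
        except (ValueError, KeyError, TypeError):
            self._send(400, {"error": 'expected JSON body {"features": [numbers]}'})
            return
        self._send(200, {"prediction": result})

    def _send(self, status: int, payload: dict) -> None:
        data = json.dumps(payload).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)

    def log_message(self, format: str, *args) -> None:
        pass  # silence per-request logging in this example


def make_server(host: str = "127.0.0.1", port: int = 8080) -> HTTPServer:
    return HTTPServer((host, port), PredictHandler)


if __name__ == "__main__":
    make_server().serve_forever()
```

A single `HTTPServer` handles one request at a time. That limitation is exactly why the scaling question matters once traffic grows beyond what one process can serve.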

What is the Kubeflow approach to model serving?

Fortunately, Kubeflow includes a few frameworks that help with both tasks: putting an API in front of your model, and allowing it to scale based on demand. The Kubeflow community has provided a couple of examples using different frameworks: a TensorFlow Serving example and a Seldon example. The community is also in the middle of creating a new, generic approach to model serving. This approach is still in flight, and we will write about it once it is closer to release.
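
"Scale based on demand" usually comes down to a simple rule. The sketch below implements the rule used by the Kubernetes Horizontal Pod Autoscaler (HPA): `desired = ceil(current_replicas * current_metric / target_metric)`. Scaling is skipped when the ratio is within a tolerance of 1.0, and the result is clamped to a min/max range. Whether and how a given Kubeflow serving component autoscales depends on how it is deployed, so treat this as an illustration of demand-based scaling, not as Kubeflow's own mechanism. The function name and the default bounds are assumptions. The 0.1 tolerance mirrors the HPA's documented default.

```python
import math


def desired_replicas(current_replicas: int, current_metric: float, target_metric: float,
                     min_replicas: int = 1, max_replicas: int = 100,
                     tolerance: float = 0.1) -> int:
    """Replica count following the Kubernetes HPA rule.

    current_metric is the observed average per replica (e.g. requests/second);
    target_metric is what one replica should handle.
    """
    if target_metric <= 0:
        raise ValueError("target_metric must be positive")
    ratio = current_metric / target_metric
    if abs(ratio - 1.0) <= tolerance:
        desired = current_replicas  # close enough to target: don't scale
    else:
        desired = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(max_replicas, desired))
```

Using the numbers from earlier: one replica is sized for 100 requests per second and sees a spike to 5,000 requests per second. In that case `desired_replicas(1, 5000, 100)` returns 50, which is the same answer as dividing peak load by per-replica capacity. The `max_replicas` cap is what keeps a traffic spike from consuming the whole cluster.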

Model serving examples

Using a crawl, walk, run approach, one of the best next steps is to run through some of the examples listed under Resources below, so that you are grounded in the manual approach to serving models. Once you have a low-level understanding of how these pieces work, try the more automated approach with Kubeflow. In summary, if you are just getting started, I suggest these steps:

Developer Setup

An easy way to explore the examples in this article is on the Ubuntu platform, starting with the Ubuntu operating system. If you’re on a Windows or Mac desktop, you can start with Multipass, a native application for Windows, Mac, and Linux that lets you create Ubuntu virtual machines. Here is the software you can use, all of it freely available:

Multipass – A mini-cloud on your Mac, Windows or Linux workstation.
MicroK8s – A single package of K8s that installs on Linux
Kubeflow – The Machine Learning Toolkit for Kubernetes
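
The Multipass and MicroK8s part of this setup can be scripted. The sketch below assumes Python 3.9+ and that the `multipass` CLI is already installed on your workstation. The VM name `kubeflow-dev` is a hypothetical choice. Resource flags for CPUs, memory, and disk are omitted because Kubeflow's requirements vary by release; check `multipass launch --help` for your version. The Kubeflow deployment itself is also left out, because its install steps have changed across versions. By default the script only prints the commands, and it runs them only when `dry_run=False`.

```python
import shlex
import subprocess


def setup_commands(vm_name: str = "kubeflow-dev") -> list[list[str]]:
    """Commands to create an Ubuntu VM with Multipass and install MicroK8s inside it."""
    return [
        ["multipass", "launch", "--name", vm_name],
        ["multipass", "exec", vm_name, "--", "sudo", "snap", "install", "microk8s", "--classic"],
        ["multipass", "exec", vm_name, "--", "sudo", "microk8s", "status", "--wait-ready"],
    ]


def run(commands: list[list[str]], dry_run: bool = True) -> None:
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print("$", shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # stop at the first failing step


if __name__ == "__main__":
    run(setup_commands())
```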

Resources

TensorFlow Serving – Basic Example
TensorFlow Serving – REST Example
TensorFlow Serving – Kubernetes Example
Model Serving in PyTorch – a PyTorch example
BERT Server – a dedicated approach to serving this type of model
Machine Learning Systems – an entire book devoted to the topic
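
At its core, the TensorFlow Serving REST example is a JSON `POST` to the model's `:predict` endpoint. The sketch below builds and sends such a request using the row format (`{"instances": [...]}`) and reads the `predictions` field of the response. It assumes a TensorFlow Serving instance is already running with its REST port at the default 8501. The model name `half_plus_two` in the usage line is taken from TensorFlow Serving's own getting-started example; substitute the name your server was started with.

```python
import json
import urllib.request
from typing import Any, Optional


def build_predict_request(host: str, model_name: str, instances: list[Any],
                          port: int = 8501,
                          version: Optional[int] = None) -> urllib.request.Request:
    """Build a POST request for the TensorFlow Serving REST predict API (row format)."""
    path = f"/v1/models/{model_name}"
    if version is not None:
        path += f"/versions/{version}"  # pin a specific model version
    url = f"http://{host}:{port}{path}:predict"
    body = json.dumps({"instances": instances}).encode("utf-8")
    return urllib.request.Request(url, data=body,
                                  headers={"Content-Type": "application/json"},
                                  method="POST")


def predict(host: str, model_name: str, instances: list[Any], **kwargs: Any) -> list[Any]:
    """Send the request and return the "predictions" field of the JSON response."""
    request = build_predict_request(host, model_name, instances, **kwargs)
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.loads(response.read())["predictions"]


if __name__ == "__main__":
    # Assumes TensorFlow Serving is already serving a model named "half_plus_two".
    print(predict("localhost", "half_plus_two", [1.0, 2.0, 5.0]))
```

Keeping request construction separate from sending makes it easy to check the URL and payload without a running server.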

Run Kubeflow anywhere, easily

With Charmed Kubeflow, deployment and operations of Kubeflow are easy for any scenario.

Charmed Kubeflow is a collection of Python operators that define integration of the apps inside Kubeflow, like katib or pipelines-ui.

Use Kubeflow on-prem, desktop, edge, public cloud and multi-cloud.

Learn more about Charmed Kubeflow ›

What is Kubeflow?

Kubeflow makes deployments of Machine Learning workflows on Kubernetes simple, portable and scalable.

Kubeflow is the machine learning toolkit for Kubernetes. It extends Kubernetes’ ability to run independent and configurable steps with machine-learning-specific frameworks and libraries.

Install Kubeflow

The Kubeflow project is dedicated to making deployments of machine learning workflows on Kubernetes simple, portable and scalable.

You can install Kubeflow on your workstation, a local server or a public cloud VM. It is easy to install with MicroK8s on any of these environments, and it can be scaled to high availability.