In machine learning, serving a trained model means making it available so that people can send it their data and get predictions back. It is a fundamental step in bringing any NLP research outcome to production.
Here we will see how to set up a high-performance inference server capable of running models saved in different formats. We will use the TensorRT Inference Server (TRTIS from now on), developed by NVIDIA. I’ll show how to deploy a model created with TensorFlow Keras, but TRTIS also supports many other popular ML model serialization formats, such as ONNX, PyTorch, and Caffe2.
Here is what we’ll be doing:
- train a sentiment analysis model and serialize it to disk
- set up TRTIS using Docker
- deploy the sentiment analysis model
- send a few requests via HTTP to the inference server and get back the sentiment predictions
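To give a feel for the deployment step, here is a minimal sketch of the on-disk layout TRTIS reads models from: a model repository with one directory per model, holding a config.pbtxt and numbered version subdirectories (a TensorFlow SavedModel goes in a folder named model.savedmodel). The model name, tensor names, shapes, and batch size below are assumptions made up for illustration, not values from this walkthrough.

```python
from pathlib import Path
import tempfile

# Hypothetical names chosen for illustration only.
MODEL_NAME = "sentiment"
MODEL_VERSION = "1"

# Assumed model configuration: tensor names, dims, and batch size are
# placeholders and must match whatever the exported model really uses.
CONFIG_TEMPLATE = """name: "{name}"
platform: "tensorflow_savedmodel"
max_batch_size: 8
input [
  {{
    name: "input_tokens"
    data_type: TYPE_INT32
    dims: [ 128 ]
  }}
]
output [
  {{
    name: "sentiment_score"
    data_type: TYPE_FP32
    dims: [ 1 ]
  }}
]
"""


def create_model_repository(root: Path,
                            name: str = MODEL_NAME,
                            version: str = MODEL_VERSION) -> Path:
    """Create <root>/<name>/<version>/model.savedmodel and write config.pbtxt.

    Returns the directory the SavedModel should be exported into.
    """
    model_dir = root / name
    savedmodel_dir = model_dir / version / "model.savedmodel"
    savedmodel_dir.mkdir(parents=True, exist_ok=True)
    (model_dir / "config.pbtxt").write_text(CONFIG_TEMPLATE.format(name=name))
    return savedmodel_dir


if __name__ == "__main__":
    repo_root = Path(tempfile.mkdtemp())
    target = create_model_repository(repo_root)
    print(f"Export the Keras model to: {target}")
```

The returned path is where the serialized Keras model would be written, and the repository root is the directory that gets mounted into the TRTIS container.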