GNU/Linux xterm-256color bash 141 views

This demonstrates using k8shazgpu on Kubernetes to perform a “vLLM run” where we launch changes to our vLLM code on a remote Kubernetes cluster, which handles all the GPU allocation for us, and we get almost instant feedback about our changes, and can even queue up runs when the server is loaded and get results when its finished.