Modular Flux Framework install running LAMMPS

by vs

This is a demo that solves a problem that has been keeping me up for about a year. When using the Flux Operator, the application logic is tangled with Flux: you always have to install your application alongside Flux, which (imho) is bad design. What we needed was a strategy that can add Flux to any application container on the fly.

However, given how tangled Flux is with shared libraries and the like, this would prove to be hard. Flux is written in C/C++, so you can’t just wget a single binary the way you could with a Go program. But recent work on HPCToolkit led us to the idea that we can create an isolated view of a complex package like this, and then use some tricks to copy it from a sidecar container into the application container. More specifically:

  • We build the software into an isolated spack “copy” view
  • The software is then (generally) at some /opt/view and /opt/software
  • The Flux container is added as a sidecar container to the pod for your replicated job. Additional setup and configuration is done here, directly in the spack view. This step assumes that the destination path will be different, and writes files accordingly.
  • We can then create an empty volume that is shared by your metric or scaled application
  • The entire tree is copied over into the empty volume
  • When the copy is done, indicated by the final touch of a file (I have a small Go library that I created just for waiting on and watching for these events), final configuration is done. This usually means copying over the shared curve.certificate, which only the launcher owns and does, checking paths and permissions, running user-requested logic, etc.
  • It also means we install munge, which is arguably the slowest part of this setup (and could be improved upon)
  • The updated container entrypoint is then run. For a worker, we just start a follower broker and wait. For the launcher, we take the original command (here, a LAMMPS command) and wrap it in a flux submit or flux run (or however you configure the addon).
  • Flux is then running your application!
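
To make the copy, wait, and wrap steps above a bit more concrete, here is a minimal sketch in Python. It is not the operator's actual code: the real setup uses a small Go library for the waiting and watching, and I have swapped in plain polling here. The marker file name, the paths, and all three function names are hypothetical, chosen only for illustration.

```python
import shutil
import tempfile
import time
from pathlib import Path


def copy_view(src: Path, dest: Path, done_name: str = "flux-view.done") -> Path:
    """Sidecar side: copy the whole tree into the shared volume.

    The marker file is touched last, so its presence means the copy finished.
    The marker name is a hypothetical choice for this sketch.
    """
    # dest already exists (it is the mounted empty volume), so allow that.
    shutil.copytree(src, dest, symlinks=True, dirs_exist_ok=True)
    marker = dest / done_name
    marker.touch()
    return marker


def wait_for_file(path: Path, timeout: float = 60.0, interval: float = 0.1) -> bool:
    """Application side: poll until the marker exists or the timeout passes."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if path.exists():
            return True
        time.sleep(interval)
    return path.exists()


def wrap_command(command, flux_bin="flux", subcommand="submit"):
    """Launcher side: prefix the original command with a flux subcommand."""
    return [flux_bin, subcommand, *command]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Stand-ins for the view in the sidecar and the shared empty volume.
        view = Path(tmp) / "opt" / "view"
        (view / "bin").mkdir(parents=True)
        (view / "bin" / "flux").write_text("#!/bin/sh\n")
        shared = Path(tmp) / "shared"
        shared.mkdir()

        marker = copy_view(view, shared)
        if wait_for_file(marker, timeout=5.0):
            # The LAMMPS arguments are only an example of an original command.
            print(wrap_command(["lmp", "-in", "in.lammps"],
                               flux_bin=str(shared / "bin" / "flux")))
```

Touching the marker only after the copy returns is what makes the handoff safe: the application container never sees a half-copied tree as "ready". In a real pod the two halves run in different containers that share the volume, rather than in one process as in this demo.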

It’s astounding! 🦩️

This is so amazing because the application logic is now separate from the logic to set up Flux. We do need to add views that are built on different operating systems (this original one was Rocky) and add support for running as the Flux user (I was mostly lazy), but wow, I pulled this off in an evening and am feeling so happy!