macOS ◆ xterm-256color ◆ bash 139 views

Demonstration showing our ability to handle errors that are not resolved by reboot.

Flow:

  • Disable kubelet on worker-1
  • MachineHealthCheck detects the failure and places the external remediation annotation
  • Metal3 actuator saves the existing Node labels and annotations
  • Metal3 actuator requests the node be power cycled
  • Node does not return to a healthy state within the expected 10 minutes
  • Metal3 actuator deletes the Node and Machine, causing a reprovisioning cycle
  • Replacement Machine is added to the cluster
  • (Not shown) Metal3 actuator restores labels and annotations