Version: v0.9.0

Just-in-time Workload Clusters

Nova optionally supports putting an idle workload cluster into standby state, to reduce resource costs in the cloud. When a standby workload cluster is needed to satisfy a Nova scheduling operation, the cluster is brought out of standby state. Nova can also optionally create additional cloud clusters, cloned from existing workload clusters, to satisfy the needs of policy-based or capacity-based scheduling.

Suspend/Resume Standby Mode

In "suspend/resume" standby mode (the default), every node group/pool in a standby cluster is set to a node count of 0. This change removes all of the cluster's cloud resources (except those in the hidden cloud provider control plane) in ~2 minutes. While the cluster is in standby, the non-Nova-scheduled items deployed in it (including the Nova agent) switch to pending status. EKS, GKE, and standard-tier AKS clusters in standby state cost $0.10/hour, which is the charge for the control plane that remains.

When the cluster exits standby, the node group/pool node counts are set back to their original values, which Nova recorded in the cluster's custom resource object. This change restores the cluster resources in ~2 minutes, so the pending items (including the Nova agent) resume running and Nova-scheduled items can be placed successfully.
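The record-then-zero and restore steps described above can be modeled in a few lines. This is a minimal in-memory sketch, not Nova's implementation: the `Cluster` class, its fields, and the `enter_standby`/`exit_standby` functions are hypothetical names chosen for illustration, and no cloud API is called.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Cluster:
    """Hypothetical stand-in for a workload cluster and its custom resource."""
    name: str
    node_counts: Dict[str, int]  # node group/pool name -> current node count
    saved_counts: Dict[str, int] = field(default_factory=dict)  # recorded originals
    standby: bool = False


def enter_standby(cluster: Cluster) -> None:
    """Record the current node counts, then set every group/pool to 0."""
    if cluster.standby:
        return  # already in standby; do not overwrite the recorded counts
    cluster.saved_counts = dict(cluster.node_counts)
    for group in cluster.node_counts:
        cluster.node_counts[group] = 0
    cluster.standby = True


def exit_standby(cluster: Cluster) -> None:
    """Set every group/pool back to the node count recorded on entry."""
    if not cluster.standby:
        return
    cluster.node_counts = dict(cluster.saved_counts)
    cluster.saved_counts = {}
    cluster.standby = False
```

The guard in `enter_standby` matters in this sketch: a second call while already in standby would otherwise record the zeroed counts and lose the original values.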

Delete/Recreate Standby Mode

In "delete/recreate" standby mode (optional alternative to suspend/resume mode), a workload cluster in standby state is completely deleted from the cloud, taking ~3-10 minutes.

When the cluster exits standby, it is recreated in the cloud, taking ~3-15 minutes, and the Nova agent objects are redeployed. The "delete/recreate" standby mode yields greater cost savings than "suspend/resume", but the latencies to enter and exit standby state are significantly higher.
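The trade-off between the two modes can be made concrete with some simple arithmetic. The sketch below uses the figures quoted in this document (~2 minutes each way and $0.10/hour for suspend/resume; ~3-10 minutes to enter and ~3-15 minutes to exit for delete/recreate). It assumes, for illustration only, that a deleted cluster incurs no charge while in standby; the `MODES` table and function names are hypothetical.

```python
# Figures taken from the text above; the zero standby cost for
# delete/recreate is an assumption made for this illustration.
MODES = {
    "suspend/resume": {
        "standby_cost_per_hour": 0.10,
        "enter_minutes": (2, 2),    # (best case, worst case)
        "exit_minutes": (2, 2),
    },
    "delete/recreate": {
        "standby_cost_per_hour": 0.0,
        "enter_minutes": (3, 10),
        "exit_minutes": (3, 15),
    },
}


def standby_cost(mode: str, hours: float) -> float:
    """Approximate cost in dollars of keeping one cluster in standby."""
    return round(MODES[mode]["standby_cost_per_hour"] * hours, 2)


def worst_case_round_trip_minutes(mode: str) -> int:
    """Worst-case minutes to enter standby plus worst-case minutes to exit."""
    spec = MODES[mode]
    return spec["enter_minutes"][1] + spec["exit_minutes"][1]


if __name__ == "__main__":
    for mode in MODES:
        print(mode, standby_cost(mode, 720), worst_case_round_trip_minutes(mode))
```

Under these assumptions, a cluster that sits in standby for a 720-hour month costs about $72 in suspend/resume mode, while delete/recreate trades that cost for a worst-case round trip of about 25 minutes instead of about 4.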

Cluster Create/Clone

When the "create" option is enabled, Nova can create a workload cluster by cloning an existing accessible cluster (one that is ready, or that can become ready by exiting standby) to satisfy the needs of policy-based or capacity-based scheduling. Cluster creation requires that the Nova deployment contain a cluster suitable for cloning: an existing accessible cluster that satisfies the placement's scheduling policy constraints and resource capacity needs, but that mismatches either the policy's specified cluster name or the placement's needed resource availability.

The "create" option requires that "delete/recreate" standby mode be enabled. Created clusters can subsequently enter standby state. The number of clusters that Nova will create has a configurable limit.

Note that Nova with the "create" option enabled will not create a cluster to satisfy resource availability if it detects that any existing accessible candidate target cluster has cluster autoscaling enabled. Instead, Nova will place the workload on an accessible autoscaled cluster. Nova's cluster autoscaling detection works for installations of Elotl Luna and of the Kubernetes Cluster Autoscaler.
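The rules in this section (prefer an autoscaled cluster, require "delete/recreate" mode, respect the creation limit) can be summarized as a small decision function. This is a sketch of the described behavior under stated assumptions, not Nova's actual logic or configuration schema: `Candidate`, `CreateSettings`, `decide_for_capacity`, and the returned strings are all hypothetical, and choosing the first accessible cluster as the clone source is an arbitrary simplification.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Candidate:
    """Hypothetical candidate target cluster that lacks free capacity right now."""
    name: str
    accessible: bool   # ready, or able to become ready by exiting standby
    autoscaled: bool   # cluster autoscaling detected (e.g. Luna or Cluster Autoscaler)


@dataclass
class CreateSettings:
    """Hypothetical settings; the field names are not Nova's real option names."""
    create_enabled: bool
    delete_recreate_enabled: bool
    max_created_clusters: int

    def can_create(self, created_so_far: int) -> bool:
        # "create" requires delete/recreate standby mode and is capped by a limit.
        return (
            self.create_enabled
            and self.delete_recreate_enabled
            and created_so_far < self.max_created_clusters
        )


def decide_for_capacity(
    candidates: List[Candidate], settings: CreateSettings, created_so_far: int
) -> str:
    """Return the action this sketch would take for a capacity shortfall."""
    accessible = [c for c in candidates if c.accessible]
    # An accessible autoscaled cluster always wins over creating a new one.
    for c in accessible:
        if c.autoscaled:
            return f"place-on:{c.name}"
    if accessible and settings.can_create(created_so_far):
        return f"clone-from:{accessible[0].name}"
    return "no-create"
```

For example, with candidates `a` (not autoscaled) and `b` (autoscaled), both accessible, the function returns `place-on:b` regardless of the creation limit.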