Luna: a Cluster Autoscaler
Luna is an intelligent autoscaler for Kubernetes clusters running in public clouds (EKS/GKE/OKE/AKS). It provisions and cleans up Kubernetes nodes as needed based on your workloads’ requirements.
Because each workload is different, Luna uses the pods’ requirements to provision the best nodes for each workload.
Pods with heavy and/or special requirements are assigned to dedicated nodes, while pods with light requirements are placed on shared nodes to optimize cost, resiliency, and availability.
By default, Luna considers pods labeled with elotl-luna=true. You can instead configure custom labels or annotations to match your workloads if you desire.
There’s no need to modify your workload otherwise. Just add a label or annotation to your pods, or configure Luna to match existing labels or annotations, and Luna will be able to run them dynamically and efficiently.
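For example, a Deployment whose pod template carries the default label might look like the sketch below (the app name and image are illustrative):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
spec:
  replicas: 2
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
        elotl-luna: "true"   # opts these pods in to Luna provisioning
    spec:
      containers:
      - name: web
        image: nginx:1.25    # illustrative image
        resources:
          requests:
            cpu: 500m
            memory: 512Mi
```

Luna reads the pods' resource requests, so keeping accurate requests on your containers is what lets it pick well-sized nodes.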
Shared and dedicated nodes
Luna allows administrators to set the parameters that Luna uses to decide which pods run on their own nodes, and which run on shared nodes.
Dedicated nodes are useful for ensuring optimal performance of large or special pods. Nodes with GPUs are always dedicated nodes and cannot be used as shared nodes.
Shared nodes are used for pods with modest requirements. Luna allows administrators to configure these nodes’ specification and lifecycle.
Specify instance families
Luna allows you to include and exclude instance families when making scaling decisions. This is useful because the cheapest instance that matches the given requirements isn't always good enough. For example, a pod with small CPU and memory requirements may perform a large number of I/O operations, and cheaper instance types can be limited in that area.
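As a sketch, instance-family filters might be supplied through Luna's configuration values; the parameter names below are hypothetical, so check your Luna release for the exact spelling:

```yaml
# Hypothetical parameter names -- illustrative only; consult Luna's
# configuration reference for the actual keys.
instanceFamiliesInclude:
  - m5       # general-purpose families with solid I/O throughput
  - c5
instanceFamiliesExclude:
  - t3       # burstable families can throttle I/O-heavy pods
```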
Specify GPU requirements
Luna supports GPU instances for dedicated nodes. You can specify the type of GPU you would like to run on your nodes:
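For example, a pod that requests a GPU might look like the sketch below. The nvidia.com/gpu resource is the standard Kubernetes way to request a GPU; the annotation key used to pin a specific GPU type is an assumption here, so check your Luna release for the exact name:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: train-job
  labels:
    elotl-luna: "true"
  annotations:
    # Hypothetical annotation key; check Luna's docs for the exact name.
    node.elotl.co/instance-gpu-skus: "T4"
spec:
  containers:
  - name: train
    image: my-training-image:latest   # illustrative
    resources:
      limits:
        nvidia.com/gpu: 1             # standard Kubernetes GPU resource
```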
Spot for EKS and AKS
Luna can run fault-tolerant workloads on Spot nodes on EKS and AKS. Spot nodes can be up to 90% cheaper than regular ones, but they may be interrupted at any time. If you have fault-tolerant workloads, running them on Spot can reduce costs significantly.
ARM64 nodes
Luna can optionally consider ARM64 nodes in addition to x86_64 (AMD and Intel) nodes. If you use this option, your images must be compatible with both amd64 and arm64 architectures.
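The option name includeArmInstance appears later in this document; assuming it is exposed through Luna's configuration values (the surrounding layout is a sketch), enabling it might look like:

```yaml
# includeArmInstance is named in this document; how it is nested in
# your values file may differ.
includeArmInstance: true   # also consider ARM64 instance types
```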
Persistent Volume Claim support
You can configure Luna’s behavior for persistent volume claims depending on what you wish to achieve.
By default, Luna ignores pods with bound Persistent Volume Claims (PVCs), unless the placeBoundPVC parameter is true. Luna can also be configured to ignore pods with local storage.
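The placeBoundPVC parameter is named in this document; assuming it is set through Luna's configuration values (the flat layout is a sketch), opting in might look like:

```yaml
# placeBoundPVC is named in this document; layout is illustrative.
placeBoundPVC: true   # also place pods whose PVCs are already bound
```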
AWS Fargate
Luna allows you to use AWS Fargate to further optimize your cloud cost. EC2 virtual machines come in pre-determined configurations, while Fargate nodes have flexible specifications: you use exactly the amount of CPU and memory needed.
If cost optimization is important and your workloads don't fit well on preset EC2 instances, running Luna with Fargate may reduce costs further.
Luna helps you to:
- Simplify cluster operations (no need to create or maintain cluster autoscaling knobs)
- Empower DevOps to focus on building their core business (quicker go-to-market)
- Prevent wasted spend
- Run on multiple cloud providers
How Luna works
Luna watches pods on your Kubernetes cluster and creates nodes based on the new pods' requirements. Depending on the resource needs, Luna allocates one or more nodes. For pods with lower resource needs, Luna chooses bin-packing: compute is allocated and shared among multiple pods. For larger pods, Luna chooses bin-selection: a right-sized node is allocated for each pod. Bin-selection is also used for other cases, such as workloads with GPU requirements. As pods terminate and compute can be reclaimed, Luna automatically cleans up and removes the unneeded nodes it manages.
Luna chooses the most economical available node type that satisfies the bin-packing or bin-selection resource requirements. By default, Luna chooses x86_64 nodes. If the option includeArmInstance is enabled, Luna will also consider ARM64 nodes; in this case, the workloads to be placed by Luna must use multi-architecture images that include both x86_64 and ARM64.
Shared nodes with bin-packing
In this mode, Luna bin-packs the pods it considers onto larger shared nodes. Each such pod gets a fixed nodeSelector, node.elotl.co/destination=bin-packing, and is scheduled onto one of the larger nodes that Luna starts with this label set.
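Concretely, the selector Luna adds to each bin-packed pod matches the label it sets on the shared nodes it starts:

```yaml
# Added by Luna to each bin-packed pod's spec:
nodeSelector:
  node.elotl.co/destination: bin-packing

# The same key/value appears as a label on the shared nodes Luna starts,
# so the Kubernetes scheduler places these pods only on Luna's shared nodes.
```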
Dedicated nodes with bin-selection
In this mode, each pod Luna considers gets a nodeSelector targeting a new node, right-sized based on the pod's resource requests.
To find the best node, Luna uses the following method: it takes the list of all instance types, narrows that list down to the ones that can handle the resource requests, sorts them by on-demand pricing (cheapest first), and picks the first (cheapest) one.
Compared to bin-packing, node scale-down with bin-selection is more aggressive.
The diagram below illustrates both bin-packing and bin-selection.
Pods considered by Luna
By default, Luna provisions compute for pods marked with specific labels or annotations (set by the labels or podAnnotations parameter) with the following exceptions:
- Daemonset pods are not considered.
- Pods in kube-system namespace are not considered, unless that namespace is removed from the namespacesExclude parameter.
- Pods with an existing node selector (pod.Spec.NodeSelector) are not considered, unless the placeNodeSelector parameter is set to true.
- Pods with bound Persistent Volume Claims (PVCs) are not considered, unless the placeBoundPVC parameter is set to true.
If a non-daemonset pod is not being considered by Luna, it won't be placed on Luna-allocated nodes.
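The exceptions above map to configuration parameters named in this document (namespacesExclude, placeNodeSelector, placeBoundPVC). A sketch of relaxing them follows; the value formats and nesting are assumptions, so check your Luna release for the exact shape:

```yaml
# Parameter names come from the exceptions listed above; value formats
# and layout are illustrative only.
namespacesExclude:        # removing kube-system here lets Luna consider its pods
  - my-ignored-namespace
placeNodeSelector: true   # also consider pods that already set a node selector
placeBoundPVC: true       # also consider pods with bound PVCs
```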
Luna does not manage all the nodes in your cluster: a subset of nodes must exist before Luna is deployed, so that Luna itself has somewhere to run. At least one node not managed by Luna is required in any cluster where Luna is deployed. Luna will only scale down nodes that it started and manages.
Luna is only interested in the nodes labeled with:
Luna adds these labels to all nodes that it starts.
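For illustration, one such label known from the bin-packing discussion above is node.elotl.co/destination; a shared node started by Luna carries it in its metadata (Luna may apply additional labels beyond this one):

```yaml
# Node metadata on a Luna-started shared node; Luna may apply
# further labels not shown here.
metadata:
  labels:
    node.elotl.co/destination: bin-packing
```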