Luna Configuration
Pod configuration
In order for Luna Manager to manage a pod's scheduling, the pod configuration must include a label or annotation that matches Luna's configured pod designation setting. By default, Luna matches pods that carry the following label:
metadata:
  labels:
    elotl-luna: "true"
You can change the list of labels Luna will consider with the labels Helm value:
--set labels='key1=value1,key2=value2'
You can change the list of annotations Luna will consider with the podAnnotations Helm value:
--set podAnnotations='key1=value1,key2=value2'
To prevent Luna from matching a given pod, annotate it with pod.elotl.co/ignore: true.
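For example, a pod that carries the default label but should still be skipped by Luna could be annotated as follows (a minimal sketch; the pod name is illustrative):
apiVersion: v1
kind: Pod
metadata:
  name: my-ignored-pod
  labels:
    elotl-luna: "true"
  annotations:
    pod.elotl.co/ignore: "true"
spec:
  ...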
Instance family configuration
Bin selection
To avoid a given instance family, annotate the pod like this:
metadata:
  annotations:
    node.elotl.co/instance-family-exclusions: "t3,t3a"
In the example above, Luna won’t start any t3 or t3a instance type for the pod.
To use a given instance family, annotate the pod like this:
metadata:
  annotations:
    node.elotl.co/instance-family-inclusions: "c6g,c6gd,c6gn,g5g"
In the example above, Luna will choose an instance type from the c6g, c6gd, c6gn, or g5g instance families for the pod.
To specify the instance type, you can use a regular expression. For instance, if you'd like to specify the instance type to be r6a.xlarge, annotate the pod like this:
metadata:
  annotations:
    node.elotl.co/instance-type-regexp: "^r6a.xlarge$"
In the example above, Luna will only consider the r6a.xlarge instance type.
You can combine the instance-type and instance-family annotations like this:
metadata:
  annotations:
    node.elotl.co/instance-type-regexp: '^.*\.xlarge$'
    node.elotl.co/instance-family-exclusions: "r6a"
In the example above, Luna will exclusively consider instance types ending with ".xlarge" and exclude types from the r6a family.
If any of these annotations are present, Luna will schedule the pods on nodes that fulfill all these constraints as well as the resource requirements of the pods. However, if the instance type constraints and the pod's resource requirements are incompatible, no node will be added and the pod will be stuck in the pending state.
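Putting this together, a pod that excludes the t3 and t3a families while declaring its resource requests might look like the following sketch (the pod name, image, and request sizes are illustrative):
apiVersion: v1
kind: Pod
metadata:
  name: constrained-pod
  labels:
    elotl-luna: "true"
  annotations:
    node.elotl.co/instance-family-exclusions: "t3,t3a"
spec:
  containers:
  - name: app
    image: nginx
    resources:
      requests:
        cpu: "2"
        memory: 4Gi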
Bin packing
Bin packing instance family and type can be configured via the global option binPackingNodeTypeRegexp. Only the instances matching the regular expression will be considered.
For example, if you would like to use t3a nodes in AWS, you would set: binPackingNodeTypeRegexp='^t3a\..*$'.
Removal of under-utilized nodes and possible pod eviction
Luna is designed to remove under-utilized nodes. A node that is running no Luna-managed pods is under-utilized. Additionally, in the case of bin-packing, a node is considered under-utilized if its Luna-managed pods' total resource requests are below scaleDown.binPackNodeUtilizationThreshold, set to 10% by default. If a node has been under-utilized for longer than scaleDown.nodeUnneededDuration, set to 5 minutes by default, and if all Luna-managed pods running on it can be placed on another node, Luna will evict the pods running on the node and remove the node.
To avoid Luna evicting a pod running on an under-utilized node, the pod must be annotated with pod.elotl.co/do-not-evict: true as shown below:
apiVersion: v1
kind: Pod
metadata:
  name: my-pod
  annotations:
    pod.elotl.co/do-not-evict: "true"
spec:
  ...
The annotation cluster-autoscaler.kubernetes.io/safe-to-evict: false is also supported.
Note that if Luna-managed bin-packing pods have no resource settings or if their resource settings are inaccurately very low, Luna's detection of under-utilized bin-packing nodes will be wrong. In this case, scaleDown.binPackNodeUtilizationThreshold should be set to 0.0 to avoid Luna evicting pods from bin-packing nodes incorrectly categorized as under-utilized. Please see the next section for more information relevant to such pods.
Management of over-utilized nodes and possible pod eviction
Luna allocates node resources for pods based on the pods' resource settings. If Luna-managed pods have no resource settings or if their settings are inaccurately too low, Luna-allocated nodes may become over-utilized, causing performance problems.
Luna can be configured to use Kubernetes metrics server data to monitor the CPU and memory utilization of Luna-allocated nodes, and to take action to avoid or reduce high CPU or memory utilization. If the Luna option manageHighUtilization.enabled (default false) is set true, Luna uses metrics server node and pod CPU and memory utilization data as described below.
When a node's CPU utilization >= manageHighUtilization.yellowCPU (default 60) or its memory utilization >= manageHighUtilization.yellowMemory (default 65), Luna adds a taint to the node to prevent the kube scheduler from scheduling more pods on the node. This avoids CPU or memory over-utilization.
When a node's CPU utilization >= manageHighUtilization.redCPU (default 80) or its memory utilization >= manageHighUtilization.redMemory (default 85), Luna performs an eviction of the highest CPU- or memory-demand Luna-scheduled pod that meets the same pod eviction restrictions applied for scale-down.
When a node's CPU utilization < manageHighUtilization.greenCPU (default 10) and its memory utilization < manageHighUtilization.greenMemory (default 15) and the node has a high utilization taint, that taint is removed from the node. This allows nodes that no longer have high CPU or memory utilization to again host additional pods.
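For example, to enable this behavior and adjust the thresholds via Helm (the threshold values below are illustrative; the option names are those described above):
--set manageHighUtilization.enabled=true \
--set manageHighUtilization.yellowCPU=70 \
--set manageHighUtilization.yellowMemory=70 \
--set manageHighUtilization.redCPU=85 \
--set manageHighUtilization.redMemory=90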
Note that if Luna-managed bin-packing pods have no resource settings or if their resource settings are inaccurately very low, Luna's detection of under-utilized bin-packing nodes will be wrong. Please see the previous section for more information relevant to such pods.
GPU SKU annotation
To instruct Luna to start an instance with a specific graphics card:
metadata:
  annotations:
    node.elotl.co/instance-gpu-skus: "v100"
This will start a node with a V100 GPU card.
Each pod with this annotation will be bin-selected, regardless of the pod’s resource requirements.
Advanced configuration via Helm Values
This is a list of the configuration options for Luna. These values can be passed to Helm when deploying Luna.
The keys and values are passed to the deploy script as follows:
./deploy.sh <cluster-name> <cluster-region> \
--set binSelectPodCpuThreshold=3.0 \
--set binSelectPodMemoryThreshold=2Gi \
--set binSelectPodGPUThreshold=1 \
--set binPackingNodeCpu=3250m \
--set binPackingNodeMemory=7Gi \
--set binPackingNodeMinPodCount=42 \
--set binPackingNodeTypeRegexp='^t3a.*$' \
--set binPackingNodePricing='spot,on-demand' \
--set labels='key1=value1,key2=value2'
These configuration options can be modified in the configuration map elotl-luna located in the namespace where Luna manager runs. Once the configuration map has been modified, Luna manager and its admission webhook must be restarted for the new configuration to be used.
$ kubectl -n elotl rollout restart deploy/elotl-luna-manager
...
$ kubectl -n elotl rollout restart deploy/elotl-luna-webhook
...
labels
Specify the labels that Luna will use to match the pods to consider. labels is a list of comma-separated key-value pairs: key1=value1\,key2=value2; pods with any of the labels will be considered by Luna. The default value is elotl-luna=true.
--set labels='key1=value1\,key2=value2'
podAnnotations
Specify the annotations that Luna will use to match the pods to consider. Similar to labels, podAnnotations is a list of comma-separated key-value pairs: key1=value1\,key2=value2; pods with any of the annotations will be considered by Luna. podAnnotations is empty by default.
--set podAnnotations='key1=value1\,key2=value2'
pod.elotl.co/ignore: true
This annotation instructs Luna to ignore a given pod even if it matches labels or podAnnotations.
It's important to note that ignored pods may still be scheduled on Luna-managed nodes, unless these nodes have a specific taint configured. Ignored pods don't have a node selector, so the Kubernetes scheduler will assign them to any available node. If Luna nodes don't have a taint set up, pods that aren't handled by Luna might be scheduled there.
To prevent pods that aren’t managed by Luna from running on Luna-managed nodes, you can utilize node and pod affinity configuration. Node affinity allows you to specify rules that restrict which nodes a pod can be scheduled on, while pod affinity enables you to define rules for co-locating or spreading pods across nodes based on labels.
By combining taints, tolerations, and affinity rules, you can have finer control over pod scheduling and ensure that ignored pods are not inadvertently scheduled on Luna-managed nodes.
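For example, assuming Luna-managed nodes carry the node.elotl.co/managed-by label (the label used in the default Luna affinity shown later in this document), a workload that must stay off Luna-managed nodes could declare the following node affinity in its pod spec (a sketch, not a complete manifest):
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: node.elotl.co/managed-by
            operator: DoesNotExist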
loopPeriod
How often the Luna main loop runs, by default 10 seconds. Increasing this value eases the load on the Kubernetes control plane, while lowering it increases that load.
--set loopPeriod=20s
daemonSetSelector
daemonSetSelector is a label selector for the daemon sets that will run on the Luna nodes.
Luna cannot predict in advance which daemon sets will run on a given node. Since the conditions for daemon sets are dynamic, Luna must estimate which ones will end up on the node, potentially impacting cost optimization.
The daemonSetSelector configuration option allows you to specify the daemon sets Luna should consider in its capacity calculations.
By default, this option is empty, meaning all daemon sets are selected.
For example, to have Luna only consider the impact of the GPU driver daemon set, you can specify:
--set daemonSetSelector=name=nvidia-device-plugin-ds
daemonSetExclude
daemonSetExclude is a comma-separated list of daemon set names that you want to exclude from Luna's list of active daemon sets for newly added nodes.
It is empty by default.
After selecting daemon sets using daemonSetSelector, the sets are further filtered based on the daemonSetExclude list.
Use this option to prevent Luna from reserving resources for daemon sets you do not expect to be active on new nodes. For example, if you are running Luna on a GKE cluster and only plan to use the default logging variant (--logging-variant=DEFAULT), you might exclude the unused daemon sets as follows:
--set daemonSetExclude="fluentbit-gke-256pd\,fluentbit-gke-max\,gke-metrics-agent-scaling-500"
This option may be used along with daemonSetExcludeDesired0.
daemonSetExcludeDesired0
daemonSetExcludeDesired0 is a boolean that you set true if you want to exclude daemonsets that currently have a Desired count of 0 from Luna's list of active daemon sets for newly added nodes.
It is false by default.
After selecting daemon sets using daemonSetSelector, if daemonSetExcludeDesired0 is true, the sets are further filtered to exclude those that have a Desired count of 0.
Use this option to prevent Luna from reserving resources for daemon sets that are not active on current nodes and that you do not expect to be active on new nodes.
--set daemonSetExcludeDesired0=true
This option may be used along with daemonSetExclude.
newPodScaleUpDelay
Minimum age a pod must reach before Luna considers it for scaling up nodes. It is set to 10 seconds by default.
Because pod creation may be scattered, it isn’t desirable for Luna to immediately react to pod creation. Lowering this delay may result in less efficient packing, while increasing it will delay the creation of the nodes and increase the mean time to placement of pods.
--set newPodScaleUpDelay=5s
scaleUpTimeout
Time to allow for the new node to be added and the pending pod to be scheduled before considering the scale up operation expired and subject to retry. It is set to 10 minutes by default. This value can be tuned for the target cloud.
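For example, to allow more time on a cloud where node provisioning is slow (the value is illustrative; the format follows the other duration options such as loopPeriod):
--set scaleUpTimeout=15m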
includeArmInstance
Whether to consider Arm instance types. It is set to false by default.
If this option is enabled, all the images of the pods run by Luna must support both the AMD64 and ARM64 architectures. Otherwise, pod creation may fail.
placeBoundPVC
Whether to consider pods with bound PVC. It is set to false by default.
placeNodeSelector
Whether to consider pods with existing node selector(s). It is set to false by default. When set to true, a pod's existing node selector(s) must be satisfiable by the Luna and pod settings; otherwise, Luna may allocate a node that cannot be used by the pod.
namespacesExclude
List of comma-separated names of namespaces whose pods should be excluded from Luna management. It is set to kube-system only by default. For example, to run with no namespace restrictions on Luna management, use:
--set namespacesExclude={}
To add the namespace test
to the exclusion list specify:
--set namespacesExclude='{kube-system,test}'
Note that if the kube-system namespace is not part of the namespacesExclude list, Luna can spin up additional nodes for kube-system pods marked for Luna placement that are in the Pending state for too long.
reuseBinSelectNodes
Whether to reuse nodes for similar bin-select placed pods. It is set to true by default.
skipIgnoredPods
Whether to skip adding a node selector to pods that are not labeled for placement by Luna. It is set to false by default.
By default, the Luna webhook sets a node selector for each non-daemonset pod placement request it examines. If a pod is labeled for placement by Luna, its node selector is set to point to a Luna-created node. If a pod is not labeled for placement by Luna, its node selector is set to exclude any Luna-created node; the latter setting is skipped if skipIgnoredPods is set true.
prometheusListenPort
The port number on which Luna manager and webhook will expose their Prometheus metrics. It is 9090 by default.
clusterGPULimit
The maximum number of GPUs to run in the cluster. It is set to 10 by default. If the GPU count in the cluster reaches this limit, Luna will stop scaling up GPU nodes.
nvidiaGPUTimeSlices
The number of GPU time-slices for NVIDIA GPUs in cluster. It is set to 1 by default. When its value is greater than 1, Luna treats GPUs in cloud instances as N copies of themselves with respect to scheduling GPU resource requests. This value must match the NVIDIA GPU time slices setting for GPU nodes in the cluster for Luna GPU allocation to operate consistently with that setting.
On AKS, EKS, and OKE clusters, the NVIDIA time-slices setting is transparent to the cluster control plane and GPU workloads running in the cluster. The number of NVIDIA GPU time-slices can be set when installing the nvidia-device-plugin helm chart. The time-slices setting will automatically be configured for all NVIDIA GPUs in the cluster, and cluster nodes will use that value when they report their GPU capacity. GPU workloads transparently get a slice for each GPU resource they request.
On GKE clusters, the NVIDIA time-slices setting is visible to the cluster control plane and to GPU workloads running in the cluster. Luna configures the GPU slice count in the GKE node pool used for GPU node allocation. Note that GPU pods running on GKE clusters with time-sliced GPUs must include nodeSelectors indicating the workload can use time-shared GPUs and specifying the max clients-per-gpu value allowed. And the GPU pods running on time-sliced GPUs cannot specify a nvidia.com/gpu resource limit value greater than 1. Please see the associated GCP documentation for more details.
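For example, if the NVIDIA device plugin in the cluster is configured for 4 time-slices per GPU, the matching Luna setting would be (the value is illustrative and must mirror the cluster's actual time-slice configuration):
--set nvidiaGPUTimeSlices=4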
binSelectMaxPodsPerNode & binPackingMaxPodsPerNode
These configuration options control the maximum number of pods that can run on each node. Setting lower values can minimize the use of network resources like interfaces or IP addresses on the nodes.
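For example, to cap both bin-selected and bin-packed nodes at 110 pods each (the value is illustrative; see the per-cloud notes below for defaults and valid ranges):
--set binSelectMaxPodsPerNode=110 \
--set binPackingMaxPodsPerNode=110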
AWS
When binSelectMaxPodsPerNode or binPackingMaxPodsPerNode is set to 0 (the default), Luna uses the AWS-defined ENI limit as the maximum pods per node value. By default, Luna does not explicitly set the maximum number of pods on the nodes. If you set a value greater than 0, Luna will set the specified maximum number of pods on the nodes.
For nodes with up to 30 VCPUs, the maximum number of pods per node is capped at 110. For nodes with more than 30 VCPUs, the maximum increases to 250.
GCP
When binSelectMaxPodsPerNode or binPackingMaxPodsPerNode is set to 0 (the default), Luna defaults to a limit of 110 pods per node. These values must be between 8 and 256; otherwise, the API will produce an error, and nodes will not be created.
Azure
When binSelectMaxPodsPerNode or binPackingMaxPodsPerNode is set to 0 (the default), Luna uses the node’s default max pods per node. For clusters using Kubenet networking, this default is 110 pods per node; for clusters using CNI networking, it is 250 pods per node. The maximum value either can be set to is 250.
OCI
When binSelectMaxPodsPerNode or binPackingMaxPodsPerNode is set to 0 (the default), Luna defaults to a limit of 110 pods per node.
This option only applies to OKE clusters that use OCI_VCN_IP_NATIVE networking, and indicates how Luna should set max pods per node on nodes it allocates. If MaxPodsPerNode is 0 (default), Luna sets max pods per node to the maximum supported by compute shape vNICs. If MaxPodsPerNode is greater than 0, Luna sets max pods per node to min(MaxPodsPerNode, maximum supported by compute shape vNICs).
nodeLabels
Labels to add to the nodes. It is a key-value mapping, empty by default.
For example, to add a label foo=bar to your nodes, use the following flag:
--set nodeLabels.foo=bar
Note that if you need to include dots in the label’s key, you will have to escape them with a backslash:
--set nodeLabels.my\.label\.example=value
nodeTags
Tags to add to the cloud instances. It is a key-value mapping, empty by default.
This can be useful to track and clean up stale cloud instances. For instance, to add tags key1=value1 and key2=value2, use:
--set nodeTags.key1=value1
--set nodeTags.key2=value2
Note that the nodeTags option is not supported on GKE.
nodeTaints
To add taints to the nodes created by Luna, use the nodeTaints configuration option:
--set nodeTaints='{key1=value1:NoSchedule,key2=value2:NoExecute}'
Note that the nodeTaints option is not supported under Oracle Container Engine for Kubernetes (OKE).
loggingVerbosity
How verbose Luna manager and webhook are. It is set to 2 by default.
0 = critical, 1 = important, 2 = informational, 3 = debug.
scaleDown.nodeUnneededDuration
If a node remains idle for longer than nodeUnneededDuration, Luna manager will scale it down. Default: 5m.
--set scaleDown.nodeUnneededDuration=1m
scaleDown.skipNodeWithSystemPods
Determines whether to skip nodes running pods from the kube-system namespace. Daemonset pods are never considered by Luna; this only applies to deployment pods. Default: false.
scaleDown.skipNodesWithLocalStorage
When true, Luna manager will never scale down nodes with local storage attached to a pod. Default: true.
scaleDown.skipEvictDaemonSetPods
When true, Luna manager will skip evicting daemonset pods from nodes removed for scale down. Default: false.
scaleDown.minReplicaCount
The minimum replica count ensures that the specified number of replicas are always available during node scale-down. Default: 0.
scaleDown.binPackNodeUtilizationThreshold
Defines the utilization threshold to scale down bin-packed nodes, ranging from 0.0 (0% utilization) to 1.0 (100% utilization). Default: 0.1 (10%).
Note that the Helm option --set cannot parse floating point numbers. Use --set-json to define scaleDown.binPackNodeUtilizationThreshold.
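For example, to set the threshold to 20% (the value is illustrative):
--set-json 'scaleDown.binPackNodeUtilizationThreshold=0.2'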
scaleDown.minNodeCountPerZone
For clusters supporting zone spread (currently only EKS clusters and GKE regional clusters), indicates the minimum number of nodes (0 or 1) that Luna should keep running per zone in target pools into which zone spread pods may be placed. This minimum is maintained even when no normal (not daemonset or mirror) Luna pods are currently running in the pool. Default: 0. Note that EKS does not support setting this value to 1.
In general, Luna keeps a minimum of 1 node per zone in node pools that may be used for zone spread, to ensure kube-scheduler can see all the zones in its target node set and hence can make the desired zone spread choices. Setting scaleDown.minNodeCountPerZone to 1 to maintain a min of 1 node per zone even when the associated count of normal (not daemonset or mirror) Luna pods is 0 avoids a possible race where kube-scheduler sees zone-spread pods arrive for scheduling when some but not all of a node pool's per-zone nodes have scaled down.
scaleDown.nodeTTL
When > 0, enables Luna support for node time-to-live. When scaleDown.nodeTTL is set to a non-zero value, it must be set to a value greater than or equal to scaleUpTimeout. If scaleDown.nodeTTL is less than scaleUpTimeout, Luna will set it to scaleUpTimeout internally and will emit a warning in the logs. Default: 0m (time-to-live unlimited).
When scaleDown.nodeTTL is set to a non-zero value, Luna uses the value as a time-to-live for its allocated nodes; Luna cordons, drains, and terminates its allocated nodes once they have been running longer than the specified scaleDown.nodeTTL time.
If a nodeTTL-expired node contains any pods with do-not-evict annotations (i.e., pod.elotl.co/do-not-evict: true or cluster-autoscaler.kubernetes.io/safe-to-evict: false), Luna supports the node's graceful termination by cordoning it, draining its non-kube-system non-daemonset pods except the do-not-evict pods, and then adding the configurable annotation scaleDown.drainedAnnotation to it. An external controller monitoring nodes for that annotation can perform eviction-related operations with respect to the do-not-evict pods and then remove their do-not-evict annotation. Once a nodeTTL-expired node contains no do-not-evict pods, Luna terminates the node.
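For example, to recycle Luna-allocated nodes roughly once a day (the value is illustrative and must be greater than or equal to scaleUpTimeout):
--set scaleDown.nodeTTL=24h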
scaleDown.managedNodeDelete
Set true to enable Luna support for graceful termination of nodes that are externally-deleted (e.g., "kubectl delete node/node-name"). Default: true.
When scaleDown.managedNodeDelete is set true, Luna adds a finalizer to its allocated nodes, allowing Luna to detect external deletion operations on those nodes. When Luna detects external deletion of an allocated node, if that node contains any do-not-evict pods, Luna performs the graceful termination steps outlined in scaleDown.nodeTTL. Once an externally-deleted Luna-allocated node contains no do-not-evict pods, Luna removes its finalizer so it no longer blocks the K8s node deletion, and deletes the node from the cloud.
Note that if scaleDown.managedNodeDelete is set, the deletion of Luna-allocated nodes requires the removal of the Luna finalizer; hence, if Luna is disabled with some of its allocated nodes remaining and you later want to remove those nodes, you will need to manually remove the finalizer.
scaleDown.drainedAnnotation
Annotation used during graceful node termination; see scaleDown.nodeTTL or scaleDown.managedNodeDelete. Default: key: node.elotl.co/drained; value: true.
Pod retry
Luna cannot guarantee that a pod will run on one of its nodes; the node and pod have to be properly configured. If a pod is still in the pending state once the requested node is online, Luna will retry after a configurable delay, up to a configurable number of times.
How pod retry works:
- A new pod is created, the Luna webhook matches it, and a new node is provisioned by Luna manager.
- Luna manager waits for the node to come online or until scaleUpTimeout has passed, whichever happens first.
- Once the node is online or the request has timed out, Luna checks the pod’s status after podRetryPeriod has elapsed.
- If the pod is still in the pending state, there are two cases:
  - The pod has been retried fewer than maxPodRetries times: the annotation pod.elotl.co/retry-count is added to the pod or incremented, and the pod will be retried after podRetryPeriod.
  - The pod has been retried maxPodRetries times: the annotation pod.elotl.co/ignore: true is added to the pod. The pod will now be ignored by Luna until the annotation is removed.
maxPodRetries
Sets the maximum retry attempts for a pod. Each retry increments the annotation pod.elotl.co/retry-count on the pod. Once this limit is exceeded, the pod is annotated with pod.elotl.co/ignore: true, indicating Luna should ignore the pod until the annotation is removed.
Default: 3
podRetryPeriod
Determines the delay before Luna retries deploying a pod that remains in the pending state, even after its node is available. This period must allow adequate time for Kubernetes to schedule the pod, otherwise Luna may create unnecessary node(s) temporarily.
Default: 5 minutes
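For example, to allow more retries with a shorter wait between attempts (the values and the duration format are illustrative):
--set maxPodRetries=5 \
--set podRetryPeriod=2m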
Bin-selection
Bin-selection is a process where Luna provisions a dedicated node to run a pod with high resource requirements.
When a pod’s resource needs exceed certain thresholds, Luna automatically allocates a dedicated node for that pod. This process involves determining the optimal node configuration based on the pod’s requirements, adding a new node to the cluster, and scheduling the pod to run on this dedicated node.
Bin-selection is triggered when a pod’s resource requirements meet or exceed any of the following thresholds:
- CPU: binSelectPodCpuThreshold
- Memory: binSelectPodMemoryThreshold
- GPU: binSelectPodGPUThreshold
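As a sketch, with the thresholds from the deployment example above (CPU 3.0, memory 2Gi, GPU 1), a pod like the following would be bin-selected onto its own dedicated node (the pod name, image, and request sizes are illustrative):
apiVersion: v1
kind: Pod
metadata:
  name: big-pod
  labels:
    elotl-luna: "true"
spec:
  containers:
  - name: app
    image: nginx
    resources:
      requests:
        cpu: "4"
        memory: 8Gi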
Bin-packing
Bin-packing means running the pod with other pods on a shared node.
binPackingNodeCpu, binPackingNodeMemory, and binPackingNodeGPU let you configure the shared nodes’ requirements. If you have an instance type in mind, set these parameters slightly below the node type you are targeting, to take into account the kubelet and kube-proxy overhead. For example, if you would like to have non-GPU nodes with 8 VCPU and 32 GB of memory, set binPackingNodeCpu to "7.5" and binPackingNodeMemory to "28G".
Bin-selection thresholds must be lower than the bin-packing node requirements. Otherwise the system will log a warning, and any bin-packing node requirements that are too low will be increased to match the corresponding bin-selection thresholds.
Each node type can only run a limited number of pods. binPackingNodeMinPodCount lets you request a node that can support a minimum number of pods.
binPackingNodeTypeRegexp allows you to limit the instances that will be considered. For example, if you would only like to run instances from the "t3a" family in AWS, you would do: binPackingNodeTypeRegexp='^t3a\..*$'
binPackingMinimumNodeCount allows you to specify the minimum number of bin-packed nodes. The nodes will be started immediately and will stay online even if no pods are running on them.
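For example, to keep two bin-packed nodes online at all times (the value is illustrative):
--set binPackingMinimumNodeCount=2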
Spot and on-demand pricing
Spot pricing is a cloud pricing model where providers offer unused compute capacity at significantly discounted rates. Users can access these resources at lower costs, but instances may be reclaimed with short notice when demand increases.
To specify whether to use on-demand or spot pricing for bin-selected nodes, you can add the node.elotl.co/instance-offerings annotation to the pod’s definition. This annotation allows you to choose between different pricing options:
- Spot pricing: to run nodes exclusively on spot instances, use:
  node.elotl.co/instance-offerings: "spot"
- On-demand pricing: for the regular pricing model, use:
  node.elotl.co/instance-offerings: "on-demand"
- Spot with on-demand fallback: to use spot instances when possible and fall back to on-demand if spot isn’t available, use:
  node.elotl.co/instance-offerings: "spot,on-demand"
Here’s an example of a pod definition utilizing bin-selection with spot pricing:
apiVersion: v1
kind: Pod
metadata:
  name: high-resource-pod
  annotations:
    node.elotl.co/instance-offerings: "spot"
spec:
  ...
Bin-packed nodes also support spot pricing. The configuration option binPackingNodePricing allows you to indicate the price offerings category for the instances that will be considered. For example, if you would only like to run instances from the "spot" category:
binPackingNodePricing: spot
Spot pricing is supported on EKS, AKS, GKE, and OKE.
Special consideration for using Spot on Azure AKS
AKS nodes with spot pricing have a taint automatically applied to them. This means pods running on Spot nodes in AKS must have a toleration set in order to be scheduled and run on the nodes with Spot pricing.
In order to get the pods running on the spot nodes, the operator must add a toleration corresponding to the kubernetes.azure.com/scalesetpriority=spot:NoSchedule taint.
spec:
  containers:
  - name: spot-example
  tolerations:
  - key: "kubernetes.azure.com/scalesetpriority"
    operator: "Equal"
    value: "spot"
    effect: "NoSchedule"
Special consideration for using Spot on OCI OKE
OKE nodes with spot pricing have a taint automatically applied to them. This means pods running on Spot nodes in OKE (called preemptible instances) must have a toleration set in order to be scheduled and run on the nodes with Spot pricing.
In order to get the pods running on the spot nodes, the operator must add a toleration corresponding to the oci.oraclecloud.com/oke-is-preemptible taint.
spec:
  containers:
  - name: spot-example
  tolerations:
  - key: oci.oraclecloud.com/oke-is-preemptible
    operator: Exists
    effect: "NoSchedule"
Spot interruption message option on AWS EKS
The user can set up an AWS SQS queue to receive Spot interruption messages, delivered two minutes before termination, and can provide that queue name to Luna via the AWS option spotSqsQueueName. When Luna receives a Spot termination message, it marks the node with node.elotl.co/spot-event: termination. Nodes with this annotation are targeted in Luna's scale-down selection.
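Assuming the queue name is passed as a Helm value like the other options in this document (this is an assumption; consult the AWS-specific deployment settings for the exact key), the configuration might look like this, with a placeholder queue name:
--set spotSqsQueueName=my-spot-interruption-queue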
Luna’s own deployment and pod configuration
Annotations, tolerations, and affinity
Use the Helm value annotations to add custom annotations to Luna manager and webhook deployments:
$ helm install ... --set annotations.foo=bar --set annotations.hello=world ...
To add custom tolerations to Luna’s own pods, use the configuration option tolerations. The tolerations specification is rather complex, therefore we recommend you define it in a Helm values file and pass its filename with the -f or --values options:
$ cat tolerations.yaml
tolerations:
- key: "foo"
  value: "bar"
  operator: "Equal"
  effect: "NoSchedule"
$ helm install ... --values tolerations.yaml ...
To add custom affinity to Luna’s own pods, use the configuration option affinity. The affinity specification is rather complex, therefore we recommend you define it in a Helm values file and pass its filename with the -f or --values options:
$ cat affinity.yaml
# Helm values
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
      - matchExpressions:
        - key: topology.kubernetes.io/zone
          operator: In
          values:
          - us-central1-f
$ helm install ... --values affinity.yaml ...
Note that setting the affinity parameter will override the default affinity, which prevents Luna pods from running on Luna-managed nodes:
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
      - matchExpressions:
        - key: node.elotl.co/managed-by
          operator: DoesNotExist
Add this snippet to your own affinity definition to prevent Luna pods from running on Luna-managed nodes.
Webhook port
You can change the port of the mutation webhook with the webhookPort configuration option:
$ helm install ... --set webhookPort=8999 ...