Troubleshooting
Luna manager and webhook are down
If the luna-manager and luna-webhook pods have containers that are not ready, it is likely that there was an issue in the manager. If the manager pod is not running, Luna automatically marks the webhook pod(s) as not ready. You can notice this by listing the pods in the Luna namespace (elotl by default):
$ kubectl get pods -n elotl
elotl-luna-manager-d495f9f96-v7zng 1/2 Running 0 115d
elotl-luna-webhook-d495f9f96-z5vmh 1/2 Running 0 115d
elotl-luna-webhook-d495f9f96-w3prk 1/2 Running 0 115d
This is done to avoid a situation where the Luna webhook adds a nodeSelector that cannot be satisfied because the Luna manager is down. This way, pods are not mutated with a nodeSelector, so they can still run on any node in the cluster.
To find the root cause, check the Luna manager logs:
$ kubectl logs -n elotl -l app.kubernetes.io/component=luna-manager
Node is idle but not scaled down
There may be cases when node utilization is low and the pod that was meant to run on the node has already been deleted, but the node is still in the cluster. In this case, you might see a log message in the Luna manager specifying which pod(s) are blocking node removal:
Node X cannot be removed because pod Y on this node is blocking the removal due to <Reason>
where Reason explains why the pod cannot be moved to another node.
Another reason could be that the other nodes do not have enough capacity to run the pods currently running on the node. In this case, the log message will say:
Node X cannot be removed because these pods can't be scheduled on another of N node choices: pod-A, pod-B, ...
It is up to the cluster operator to move those pods manually and unblock node removal.
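A common way to move those pods off the node manually, assuming they are managed by controllers that will recreate them elsewhere, is to cordon and drain the node; the node name below is a placeholder:
$ kubectl cordon <node-name>
$ kubectl drain <node-name> --ignore-daemonsets --delete-emptydir-data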
My pods are in Pending state for too long
Luna adds a nodeSelector to the pods under its management, and then adds a node with labels matching this node selector. Before the node(s) are added to the cluster and become ready, the pod(s) will remain in a Pending state. For bin-selected nodes, it usually takes less than 5 minutes for pod(s) to transition from creation to the Running state. This may vary between cloud providers. For pods requesting a GPU it may take a bit longer: before Kubernetes marks the node as ready and schedulable, a DaemonSet pod usually needs to install GPU drivers, and it is common for those pods to use heavy docker images. If you're seeing that one or more pods are in a Pending state for longer than expected, here's a debugging checklist:
Are Pending pods labeled with Luna labels?
All pods managed by Luna are first labeled by luna-webhook with the pod.elotl.co/managed-by key, whose value includes the Luna instance name and the pod placement strategy (bin-pack vs bin-sel). Luna-webhook adds this label to all pods designated to be Luna-managed per the relevant helm chart value, which is elotl-luna=true by default.
If the Pending pods are controlled by a Deployment, StatefulSet, Job, or any other pod controller, make sure the labels are set on the Pod template, not only on the controller object itself.
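For example, assuming your workload is a Deployment named my-deployment with a pod named my-pod (both placeholder names), you can inspect the labels that ended up on the running pod and on the pod template:
$ kubectl get pod my-pod -o jsonpath='{.metadata.labels}'
$ kubectl get deployment my-deployment -o jsonpath='{.spec.template.metadata.labels}'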
Luna labels are missing
It is likely that luna-webhook is having issues. Check out the Luna webhook debugging tips below.
The pod is annotated with pod.elotl.co/ignore: true
This annotation directs Luna to ignore the pod. It is applied to pods that have exceeded the configured maximum retry limit. This means that Luna provisioned a number of nodes to get the pod running, but no node could run the pod. This failure typically stems from a misconfiguration of either the pod or the node. To prevent Luna from continuously attempting to schedule such pending pods, they are tagged with pod.elotl.co/ignore: true.
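To check whether a pod carries this annotation (my-pod is a placeholder name):
$ kubectl get pod my-pod -o yaml | grep "pod.elotl.co/ignore"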
Do pending pods have nodeSelector set?
The Luna webhook also sets a nodeSelector. In the case of bin-packing, the nodeSelector is set to a fixed value:
node.elotl.co/destination: bin-packing
In the case of bin-selection, each pod gets node.elotl.co/destination: <unique-value>.
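To confirm, you can inspect the pod's nodeSelector and look for nodes carrying the matching label (my-pod is a placeholder name; for bin-selection, substitute the pod's unique destination value):
$ kubectl get pod my-pod -o jsonpath='{.spec.nodeSelector}'
$ kubectl get nodes -l node.elotl.co/destination=bin-packing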
NodeSelector not set
If the nodeSelector is not set, it may indicate problems with the Luna webhook. Check out the Luna webhook debugging tips below.
NodeSelector and labels set properly
It is likely an issue with luna-manager. Checking the Luna manager logs may give you a better understanding of what went wrong.
Pod is Pending, but no new node is created
If your pod is stuck in a Pending state and has the nodeSelector added by the Luna webhook, but there isn't any newly added node that matches the node selector, it is possible that Luna couldn't find an instance type that would satisfy the pod's resource requests. In this case, you should see a log message similar to the following in the luna-manager logs:
unable to find node type for pod my-pod
This may happen in two cases:
- When there is no matching instance type satisfying the pod's resource requests combined with Luna's node type inclusions/exclusions (specified in the pod's annotations; see Luna configuration). In this case it's a configuration error on the user's side.
- When a matching instance type exists, but it isn't available in the cloud at the moment. In this case, it's a transient error.
On GKE, there is also a third case, when a matching instance type is found, but it cannot start because the quota limit on the given CPU/GPU type has been reached in the region. In that case, you will see a log message similar to the following in the Luna manager:
rpc error: code = PermissionDenied desc = Insufficient regional quota to satisfy request: resource "N2D_CPUS": request requires '4.0' and is short '4.0'. project has a quota of '0.0' with '0.0' available. View and manage quotas at https://console.cloud.google.com/iam-admin/quotas?usage=USED&project=youre-project.
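To quickly check for these conditions, you can grep the Luna manager logs for the messages shown above, for example:
$ kubectl logs -n elotl -l app.kubernetes.io/component=luna-manager | grep "unable to find node type"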
Pod is Pending, even after new node is created
If your pod is stuck in a Pending state even though the Luna webhook has added the expected nodeSelector and the Luna manager has added a new node with that nodeSelector, and you have enabled placeNodeSelector, check whether the pod has additional nodeSelectors that are not satisfied by the Luna-added node. If so, add the associated pod or Luna configuration to influence Luna's node selection.
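For example, to compare the pod's nodeSelectors against the labels on the Luna-added node (my-pod and the node name are placeholders):
$ kubectl get pod my-pod -o jsonpath='{.spec.nodeSelector}'
$ kubectl get node <luna-added-node> --show-labels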
Luna webhook debugging
You can verify its status with kubectl:
$ kubectl get deployment -n elotl
If the Luna webhook pods are up and running, you can check whether your pod (e.g. named my-pod) was processed or skipped by the webhook:
$ kubectl logs -n elotl deployment/elotl-luna-webhook | grep my-pod
If the pod was skipped, you will see one of the following log messages:
skipping pod my-pod because it doesn't match labels
or
skipping pod my-pod because it's a part of daemonset
If the pod was processed, you should see something similar to:
handler.go:69] AdmissionReview Kind="/v1, Kind=Pod"
Namespace="my-namespace" Name="my-pod"/"" UID=57c7b78a-4545-42b8-91cb-9d9519c4a16f ...
AdmissionResponse=[{add /metadata/labels map[app:my-app elotl-systest:true pod.elotl.co/managed-by:elotl-luna-bin-sel]} {add /metadata/annotations map[node.elotl.co/node-pool-name:anne-regional-0443af23]} {add /spec/nodeSelector map[node.elotl.co/destination:0625973540358a61add307b0690afe0c]}]
Daemonset pods are reported as pending/running/terminating on nodes that have been scaled down
In some cases, daemonset pods can get stuck in a pending/running/terminating state after their target nodes are scaled down; e.g., this has been observed for telegraf-ds pods running on Azure clusters using CNI networking. Luna runs a pod garbage collection pass in its processing loop to detect and force the deletion of such orphaned daemonset pods.
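If you want to inspect such pods yourself, you can list the pods still assigned to a node that has been scaled down (the node name is a placeholder):
$ kubectl get pods -A -o wide --field-selector spec.nodeName=<scaled-down-node-name>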
Pods are reported in status failed for reason NodeAffinity with the message "Pod Predicate NodeAffinity failed"
If the Kube Scheduler places a pod with a host-label nodeSelector on a newly added host that it expects to have the label, while that host's kubelet has not yet reconciled all of its host labels, the kubelet will mark the pod as failed for reason NodeAffinity with the message "Pod Predicate NodeAffinity failed". If there is a controller recreating the pod, the recreated pod will be successfully started on the node once the kubelet fully reconciles its node labels. However, the failed pod remains in the inventory. Luna runs a pod garbage collection pass in its processing loop to detect and force the deletion of failed NodeAffinity pods.
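If you want to see these failed pods before Luna garbage-collects them, you can list pods in the Failed phase and filter on the reason shown above:
$ kubectl get pods -A --field-selector status.phase=Failed | grep NodeAffinity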
Luna-allocated nodes cannot be removed after Luna is uninstalled
When scaleDown.managedNodeDelete is enabled (default), Luna adds a finalizer to each node it allocates, to allow it to orchestrate an orderly node shutdown if the node is deleted. This finalizer remains on Luna-allocated nodes if Luna is uninstalled. To remove this finalizer on all affected nodes, run the following command:
kubectl get nodes -ojsonpath='{range .items[*].metadata}{@.name}:{@.finalizers}{"\n"}{end}' | grep "node.elotl.co/graceful-termination" | cut -d ':' -f 1 | xargs kubectl patch node --type='json' -p='[{"op": "remove", "path": "/metadata/finalizers"}]'
Luna created a new version of my bin-packing node pool
When Luna prepares to create node(s) for pending bin-packing pod(s) on GKE or AKS (which use node pools), Luna checks that the current bin-packing node pool configuration matches that needed for the pending pod(s). If the configuration does not match, Luna creates a new version of the bin-packing node pool to be used for new pods. You can check the Luna manager logs to see what configuration change led to Luna creating a new version of the bin-packing node pool.