Luna AKS Testing Tutorials
The basic test examples below can be used to validate that Luna is functioning and operating correctly.
ML Use Cases with NVIDIA GPU
By default, Luna autoscales pods that carry the label elotl-luna=true. Apply the YAML manifest shown below with kubectl apply -f.
The pod runs to completion on a Standard_NC4as_T4_v3 instance, which has an NVIDIA T4 GPU.
Upon completion of the pod, the corresponding node will be automatically terminated.
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: gpu-test
  annotations:
    node.elotl.co/instance-gpu-skus: "t4"
  labels:
    elotl-luna: "true"
spec:
  restartPolicy: OnFailure
  containers:
    - name: cuda-vector-add
      image: "k8s.gcr.io/cuda-vector-add:v0.1"
      resources:
        limits:
          nvidia.com/gpu: 1
EOF
You can monitor the creation of the new pod and node by running watch kubectl get pods,nodes. The test pod is active only briefly after it starts. The GPU node that was added to support the pod persists for a few more minutes.
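Because the test pod is short-lived, it can be easier to block until it finishes than to watch for it. The following is a minimal sketch, assuming kubectl 1.23 or newer (which supports `--for=jsonpath` in `kubectl wait`). The function name `wait_and_show_logs` and the 15-minute default timeout are illustrative choices, not part of Luna.

```shell
# Block until the named pod reaches phase Succeeded, then print its logs.
# Assumes kubectl 1.23+ for jsonpath support in `kubectl wait`.
wait_and_show_logs() {
  pod="$1"
  timeout="${2:-15m}"   # illustrative default; leaves time for node provisioning
  kubectl wait --for=jsonpath='{.status.phase}'=Succeeded "pod/${pod}" --timeout="${timeout}" &&
    kubectl logs "${pod}"
}
```

For the pod defined above, the call would be `wait_and_show_logs gpu-test`.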
To confirm the presence of a GPU on the node, you can run kubectl describe node and look for the "nvidia.com/gpu" entry. Alternatively, you can run the following command:
kubectl get nodes "-o=custom-columns=NAME:.metadata.name,KUBELET STATUS:.status.conditions[3].reason,CREATED:.metadata.creationTimestamp,VERSION:.status.nodeInfo.kubeletVersion,NVIDIA GPU(s):.status.allocatable.nvidia\.com/gpu"
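If only the GPU nodes are of interest, the same allocatable field can be read with a JSONPath template and filtered. This is a sketch under the assumption that nodes without a GPU leave the `nvidia.com/gpu` field empty. The helper names `list_node_gpus` and `filter_gpu_nodes` are hypothetical.

```shell
# Print "<node name><TAB><allocatable GPU count>" for every node.
list_node_gpus() {
  kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.allocatable.nvidia\.com/gpu}{"\n"}{end}'
}

# Read that listing on stdin and keep only nodes reporting at least one GPU.
filter_gpu_nodes() {
  awk -F'\t' '$2 + 0 > 0 { print $1 }'
}
```

Typical use is `list_node_gpus | filter_gpu_nodes`, which prints one GPU node name per line and nothing when no GPU node is present.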
General Testing (non-ML)
Luna attempts to consolidate multiple smaller pods onto a newly deployed node. For a larger pod, it allocates a dedicated node instead, which is the bin-selection packing mode.
You can run simple tests with busybox or other pods of varying sizes. Use the following YAML manifests, and adjust the number of pods to observe Luna's dynamic response.
Small busybox deployment
Several busybox pods will be co-located on a single node.
busybox
cat <<EOF | kubectl apply -f -
apiVersion: apps/v1
kind: Deployment
metadata:
  name: busybox
spec:
  replicas: 6
  selector:
    matchLabels:
      app: busybox
  template:
    metadata:
      labels:
        app: busybox
        elotl-luna: "true"
    spec:
      containers:
        - name: busybox
          image: busybox
          resources:
            requests:
              cpu: 200m
              memory: 128Mi
            limits:
              cpu: 300m
              memory: 256Mi
          command:
            - sleep
            - "infinity"
EOF
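A quick calculation shows why these pods can share one node. The sketch below multiplies the per-pod requests from the manifest by the replica count. The helper `total_request` is hypothetical, and the result is only a rough guide: a node's allocatable resources are lower than its nominal capacity, and Luna's actual selection logic is not shown here.

```shell
# Multiply a per-pod resource request by the replica count (integer arithmetic).
total_request() {
  replicas="$1"
  per_pod="$2"
  echo $(( replicas * per_pod ))
}

total_request 6 200   # total CPU request in millicores
total_request 6 128   # total memory request in Mi
```

Six replicas request 1200 millicores of CPU and 768 Mi of memory in total, which fits within a single 4-CPU instance such as the Standard_B4ms in the sample output.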
Larger busybox deployment
A single busybox pod will hit the threshold for bin-selection and be placed on its own node.
busybox-large
cat <<EOF | kubectl apply -f -
apiVersion: apps/v1
kind: Deployment
metadata:
  name: busybox-large
spec:
  replicas: 1
  selector:
    matchLabels:
      app: busybox-large
  template:
    metadata:
      labels:
        app: busybox-large
        elotl-luna: "true"
    spec:
      containers:
        - name: busybox-large
          image: busybox
          resources:
            requests:
              cpu: 4
              memory: 256Mi
            limits:
              cpu: 6
              memory: 512Mi
          command:
            - sleep
            - "infinity"
EOF
Check that the pods have started. The -o wide option shows which nodes the pods are running on:
kubectl get pods -l elotl-luna=true -o wide
Sample output
NAME                             READY   STATUS    RESTARTS   AGE     IP            NODE                                NOMINATED NODE   READINESS GATES
busybox-65cb45c86b-9s2hq         1/1     Running   0          11m     10.244.10.3   aks-b4ms-32307050-vmss000000        <none>           <none>
busybox-65cb45c86b-bls4r         1/1     Running   0          11m     10.244.10.7   aks-b4ms-32307050-vmss000000        <none>           <none>
busybox-65cb45c86b-d9z9g         1/1     Running   0          11m     10.244.10.8   aks-b4ms-32307050-vmss000000        <none>           <none>
busybox-65cb45c86b-dnkq2         1/1     Running   0          11m     10.244.10.9   aks-b4ms-32307050-vmss000000        <none>           <none>
busybox-65cb45c86b-ggpgr         1/1     Running   0          11m     10.244.10.4   aks-b4ms-32307050-vmss000000        <none>           <none>
busybox-65cb45c86b-gp8c6         1/1     Running   0          11m     10.244.10.5   aks-b4ms-32307050-vmss000000        <none>           <none>
busybox-large-789c68dc46-p258f   1/1     Running   0          5m21s   10.244.11.3   aks-pd1c22cc5-15317834-vmss000000   <none>           <none>
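To summarize placement without reading the table row by row, the node name of each pod can be counted. This is a minimal sketch. The helper `count_by_node` is hypothetical and expects one node name per line on stdin.

```shell
# Count how many times each node name appears on stdin,
# printing "<count> <node>" with the busiest node first.
count_by_node() {
  awk 'NF { count[$1]++ } END { for (n in count) print count[n], n }' | sort -rn
}
```

One way to feed it is `kubectl get pods -l elotl-luna=true --no-headers -o custom-columns=NODE:.spec.nodeName | count_by_node`. For the placement shown in the sample output, this would report six pods on the aks-b4ms node and one on the aks-pd1c22cc5 node.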
Next, we can verify the node information to confirm which instance types were selected and added to the Kubernetes cluster by Luna.
kubectl get nodes "-o=custom-columns=NAME:.metadata.name,KUBELET STATUS:.status.conditions[3].reason,CREATED:.metadata.creationTimestamp,VERSION:.status.nodeInfo.kubeletVersion,INSTANCE TYPE:.metadata.labels.node\.kubernetes\.io/instance-type,CPU(S):.status.capacity.cpu,MEMORY:.status.capacity.memory" --sort-by=metadata.creationTimestamp
Sample output
NAME                                KUBELET STATUS            CREATED                VERSION   INSTANCE TYPE     CPU(S)   MEMORY
aks-agentpool-25414935-vmss000001   ContainerRuntimeIsUp      2023-02-28T23:04:36Z   v1.24.9   Standard_DS2_v2   2        7116272Ki
aks-agentpool-25414935-vmss000000   ContainerRuntimeIsUp      2023-02-28T23:05:31Z   v1.24.9   Standard_DS2_v2   2        7116272Ki
aks-b4ms-32307050-vmss000000        KubeletHasSufficientPID   2023-03-26T21:04:08Z   v1.24.9   Standard_B4ms     4        16393240Ki
aks-pd1c22cc5-15317834-vmss000000   KubeletHasSufficientPID   2023-03-26T21:09:25Z   v1.24.9   Standard_B8ms     8        32882796Ki
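To observe Luna's dynamic response, the replica count of either deployment can be changed and the pod and node listings above re-run. The sketch below wraps the standard `kubectl scale` and `kubectl rollout status` commands. The function name `scale_and_wait` and the 15-minute timeout are illustrative assumptions.

```shell
# Change a Deployment's replica count, then wait for the rollout to settle.
scale_and_wait() {
  name="$1"
  replicas="$2"
  kubectl scale deployment "${name}" --replicas="${replicas}" &&
    kubectl rollout status deployment "${name}" --timeout=15m   # illustrative timeout
}
```

For example, `scale_and_wait busybox 12` doubles the small deployment. Whether Luna adds another node or reuses the existing one depends on the capacity left on the nodes already running.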