GKE
Luna GKE Testing Tutorials
A few very basic test examples can be used to validate the functionality and operation of Luna.
ML Use Cases with NVIDIA GPUs
Check whether the NVIDIA T4 GPU is available in your cluster's region, with COMPUTE_REGION set to that region (for example, us-central1):
gcloud compute accelerator-types list --filter="zone:( $COMPUTE_REGION )" | grep t4
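If you only want the list of zones that offer a T4, you can pipe the gcloud output (without the grep) through a small filter. This is a sketch that assumes gcloud's default table output, with NAME and ZONE as the first two columns, and the accelerator type name nvidia-tesla-t4; the function name t4_zones is hypothetical, so check both assumptions against your gcloud version.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the zones that offer the plain nvidia-tesla-t4 accelerator type.
# Assumes gcloud's default table layout: NAME in column 1, ZONE in column 2.
t4_zones() {
  # Exact name match, so variants such as nvidia-tesla-t4-vws and the header row are skipped;
  # print the ZONE column and de-duplicate.
  awk '$1 == "nvidia-tesla-t4" { print $2 }' | LC_ALL=C sort -u
}

# Usage (requires gcloud and COMPUTE_REGION, e.g. COMPUTE_REGION=us-central1):
#   gcloud compute accelerator-types list --filter="zone:( $COMPUTE_REGION )" | t4_zones
```

In a regional cluster, only the zones this prints can host the T4 node.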
By default, Luna autoscales pods that have the label elotl-luna=true. Apply the following manifest with kubectl apply -f; its node.elotl.co/instance-gpu-skus annotation asks Luna for a node with a T4 GPU.
You will see the pod run to completion on an n1 instance with an NVIDIA T4 GPU attached.
After the pod completes, the corresponding node is terminated automatically.
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: gpu-test
  annotations:
    node.elotl.co/instance-gpu-skus: "t4"
  labels:
    elotl-luna: "true"
spec:
  restartPolicy: OnFailure
  containers:
  - name: cuda-vector-add
    image: "registry.k8s.io/cuda-vector-add:v0.1"
    resources:
      limits:
        nvidia.com/gpu: 1
EOF
You can monitor the creation of the new pod and node by running watch kubectl get pods,nodes. The test pod is active only briefly after it starts; the GPU node that was added for it persists for a few more minutes.
To confirm that the node has a GPU, run kubectl describe node <node-name> and look for the nvidia.com/gpu entry, or alternatively run the following command:
kubectl get nodes "-o=custom-columns=NAME:.metadata.name,KUBELET STATUS:.status.conditions[3].reason,CREATED:.metadata.creationTimestamp,VERSION:.status.nodeInfo.kubeletVersion,NVIDIA GPU(s):.status.allocatable.nvidia\.com/gpu"
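For a scripted check, a narrower query can print only the node name and its allocatable GPU count, which a short filter then reduces to the GPU nodes. This is a sketch: the function name gpu_nodes is hypothetical, and it relies on kubectl printing <none> for nodes that do not expose the nvidia.com/gpu resource.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list nodes that report at least one allocatable NVIDIA GPU.
# Expected input: "NAME GPUS" rows without a header, e.g. from
#   kubectl get nodes --no-headers \
#     "-o=custom-columns=NAME:.metadata.name,GPUS:.status.allocatable.nvidia\.com/gpu"
gpu_nodes() {
  # Keep rows whose second column is a positive integer; "<none>" and "0" are dropped.
  awk '$2 ~ /^[0-9]+$/ && $2 > 0 { print $1 }'
}
```

An empty result means no node currently advertises a GPU, for example because the GPU node has already been removed after the test pod finished.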
General Testing (non-ML)
Luna consolidates multiple smaller pods onto a newly provisioned node (bin packing mode), while for larger pods it allocates a dedicated node per pod (bin selection mode).
You can perform simple tests using busybox or other pods of varying sizes. Use the following manifests and adjust the number of replicas to observe how Luna responds.
Small busybox deployment
Several busybox pods will be co-located on a single node.
cat <<EOF | kubectl apply -f -
apiVersion: apps/v1
kind: Deployment
metadata:
  name: busybox
spec:
  replicas: 6
  selector:
    matchLabels:
      app: busybox
  template:
    metadata:
      labels:
        app: busybox
        elotl-luna: "true"
    spec:
      containers:
      - name: busybox
        image: busybox
        resources:
          requests:
            cpu: 200m
            memory: 128Mi
          limits:
            cpu: 300m
            memory: 256Mi
        command:
        - sleep
        - "infinity"
EOF
Larger busybox deployment
A single busybox pod exceeds the bin selection threshold and is placed on its own node.
cat <<EOF | kubectl apply -f -
apiVersion: apps/v1
kind: Deployment
metadata:
  name: busybox-large
spec:
  replicas: 1
  selector:
    matchLabels:
      app: busybox-large
  template:
    metadata:
      labels:
        app: busybox-large
        elotl-luna: "true"
    spec:
      containers:
      - name: busybox-large
        image: busybox
        resources:
          requests:
            cpu: 4
            memory: 256Mi
          limits:
            cpu: 6
            memory: 512Mi
        command:
        - sleep
        - "infinity"
EOF
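A quick way to see why the two Deployments land differently is to total their CPU requests. The sketch below handles only the two CPU quantity forms used in this tutorial ("200m" and whole cores such as "4"); the function names are hypothetical, and the actual bin selection threshold is a Luna setting not shown here.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: total a Deployment's CPU requests in millicores.
# Only handles "<n>m" and whole-core quantities such as "4".
to_millicores() {
  case "$1" in
    *m) echo "${1%m}" ;;          # "200m" -> 200
    *)  echo $(( $1 * 1000 )) ;;  # "4"    -> 4000
  esac
}

total_cpu_millicores() {
  # $1 = replica count, $2 = per-pod CPU request
  echo $(( $1 * $(to_millicores "$2") ))
}

total_cpu_millicores 6 200m   # busybox
total_cpu_millicores 1 4      # busybox-large
```

The six small pods request 1200m in total, which fits on one 4-vCPU node, while the single large pod alone requests 4000m, more than a 4-vCPU node can offer once system-reserved capacity is subtracted. That is consistent with the sample output, where it runs on an 8-vCPU n1-highcpu-8 node.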
Check that the pods have started; the -o wide option shows which nodes the pods are running on:
kubectl get pods -l elotl-luna=true -o wide
Sample output
NAME                             READY   STATUS    RESTARTS   AGE     IP          NODE                                           NOMINATED NODE   READINESS GATES
busybox-65cb45c86b-26dqd         1/1     Running   0          2m44s   10.68.0.5   gke-test-cluster-c2d-highcpu-4-2b4dd3cf-4g2q   <none>           <none>
busybox-65cb45c86b-7jcft         1/1     Running   0          2m44s   10.68.0.4   gke-test-cluster-c2d-highcpu-4-2b4dd3cf-4g2q   <none>           <none>
busybox-65cb45c86b-cxrj4         1/1     Running   0          2m44s   10.68.0.2   gke-test-cluster-c2d-highcpu-4-2b4dd3cf-4g2q   <none>           <none>
busybox-65cb45c86b-mmxjv         1/1     Running   0          2m44s   10.68.0.7   gke-test-cluster-c2d-highcpu-4-2b4dd3cf-4g2q   <none>           <none>
busybox-65cb45c86b-mq8vj         1/1     Running   0          2m44s   10.68.0.6   gke-test-cluster-c2d-highcpu-4-2b4dd3cf-4g2q   <none>           <none>
busybox-65cb45c86b-q44x6         1/1     Running   0          2m44s   10.68.0.3   gke-test-cluster-c2d-highcpu-4-2b4dd3cf-4g2q   <none>           <none>
busybox-large-789c68dc46-s9rbd   1/1     Running   0          2m26s   10.68.3.2   gke-test-cluster-pff11dc99-92d3eef6-shfm       <none>           <none>
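To summarize a listing like this by node, a small awk filter can count pods per node. This is a sketch: the function name pods_per_node is hypothetical, and it assumes the default -o wide layout where NODE is the 7th column (a RESTARTS value such as "1 (5m ago)" would shift the columns, so use it on freshly started pods).

```shell
#!/usr/bin/env bash
# Hypothetical helper: count pods per node from `kubectl get pods -o wide` output.
# Skips the header row and assumes NODE is column 7.
pods_per_node() {
  awk 'NR > 1 { count[$7]++ } END { for (n in count) print count[n], n }' | LC_ALL=C sort -k2
}

# Usage:
#   kubectl get pods -l elotl-luna=true -o wide | pods_per_node
```

Applied to the sample output above, it reports six pods on the c2d-highcpu-4 node and one pod on the node created for busybox-large.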
Next, we can verify the node information to confirm which instance types were selected and added to the Kubernetes cluster by Luna.
kubectl get nodes "-o=custom-columns=NAME:.metadata.name,KUBELET STATUS:.status.conditions[3].reason,CREATED:.metadata.creationTimestamp,VERSION:.status.nodeInfo.kubeletVersion,INSTANCE TYPE:.metadata.labels.node\.kubernetes\.io/instance-type,CPU(S):.status.capacity.cpu,MEMORY:.status.capacity.memory" --sort-by=metadata.creationTimestamp
Sample output
NAME                                               KUBELET STATUS             CREATED                VERSION            INSTANCE TYPE   CPU(S)   MEMORY
gke-justin-luna-043b-default-pool-567273d3-1odc    NoCorruptDockerOverlay2    2023-03-10T19:35:39Z   v1.24.9-gke.3200   e2-medium       2        4025892Ki
gke-justin-luna-043b-c2d-highcpu-4-2b4dd3cf-4g2q   NoFrequentKubeletRestart   2023-03-10T19:45:00Z   v1.24.9-gke.3200   c2d-highcpu-4   4        8148012Ki
gke-justin-luna-043b-pff11dc99-92d3eef6-shfm       NoFrequentDockerRestart    2023-03-10T19:46:04Z   v1.24.9-gke.3200   n1-highcpu-8    8        7320632Ki
Zone Affinity and Spread Testing (non-ML)
Luna running on a GKE regional cluster supports kube-scheduler pod placement with zone spread or zone affinity. Luna recognizes zone spread expressed in the pod spec topologySpreadConstraints field with topologyKey set to topology.kubernetes.io/zone, and zone affinity expressed in the pod spec nodeAffinity field as a topology.kubernetes.io/zone key with a set of zone values; examples of each are given below. If Luna's placeNodeSelector option is set to true, Luna also recognizes zone affinity expressed as a topology.kubernetes.io/zone nodeSelector. Luna does not currently support both zone spread and zone affinity on the same pod; in that case, the Luna webhook reports an error and skips the pod.
Zone spread is supported on a GKE regional cluster for bin packing if the option gcp.binPackingZoneSpread is set to true (default false). When bin packing, Luna creates at least one bin pack node in each of the regional cluster's 3 zones, so that kube-scheduler has visibility into the full set of available zones. Then, whenever Luna sees pending bin pack pods with zone spread, it scales up the node count in each of the regional cluster's 3 zones to ensure that kube-scheduler can find the zone-specific resources it needs. Luna scales down any unused nodes once the pending pods have been placed. When Luna sees pending bin pack pods with zone affinity, it scales up the node count in an affine zone. When it sees pending bin pack pods with neither zone spread nor zone affinity, it scales up the node count in a single chosen zone.
Luna bin selection supports zone spread and affinity in a similar way. By default (option reuseBinSelectNodes=true), Luna bin selection groups pods for node selection based on factors relevant to choosing a node type, including various pod spec fields and Luna annotations. For a pending pod in a group that includes zone spread, Luna scales up the node count in each of its zones. For a pending pod in a group that includes zone affinity, Luna scales up the node count in an affine zone, and for a pending pod in a group that includes neither zone spread nor zone affinity, it scales up the node count in a single chosen zone.
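The scale-up rules above, plus the unsupported spread-and-affinity combination, can be restated as a small decision function. This is a hypothetical illustration only, not Luna's implementation: the name zones_to_scale and its arguments are invented, the affinity case takes a single affine zone for simplicity, and because the text does not say how Luna picks the "chosen" zone, that zone is simply passed in. For bin packing, the spread case applies only when gcp.binPackingZoneSpread is true.

```shell
#!/usr/bin/env bash
# Hypothetical restatement of the documented rules; not Luna code.
# Usage: zones_to_scale SPREAD AFFINE_ZONE CHOSEN_ZONE ZONE...
#   SPREAD       "yes" if the pod has a topology.kubernetes.io/zone spread constraint
#   AFFINE_ZONE  zone named by a zone affinity, or "" if the pod has none
#   CHOSEN_ZONE  the single zone used when there is neither spread nor affinity
#   ZONE...      all zones of the regional cluster
zones_to_scale() {
  local spread=$1 affine=$2 chosen=$3
  shift 3
  if [ "$spread" = yes ] && [ -n "$affine" ]; then
    # Both on one pod: the documented behavior is a webhook error and a skipped pod.
    echo "error: zone spread and zone affinity on the same pod" >&2
    return 1
  elif [ "$spread" = yes ]; then
    printf '%s\n' "$@"   # zone spread: scale up in every zone
  elif [ -n "$affine" ]; then
    echo "$affine"       # zone affinity: scale up in the affine zone
  else
    echo "$chosen"       # neither: scale up in one chosen zone
  fi
}

zones_to_scale yes "" "" us-central1-a us-central1-c us-central1-f
```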
Small busybox zone spread deployment
These 6 small busybox pods will use Luna bin packing mode and be spread across the 3 zones in a regional cluster.
cat <<EOF | kubectl apply -f -
apiVersion: apps/v1
kind: Deployment
metadata:
  name: busybox-small-zone-spread
spec:
  replicas: 6
  selector:
    matchLabels:
      app: busybox-small-zone-spread
  template:
    metadata:
      labels:
        app: busybox-small-zone-spread
        elotl-luna: "true"
    spec:
      topologySpreadConstraints:
      - maxSkew: 1
        topologyKey: "topology.kubernetes.io/zone"
        whenUnsatisfiable: DoNotSchedule
        labelSelector:
          matchLabels:
            app: busybox-small-zone-spread
      containers:
      - name: busybox-small-zone-spread
        image: busybox
        resources:
          requests:
            cpu: 200m
            memory: 128Mi
          limits:
            cpu: 300m
            memory: 256Mi
        command:
        - sleep
        - "infinity"
EOF
You can see the nodes running the zone spread pods:
kubectl get pods -l elotl-luna=true,app=busybox-small-zone-spread -o wide | awk '{print $1, $7}' | column -t
NAME NODE
busybox-small-zone-spread-7dcfb5466b-2bpdh gke-anne-regional-c2d-highcpu-4-185e4f7c-bsrv
busybox-small-zone-spread-7dcfb5466b-fv8v9 gke-anne-regional-c2d-highcpu-4-185e4f7c-bsrv
busybox-small-zone-spread-7dcfb5466b-8fdvc gke-anne-regional-c2d-highcpu-4-3996d8b5-v9pc
busybox-small-zone-spread-7dcfb5466b-8zksv gke-anne-regional-c2d-highcpu-4-3996d8b5-5zfw
busybox-small-zone-spread-7dcfb5466b-bflfp gke-anne-regional-c2d-highcpu-4-e2c00890-kgwd
busybox-small-zone-spread-7dcfb5466b-j55zs gke-anne-regional-c2d-highcpu-4-e2c00890-1m7p
Then check the zones of those nodes to see the spread:
kubectl get nodes -L topology.kubernetes.io/zone | awk '{print $1, $6}' | column -t
NAME ZONE
gke-anne-regional-c2d-highcpu-4-185e4f7c-bsrv us-central1-c
gke-anne-regional-c2d-highcpu-4-3996d8b5-v9pc us-central1-a
gke-anne-regional-c2d-highcpu-4-3996d8b5-5zfw us-central1-a
gke-anne-regional-c2d-highcpu-4-e2c00890-kgwd us-central1-f
gke-anne-regional-c2d-highcpu-4-e2c00890-1m7p us-central1-f
...
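Instead of matching the two listings by eye, you can save them to files and join them to count pods per zone. This is a sketch: the function name pods_per_zone and the file names are hypothetical, and it assumes each file has a header row followed by whitespace-separated NAME/NODE and NAME/ZONE pairs, as printed by the two commands above. Pods whose node is missing from the zone listing are counted as "unknown".

```shell
#!/usr/bin/env bash
# Hypothetical helper: join pod->node and node->zone listings and count pods per zone.
# Usage: pods_per_zone NODE_ZONES_FILE POD_NODES_FILE
#   kubectl get pods -l elotl-luna=true,app=busybox-small-zone-spread -o wide | awk '{print $1, $7}' > pod-nodes.txt
#   kubectl get nodes -L topology.kubernetes.io/zone | awk '{print $1, $6}' > node-zones.txt
#   pods_per_zone node-zones.txt pod-nodes.txt
pods_per_zone() {
  awk 'FNR == 1 { next }                    # skip the header row of each file
       NR == FNR { zone[$1] = $2; next }    # first file: remember node -> zone
       { z = ($2 in zone) ? zone[$2] : "unknown"; count[z]++ }
       END { for (z in count) print z, count[z] }' "$1" "$2" | LC_ALL=C sort
}
```

For the sample listings above, this yields two pods in each of us-central1-a, us-central1-c, and us-central1-f, which is what maxSkew: 1 across three zones allows for six replicas.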
Small busybox zone affinity deployment
These 16 small busybox pods will use Luna bin packing mode and are affine to the us-central1-f zone in a regional cluster.
cat <<EOF | kubectl apply -f -
apiVersion: apps/v1
kind: Deployment
metadata:
  name: busybox-small-zone-affinity
spec:
  replicas: 16
  selector:
    matchLabels:
      app: busybox-small-zone-affinity
  template:
    metadata:
      labels:
        app: busybox-small-zone-affinity
        elotl-luna: "true"
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: topology.kubernetes.io/zone
                operator: In
                values:
                - us-central1-f
      containers:
      - name: busybox-small-zone-affinity
        image: busybox
        resources:
          requests:
            cpu: 200m
            memory: 128Mi
          limits:
            cpu: 300m
            memory: 256Mi
        command:
        - sleep
        - "infinity"
EOF
You can see the node running the zone affine pods:
kubectl get pods -l elotl-luna=true,app=busybox-small-zone-affinity -o wide | awk '{print $1, $7}' | column -t
NAME NODE
busybox-small-zone-affinity-5b77497774-4v7ld gke-anne-regional-c2d-highcpu-4-e2c00890-kgwd
busybox-small-zone-affinity-5b77497774-5cl8p gke-anne-regional-c2d-highcpu-4-e2c00890-kgwd
busybox-small-zone-affinity-5b77497774-5r8t4 gke-anne-regional-c2d-highcpu-4-e2c00890-kgwd
busybox-small-zone-affinity-5b77497774-8cdx4 gke-anne-regional-c2d-highcpu-4-e2c00890-kgwd
busybox-small-zone-affinity-5b77497774-bpggv gke-anne-regional-c2d-highcpu-4-e2c00890-kgwd
busybox-small-zone-affinity-5b77497774-cwnvj gke-anne-regional-c2d-highcpu-4-e2c00890-kgwd
busybox-small-zone-affinity-5b77497774-d7gl2 gke-anne-regional-c2d-highcpu-4-e2c00890-kgwd
busybox-small-zone-affinity-5b77497774-dsdtw gke-anne-regional-c2d-highcpu-4-e2c00890-kgwd
busybox-small-zone-affinity-5b77497774-h28sd gke-anne-regional-c2d-highcpu-4-e2c00890-kgwd
busybox-small-zone-affinity-5b77497774-mf8qs gke-anne-regional-c2d-highcpu-4-e2c00890-kgwd
busybox-small-zone-affinity-5b77497774-r5lbn gke-anne-regional-c2d-highcpu-4-e2c00890-kgwd
busybox-small-zone-affinity-5b77497774-r8v2k gke-anne-regional-c2d-highcpu-4-e2c00890-kgwd
busybox-small-zone-affinity-5b77497774-svrk6 gke-anne-regional-c2d-highcpu-4-e2c00890-kgwd
busybox-small-zone-affinity-5b77497774-tp4xt gke-anne-regional-c2d-highcpu-4-e2c00890-kgwd
busybox-small-zone-affinity-5b77497774-whv52 gke-anne-regional-c2d-highcpu-4-e2c00890-kgwd
busybox-small-zone-affinity-5b77497774-xpjxj gke-anne-regional-c2d-highcpu-4-e2c00890-kgwd
Then check the zone of that node to confirm it is us-central1-f:
kubectl get nodes -L topology.kubernetes.io/zone | awk '{print $1, $6}' | grep gke-anne-regional-c2d-highcpu-4-e2c00890-kgwd
gke-anne-regional-c2d-highcpu-4-e2c00890-kgwd us-central1-f
Large busybox zone spread deployment
These 3 large busybox pods will use Luna bin selection mode and be spread across the 3 zones in a regional cluster.
cat <<EOF | kubectl apply -f -
apiVersion: apps/v1
kind: Deployment
metadata:
  name: busybox-spread
spec:
  replicas: 3
  selector:
    matchLabels:
      app: busybox-spread
  template:
    metadata:
      labels:
        app: busybox-spread
        elotl-luna: "true"
    spec:
      topologySpreadConstraints:
      - maxSkew: 1
        topologyKey: "topology.kubernetes.io/zone"
        whenUnsatisfiable: DoNotSchedule
        labelSelector:
          matchLabels:
            app: busybox-spread
      containers:
      - name: busybox-spread
        image: busybox
        resources:
          requests:
            cpu: 4
            memory: 256Mi
          limits:
            cpu: 6
            memory: 512Mi
        command:
        - sleep
        - "infinity"
EOF
You can see the nodes running the zone spread pods:
kubectl get pods -l elotl-luna=true,app=busybox-spread -o wide | awk '{print $1, $7}' | column -t
NAME NODE
busybox-spread-56844c7899-677mn gke-anne-regional-anne-regional-c3eca-99ff7273-k8vg
busybox-spread-56844c7899-j4hs8 gke-anne-regional-anne-regional-c3eca-f958c2d7-1bw4
busybox-spread-56844c7899-ztm4v gke-anne-regional-anne-regional-c3eca-a9469860-3jwz
Then check the zones of those nodes to see the spread:
kubectl get nodes -L topology.kubernetes.io/zone | awk '{print $1, $6}' | column -t
NAME ZONE
gke-anne-regional-anne-regional-c3eca-99ff7273-k8vg us-central1-a
gke-anne-regional-anne-regional-c3eca-f958c2d7-1bw4 us-central1-c
gke-anne-regional-anne-regional-c3eca-a9469860-3jwz us-central1-f
...
Large busybox zone affinity deployment
This large busybox pod will use Luna bin selection mode and is affine to the us-central1-f zone in a regional cluster.
cat <<EOF | kubectl apply -f -
apiVersion: apps/v1
kind: Deployment
metadata:
  name: busybox-affinity
spec:
  replicas: 1
  selector:
    matchLabels:
      app: busybox-affinity
  template:
    metadata:
      labels:
        app: busybox-affinity
        elotl-luna: "true"
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: topology.kubernetes.io/zone
                operator: In
                values:
                - us-central1-f
      containers:
      - name: busybox-affinity
        image: busybox
        resources:
          requests:
            cpu: 4
            memory: 256Mi
          limits:
            cpu: 6
            memory: 512Mi
        command:
        - sleep
        - "infinity"
EOF
As in the previous examples, you can see which node is running the zone affinity pod and can check that it is in the correct zone.
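When Luna's placeNodeSelector option is set to true, the same zone affinity can also be written as a topology.kubernetes.io/zone nodeSelector. The manifest below is a sketch of that form for a small pod; the Deployment name busybox-zone-selector and the file name are hypothetical, and the manifest is written to a file rather than piped to kubectl so you can review it first.

```shell
#!/usr/bin/env bash
# Sketch: zone affinity expressed as a nodeSelector (honored by Luna only when placeNodeSelector=true).
# Writes the manifest to a file; apply it afterwards with:
#   kubectl apply -f busybox-zone-selector.yaml
cat <<EOF > busybox-zone-selector.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: busybox-zone-selector
spec:
  replicas: 1
  selector:
    matchLabels:
      app: busybox-zone-selector
  template:
    metadata:
      labels:
        app: busybox-zone-selector
        elotl-luna: "true"
    spec:
      nodeSelector:
        topology.kubernetes.io/zone: us-central1-f
      containers:
      - name: busybox-zone-selector
        image: busybox
        resources:
          requests:
            cpu: 200m
            memory: 128Mi
        command:
        - sleep
        - "infinity"
EOF
```

As with the nodeAffinity form, do not combine this with a zone topologySpreadConstraint on the same pod.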