Version: v1.3

Google GKE Tutorials

Luna GKE Testing Tutorials

A few very basic test examples can be used to validate the functionality and operation of Luna.

ML Use-Cases w/ NVIDIA GPU

First, check whether the NVIDIA T4 GPU is supported in your cluster's zone:

gcloud compute accelerator-types list --filter="zone:( $COMPUTE_REGION )" | grep t4

By default, Luna autoscales pods that carry the label elotl-luna=true. Apply the YAML manifest below with kubectl apply -f. You will see a pod run to completion on an n1 instance, which has an NVIDIA T4 GPU attached.

Upon completion of the pod, the corresponding node will be automatically terminated.

cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: gpu-test
  annotations:
    node.elotl.co/instance-gpu-skus: "t4"
  labels:
    elotl-luna: "true"
spec:
  restartPolicy: OnFailure
  containers:
    - name: cuda-vector-add
      image: "nvcr.io/nvidia/k8s/cuda-sample:vectoradd-cuda12.5.0"
      resources:
        limits:
          nvidia.com/gpu: 1
EOF

You can monitor the creation of the new pod and node by running the command watch kubectl get pods,nodes. The test pod will only be active for a brief period after it starts. The GPU node that was added to support the pod will persist for a few more minutes.
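As a scripted alternative to watching, the sketch below blocks until the gpu-test pod reports the Succeeded phase and then prints its logs. It assumes kubectl 1.23 or later (needed for `--for=jsonpath`). The function name and the 10-minute default timeout are illustrative choices, not part of Luna.

```shell
# Wait for the gpu-test pod from the manifest above to finish, then show its logs.
# Assumption: kubectl >= 1.23, which supports `kubectl wait --for=jsonpath=...`.
wait_for_gpu_test() {
  local timeout="${1:-600s}"   # hypothetical default; adding a GPU node can take minutes
  kubectl wait --for=jsonpath='{.status.phase}'=Succeeded pod/gpu-test --timeout="$timeout" \
    && kubectl logs gpu-test
}

# Usage: wait_for_gpu_test 900s
```

If the pod does not succeed within the timeout, kubectl wait exits non-zero and the logs are not printed.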

To confirm the presence of a GPU on the node, run kubectl describe node and look for the "nvidia.com/gpu" entry. Alternatively, run the following command:

kubectl get nodes "-o=custom-columns=NAME:.metadata.name,KUBELET STATUS:.status.conditions[3].reason,CREATED:.metadata.creationTimestamp,VERSION:.status.nodeInfo.kubeletVersion,NVIDIA GPU(s):.status.allocatable.nvidia\.com/gpu"
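If only the GPU nodes are of interest, the same allocatable field can be read with a jsonpath query and filtered. This is a minimal sketch; the function names list_node_gpus and list_gpu_nodes are hypothetical helpers introduced here for illustration.

```shell
# Print "<node-name><TAB><allocatable GPUs>" for every node.
# The GPU field is empty on nodes that do not expose nvidia.com/gpu.
list_node_gpus() {
  kubectl get nodes \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.allocatable.nvidia\.com/gpu}{"\n"}{end}'
}

# Keep only the nodes that report at least one allocatable GPU.
list_gpu_nodes() {
  list_node_gpus | awk -F'\t' '$2 + 0 > 0 {print $1, $2}'
}

# Usage: list_gpu_nodes
```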

General Testing (non-ML)

Luna attempts to consolidate multiple smaller pods onto a newly deployed node (bin packing mode). For larger pods, it allocates a dedicated node to each pod (bin selection mode).

You can perform simple testing using busybox or other pods of varying sizes. The following YAML files can be utilized, and the number of pods can be adjusted to observe Luna's dynamic response.

Small busybox deployment

Several busybox pods will be co-located on a single node.

busybox

cat <<EOF | kubectl apply -f -
apiVersion: apps/v1
kind: Deployment
metadata:
  name: busybox
spec:
  replicas: 6
  selector:
    matchLabels:
      app: busybox
  template:
    metadata:
      labels:
        app: busybox
        elotl-luna: "true"
    spec:
      containers:
        - name: busybox
          image: busybox
          resources:
            requests:
              cpu: 200m
              memory: 128Mi
            limits:
              cpu: 300m
              memory: 256Mi
          command:
            - sleep
            - "infinity"
EOF

Larger busybox deployment

A single busybox pod will hit the threshold for bin selection and be placed on its own node.

busybox-large

cat <<EOF | kubectl apply -f -
apiVersion: apps/v1
kind: Deployment
metadata:
  name: busybox-large
spec:
  replicas: 1
  selector:
    matchLabels:
      app: busybox-large
  template:
    metadata:
      labels:
        app: busybox-large
        elotl-luna: "true"
    spec:
      containers:
        - name: busybox-large
          image: busybox
          resources:
            requests:
              cpu: 4
              memory: 256Mi
            limits:
              cpu: 6
              memory: 512Mi
          command:
            - sleep
            - "infinity"
EOF

Check that the pods have started. The -o wide option shows which nodes the pods are running on:

kubectl get pods -l elotl-luna=true -o wide

Sample output

NAME                             READY   STATUS    RESTARTS   AGE     IP          NODE                                           NOMINATED NODE   READINESS GATES
busybox-65cb45c86b-26dqd         1/1     Running   0          2m44s   10.68.0.5   gke-test-cluster-c2d-highcpu-4-2b4dd3cf-4g2q   <none>           <none>
busybox-65cb45c86b-7jcft         1/1     Running   0          2m44s   10.68.0.4   gke-test-cluster-c2d-highcpu-4-2b4dd3cf-4g2q   <none>           <none>
busybox-65cb45c86b-cxrj4         1/1     Running   0          2m44s   10.68.0.2   gke-test-cluster-c2d-highcpu-4-2b4dd3cf-4g2q   <none>           <none>
busybox-65cb45c86b-mmxjv         1/1     Running   0          2m44s   10.68.0.7   gke-test-cluster-c2d-highcpu-4-2b4dd3cf-4g2q   <none>           <none>
busybox-65cb45c86b-mq8vj         1/1     Running   0          2m44s   10.68.0.6   gke-test-cluster-c2d-highcpu-4-2b4dd3cf-4g2q   <none>           <none>
busybox-65cb45c86b-q44x6         1/1     Running   0          2m44s   10.68.0.3   gke-test-cluster-c2d-highcpu-4-2b4dd3cf-4g2q   <none>           <none>
busybox-large-789c68dc46-s9rbd   1/1     Running   0          2m26s   10.68.3.2   gke-test-cluster-pff11dc99-92d3eef6-shfm       <none>           <none>
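To summarize the placement rather than read it off the NODE column, the pod list can be tallied per node. The sketch below is one way to do that; the function name pods_per_node is illustrative and not part of Luna.

```shell
# Count pods per node for a label selector (defaults to the Luna label used above).
# Pods that are still pending have no node yet and are counted on a blank line.
pods_per_node() {
  local selector="${1:-elotl-luna=true}"
  kubectl get pods -l "$selector" \
    -o jsonpath='{range .items[*]}{.spec.nodeName}{"\n"}{end}' | sort | uniq -c
}

# Usage: pods_per_node
```

For the sample output above, this would report six pods on the c2d-highcpu-4 node and one pod on the node hosting busybox-large.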

Next, we can verify the node information to confirm which instance types were selected and added to the Kubernetes cluster by Luna.

kubectl get nodes "-o=custom-columns=NAME:.metadata.name,KUBELET STATUS:.status.conditions[3].reason,CREATED:.metadata.creationTimestamp,VERSION:.status.nodeInfo.kubeletVersion,INSTANCE TYPE:.metadata.labels.node\.kubernetes\.io/instance-type,CPU(S):.status.capacity.cpu,MEMORY:.status.capacity.memory" --sort-by=metadata.creationTimestamp

Sample output

NAME                                               KUBELET STATUS             CREATED                VERSION            INSTANCE TYPE   CPU(S)   MEMORY
gke-justin-luna-043b-default-pool-567273d3-1odc    NoCorruptDockerOverlay2    2023-03-10T19:35:39Z   v1.24.9-gke.3200   e2-medium       2        4025892Ki
gke-justin-luna-043b-c2d-highcpu-4-2b4dd3cf-4g2q   NoFrequentKubeletRestart   2023-03-10T19:45:00Z   v1.24.9-gke.3200   c2d-highcpu-4   4        8148012Ki
gke-justin-luna-043b-pff11dc99-92d3eef6-shfm       NoFrequentDockerRestart    2023-03-10T19:46:04Z   v1.24.9-gke.3200   n1-highcpu-8    8        7320632Ki
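To observe Luna's dynamic response, as suggested at the start of this section, the replica count of the small busybox deployment can be changed in place. The following is a minimal sketch; the function name scale_busybox is hypothetical, and how many nodes Luna adds or removes depends on your cluster and Luna configuration.

```shell
# Scale the small busybox Deployment and list its pods with their nodes.
scale_busybox() {
  if [ -z "$1" ]; then
    echo "usage: scale_busybox <replica-count>" >&2
    return 1
  fi
  kubectl scale deployment busybox --replicas="$1"
  kubectl get pods -l app=busybox -o wide
}

# Usage: scale_busybox 12   # scale up
#        scale_busybox 2    # scale back down
```

New pods may stay Pending for a short time while a node is added, so rerun the pod listing or use watch to follow the change.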

Zone Affinity and Spread Testing (non-ML)

Luna running on a GKE regional cluster supports kube-scheduler pod placement that includes zone spread or zone affinity. Luna recognizes zone spread expressed in the pod spec topologySpreadConstraints field with topologyKey set to topology.kubernetes.io/zone, and zone affinity expressed in the pod spec nodeAffinity field as a topology.kubernetes.io/zone key matched against a set of zone values; examples of each are given below. If Luna's placeNodeSelector option is set to true, Luna also recognizes zone affinity expressed as a topology.kubernetes.io/zone nodeSelector. If Luna's placeBoundPVC option is set to true, Luna also recognizes zone affinity when the pod has a persistent volume claim bound to a persistent volume with zone affinity. Luna does not currently support both zone spread and zone affinity on the same pod; in that case the Luna webhook reports an error and skips the pod.
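The nodeSelector form of zone affinity mentioned above could look like the following sketch. It assumes Luna's placeNodeSelector option is set to true. The pod name busybox-zone-selector, the helper function name, and the default zone are hypothetical values chosen for illustration.

```shell
# Emit a Pod manifest that pins the pod to one zone through a nodeSelector.
# Assumptions: Luna's placeNodeSelector option is true; the pod name and the
# default zone below are illustrative, not taken from the Luna documentation.
zone_selector_pod_manifest() {
  local zone="${1:-us-central1-f}"
  cat <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: busybox-zone-selector
  labels:
    elotl-luna: "true"
spec:
  nodeSelector:
    topology.kubernetes.io/zone: ${zone}
  containers:
    - name: busybox
      image: busybox
      resources:
        requests:
          cpu: 200m
          memory: 128Mi
      command:
        - sleep
        - "infinity"
EOF
}

# Usage: zone_selector_pod_manifest us-central1-a | kubectl apply -f -
```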

Zone spread is supported on a GKE regional cluster for bin packing if the option gcp.binPackingZoneSpread is set true (default false). When bin packing, Luna creates at least one bin pack node in each of the regional cluster's 3 zones, so that kube-scheduler has visibility into the full set of available zones. Then whenever Luna sees pending bin pack pod(s) with zone spread, it scales up the node count in each of the regional cluster's 3 zones, to ensure that kube-scheduler can find needed zone-specific resources. Luna will scale down any unused nodes once the pending pods have been placed. When Luna sees pending bin pack pod(s) with zone affinity, it scales up the node count in an affine zone. And when Luna sees pending bin pack pod(s) with neither zone spread nor zone affinity, it scales up the node count in a single chosen zone.

Luna bin selection support for zone spread and affinity works similarly. By default (option reuseBinSelectNodes=true) Luna bin selection groups pods for node selection based on factors relevant to choosing a node type, including various pod spec fields and Luna annotations. For a pending pod in a group that includes zone spread, Luna scales up the node count in each of its zones. For a pending pod in a group that includes zone affinity, Luna scales up the node count in an affine zone, and for a pending pod in a group that includes neither zone spread nor zone affinity, it scales up the node count in a single chosen zone.

Small busybox zone spread deployment

export LUNA_LABEL_KEY=elotl-luna
export LUNA_LABEL_VALUE=true

These 6 small busybox pods will use Luna bin packing mode and be spread across the 3 zones in a regional cluster.

busybox-small-zone-spread

cat <<EOF | kubectl apply -f -
apiVersion: apps/v1
kind: Deployment
metadata:
  name: busybox-small-zone-spread
spec:
  replicas: 6
  selector:
    matchLabels:
      app: busybox-small-zone-spread
  template:
    metadata:
      labels:
        app: busybox-small-zone-spread
        ${LUNA_LABEL_KEY}: "${LUNA_LABEL_VALUE}"
    spec:
      topologySpreadConstraints:
      - maxSkew: 1
        topologyKey: "topology.kubernetes.io/zone"
        whenUnsatisfiable: DoNotSchedule
        labelSelector:
          matchLabels:
            app: busybox-small-zone-spread
      containers:
        - name: busybox-small-zone-spread
          image: busybox
          resources:
            requests:
              cpu: 200m
              memory: 128Mi
            limits:
              cpu: 300m
              memory: 256Mi
          command:
            - sleep
            - "infinity"
EOF

You can see the nodes running the zone spread pods:

kubectl get pods -l ${LUNA_LABEL_KEY}=${LUNA_LABEL_VALUE},app=busybox-small-zone-spread -o wide | awk {'print $1" " $7'} | column -t
NAME                                        NODE
busybox-small-zone-spread-7dcfb5466b-2bpdh  gke-anne-regional-c2d-highcpu-4-185e4f7c-bsrv
busybox-small-zone-spread-7dcfb5466b-fv8v9  gke-anne-regional-c2d-highcpu-4-185e4f7c-bsrv
busybox-small-zone-spread-7dcfb5466b-8fdvc  gke-anne-regional-c2d-highcpu-4-3996d8b5-v9pc
busybox-small-zone-spread-7dcfb5466b-8zksv  gke-anne-regional-c2d-highcpu-4-3996d8b5-5zfw
busybox-small-zone-spread-7dcfb5466b-bflfp  gke-anne-regional-c2d-highcpu-4-e2c00890-kgwd
busybox-small-zone-spread-7dcfb5466b-j55zs  gke-anne-regional-c2d-highcpu-4-e2c00890-1m7p

You can also check the zones associated with those nodes to see the spread:

kubectl get nodes -Ltopology.kubernetes.io/zone | awk {'print $1" " $6'} | column -t
NAME                                                 ZONE
gke-anne-regional-c2d-highcpu-4-185e4f7c-bsrv        us-central1-c
gke-anne-regional-c2d-highcpu-4-3996d8b5-v9pc        us-central1-a
gke-anne-regional-c2d-highcpu-4-3996d8b5-5zfw        us-central1-a
gke-anne-regional-c2d-highcpu-4-e2c00890-kgwd        us-central1-f
gke-anne-regional-c2d-highcpu-4-e2c00890-1m7p        us-central1-f
...
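The two listings above can be combined into a single per-zone pod count. This is a sketch under the assumption that every matching pod has already been scheduled; the function name pods_per_zone is hypothetical.

```shell
# Count pods per zone for a label selector by looking up the zone label
# of the node each pod is running on.
pods_per_zone() {
  local selector="$1"
  local node
  for node in $(kubectl get pods -l "$selector" -o jsonpath='{.items[*].spec.nodeName}'); do
    kubectl get node "$node" \
      -o jsonpath='{.metadata.labels.topology\.kubernetes\.io/zone}{"\n"}'
  done | sort | uniq -c
}

# Usage: pods_per_zone app=busybox-small-zone-spread
```

For the sample output above, this would report two pods in each of the three zones, which satisfies the maxSkew: 1 constraint.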

Small busybox zone affinity deployment

These 16 small busybox pods will use Luna bin packing mode and are affine to the us-central1-f zone in a regional cluster.

busybox-small-zone-affinity

cat <<EOF | kubectl apply -f -
apiVersion: apps/v1
kind: Deployment
metadata:
  name: busybox-small-zone-affinity
spec:
  replicas: 16
  selector:
    matchLabels:
      app: busybox-small-zone-affinity
  template:
    metadata:
      labels:
        app: busybox-small-zone-affinity
        ${LUNA_LABEL_KEY}: "${LUNA_LABEL_VALUE}"
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: topology.kubernetes.io/zone
                operator: In
                values:
                - us-central1-f
      containers:
        - name: busybox-small-zone-affinity
          image: busybox
          resources:
            requests:
              cpu: 200m
              memory: 128Mi
            limits:
              cpu: 300m
              memory: 256Mi
          command:
            - sleep
            - "infinity"
EOF

You can see the node running the zone affine pods:

kubectl get pods -l ${LUNA_LABEL_KEY}=${LUNA_LABEL_VALUE},app=busybox-small-zone-affinity -o wide | awk {'print $1" " $7'} | column -t
NAME                                          NODE
busybox-small-zone-affinity-5b77497774-4v7ld  gke-anne-regional-c2d-highcpu-4-e2c00890-kgwd
busybox-small-zone-affinity-5b77497774-5cl8p  gke-anne-regional-c2d-highcpu-4-e2c00890-kgwd
busybox-small-zone-affinity-5b77497774-5r8t4  gke-anne-regional-c2d-highcpu-4-e2c00890-kgwd
busybox-small-zone-affinity-5b77497774-8cdx4  gke-anne-regional-c2d-highcpu-4-e2c00890-kgwd
busybox-small-zone-affinity-5b77497774-bpggv  gke-anne-regional-c2d-highcpu-4-e2c00890-kgwd
busybox-small-zone-affinity-5b77497774-cwnvj  gke-anne-regional-c2d-highcpu-4-e2c00890-kgwd
busybox-small-zone-affinity-5b77497774-d7gl2  gke-anne-regional-c2d-highcpu-4-e2c00890-kgwd
busybox-small-zone-affinity-5b77497774-dsdtw  gke-anne-regional-c2d-highcpu-4-e2c00890-kgwd
busybox-small-zone-affinity-5b77497774-h28sd  gke-anne-regional-c2d-highcpu-4-e2c00890-kgwd
busybox-small-zone-affinity-5b77497774-mf8qs  gke-anne-regional-c2d-highcpu-4-e2c00890-kgwd
busybox-small-zone-affinity-5b77497774-r5lbn  gke-anne-regional-c2d-highcpu-4-e2c00890-kgwd
busybox-small-zone-affinity-5b77497774-r8v2k  gke-anne-regional-c2d-highcpu-4-e2c00890-kgwd
busybox-small-zone-affinity-5b77497774-svrk6  gke-anne-regional-c2d-highcpu-4-e2c00890-kgwd
busybox-small-zone-affinity-5b77497774-tp4xt  gke-anne-regional-c2d-highcpu-4-e2c00890-kgwd
busybox-small-zone-affinity-5b77497774-whv52  gke-anne-regional-c2d-highcpu-4-e2c00890-kgwd
busybox-small-zone-affinity-5b77497774-xpjxj  gke-anne-regional-c2d-highcpu-4-e2c00890-kgwd

You can also check the zone associated with that node to confirm that it is us-central1-f:

kubectl get nodes -Ltopology.kubernetes.io/zone | awk {'print $1" " $6'} | grep gke-anne-regional-c2d-highcpu-4-e2c00890-kgwd
gke-anne-regional-c2d-highcpu-4-e2c00890-kgwd us-central1-f

Large busybox zone spread deployment

These 3 large busybox pods will use Luna bin selection mode and be spread across the 3 zones in a regional cluster.

cat <<EOF | kubectl apply -f -
apiVersion: apps/v1
kind: Deployment
metadata:
  name: busybox-spread
spec:
  replicas: 3
  selector:
    matchLabels:
      app: busybox-spread
  template:
    metadata:
      labels:
        app: busybox-spread
        ${LUNA_LABEL_KEY}: "${LUNA_LABEL_VALUE}"
    spec:
      topologySpreadConstraints:
      - maxSkew: 1
        topologyKey: "topology.kubernetes.io/zone"
        whenUnsatisfiable: DoNotSchedule
        labelSelector:
          matchLabels:
            app: busybox-spread
      containers:
        - name: busybox-spread
          image: busybox
          resources:
            requests:
              cpu: 4
              memory: 256Mi
            limits:
              cpu: 6
              memory: 512Mi
          command:
            - sleep
            - "infinity"
EOF

You can see the nodes running the zone spread pods:

kubectl get pods -l ${LUNA_LABEL_KEY}=${LUNA_LABEL_VALUE},app=busybox-spread -o wide | awk {'print $1" " $7'} | column -t
NAME                             NODE
busybox-spread-56844c7899-677mn  gke-anne-regional-anne-regional-c3eca-99ff7273-k8vg
busybox-spread-56844c7899-j4hs8  gke-anne-regional-anne-regional-c3eca-f958c2d7-1bw4
busybox-spread-56844c7899-ztm4v  gke-anne-regional-anne-regional-c3eca-a9469860-3jwz

You can also check the zones associated with those nodes to see the spread:

kubectl get nodes -Ltopology.kubernetes.io/zone | awk {'print $1" " $6'} | column -t
NAME                                                 ZONE
gke-anne-regional-anne-regional-c3eca-99ff7273-k8vg  us-central1-a
gke-anne-regional-anne-regional-c3eca-f958c2d7-1bw4  us-central1-c
gke-anne-regional-anne-regional-c3eca-a9469860-3jwz  us-central1-f
...

Large busybox zone affinity deployment

This large busybox pod will use Luna bin selection mode and is affine to the us-central1-f zone in a regional cluster.

cat <<EOF | kubectl apply -f -
apiVersion: apps/v1
kind: Deployment
metadata:
  name: busybox-affinity
spec:
  replicas: 1
  selector:
    matchLabels:
      app: busybox-affinity
  template:
    metadata:
      labels:
        app: busybox-affinity
        ${LUNA_LABEL_KEY}: "${LUNA_LABEL_VALUE}"
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: topology.kubernetes.io/zone
                operator: In
                values:
                - us-central1-f
      containers:
        - name: busybox-affinity
          image: busybox
          resources:
            requests:
              cpu: 4
              memory: 256Mi
            limits:
              cpu: 6
              memory: 512Mi
          command:
            - sleep
            - "infinity"
EOF

As in the previous examples, you can see which node is running the zone affinity pod and can check that it is in the correct zone.
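The check described above can also be scripted. The sketch below looks up the node of the first matching pod and compares that node's zone label with an expected zone. The function name check_pod_zone is hypothetical, and the query assumes at least one matching pod exists and has been scheduled.

```shell
# Print the node and zone of the first pod matching a selector and
# return success only if the zone equals the expected value.
check_pod_zone() {
  local selector="$1" expected="$2" node zone
  node="$(kubectl get pods -l "$selector" -o jsonpath='{.items[0].spec.nodeName}')"
  zone="$(kubectl get node "$node" \
    -o jsonpath='{.metadata.labels.topology\.kubernetes\.io/zone}')"
  echo "pod node: ${node}, zone: ${zone}"
  [ "$zone" = "$expected" ]
}

# Usage: check_pod_zone app=busybox-affinity us-central1-f && echo "zone OK"
```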
