Version: v0.4

GKE

Luna GKE Testing Tutorials

A few basic test examples can be used to validate the functionality and operation of Luna.

ML Use Cases with NVIDIA GPU

First, check whether the NVIDIA T4 GPU is available in your cluster's zone:

gcloud compute accelerator-types list --filter="zone:( $COMPUTE_REGION )" | grep t4
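The same check can be wrapped in a small function so that a script stops early when no T4 is offered. This is only a sketch: the function name has_t4 is hypothetical, and it assumes COMPUTE_REGION is already set to the value used in the filter above.

```shell
# Hypothetical helper (not part of Luna): succeeds when at least one
# accelerator type whose name contains "t4" is listed for $COMPUTE_REGION.
has_t4() {
  gcloud compute accelerator-types list \
    --filter="zone:( ${COMPUTE_REGION} )" \
    --format="value(name)" | grep t4 >/dev/null
}
```

For example, has_t4 || echo "No T4 GPUs listed for ${COMPUTE_REGION}" prints a warning when the check fails.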

By default, Luna autoscales pods that carry the label elotl-luna=true. Apply the following manifest with kubectl apply -f. The pod will run to completion on an n1 instance, which has an NVIDIA T4 GPU attached.

Upon completion of the pod, the corresponding node will be automatically terminated.

cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: gpu-test
  annotations:
    node.elotl.co/instance-gpu-skus: "t4"
  labels:
    elotl-luna: "true"
spec:
  restartPolicy: OnFailure
  containers:
    - name: cuda-vector-add
      image: "registry.k8s.io/cuda-vector-add:v0.1"
      resources:
        limits:
          nvidia.com/gpu: 1
EOF

You can monitor the creation of the new pod and node by running the command watch kubectl get pods,nodes. The test pod will only be active for a brief period after it starts. The GPU node that was added to support the pod will persist for a few more minutes.
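As an alternative to watching interactively, a script can block until the pod finishes and then print its logs. The sketch below assumes kubectl v1.23 or later (for --for=jsonpath); the function name wait_for_gpu_test and the 15-minute timeout are illustrative choices, not values from Luna.

```shell
# Hypothetical helper: wait until the pod reaches phase Succeeded,
# then print its logs. The pod name defaults to gpu-test.
wait_for_gpu_test() {
  kubectl wait --for=jsonpath='{.status.phase}'=Succeeded \
    "pod/${1:-gpu-test}" --timeout=15m \
    && kubectl logs "${1:-gpu-test}"
}
```

Calling wait_for_gpu_test with no arguments targets the gpu-test pod created above. The timeout needs to cover node provisioning as well as the pod's run time, so adjust it for your environment.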

To confirm that the node has a GPU, run kubectl describe node and look for the "nvidia.com/gpu" entry. Alternatively, run the following command:

kubectl get nodes "-o=custom-columns=NAME:.metadata.name,KUBELET STATUS:.status.conditions[3].reason,CREATED:.metadata.creationTimestamp,VERSION:.status.nodeInfo.kubeletVersion,NVIDIA GPU(s):.status.allocatable.nvidia\.com/gpu"

General Testing (non-ML)

Luna attempts to consolidate multiple smaller pods onto a newly deployed node. For larger pods, it allocates a dedicated node for each pod, which is the bin-selection packing mode.

You can run simple tests using busybox or other pods of varying sizes. Apply the following manifests and adjust the number of pods to observe how Luna responds.

Small busybox deployment

Several busybox pods will be co-located on a single node.

busybox

cat <<EOF | kubectl apply -f -
apiVersion: apps/v1
kind: Deployment
metadata:
  name: busybox
spec:
  replicas: 6
  selector:
    matchLabels:
      app: busybox
  template:
    metadata:
      labels:
        app: busybox
        elotl-luna: "true"
    spec:
      containers:
        - name: busybox
          image: busybox
          resources:
            requests:
              cpu: 200m
              memory: 128Mi
            limits:
              cpu: 300m
              memory: 256Mi
          command:
            - sleep
            - "3600"
EOF
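Once the deployment above exists, its replica count can be changed to see how Luna reacts to more or fewer small pods. The helper below is a minimal sketch: the name scale_busybox and the 10-minute timeout are assumptions made for illustration.

```shell
# Hypothetical helper: set the busybox replica count, then wait for the
# rollout to settle (which may include time for Luna to add a node).
scale_busybox() {
  kubectl scale deployment/busybox --replicas="$1" \
    && kubectl rollout status deployment/busybox --timeout=10m
}
```

For example, scale_busybox 12 doubles the deployment; scaling back down with scale_busybox 2 lets you watch whether nodes are later removed.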

Larger busybox deployment

A single busybox pod will hit the threshold for bin-selection and be placed on its own node.

busybox-large

cat <<EOF | kubectl apply -f -
apiVersion: apps/v1
kind: Deployment
metadata:
  name: busybox-large
spec:
  replicas: 1
  selector:
    matchLabels:
      app: busybox-large
  template:
    metadata:
      labels:
        app: busybox-large
        elotl-luna: "true"
    spec:
      containers:
        - name: busybox-large
          image: busybox
          resources:
            requests:
              cpu: 4
              memory: 256Mi
            limits:
              cpu: 6
              memory: 512Mi
          command:
            - sleep
            - "3600"
EOF

Check that the pods have started. The -o wide option shows which nodes the pods are running on:

kubectl get pods -l elotl-luna=true -o wide

Sample output

NAME                             READY   STATUS    RESTARTS   AGE     IP          NODE                                           NOMINATED NODE   READINESS GATES
busybox-65cb45c86b-26dqd         1/1     Running   0          2m44s   10.68.0.5   gke-test-cluster-c2d-highcpu-4-2b4dd3cf-4g2q   <none>           <none>
busybox-65cb45c86b-7jcft         1/1     Running   0          2m44s   10.68.0.4   gke-test-cluster-c2d-highcpu-4-2b4dd3cf-4g2q   <none>           <none>
busybox-65cb45c86b-cxrj4         1/1     Running   0          2m44s   10.68.0.2   gke-test-cluster-c2d-highcpu-4-2b4dd3cf-4g2q   <none>           <none>
busybox-65cb45c86b-mmxjv         1/1     Running   0          2m44s   10.68.0.7   gke-test-cluster-c2d-highcpu-4-2b4dd3cf-4g2q   <none>           <none>
busybox-65cb45c86b-mq8vj         1/1     Running   0          2m44s   10.68.0.6   gke-test-cluster-c2d-highcpu-4-2b4dd3cf-4g2q   <none>           <none>
busybox-65cb45c86b-q44x6         1/1     Running   0          2m44s   10.68.0.3   gke-test-cluster-c2d-highcpu-4-2b4dd3cf-4g2q   <none>           <none>
busybox-large-789c68dc46-s9rbd   1/1     Running   0          2m26s   10.68.3.2   gke-test-cluster-pff11dc99-92d3eef6-shfm       <none>           <none>
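To summarize the placement rather than read it row by row, the pods can be counted per node. This is a sketch using standard kubectl output options; the function name pods_per_node is hypothetical.

```shell
# Hypothetical helper: print "<pod count> <node name>" for every node that
# runs pods labeled elotl-luna=true, busiest node first.
pods_per_node() {
  kubectl get pods -l elotl-luna=true --no-headers \
    -o custom-columns=NODE:.spec.nodeName \
    | sort | uniq -c | sort -rn
}
```

For the sample output above, this would report six pods on the c2d-highcpu-4 node and one pod on the node hosting busybox-large.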

Next, we can verify the node information to confirm which instance types were selected and added to the Kubernetes cluster by Luna.

kubectl get nodes "-o=custom-columns=NAME:.metadata.name,KUBELET STATUS:.status.conditions[3].reason,CREATED:.metadata.creationTimestamp,VERSION:.status.nodeInfo.kubeletVersion,INSTANCE TYPE:.metadata.labels.node\.kubernetes\.io/instance-type,CPU(S):.status.capacity.cpu,MEMORY:.status.capacity.memory" --sort-by=metadata.creationTimestamp

Sample output

NAME                                               KUBELET STATUS             CREATED                VERSION            INSTANCE TYPE   CPU(S)   MEMORY
gke-justin-luna-043b-default-pool-567273d3-1odc    NoCorruptDockerOverlay2    2023-03-10T19:35:39Z   v1.24.9-gke.3200   e2-medium       2        4025892Ki
gke-justin-luna-043b-c2d-highcpu-4-2b4dd3cf-4g2q   NoFrequentKubeletRestart   2023-03-10T19:45:00Z   v1.24.9-gke.3200   c2d-highcpu-4   4        8148012Ki
gke-justin-luna-043b-pff11dc99-92d3eef6-shfm       NoFrequentDockerRestart    2023-03-10T19:46:04Z   v1.24.9-gke.3200   n1-highcpu-8    8        7320632Ki
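When testing is finished, the test workloads can be removed. The sketch below assumes the resource names used in this tutorial; the function name cleanup_luna_tests is hypothetical. It also assumes that Luna removes the nodes it added once they no longer host pods, as described for the GPU pod earlier.

```shell
# Hypothetical helper: delete the workloads created in this tutorial.
# --ignore-not-found keeps the command from failing if a resource was
# never created or has already been removed.
cleanup_luna_tests() {
  kubectl delete deployment busybox busybox-large --ignore-not-found
  kubectl delete pod gpu-test --ignore-not-found
}
```

After running cleanup_luna_tests, watch kubectl get nodes can be used to observe whether the added nodes are removed.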
