Version: v0.4

EKS

Prerequisites

  1. aws cli
  2. kubectl with the correct context selected, pointing to the cluster you want to deploy Luna on. If the name of the cluster passed to the deploy script doesn’t match the name of the EKS cluster in the kubectl context, the deploy script exits with an error (see the example after this list).
  3. helm: the package manager for Kubernetes
  4. eksctl >= 0.104.0: to manage the EKS OpenID Connect provider.
  5. An existing EKS cluster. If you don't have one, you can create a new one with eksctl: eksctl --region=... create cluster --name=...
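
If you already have a cluster, a quick way to point kubectl at it and confirm the active context matches the cluster name you will pass to the deploy script (region and cluster name are placeholders):

    # Update the kubeconfig for the target cluster and show the active context
    aws eks update-kubeconfig --region <aws-region> --name <eks-cluster-name>
    kubectl config current-context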

Considerations

Fargate

Please note that Luna, along with the associated cert-manager, is not currently supported on Fargate nodes. To keep these services functioning properly, make sure that no pods running in the elotl or cert-manager namespaces are included in any Fargate profiles.

If you would like Luna to leverage Fargate nodes for Luna managed workloads, you can enable this feature by answering 'y' to the question 'Do you wish to use EKS Fargate? [y/n]' when running deploy.sh. Once enabled, Luna will automatically create a new Fargate profile as part of the deployment process.

See the configuration page for how to mark workloads for Fargate.

Spot

Luna running on EKS supports allocating Spot instances for bin selection.

If you would like Luna to consider a Spot instance for your workload, please include the following annotation in your configuration:

    annotations:
      node.elotl.co/instance-offerings: "spot"
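
For example, to add this annotation to an existing Deployment's pod template from the command line (a minimal sketch; the deployment name is a placeholder, and the annotation can equally be set directly in the manifest):

    kubectl patch deployment <deployment-name> --type=merge -p \
      '{"spec":{"template":{"metadata":{"annotations":{"node.elotl.co/instance-offerings":"spot"}}}}}'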

Luna will allocate a Spot instance of the lowest-priced right-sized instance type if one is available; otherwise, it will allocate an on-demand instance.

If a Luna-allocated Spot instance node is terminated, the associated workload will become pending and Luna will again select a node for it.

You can set up an AWS SQS queue to receive Spot interruption messages, which are delivered two minutes before termination, and provide that queue name to Luna via the AWS option spotSqsQueueName. When Luna receives a Spot termination message, it marks the node with "node.elotl.co/spot-event: termination". Nodes with this annotation are targeted in Luna's scale-down selection.
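
A sketch of the AWS-side setup, assuming you route EC2 Spot interruption warnings to the queue with an EventBridge rule (the queue name, rule name, and account ID are placeholders, and the queue's access policy must allow events.amazonaws.com to send messages to it):

    # Create the queue that will receive Spot interruption warnings
    aws sqs create-queue --region <aws-region> --queue-name <spot-queue-name>

    # Forward EC2 Spot interruption warnings to the queue
    aws events put-rule --region <aws-region> --name <rule-name> \
      --event-pattern '{"source":["aws.ec2"],"detail-type":["EC2 Spot Instance Interruption Warning"]}'
    aws events put-targets --region <aws-region> --rule <rule-name> \
      --targets "Id"="spot-sqs","Arn"="arn:aws:sqs:<aws-region>:<account-id>:<spot-queue-name>"

The exact Helm value key under which spotSqsQueueName is set depends on the chart; check the values shipped with your Luna release.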

Step 1 (optional): Install the NVIDIA device plugin for GPU workloads

    kubectl apply -f https://raw.githubusercontent.com/NVIDIA/k8s-device-plugin/v0.12.0/nvidia-device-plugin.yml
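
To confirm the device plugin's pods are running (the label below matches the upstream manifest at the time of writing and may differ in other plugin versions):

    kubectl get pods -n kube-system -l name=nvidia-device-plugin-ds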

Step 2: Deploy Luna

Luna requires cert-manager running in the cluster. The deploy script tries to detect cert-manager in the cluster; if it is not found, the script installs cert-manager into the cert-manager namespace.

    cd luna-vX.Y.Z/
    ./deploy.sh <eks-cluster-name> <aws-region> <additional-helm-values(optional)>
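
If you are unsure whether cert-manager is already installed, a quick check before running the script (assuming a standard installation into the cert-manager namespace):

    kubectl get pods -n cert-manager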

Step 3: Verify Luna

    kubectl get all -n elotl

Sample output

    NAME                                READY   STATUS    RESTARTS   AGE
    pod/luna-manager-5d8578565d-86jwc   1/1     Running   0          56s
    pod/luna-webhook-58b7b5dcfb-dwpcb   1/1     Running   0          56s
    pod/luna-webhook-58b7b5dcfb-xmlds   1/1     Running   0          56s

    NAME                   TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)    AGE
    service/luna-webhook   ClusterIP   x.x.x.x      <none>        8443/TCP   57s

    NAME                           READY   UP-TO-DATE   AVAILABLE   AGE
    deployment.apps/luna-manager   1/1     1            1           57s
    deployment.apps/luna-webhook   2/2     2            2           57s

    NAME                                      DESIRED   CURRENT   READY   AGE
    replicaset.apps/luna-manager-5d8578565d   1         1         1       57s
    replicaset.apps/luna-webhook-58b7b5dcfb   2         2         2       57s
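
If any of these pods are not Running, the Luna manager logs are a good first place to look (a generic troubleshooting step using the deployment name from the output above):

    kubectl logs -n elotl deployment/luna-manager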

Step 4: Testing

Follow our tutorial to understand the value provided by Luna.

Step 5: Verify test pod launch and dynamic worker node addition/removal (while testing)

    kubectl get pods --selector=elotl-luna=true -o wide -w
    kubectl get nodes -w

Cleanup

We recommend deleting all the pods running on the nodes managed by the Luna manager; otherwise there may be orphan nodes left behind once you uninstall Luna. The uninstall.sh script won’t remove orphan nodes, to prevent accidentally knocking out critical workloads. These orphan nodes can be easily cleaned up, as described below.
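
One way to evict the pods from a node before uninstalling (the node name is a placeholder; this is a standard kubectl operation, not Luna-specific):

    kubectl drain <node-name> --ignore-daemonsets --delete-emptydir-data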

To remove the Luna manager’s Helm chart and the custom AWS resources created to run Luna, execute the uninstall script:

    ./uninstall.sh <cluster-name> <region>

This will not remove the leftover nodes that the Luna manager hasn’t scaled down yet. To get the list of orphan nodes’ instance IDs, run the following command, replacing <eks-cluster-name> with the name of the cluster:

    aws ec2 describe-instances \
      --filters Name=tag:elotl.co/nodeless-cluster/name/<eks-cluster-name>,Values=owned \
      --query "Reservations[*].Instances[*].[InstanceId]" \
      --output text

To ensure all the nodes managed by the Luna manager are deleted, execute the following command, replacing <eks-cluster-name> with the name of the cluster:

    aws ec2 terminate-instances --instance-ids \
      $(aws ec2 describe-instances \
        --filters Name=tag:elotl.co/nodeless-cluster/name/<eks-cluster-name>,Values=owned \
        --query "Reservations[*].Instances[*].[InstanceId]" \
        --output text)

Note that all the pods running on these nodes will be forcefully terminated.

To delete the Luna manager and the webhook from the cluster while preserving the AWS resources, execute the following:

    helm uninstall elotl-luna --namespace=elotl
    kubectl delete namespace elotl

If you decide to uninstall the Helm chart instead of running uninstall.sh, please ensure that all the orphan nodes have been cleaned up as described above.

Notes

Security Groups

Security Groups act as virtual firewalls for EC2 instances, controlling incoming and outgoing traffic. If a node is missing a security group rule, it can affect Luna’s ability to attach the node to the EKS cluster or prevent the node from running pods and services.

To ensure that all the security groups required by EKS are applied to Luna-managed nodes, we tag security groups with the key elotl.co/nodeless-cluster/name and the cluster name as the value. When it starts, Luna queries which security groups are needed and adds them to the nodes.

When Luna is deployed, the default EKS security groups are automatically tagged. If you wish to tag another security group, you can use the AWS CLI to add the tag to the security group:

    aws --region=<region> \
      ec2 create-tags \
      --resources <security group id> \
      --tags "Key=elotl.co/nodeless-cluster/name,Value=<cluster_name>"
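
To see which security groups currently carry the tag for your cluster (a read-only check using the same tag key):

    aws --region=<region> ec2 describe-security-groups \
      --filters "Name=tag:elotl.co/nodeless-cluster/name,Values=<cluster_name>" \
      --query "SecurityGroups[*].[GroupId,GroupName]" --output table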

Once tagged, you must restart the Luna manager pod for Luna to assign the new security group to newly provisioned nodes. Note that existing Luna-managed nodes will not have their security groups updated; they will have to be replaced to pick up the new security group assignment.
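
One way to restart the manager, using the deployment name shown in the verification output earlier:

    kubectl rollout restart deployment/luna-manager -n elotl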