Version: v0.8.0

Highly Available Installation

Overview

Purpose

This guide provides step-by-step instructions for installing Nova, a control plane and agent system designed to manage multiple Kubernetes clusters. By following this guide, you will set up the Nova Control Plane on a hosting Kubernetes cluster and deploy Nova Agents to workload clusters.

Scope

This guide covers:

  • Prerequisites: Requirements before installing Nova.
  • Installing novactl: How to download and set up the Nova CLI.
  • Deploying Nova: Instructions for deploying the Nova Control Plane and Agents.
  • Post-Installation Checks: Verifying the installation.
  • Uninstalling Nova: Steps to remove Nova if needed.

Key Concepts

  • Nova Control Plane: The central management unit running on a hosting Kubernetes cluster.
  • Nova Agent: The component deployed to each workload cluster for management.
  • novactl: The command-line interface (CLI) for installing, uninstalling and checking the status of a Nova deployment.
  • Workload Cluster: A Kubernetes cluster managed by the Nova Control Plane.
  • Hosting Cluster: A Kubernetes cluster where the Nova Control Plane runs.

Prerequisites

  1. At least 2 Kubernetes clusters up and running. One cluster will be the hosting cluster where the Nova Control Plane runs. The other clusters will be workload clusters managed by the Nova Control Plane.
  2. Installed and configured kubectl
  3. Nova cannot be deployed to an Autopilot GKE cluster. Please validate that you are deploying to a non-Autopilot cluster.
  4. The cluster hosting the Nova Control Plane MUST have a storage provisioner and a default StorageClass configured. The Nova Control Plane uses etcd as a backing store, which runs as a StatefulSet and requires a PersistentVolume to work.
  5. The cluster hosting the Nova Control Plane MUST have an ingress controller configured. The Nova API Server is exposed as a LoadBalancer-type Kubernetes Service and needs an IP address or domain reachable from the Nova Agents in the workload clusters, as well as from human users interacting with the Nova Control Plane.

Kubernetes compatibility

Nova Version   Kubernetes Versions Supported
v0.7           v1.25, v1.26, v1.27, v1.28
v0.6           v1.24, v1.25

Installation steps

  1. Download and install novactl
  2. Prepare hosting cluster
  3. Create API Server load balancer service
  4. Create certificates for Nova Control Plane
  5. Install Control Plane Components
  6. Get Nova Control Plane kubeconfig
  7. Install Nova CRDs
  8. Verify your installation
  9. Connect workload clusters

Download novactl

novactl is our CLI that allows you to easily create new Nova Control Planes, register new Nova Workload Clusters, check the health of your Nova cluster, and more!

If you don't have the release tarball, download the latest novactl version for your OS by running:

curl -s https://api.github.com/repos/elotl/novactl/releases/latest | \
jq -r '.assets[].browser_download_url' | \
grep "$(uname -s | tr '[:upper:]' '[:lower:]')-$(uname -m | sed 's/x86_64/amd64/;s/i386/386/;s/aarch64/arm64/')" | \
xargs -I {} curl -L {} -o novactl
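The grep filter in the pipeline above builds an OS-architecture suffix from the uname output. You can print the suffix it will match on your machine to sanity-check the download:

```shell
# Print the release-asset suffix the download pipeline matches,
# e.g. "linux-amd64" or "darwin-arm64" (same uname/sed normalization).
os="$(uname -s | tr '[:upper:]' '[:lower:]')"
arch="$(uname -m | sed 's/x86_64/amd64/;s/i386/386/;s/aarch64/arm64/')"
echo "${os}-${arch}"
```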

Install novactl

Make the binary executable

Once you have the binary, run:

chmod +x novactl*

Place the binary in your PATH

The following example installs the binary in /usr/local/bin on Unix-like operating systems:

sudo mv novactl* /usr/local/bin/novactl

If you accidentally downloaded more than one novactl binary, move only the binary that matches your machine's OS and architecture to /usr/local/bin.

Install it as kubectl plugin

novactl is ready to work as a kubectl plugin, and our docs assume you're using it that way. To make this work, simply run:

sudo novactl kubectl-install

And test if it works:

kubectl nova --version
kubectl-nova version v0.7.1 (git: a97586b5) built: 20231102171119

Upgrading novactl

If you want to upgrade novactl to the latest version, run all the previous steps again, starting from Download novactl up to this point. This will automatically download the latest version and replace your local binary.

Preparing hosting cluster

The cluster hosting the Nova Control Plane MUST have a storage provisioner and a default StorageClass configured. The Nova Control Plane uses etcd as a backing store, which runs as a StatefulSet and requires a PersistentVolume to work. The cluster MUST also have an ingress controller configured. The Nova API Server is exposed as a LoadBalancer-type Kubernetes Service and needs an IP address or domain reachable from the Nova Agents in the workload clusters, as well as from human users interacting with the Nova Control Plane.

Creating API Server

By default, we will install everything in the elotl namespace; you can modify this if needed. Please export the name of the hosting cluster kube context.

export INSTALL_NAMESPACE=elotl
export HOSTING_CLUSTER_CONTEXT=kind-cp
kubectl --context=${HOSTING_CLUSTER_CONTEXT} create namespace ${INSTALL_NAMESPACE}
kubectl --context=${HOSTING_CLUSTER_CONTEXT} create -f install/base/control-plane/apiserver.yaml -n ${INSTALL_NAMESPACE}

Then, we need to wait for an external IP to be allocated. If it's still Pending after a few minutes, check whether your cluster has a working ingress controller, as mentioned in the hosting cluster Prerequisites.

kubectl --context=${HOSTING_CLUSTER_CONTEXT} wait -n ${INSTALL_NAMESPACE} service/apiserver --for=jsonpath='{.status.loadBalancer.ingress[0].ip}' --timeout=360s

If this command fails, it's very likely that the API Server was allocated a hostname instead of an external IP. You can check this using the following command:

kubectl --context=${HOSTING_CLUSTER_CONTEXT} wait -n ${INSTALL_NAMESPACE} service/apiserver --for=jsonpath='{.status.loadBalancer.ingress[0].hostname}' --timeout=360s
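If you are unsure which of the two fields was populated, you can print both in one call; whichever was allocated will be non-empty. This is a convenience sketch using the same jsonpath fields as the wait commands above:

```shell
# Prints the external IP and/or hostname of the apiserver Service;
# normally exactly one of the two fields is populated.
kubectl --context=${HOSTING_CLUSTER_CONTEXT} get -n ${INSTALL_NAMESPACE} service/apiserver \
  -o jsonpath='{.status.loadBalancer.ingress[0].ip}{" "}{.status.loadBalancer.ingress[0].hostname}{"\n"}'
```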

Configure a domain name record for the API Server Service

We strongly recommend configuring a domain name record, or at least an external static IP address, for this Service.

This API Server Service will be the entrypoint to the Nova Control Plane for workload clusters as well as human users.

Understand the impact before deleting this Service

Make sure that you know what you are doing before deleting this Service. Without API Server being exposed to the workload clusters and users, Nova Control Plane is not usable. You should delete it only if you don't intend to use Nova anymore.

Generating certificates for Nova Control Plane

The Nova Control Plane runs components similar to a regular Kubernetes cluster's, e.g. apiserver, kube-controller-manager, nova-scheduler, and a key-value store. To secure communication between components, the Nova Control Plane needs a set of certificates, similar to the kubeadm init certs phase. These certificates are mounted from Kubernetes Secrets into the control plane components.

We provide a kubectl nova subcommand to generate the certificates and the Secret manifests in the correct format. We will provide instructions on how to do this using kubeadm in the future.

For now, you can use the install certs subcommand. The command takes the API Server IP as input, along with the namespace where the API Server will be installed (default is elotl).

apiserver_ip=$(kubectl --context=${HOSTING_CLUSTER_CONTEXT} get -n ${INSTALL_NAMESPACE} service/apiserver -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
kubectl nova --context=${HOSTING_CLUSTER_CONTEXT} install certs --apiserver-public-endpoint="${apiserver_ip}" --namespace=${INSTALL_NAMESPACE} --nova-node-ip="${apiserver_ip}" > "${PWD}/nova_certificates.yaml"

If this command fails, it's likely that your API Server Service was exposed with a hostname, not an external IP. In that case, use this command instead:

apiserver_ip=$(kubectl --context=${HOSTING_CLUSTER_CONTEXT} get -n ${INSTALL_NAMESPACE} service/apiserver -o jsonpath='{.status.loadBalancer.ingress[0].hostname}')
test -n "${apiserver_ip}"
kubectl nova --context=${HOSTING_CLUSTER_CONTEXT} install certs --apiserver-public-endpoint="${apiserver_ip}" --namespace=${INSTALL_NAMESPACE} --nova-node-ip="${apiserver_ip}" > "${PWD}/nova_certificates.yaml"

Then, we can create those Secrets in the hosting cluster:

kubectl --context=${HOSTING_CLUSTER_CONTEXT} create -f "${PWD}/nova_certificates.yaml"

By default, these certificates expire 10 years from generation. They can be rotated by re-generating the Secrets, applying them to the hosting cluster, and restarting the Nova Control Plane components.
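That rotation flow can be sketched as follows. This is a sketch, not a tested procedure: it reuses the install certs invocation from above, assumes the component labels used in the wait commands later in this guide (app=etcd, component=apiserver, and so on), and assumes that deleting the pods so the StatefulSet/Deployment recreates them is an acceptable restart:

```shell
# Re-generate the certificate Secrets and apply them over the old ones.
kubectl nova --context=${HOSTING_CLUSTER_CONTEXT} install certs \
  --apiserver-public-endpoint="${apiserver_ip}" \
  --namespace=${INSTALL_NAMESPACE} \
  --nova-node-ip="${apiserver_ip}" > "${PWD}/nova_certificates.yaml"
kubectl --context=${HOSTING_CLUSTER_CONTEXT} apply -f "${PWD}/nova_certificates.yaml"

# Restart control plane components so they pick up the new Secrets.
for selector in app=etcd component=apiserver component=controller-manager component=nova-scheduler; do
  kubectl --context=${HOSTING_CLUSTER_CONTEXT} delete pod -n ${INSTALL_NAMESPACE} -l "$selector"
done
```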

Install Control Plane Components

We will use a kustomize overlay, which sets the replica count for each component to 3 and adds pod anti-affinity to ensure that each component's pods are distributed across 3 nodes. We will use topology.kubernetes.io/zone as the topologyKey in the podAntiAffinity. If you want to use another node label as the topology key, edit the files in install/overlays/cp-ha.

kubectl --context=${HOSTING_CLUSTER_CONTEXT} apply -k install/overlays/cp-ha -n ${INSTALL_NAMESPACE}

It might take a while, but eventually apiserver, kube-controller-manager, nova-scheduler and etcd should get ready and available:

kubectl --context=${HOSTING_CLUSTER_CONTEXT} wait pod --for=jsonpath='{.status.phase}'=Running -n ${INSTALL_NAMESPACE} -l app=etcd --timeout=360s
kubectl --context=${HOSTING_CLUSTER_CONTEXT} wait pod --for=jsonpath='{.status.phase}'=Running -n ${INSTALL_NAMESPACE} -l component=apiserver --timeout=180s
kubectl --context=${HOSTING_CLUSTER_CONTEXT} wait pod --for=jsonpath='{.status.phase}'=Running -n ${INSTALL_NAMESPACE} -l component=controller-manager --timeout=180s
kubectl --context=${HOSTING_CLUSTER_CONTEXT} wait pod --for=jsonpath='{.status.phase}'=Running -n ${INSTALL_NAMESPACE} -l component=nova-scheduler --timeout=180s

If all these conditions are met, we can proceed with Control Plane configuration.

Troubleshooting error: timed out waiting for the condition on pods/etcd-0

etcd pods will not start if a storage provisioner is not configured on the hosting cluster (as mentioned in the Preparing hosting cluster section above). You can also take a look at the troubleshooting section for this issue.
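To confirm whether storage is the culprit, you can inspect the StorageClasses and the PersistentVolumeClaims in the install namespace; a Pending claim with provisioning events points at the provisioner. A sketch:

```shell
# List StorageClasses; exactly one should be marked "(default)".
kubectl --context=${HOSTING_CLUSTER_CONTEXT} get storageclass

# Check the PVCs and any provisioning-related events in the namespace.
kubectl --context=${HOSTING_CLUSTER_CONTEXT} get pvc -n ${INSTALL_NAMESPACE}
kubectl --context=${HOSTING_CLUSTER_CONTEXT} get events -n ${INSTALL_NAMESPACE} \
  --field-selector involvedObject.kind=PersistentVolumeClaim
```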

Get Nova KubeConfig

To get the Nova kubeconfig, we need to be sure that the Nova API Server endpoint is responding. The Nova Control Plane will create a "NovaAPIServerEndpointReady" event once it's ready.

kubectl --context=${HOSTING_CLUSTER_CONTEXT} wait -n ${INSTALL_NAMESPACE} '--for=jsonpath={.reason}'=NovaAPIServerEndpointReady event/NovaAPIServerEndpointReady --timeout=360s

We will use novactl as a kubectl plugin to generate the kubeconfig.

kubectl nova --context=${HOSTING_CLUSTER_CONTEXT} get kubeconfig -n ${INSTALL_NAMESPACE} > "${PWD}/nova_kubeconfig.yaml"
cat "${PWD}/nova_kubeconfig.yaml"

You can examine the contents of the file. It should look similar to:

apiVersion: v1
clusters:
- cluster:
    certificate-authority-data: "LS0tLS1...LS0K"
    server: "https://172...200"
  name: nova
contexts:
- context:
    cluster: nova
    user: nova-admin
  name: nova
current-context: nova
kind: Config
preferences: {}
users:
- name: nova-admin
  user:
    client-certificate-data: "LS0tLS1CRUd...LS0tLQo="
    client-key-data: "LS0tLS1CRUdJ...tLS0tLQo="

Install Nova CRDs

Once we can talk to the Nova API Server, we can install the Nova CRDs. These are needed for Nova to function properly.

export KUBECONFIG=${PWD}/nova_kubeconfig.yaml:${HOME}/.kube/config:${KUBECONFIG}
kubectl config get-contexts
kubectl --context=nova create -f install/base/control-plane/nova_crds.yaml

The Nova Control Plane runs a set of checks at startup; once they all pass, it creates a NovaControlPlaneReady event. You can use kubectl to wait for this event:

kubectl --context=${HOSTING_CLUSTER_CONTEXT} wait -n ${INSTALL_NAMESPACE} '--for=jsonpath={.reason}'=NovaControlPlaneReady event/NovaControlPlaneReady --timeout=540s

At this point, we have Nova Control Plane up and running. To verify it, we can check the status of the Nova Control Plane:

kubectl nova --context=nova --hosting-cluster-context=${HOSTING_CLUSTER_CONTEXT} --hosting-cluster-nova-namespace=${INSTALL_NAMESPACE} status
Checking status of Nova Control Plane Components

* API Server status... Running √
* Kube-controller-manager status... Running √
* ETCD status... Running √
* Nova scheduler status... Running √
Nova Control Plane is healthy √

Checking presence of Nova Custom Resource Definitions

* Cluster CRD presence... installed √
0 workload clusters connected ‼
please connect at least one Cluster, otherwise Nova does not have a target cluster to run your workloads. Connecting clusters can be done by running novactl install agent <cluster-name> in correct Kube context.
* SchedulePolicy CRD presence... installed √
* 0 SchedulePolicies defined ‼
please create at least one SchedulePolicy, otherwise Nova does not know where to run your workloads. SchedulePolicy spec: https://docs.elotl.co/nova/intro
* ScheduleGroup CRD presence... installed √
All Nova Custom Resource Definitions installed √

To schedule any workloads via Nova we need to connect workload clusters. Let's do it.

Connect workload clusters

Install nova agent into workload cluster

Each workload cluster needs a Nova Agent, which is deployed by default to the elotl namespace. Before deploying the Nova Agent, ensure that Nova's init-kubeconfig is present in the elotl namespace. The init-kubeconfig provides a kubeconfig for the Nova Control Plane; the Nova Agent in the workload cluster uses it to connect and register itself as a workload cluster in the Nova Control Plane.

export INSTALL_NAMESPACE=elotl

Let's create the namespace first:

kubectl --context=kind-workload-1 create namespace ${INSTALL_NAMESPACE}

and copy init-kubeconfig from Nova Control Plane to workload cluster:

kubectl --context=nova get secret -n ${INSTALL_NAMESPACE} nova-cluster-init-kubeconfig -o yaml | kubectl --context=kind-workload-1 apply -f -

To connect a workload cluster to Nova, we will use kubectl apply with a kustomize overlay. By default, we ship overlays for two workload clusters, named kind-workload-1 and kind-workload-2, in install/overlays/. To name your workload cluster differently, modify install/overlays/workload-cluster-1/nova_agent.yaml or install/overlays/workload-cluster-2/nova_agent.yaml (or copy the entire directory and rename it): in nova_agent.yaml, find the --cluster-name=kind-workload-1 (or --cluster-name=kind-workload-2) line and replace the name with your desired workload cluster name.
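For example, the rename can be done with sed, in the same style as the kustomization edit below; the target name my-cluster here is just a placeholder:

```shell
# Replace the agent's --cluster-name flag value in the first overlay.
sed 's/--cluster-name=kind-workload-1/--cluster-name=my-cluster/' \
  install/overlays/workload-cluster-1/nova_agent.yaml > temp_file \
  && mv temp_file install/overlays/workload-cluster-1/nova_agent.yaml
```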

Open install/base/agent/kustomization.yaml in a text editor and set namespace: to the namespace you chose and exported as $INSTALL_NAMESPACE. You can also do it using sed:

sed "s/namespace: elotl/namespace: ${INSTALL_NAMESPACE}/" install/base/agent/kustomization.yaml > temp_file && mv temp_file install/base/agent/kustomization.yaml

The next step is to create the agent in the workload cluster context:

kubectl --context=kind-workload-1 apply -k install/overlays/workload-cluster-1

Now let's check if that worked! Simply run:

kubectl get --context=nova clusters

Remember to update the path to your Nova Control Plane kubeconfig.

NAME                    K8S-VERSION   K8S-CLUSTER   REGION   ZONE   READY   IDLE   STANDBY
kind-workload-1         1.25          workload-1                    True    True    False

What if I don't see my workload cluster listed?

If the agent install finished without issues and your cluster is not showing up in the Nova Control Plane, something went wrong during the agent registration process. Run the following command to get the agent logs:

kubectl logs --context=kind-workload-1 -n elotl deployment/nova-agent

And start debugging from there!

Install other workload clusters

If you have a second cluster, run the same commands with a different cluster context and cluster name, e.g.:

kubectl --context=kind-workload-2 create namespace ${INSTALL_NAMESPACE}
kubectl --context=nova get secret -n ${INSTALL_NAMESPACE} nova-cluster-init-kubeconfig -o yaml | kubectl --context=kind-workload-2 apply -f -
kubectl --context=kind-workload-2 apply -k install/overlays/workload-cluster-2

Verify your installation

You can use the novactl status subcommand to examine the state of the Nova Control Plane:

kubectl nova --context=nova --hosting-cluster-context=${HOSTING_CLUSTER_CONTEXT} --hosting-cluster-nova-namespace=elotl status
Checking status of Nova Control Plane Components

* API Server status... Running √
* Kube-controller-manager status... Running √
* ETCD status... Running √
* Nova scheduler status... Running √
Nova Control Plane is healthy √

Checking presence of Nova Custom Resource Definitions

* Cluster CRD presence... installed √
* Cluster kind-workload-1 connected and ready √
* Cluster kind-workload-2 connected and ready √
* SchedulePolicy CRD presence... installed √
* 0 SchedulePolicies defined ‼
please create at least one SchedulePolicy, otherwise Nova does not know where to run your workloads. SchedulePolicy spec: https://docs.elotl.co/nova/intro
* ScheduleGroup CRD presence... installed √
All Nova Custom Resource Definitions installed √

Uninstalling Nova

Uninstalling Nova Agent

Uninstalling the Nova agent from the workload cluster is as simple as deleting the agent resources we created in the installation steps:

kubectl --context=kind-workload-1 delete -k install/overlays/workload-cluster-1
kubectl --context=kind-workload-1 delete secret -n ${INSTALL_NAMESPACE} nova-cluster-init-kubeconfig
kubectl --context=kind-workload-1 delete ns ${INSTALL_NAMESPACE}
kubectl --context=kind-workload-2 delete -k install/overlays/workload-cluster-2
kubectl --context=kind-workload-2 delete secret -n ${INSTALL_NAMESPACE} nova-cluster-init-kubeconfig
kubectl --context=kind-workload-2 delete ns ${INSTALL_NAMESPACE}

Uninstalling Nova Control Plane

Uninstalling the Nova Control Plane from the hosting cluster is as simple as deleting the control plane resources we created in the installation steps:

kubectl --context=${HOSTING_CLUSTER_CONTEXT} delete -k install/base/control-plane/

Removing Nova API Server Service

Understand the impact before deleting this Service

Make sure that you know what you are doing before deleting this Service. Without API Server being exposed to the workload clusters and users, Nova Control Plane is not usable. You should delete it only if you don't intend to use Nova anymore.

kubectl --context=${HOSTING_CLUSTER_CONTEXT} delete -f "${PWD}/nova_certificates.yaml"
kubectl --context=${HOSTING_CLUSTER_CONTEXT} delete -f install/base/control-plane/apiserver.yaml -n ${INSTALL_NAMESPACE}
kubectl --context=${HOSTING_CLUSTER_CONTEXT} delete namespace ${INSTALL_NAMESPACE}