Highly Available Installation
Overview
Purpose
This guide provides step-by-step instructions for installing Nova, a control plane and agent system designed to manage multiple Kubernetes clusters. By following this guide, you will set up the Nova Control Plane on a hosting Kubernetes cluster and deploy Nova Agents to workload clusters.
Scope
This guide covers:
- Prerequisites: Requirements before installing Nova.
- Installing novactl: How to download and set up the Nova CLI.
- Deploying Nova: Instructions for deploying the Nova Control Plane and Agents.
- Post-Installation Checks: Verifying the installation.
- Uninstalling Nova: Steps to remove Nova if needed.
Key Concepts
- Nova Control Plane: The central management unit running on a hosting Kubernetes cluster.
- Nova Agent: The component deployed to each workload cluster for management.
- novactl: The command-line interface (CLI) for installing, uninstalling and checking the status of a Nova deployment.
- Workload Cluster: A Kubernetes cluster managed by the Nova Control Plane.
- Hosting Cluster: A Kubernetes cluster where the Nova Control Plane runs.
Prerequisites
- At least 2 Kubernetes clusters up and running. One cluster will be the hosting cluster where the Nova Control Plane runs; the other clusters will be workload clusters managed by the Nova Control Plane.
- kubectl installed and configured.
- Nova cannot be deployed to an Autopilot GKE cluster. Please validate that you are deploying to a non-Autopilot cluster.
- The cluster hosting the Nova Control Plane MUST have a storage provisioner and a default StorageClass configured. The Nova Control Plane uses etcd as a backing store, which runs as a StatefulSet and requires a PersistentVolume to work.
- The cluster hosting the Nova Control Plane MUST have an ingress controller configured. The Nova API Server is exposed as a Kubernetes Service of type LoadBalancer, and it needs an IP address or domain name that is reachable from the Nova Agents in the workload clusters, as well as by human users interacting with the Nova Control Plane.
Kubernetes compatibility
| Nova Version | Kubernetes Versions Supported |
|---|---|
| v0.7 | v1.25, v1.26, v1.27, v1.28 |
| v0.6 | v1.24, v1.25 |
Installation steps
- Download and install novactl
- Prepare the hosting cluster
- Create the API Server load balancer Service
- Create certificates for the Nova Control Plane
- Install Control Plane components
- Get the Nova Control Plane kubeconfig
- Install Nova CRDs
- Connect workload clusters
- Verify your installation
Download novactl
novactl is our CLI that allows you to easily create new Nova Control Planes, register new Nova Workload Clusters, check the health of your Nova cluster, and more!
If you don't have the release tarball, run the following to download the latest novactl version for your OS:
curl -s https://api.github.com/repos/elotl/novactl/releases/latest | \
jq -r '.assets[].browser_download_url' | \
grep "$(uname -s | tr '[:upper:]' '[:lower:]')-$(uname -m | sed 's/x86_64/amd64/;s/i386/386/;s/aarch64/arm64/')" | \
xargs -I {} curl -L {} -o novactl
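The grep pattern in the download command reduces to an <os>-<arch> suffix derived from uname. As a sketch (the function names below are hypothetical helpers, not part of novactl), the same mapping can be factored out so you can preview which release asset URL matches before downloading anything. It assumes the release asset names contain that suffix, as the grep implies.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reproduces the "<os>-<arch>" suffix that the download
# command greps for. Arguments default to the current machine's uname values.
novactl_platform_suffix() {
  local os arch
  os=$(printf '%s' "${1:-$(uname -s)}" | tr '[:upper:]' '[:lower:]')
  arch=$(printf '%s' "${2:-$(uname -m)}" | sed 's/x86_64/amd64/;s/i386/386/;s/aarch64/arm64/')
  printf '%s-%s\n' "${os}" "${arch}"
}

# Hypothetical helper: prints the matching release asset URL(s) without
# downloading them (same GitHub API call and jq filter as above).
preview_novactl_asset() {
  curl -s https://api.github.com/repos/elotl/novactl/releases/latest |
    jq -r '.assets[].browser_download_url' |
    grep "$(novactl_platform_suffix)"
}
```

If preview_novactl_asset prints more than one URL, narrow the grep pattern before running the download command.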
Install novactl
Make the binary executable
Once you have the binary, run:
chmod +x novactl*
Place the binary in your PATH
The following example installs the binary to /usr/local/bin on Unix-like operating systems:
sudo mv novactl* /usr/local/bin/novactl
If you accidentally downloaded more than one novactl binary, move only the binary that corresponds to the OS and architecture of your machine to /usr/local/bin.
Install it as a kubectl plugin
novactl is ready to work as a kubectl plugin, and our docs assume you use it that way. To set this up, run:
sudo novactl kubectl-install
And test if it works:
kubectl nova --version
kubectl-nova version v0.7.1 (git: a97586b5) built: 20231102171119
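kubectl treats any executable on your PATH whose name starts with kubectl- as a plugin, so kubectl nova resolves to an executable named kubectl-nova (the name reported in the version output above). Assuming kubectl-install places such an executable on your PATH, the hypothetical helper below checks that kubectl will be able to find it. kubectl plugin list reports the same information for all installed plugins.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds if an executable named kubectl-<plugin> is on
# PATH, which is how kubectl discovers plugins.
has_kubectl_plugin() {
  local plugin_path
  plugin_path=$(command -v "kubectl-$1") || return 1
  [ -x "${plugin_path}" ]
}

# Example usage:
#   has_kubectl_plugin nova || echo "kubectl-nova not on PATH; re-run: sudo novactl kubectl-install" >&2
```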
Upgrading novactl
To upgrade novactl to the latest version, repeat all the steps from Download novactl up to this point. This downloads the latest version and replaces your local binary with it.
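To check whether an upgrade is needed before repeating the steps, you can compare the installed version with the latest release. This is a sketch under two assumptions: the version string keeps the kubectl nova --version format shown above, and the GitHub release tag_name uses the same vX.Y.Z form. The helper names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: extracts "vX.Y.Z" from output such as
#   kubectl-nova version v0.7.1 (git: a97586b5) built: 20231102171119
parse_novactl_version() {
  sed -n 's/.*version \(v[0-9][0-9.]*\).*/\1/p'
}

# Hypothetical helper: compares the installed version with the latest GitHub
# release tag. Returns 0 if they differ, 1 if they match, 2 if either is unknown.
novactl_needs_upgrade() {
  local installed latest
  installed=$(kubectl nova --version | parse_novactl_version)
  latest=$(curl -s https://api.github.com/repos/elotl/novactl/releases/latest | jq -r '.tag_name')
  if [ -z "${installed}" ] || [ -z "${latest}" ] || [ "${latest}" = "null" ]; then
    echo "could not determine installed or latest novactl version" >&2
    return 2
  fi
  echo "installed: ${installed}, latest: ${latest}"
  [ "${installed}" != "${latest}" ]
}
```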
Preparing hosting cluster
The cluster hosting the Nova Control Plane MUST have a storage provisioner and a default StorageClass configured. The Nova Control Plane uses etcd as a backing store, which runs as a StatefulSet and requires a PersistentVolume to work.
The cluster hosting the Nova Control Plane MUST have an ingress controller configured. The Nova API Server is exposed as a Kubernetes Service of type LoadBalancer, and it needs an IP address or domain name that is reachable from the Nova Agents in the workload clusters, as well as by human users interacting with the Nova Control Plane.
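Kubernetes marks the default StorageClass with the annotation storageclass.kubernetes.io/is-default-class: "true". The sketch below (hypothetical helper names) checks the hosting cluster for one before you install. It assumes HOSTING_CLUSTER_CONTEXT is set, as exported in the Creating API Server section.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads "<name><TAB><is-default-class value>" lines on
# stdin and prints the names of StorageClasses marked as default.
default_storageclasses() {
  awk -F '\t' '$2 == "true" { print $1 }'
}

# Hypothetical helper: queries the hosting cluster and fails if no default
# StorageClass is set.
check_default_storageclass() {
  local defaults
  defaults=$(kubectl --context="${HOSTING_CLUSTER_CONTEXT}" get storageclass \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.metadata.annotations.storageclass\.kubernetes\.io/is-default-class}{"\n"}{end}' |
    default_storageclasses)
  if [ -z "${defaults}" ]; then
    echo "no default StorageClass found in ${HOSTING_CLUSTER_CONTEXT}" >&2
    return 1
  fi
  echo "default StorageClass: ${defaults}"
}
```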
Creating API Server
By default, everything is installed in the elotl namespace; you can change this if needed.
Export the install namespace and the name of the hosting cluster's kube context, then create the namespace and the API Server Service:
export INSTALL_NAMESPACE=elotl
export HOSTING_CLUSTER_CONTEXT=kind-cp
kubectl --context=${HOSTING_CLUSTER_CONTEXT} create namespace ${INSTALL_NAMESPACE}
kubectl --context=${HOSTING_CLUSTER_CONTEXT} create -f install/base/control-plane/apiserver.yaml -n ${INSTALL_NAMESPACE}
Then wait for an external IP to be allocated. If it is still Pending after a few minutes, check whether your cluster has a working ingress controller, as described in the hosting cluster Prerequisites.
kubectl --context=${HOSTING_CLUSTER_CONTEXT} wait -n ${INSTALL_NAMESPACE} service/apiserver --for=jsonpath='{.status.loadBalancer.ingress[0].ip}' --timeout=360s
If this command times out, the API Server Service has most likely been allocated a hostname instead of an external IP. You can check this with the following command:
kubectl --context=${HOSTING_CLUSTER_CONTEXT} wait -n ${INSTALL_NAMESPACE} service/apiserver --for=jsonpath='{.status.loadBalancer.ingress[0].hostname}' --timeout=360s
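Because the LoadBalancer may report either an IP or a hostname depending on your provider, the two wait commands above can be folded into one polling loop. The sketch below is a hypothetical helper, not part of novactl. It concatenates both jsonpath fields (normally only one is set) and assumes HOSTING_CLUSTER_CONTEXT and INSTALL_NAMESPACE are exported as above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: polls the apiserver Service until its load balancer
# reports an IP or a hostname, then prints whichever value it found.
# Arguments: number of attempts (default 72), seconds between attempts (default 5).
wait_for_apiserver_endpoint() {
  local attempts=${1:-72} interval=${2:-5} endpoint i
  for ((i = 1; i <= attempts; i++)); do
    # Concatenating both fields yields whichever one is populated.
    endpoint=$(kubectl --context="${HOSTING_CLUSTER_CONTEXT}" get -n "${INSTALL_NAMESPACE}" service/apiserver \
      -o jsonpath='{.status.loadBalancer.ingress[0].ip}{.status.loadBalancer.ingress[0].hostname}') || true
    if [ -n "${endpoint}" ]; then
      printf '%s\n' "${endpoint}"
      return 0
    fi
    sleep "${interval}"
  done
  echo "apiserver Service has no external IP or hostname after ${attempts} attempts" >&2
  return 1
}
```

For example, apiserver_ip=$(wait_for_apiserver_endpoint) gives you a value to pass to --apiserver-public-endpoint when generating certificates.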
Configure a domain name record for the API Server Service
We strongly recommend configuring a domain name record, or at least a static external IP address, for this Service. This API Server Service is the entry point to the Nova Control Plane for workload clusters as well as human users.
Understand the impact before deleting this Service
Make sure you know what you are doing before deleting this Service. Without the API Server exposed to workload clusters and users, the Nova Control Plane is not usable. Delete it only if you no longer intend to use Nova.
Generating certificates for Nova Control Plane
The Nova Control Plane runs components similar to those of a regular Kubernetes cluster, such as apiserver, kube-controller-manager, nova-scheduler and a key-value store. To secure communication between these components, the Nova Control Plane needs a set of certificates, similar to those created by the certs phase of kubeadm init. These certificates are stored in Kubernetes Secrets and mounted into the control plane components.
We provide a kubectl nova subcommand that generates the certificates and the Secret manifests in the correct format. Instructions for doing this with kubeadm will be provided in the future.
For now, use the install certs subcommand. It takes as input the API Server IP and the namespace where the API Server is installed (default: elotl).
apiserver_ip=$(kubectl --context=${HOSTING_CLUSTER_CONTEXT} get -n ${INSTALL_NAMESPACE} service/apiserver -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
kubectl nova --context=${HOSTING_CLUSTER_CONTEXT} install certs --apiserver-public-endpoint="${apiserver_ip}" --namespace=${INSTALL_NAMESPACE} --nova-node-ip="${apiserver_ip}" > "${PWD}/nova_certificates.yaml"
If this command fails or apiserver_ip is empty, your API Server Service was most likely exposed with a hostname rather than an external IP. In that case, use these commands instead:
apiserver_ip=$(kubectl --context=${HOSTING_CLUSTER_CONTEXT} get -n ${INSTALL_NAMESPACE} service/apiserver -o jsonpath='{.status.loadBalancer.ingress[0].hostname}')
test -n "${apiserver_ip}"
kubectl nova --context=${HOSTING_CLUSTER_CONTEXT} install certs --apiserver-public-endpoint="${apiserver_ip}" --namespace=${INSTALL_NAMESPACE} --nova-node-ip="${apiserver_ip}" > "${PWD}/nova_certificates.yaml"
Then, we can create those Secrets in the hosting cluster:
kubectl --context=${HOSTING_CLUSTER_CONTEXT} create -f "${PWD}/nova_certificates.yaml"
By default, these certificates expire 10 years after they are generated. They can be rotated by regenerating the Secrets, applying them to the hosting cluster and restarting the Nova Control Plane components.
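With a 10-year default lifetime, it is easy to lose track of when the certificates expire. The sketch below uses openssl x509 -enddate and -checkend to inspect a PEM certificate read from stdin. The helper names are hypothetical, and the Secret name and data key in the usage comment are placeholders, since this guide does not list the Secrets that install certs generates.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints the expiry of one PEM certificate read on stdin,
# as a line of the form "notAfter=<date>".
cert_not_after() {
  openssl x509 -noout -enddate
}

# Hypothetical helper: succeeds if the certificate on stdin expires within N days.
# openssl -checkend exits 0 only if the cert is still valid after the window, so an
# unreadable certificate is also reported as "expiring" (a conservative default).
cert_expires_within_days() {
  local seconds=$(( $1 * 86400 ))
  if openssl x509 -noout -checkend "${seconds}" >/dev/null; then
    return 1
  fi
  return 0
}

# Example usage (<secret-name> and <key> are placeholders; list the real Secrets
# with: kubectl get secrets -n ${INSTALL_NAMESPACE}):
#   kubectl --context=${HOSTING_CLUSTER_CONTEXT} get secret <secret-name> -n ${INSTALL_NAMESPACE} \
#     -o jsonpath='{.data.<key>}' | base64 -d | cert_not_after
```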
Install Control Plane Components
We use a kustomize overlay that sets the number of replicas for each component to 3 and adds pod anti-affinity, so that the pods of each component are distributed across 3 nodes.
We use topology.kubernetes.io/zone as the topologyKey in the podAntiAffinity, so those nodes are expected to be in different zones. If you want to use another node label as the topology key (for example, kubernetes.io/hostname to spread replicas across nodes rather than zones), edit the files in install/overlays/cp-ha.
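With topology.kubernetes.io/zone as the topology key, each component's 3 replicas can only be placed on nodes in 3 distinct zones, assuming the anti-affinity in the overlay is required rather than preferred (check the files in install/overlays/cp-ha to confirm). Before applying the overlay, a sketch like the following (hypothetical helper names) counts the zones that your hosting cluster's nodes report.

```shell
#!/usr/bin/env bash
# Hypothetical helper: counts distinct, non-empty zone values read from stdin
# (one zone per line).
count_distinct_zones() {
  awk 'NF && !seen[$0]++ { n++ } END { print n + 0 }'
}

# Hypothetical helper: prints the topology.kubernetes.io/zone label of every
# node in the hosting cluster, one per line (empty if a node has no label).
hosting_cluster_node_zones() {
  kubectl --context="${HOSTING_CLUSTER_CONTEXT}" get nodes \
    -o jsonpath='{range .items[*]}{.metadata.labels.topology\.kubernetes\.io/zone}{"\n"}{end}'
}
```

For example: [ "$(hosting_cluster_node_zones | count_distinct_zones)" -ge 3 ] || echo "fewer than 3 zones" >&2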
kubectl --context=${HOSTING_CLUSTER_CONTEXT} apply -k install/overlays/cp-ha -n ${INSTALL_NAMESPACE}
It might take a while, but eventually apiserver, kube-controller-manager, nova-scheduler and etcd should become ready and available:
kubectl --context=${HOSTING_CLUSTER_CONTEXT} wait pod --for=jsonpath='{.status.phase}'=Running -n ${INSTALL_NAMESPACE} -l app=etcd --timeout=360s
kubectl --context=${HOSTING_CLUSTER_CONTEXT} wait pod --for=jsonpath='{.status.phase}'=Running -n ${INSTALL_NAMESPACE} -l component=apiserver --timeout=180s
kubectl --context=${HOSTING_CLUSTER_CONTEXT} wait pod --for=jsonpath='{.status.phase}'=Running -n ${INSTALL_NAMESPACE} -l component=controller-manager --timeout=180s
kubectl --context=${HOSTING_CLUSTER_CONTEXT} wait pod --for=jsonpath='{.status.phase}'=Running -n ${INSTALL_NAMESPACE} -l component=nova-scheduler --timeout=180s
If all these conditions are met, we can proceed with Control Plane configuration.
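A pod in phase Running is not necessarily ready: the phase only says that its containers have been created and at least one is running, while the Ready condition also requires all containers to pass their readiness checks. As an alternative sketch (the helper name is hypothetical), the four waits above can be run in a loop against the Ready condition, using the same label selectors and timeouts.

```shell
#!/usr/bin/env bash
# Hypothetical helper: waits for each control plane component's pods to
# become Ready, stopping at the first component that times out.
wait_for_nova_control_plane() {
  local entry selector timeout
  # Each entry is "<label selector>:<timeout>", mirroring the commands above.
  for entry in "app=etcd:360s" "component=apiserver:180s" \
               "component=controller-manager:180s" "component=nova-scheduler:180s"; do
    selector=${entry%%:*}
    timeout=${entry##*:}
    echo "waiting for pods with ${selector} (timeout ${timeout})"
    kubectl --context="${HOSTING_CLUSTER_CONTEXT}" wait pod \
      --for=condition=Ready -n "${INSTALL_NAMESPACE}" -l "${selector}" \
      --timeout="${timeout}" || return 1
  done
}
```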
Troubleshooting error: timed out waiting for the condition on pods/etcd-0
etcd pods will not start if a storage provisioner is not configured for the hosting cluster (as mentioned in the Preparing hosting cluster section above). You can also check the troubleshooting section for this issue.
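A PersistentVolumeClaim stuck in Pending is the usual symptom of a missing storage provisioner or default StorageClass. The hypothetical helper below lists the claims in the Nova namespace that are not Bound. Running kubectl describe pvc <name> on any claim it prints shows the provisioning events.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads "<pvc-name> <phase>" lines on stdin and prints
# the names of PersistentVolumeClaims that are not Bound.
unbound_pvcs() {
  awk 'NF && $2 != "Bound" { print $1 }'
}

# Hypothetical helper: lists PVCs in the Nova namespace of the hosting cluster
# that are not Bound.
check_nova_pvcs() {
  kubectl --context="${HOSTING_CLUSTER_CONTEXT}" get pvc -n "${INSTALL_NAMESPACE}" \
    -o jsonpath='{range .items[*]}{.metadata.name}{" "}{.status.phase}{"\n"}{end}' |
    unbound_pvcs
}
```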
Get Nova kubeconfig
Before getting the Nova kubeconfig, make sure the Nova API Server endpoint is responding. The Nova Control Plane creates a "NovaAPIServerEndpointReady" event once it is ready.
kubectl --context=${HOSTING_CLUSTER_CONTEXT} wait -n ${INSTALL_NAMESPACE} '--for=jsonpath={.reason}'=NovaAPIServerEndpointReady event/NovaAPIServerEndpointReady --timeout=360s
We use novactl as a kubectl plugin to generate the kubeconfig.
kubectl nova --context=${HOSTING_CLUSTER_CONTEXT} get kubeconfig -n ${INSTALL_NAMESPACE} > "${PWD}/nova_kubeconfig.yaml"
cat "${PWD}/nova_kubeconfig.yaml"
You can examine the contents of the file. It should look similar to:
apiVersion: v1
clusters:
- cluster:
    certificate-authority-data: "LS0tLS1...LS0K"
    server: "https://172...200"
  name: nova
contexts:
- context:
    cluster: nova
    user: nova-admin
  name: nova
current-context: nova
kind: Config
preferences: {}
users:
- name: nova-admin
  user:
    client-certificate-data: "LS0tLS1CRUd...LS0tLQo="
    client-key-data: "LS0tLS1CRUdJ...tLS0tLQo="
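Before merging the kubeconfig, it is worth confirming that its server field points at the API Server endpoint you exposed earlier. If kubectl is available, kubectl config view --kubeconfig=<file> -o jsonpath='{.clusters[*].cluster.server}' prints it. The dependency-free sketch below (hypothetical helper names) does the same with sed, assuming the simple layout shown above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints the server URL(s) from a kubeconfig file,
# tolerating optional double quotes around the value.
kubeconfig_servers() {
  sed -n 's/^[[:space:]]*server:[[:space:]]*"\{0,1\}\([^"]*\)"\{0,1\}[[:space:]]*$/\1/p' "$1"
}

# Hypothetical helper: succeeds if any server URL in the kubeconfig contains
# the given endpoint (a plain substring match, so keep the endpoint specific).
kubeconfig_points_to() {
  [ -n "$2" ] || return 1
  kubeconfig_servers "$1" | grep -qF -- "$2"
}
```

For example: kubeconfig_points_to "${PWD}/nova_kubeconfig.yaml" "${apiserver_ip}" || echo "kubeconfig does not point at ${apiserver_ip}" >&2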
Install Nova CRDs
Once we can talk to the Nova API Server, we can install the Nova CRDs. These are required for Nova to function properly.
export KUBECONFIG=${PWD}/nova_kubeconfig.yaml:${HOME}/.kube/config:${KUBECONFIG}
kubectl config get-contexts
kubectl --context=nova create -f install/base/control-plane/nova_crds.yaml
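If you re-run the KUBECONFIG export above in the same shell, the variable accumulates duplicate entries. The sketch below (a hypothetical helper) joins kubeconfig paths with ":" while skipping empty and duplicate entries, and it splits arguments that already contain several paths.

```shell
#!/usr/bin/env bash
# Hypothetical helper: joins kubeconfig paths with ":" in order of first
# appearance, skipping empty and duplicate entries.
join_kubeconfig_paths() {
  local result="" path
  # Split every argument on ":" so an existing multi-entry KUBECONFIG works too.
  while IFS= read -r path; do
    [ -n "${path}" ] || continue
    case ":${result}:" in
      *":${path}:"*) continue ;;  # already present
    esac
    result=${result:+${result}:}${path}
  done < <(printf '%s\n' "$@" | tr ':' '\n')
  printf '%s\n' "${result}"
}
```

For example: export KUBECONFIG="$(join_kubeconfig_paths "${PWD}/nova_kubeconfig.yaml" "${HOME}/.kube/config" "${KUBECONFIG}")"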
The Nova Control Plane runs a set of checks at startup, and once they all pass, it creates a NovaControlPlaneReady event.
You can use kubectl to wait for this event:
kubectl --context=${HOSTING_CLUSTER_CONTEXT} wait -n ${INSTALL_NAMESPACE} '--for=jsonpath={.reason}'=NovaControlPlaneReady event/NovaControlPlaneReady --timeout=540s
At this point, the Nova Control Plane is up and running. To verify this, check the status of the Nova Control Plane:
kubectl nova --context=nova --hosting-cluster-context=${HOSTING_CLUSTER_CONTEXT} --hosting-cluster-nova-namespace=${INSTALL_NAMESPACE} status
Checking status of Nova Control Plane Components
* API Server status... Running √
* Kube-controller-manager status... Running √
* ETCD status... Running √
* Nova scheduler status... Running √
Nova Control Plane is healthy √
Checking presence of Nova Custom Resource Definitions
* Cluster CRD presence... installed √
0 workload clusters connected ‼
please connect at least one Cluster, otherwise Nova does not have a target cluster to run your workloads. Connecting clusters can be done by running novactl install agent <cluster-name> in correct Kube context.
* SchedulePolicy CRD presence... installed √
* 0 SchedulePolicies defined ‼
please create at least one SchedulePolicy, otherwise Nova does not know where to run your workloads. SchedulePolicy spec: https://docs.elotl.co/nova/intro
* ScheduleGroup CRD presence... installed √
All Nova Custom Resource Definitions installed √
To schedule any workloads via Nova we need to connect workload clusters. Let's do it.
Connect workload clusters
Install the Nova agent into a workload cluster
Each workload cluster needs a Nova agent, which is deployed to the elotl namespace by default. Before deploying the Nova agent, make sure that Nova's init-kubeconfig is present in that namespace. The init-kubeconfig contains a kubeconfig for the Nova Control Plane, which the Nova agent in the workload cluster uses to connect to the Nova Control Plane and register itself as a workload cluster.
export INSTALL_NAMESPACE=elotl
Let's create the namespace first:
kubectl --context=kind-workload-1 create namespace ${INSTALL_NAMESPACE}
Then copy the init-kubeconfig Secret from the Nova Control Plane to the workload cluster:
kubectl --context=nova get secret -n ${INSTALL_NAMESPACE} nova-cluster-init-kubeconfig -o yaml | kubectl --context=kind-workload-1 apply -f -
To connect a workload cluster to Nova, we use kubectl apply with a kustomize overlay.
By default, we ship overlays for two workload clusters, named kind-workload-1 and kind-workload-2, in install/overlays/workload-cluster-1 and install/overlays/workload-cluster-2 respectively.
To name your workload cluster differently, modify install/overlays/workload-cluster-1/nova_agent.yaml or install/overlays/workload-cluster-2/nova_agent.yaml (or copy the entire directory and rename it).
In nova_agent.yaml, find the --cluster-name=kind-workload-1 (or --cluster-name=kind-workload-2) line and replace kind-workload-1 or kind-workload-2 with your desired workload cluster name.
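If you prefer copying an overlay over editing the shipped ones, these steps can be scripted. The sketch below is a hypothetical helper: it copies an overlay directory and rewrites the --cluster-name= argument in its nova_agent.yaml. Create the copy next to the originals in install/overlays/ so that any relative paths in its kustomization.yaml (kustomize overlays usually reference their base this way) keep resolving.

```shell
#!/usr/bin/env bash
# Hypothetical helper: copies an agent overlay and sets a new --cluster-name.
# Usage: new_agent_overlay <source-overlay-dir> <new-overlay-dir> <cluster-name>
# The cluster name must not contain "/", because it is used in a sed expression.
new_agent_overlay() {
  local src=$1 dst=$2 name=$3
  [ -d "${src}" ] || { echo "no such overlay: ${src}" >&2; return 1; }
  [ ! -e "${dst}" ] || { echo "refusing to overwrite ${dst}" >&2; return 1; }
  cp -R "${src}" "${dst}"
  # Replace the value after --cluster-name= up to the next space or double quote.
  sed "s/--cluster-name=[^\" ]*/--cluster-name=${name}/" "${dst}/nova_agent.yaml" > "${dst}/nova_agent.yaml.tmp" &&
    mv "${dst}/nova_agent.yaml.tmp" "${dst}/nova_agent.yaml"
}
```

For example: new_agent_overlay install/overlays/workload-cluster-1 install/overlays/workload-cluster-3 my-cluster-3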
Open install/base/agent/kustomization.yaml in a text editor and set namespace: to the namespace you chose and exported as $INSTALL_NAMESPACE.
You can also do this with sed:
sed "s/namespace: elotl/namespace: ${INSTALL_NAMESPACE}/" install/base/agent/kustomization.yaml > temp_file && mv temp_file install/base/agent/kustomization.yaml
Next, create the agent in the workload cluster context:
kubectl --context=kind-workload-1 apply -k install/overlays/workload-cluster-1
Now let's check whether that worked. If you are not using the merged KUBECONFIG exported earlier, remember to point kubectl at your Nova Control Plane kubeconfig. Run:
kubectl get --context=nova clusters
NAME K8S-VERSION K8S-CLUSTER REGION ZONE READY IDLE STANDBY
kind-workload-1 1.25 workload-1 True True False
What if I don't see my workload cluster listed?
If the agent install finished without issues but your cluster is not showing up in the Nova Control Plane, something went wrong during the agent registration process. Run the following command to get the agent logs:
kubectl logs --context=kind-workload-1 -n ${INSTALL_NAMESPACE} deployment/nova-agent
Then start debugging from there.
Install other workload clusters
If you have a second cluster, run the same commands with its kube context and overlay, for example:
kubectl --context=kind-workload-2 create namespace ${INSTALL_NAMESPACE}
kubectl --context=nova get secret -n ${INSTALL_NAMESPACE} nova-cluster-init-kubeconfig -o yaml | kubectl --context=kind-workload-2 apply -f -
kubectl --context=kind-workload-2 apply -k install/overlays/workload-cluster-2
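With more than two workload clusters, the three commands per cluster can be wrapped in a loop. This is a sketch with hypothetical helper names. Each argument pairs a kube context with the overlay whose nova_agent.yaml carries the matching --cluster-name, and INSTALL_NAMESPACE is assumed to be exported as above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: connects one workload cluster, given its kube context
# and agent overlay directory (same three steps as above).
connect_workload_cluster() {
  local ctx=$1 overlay=$2
  kubectl --context="${ctx}" create namespace "${INSTALL_NAMESPACE}" || return 1
  kubectl --context=nova get secret -n "${INSTALL_NAMESPACE}" nova-cluster-init-kubeconfig -o yaml |
    kubectl --context="${ctx}" apply -f - || return 1
  kubectl --context="${ctx}" apply -k "${overlay}"
}

# Hypothetical helper: each argument is "<context>=<overlay-dir>"; stops at
# the first cluster that fails.
connect_workload_clusters() {
  local entry
  for entry in "$@"; do
    connect_workload_cluster "${entry%%=*}" "${entry#*=}" || return 1
  done
}
```

For example: connect_workload_clusters "kind-workload-1=install/overlays/workload-cluster-1" "kind-workload-2=install/overlays/workload-cluster-2"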
Verify your installation
You can use the novactl status subcommand to examine the state of the Nova Control Plane:
kubectl nova --context=nova --hosting-cluster-context=${HOSTING_CLUSTER_CONTEXT} --hosting-cluster-nova-namespace=${INSTALL_NAMESPACE} status
Checking status of Nova Control Plane Components
* API Server status... Running √
* Kube-controller-manager status... Running √
* ETCD status... Running √
* Nova scheduler status... Running √
Nova Control Plane is healthy √
Checking presence of Nova Custom Resource Definitions
* Cluster CRD presence... installed √
* Cluster kind-workload-1 connected and ready √
* Cluster kind-workload-2 connected and ready √
* SchedulePolicy CRD presence... installed √
* 0 SchedulePolicies defined ‼
please create at least one SchedulePolicy, otherwise Nova does not know where to run your workloads. SchedulePolicy spec: https://docs.elotl.co/nova/intro
* ScheduleGroup CRD presence... installed √
All Nova Custom Resource Definitions installed √
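For scripted installs, you may want to fail fast when the control plane is not healthy. The hypothetical helper below keys off the "Nova Control Plane is healthy" line in the output above and echoes any ‼ warnings to stderr. Because it matches CLI output text rather than a documented interface (and this guide does not state the status subcommand's exit codes), treat it as a brittle convenience check.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads `kubectl nova ... status` output on stdin and
# succeeds only if it contains the "Nova Control Plane is healthy" line.
check_nova_status_output() {
  local output
  output=$(cat)
  if printf '%s\n' "${output}" | grep -q 'Nova Control Plane is healthy'; then
    # Surface warnings (lines marked with ‼) without failing the check.
    printf '%s\n' "${output}" | grep '‼' >&2 || true
    return 0
  fi
  return 1
}
```

For example: kubectl nova --context=nova --hosting-cluster-context=${HOSTING_CLUSTER_CONTEXT} --hosting-cluster-nova-namespace=${INSTALL_NAMESPACE} status | check_nova_status_output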
Uninstalling Nova
Uninstalling Nova Agent
Uninstalling the Nova agent from the workload cluster is as simple as deleting the agent resources we created in the installation steps:
kubectl --context=kind-workload-1 delete -k install/overlays/workload-cluster-1
kubectl --context=kind-workload-1 delete secret -n ${INSTALL_NAMESPACE} nova-cluster-init-kubeconfig
kubectl --context=kind-workload-1 delete ns ${INSTALL_NAMESPACE}
kubectl --context=kind-workload-2 delete -k install/overlays/workload-cluster-2
kubectl --context=kind-workload-2 delete secret -n ${INSTALL_NAMESPACE} nova-cluster-init-kubeconfig
kubectl --context=kind-workload-2 delete ns ${INSTALL_NAMESPACE}
Uninstalling Nova Control Plane
Uninstalling the Nova Control Plane from the hosting cluster is as simple as deleting the control plane resources we created in the installation steps:
kubectl --context=${HOSTING_CLUSTER_CONTEXT} delete -k install/overlays/cp-ha -n ${INSTALL_NAMESPACE}
Removing Nova API Server Service
Understand the impact before deleting this Service
Make sure you know what you are doing before deleting this Service. Without the API Server exposed to workload clusters and users, the Nova Control Plane is not usable. Delete it only if you no longer intend to use Nova.
kubectl --context=${HOSTING_CLUSTER_CONTEXT} delete -f "${PWD}/nova_certificates.yaml"
kubectl --context=${HOSTING_CLUSTER_CONTEXT} delete -f install/base/control-plane/apiserver.yaml -n ${INSTALL_NAMESPACE}
kubectl --context=${HOSTING_CLUSTER_CONTEXT} delete namespace ${INSTALL_NAMESPACE}