Version: v1.1

Installing Nova Control Plane in Disaster Recovery Mode

In production, Nova should be set up to withstand regional failures. Because Nova is itself a Kubernetes application, it can work with your enterprise's DR solution for Kubernetes applications.

In this section, we illustrate how Nova can be deployed in an EKS hosting cluster on AWS and use the open-source tool Velero for DR. Please note that this procedure is specific to EKS. If you will be using Nova's DR mode on another cloud provider, please contact us at info@elotl.co and we will be happy to support you.

Installing Nova in DR Mode

These are the high-level steps to set up and operate Nova in DR mode:

  1. Create the Nova Primary and Standby hosting clusters in 2 different regions.

  2. Install and configure Velero on both the primary and standby clusters. This includes setting up the necessary AWS IAM roles, service accounts, and the Velero Helm chart [1]. Any other Kubernetes backup tool can also be used to periodically back up the stateful Nova control plane components.

  3. Reserve static IPs for the API service component of the Nova primary and standby control plane. On EKS we will be using Elastic IPs [2].

  4. Install the Nova control plane components on the hosting clusters. Nova can be installed via two methods:

      • Novactl command-line install

      • Advanced install using manifests

     For Nova setup in DR mode we will be using the advanced install, which allows us to configure the API server’s IP address and to customize the certificates used by Nova.

      i) Create the API server service on the primary and standby clusters. The apiserver manifest file needs to be modified to include the static IPs reserved in Step 3.

      ii) Generate certificates for the Nova primary and standby. The novactl install certs command provides an additional flag to include the Elastic IPs of both the primary and standby services.

      iii) Deploy the remaining Nova control plane components on both the Nova primary and standby hosting clusters, using the typical manifest-based installation procedure.
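
As an illustration of attaching the reserved Elastic IPs to the API server service, the following Python sketch renders a Service manifest as JSON (which kubectl accepts alongside YAML). It is not the manifest shipped with Nova: the service name, namespace, selector label, port, and allocation IDs are all hypothetical placeholders, and the use of the standard AWS load balancer annotations on an NLB is an assumption about how the static IPs are wired in. Consult the apiserver manifest in the Nova release for the actual fields.

```python
import json

# Standard AWS load balancer annotations for Kubernetes Services.
LB_TYPE_ANNOTATION = "service.beta.kubernetes.io/aws-load-balancer-type"
EIP_ANNOTATION = "service.beta.kubernetes.io/aws-load-balancer-eip-allocations"


def apiserver_service(name, namespace, eip_allocation_ids, port=443):
    """Build a LoadBalancer Service dict pinned to pre-reserved Elastic IPs.

    All argument values are supplied by the caller; nothing here is taken
    from the Nova manifests themselves.
    """
    if not eip_allocation_ids:
        raise ValueError("at least one Elastic IP allocation ID is required")
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {
            "name": name,
            "namespace": namespace,
            "annotations": {
                LB_TYPE_ANNOTATION: "nlb",
                # Comma-separated list of allocation IDs, one per subnet.
                EIP_ANNOTATION: ",".join(eip_allocation_ids),
            },
        },
        "spec": {
            "type": "LoadBalancer",
            # Hypothetical selector label; match it to the real apiserver pods.
            "selector": {"component": name},
            "ports": [
                {"name": "https", "port": port, "targetPort": port, "protocol": "TCP"}
            ],
        },
    }


# Placeholder values for illustration only.
manifest = apiserver_service(
    name="apiserver",
    namespace="elotl",
    eip_allocation_ids=["eipalloc-PRIMARY-A", "eipalloc-PRIMARY-B"],
)
print(json.dumps(manifest, indent=2))
```

The same function would be called once per hosting cluster, with the primary's allocation IDs on the primary and the standby's on the standby. With these annotations, the number of allocation IDs is expected to match the number of subnets the load balancer spans.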

The figure below shows Nova operating under normal conditions in DR mode.

Nova DR mode
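
Step 2 of the installation calls for periodic backups of the stateful control plane components. A minimal sketch of how such a schedule might be created with the Velero CLI is shown below. The schedule name, the namespace, the six-hour cadence, and the retention period are assumptions chosen for illustration; this page does not prescribe any of them. The sketch only assembles and prints the command, so it can be reviewed before being run against a cluster.

```python
import shlex


def velero_schedule_cmd(name, namespace, cron, ttl="72h0m0s"):
    """Return the argv for a recurring Velero backup of one namespace."""
    return [
        "velero", "schedule", "create", name,
        "--schedule", cron,                 # standard cron expression
        "--include-namespaces", namespace,  # limit the backup to this namespace
        "--ttl", ttl,                       # how long each backup is retained
    ]


# Hypothetical values: back up the "elotl" namespace every six hours.
cmd = velero_schedule_cmd("nova-control-plane", "elotl", "0 */6 * * *")
print(shlex.join(cmd))
```

To execute the command rather than print it, the list could be passed to `subprocess.run(cmd, check=True)` on a machine where the Velero CLI is configured for the primary cluster.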

Standby Promotion during Disaster

These are the high-level steps to be followed during a disaster/failure of the Nova primary hosting cluster:

  1. Use Velero to run a “restore” operation on the Nova standby. This restores the etcd StatefulSet PVs from the backup S3 bucket into the standby control plane.
  2. Reconnect the workload clusters to the Nova standby. This is triggered by deleting the workload cluster secrets and service accounts on the control plane; the cluster registration controller then regenerates new secrets. The Nova init-kubeconfig secret in each workload cluster should also be updated to the Nova standby’s value, allowing the workload clusters to communicate with the new control plane. Follow the procedure in the Install Nova agent section.
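
The restore in step 1 can be sketched with the Velero CLI as follows. This is an illustrative helper, not part of Nova or Velero: the restore, backup, and schedule names are hypothetical, and it assumes the standby cluster's Velero installation can read the same backup storage location as the primary. The function accepts either a specific backup or a schedule; with a schedule, Velero restores from that schedule's most recent successful backup.

```python
import shlex


def velero_restore_cmd(restore_name, backup_name=None, schedule_name=None):
    """Return the argv for a Velero restore from a backup or a schedule."""
    if (backup_name is None) == (schedule_name is None):
        raise ValueError("pass exactly one of backup_name or schedule_name")
    if backup_name is not None:
        source = ["--from-backup", backup_name]
    else:
        source = ["--from-schedule", schedule_name]
    return ["velero", "restore", "create", restore_name, *source]


# Hypothetical names: restore the latest backup taken by a schedule.
cmd = velero_restore_cmd("nova-standby-promotion", schedule_name="nova-control-plane")
print(shlex.join(cmd))
```

The command would be run with the kubeconfig context pointing at the standby hosting cluster. Step 2 (removing the workload cluster secrets and service accounts) is left out of the sketch because the resource names depend on the installation.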

The figure below shows Nova operating after standby promotion.

Nova Standby Promotion

  [1] Installing Velero on your EKS cluster
  [2] Elastic IPs in AWS