Version: v0.6.0

Troubleshooting

This section explains how to correct issues you may encounter when working with Nova. If you run into an issue that is not listed here, please contact the Elotl team.

Installation

Diagnose with the novactl status CLI

Nova CLI has a diagnostic sub-command, novactl status. The command runs checks to verify that your Nova Control Plane is up and running and that Nova's CRDs are installed.
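The command needs the kubeconfig of the cluster hosting the Nova Control Plane. As a minimal sketch, assuming novactl honors the standard KUBECONFIG environment variable (check novactl's help output for the exact way your version takes a kubeconfig):

$ KUBECONFIG=/path/to/nova-control-plane-kubeconfig novactl status

You should then see output such as: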

Checking status of Nova Control Plane Components

* API Server status... Running √
* Kube-controller-manager status... Running √
* ETCD status... Running √
* Nova scheduler status... Running √
Nova Control Plane is healthy √

Checking presence of Nova Custom Resource Definitions

* Cluster CRD presence... installed √
* Cluster kind-workload-1 connected and ready √
* Cluster kind-workload-2 connected and ready √
* SchedulePolicy CRD presence... installed √
* 0 SchedulePolicies defined ‼
please create at least one SchedulePolicy, otherwise Nova does not know where to run your workloads. SchedulePolicy spec: https://docs.elotl.co
* ScheduleGroup CRD presence... installed √
All Nova Custom Resource Definitions installed √
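If the status check reports zero SchedulePolicies, as in the warning above, create at least one. The manifest below is a minimal, hypothetical sketch: the apiVersion and exact field names are assumptions, so verify them against the SchedulePolicy spec linked in the warning:

apiVersion: policy.elotl.co/v1alpha1   # hypothetical; check your installed CRD
kind: SchedulePolicy
metadata:
  name: default-policy
spec:
  namespaceSelector:                   # namespaces whose resources this policy covers
    matchLabels:
      team: my-team
  resourceSelectors:                   # resources must carry matching labels
    labelSelectors:
      - matchLabels:
          app: my-app
  clusterSelector: {}                  # empty selector: any connected cluster is a candidate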


If any component of the Nova Control Plane is not running, Nova cannot function properly. All of Nova's control plane components run in the elotl namespace.
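A quick way to confirm the components are running is to list the pods in that namespace:

$ kubectl get pods -n elotl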

To debug further, get each component's logs using kubectl:

$ kubectl logs -n elotl deploy/nova-scheduler
$ kubectl logs -n elotl deploy/apiserver
$ kubectl logs -n elotl deploy/kube-controller-manager
$ kubectl logs -n elotl statefulset/etcd

Your cluster does not appear in the Nova Control Plane

If the Nova agent was successfully installed in the workload cluster, but the cluster does not show up in the Nova Control Plane, do the following:

  1. Check that the Nova agent is up and running in the workload cluster. To do this, check the agent's logs:

    $ kubectl logs -n elotl deployment/nova-agent
  2. Confirm the Nova Control Plane API Server is reachable from the workload cluster.
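    One way to confirm reachability is to run a throwaway pod in the workload cluster and probe the Nova API server address (a placeholder below); any HTTP response, even an authentication error, proves network connectivity:

    $ kubectl run nova-api-check --rm -it --restart=Never --image=curlimages/curl -- curl -k https://<nova-api-server-address>:<port>/healthz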

Operations

My resources are in the Nova Control Plane, but are not scheduled

Nova schedules in multiple steps. In the first step, when you create a new resource, Nova tries to find a matching SchedulePolicy. If your resource is not scheduled, check whether it was matched to a SchedulePolicy using the following command:

$ kubectl get events --namespace=<resource-namespace> --field-selector=involvedObject.name=<resource-name>

If a matching SchedulePolicy is found, Nova emits a Kubernetes Event saying which policy the object was matched to, for example:

$ kubectl get events --namespace=<resource-namespace> --field-selector=involvedObject.name=<resource-name>
16s Normal SchedulePolicyMatched <resource-namespace>/<resource-name> schedule policy <policy-name> will be used to determine target cluster

If no SchedulePolicy was matched, verify the following:

  • Ensure your resource's Kind is supported by Nova. The Nova introduction lists the supported kinds.
  • Check that you defined your SchedulePolicy with the correct namespaceSelector and resourceSelector.
  • Verify that your resource is in one of the namespaces specified in the SchedulePolicy's namespaceSelector. Cluster-scoped objects are matched based only on the label selector.
  • Does your resource have labels that match the SchedulePolicy's resourceSelectors? See the sketch after this list.
  • Do your objects match more than one SchedulePolicy? In that case, Nova sorts the SchedulePolicies in alphabetical order and uses the first one.
  • If your resource is a namespace whose name starts with kube- or elotl-, it is restricted and Nova ignores it.
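For example, if your SchedulePolicy's resourceSelector matches the label app: my-app (a hypothetical label), the workload must carry that label itself, as in this sketch:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
  namespace: <a-namespace-matched-by-the-policy>
  labels:
    app: my-app          # must match the SchedulePolicy's resourceSelector
spec:
  replicas: 1
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
      - name: nginx
        image: nginx:1.25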

Resources match the SchedulePolicy but are not running in the workload cluster

When your resources are matched to a SchedulePolicy but do not transition into the Running state, there may be a few reasons:

  • The SchedulePolicy has a clusterSelector that does not match any clusters. To see the workload clusters connected to Nova, run:

    $ kubectl get clusters --show-labels
To fix this

Compare the output with your SchedulePolicy's .spec.clusterSelector. Then, edit the cluster selector so it matches one or more clusters.
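For instance, if the output shows a cluster labeled env=dev (a hypothetical label), the selector could be written as a standard Kubernetes label selector; the exact field shape is an assumption, so check the SchedulePolicy spec:

spec:
  clusterSelector:
    matchLabels:
      env: dev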

  • SchedulePolicy has a clusterSelector matching cluster(s), but there is not enough capacity on those cluster nodes to run your resource(s).
To fix this

If this is the case and you were using group scheduling, check the resource's events:

$ kubectl get events --namespace=<resource-namespace> --field-selector=involvedObject.name=<resource-name>

Your resource should have an event saying:

added to ScheduleGroup <schedule-group-name> which contains objects with groupBy.labelKey <foo>=<bar>

Then, you can get the details of this ScheduleGroup using:

$ kubectl describe schedulegroup <schedule-group-name>

In the Events section, there should be a line saying: Normal ScheduleGroupSyncedToWorkloadCluster 8s nova-scheduler Multiple clusters matching policy <policy-name> (empty cluster selector): <cluster-names>; group policy <schedule-group-name> does not fit in any cluster;
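To confirm that capacity is really the blocker, compare the group's resource requests against what the candidate clusters have left. The Allocated resources section of kubectl describe nodes shows per-node usage:

$ kubectl --context=<workload-cluster-context> describe nodes | grep -A 8 'Allocated resources'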

  • SchedulePolicy has a clusterSelector matching cluster(s), and there is enough capacity, but the workloads cannot be created because their namespace does not exist in the workload cluster.
To fix this

You can either create the namespace manually in the workload cluster (by running kubectl --context=<workload-cluster-context> create namespace <your-namespace>) or schedule the Namespace object using Nova. Remember that a Namespace is treated like any other resource, meaning it needs labels matching the desired SchedulePolicy's resourceSelector; see the sketch below.
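If you schedule it through Nova, a minimal Namespace manifest might look like this (app: my-app is a hypothetical label; use whatever your policy's resourceSelector expects):

apiVersion: v1
kind: Namespace
metadata:
  name: <your-namespace>
  labels:
    app: my-app          # must match the SchedulePolicy's resourceSelector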

  • SchedulePolicy has a clusterSelector matching cluster(s), and there is enough capacity, but the nova-agent in the cluster is having issues.
To fix this

You can grab logs from nova-agent in this cluster by running:

$ kubectl logs -n elotl deploy/nova-agent

and contact the Elotl team at info@elotl.co.

Nova supports automatic re-scheduling, which happens in these cases:

  • Pods that were scheduled via Nova are Pending in the workload cluster because there is insufficient capacity in the cluster.
note

This state does not occur if you scheduled a Deployment or any other pod controller via Nova.

  • A Deployment that was scheduled via Nova has the following condition (see the commands after this list to check both cases):

    conditions:
    - lastTransitionTime: <does not matter>
      lastUpdateTime: <does not matter>
      message: Deployment does not have minimum availability.
      reason: MinimumReplicasUnavailable
      status: "False"
      type: Available
  • If you defined your SchedulePolicy with groupBy settings, Nova schedules the entire ScheduleGroup at once. If one of the deployments in the group has either of the preceding conditions, Nova reschedules the whole ScheduleGroup. In this case, Nova sends the following Kubernetes Event for the ScheduleGroup:

    $ kubectl describe schedulegroup <my-group-name>
    ...
    Events:
      Type     Reason                 Age  From        Message
      ----     ------                 ---  ----        -------
      Warning  ReschedulingTriggered  3s   nova-agent  deployment default/nginx-group-5 does not have minimum replicas available
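To check whether either trigger currently applies, you can inspect Pending pods and the Deployment's Available condition directly in the workload cluster; both commands use standard kubectl features:

$ kubectl --context=<workload-cluster-context> get pods -A --field-selector=status.phase=Pending
$ kubectl --context=<workload-cluster-context> get deployment <name> -n <namespace> -o jsonpath='{.status.conditions[?(@.type=="Available")]}'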