Version: v1.2

Release Notes

1.2.9

This release fixes minor issues, adds a label with the packing mode to node objects, and adds support for Artifact Registry Image Streaming with GKE.

Common

Adds a new label to node objects: node.elotl.co/mode with values bin-packing or bin-selection. This label allows operators to determine the packing mode for the node.
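As a rough illustration, the label would show up in a node's metadata along these lines. This is a hand-written fragment of a Node object, not output captured from a cluster; only the label key and its two possible values come from the note above.

```yaml
# Fragment of a Kubernetes Node object (illustrative; all other fields omitted).
metadata:
  labels:
    node.elotl.co/mode: bin-packing   # the other documented value is bin-selection
```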

Fix pod retry delay when a node pool scale-up operation fails. Previously, if a scale-up operation failed, the pod was ignored until the podRetryPeriod timeout expired. Now, if node creation fails, the pod is retried at the next iteration.

Fix eviction path and add extra logging to avoid a rare case where a node with low utilization would get cordoned and uncordoned repeatedly.

Google GKE

Add support for Artifact Registry Image Streaming, which can significantly reduce image loading time. Luna detects whether Image Streaming is enabled on the cluster and accounts for its extra memory overhead when calculating the available memory on the nodes it allocates.

1.2.8

This release fixes minor issues and introduces Bottlerocket support for AWS.

Common

Fixes node over-allocation for zone-spread StatefulSets with bound PVC placement. Luna over-allocated bin-selected nodes during the initial zone-spread placement of the StatefulSet pods.

Fixes out-of-capacity error handling for bin-selection.

Allows critical pods like CoreDNS to run on Luna-managed nodes even when these pods aren’t handled by Luna. This prevents resource exhaustion of non-Luna nodes when CoreDNS is scaled up.

Optimizes API calls to the control plane by using PATCH operations instead of UPDATE when modifying pods and nodes.

Reduces Luna manager’s memory consumption by 5% to 10%.

Updates Kubernetes’ client-go dependency to 0.29.10.

Amazon EKS

Adds the option aws.isBottlerocketImage. When enabled, Luna handles node initialization and userData with the TOML format expected by the Bottlerocket images.
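A minimal sketch of enabling this option in a Helm values file, assuming the dotted name maps to nested YAML keys and that the option takes a boolean (the note only says "when enabled"):

```yaml
# Helm values fragment (sketch, not a complete values file).
aws:
  isBottlerocketImage: true   # assumed boolean; tells Luna to emit TOML userData for Bottlerocket
```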

1.2.7

This release adds disk options for AKS nodes, allowing operators to better control the OS disk and use Ultra SSD volumes. It also improves GKE support with better IAM role configuration and node overhead calculation.

Common

The deployment script now has a --skip-cert-manager option to skip the detection and auto-installation of cert-manager. Note that if cert-manager isn’t properly installed in the cluster when using this option, the Luna installation will fail.

For cloud providers that support zone affinity and zone spread (GKE and EKS), Luna now handles zone-spread pods whose zone affinity is inherited from a bound PVC.

Update the recommended NVIDIA driver version and update the CUDA vector-add examples to use the latest image from NVIDIA.

Google GKE

Luna now requires fewer permissions. Luna manager no longer uses roles/container.developer, roles/compute.instanceAdmin.v1, or roles/container.admin; instead, it uses roles/iam.serviceAccountUser and a custom role created via the deployment script.

Microsoft AKS

Luna supports new disk options for AKS nodes:

1.2.5

This release enhances Luna with improved support for various cloud providers and addresses several minor issues.

Google GKE

Added support for additional raw block devices. These options allow you to build node-level caches for pods running on the nodes.

You can specify a number of NVMe or Ephemeral raw block devices to attach to each node.

For bin-selected nodes, you can also specify node-specific options with pod annotations.

Microsoft Azure

Recognize new node types with NVIDIA A10 GPUs: Standard_NC, Standard_ND, Standard_NV36, Standard_NV72, and instance types with the v5 suffix.

Oracle OKE

The resource requirements for running the Kubernetes system on nodes have increased. This release updates nodes’ estimated available resources based on those new requirements.

Fixed a crash with Luna manager during node deletion failures.

1.2.4

This release officially transitions OKE support from Beta to GA.

Amazon EKS

The release updates the AWS instance types and prices, adding the G6 series of GPU systems and the R8G series of Graviton4-based ARM systems.

Oracle OCI

The release updates the OCI instance types and prices, adding the VM.GPU.A10.1 and VM.GPU.A10.2 GPU systems and the VM.Standard.A2.Flex Ampere ARM systems.

The release adds support for Spot pricing, using OCI preemptible instances.

1.2.3

This release significantly enhances Luna's security and stability. We audited Luna and fixed the critical and high vulnerabilities reported by various scanning tools. We added automated security scanning and reporting around Luna to ensure we ship a secure product.

General

Our installation script deploy.sh now requires cmctl to be installed. We have also updated the version of cert-manager installed by the script to 1.15.2.

The telemetry option is now false by default.

Default CPU and memory requests and limits are now set. See the default Helm values for more information.

Added support for ephemeral-storage requests and limits on the manager and webhook:

manager:
  requests:
    ephemeral-storage: 500Mi
  limits:
    ephemeral-storage: 1Gi

Added NetworkPolicy for both manager and webhook pods. You can configure the ingress and egress rules:

manager:
  networkPolicy:
    ingress:
      - ports:
          - protocol: TCP
            port: 9090
    egress:
      - ports:
          - protocol: TCP
            port: 8443

Added a pod disruption budget for webhook pods to ensure there's always 1 pod available when upgrading or restarting.

We fixed various minor bugs with Luna.

There were also various improvements to the documentation, such as the Hot Node Mitigation Tutorial.

Microsoft AKS

Password authentication support was removed, and the azure.Username and azure.Password Helm values have also been removed. We also added documentation about Spot usage with AKS to make it easier to get started.

Google GKE

GKE now supports Spot pricing.

1.2.2

This release includes significant improvements and bug fixes for Microsoft AKS and Oracle OCI platforms, as well as general enhancements to logging and performance.

Microsoft AKS

The nodeTags option is now supported on AKS. You can now add custom tags to Azure VMs managed by Luna.

Oracle OCI

Bin packing nodepool configuration matching is now correctly handled; a configuration mismatch will result in the creation of a new nodepool with the new configuration.

Fixed a bug where Luna would "overshoot" the target number of nodes when scaling down a nodepool because of a race condition with the API. Luna now scales down nodepools with a single API call, eliminating this race condition.

General

Better logging for instance selection, previously covered pods, and node utilization.

1.2.1

Google GKE

Added gcp.diskSizeGb to specify the disk size for GKE nodes.
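For illustration, the option might be set in a Helm values file as sketched here. The nesting is inferred from the dotted name, and the value 100 is an arbitrary example rather than a documented default.

```yaml
# Helm values fragment (sketch).
gcp:
  diskSizeGb: 100   # illustrative size for GKE node disks, in GB per the option name
```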

Implemented support for instance out-of-stock error handling.

Webhook

Luna relies on pod CPU and memory resource requests for key elements of its operation. Therefore, warnings are logged when bin-selected pods don’t have full resource requirements specified.

General

Add max pods per node support for all cloud providers. See the documentation for details.

Luna now features management of over-utilized bin-packed nodes. Users can configure Luna to distribute CPU and/or memory usage across multiple nodes by evicting the overburdening pod(s). For details, see the BinPacking section in the documentation.

1.2.0

Amazon EKS

aws.maxPodsPerNode has been deprecated and replaced with binPackingMaxPodsPerNode and binSelectMaxPodsPerNode.

Google GKE

gcp.maxPodsPerNode has been deprecated and replaced with binPackingMaxPodsPerNode and binSelectMaxPodsPerNode.

Oracle OCI

oci.maxPodsPerNode has been deprecated and replaced with binPackingMaxPodsPerNode and binSelectMaxPodsPerNode.

General

We have unified the management of the maximum pods per node setting across cloud providers by replacing the provider-specific options (*.maxPodsPerNode) with the new binPackingMaxPodsPerNode and binSelectMaxPodsPerNode parameters. Additionally, this update introduces support for configuring Max Pods Per Node on Azure, bringing consistency to all cloud environments we support.
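The migration could look roughly like the following Helm values change. This is a sketch: the new parameter names come from the notes above, but their top-level placement is an assumption based on their lack of a provider prefix, and the value 110 is purely illustrative.

```yaml
# Before (deprecated, provider-specific; shown for EKS):
# aws:
#   maxPodsPerNode: 110

# After (assumed to be top-level keys, shared by all providers):
binPackingMaxPodsPerNode: 110   # limit for bin-packing nodes
binSelectMaxPodsPerNode: 110    # limit for bin-selection nodes
```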

1.1.0

Google GKE

  • Luna can now configure auto-upgrade, auto-repair, secure boot, and integrity monitoring for its node pool.

General

We have updated the Kubernetes client libraries to version 0.29.1. This update inadvertently broke the computation of the reuse hash value. We have since fixed this issue in Luna to prevent such regressions in the future. However, upgrading from versions prior to 1.0.1 may result in bin-selected node pools not being reused. See the upgrade page for more details.

1.0.1

Fixed an issue with the trial version of Luna where images were being pulled from the incorrect Docker repository, causing installation failures. The regular versions were not affected.

1.0.0

Amazon EKS

  • Fix EKS zone spread node overshoot. Too many nodes were created once the node(s) required for zone spread expired. Now the correct number of nodes is maintained during node expiration.

Google GKE

  • Luna now avoids triggering node garbage collection on locked node pools, and logs a warning instead of an error when it attempts to operate on a locked node pool.
  • Node pool garbage collection now includes orphaned instance templates. No need to clean these manually anymore.
  • Support network tags, and GCE instance metadata. The Helm values gcp.networkTags and gcp.gceInstanceMetadata can be used to customize these configuration options.
  • Fix unnecessary creation of a new node pool when there are minor differences in the node metadata.

General bug fixes

  • Fix annotation when nodes are successfully drained.
  • Set minimum value for non-zero nodeTTL to prevent unnecessary node churn. The minimum value for scaleDown.nodeTTL is now scaleUpTimeout.
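
To make the nodeTTL constraint in the last item concrete, here is a hedged Helm values sketch. The option names are taken from the item above; the duration string format and the specific values are assumptions chosen for illustration.

```yaml
# Helm values fragment (sketch; duration format and values are assumed).
scaleUpTimeout: 10m
scaleDown:
  nodeTTL: 15m   # a non-zero nodeTTL must be at least scaleUpTimeout
```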