Release Notes
1.2.8
This release fixes minor issues and introduces Bottlerocket support for AWS.
Common
Fixes node over-allocation for zone-spread StatefulSets with bound PVC placement. Luna over-allocated bin-selected nodes during the initial zone-spread placement of the StatefulSet pods.
Fixes out-of-capacity error handling for bin-selection.
Allows critical pods like CoreDNS to run on Luna-managed nodes even when these pods aren’t handled by Luna. This prevents resource exhaustion of non-Luna nodes when CoreDNS is scaled up.
Optimizes API calls to the control plane by using PATCH operations instead of UPDATE when modifying pods and nodes.
Reduces Luna manager’s memory consumption by 5% to 10%.
Updates Kubernetes’ client-go dependency to 0.29.10.
Amazon EKS
Adds the option aws.isBottlerocketImage. When enabled, Luna handles node initialization and userData with the TOML format expected by the Bottlerocket images.
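As a minimal sketch, enabling the option in a Helm values file might look like the fragment below. Only the option name comes from these notes; the nesting under an aws key is inferred from the dotted name, and the boolean type is an assumption based on the phrase "when enabled".

```yaml
# Hypothetical Helm values fragment (assumed layout, not taken from the Luna chart).
aws:
  # Assumed to be a boolean; the default value is not stated in these notes.
  isBottlerocketImage: true
```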
1.2.7
This release adds disk options for AKS nodes, allowing operators to better control the OS disk and use Ultra SSD volumes. It also improves GKE support with better IAM role configuration and node overhead calculation.
Common
The deployment script now has a --skip-cert-manager option to skip the detection and auto-installation of cert-manager. Note that if cert-manager isn’t properly installed in the cluster when using this option, the Luna installation will fail.
On cloud providers that support zone affinity and zone spread (GKE and EKS), Luna now handles zone-spread pods whose zone affinity is inherited from a bound PVC.
Updates the recommended NVIDIA driver version and updates the CUDA vector-add examples to use the latest image from NVIDIA.
Google GKE
Luna now requires fewer permissions. Luna manager no longer uses roles/container.developer, roles/compute.instanceAdmin.v1, or roles/container.admin; instead, it uses roles/iam.serviceAccountUser and a custom role created via the deployment script.
Microsoft AKS
Luna supports new disk options for AKS nodes:
- azure.opportunisticEphemeralOSDiskSizeGB, which replaces the now-deprecated option azure.useEphemeralOsDisk.
- azure.enableUltraSSD, set to false by default.
- azure.kubeletDiskType, which can be set to OS or Temporary.
- azure.osDiskType, which can be set to Ephemeral or Managed.
- azure.osDiskSizeGB
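The fragment below sketches how some of these disk options could be combined in a Helm values file. The option names and the allowed values for azure.kubeletDiskType and azure.osDiskType come from the list above; the nesting under an azure key, the numeric disk size, and the particular combination are illustrative assumptions and have not been checked against the Luna chart.

```yaml
# Hypothetical Helm values fragment (assumed layout; values are illustrative).
azure:
  osDiskType: Managed      # documented choices: Ephemeral or Managed
  osDiskSizeGB: 128        # illustrative size, not a documented default
  kubeletDiskType: OS      # documented choices: OS or Temporary
  enableUltraSSD: true     # false by default
```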
1.2.5
This release enhances Luna with improved support for various cloud providers and addresses several minor issues.
Google GKE
Added support for additional raw block devices. These options allow you to build node-level caches for pods running on the nodes.
You can specify a number of NVMe or Ephemeral raw block devices to attach to each node.
For bin-selected nodes, you can also specify node-specific options with pod annotations.
Microsoft Azure
Recognizes new node types with NVIDIA A10 GPUs: Standard_NC, Standard_ND, Standard_NV36, Standard_NV72, and instance types with the v5 suffix.
Oracle OKE
The resource requirements for running the Kubernetes system on nodes have increased. This release updates nodes’ estimated available resources based on those new requirements.
Fixed a crash with Luna manager during node deletion failures.
1.2.4
This release officially transitions OKE support from Beta to GA.
Amazon EKS
The release updates the AWS instance types and prices, adding the G6 series of GPU systems and the R8G series of Graviton4-based ARM systems.
Oracle OCI
The release updates the OCI instance types and prices, adding the VM.GPU.A10.1 and VM.GPU.A10.2 GPU systems and the VM.Standard.A2.Flex Ampere ARM systems.
The release adds support for Spot pricing, using OCI preemptible instances.
1.2.3
This release significantly enhances Luna's security and stability. We audited Luna and fixed the critical and high vulnerabilities reported by various scanning tools. We added automated security scanning and reporting around Luna to ensure we ship a secure product.
General
Our installation script deploy.sh now requires cmctl to be installed. We have also updated the version of cert-manager installed by the script to 1.15.2.
The telemetry option is now false by default.
The default CPU and memory requests and limits have been set. See the default Helm values for more information.
Added support for ephemeral-storage requests and limits on the manager and webhook:

    manager:
      requests:
        ephemeral-storage: 500Mi
      limits:
        ephemeral-storage: 1Gi

Added NetworkPolicy for both manager and webhook pods. You can configure the ingress and egress rules:

    manager:
      networkPolicy:
        ingress:
          - ports:
              - protocol: TCP
                port: 9090
        egress:
          - ports:
              - protocol: TCP
                port: 8443

Added pod disruption budget for webhook pods to ensure there's always 1 pod available when upgrading or restarting.
We fixed various minor bugs with Luna.
There were also various improvements to the documentation like the Hot Node Mitigation Tutorial.
Microsoft AKS
Password authentication support was removed, and the azure.Username and azure.Password Helm values have also been removed. We also added documentation about Spot usage with AKS to make it easier to get started.
Google GKE
GKE now supports Spot pricing.
1.2.2
This release includes significant improvements and bug fixes for Microsoft AKS and Oracle OCI platforms, as well as general enhancements to logging and performance.
Microsoft AKS
The nodeTags option is now supported on AKS. You can now add custom tags to Azure VMs managed by Luna.
Oracle OCI
Bin packing nodepool configuration matching is now correctly handled; a configuration mismatch will result in the creation of a new nodepool with the new configuration.
Fixed a bug where Luna would "overshoot" the target number of nodes when scaling down a nodepool because of a race condition with the API. Luna now scales down nodepools with a single API call, eliminating this race condition.
General
Better logging for instance selection, previously covered pods, and node utilization.
1.2.1
Google GKE
Added gcp.diskSizeGb to specify the disk size for GKE nodes.
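A minimal sketch of setting this option in a Helm values file follows. The option name comes from the note above; the nesting under a gcp key is inferred from the dotted name, and the size shown is purely illustrative.

```yaml
# Hypothetical Helm values fragment (assumed layout).
gcp:
  diskSizeGb: 100   # illustrative value in GB, not a documented default
```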
Implemented support for instance out-of-stock error handling.
Webhook
Luna uses pod CPU and memory resource requests for key elements of its operation. Therefore, warnings are now logged when bin-selected pods don’t have full resource requirements specified.
General
Added max pods per node support for all cloud providers. See the documentation for details.
Luna now features management of over-utilized bin-packed nodes. Users can configure Luna to distribute CPU and/or memory usage across multiple nodes by evicting the overburdening pod(s). For details, see the BinPacking section in the documentation.
1.2.0
Amazon EKS
aws.maxPodsPerNode has been deprecated and replaced with binPackingMaxPodsPerNode and binSelectMaxPodsPerNode.
Google GKE
gcp.maxPodsPerNode has been deprecated and replaced with binPackingMaxPodsPerNode and binSelectMaxPodsPerNode.
Oracle OCI
oci.maxPodsPerNode has been deprecated and replaced with binPackingMaxPodsPerNode and binSelectMaxPodsPerNode.
General
We have unified the management of the maximum pods per node parameter across cloud providers by replacing the provider-specific options (*.maxPodsPerNode) with the new binPackingMaxPodsPerNode and binSelectMaxPodsPerNode parameters. Additionally, this update introduces support for configuring Max Pods Per Node on Azure, bringing consistency to all cloud environments we support.
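For a migration from a provider-specific option, a values file might change roughly as sketched below. The parameter names come from these notes; placing them at the top level of the values file is an assumption based on their lack of a provider prefix, and the numbers are illustrative only.

```yaml
# Hypothetical Helm values fragment (assumed layout; values are illustrative).
# Before (deprecated provider-specific option):
# aws:
#   maxPodsPerNode: 110

# After (provider-independent parameters):
binPackingMaxPodsPerNode: 110   # applies to bin-packing nodes
binSelectMaxPodsPerNode: 110    # applies to bin-selected nodes
```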
1.1.0
Google GKE
- Luna can now configure auto-upgrade, auto-repair, secure boot, and integrity monitoring for its node pool.
General
We have updated the Kubernetes client libraries to version 0.29.1. This update inadvertently broke the computation of the reuse hash value. We have since fixed this issue in Luna to prevent such regressions in the future. However, upgrading from versions prior to 1.0.1 may result in bin-selected node pools not being reused. See the upgrade page for more details.
1.0.1
Fixed an issue with the trial version of Luna where images were being pulled from the incorrect Docker repository, causing installation failures. The regular versions were not affected.
1.0.0
Amazon EKS
- Fix EKS zone spread node overshoot. Too many nodes were created once the node(s) required for zone spread expired. Now the correct number of nodes is maintained during node expiration.
Google GKE
- Luna now avoids triggering node garbage collection on locked node pools, and logs a warning instead of an error when it attempts to operate on a locked node pool.
- Node pool garbage collection now includes orphaned instance templates. No need to clean these manually anymore.
- Support network tags and GCE instance metadata. The Helm values gcp.networkTags and gcp.gceInstanceMetadata can be used to customize these configuration options.
- Fix unnecessary creation of new node pool when there are minor differences in the nodes metadata.
General bug fixes
- Fix annotation when nodes are successfully drained.
- Set minimum value for non-zero nodeTTL to prevent unnecessary node churn. The minimum value for scaleDown.nodeTTL is now scaleUpTimeout.