Version: v1.5

Available Metrics

Luna exposes metrics in Prometheus format. They can be scraped from port 9090 of the elotl-luna-manager pod. The available metrics, with their descriptions and labels, are listed below.
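As a rough illustration of scraping that endpoint by hand, the sketch below fetches the raw metrics text over HTTP. It assumes that port 9090 of the elotl-luna-manager pod has already been made reachable from your machine (for example through a port-forward) and that the metrics are served at the conventional Prometheus path `/metrics`; the page above states only the port, so the path and the helper names `metrics_url` and `fetch_metrics` are assumptions.

```python
from urllib.request import urlopen


def metrics_url(host="localhost", port=9090, path="/metrics"):
    """Build the scrape URL. The "/metrics" path is an assumption
    (the usual Prometheus convention), not something stated by Luna's docs."""
    return f"http://{host}:{port}{path}"


def fetch_metrics(url, timeout=5.0):
    """Return the raw Prometheus text exposition served at `url`."""
    with urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


if __name__ == "__main__":
    # Requires the luna-manager metrics port to be reachable on localhost:9090.
    text = fetch_metrics(metrics_url())
    # Print only the Luna metrics, skipping comment lines and other series.
    for line in text.splitlines():
        if line.startswith("elotl_luna_"):
            print(line)
```

In a real deployment a Prometheus server would normally do the scraping; a manual fetch like this is mainly useful for checking that the endpoint is reachable.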

| Luna Metric Name | Description | Label(s) |
| --- | --- | --- |
| elotl_luna_scale_actions_total | Counts the total number of node scale actions performed by luna-manager. | "action" ("up" or "down") and "node_packing_mode" ("bin-packing" or "bin-selection"). |
| elotl_luna_scale_errors_total | Counts the total number of node scale-up or scale-down errors. | "action" ("up" or "down"), "node_packing_mode" ("bin-packing" or "bin-selection"), "node_type", "reason", "spot", and "subnet". |
| elotl_luna_started_node_types_total | Counts the total number of started nodes, grouped by the node_type label. Note that this metric will not appear until Luna has created a node. | "node_packing_mode" ("bin-packing" or "bin-selection") and "node_type" with the node type’s name. |
| elotl_luna_node_startup_duration_seconds_{bucket,sum,count} | Histogram of the seconds between ScaleUp Request creation and the nodepool completing the operation. | "node_packing_mode" ("bin-packing" or "bin-selection"). |
| elotl_luna_pods_evicted_total | Counts pod evictions. | "node_packing_mode" ("bin-packing" or "bin-selection") and "results" ("success" or "error"). |
| elotl_luna_nodes_drained_total | Counts node drain actions. | "node_packing_mode" ("bin-packing" or "bin-selection") and "results" ("success" or "error"). |
| elotl_luna_nodes_removed_total | Counts nodes removed from the cluster (ready or not before cordoning). | "node_packing_mode" ("bin-packing" or "bin-selection") and "node_state" ("ready" or "not_ready"). |
| elotl_luna_unschedulable_pods | Gauge set to the current number of unschedulable pods considered by luna-manager. | "node_packing_mode" ("bin-packing" or "bin-selection"). |
| elotl_luna_gpu_requests_exceeding_cluster_limit | Counts the number of attempts in which pod requests exceed the cluster GPU limit. | "node_packing_mode" ("bin-packing" or "bin-selection"). |
| elotl_luna_pods_skipped | Number of pods skipped in the last loop iteration, broken down by reason. | "node_packing_mode" ("bin-packing" or "bin-selection") and "reason" ("pending_reason_mismatch" or "pvc_bound"). |
| elotl_luna_nodes_scale_up_request_expired_total | Counts node scale-up request expirations. | "node_packing_mode" ("bin-packing" or "bin-selection"). |
| elotl_luna_insufficient_free_addresses_in_subnet_errors_total | Counts "insufficient free addresses in subnet" errors. | "node_packing_mode" ("bin-packing" or "bin-selection"). |
| elotl_luna_node_type_backoff_active | Indicates whether a node type is currently in backoff (1) or not (0). | "node_type" and "reason". |
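To show how the labels listed above can be used, here is a minimal sketch that parses Prometheus text exposition output and sums a counter by one label. The sample text and its values are made up for illustration, and `parse_samples` and `sum_by_label` are hypothetical helpers, not part of Luna. The parser handles only simple `name{labels} value` lines; production code would more likely rely on a Prometheus client library or PromQL.

```python
import re
from collections import defaultdict

# Hypothetical sample in Prometheus text exposition format; the values are made up.
SAMPLE = """\
# TYPE elotl_luna_scale_actions_total counter
elotl_luna_scale_actions_total{action="up",node_packing_mode="bin-packing"} 12
elotl_luna_scale_actions_total{action="down",node_packing_mode="bin-packing"} 5
elotl_luna_scale_actions_total{action="up",node_packing_mode="bin-selection"} 3
elotl_luna_unschedulable_pods{node_packing_mode="bin-packing"} 2
"""

# metric name, optional {label set}, value, optional integer timestamp
LINE_RE = re.compile(
    r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)(?:\s+-?\d+)?$'
)
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_samples(text):
    """Return a list of (metric_name, labels_dict, value) tuples."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = LINE_RE.match(line)
        if match is None:
            continue  # ignore lines this simple parser does not understand
        name, raw_labels, value = match.groups()
        labels = dict(LABEL_RE.findall(raw_labels or ""))
        samples.append((name, labels, float(value)))
    return samples


def sum_by_label(samples, metric, label):
    """Sum the values of `metric`, grouped by the value of `label`."""
    totals = defaultdict(float)
    for name, labels, value in samples:
        if name == metric:
            totals[labels.get(label, "")] += value
    return dict(totals)


if __name__ == "__main__":
    samples = parse_samples(SAMPLE)
    print(sum_by_label(samples, "elotl_luna_scale_actions_total", "action"))
```

Grouping by "action" collapses the "node_packing_mode" dimension, so scale-ups from both packing modes are added together; grouping by "node_packing_mode" instead would compare the two modes.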