Version: v1.3

Available Metrics

Luna exposes metrics in Prometheus format. They can be scraped from port 9090 of the elotl-luna-manager pod. The available metrics and their descriptions are listed below.

| Luna Metric Name | Description |
| --- | --- |
| elot_luna_scale_actions_total | Counts the total number of node scale actions performed by luna-manager. Its labels are "action" ("up" or "down") and "node_packing_mode" ("bin-packing" or "bin-selection"). |
| elot_luna_scale_errors_total | Counts the total number of node scale-up or scale-down errors. Its labels are "action" ("up" or "down") and "node_packing_mode" ("bin-packing" or "bin-selection"). |
| elot_luna_started_node_types_total | Counts the total number of started nodes, grouped by the node_type label. Note that this metric does not appear until Luna has created a node. Its labels are "node_packing_mode" ("bin-packing" or "bin-selection") and "node_type" with the node type’s name. |
| elotluna_node_startup_duration_seconds{bucket,sum,count} | Histogram of the seconds between ScaleUp Request creation and the nodepool completing the operation. Its label is "node_packing_mode" ("bin-packing" or "bin-selection"). |
| elot_luna_pods_evicted_total | Counts pod evictions. Its labels are "node_packing_mode" ("bin-packing" or "bin-selection") and "results" ("success" or "error"). |
| elot_luna_nodes_drained_total | Counts node drain actions. Its labels are "node_packing_mode" ("bin-packing" or "bin-selection") and "results" ("success" or "error"). |
| elot_luna_nodes_removed_total | Counts nodes removed from the cluster, whether they were ready or not ready before cordoning. Its labels are "node_packing_mode" ("bin-packing" or "bin-selection") and "node_state" ("ready" or "not_ready"). |
| elot_luna_unschedulable_pods | Gauge set to the current number of unschedulable pods considered by luna-manager. Its label is "node_packing_mode" ("bin-packing" or "bin-selection"). |
| elot_luna_gpu_requests_exceeding_cluster_limit | Counts the number of attempts in which pod requests exceed the cluster GPU limit. Its label is "node_packing_mode" ("bin-packing" or "bin-selection"). |
| elot_luna_pods_skipped | Number of pods skipped in the last loop iteration, broken down by reason. Its labels are "node_packing_mode" ("bin-packing" or "bin-selection") and "reason" ("pending_reason_mismatch" or "pvc_bound"). |
| elot_luna_nodes_scale_up_request_expired_total | Counts the number of node scale-up request expirations. Its label is "node_packing_mode" ("bin-packing" or "bin-selection"). |
| elot_luna_insufficient_free_addresses_in_subnet_errors_total | Counts the number of "insufficient free addresses in subnet" errors. Its label is "node_packing_mode" ("bin-packing" or "bin-selection"). |
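
For ad hoc inspection without a Prometheus server, the scraped text can be parsed directly. The following is a minimal sketch, not part of Luna itself. It assumes the metrics are served at the conventional `/metrics` path and that port 9090 of the pod has been forwarded to localhost (for example with `kubectl port-forward`); neither detail is stated above. The metric names are copied verbatim from the list above and should be checked against the actual endpoint output. The `_sum` and `_count` suffixes follow the standard Prometheus histogram naming. The helper names and the `SAMPLE` values are hypothetical and for illustration only.

```python
import re
import urllib.request
from collections import defaultdict

# One sample line of the Prometheus text format: name{labels} value [timestamp]
SAMPLE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')

# Made-up sample values, shaped like a scrape of the metrics listed above.
SAMPLE = """\
# TYPE elot_luna_scale_actions_total counter
elot_luna_scale_actions_total{action="up",node_packing_mode="bin-packing"} 4
elot_luna_scale_actions_total{action="down",node_packing_mode="bin-packing"} 1
elot_luna_scale_actions_total{action="up",node_packing_mode="bin-selection"} 2
elotluna_node_startup_duration_seconds_sum{node_packing_mode="bin-packing"} 90
elotluna_node_startup_duration_seconds_count{node_packing_mode="bin-packing"} 4
"""


def fetch_metrics(url="http://localhost:9090/metrics"):
    """Fetch the raw metrics text. The URL is an assumption (see lead-in)."""
    with urllib.request.urlopen(url, timeout=5) as resp:
        return resp.read().decode("utf-8")


def parse_samples(text):
    """Return a list of (metric_name, labels_dict, value) tuples."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blanks, HELP and TYPE lines
            continue
        match = SAMPLE_RE.match(line)
        if match is None:
            continue
        name, raw_labels, value = match.groups()
        # Escape sequences inside label values are kept as-is, not unescaped.
        labels = dict(LABEL_RE.findall(raw_labels or ""))
        samples.append((name, labels, float(value)))
    return samples


def sum_by_label(samples, metric, label):
    """Sum the values of one metric, grouped by the value of one label."""
    totals = defaultdict(float)
    for name, labels, value in samples:
        if name == metric:
            totals[labels.get(label, "")] += value
    return dict(totals)


def mean_startup_seconds(samples, prefix="elotluna_node_startup_duration_seconds"):
    """Mean node startup time across all label sets, from histogram sum/count."""
    total = sum(v for n, _, v in samples if n == prefix + "_sum")
    count = sum(v for n, _, v in samples if n == prefix + "_count")
    return total / count if count else None
```

Calling `parse_samples(SAMPLE)` and then `sum_by_label(samples, "elot_luna_scale_actions_total", "action")` gives the scale-up and scale-down totals across both packing modes. Against a live cluster, `parse_samples(fetch_metrics())` would supply the samples instead. For ongoing monitoring, a Prometheus server with equivalent queries is the more usual choice; this sketch is only meant to show how the names and labels fit together.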