Version: v0.4

Available Metrics

Luna exposes metrics in the Prometheus format. They can be scraped from port 9090 of the elotl-luna-manager pod. The available metrics and their descriptions are listed below.
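As a quick way to inspect these metrics without a full Prometheus server, the sketch below fetches the endpoint and parses the text exposition format into `(name, labels, value)` tuples. It is a minimal illustration, not part of Luna. It assumes that port 9090 has been forwarded to `localhost` (for example with `kubectl port-forward`) and that metrics are served at the conventional `/metrics` path, which the text above does not state. The helper names `parse_metrics` and `scrape` are hypothetical, and the parser is simplified: it does not handle escaped quotes, commas, or braces inside label values.

```python
import re
import urllib.request

# Matches one sample line of the Prometheus text format, for example:
#   pods_evicted_total{result="success"} 3
# Simplified: label values must not contain '}' characters.
SAMPLE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{([^}]*)\})?\s+(\S+)')


def parse_metrics(text):
    """Parse Prometheus text exposition into (name, labels, value) tuples."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and # HELP / # TYPE comments
        match = SAMPLE_RE.match(line)
        if not match:
            continue
        name, raw_labels, value = match.groups()
        labels = {}
        if raw_labels:
            # Simplified: assumes label values contain no commas or escaped quotes.
            for pair in raw_labels.split(","):
                key, _, val = pair.partition("=")
                if key.strip():
                    labels[key.strip()] = val.strip().strip('"')
        samples.append((name, labels, float(value)))
    return samples


def scrape(url="http://localhost:9090/metrics"):
    """Fetch and parse metrics; the URL assumes a local port-forward and /metrics path."""
    with urllib.request.urlopen(url, timeout=5) as resp:
        return parse_metrics(resp.read().decode("utf-8"))
```

For regular monitoring, pointing a Prometheus scrape job at the pod is the more usual choice; this parser is only meant for ad hoc checks.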

| Luna metric name | Description |
|---|---|
| `scale_actions_total` | Total number of scale actions performed by luna-manager |
| `started_node_types_total` | Total number of started nodes, grouped by the `node_type` label |
| `node_startup_duration_seconds` | Seconds between the creation of a scale-up request and the request being marked as succeeded |
| `pods_evicted_total` | Number of pod evictions, successful or not (see the `result` label) |
| `nodes_drained_total` | Number of node drain actions, successful or not (see the `ok` label) |
| `nodes_removed_total` | Number of nodes removed from the cluster, whether or not they were ready before cordoning (see the `node_state` label) |
| `unschedulable_pods` | Gauge set to the current number of unschedulable pods considered by luna-manager |
| `cluster_gpu_limit_exceeds_total` | Number of times a pod's requests exceeded the cluster GPU limit |
| `pods_skipped` | Number of pods skipped in the last loop iteration, for various reasons |
| `nodes_scale_up_request_expired_total` | Number of node scale-up requests that expired |
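Several of the metrics above carry labels (`node_type`, `result`, `ok`, `node_state`), and it is often useful to collapse them into per-label totals, for example to see how many nodes of each type Luna has started. The sketch below sums a metric's values grouped by one label, taking samples as `(name, labels, value)` tuples. The function name `sum_by_label` and the sample data are hypothetical; in particular, the instance type values are illustrative only and are not taken from Luna output.

```python
from collections import defaultdict


def sum_by_label(samples, metric, label):
    """Sum the values of `metric`, grouped by the value of `label`.

    `samples` is an iterable of (name, labels_dict, value) tuples.
    Series that share the same `label` value are added together;
    samples without the label are grouped under "" (empty string).
    """
    totals = defaultdict(float)
    for name, labels, value in samples:
        if name == metric:
            totals[labels.get(label, "")] += value
    return dict(totals)


# Hypothetical samples; real label values emitted by Luna may differ.
samples = [
    ("started_node_types_total", {"node_type": "m5.large"}, 3.0),
    ("started_node_types_total", {"node_type": "g4dn.xlarge"}, 1.0),
    ("started_node_types_total", {"node_type": "m5.large"}, 2.0),
    ("scale_actions_total", {}, 6.0),
]
print(sum_by_label(samples, "started_node_types_total", "node_type"))
# {'m5.large': 5.0, 'g4dn.xlarge': 1.0}
```

In Prometheus itself, the equivalent aggregation would be written in PromQL with `sum by (node_type)`, which is preferable when the metrics are already being collected there.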