Metrics Pattern Usage Guide¶

This guide demonstrates how to use the new standardized metrics collection pattern that enables automatic Prometheus discovery.

Overview¶

The standardized metrics pattern eliminates manual Prometheus configuration by allowing services to declare their metrics endpoints declaratively. When a service enables metrics.enable = true, it automatically appears in the Prometheus scrape configuration.

Basic Usage¶

1. Import Shared Types¶

In your service module, import the shared type definitions:

{ config, lib, pkgs, ... }:
let
  cfg = config.modules.services.myservice;
  # Import shared type definitions
  sharedTypes = import ../../lib/types.nix { inherit lib; };
in
{
  options.modules.services.myservice = {
    enable = lib.mkEnableOption "My Service";

    # Add standardized metrics submodule
    metrics = lib.mkOption {
      type = lib.types.nullOr sharedTypes.metricsSubmodule;
      default = {
        enable = true;
        port = 9100;
        path = "/metrics";
        labels = {
          service_type = "monitoring";
          exporter = "node";
        };
      };
      description = "Prometheus metrics collection configuration";
    };
  };

  config = lib.mkIf cfg.enable {
    # Your service implementation here
    # Metrics auto-registration happens automatically
  };
}

2. Service Implementation Example¶

Here's how the Glances service was migrated to use the new pattern:

# Before (manual configuration required)
modules.services.caddy.virtualHosts.glances = {
  enable = true;
  hostName = "glances.holthome.net";
  proxyTo = "localhost:61208";
};

# After (automatic registration)
modules.services.glances = {
  enable = true;
  metrics = {
    enable = true;
    port = 61208;
    path = "/api/3/metrics";
    labels = {
      service_type = "system_monitoring";
      exporter = "glances";
    };
  };
  reverseProxy = {
    enable = true;
    hostName = "glances.holthome.net";
    backend = {
      port = 61208;
    };
  };
};

Available Options¶

Metrics Submodule Options¶

metrics = {
  enable = true;                    # Enable metrics collection
  port = 9090;                      # Metrics endpoint port
  path = "/metrics";                # HTTP path (default: /metrics)
  interface = "127.0.0.1";          # Bind interface (default: 127.0.0.1)
  scrapeInterval = "60s";           # How often to scrape (default: 60s)
  scrapeTimeout = "10s";            # Scrape timeout (default: 10s)

  labels = {                        # Additional static labels
    service_type = "database";
    team = "infrastructure";
    environment = "production";
  };

  relabelConfigs = [                # Advanced relabeling
    {
      source_labels = [ "__name__" ];
      regex = "^go_.*";
      action = "drop";              # Drop Go runtime metrics
    }
  ];
};

Reverse Proxy Submodule Options¶

reverseProxy = {
  enable = true;
  hostName = "service.holthome.net";

  backend = {
    scheme = "http";                # or "https"
    host = "127.0.0.1";            # Backend host
    port = 8080;                   # Backend port

    tls = {                        # For HTTPS backends
      verify = true;               # Verify TLS cert
      sni = "override.example.com"; # SNI override
      caFile = "/path/to/ca.pem";  # Custom CA
    };
  };

  auth = {                         # Basic authentication
    user = "admin";
    passwordHashEnvVar = "SERVICE_PASSWORD_HASH";
  };

  security = {
    hsts = {
      enable = true;               # Enable HSTS (default: true)
      maxAge = 15552000;          # 6 months (default)
      includeSubDomains = true;   # Include subdomains
      preload = false;            # HSTS preload
    };

    customHeaders = {              # Additional security headers
      "X-Frame-Options" = "SAMEORIGIN";
      "X-Content-Type-Options" = "nosniff";
    };
  };

  extraConfig = ''                 # Additional Caddy directives
    # Custom configuration here
  '';
};

How Auto-Discovery Works¶

1. Service Declaration¶

When a service declares metrics configuration:

modules.services.myapp = {
  enable = true;
  metrics = {
    enable = true;
    port = 8080;
    path = "/metrics";
  };
};

2. Automatic Registration¶

The observability module scans all service configurations and generates Prometheus scrape configs:

# Generated Prometheus configuration
scrape_configs:
  - job_name: "service-myapp"
    static_configs:
      - targets: ["127.0.0.1:8080"]
        labels:
          service: "myapp"
          instance: "luna"
          __metrics_path__: "/metrics"
    scrape_interval: "60s"
    scrape_timeout: "10s"
    metrics_path: "/metrics"

3. Discovery Process¶

Evaluation Time: The observability module calls discoverMetricsTargets config
Scanning: Function scans config.modules.services.* for enabled metrics
Generation: Creates Prometheus scrape configurations automatically
Integration: Configurations are merged with any static targets

Best Practices¶

1. Consistent Labeling¶

Use consistent label schemes across services:

labels = {
  service_type = "database";      # database, web, monitoring, cache
  team = "infrastructure";        # owning team
  environment = "production";     # production, staging, development
  tier = "critical";             # critical, important, standard
};

2. Appropriate Scrape Intervals¶

Choose intervals based on service characteristics:

# High-frequency monitoring (databases, load balancers)
scrapeInterval = "15s";

# Standard monitoring (web services)
scrapeInterval = "60s";

# Low-frequency monitoring (batch jobs)
scrapeInterval = "300s";

3. Security Considerations¶

Always bind metrics to localhost and use reverse proxy for external access:

metrics = {
  interface = "127.0.0.1";  # Never bind to 0.0.0.0
  port = 9090;
};

reverseProxy = {
  enable = true;
  auth = {                  # Require authentication
    user = "monitor";
    passwordHashEnvVar = "METRICS_PASSWORD_HASH";
  };
};

4. Resource Monitoring¶

Include resource-related labels for capacity planning:

labels = {
  resource_tier = "high_memory";    # high_memory, high_cpu, standard
  scaling_group = "web_servers";    # logical grouping for scaling
};

Troubleshooting¶

1. Service Not Appearing in Prometheus¶

Check if metrics are properly configured:

# Verify service has metrics enabled
nix eval .#nixosConfigurations.luna.config.modules.services.myservice.metrics.enable

# Check discovered targets
nix eval .#nixosConfigurations.luna.config.services.prometheus.scrapeConfigs --json

2. Scrape Failures¶

Common issues and solutions:

Connection refused: Check if service is running and port is correct
404 errors: Verify metrics path is correct
Authentication failures: Ensure no auth required for metrics endpoint
Timeout errors: Increase scrapeTimeout or optimize metrics generation

3. Missing Labels¶

Verify label configuration:

# Check service metrics configuration
nix eval .#nixosConfigurations.luna.config.modules.services.myservice.metrics.labels --json

Migration from Manual Configuration¶

Before (Manual Prometheus Config)¶

services.prometheus.scrapeConfigs = [
  {
    job_name = "myservice";
    static_configs = [{
      targets = [ "localhost:8080" ];
    }];
  }
];

After (Automatic Discovery)¶

modules.services.myservice = {
  enable = true;
  metrics = {
    enable = true;
    port = 8080;
  };
};

# Prometheus configuration is generated automatically

This new pattern eliminates configuration drift and ensures all services are consistently monitored without manual intervention.