Modular Design Patterns¶
Last Updated: 2025-12-31
This document establishes standardized design patterns for NixOS service modules based on the refined Caddy and PostgreSQL reference implementations. Following these patterns ensures consistency, maintainability, and type safety across the entire infrastructure configuration.
Design Philosophy¶
Core Principles¶
- Declarative Configuration: Services declare what they need, not how to achieve it
- Type Safety: Use types.submodule for complex configuration structures
- Separation of Concerns: Abstract implementation details from service declarations
- Automatic Integration: Services automatically register with infrastructure systems
- Graceful Migration: Support legacy patterns during transitions
Reference Implementations¶
- Web Services: modules/nixos/services/caddy/default.nix - Structured backend configuration, security options, automatic DNS record generation
- Storage Services: modules/nixos/services/postgresql/ - Database provisioning, secure credential handling, systemd integration
- Observability Services: modules/nixos/services/loki/default.nix, modules/nixos/services/promtail/default.nix - Complete observability stack with standardized patterns
- Monitoring Services: modules/nixos/services/gatus/default.nix - Black-box monitoring with contributory endpoint system and native Prometheus metrics
- Shared Types: lib/types.nix - Centralized type definitions for all standardized submodules
Related Documentation¶
- Monitoring Strategy: docs/monitoring-strategy.md - Black-box vs white-box monitoring principles, service-specific guidance
Cross-Service Contribution Interfaces¶
Shared services expose dedicated integration points so downstream modules can declaratively contribute resources without patching implementation details:
- Grafana (modules/nixos/services/grafana/default.nix)
  - Use modules.services.grafana.integrations.<name> to bundle datasources, dashboards, and LoadCredential entries.
  - Each integration can provide a map of datasources plus dashboard providers, and the module handles YAML provisioning + credential wiring automatically.
  - Example: modules.services.grafana.integrations.teslamate.datasources.teslamate = { ... }; (see the TeslaMate module for a real reference).
- PostgreSQL → Grafana bridge
  - Any database may declare grafanaDatasources = [ { ... } ] under modules.services.postgresql.databases.<dbName> (sketched below).
  - Entries capture datasource metadata (host, user, UID, Timescale toggles) and optional dashboard directories; the module emits the correct Grafana integration and attaches password files safely via systemd credentials.
- EMQX MQTT broker
  - Global ACLs live in modules.services.emqx.aclRules, and downstream services can append via modules.services.emqx.integrations.<service>.acls.
  - MQTT users can also be added through the same integration attribute, keeping per-service credentials close to their owners.
  - The module now materializes authz.conf and sets EMQX_AUTHORIZATION__* automatically whenever rules are declared.
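For illustration, a minimal hedged sketch of the PostgreSQL → Grafana bridge; the attribute names inside the datasource entry (name, host, user, uid, timescale) are assumptions based on the metadata listed above, not verbatim module options:
# Hedged sketch: declaring a Grafana datasource alongside a database.
# Field names inside the entry are illustrative assumptions.
modules.services.postgresql.databases.teslamate = {
grafanaDatasources = [
{
name = "TeslaMate";         # datasource display name (assumed)
host = "127.0.0.1:5432";    # database host (assumed)
user = "grafana_ro";        # read-only role for Grafana (assumed)
uid = "teslamate-postgres"; # stable datasource UID (assumed)
timescale = true;           # Timescale toggle (assumed)
}
];
};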
Thin Orchestrator Pattern (Multi-Service Stacks)¶
When multiple services work together as a cohesive stack (e.g., observability, monitoring), use a thin orchestrator pattern rather than a "god module" that re-exposes all options.
Anti-Pattern: God Module ❌¶
Don't create meta-modules that proxy every option from underlying services:
# BAD: God module that re-exposes 90% of underlying options
options.modules.services.observability = {
loki = {
port = mkOption { ... };
retentionDays = mkOption { ... };
storagePath = mkOption { ... };
# ... 50 more options copied from loki module
};
grafana = {
port = mkOption { ... };
oidc = mkOption { ... };
# ... 100 more options copied from grafana module
};
};
Problems:
- Dual maintenance burden (options defined twice)
- Documentation divergence
- Type synchronization issues
- Makes underlying modules harder to use directly
Correct Pattern: Thin Orchestrator ✅¶
A thin orchestrator only provides:
1. Master enable toggle - Turn the whole stack on/off
2. Component toggles - Enable/disable individual components
3. Cross-cutting wiring - Connections that span services (e.g., Promtail → Loki URL)
4. Stack-level concerns - Auto-discovery, shared alerts
# GOOD: Thin orchestrator that wires services together
options.modules.services.observability = {
enable = mkEnableOption "observability stack";
# Component toggles - no option re-exposure
loki.enable = mkOption { type = types.bool; default = cfg.enable; };
promtail.enable = mkOption { type = types.bool; default = cfg.enable; };
grafana.enable = mkOption { type = types.bool; default = cfg.enable; };
prometheus.enable = mkOption { type = types.bool; default = false; };
# Stack-level concern: auto-discovery of metrics endpoints
autoDiscovery.enable = mkOption { type = types.bool; default = true; };
# Stack-level concern: shared alerting rules
alerts.enable = mkOption { type = types.bool; default = true; };
};
config = mkIf cfg.enable {
# Enable individual modules - they configure themselves
modules.services.loki.enable = cfg.loki.enable;
modules.services.promtail.enable = cfg.promtail.enable;
modules.services.grafana.enable = cfg.grafana.enable;
# Cross-cutting wiring: connect Promtail to Loki
modules.services.promtail.lokiUrl = mkIf (cfg.promtail.enable && cfg.loki.enable)
"http://127.0.0.1:${toString config.modules.services.loki.port}";
# Cross-cutting wiring: auto-configure Grafana datasources
modules.services.grafana.autoConfigure = {
loki = lib.mkDefault cfg.loki.enable;
prometheus = lib.mkDefault cfg.prometheus.enable;
};
};
When to Use Thin Orchestrators¶
✅ Use a thin orchestrator when:
- Multiple services form a logical stack (observability, media automation)
- Services need wiring between each other
- You want a simple "enable the whole stack" toggle
- Cross-cutting concerns like auto-discovery need coordination

❌ Don't create orchestrators when:
- Services are independent (each service module stands alone)
- No cross-service wiring is needed
- A simple host-level config suffices
Reference Implementation¶
- Observability Stack: modules/nixos/services/observability/default.nix
  - ~190 lines (vs 876 lines in the previous god-module version)
  - Enables: Loki, Promtail, Grafana, optionally Prometheus
  - Provides: Auto-discovery of metrics endpoints, stack-level alerts
  - Wires: Promtail → Loki, Grafana datasources
Host-Level Customization¶
Users who need to customize individual services configure them directly:
# Host config: Enable stack with thin orchestrator
modules.services.observability.enable = true;
# Customize individual services directly (not through orchestrator)
modules.services.loki.retention = 30;
modules.services.grafana.oidc = { ... };
modules.services.promtail.extraScrapeConfigs = [ ... ];
This provides the best of both worlds:
- Simple enable for common use cases
- Full control when customization is needed
- No option duplication between modules
Creating New Service Modules¶
Native vs Container Decision¶
CRITICAL PRINCIPLE: Always prefer native NixOS services over containerized implementations when available.
Decision Framework¶
When adding a new service, follow this priority order:
1. Check for a native NixOS module (search.nixos.org/options)
   - ✅ PREFERRED: Wrap the native module with homelab patterns
   - Example: Gatus has services.gatus - use this instead of a container
   - Benefits: Better NixOS integration, easier updates, no container overhead
2. If a native module doesn't exist, check if upstream provides one
   - Sometimes services have NixOS modules in their own repos
   - Consider contributing the module to nixpkgs
3. Only use containers when:
   - No native NixOS module exists or is practical
   - The service explicitly requires containerization (security isolation)
   - Rapid prototyping before creating a native module
Architecture Example: Gatus¶
The Gatus module demonstrates the preferred approach:
Native Module with Contributory Pattern:
- Wraps native services.gatus NixOS module
- Adds homelab-specific contribution system
- Services register endpoints declaratively
Key Architecture:
# Wrapper around native module adds homelab patterns
config = mkIf cfg.enable {
# Enable native NixOS service
services.gatus = {
enable = true;
settings = {
web.port = cfg.port;
# Endpoints contributed by other services
};
};
# Add homelab integrations
# - ZFS storage management
# - Backup integration
# - Reverse proxy registration
# - Native Prometheus metrics
# - Preseed/DR capability
};
Contributory Pattern:
# Services register themselves with Gatus
modules.services.gatus.contributions.plex = {
name = "Plex";
group = "Media";
url = "https://plex.holthome.net/web/index.html";
interval = "60s";
conditions = [ "[STATUS] == 200" ];
};
Benefits of Native Approach:
- ✅ Simpler implementation (46% less code)
- ✅ No Podman dependency
- ✅ Better systemd integration
- ✅ Automatic NixOS updates (nix flake update)
- ✅ Native privilege management (no container user mapping)
- ✅ Direct filesystem access (no volume mounts)
Service Module Creation Workflow¶
Step 1: Research & Discovery¶
1. Search for a native NixOS module
2. Evaluate the existing module (if found):
   - Does it provide sufficient configuration options?
   - Is it actively maintained?
   - Does it follow modern NixOS patterns?
3. Decision point:
   - Native module exists and is sufficient → Wrapper approach (preferred)
   - Native module incomplete/outdated → Contribute fixes or full implementation
   - No native module → Container or custom implementation
Step 2: Module Structure¶
Create your module in modules/nixos/services/<service-name>/:
{ config, lib, pkgs, mylib, ... }:
let
cfg = config.modules.services.<service-name>;
# Use mylib for shared types (injected via _module.args)
sharedTypes = mylib.types;
# Storage helpers for ZFS dataset management (requires pkgs)
storageHelpers = mylib.storageHelpers pkgs;
in
{
options.modules.services.<service-name> = {
enable = lib.mkEnableOption "<service-name>";
# Add standardized submodules (choose applicable ones)
reverseProxy = lib.mkOption {
type = lib.types.nullOr sharedTypes.reverseProxySubmodule;
default = null;
description = "Reverse proxy configuration";
};
metrics = lib.mkOption {
type = lib.types.nullOr sharedTypes.metricsSubmodule;
default = null;
description = "Prometheus metrics collection";
};
logging = lib.mkOption {
type = lib.types.nullOr sharedTypes.loggingSubmodule;
default = null;
description = "Log shipping configuration";
};
backup = lib.mkOption {
type = lib.types.nullOr sharedTypes.backupSubmodule;
default = null;
description = "Backup configuration";
};
notifications = lib.mkOption {
type = lib.types.nullOr sharedTypes.notificationSubmodule;
default = null;
description = "Notification channels";
};
# Add preseed for disaster recovery (if stateful)
preseed = {
enable = lib.mkEnableOption "automatic restore before service start";
repositoryUrl = lib.mkOption {
type = lib.types.str;
description = "URL to Restic repository";
};
# ... (see preseed pattern below)
};
};
config = lib.mkIf cfg.enable {
# Service implementation here
};
}
Host-Level Contribution Rule¶
When a host (for example hosts/forge/services/<name>.nix) enables a service module and contributes additional infrastructure resources—ZFS datasets, Sanoid definitions, alert rules, backup jobs, Cloudflare tunnels, etc.—every one of those contributions must be wrapped in a guard tied to the service's enable flag. The canonical pattern is:
let
serviceEnabled = config.modules.services.<service>.enable or false;
in
{
config = lib.mkMerge [
{ modules.services.<service> = { enable = true; ... }; }
(lib.mkIf serviceEnabled {
modules.storage.datasets.services.<service> = { ... };
modules.backup.sanoid.datasets."tank/services/<service>" = { ... };
modules.alerting.rules."<service>-service-down" = { ... };
modules.services.caddy.virtualHosts.<service>.cloudflare = { ... };
})
];
}
This ensures that disabling a service automatically disables all downstream infrastructure so we never create orphaned datasets, alerts, or backup jobs.
Host-Level Defaults Libraries¶
For hosts with many services following similar patterns, create a centralized defaults library to reduce duplication. This is particularly useful for:
- Standard Sanoid/Syncoid replication configurations
- Common alert patterns (service-down, systemd-down)
- Backup repository configurations
- Authentication/security policies
Reference Implementation: hosts/forge/lib/defaults.nix
# hosts/<hostname>/lib/defaults.nix
{ config, lib }:
let
resticEnabled = (config.modules.backup.enable or false)
&& (config.modules.backup.restic.enable or false);
in
{
# Standard backup configuration
backup = {
enable = true;
repository = "nas-primary";
};
# ZFS replication helper
mkSanoidDataset = serviceName: {
useTemplate = [ "services" ];
recursive = false;
autosnap = true;
autoprune = true;
replication = {
targetHost = "nas-1.example.com";
targetDataset = "backup/${config.networking.hostName}/zfs-recv/${serviceName}";
# ... replication options
};
};
# Container service-down alert helper
mkServiceDownAlert = serviceName: displayName: description: {
type = "promql";
alertname = "${displayName}ServiceDown";
expr = ''container_service_active{name="${serviceName}"} == 0'';
for = "2m";
severity = "high";
labels = { service = serviceName; category = "availability"; };
annotations = {
summary = "${displayName} service is down on {{ $labels.instance }}";
description = "The ${displayName} ${description} service is not active.";
command = "systemctl status podman-${serviceName}.service";
};
};
# Systemd service-down alert helper (for native services)
mkSystemdServiceDownAlert = serviceName: displayName: description: {
type = "promql";
alertname = "${displayName}ServiceDown";
expr = ''node_systemd_unit_state{name="${serviceName}.service",state="active"} == 0'';
for = "2m";
severity = "high";
labels = { service = serviceName; category = "availability"; };
annotations = {
summary = "${displayName} service is down on {{ $labels.instance }}";
description = "The ${displayName} ${description} service is not active.";
command = "systemctl status ${serviceName}.service";
};
};
# Preseed/DR configuration (auto-gated by restic)
mkPreseed = restoreMethods: lib.mkIf resticEnabled {
enable = true;
repositoryUrl = "/mnt/nas-backup";
passwordFile = config.sops.secrets."restic/password".path;
restoreMethods = restoreMethods;
};
}
Usage in Service Files:
{ config, lib, ... }:
let
forgeDefaults = import ../lib/defaults.nix { inherit config lib; };
serviceEnabled = config.modules.services.myapp.enable or false;
in
{
config = lib.mkMerge [
{
modules.services.myapp = {
enable = true;
backup = forgeDefaults.backup;
preseed = forgeDefaults.mkPreseed [ "syncoid" "local" "restic" ];
};
}
(lib.mkIf serviceEnabled {
modules.backup.sanoid.datasets."tank/services/myapp" =
forgeDefaults.mkSanoidDataset "myapp";
modules.alerting.rules."myapp-service-down" =
forgeDefaults.mkServiceDownAlert "myapp" "MyApp" "application";
})
];
}
Key Benefits:
- Reduces boilerplate from ~15-20 lines to 1-2 lines per concern
- Ensures consistency across all services on a host
- Single point of change for host-specific infrastructure patterns
- Separates "what the module does" from "how this host deploys it"

When to Create a Defaults Library:
- Host has 5+ services with similar patterns
- Multiple services share the same backup/replication target
- Alert patterns are standardized across services
- Authentication policies are consistent

When NOT to Use Helpers:
- Service has unique, complex alert expressions
- Backup requires custom retention or exclusion patterns
- Replication needs special handling (encrypted sends, different targets)
Step 3: Native Wrapper Pattern (PREFERRED)¶
When wrapping a native NixOS module:
config = lib.mkIf cfg.enable {
# 1. Enable and configure native service
services.<service-name> = {
enable = true;
# Pass through relevant configuration
# Keep it minimal - let native module handle defaults
};
# 2. Override systemd service if needed (for ZFS, etc.)
systemd.services."<service-name>" = {
# Add dependencies
after = [ "zfs-mount.service" ];
requires = [ "zfs-mount.service" ];
# Override user/permissions if needed
serviceConfig = {
User = lib.mkForce "<service-user>";
Group = lib.mkForce "<service-group>";
StateDirectory = lib.mkForce ""; # Disable if using ZFS
ReadWritePaths = [ cfg.dataDir ]; # For sandboxing
};
};
# 3. Add ZFS storage management
systemd.services."ensure-<service-name>-storage" =
storageHelpers.mkZfsStorageService {
dataset = "tank/services/<service-name>";
mountpoint = cfg.dataDir;
owner = "<service-user>";
properties = {
recordsize = "16K"; # Optimize for workload
compression = "zstd";
};
};
# 4. Add homelab integrations (backup, monitoring, etc.)
# See subsequent patterns below
};
Step 4: Add Homelab Integrations¶
Follow the standardized submodule patterns (a hedged sketch follows this list):
1. Reverse Proxy (if web service)
2. Monitoring (see Monitoring Strategy doc):
   - Add Gatus endpoint contribution (user-facing check)
   - Configure Prometheus alerts (system health)
   - Add systemd health check if needed
3. Backup (if stateful)
4. Preseed/DR (for critical services)
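A hedged host-level sketch of these integrations for a hypothetical myapp service; the hostname, repository name, and check values are illustrative assumptions:
# Hedged sketch: homelab integrations for a hypothetical "myapp" service
modules.services.myapp = {
enable = true;
# Reverse proxy (web service)
reverseProxy = {
enable = true;
hostName = "myapp.example.com";
};
# Backup (stateful service)
backup = {
enable = true;
repository = "nas-primary";
frequency = "daily";
};
};
# Monitoring: contribute a user-facing Gatus check
modules.services.gatus.contributions.myapp = {
name = "MyApp";
group = "Applications";
url = "https://myapp.example.com";
interval = "60s";
conditions = [ "[STATUS] == 200" ];
};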
Step 5: Testing & Validation¶
1. Build the configuration
2. Deploy and verify
3. Validate integrations:
   - Reverse proxy: curl https://<service>.domain.tld
   - Backup: Check Restic snapshots
   - Monitoring: Verify Prometheus scrape targets
   - Logs: Check Loki for service logs
Step 6: Documentation¶
Add inline comments explaining:
- Why the native vs container choice was made
- Any workarounds or special considerations
- Dependencies and assumptions
- References to relevant design pattern docs
Common Patterns by Service Type¶
Web Application¶
- ✅ Reverse proxy (Caddy)
- ✅ Gatus health check contribution
- ✅ Backup (if stores data)
- ⚠️ Metrics (only if critical)
Database¶
- ✅ Backup with snapshots
- ✅ Metrics (postgres_exporter, etc.)
- ✅ Preseed/DR capability
- ✅ TCP health check (optional)
Infrastructure Service¶
- ✅ Systemd monitoring (Prometheus)
- ✅ Metrics (node_exporter or custom)
- ⚠️ Backup (if configuration is critical)
Monitoring Service¶
- ✅ Systemd health check
- ✅ Prometheus monitoring (meta-monitoring)
- ❌ NO recursive monitoring (avoid complexity)
Anti-Patterns to Avoid¶
❌ Don't create a container version without checking for a native module
- Always search nixpkgs first
- Containers should be a last resort

❌ Don't duplicate functionality that exists in native modules
- Use native module features when available
- Only override what you need to change

❌ Don't skip standardized submodules
- Every service should use applicable patterns
- Consistency makes maintenance easier

❌ Don't create per-service backup scripts
- Use the unified backup system
- Declare backup needs, don't implement them

❌ Don't implement custom metric exporters
- Use existing exporters when available
- Consider whether metrics are actually needed (see Monitoring Strategy)
Migration Path for Existing Containers¶
If you have existing container-based services:
- Check for native module (may not have existed when originally deployed)
- Evaluate migration effort vs benefits
- Save the container version as a .container-backup file
- Implement a native wrapper with the same functionality
- Test thoroughly before removing the container
- Document the architecture change with reasoning
Shared Types Library¶
All standardized submodule types are centralized in lib/types/ (with lib/types.nix as compatibility wrapper) to ensure consistency and reusability across services.
Import Pattern¶
# Preferred: Use mylib (injected via _module.args)
sharedTypes = mylib.types;
# Alternative: Direct import (for non-module contexts)
sharedTypes = import ../../../lib/types.nix { inherit lib; };
Available Shared Types¶
- sharedTypes.reverseProxySubmodule - Reverse proxy integration with TLS backend support
- sharedTypes.metricsSubmodule - Prometheus metrics collection with advanced labeling
- sharedTypes.loggingSubmodule - Log shipping with multiline parsing, regex support, and container driver config
- sharedTypes.backupSubmodule - Backup integration with retention policies
- sharedTypes.notificationSubmodule - Notification channels with escalation
- sharedTypes.containerResourcesSubmodule - Container resource management
- sharedTypes.datasetSubmodule - ZFS dataset configuration (recordsize, compression, properties)
- sharedTypes.healthcheckSubmodule - Container healthcheck configuration (interval, timeout, retries)
Standardized Submodule Patterns¶
1. Reverse Proxy Integration¶
All web services should use the shared reverse proxy type for consistent Caddy integration.
Implementation Pattern¶
# Use shared type instead of inline definition
reverseProxy = mkOption {
type = types.nullOr sharedTypes.reverseProxySubmodule;
default = null;
description = "Reverse proxy configuration for this service";
};
Auto-Registration Implementation¶
config = mkIf cfg.enable {
# Automatic Caddy registration
modules.services.caddy.virtualHosts."${serviceName}" = mkIf (cfg.reverseProxy != null) {
enable = cfg.reverseProxy.enable;
hostName = cfg.reverseProxy.hostName;
backend = cfg.reverseProxy.backend;
auth = cfg.reverseProxy.auth;
};
};
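On the host side, only the submodule itself is declared. A minimal hedged sketch for a hypothetical myapp service; the backend field names are assumptions based on the structured backend configuration described for Caddy:
# Hedged sketch: host-level reverse proxy declaration for "myapp"
modules.services.myapp.reverseProxy = {
enable = true;
hostName = "myapp.example.com";
backend = {
host = "127.0.0.1"; # assumed field name
port = 8080;        # assumed field name
};
};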
2. Metrics Collection Pattern¶
Services that expose metrics should use the shared metrics type for automatic Prometheus integration.
Implementation Pattern¶
# Use shared type with service-specific defaults
metrics = mkOption {
type = types.nullOr sharedTypes.metricsSubmodule;
default = {
enable = true;
port = 9090; # Service-specific port
path = "/metrics";
labels = {
service_type = "database";
exporter = "postgres";
function = "storage";
};
};
description = "Prometheus metrics collection configuration";
};
Auto-Registration Implementation¶
# No explicit registration required - the observability module automatically
# scans all enabled services under `config.modules.services.*` for metrics submodules
config = mkIf cfg.enable {
# Services are automatically discovered when they define a metrics submodule
# The observability module uses discoverMetricsTargets() to find all services with:
# - (service.metrics or null) != null
# - (service.metrics.enable or false) == true
# Generated Prometheus scrape config will include:
# - job_name: "service-${serviceName}"
# - targets: ["${interface}:${port}"]
# - metrics_path: "${path}"
# - scrape_interval: "${scrapeInterval}"
# - labels: service.metrics.labels + { service = serviceName; instance = hostName; }
};
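As a hedged illustration, a hypothetical myapp service opting into metrics collection; the observability module then discovers it with no manual scrape configuration (field names follow the metrics submodule shown above):
# Hedged sketch: metrics declaration that auto-discovery will pick up
modules.services.myapp.metrics = {
enable = true;
port = 9091; # exporter port, distinct from the service port
path = "/metrics";
labels = {
service_type = "application";
};
};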
3. Log Shipping Pattern¶
Services that produce logs should use the shared logging type for automatic Promtail/Loki integration.
Implementation Pattern¶
# Use shared type with enhanced parsing capabilities
logging = mkOption {
type = types.nullOr sharedTypes.loggingSubmodule;
default = {
enable = true;
journalUnit = "${serviceName}.service";
labels = {
service = serviceName;
service_type = "application";
};
parseFormat = "json"; # or "logfmt", "regex", "multiline", "none"
};
description = "Log shipping configuration";
};
# Advanced parsing example
logging = mkOption {
type = types.nullOr sharedTypes.loggingSubmodule;
default = {
enable = true;
logFiles = [ "/var/log/app/error.log" ];
parseFormat = "multiline";
multilineConfig = {
firstLineRegex = "^\\d{4}-\\d{2}-\\d{2} \\d{2}:\\d{2}:\\d{2}";
maxWaitTime = "3s";
};
};
};
Auto-Registration Implementation¶
# Logging auto-registration follows the same pattern as metrics
# The observability module automatically discovers services with logging submodules
config = mkIf cfg.enable {
# Promtail automatically discovers services with logging.enable = true
# Generated configuration includes:
# - job_name: "service-${serviceName}"
# - journalUnit or logFiles based on configuration
# - labels: service.logging.labels + { service = serviceName; }
# - parseFormat for structured log processing
};
4. Unified Backup Integration Pattern ✅ UPDATED 2025-10-29¶
All stateful services should use the unified backup system with the shared backup type for consistent policy management and automatic discovery.
Implementation Pattern¶
# Use shared type with service-specific configuration
backup = mkOption {
type = types.nullOr sharedTypes.backupSubmodule;
default = {
enable = true;
repository = "nas-primary";
frequency = "daily";
tags = [ "service-type" "service-name" "data-category" ];
useSnapshots = false; # Opt-in for ZFS snapshot coordination
excludePatterns = [
"**/cache/**"
"**/tmp/**"
"**/*.log"
];
};
description = "Backup configuration for unified backup system";
};
ZFS Snapshot Integration (Opt-in)¶
For services that need consistent backups (databases, applications with locks, etc.), enable ZFS snapshot coordination:
# For services requiring snapshot consistency (databases, locked files)
backup = {
enable = true;
repository = "nas-primary";
useSnapshots = true; # Enable snapshot coordination
zfsDataset = "tank/services/myservice"; # Required when useSnapshots=true
frequency = "daily";
tags = [ "database" "myservice" "critical" ];
excludePatterns = [
"**/*.log" # Exclude logs from snapshots
"**/cache/**" # Exclude cache directories
];
};
Working Example (from dispatcharr service):
backup = {
enable = true;
repository = "nas-primary";
useSnapshots = true;
zfsDataset = "tank/services/dispatcharr";
frequency = "daily";
tags = [ "iptv" "dispatcharr" "application" ];
};
When to Use Snapshots:
- ✅ Database services (PostgreSQL, SQLite) - consistency critical
- ✅ Application state (Sonarr, Radarr, etc.) - avoid corrupt configs
- ✅ Locked files - services that may have open file handles
- ❌ Static content (Plex media) - snapshots add unnecessary overhead
- ❌ Read-only data - content that doesn't change during backup
Auto-Discovery Implementation¶
# No manual registration required!
# The unified backup system automatically discovers services with backup submodules
# Services simply declare their backup needs, system handles the rest
Key Changes from Legacy Pattern:
- ✅ Automatic Discovery: No manual job registration needed
- ✅ Opt-in Snapshots: Services declare useSnapshots = true when needed
- ✅ Unified Monitoring: All metrics flow through textfile collector
- ✅ Enterprise Verification: Automated integrity checks and restore testing
Critical Configuration Updates Required:
Services that handle databases, application state, or have file locking must enable snapshot coordination:
# Update these service backup configurations:
# 1. Sonarr - SQLite database, configuration files
modules.services.sonarr.backup = {
enable = true;
repository = "nas-primary";
useSnapshots = true; # REQUIRED
zfsDataset = "tank/services/sonarr"; # REQUIRED
excludePatterns = [ "**/*.log" "**/cache/**" ];
};
# 2. Loki - Database and indexes
modules.services.loki.backup = {
enable = true;
repository = "nas-primary";
useSnapshots = true; # REQUIRED
zfsDataset = "tank/services/loki"; # REQUIRED
excludePatterns = [ "**/*.tmp" ];
};
Migration Guide: See /docs/unified-backup-design-patterns.md for complete implementation details.
5. Notification Integration Pattern¶
Services should use the shared notification type for consistent alerting and status reporting.
Implementation Pattern¶
# Use shared type with escalation support
notifications = mkOption {
type = types.nullOr sharedTypes.notificationSubmodule;
default = {
enable = true;
channels = {
onFailure = [ "critical-alerts" "team-slack" ];
onBackup = [ "backup-status" ];
onHealthCheck = [ "monitoring-alerts" ];
};
customMessages = {
failure = "${serviceName} service failed on ${config.networking.hostName}";
backup = "${serviceName} backup completed on ${config.networking.hostName}";
};
escalation = {
afterMinutes = 15;
channels = [ "on-call-pager" ];
};
};
description = "Notification configuration";
};
6. Container Resource Management Pattern¶
Containerized services should use systemd resource limits and the shared container resources type.
Implementation Pattern¶
# For systemd services with resource limits
resources = mkOption {
type = types.attrsOf types.str;
default = {
MemoryMax = "512M";
MemoryReservation = "256M";
CPUQuota = "50%";
};
description = "Systemd resource limits";
};
# For Podman containers, use shared container resources type
container = mkOption {
type = types.submodule {
options = {
resources = mkOption {
type = sharedTypes.containerResourcesSubmodule;
default = {
memory = "512m";
memoryReservation = "256m";
cpus = "1.0";
cpuQuota = "50%";
};
};
# Additional container-specific options...
};
};
};
Implementation Guidelines¶
Module Structure Template¶
Every service module should follow this structure:
# Service module template using shared types
{ config, lib, pkgs, mylib, ... }:
let
inherit (lib) mkOption mkEnableOption mkIf types;
cfg = config.modules.services.<service>;
serviceName = "<service>";
# Use mylib for shared types (injected via _module.args)
sharedTypes = mylib.types;
in
{
options.modules.services.<service> = {
enable = mkEnableOption "<service> service";
# Core service configuration
dataDir = mkOption {
type = types.path;
default = "/var/lib/${serviceName}";
description = "Data directory";
};
port = mkOption {
type = types.port;
description = "Service port";
};
package = mkOption {
type = types.package;
description = "Package providing the service binary (used by ExecStart below)";
};
# Systemd resource limits
resources = mkOption {
type = types.attrsOf types.str;
default = {
MemoryMax = "256M";
CPUQuota = "25%";
};
description = "Systemd resource limits";
};
# Standardized integration submodules using shared types
reverseProxy = mkOption {
type = types.nullOr sharedTypes.reverseProxySubmodule;
default = null;
description = "Reverse proxy configuration";
};
metrics = mkOption {
type = types.nullOr sharedTypes.metricsSubmodule;
default = null;
description = "Prometheus metrics collection";
};
logging = mkOption {
type = types.nullOr sharedTypes.loggingSubmodule;
default = null;
description = "Log shipping configuration";
};
backup = mkOption {
type = types.nullOr sharedTypes.backupSubmodule;
default = null;
description = "Backup configuration";
};
notifications = mkOption {
type = types.nullOr sharedTypes.notificationSubmodule;
default = null;
description = "Notification configuration";
};
# ZFS integration pattern
zfs = {
dataset = mkOption {
type = types.nullOr types.str;
default = null;
example = "tank/services/${serviceName}";
description = "ZFS dataset to mount at dataDir";
};
properties = mkOption {
type = types.attrsOf types.str;
default = {
compression = "zstd";
atime = "off";
"com.sun:auto-snapshot" = "true";
};
description = "ZFS dataset properties";
};
};
};
config = mkIf cfg.enable {
# ZFS dataset configuration
modules.storage.datasets.services.${serviceName} = mkIf (cfg.zfs.dataset != null) {
mountpoint = cfg.dataDir;
properties = cfg.zfs.properties;
owner = serviceName;
group = serviceName;
mode = "0750";
};
# Core service implementation
systemd.services."${serviceName}" = {
description = "<Service> service";
wantedBy = [ "multi-user.target" ];
after = [ "network.target" ] ++ lib.optionals (cfg.zfs.dataset != null) [ "zfs-mount.service" ];
wants = lib.optionals (cfg.zfs.dataset != null) [ "zfs-mount.service" ];
serviceConfig = {
ExecStart = "${cfg.package}/bin/${serviceName}";
Restart = "always";
User = serviceName;
Group = serviceName;
# Security hardening
ProtectSystem = "strict";
ProtectHome = true;
PrivateTmp = true;
NoNewPrivileges = true;
# Resource limits (MemoryMax, CPUQuota, ...) merged directly from cfg.resources
} // cfg.resources;
};
# User/group creation
users.users."${serviceName}" = {
isSystemUser = true;
group = serviceName;
home = "/var/empty"; # never point home at dataDir (see permission pattern below)
};
users.groups."${serviceName}" = {};
# Auto-registration with infrastructure systems using structured backend configuration
modules.services.caddy.virtualHosts.${serviceName} = mkIf (cfg.reverseProxy != null && cfg.reverseProxy.enable) {
enable = true;
hostName = cfg.reverseProxy.hostName;
# Use structured backend configuration from shared types
backend = cfg.reverseProxy.backend;
# Authentication configuration from shared types
auth = cfg.reverseProxy.auth;
# Security configuration from shared types
security = cfg.reverseProxy.security;
# Additional configuration
extraConfig = cfg.reverseProxy.extraConfig;
};
# Metrics auto-registration happens automatically via observability module
# No explicit configuration needed - the observability module scans all
# services under modules.services.* for metrics submodules
# Firewall configuration (localhost only)
networking.firewall = {
interfaces.lo.allowedTCPPorts = [ cfg.port ]
++ lib.optional (cfg.metrics != null && cfg.metrics.enable) cfg.metrics.port;
};
# Directory ownership (if not using ZFS dataset)
systemd.tmpfiles.rules = lib.mkIf (cfg.zfs.dataset == null) [
"d ${cfg.dataDir} 0755 ${serviceName} ${serviceName} -"
];
};
}
Validation and Assertions¶
Every module should include comprehensive validation:
config = mkIf cfg.enable {
assertions = [
{
assertion = cfg.reverseProxy == null || cfg.reverseProxy.backend.port == cfg.port;
message = "Reverse proxy backend port must match service port";
}
{
assertion = cfg.metrics == null || cfg.metrics.port != cfg.port;
message = "Metrics port must be different from service port";
}
# Additional validations...
];
};
7. ZFS Integration Pattern¶
Services with persistent storage should use the ZFS dataset pattern for optimized storage management.
Implementation Pattern¶
zfs = {
dataset = mkOption {
type = types.nullOr types.str;
default = null;
example = "tank/services/${serviceName}";
description = "ZFS dataset to mount at dataDir";
};
properties = mkOption {
type = types.attrsOf types.str;
default = {
compression = "zstd";
atime = "off";
"com.sun:auto-snapshot" = "true";
};
description = "ZFS dataset properties";
};
};
# Auto-registration with storage module
modules.storage.datasets.services.${serviceName} = mkIf (cfg.zfs.dataset != null) {
mountpoint = cfg.dataDir;
properties = cfg.zfs.properties;
owner = serviceName;
group = serviceName;
mode = "0750";
};
8. Directory and Permission Management Pattern¶
Services must use systemd's native StateDirectory mechanism for directory ownership and permissions. This provides a single source of truth and prevents conflicts between tmpfiles and systemd.
Critical Design Principles¶
✅ DO: Native SystemD Services
- Use StateDirectory + StateDirectoryMode for directory management
- Let systemd create and manage directory ownership
- Set UMask to control file creation permissions
- NO tmpfiles rules for native services
❌ DON'T: Common Mistakes
- Don't use tmpfiles for native systemd services
- Don't set the home directory to the data directory (causes permission reversion)
- Don't mix tmpfiles and StateDirectory
- Don't rely on ZFS dataset properties for permissions
Implementation Pattern for Native Services¶
# Service module configuration
config = mkIf cfg.enable {
# Set user home to /var/empty to prevent activation script interference
users.users.${serviceName} = {
isSystemUser = true;
group = serviceName;
home = lib.mkForce "/var/empty"; # CRITICAL: Prevents 700 permission enforcement
};
# SystemD service configuration
systemd.services.${serviceName} = {
serviceConfig = {
# StateDirectory tells systemd to create /var/lib/${serviceName}
# with ownership set to User:Group
StateDirectory = serviceName;
# StateDirectoryMode sets directory permissions (750 = rwxr-x---)
StateDirectoryMode = "0750";
# UMask ensures files created by service are 640 (rw-r-----)
UMask = "0027";
# User/Group are set by the service (usually from upstream module)
User = serviceName;
Group = serviceName;
};
};
# For ZFS datasets, only specify the dataset - NO owner/group/mode
modules.storage.datasets.services.${serviceName} = mkIf (cfg.zfs.dataset != null) {
mountpoint = cfg.dataDir;
properties = cfg.zfs.properties;
# DO NOT SET: owner, group, mode - these interfere with StateDirectory
};
};
Implementation Pattern for OCI Containers¶
OCI containers don't support StateDirectory, so they must use tmpfiles:
# Service module configuration for OCI containers
config = mkIf cfg.enable {
users.users.${serviceName} = {
isSystemUser = true;
group = serviceName;
home = "/var/empty";
};
# For OCI containers, use ZFS dataset WITH explicit permissions
modules.storage.datasets.services.${serviceName} = mkIf (cfg.zfs.dataset != null) {
mountpoint = cfg.dataDir;
properties = cfg.zfs.properties;
owner = serviceName;
group = serviceName;
mode = "0750";
# Note: OCI containers don't support StateDirectory, so we explicitly set
# permissions via tmpfiles (handled by storage module)
};
};
Storage Module Smart Detection¶
The storage module automatically detects service type and applies correct pattern:
# In storage/datasets.nix
systemd.tmpfiles.rules = lib.flatten (lib.mapAttrsToList (serviceName: serviceConfig:
let
# Check if explicit permissions are set (OCI containers)
hasExplicitPermissions = (serviceConfig.mode or null) != null
&& (serviceConfig.owner or null) != null
&& (serviceConfig.group or null) != null;
# Mountpoint where tmpfiles rules apply (from the dataset declaration)
mountpoint = serviceConfig.mountpoint;
in
if hasExplicitPermissions then [
# OCI containers: Use explicit permissions via tmpfiles
"d \"${mountpoint}\" ${serviceConfig.mode} ${serviceConfig.owner} ${serviceConfig.group} - -"
"z \"${mountpoint}\" ${serviceConfig.mode} ${serviceConfig.owner} ${serviceConfig.group} - -"
] else [
# Native services: No tmpfiles rules - rely on StateDirectory
# (tmpfiles with "-" defaults to root:root which interferes)
]
) cfg.services);
Permission Architecture Summary¶
| Service Type | Directory Creation | Permission Management | Home Directory |
|---|---|---|---|
| Native SystemD | StateDirectory | StateDirectoryMode + UMask | /var/empty |
| Native + ZFS | ZFS dataset | owner/group/mode in dataset config | /var/empty |
| OCI Container | tmpfiles | mode/owner/group in dataset config | /var/empty |
IMPORTANT: When using ZFS datasets, StateDirectory only manages permissions for directories it creates. If a ZFS dataset is already mounted at the path, StateDirectory cannot change its permissions. You must explicitly set owner, group, and mode in the dataset configuration.
Examples from Working Implementations¶
Grafana (Native Service with ZFS):
users.users.grafana.home = lib.mkForce "/var/empty";
systemd.services.grafana.serviceConfig = {
StateDirectory = "grafana";
StateDirectoryMode = "0750";
UMask = "0027";
};
# ZFS dataset - MUST include owner/group/mode since ZFS mountpoint pre-exists
# StateDirectory cannot change permissions on pre-existing directories
modules.storage.datasets.services.grafana = {
mountpoint = "/var/lib/grafana";
owner = "grafana";
group = "grafana";
mode = "0750";
};
Sonarr (OCI Container):
users.users.sonarr.home = "/var/empty";
# No StateDirectory (not supported by OCI)
# ZFS dataset - WITH owner/group/mode
modules.storage.datasets.services.sonarr = {
mountpoint = "/var/lib/sonarr";
owner = "sonarr";
group = "sonarr";
mode = "0750";
# Note: OCI containers don't support StateDirectory
};
Backup User Group Membership¶
For backup integration, ensure the backup user can read service data:
# Add service groups to backup user
users.users.restic-backup.extraGroups = [
"grafana"
"loki"
"plex"
"promtail"
# Add all services that need backup
];
With 750 permissions (rwxr-x---), the backup user (member of service group) can read directories and files created with UMask=0027.
Common Pitfalls and Solutions¶
Problem: Permissions revert to 700 after nixos-rebuild
Cause: User home directory set to data directory
Solution: Set home = lib.mkForce "/var/empty"

Problem: Directory owned by root:root instead of service user
Cause: tmpfiles rule with "-" for user/group
Solution: Remove the tmpfiles rule, use StateDirectory instead

Problem: Backup fails with permission denied
Cause: Files created with 600 permissions (user-only)
Solution: Add exclude patterns for security-sensitive files

Problem: Service fails to start after migration
Cause: StateDirectory not set, directory doesn't exist
Solution: Add StateDirectory = serviceName to serviceConfig
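Taken together, the recurring fixes above reduce to a few lines. A minimal hedged sketch for a hypothetical myservice unit:
# Hedged sketch consolidating the fixes above
users.users.myservice.home = lib.mkForce "/var/empty"; # prevents permission reversion
systemd.services.myservice.serviceConfig = {
StateDirectory = "myservice"; # systemd creates /var/lib/myservice
StateDirectoryMode = "0750";  # group-readable (backup user)
UMask = "0027";               # files created as 640
};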
Validation Checklist¶
After implementing directory management:
# 1. Check StateDirectory configuration
systemctl show <service>.service -p StateDirectory -p StateDirectoryMode
# 2. Verify directory ownership
ls -ld /var/lib/<service> # Should be <service>:<service> drwxr-x---
# 3. Check user home directory
getent passwd <service> # Should show /var/empty
# 4. Verify tmpfiles rules
systemd-tmpfiles --cat-config | grep <service>
# Native services: Should NOT have tmpfiles entries
# OCI containers: Should have "d" and "z" entries with explicit permissions
# 5. Test permission persistence
sudo systemctl restart <service>.service
ls -ld /var/lib/<service> # Should still be drwxr-x---
Helper Functions¶
Reusable helper functions in lib/ for common patterns:
- lib/types.nix - ✅ Implemented - Shared type definitions (split into lib/types/*.nix)
- lib/monitoring-helpers.nix - ✅ Implemented - Metrics and alert configuration
- lib/backup-helpers.nix - ✅ Implemented - Backup job generation
- lib/caddy-helpers.nix - ✅ Implemented - Reverse proxy configuration
- modules/nixos/storage/helpers-lib.nix - ✅ Implemented - Storage/preseed helpers (via mylib.storageHelpers pkgs)
Migration Strategy¶
Phase 1: Documentation and Standards ✅ COMPLETED¶
- ✅ Document patterns (this document)
- ✅ Update copilot-instructions.md with pattern requirements
- ✅ Create helper libraries (lib/types.nix, lib/types/*.nix)
Phase 2: Infrastructure Implementation ✅ COMPLETED¶
- ✅ Implement shared type definitions (lib/types.nix)
- ✅ Create standardized Caddy auto-registration
- ✅ Implement observability stack (Loki, Promtail)
- 🔄 Centralized notification system (planned)
- 🔄 Enhanced backup orchestration (planned)
Phase 3: Service Migration 🔄 IN PROGRESS¶
- ✅ Migrated core observability services (Grafana, Loki, Promtail)
- ✅ Migrated web services with reverse proxy patterns
- 🔄 Migrate remaining containerized services
- 🔄 Audit and update legacy service configurations
- 🔄 Deprecate inline type definitions in favor of shared types
Phase 4: Validation and Testing 📋 PLANNED¶
- Implement module validation tests
- Add integration tests for auto-registration
- Document troubleshooting guides
- Create migration validation checklist
Current Migration Status¶
✅ Completed Services:
- Grafana - Full standardized pattern implementation
- Loki - Complete observability stack with ZFS integration
- Promtail - Advanced log shipping with multiline parsing
- Caddy - Structured backend configuration with auto-registration

🔄 In Progress:
- Remaining containerized services (UniFi, Omada, etc.)
- Prometheus/Grafana integration for auto-discovery
- Centralized backup orchestration

📋 Next Priority:
- Migrate all remaining services to use shared types
- Implement centralized observability auto-registration
- Complete notification system integration
Best Practices¶
Type Safety¶
- Always use types.submodule for complex configuration
- Provide clear descriptions and examples
- Use mkEnableOption for boolean features
- Validate configuration with assertions
Security¶
- Default to secure configurations
- Use systemd security directives
- Handle secrets via SOPS and environment variables
- Never expose credentials in process lists or logs
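A minimal hedged sketch of these defaults, assuming sops-nix and a hypothetical myapp unit; the secret name is illustrative:
# Hedged sketch: secure-by-default secret handling for "myapp"
sops.secrets."myapp/env" = { owner = "myapp"; };
systemd.services.myapp.serviceConfig = {
# Secrets arrive via file, never on the command line
EnvironmentFile = config.sops.secrets."myapp/env".path;
# Secure-by-default systemd directives
ProtectSystem = "strict";
ProtectHome = true;
PrivateTmp = true;
NoNewPrivileges = true;
};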
Maintainability¶
- Keep implementation details in helper functions
- Use descriptive option names
- Provide migration paths for breaking changes
- Document all breaking changes in commit messages
Performance¶
- Use lazy evaluation where possible
- Avoid expensive computations in option definitions
- Cache complex derivations
- Consider resource usage of generated configurations
This document serves as the authoritative guide for all new service modules and the target for migrating existing ones.