
Observability Without Datadog: A $20/Month Stack

By DevOps Ninja Editorial · Published 2026-05-09 · // cornerstone

A complete production observability setup — metrics, logs, traces, alerting — running on a single $20/month VPS. Real config. Real retention. We've deployed this stack at multiple companies; it handles workloads that would cost $5,000+/mo on Datadog.

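The Stack

Five services, all defined in the docker-compose.yml below: Prometheus for metrics, Loki for logs, Tempo for traces, Grafana for dashboards, and Alertmanager for alert routing.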
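
Grafana ties the backends together. The compose file below mounts ./grafana/datasources.yml into Grafana's provisioning directory. What follows is a minimal sketch of that file. It assumes the Docker Compose service names (prometheus, loki, tempo) resolve on the default compose network and that no auth proxy sits in front of the backends. The datasource names are arbitrary.

```yaml
# grafana/datasources.yml: minimal provisioning sketch (URLs assume the compose service names)
apiVersion: 1

datasources:
  - name: Prometheus
    type: prometheus
    access: proxy              # Grafana's backend makes the requests, not the browser
    url: http://prometheus:9090
    isDefault: true
  - name: Loki
    type: loki
    access: proxy
    url: http://loki:3100
  - name: Tempo
    type: tempo
    access: proxy
    url: http://tempo:3200     # Tempo's HTTP query port from the compose file
```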

The Hardware

Hetzner CX22 (2 vCPU / 4GB / 40GB SSD) at $4.59/mo — or a CX32 (4 vCPU / 8GB / 80GB) at $7.95/mo if you're handling real volume. Add a CPX31 ($14/mo) if you outgrow that. Total for a real production stack: $15-30/month.

The docker-compose.yml

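```yaml
services:
  prometheus:
    image: prom/prometheus:latest   # pin a specific version tag in production
    volumes:
      - ./prometheus/prometheus.yml:/etc/prometheus/prometheus.yml
      - prom_data:/prometheus
    command:
      - --config.file=/etc/prometheus/prometheus.yml
      - --storage.tsdb.path=/prometheus   # overriding command drops the image default, so set it explicitly
      - --storage.tsdb.retention.time=30d
      - --web.enable-remote-write-receiver
    ports: ["9090:9090"]   # published on all interfaces; Docker bypasses host firewalls like ufw
    restart: unless-stopped

  loki:
    image: grafana/loki:latest
    volumes:
      - ./loki/loki.yml:/etc/loki/local-config.yaml   # the path the image's default command reads
      - loki_data:/loki
    ports: ["3100:3100"]
    restart: unless-stopped

  tempo:
    image: grafana/tempo:latest
    command: ["-config.file=/etc/tempo.yaml"]   # the Tempo image does not bundle a config file
    volumes:
      - ./tempo/tempo.yml:/etc/tempo.yaml
      - tempo_data:/var/tempo
    ports: ["3200:3200", "4317:4317"]   # 3200 = HTTP API, 4317 = OTLP gRPC
    restart: unless-stopped

  grafana:
    image: grafana/grafana:latest
    volumes:
      - grafana_data:/var/lib/grafana
      - ./grafana/datasources.yml:/etc/grafana/provisioning/datasources/datasources.yml
    ports: ["3000:3000"]
    environment:
      - GF_AUTH_ANONYMOUS_ENABLED=false
      - GF_SECURITY_ADMIN_PASSWORD=changeme   # change this before exposing Grafana
    restart: unless-stopped

  alertmanager:
    image: prom/alertmanager:latest
    volumes:
      - ./alertmanager/alertmanager.yml:/etc/alertmanager/alertmanager.yml
    ports: ["9093:9093"]
    restart: unless-stopped

volumes:
  prom_data:
  loki_data:
  tempo_data:
  grafana_data:
```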
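
The compose file also mounts ./tempo/tempo.yml. Tempo must be started with -config.file pointing at that file, because its image does not bundle one. What follows is a minimal single-node sketch, assuming local-disk storage inside the tempo_data volume and OTLP over gRPC only. The 168h (7-day) trace retention is an illustrative value, not part of the original setup.

```yaml
# tempo/tempo.yml: minimal single-node sketch (retention value is illustrative)
server:
  http_listen_port: 3200          # matches the "3200:3200" mapping in the compose file

distributor:
  receivers:
    otlp:
      protocols:
        grpc:
          endpoint: 0.0.0.0:4317  # listen on all interfaces so the published port works

storage:
  trace:
    backend: local                # trace blocks on local disk, inside the tempo_data volume
    wal:
      path: /var/tempo/wal
    local:
      path: /var/tempo/blocks

compactor:
  compaction:
    block_retention: 168h         # delete trace blocks older than 7 days
```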

The Retention Strategy

The mistake most teams make: trying to keep raw metrics for a year. Don't. Use a tiered retention policy:
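
Whatever tiers you pick, each one ends up as a retention setting on a specific backend. Prometheus already gets one in the compose file (--storage.tsdb.retention.time=30d). Loki only deletes old logs when compactor-based retention is switched on. The fragment below is a sketch to merge into loki/loki.yml, assuming filesystem storage under /loki as in the compose volume. The 744h (31-day) period is an illustrative value, not a recommendation.

```yaml
# Fragment for loki/loki.yml; merge into an existing single-binary config.
compactor:
  working_directory: /loki/compactor   # inside the loki_data volume
  retention_enabled: true              # without this, retention_period is not enforced
  delete_request_store: filesystem     # required alongside retention_enabled on recent Loki versions

limits_config:
  retention_period: 744h               # 31 days; keep it a multiple of the 24h index period
```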

The Cardinality Discipline

This is the #1 reason self-hosted observability fails. Every unique combination of label values is a separate time series, so series counts multiply across labels. user_id as a label = at least one series per user = your Prometheus dies. Rules of thumb:
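
Prometheus can enforce some of this discipline at scrape time. Below is a sketch of a scrape job for prometheus/prometheus.yml, where the job name, target, and limit value are hypothetical. sample_limit fails the whole scrape if a target suddenly exposes too many samples. The labeldrop rule strips a user_id label before ingestion.

```yaml
scrape_configs:
  - job_name: app                  # hypothetical job
    sample_limit: 10000            # scrape fails if more samples than this remain after relabeling
    static_configs:
      - targets: ["app:8080"]      # hypothetical target
    metric_relabel_configs:
      - action: labeldrop          # remove matching label names from every scraped series
        regex: user_id
```

Treat labeldrop as a stopgap. Series that differed only by user_id collapse into duplicates, which Prometheus rejects. The real fix is to stop emitting the label in the application.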

The Alerts That Matter

Most teams over-alert. The rule: every alert that pages must require human action. If the response is 'ack and check again in 10 minutes,' it's a false positive — fix it.
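
In Prometheus, you can move the "check again in 10 minutes" step into the alert itself with a for: duration, so a condition that clears on its own never reaches a human. Below is a sketch, assuming a hypothetical rules file that you mount into the Prometheus container and list under rule_files in prometheus.yml. The compose file mounts neither. The alert name and the severity label are illustrative.

```yaml
# prometheus/rules/paging.yml (hypothetical path)
groups:
  - name: paging
    rules:
      - alert: TargetDown
        expr: up == 0
        for: 10m                   # condition must hold for 10 minutes before the alert fires
        labels:
          severity: page           # hypothetical convention; Alertmanager can route on it
        annotations:
          summary: "{{ $labels.job }} target {{ $labels.instance }} has been down for 10 minutes"
```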

That's it. Five alert categories cover 95% of real incidents.
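
Routing is where "every page requires action" gets enforced. Below is a sketch of alertmanager/alertmanager.yml, assuming alerts carry a severity label set to page when they should wake someone up. Both receiver names and webhook URLs are placeholders for whatever paging and chat tools you use.

```yaml
route:
  receiver: low-urgency            # default for anything not matched below
  group_by: ["alertname", "job"]
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 4h
  routes:
    - matchers:
        - severity="page"
      receiver: pager              # only severity=page alerts reach the pager

receivers:
  - name: pager
    webhook_configs:
      - url: "https://example.invalid/pager-webhook"   # placeholder
  - name: low-urgency
    webhook_configs:
      - url: "https://example.invalid/chat-webhook"    # placeholder
```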

What This Doesn't Replace

Honest about the gaps:

The Bill

| Item | Cost |
|---|---|
| Hetzner CX32 (compute) | $7.95/mo |
| Hetzner Storage Box (off-host backup, 1TB) | $3.79/mo |
| Cloudflare (DNS / TLS / DDoS, all free) | $0 |
| UptimeRobot (synthetic monitoring) | $7/mo |
| Better Uptime / OnCall (paging) | $0-29/mo |
| Total | $18.74-47.74/mo |

Compare that to a Datadog bill on the same workload: typically $1,500-5,000/mo. Even at the low end, that is over $1,400/mo that can go toward senior engineering time instead.

This is part of the DevOps Ninja cornerstone series. Honest critique welcome.