Part 1: Plotting a Talos + Kubernetes Platform on Hetzner
Charting the plan for a personal Kubernetes platform on Hetzner using Talos.

Why This Series Exists
I want a place to build and host my own applications without parachuting
into someone else's platform. This series documents the plan: a
multi-node Kubernetes cluster on Hetzner, managed with Talos and focused on simplicity, reproducibility, and observability from day
one. Part 1 captures the motivation, constraints, and the guardrails I plan
to hold myself to as the project grows.
Nothing is provisioned yet. This is intentionally a stub—an outline of the decisions to vet and the experiments queued up. Future installments will replace these placeholders with concrete configurations, failure reports, and the kind of operational checklists I would expect from a production platform.
Core Questions to Answer
Before touching a control plane, I need clarity on what makes this cluster worth maintaining. The Beyond Cloud provisioning system has been my north star: a React/Go frontend that drives a Talos-aware workflow engine, streams status over WebSockets, and treats Hetzner as the source of truth. I am not rebuilding the entire platform, but I want the same clarity of flow captured in the diagram below (and sketched in code right after this list). These are the threads I will pull in upcoming parts:
- Talos on Hetzner: Validate the provisioning workflow, machine-config handoff, and long-term upgrades for bare-metal nodes.
- Networking story: Choose between Hetzner Cloud networks, WireGuard meshes, or Tailscale to simplify east-west traffic while keeping ingress manageable.
- Stateful workloads: Decide whether to rely on Hetzner's CSI driver, an on-cluster Ceph deployment, or a mix of managed services and replicated volumes.
- Platform ergonomics: Build the minimum toolchain (GitOps, secrets, SLOs) that keeps hobby projects from turning into another pager.
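To make that concrete before touching real tooling, here is a minimal Go sketch of the workflow-engine shape described above: ordered provisioning steps that stream their status to whatever is watching (a WebSocket handler in Beyond Cloud's case, a plain channel here). The step names and types are hypothetical placeholders of mine, not code from that project.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// StepStatus is the kind of event a WebSocket client would receive;
// here it goes over a channel so the sketch stays self-contained.
type StepStatus struct {
	Step  string
	State string // "running", "done", "failed"
}

// ProvisionStep is one stage of the provisioning workflow, e.g. creating
// the Hetzner server or applying the Talos machine config.
type ProvisionStep struct {
	Name string
	Run  func(ctx context.Context) error
}

// runWorkflow executes steps in order and streams their status.
func runWorkflow(ctx context.Context, steps []ProvisionStep, events chan<- StepStatus) error {
	defer close(events)
	for _, s := range steps {
		events <- StepStatus{Step: s.Name, State: "running"}
		if err := s.Run(ctx); err != nil {
			events <- StepStatus{Step: s.Name, State: "failed"}
			return err
		}
		events <- StepStatus{Step: s.Name, State: "done"}
	}
	return nil
}

func main() {
	// Placeholder steps; the real ones would call the Hetzner API and talosctl.
	steps := []ProvisionStep{
		{Name: "create-server", Run: func(ctx context.Context) error { time.Sleep(10 * time.Millisecond); return nil }},
		{Name: "apply-machine-config", Run: func(ctx context.Context) error { return nil }},
		{Name: "bootstrap-etcd", Run: func(ctx context.Context) error { return nil }},
	}

	events := make(chan StepStatus)
	go func() {
		if err := runWorkflow(context.Background(), steps, events); err != nil {
			fmt.Println("workflow failed:", err)
		}
	}()
	for ev := range events {
		fmt.Printf("%-22s %s\n", ev.Step, ev.State)
	}
}
```

The real steps would shell out to talosctl or hit the Hetzner API; the only point of the sketch is that every step reports its state somewhere observable, which is the property I want to carry into this cluster from day one.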
Immediate Next Steps
Part 2 will dive into provisioning experiments. For now the backlog is short and explicit:
- Benchmark Hetzner's CX and AX nodes to understand the price-to-resource curve.
- Draft Talos machine configs that codify bootstrap secrets and workload isolation goals.
- Prototype a minimal GitOps loop to keep cluster state declarative from the start (a rough skeleton of what such a loop does follows this list).
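Since I have not committed to a tool yet, the sketch below is only a thought experiment in Go for what "declarative from the start" means mechanically: a loop that compares the revision Git wants with the revision the cluster is running and applies the difference. Flux or Argo CD would own this loop in practice; the Source interface and fakeSource here are hypothetical stand-ins of mine, not any real API.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// Source abstracts "what Git says the cluster should look like".
// Both methods are placeholders; a real GitOps tool owns this loop.
type Source interface {
	DesiredRevision(ctx context.Context) (string, error)
	Apply(ctx context.Context, revision string) error
}

// reconcile converges the cluster onto the desired revision and returns
// the revision that is now applied.
func reconcile(ctx context.Context, src Source, applied string) (string, error) {
	want, err := src.DesiredRevision(ctx)
	if err != nil {
		return applied, err
	}
	if want == applied {
		return applied, nil // already converged, nothing to do
	}
	if err := src.Apply(ctx, want); err != nil {
		return applied, err
	}
	return want, nil
}

// fakeSource stands in for a Git repo plus manifest apply; purely illustrative.
type fakeSource struct{ rev string }

func (f *fakeSource) DesiredRevision(ctx context.Context) (string, error) { return f.rev, nil }
func (f *fakeSource) Apply(ctx context.Context, revision string) error {
	fmt.Println("applying manifests at", revision)
	return nil
}

func main() {
	src := &fakeSource{rev: "abc123"}
	applied := ""
	// A real controller would watch for changes instead of polling a ticker.
	ticker := time.NewTicker(200 * time.Millisecond)
	defer ticker.Stop()
	for i := 0; i < 3; i++ {
		<-ticker.C
		next, err := reconcile(context.Background(), src, applied)
		if err != nil {
			fmt.Println("reconcile error:", err)
			continue
		}
		applied = next
		fmt.Println("in sync at", applied)
	}
}
```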
When those pieces are real, this entry will graduate from outline to walkthrough. Until then, treat it as a contract with myself and a preview of the technical deep dive to come.
Pricing Snapshot: Hetzner vs Public Cloud
The Beyond Cloud project already proved the basic math: Hetzner hardware paired with Talos beats managed control planes by a wide margin, as long as you are willing to own provisioning and upgrades yourself. My baseline comparison before writing Part 2:
| Scenario | Resources | Monthly Estimate | Notes |
|---|---|---|---|
| Hetzner Talos (3 ctrl + 3 worker) | CPX21 ×3, CPX51 ×3, LB11, 300 GB volumes, 20 TB egress included | ≈ €212 / month | €0 control plane fee; first 20 TB traffic bundled. |
| AWS EKS Rough Equivalent | EKS control plane, m5.large ×3, m5.xlarge ×3, ALB, 300 GB gp3, 5 TB egress | ≈ $620+ / month | $74 control plane + metered egress; autoscaling & IAM included. |
| GKE / AKS Similar Footprint | Managed control plane, balanced nodes, regional LB, 300 GB SSD | ≈ $550–$650 / month | Control-plane surcharge + per-LB pricing; stronger platform integrations. |
Hetzner delivers ~60–70% savings for steady workloads, but those euros come with strings: Talos upgrades, backup policy, incident response, and observability all land on me. The public clouds cost more yet offer managed control planes, IAM integration, richer autoscale primitives, and global network edges. Part 2 will spell out the guardrails I need in place so that the savings do not disappear into ops toil.
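A quick sanity check of that savings figure, using the estimates straight from the table; the EUR-to-USD conversion rate is an assumption of mine, not something pulled from Hetzner's or AWS's pricing pages.

```go
package main

import "fmt"

func main() {
	// Figures taken from the table above; the EUR→USD rate is an assumption.
	const (
		hetznerEUR = 212.0
		eurToUSD   = 1.08 // assumed conversion rate; adjust to the current rate
		eksUSD     = 620.0
	)
	hetznerUSD := hetznerEUR * eurToUSD // ≈ $229/month
	savings := (1 - hetznerUSD/eksUSD) * 100
	fmt.Printf("Hetzner ≈ $%.0f/month, about %.0f%% cheaper than the EKS estimate\n", hetznerUSD, savings)
	// Against the GKE/AKS band ($550–$650) the same math lands at roughly 58–65%.
}
```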