Homelab

Build your own infrastructure — practical guides for virtualization, containers, networking, storage, monitoring, and automation. From a real homelab, not a textbook.

Table of Contents
  1. Proxmox VE Setup
  2. Docker Fundamentals
  3. Tailscale Mesh VPN
  4. Self-Hosting Essentials
  5. NAS & Storage
  6. Monitoring Stack
  7. Backup Strategy
  8. Network Segmentation
  9. Reverse Proxy
  10. GPU Passthrough
  11. Home Automation
1. Proxmox VE Setup

Install Proxmox VE, configure your first virtual machines and containers, set up storage pools, and understand clustering basics. The foundation of every serious homelab.

Why Proxmox?

Proxmox Virtual Environment (VE) is a free, open-source hypervisor based on Debian Linux. It combines KVM for full virtual machines and LXC for lightweight containers in a single web interface. Unlike ESXi (which went paid-only in 2024), Proxmox is completely free for homelab use.

It’s what most homelabbers run, and for good reason: it’s stable, well-documented, and has an active community.

Installation

  1. Download the ISO from proxmox.com/downloads
  2. Flash to USB with Rufus, Balena Etcher, or dd
  3. Boot from USB, follow the installer (choose ZFS mirror if you have two drives)
  4. Access the web UI at https://your-ip:8006
First thing after install: Disable the enterprise repo and enable the no-subscription repo. Edit /etc/apt/sources.list.d/pve-enterprise.list and comment out the enterprise line, then add the free repo.
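Concretely, the repository change looks like this (a sketch assuming Proxmox VE 8 on Debian bookworm; substitute your release codename):

```shell
# /etc/apt/sources.list.d/pve-enterprise.list — comment out the subscription repo:
# deb https://enterprise.proxmox.com/debian/pve bookworm pve-enterprise

# /etc/apt/sources.list.d/pve-no-subscription.list — add the free repo:
deb http://download.proxmox.com/debian/pve bookworm pve-no-subscription
```

Run apt update && apt full-upgrade afterwards so packages come from the new repo.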

VMs vs Containers

  • VMs (KVM) — full virtualization. Run any OS (Windows, Linux, BSD). Higher overhead but complete isolation. Use for: Windows, anything needing a full kernel, untrusted workloads
  • LXC Containers — lightweight, share the host kernel. Lower overhead, faster startup. Use for: Linux services (Pi-hole, Nginx, databases), anything that doesn’t need a separate kernel

Rule of thumb: use containers for everything you can, VMs for everything else.

Storage Configuration

  • local — default storage on the boot drive. Good for ISOs and templates
  • local-lvm — LVM thin pool for VM disks. Default for new VMs
  • ZFS — if you chose ZFS during install, you get snapshots, compression, and data integrity for free
  • NFS/CIFS — add network shares for backups or shared storage

Clustering

If you have 3+ Proxmox nodes, you can create a cluster for:

  • Centralized management (one web UI for all nodes)
  • Live migration (move running VMs between nodes with zero downtime)
  • High Availability (auto-restart VMs on another node if one fails)
Important: Avoid a 2-node cluster. It loses quorum the moment either node goes down, and forcing quorum back by hand risks split-brain and data corruption. Either run 3+ nodes or add a QDevice (corosync-qdevice on any spare machine, even a Raspberry Pi) as a tie-breaking third vote.
2. Docker Fundamentals

Containers, Compose files, networking, volumes, and best practices. Docker is the engine that runs most self-hosted services — understanding it properly saves countless hours of debugging.

Containers vs VMs

A container is not a lightweight VM. It’s an isolated process (or group of processes) that shares the host’s kernel but has its own filesystem, network, and process space. Think of it as chroot on steroids with cgroups for resource limits and namespaces for isolation.

Docker Compose

Running docker run commands with 20 flags is unmaintainable. Docker Compose lets you define multi-container applications in a YAML file:

  • Define services, networks, and volumes in one file
  • docker compose up -d starts everything
  • docker compose down stops and removes containers
  • docker compose pull updates images
Best practice: One docker-compose.yml per service (not one giant file for everything). This lets you update, restart, and manage services independently.
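A minimal Compose file might look like this (Uptime Kuma used as an example service; names and paths are illustrative):

```yaml
# docker-compose.yml — one per service
services:
  uptime-kuma:
    image: louislam/uptime-kuma:1
    container_name: uptime-kuma
    restart: unless-stopped
    ports:
      - "3001:3001"            # host:container
    volumes:
      - ./data:/app/data       # bind mount so data survives rebuilds
```

docker compose up -d in this directory brings the service up; docker compose pull followed by docker compose up -d updates it in place.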

Networking

  • Bridge (default) — containers get internal IPs, publish ports to host with -p
  • Host — container uses host’s network stack directly. No port mapping needed. Use sparingly
  • Macvlan — container gets its own IP on your LAN. Useful for services that need to be on the same subnet as physical devices

Create custom bridge networks for inter-container communication. Containers on the same custom network can resolve each other by container name.
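As a sketch (the network, container names, and the my-app image are examples):

```shell
docker network create backend                      # user-defined bridge

docker run -d --name db  --network backend postgres:16
docker run -d --name app --network backend my-app  # hypothetical image
# "app" can now reach the database at db:5432, because Docker's embedded
# DNS resolves container names on user-defined networks
```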

Volumes and Data Persistence

  • Named volumes — Docker manages the storage. Use for databases and application data
  • Bind mounts — map a host directory into the container. Use for config files and data you want to manage directly
Never store important data only inside a container. Containers are ephemeral: rebuilding one discards anything written to its writable layer, and docker compose down -v deletes the volumes declared in the Compose file as well. Always use named volumes or bind mounts, and back them up, for anything you care about.

Essential Commands

  • docker compose logs -f service_name — tail logs
  • docker compose exec service_name sh — shell into running container
  • docker system prune -a — reclaim disk space (removes unused images, containers, networks)
  • docker stats — live resource usage per container
3. Tailscale Mesh VPN

Zero-config WireGuard VPN that connects all your devices into a private mesh network. Exit nodes, subnet routing, ACLs, and MagicDNS — remote access to your homelab without port forwarding.

Why Tailscale?

Tailscale wraps WireGuard in a zero-config coordination layer. Every device gets a stable 100.x.x.x IP address. Devices find each other through Tailscale’s coordination server, but traffic flows directly peer-to-peer (encrypted with WireGuard). No port forwarding, no dynamic DNS, no firewall holes.

Setup

  1. Install on each device: curl -fsSL https://tailscale.com/install.sh | sh
  2. Authenticate: sudo tailscale up
  3. Done. Every device can now reach every other device via 100.x.x.x

Key Features for Homelabs

  • Subnet routing — advertise your LAN subnet so Tailscale devices can reach non-Tailscale devices on your network: tailscale up --advertise-routes=10.10.10.0/24
  • Exit nodes — route all internet traffic through a specific machine. Great for using your home IP while traveling: tailscale up --advertise-exit-node
  • MagicDNS — access devices by hostname instead of IP: ssh user@proxmox instead of remembering the machine’s 100.x.x.x address
  • ACLs — control which devices can talk to which. Restrict your IoT VLAN from reaching your NAS, for example
  • Funnel — expose a service to the public internet through Tailscale’s edge (no port forwarding needed)
Free tier is generous: 100 devices, 3 users, all core features. For a homelab, you’ll likely never need to pay.
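ACLs live in a policy file edited in the admin console. A hedged sketch (the tags are assumptions you would define yourself under tagOwners; Tailscale policy files are HuJSON, so comments are allowed):

```json
{
  "acls": [
    // trusted devices may reach the whole tailnet
    { "action": "accept", "src": ["tag:trusted"], "dst": ["*:*"] },

    // IoT devices get internet access only, nothing on the tailnet
    { "action": "accept", "src": ["tag:iot"], "dst": ["autogroup:internet:*"] }
  ]
}
```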

Alternative: Headscale

If you don’t want to depend on Tailscale’s coordination server, Headscale is an open-source, self-hosted implementation of the Tailscale coordination server. Same WireGuard mesh, but you control everything. Trade-off: more setup, more maintenance, but full sovereignty over your network.

4. Self-Hosting Essentials

What to self-host, domain setup, DNS configuration, Cloudflare as a reverse proxy, and the practical trade-offs of running your own services vs using SaaS.

What’s Worth Self-Hosting?

Not everything should be self-hosted. The best candidates are services where you gain privacy, control, or cost savings without taking on unacceptable risk:

  • High value: Nextcloud (files), Vaultwarden (passwords), Immich (photos), Jellyfin (media), Gitea (code), Paperless-ngx (documents)
  • Medium value: Pi-hole/AdGuard (DNS), Uptime Kuma (monitoring), Bookstack (wiki), Mealie (recipes)
  • Think twice: Email (deliverability is painful), public-facing websites (DDoS risk), anything your family relies on without a backup plan

Domain and DNS Setup

  1. Buy a domain (Cloudflare Registrar, Porkbun, or Namecheap)
  2. Point nameservers to Cloudflare (even if you bought elsewhere)
  3. Create A records for your services: cloud.yourdomain.com, git.yourdomain.com, etc.
  4. Enable Cloudflare proxy (orange cloud) for DDoS protection and SSL

Cloudflare Tunnels

Cloudflare Tunnels (formerly Argo Tunnels) let you expose internal services to the internet without opening any ports on your router. A lightweight daemon (cloudflared) runs on your server and creates an outbound-only connection to Cloudflare’s edge.

This is the recommended approach for exposing services to the internet. No port forwarding, no dynamic DNS, and Cloudflare handles SSL, DDoS protection, and bot filtering for free.
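A cloudflared ingress config is one short YAML file (the tunnel name, hostnames, and ports here are examples):

```yaml
# ~/.cloudflared/config.yml
tunnel: homelab
credentials-file: /root/.cloudflared/homelab.json

ingress:
  - hostname: cloud.yourdomain.com
    service: http://localhost:8080    # e.g. Nextcloud
  - hostname: git.yourdomain.com
    service: http://localhost:3000    # e.g. Gitea
  - service: http_status:404          # required catch-all rule
```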

The Self-Hosting Trade-Off

Self-hosting gives you control and privacy but costs you time and reliability. You’re the sysadmin now — if it breaks at 2 AM, it’s your problem. Always have a plan for when (not if) hardware fails:

  • Automated backups (see Guide 7)
  • Monitoring and alerts (see Guide 6)
  • A fallback for critical services (can you survive without your NAS for a week?)
5. NAS & Storage

TrueNAS, ZFS fundamentals, RAID levels, snapshot-based backup design, and why ECC RAM actually matters. Your data is only as safe as your storage strategy.

ZFS: The Homelab Filesystem

ZFS is a combined filesystem and volume manager that provides data integrity verification, snapshots, compression, and redundancy. It’s the default choice for NAS systems because it catches and corrects silent data corruption (bit rot) that other filesystems miss.

  • Checksumming — every block is checksummed. ZFS detects corruption on read and auto-heals from redundant copies
  • Copy-on-write — data is never overwritten in place. Writes go to new blocks, then the pointer is updated atomically. No partial writes, no fsck needed
  • Snapshots — instant, zero-cost point-in-time copies. Roll back a dataset to any snapshot in seconds
  • Compression — LZ4 compression is nearly free (CPU-wise) and saves 30-50% on typical data
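The day-to-day commands behind these features are short (pool and dataset names are examples):

```shell
zfs set compression=lz4 tank/data      # near-free LZ4 compression
zfs snapshot tank/data@before-upgrade  # instant point-in-time snapshot
zfs list -t snapshot                   # list existing snapshots
zfs rollback tank/data@before-upgrade  # revert the dataset
zpool scrub tank                       # verify every checksum in the pool
```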

RAID Levels (ZFS Pools)

  • Mirror (RAID 1) — 2+ drives, each is an exact copy. 50% usable space. Fast reads. Best for small setups (2-4 drives)
  • RAIDZ1 (RAID 5) — single parity. Can lose 1 drive. Good for 3-5 drives. Minimum recommended for data you care about
  • RAIDZ2 (RAID 6) — double parity. Can lose 2 drives. Recommended for 6+ drives or large capacity drives (rebuild times are long)
  • RAIDZ3 — triple parity. For enterprise or very large arrays
RAID is not backup. RAID protects against drive failure. It does NOT protect against accidental deletion, ransomware, fire, theft, or controller failure. You still need backups.

TrueNAS Setup Tips

  • Use TrueNAS SCALE (Linux-based) over CORE (FreeBSD) for better Docker/container support
  • Give ZFS as much RAM as you can. The ARC (Adaptive Replacement Cache) dramatically improves read performance
  • Enable automated snapshots: hourly for 24h, daily for 30 days, weekly for 6 months
  • Set up automated scrubs (monthly) to detect and repair bit rot
6. Monitoring Stack

Grafana + Prometheus + node_exporter + alerting. See everything happening in your homelab with dashboards, graphs, and alerts that actually tell you when something breaks.

The Stack

  • Prometheus — time-series database that scrapes metrics from exporters at regular intervals (typically 15s)
  • node_exporter — exposes Linux host metrics (CPU, RAM, disk, network) on port 9100
  • Grafana — visualization platform. Connects to Prometheus as a data source and renders dashboards
  • Alertmanager — handles alert routing, deduplication, and notification (email, Slack, Discord, PagerDuty)

Quick Setup with Docker Compose

The entire monitoring stack can run in Docker containers:

  • Prometheus: scrapes metrics, stores time-series data
  • node_exporter: runs on each host you want to monitor
  • Grafana: web UI on port 3000, import pre-built dashboards
  • cAdvisor: monitors Docker container resource usage
Start with dashboard ID 1860 (Node Exporter Full) from Grafana’s dashboard marketplace. It gives you comprehensive host monitoring out of the box. Then customize from there.
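The Prometheus side of the stack is one small config file (hostnames are examples):

```yaml
# prometheus.yml
global:
  scrape_interval: 15s

scrape_configs:
  - job_name: node                       # node_exporter on each host
    static_configs:
      - targets: ["pve1:9100", "nas:9100"]
  - job_name: cadvisor                   # Docker container metrics
    static_configs:
      - targets: ["docker-host:8080"]
```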

What to Monitor

  • Host metrics: CPU, RAM, disk I/O, network, filesystem usage, temperature
  • Container metrics: per-container CPU, memory, network (via cAdvisor)
  • Service health: Uptime Kuma or Blackbox exporter for HTTP/TCP checks
  • Smart disk health: smartctl_exporter for drive failure prediction
  • UPS status: NUT (Network UPS Tools) exporter for battery monitoring

Alerting That Matters

Only alert on things that require action. Good alerts:

  • Disk usage above 85% (time to clean up or expand)
  • Host unreachable for 2+ minutes
  • RAM usage above 90% sustained for 10+ minutes
  • SMART disk warning (replace the drive)
  • UPS on battery (power outage)
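The first of those translates to a Prometheus alerting rule like this (a sketch; thresholds and labels are yours to tune):

```yaml
# alerts.yml
groups:
  - name: homelab
    rules:
      - alert: DiskAlmostFull
        expr: |
          (node_filesystem_avail_bytes{fstype!~"tmpfs|overlay"}
            / node_filesystem_size_bytes) < 0.15
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "{{ $labels.instance }}: filesystem over 85% used"
```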
7. Backup Strategy

The 3-2-1 rule, Borg vs Restic, offsite options, and automation. The backup you never test is the backup that fails when you need it most.

The 3-2-1 Rule

  • 3 copies of your data (original + 2 backups)
  • 2 different storage media (e.g., SSD + HDD, or local + cloud)
  • 1 offsite copy (protects against fire, theft, natural disaster)

This isn’t paranoia — it’s math. Any single backup has a non-trivial failure probability, and independent failures multiply: two copies that each have a 1% chance of being unusable leave only a 0.01% chance of losing both.

Borg Backup

Borg is a deduplicating, encrypting backup program. It’s fast, space-efficient, and battle-tested:

  • Deduplication at the chunk level — only new/changed data is stored
  • Client-side encryption (AES-256) — the backup server never sees your data
  • Compression (LZ4, ZSTD, LZMA) — further reduces storage
  • Mount any backup as a FUSE filesystem to browse and restore individual files

Restic

Restic is similar to Borg but with native support for cloud backends:

  • Supports S3, B2, Azure, Google Cloud, SFTP, and local storage
  • Faster for cloud-based offsite backups
  • Single binary, no dependencies
  • Built-in data integrity verification
My recommendation: Borg for local/LAN backups (faster, more mature). Restic for offsite/cloud backups (native cloud support). Use both for 3-2-1 compliance.
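In practice that pairing looks roughly like this (repository locations, bucket names, and retention policies are examples):

```shell
# Local/LAN backup with Borg (assumes the repo was created with `borg init`)
borg create --compression zstd \
    ssh://backup-host/./borg-repo::'{hostname}-{now}' /srv/data
borg prune --keep-daily 7 --keep-weekly 4 --keep-monthly 6 \
    ssh://backup-host/./borg-repo

# Offsite copy with Restic to Backblaze B2 (credentials via environment variables)
restic -r b2:my-bucket:homelab backup /srv/data
restic -r b2:my-bucket:homelab forget --keep-daily 7 --keep-monthly 6 --prune
```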

Offsite Options

  • Backblaze B2 — $6/TB/month. The cheapest S3-compatible storage. Use with Restic or rclone
  • Hetzner Storage Box — cheap BorgBackup-compatible remote storage (SSH/SFTP)
  • Another physical location — a friend’s house, a small NAS at a relative’s place, encrypted and synced nightly

Automation and Testing

A backup that isn’t automated will eventually stop happening. A backup that isn’t tested will eventually fail to restore.

  • Cron jobs or systemd timers for daily backups
  • Monitor backup completion with your alerting stack
  • Test restores quarterly — actually restore files and verify them
  • Document the restore process. When you need it, you’ll be stressed and possibly sleep-deprived
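A systemd timer for the nightly run might look like this (unit names are examples; pair it with a backup.service that runs your backup script):

```ini
# /etc/systemd/system/backup.timer
[Unit]
Description=Nightly backup

[Timer]
OnCalendar=*-*-* 02:00:00
Persistent=true      # catch up after boot if the 02:00 run was missed

[Install]
WantedBy=timers.target
```

Enable it with systemctl enable --now backup.timer, and check systemctl list-timers to confirm the next run.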
8. Network Segmentation

VLANs, firewall rules, IoT isolation, and pfSense/OPNsense configuration. Keep your smart devices from talking to your NAS, and your guest network from seeing your servers.

Why Segment?

A flat network means every device can talk to every other device. Your kid’s IoT toy, your smart TV, and your NAS with irreplaceable family photos are all on the same network. If any device is compromised, everything is reachable.

VLANs create virtual network segments that act like separate physical networks. Firewall rules control what traffic can flow between them.

Recommended VLAN Layout

  • VLAN 10 — Management (10.10.10.0/24): Proxmox hosts, switches, access points, router admin. Restricted access
  • VLAN 20 — Servers (10.10.20.0/24): Docker hosts, VMs running services
  • VLAN 30 — Trusted (10.10.30.0/24): Your personal devices (laptop, phone, desktop)
  • VLAN 40 — IoT (10.10.40.0/24): Smart home devices, cameras, thermostats. Internet access only — no access to other VLANs
  • VLAN 50 — Guest (10.10.50.0/24): Guest WiFi. Internet only, nothing else

Firewall Rules (pfSense/OPNsense)

The basic principle: deny everything by default, then allow specific traffic:

  • Trusted → Servers: Allow (you need to access your services)
  • Trusted → Management: Allow (you need to admin your infrastructure)
  • IoT → Internet: Allow (smart devices need cloud access)
  • IoT → anything else: Deny (contain compromised devices)
  • Guest → Internet: Allow
  • Guest → anything else: Deny
  • Servers → Internet: Allow specific (updates, API calls)
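pfSense and OPNsense express this through their GUIs, but the underlying logic is a default-deny forward policy. A sketch of the same rules in nftables syntax, with interface names as assumptions:

```
table inet fw {
  chain forward {
    type filter hook forward priority 0; policy drop;       # deny by default
    ct state established,related accept                     # allow return traffic
    iifname "vlan30" oifname { "vlan10", "vlan20" } accept  # Trusted -> Mgmt/Servers
    iifname { "vlan40", "vlan50" } oifname "wan" accept     # IoT/Guest -> Internet only
  }
}
```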

Hardware Requirements

  • A managed switch that supports 802.1Q VLANs (TP-Link, Ubiquiti, Mikrotik)
  • A router/firewall that can route between VLANs (pfSense, OPNsense, Mikrotik)
  • WiFi access points that support multiple SSIDs mapped to VLANs
Start simple: Even just separating IoT from everything else is a massive security improvement. You can add more VLANs later as your network grows.
9. Reverse Proxy

Nginx Proxy Manager, Caddy, SSL certificates, and wildcard subdomains. Access all your services via clean URLs with automatic HTTPS instead of remembering IP:port combos.

What a Reverse Proxy Does

Instead of accessing services by IP and port (192.168.1.50:8096), a reverse proxy lets you use clean URLs (jellyfin.home.lab) with automatic HTTPS. One entry point (ports 80/443) routes to the correct backend based on the hostname.

Option 1: Nginx Proxy Manager (NPM)

The easiest option for homelabs. Web-based GUI for configuring Nginx reverse proxy:

  • Point-and-click SSL certificate management (Let’s Encrypt)
  • Wildcard certificate support via DNS challenge
  • Access lists for restricting services to specific IPs
  • Custom Nginx config snippets for advanced use cases

Option 2: Caddy

Caddy automatically provisions HTTPS for every site. Zero configuration for SSL:

  • Automatic Let’s Encrypt certificates (no manual setup)
  • Simple Caddyfile syntax (5 lines vs 50 for Nginx)
  • Automatic HTTP → HTTPS redirect
  • Built-in reverse proxy, file server, and load balancer
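A complete Caddyfile for two services fits in a few lines (hostnames and backend addresses are examples; Caddy obtains and renews the certificates itself):

```
# Caddyfile
jellyfin.home.yourdomain.com {
    reverse_proxy 192.168.1.50:8096
}

grafana.home.yourdomain.com {
    reverse_proxy 192.168.1.60:3000
}
```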

Wildcard Certificates

Instead of getting a separate certificate for each subdomain, get one wildcard cert (*.home.yourdomain.com) that covers everything:

  1. Use the DNS challenge method (not HTTP challenge)
  2. Configure your DNS provider’s API credentials in NPM or Caddy
  3. One cert covers grafana.home.yourdomain.com, jellyfin.home.yourdomain.com, etc.

Internal vs External Access

  • Internal only: Split-brain DNS (your local DNS resolves *.home.lab to your proxy’s LAN IP). Services never touch the internet
  • External access: Use Cloudflare Tunnels (Guide 4) or port forward 443 to your proxy. Always use authentication (Authelia, Authentik) for public-facing services
Never expose a service to the internet without authentication. Even “harmless” services get probed by bots within minutes of going public. Use SSO (Authelia, Authentik) or at minimum HTTP basic auth.
10. GPU Passthrough

IOMMU groups, PCIe passthrough for AI inference and gaming VMs, VFIO configuration, and troubleshooting. Give a virtual machine direct access to your GPU for near-native performance.

What Is GPU Passthrough?

GPU passthrough (PCI passthrough) gives a virtual machine direct, exclusive access to a physical GPU. The VM sees the GPU as if it were running on bare metal. This enables:

  • Gaming VMs — run Windows games at near-native performance from a Linux host
  • AI inference — run Ollama, llama.cpp, or Stable Diffusion with full GPU acceleration in a VM or container
  • Transcoding — hardware-accelerated video transcoding for Jellyfin/Plex

Requirements

  • CPU with IOMMU support: Intel VT-d or AMD-Vi. Most modern CPUs have this
  • Motherboard with good IOMMU groups: each device should be in its own group. Consumer boards are hit-or-miss
  • Two GPUs: one for the host (can be integrated/iGPU), one for passthrough. The passed-through GPU is exclusively owned by the VM

Setup on Proxmox

  1. Enable IOMMU in BIOS (Intel VT-d / AMD-Vi / IOMMU)
  2. Add kernel parameters: intel_iommu=on iommu=pt (or amd_iommu=on) to /etc/default/grub
  3. Load VFIO modules: add vfio vfio_iommu_type1 vfio_pci vfio_virqfd to /etc/modules
  4. Blacklist the GPU driver: prevent the host from claiming the GPU (blacklist nouveau or blacklist nvidia)
  5. Bind the GPU to VFIO: add the GPU’s PCI IDs to /etc/modprobe.d/vfio.conf
  6. Add the PCI device to your VM in Proxmox (Hardware → Add → PCI Device)
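Steps 2 through 5 boil down to a handful of file edits. A sketch for an NVIDIA card (the PCI IDs shown are examples; find your own with lspci):

```shell
# 1. Find the GPU and its audio function
lspci -nn | grep -i nvidia          # e.g. ... [10de:2484] and [10de:228b]

# 2. /etc/default/grub (use amd_iommu=on on AMD):
#    GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on iommu=pt"
update-grub

# 3. /etc/modprobe.d/vfio.conf — bind both functions to vfio-pci:
#    options vfio-pci ids=10de:2484,10de:228b
#    softdep nouveau pre: vfio-pci
update-initramfs -u -k all          # then reboot
```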

Common Issues

  • IOMMU group too large: the GPU shares a group with other devices. Use the ACS override patch (risky) or try a different PCIe slot
  • Code 43 (NVIDIA): NVIDIA drivers detect they’re in a VM and refuse to work. Add args: -cpu 'host,hv_vendor_id=proxmox' to the VM config
  • Reset bug: some GPUs don’t reset properly when the VM shuts down, requiring a host reboot. AMD GPUs are generally better at this than NVIDIA
For AI workloads: You don’t always need full passthrough. NVIDIA GPUs support vGPU (with a license) or you can use Docker with --gpus all to share the GPU between containers without passthrough. Much simpler for inference workloads.
11. Home Automation

Home Assistant, Zigbee/Z-Wave, MQTT, automations, and dashboards. Turn your homelab into a smart home that actually works — locally, privately, without cloud dependencies.

Why Home Assistant?

Home Assistant (HA) is an open-source home automation platform that runs locally. It integrates with 2,000+ devices and services. Unlike Google Home or Alexa, your data stays in your house, and automations work even when the internet is down.

Installation Options

  • Home Assistant OS (HAOS) — dedicated VM or Raspberry Pi. Includes Supervisor for easy add-on management. The recommended option for most people
  • Docker container — runs alongside other containers. No Supervisor (install add-ons manually). For experienced Docker users who want more control
  • Proxmox VM — run HAOS as a VM on your Proxmox host. Best of both worlds: dedicated environment with Supervisor, plus the flexibility of virtualization

Protocols: Zigbee, Z-Wave, WiFi

  • Zigbee — low-power mesh network. Devices relay signals through each other. Needs a coordinator (SONOFF Zigbee 3.0 dongle, $15). Best protocol for most smart home devices. Use with Zigbee2MQTT for maximum compatibility
  • Z-Wave — similar to Zigbee but with a different frequency band (fewer interference issues). More expensive devices but very reliable. Good for locks and critical devices
  • WiFi — easy setup but each device adds load to your WiFi network. Devices using ESPHome (custom firmware) are excellent because they operate locally without cloud
  • Matter/Thread — the new standard backed by Apple, Google, Amazon. Still maturing but promises universal compatibility

MQTT

MQTT is a lightweight messaging protocol that many IoT devices use to communicate. Run a broker (Mosquitto) and devices publish/subscribe to topics:

  • zigbee2mqtt/living_room/temperature → sensor publishes temperature readings
  • homeassistant/light/bedroom/set → HA publishes to turn on a light
  • Decouples devices from each other — any device can listen to any topic
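You can watch and inject messages with the Mosquitto command-line tools (broker hostname and topics are examples):

```shell
# Subscribe to everything Zigbee2MQTT publishes
mosquitto_sub -h mqtt-broker -t 'zigbee2mqtt/#' -v

# Publish a command: turn on a light through Zigbee2MQTT
mosquitto_pub -h mqtt-broker -t 'zigbee2mqtt/bedroom_light/set' -m '{"state": "ON"}'
```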

Automations That Actually Help

  • Turn on lights at sunset, off at bedtime (based on your phone’s presence)
  • Send a notification if a door is left open for more than 5 minutes
  • Adjust thermostat based on occupancy (motion sensors + phone presence)
  • Turn off everything when the last person leaves (presence detection)
  • Morning routine: lights at 30% → coffee machine on → weather briefing on a dashboard
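For reference, the door-left-open notification from the list above looks like this in Home Assistant YAML (entity and notify service IDs are examples):

```yaml
# automations.yaml
- alias: "Door left open"
  trigger:
    - platform: state
      entity_id: binary_sensor.front_door
      to: "on"
      for: "00:05:00"        # fires only after 5 continuous minutes
  action:
    - service: notify.mobile_app_my_phone
      data:
        message: "The front door has been open for 5 minutes."
```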
Put all smart home devices on the IoT VLAN (Guide 8). Many cheap IoT devices phone home to servers in China with zero security. Segment them from your main network and only allow internet access, not LAN access.