I’ve been running Docker Swarm with a Traefik ingress in my homelab for years now, starting with a single node and slowly growing to a three-node cluster.
My aim for most of my homelab experiments is high automation, as I don’t have a lot of time to futz with setting up and managing each service I want to test. In service of that goal I use a wildcard DNS entry on Cloudflare with a matching Let’s Encrypt wildcard certificate in Traefik. Since many of these services are still in testing, I’m not comfortable exposing most of them to the public, so I point everything at the Tailscale IP address of the original Docker Swarm node that started everything. This was fine for a long time, but that node is more than 10 years old now and has recently started to experience minor failures that lead to occasional downtime, which got me thinking: “I wish I could easily distribute traffic across these nodes without all of the complexity of setting up separate load-balancing infrastructure.”
Luckily, soon after that thought I saw a post about Tailscale’s new Services offering, a way to publish a new entry to your Tailnet (your individual Tailscale network with its own XXX.ts.net subdomain) that can be backed by multiple real servers and, in addition to basic load-splitting, can also use network proximity as a factor in directing traffic. 💡
This seemed like a perfect fit for my use case so I dove right in:
- I created a new service through the Tailscale admin console,
- I pointed my DNS wildcard at the service’s IP address,
- and I registered the service with the Tailscale daemon on my initial worker node by running
sudo tailscale serve --service=svc:my_traefik_service_name https+insecure://localhost:443

After a quick test I tried loading one of my services in a browser and ran into a major issue: the Tailscale daemon was terminating my TLS connection, handling requests with a certificate issued to my Tailnet domain rather than the public domain I wanted to use 😤
If this were a greenfield project I could have adopted the Tailnet domain and been done; however, over the last decade I’ve accrued thousands of references to the current domain spread across files on a dozen systems. With the limited time I have available to keep my homelab running, changing them all wasn’t feasible.
Going back and properly reading the documentation, I discovered that in addition to acting as an HTTP reverse proxy, the Tailscale daemon can also be configured to forward raw TCP, which I hoped would allow Traefik to terminate TLS using its existing domains and certificate. 🤞
After running
sudo tailscale serve --service=svc:my_traefik_service_name https+insecure://localhost --tcp 443 tcp://localhost:443
on the first node, I reloaded my browser and, much to my shock, everything loaded as expected 🎉
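A browser reload works, but it can be handy to check from a terminal which certificate is actually being presented: the Tailnet one or Traefik’s wildcard. The sketch below assumes `openssl` is installed on a machine inside the Tailnet; `cert_info` is a helper name made up for this example, and `app.example.com` in the usage note is a placeholder for one of your own hostnames.

```shell
#!/bin/sh
# Print the subject, issuer and expiry date of the certificate served for a
# hostname on port 443. cert_info is a hypothetical helper, not part of any tool.
cert_info() {
  host="$1"
  # -servername sets SNI so the proxy can pick the matching certificate;
  # </dev/null closes stdin so s_client exits right after the handshake.
  openssl s_client -connect "${host}:443" -servername "${host}" </dev/null 2>/dev/null \
    | openssl x509 -noout -subject -issuer -enddate
}

# Only query the network when a hostname is passed on the command line.
if [ -n "${1:-}" ]; then
  cert_info "$1"
fi
```

Saved as a script and run with `app.example.com` as its argument, the subject line should name the public wildcard domain if Traefik is terminating TLS, and a ts.net name if the Tailscale daemon still is.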
After quickly logging into each Docker Swarm node and running the same command, I was able to generate some traffic and watch it spread across the available nodes 🙏
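With only three nodes, logging in by hand is fine, but the same step could be looped over SSH. This is a rough sketch, not what I actually ran: the node names are hypothetical, the default `SERVE_CMD` is an assumed raw-TCP form of the command (paste in whichever invocation worked on your first node), and it assumes the command returns instead of staying in the foreground. It defaults to a dry run that only prints what it would do.

```shell
#!/bin/sh
# Sketch: register every Swarm node as a backend for the same Tailscale service.
# Node names are hypothetical; replace them with your own hosts.
NODES="${NODES:-swarm-node-1 swarm-node-2 swarm-node-3}"

# Assumed raw-TCP form of the serve command; override SERVE_CMD with the
# invocation that worked on your first node if it differs.
SERVE_CMD="${SERVE_CMD:-sudo tailscale serve --service=svc:my_traefik_service_name --tcp 443 tcp://localhost:443}"

# DRY_RUN=1 (the default) only prints what would be run on each node.
DRY_RUN="${DRY_RUN:-1}"

register_nodes() {
  for node in $NODES; do
    if [ "$DRY_RUN" = "1" ]; then
      printf 'ssh %s %s\n' "$node" "$SERVE_CMD"
    else
      # -t allocates a TTY so sudo can prompt for a password if it needs to.
      ssh -t "$node" "$SERVE_CMD" || echo "failed on $node" >&2
    fi
  done
}

register_nodes
```

Running it with `DRY_RUN=0` performs the real SSH calls; leaving the default in place is a cheap way to check the node list and command first.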
I’ve still got a handful of services I expose publicly that I need to figure out; maybe using floating virtual IPs and keepalived? 🤔
