I like local models because they make experimentation feel cheap. No dashboard ceremony, no waiting for permission, no worrying that a half-formed idea is burning money in the background.
The stack I keep coming back to is simple: Ollama for inference, Caddy for reverse proxy and auth, and Cloudflare Tunnel when I want to reach the machine from outside the house.
The shape of it
Ollama runs on the local machine and exposes the model API. Caddy sits in front of Ollama, handles HTTPS, and adds a bearer-token check. Cloudflare Tunnel gives the whole setup a stable public route without opening inbound ports on the network.
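To make the bearer-token check concrete, here is a rough Python sketch of the logic the proxy applies to each request. In the real stack Caddy does this from its own config, so this is only an illustration. The function name is_authorized is made up for this example.

```python
import hmac


def is_authorized(authorization_header, expected_token):
    """Return True only for a well-formed 'Bearer <token>' header that matches.

    Hypothetical helper: it mirrors the idea of the proxy-side check,
    not Caddy's actual implementation.
    """
    if not authorization_header:
        return False
    # Split "Bearer abc123" into the scheme and the presented token.
    scheme, _, presented = authorization_header.partition(" ")
    if scheme.lower() != "bearer" or not presented:
        return False
    # compare_digest avoids leaking how many leading characters matched.
    return hmac.compare_digest(
        presented.strip().encode("utf-8"),
        expected_token.encode("utf-8"),
    )
```

Anything that fails the check is rejected before it reaches Ollama, which is the point of putting the proxy in front.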
The result is good ergonomics: tools talk to a normal HTTPS endpoint, while the actual compute stays on hardware I control.
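From the client side, "a normal HTTPS endpoint" can look something like the sketch below. It assumes the proxy forwards Ollama's /api/generate route unchanged and expects an Authorization: Bearer header. The hostname, token, and model name are placeholders, and the helper names are hypothetical.

```python
import json
import urllib.request


def build_generate_request(base_url, token, model, prompt):
    """Build (but do not send) a POST to Ollama's /api/generate via the proxy."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            # Assumed: the proxy checks this header before forwarding.
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def generate(base_url, token, model, prompt, timeout=120):
    """Send the request and return the generated text."""
    request = build_generate_request(base_url, token, model, prompt)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.loads(response.read())
    # With "stream": False, Ollama returns one JSON object whose
    # "response" field holds the full completion.
    return body["response"]
```

A call would then be something like generate("https://llm.example.com", token, "llama3", "Say hi"), with the token read from an environment variable rather than hard-coded.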
Why this setup works
It is small enough to understand and flexible enough to reuse. One endpoint can support terminal experiments, local agents, quick scripts, and prototype apps. The important thing is that the infrastructure disappears into the background, which is exactly where infrastructure belongs when you are trying to build something else.