Building Shoal — copium.supply

I'd wanted a Framework Desktop ever since they were announced, but couldn't justify it. That was until I managed to convince myself this was a great way to dive into doing local AI inference. Convinced, I made the pre-order and eagerly waited for it to arrive. To tide myself over I pored over pre-release reviews and dove into subreddits, forums, and blog posts, looking forward to all the things I'd be able to do with it.

Then it finally arrived...

Desktop Purpose

Yup, that's right: I installed Bazzite on it and used it as an HTPC to play games.

Why the sudden loss of motivation? Well, it was realising that even if I set everything up on the machine it was still going to be a pain to manage the inference server. Sure I could set up something simple on my local network using llama-server or ollama or whatever else, but it would still only be available on my local network. Sure I could port-forward it, open it up to the internet, work to secure it, or use a VPN, but it honestly still felt like more pain than it was worth. Plus I also wanted it to be easily shareable with others in a secure way, and getting others to VPN into my local network was just too much to think about managing.

At work building suga, I use Claude Code quite a bit, but I have recently been dabbling in using opencode with local models on my laptop. It works reasonably well but is awfully slow, and my thoughts returned to how much more capable the sunk investment resting atop my living room TV unit would be, and how perhaps it could make my dreams of fully local inference a reality.

This got me thinking back to what demotivated me in the first place, and I went back to do some research to see if there was finally a solution to my problem... and there wasn't yet. So I decided that instead of continuing to use my Steam Machine to chase ascension 10 in StS2, I would finally get around to solving that problem.

I had a singular vision in mind for how I wanted it to be: a stupid simple self-hosted server setup that could help securely expose my machine to the internet, and a stupid simple single-command way for my Desktop to dial into that server.

npx @shoal.sh/cli worker start --server wss://myserver.shoal.sh/ws/workers

That's it. A simple proxy that will forward requests to an OpenAI-compatible backend. The problem of inference itself had already been solved for me. All I needed was a way to reach that backend from anywhere and, potentially, to scale it out across other hardware I had lying around.

Naturally there will be auth and backend specification involved, but this is the vision I started with.

The trick is the worker dials out. There's a shoal server sitting on the public internet, and the worker opens a WebSocket to it. No port forward, no firewall rule, no VPN, no thinking about it. Clients hit the server over HTTPS at an OpenAI-compatible endpoint, the server picks a worker, dispatches the job, and streams the response back through. From opencode's point of view it's just another /v1 it's pointed at. It has no idea the model is on a box in my lounge room.
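To make the "server picks a worker and dispatches the job" step concrete, here is a minimal in-memory sketch. It is not shoal's actual code: `WorkerRegistry`, `WorkerConnection`, `Job`, and the round-robin policy are all names and choices I'm assuming for illustration, and the `send` method stands in for a real WebSocket that the worker opened outbound.

```typescript
// Hypothetical sketch of server-side dispatch; not taken from the shoal repo.

// Whatever the client posted to the OpenAI-compatible endpoint.
interface Job {
  id: string;
  body: string; // raw JSON request body
}

// Stand-in for a worker's WebSocket. The worker dialed out to open it,
// so the server only ever writes to connections it was handed.
interface WorkerConnection {
  workerId: string;
  send(message: string): void;
}

class WorkerRegistry {
  private workers: WorkerConnection[] = [];
  private next = 0;

  // Called when a worker dials in.
  register(conn: WorkerConnection): void {
    this.workers.push(conn);
  }

  // Called when a worker's connection closes.
  unregister(workerId: string): void {
    this.workers = this.workers.filter((w) => w.workerId !== workerId);
  }

  // Round-robin pick (an assumed policy); undefined if nobody is connected.
  pick(): WorkerConnection | undefined {
    if (this.workers.length === 0) return undefined;
    const worker = this.workers[this.next % this.workers.length];
    this.next = (this.next + 1) % this.workers.length;
    return worker;
  }

  // Push the job down the chosen worker's connection.
  // Returns the worker id, or null when no worker is available.
  dispatch(job: Job): string | null {
    const worker = this.pick();
    if (!worker) return null;
    worker.send(JSON.stringify({ type: "job", job }));
    return worker.workerId;
  }
}
```

A real server would also have to stream the worker's response back to the waiting HTTP client and deal with workers that drop mid-job; the sketch leaves both out.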

The same trick ngrok and Tailscale Funnel have been doing for years, just applied to inference.
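From the client's side, nothing about the request needs to know any of this exists. As a rough sketch, assuming the server exposes the standard OpenAI-style `/v1/chat/completions` route and accepts a bearer token (the base URL path, token, and model name below are placeholders I've made up, and `buildChatRequest` is a hypothetical helper):

```typescript
// Hypothetical helper: builds a standard OpenAI-style chat completion request.
// The bearer-token header is an assumption about how the server authenticates.

interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

interface ChatRequest {
  url: string;
  init: {
    method: string;
    headers: Record<string, string>;
    body: string;
  };
}

function buildChatRequest(
  baseUrl: string, // e.g. "https://myserver.shoal.sh/v1" (assumed path)
  token: string,
  model: string,
  messages: ChatMessage[],
): ChatRequest {
  // Strip trailing slashes so we never produce "//chat/completions".
  const url = `${baseUrl.replace(/\/+$/, "")}/chat/completions`;
  return {
    url,
    init: {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        Authorization: `Bearer ${token}`,
      },
      // stream: true asks for tokens as they are generated.
      body: JSON.stringify({ model, messages, stream: true }),
    },
  };
}

// Usage (placeholder values):
//   const { url, init } = buildChatRequest(
//     "https://myserver.shoal.sh/v1", "my-token", "my-model",
//     [{ role: "user", content: "hello" }],
//   );
//   const response = await fetch(url, init);
```

The same shape of request works against llama-server or ollama directly on the local network; only the base URL changes, which is why a tool like opencode can't tell the difference.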

So I tossed it together. A tanstack start server, postgres (or pglite if you want the whole thing in a single container), better-auth for tokens and tenancy. I even put together a basic website to try and convey to others the reason I'd built it in the first place.

Also I'd been playing with WebGL and had what I thought was a cool idea for a landing page animation... so why not.

And that's basically where it is now. As I'm writing this, opencode is open in another window of this laptop, talking to a model running on the Framework Desktop in my lounge room, with nothing port-forwarded and no VPN in sight. The 128GB of RAM is finally doing the job I bought it for, and the Steam Machine is... still a Steam Machine (I've got scripts that stop the shoal worker when I start my Steam games).

Check out the repo if you're interested. It's pretty straightforward to start and get going (at least I think so), and is a great way to use old hardware that might be gathering dust somewhere or is currently underutilized.