How we built browser SSH for cloud VMs without a backend agent
CloudNx's CloudShell terminal opens an SSH session right in your browser without installing anything on the VM. Here's the architecture and the trade-offs.
Most cloud providers ship browser SSH the same way: a small daemon runs on every VM, connects out to the control plane over a websocket, and proxies a PTY through it. It works. It also means every customer VM has an extra process you can't audit, a long-lived outbound connection, and a vector for the platform to escalate into the tenant.
CloudNx CloudShell deliberately runs no agent inside the customer VM. Here's how it works and what we gave up to get there.
The constraint
Browser SSH for us has to:
- Open in <2 seconds from Console click to a usable prompt.
- Be auditable end-to-end: every command logged with the right user_id.
- Add zero attack surface inside the VM. No daemon, no listener, no out-of-band tunnel.
- Survive across customer-initiated reboots without re-onboarding.
The architecture
We exploit the fact that we already operate the host (Proxmox). Every customer VM has a private IP on 10.10.0.0/24 and an SSH port-forward managed by the NAT reconciler. The host can SSH into the VM directly using a platform key that we install during provisioning: once, idempotent, scoped to cnx-platform.
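To make "once, idempotent" concrete, here is a minimal sketch of an idempotent key install. It is an illustration, not the provisioning code itself: the function name `ensure_authorized_key` and the sample key lines are hypothetical, and it assumes the platform key ends up as a line in an `authorized_keys` file for the cnx-platform account.

```python
def ensure_authorized_key(current: str, public_key_line: str) -> str:
    """Return authorized_keys content that contains public_key_line.

    Running it again on its own output changes nothing, which is what
    makes the provisioning step safe to repeat.
    """
    wanted = public_key_line.strip()
    lines = current.splitlines()
    # Exact-line comparison: a simplification that ignores key options
    # and treats a changed comment as a different entry.
    if any(line.strip() == wanted for line in lines):
        return current
    lines.append(wanted)
    return "\n".join(lines) + "\n"


# Placeholder key material, not a real key.
PLATFORM_KEY = "ssh-ed25519 AAAAEXAMPLEPLATFORMKEY cloudshell"

first = ensure_authorized_key("ssh-rsa AAAAEXAMPLEUSERKEY user@laptop\n", PLATFORM_KEY)
second = ensure_authorized_key(first, PLATFORM_KEY)  # no-op on the second run
```

Because the second call returns its input unchanged, the same step can run again after a reboot or a re-provision without duplicating entries.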
When the user clicks Console, the compute service:
- Validates the JWT and the user's IAM permission for that instance.
- Mints a short-lived (5 min) access token for the websocket.
- Spawns a containerized SSH client that connects to the VM with the platform key, capturing both the PTY and stderr.
- Pipes the websocket to the SSH client's stdin/stdout. Browser ↔ websocket ↔ SSH ↔ VM.
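The token and SSH steps above can be sketched with the standard library alone. This is a sketch under stated assumptions, not our exact implementation: the token format (HMAC-SHA256 over a JSON payload), the names `mint_ws_token`, `verify_ws_token`, and `ssh_argv`, the key path, and the use of cnx-platform as the login account are all illustrative. The `ssh` flags themselves (`-tt`, `-i`, `-o IdentitiesOnly=yes`, `-o BatchMode=yes`) are standard OpenSSH options.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import List, Optional

TOKEN_TTL_SECONDS = 5 * 60  # the 5-minute lifetime from the step above


def mint_ws_token(secret: bytes, user_id: str, instance_id: str,
                  now: Optional[float] = None) -> str:
    """Return 'payload.signature', both URL-safe base64 (hypothetical format)."""
    issued_at = time.time() if now is None else now
    claims = {"uid": user_id, "iid": instance_id,
              "exp": int(issued_at) + TOKEN_TTL_SECONDS}
    payload = json.dumps(claims, sort_keys=True).encode()
    signature = hmac.new(secret, payload, hashlib.sha256).digest()
    return ".".join(base64.urlsafe_b64encode(part).decode()
                    for part in (payload, signature))


def verify_ws_token(secret: bytes, token: str,
                    now: Optional[float] = None) -> Optional[dict]:
    """Return the claims if the signature is valid and the token is unexpired."""
    current = time.time() if now is None else now
    try:
        payload_b64, signature_b64 = token.split(".")
        payload = base64.urlsafe_b64decode(payload_b64)
        signature = base64.urlsafe_b64decode(signature_b64)
    except ValueError:  # wrong shape or invalid base64
        return None
    expected = hmac.new(secret, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(signature, expected):  # constant-time compare
        return None
    claims = json.loads(payload)
    return claims if current < claims["exp"] else None


def ssh_argv(vm_ip: str, key_path: str, login_user: str) -> List[str]:
    """Build the ssh command line for the containerized client."""
    return [
        "ssh",
        "-tt",                         # force PTY allocation for an interactive shell
        "-i", key_path,                # the platform key
        "-o", "IdentitiesOnly=yes",    # offer only that key
        "-o", "BatchMode=yes",         # never prompt for a password
        f"{login_user}@{vm_ip}",
    ]
```

The websocket handler would call `verify_ws_token` on connect and only then start the process described by `ssh_argv`, so an expired or tampered token never reaches the platform key.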
What we gave up
Resilience-to-host-compromise. If our host is breached, the attacker has the platform key — they can SSH into every customer VM. We document this. The agent-based approach has the same weakness via the daemon's outbound channel, but at least the tenant could observe traffic from inside; ours is fully invisible to them.
We accept this trade because the alternative — running an agent — doesn't actually solve the underlying trust assumption (you already trust the platform to run your code). It just adds surface.
What we got
Sub-second open times. Zero footprint inside customer VMs. Audit logs that match exactly what was typed because the host wraps every command. And one less thing to maintain across kernel versions, distros, and customer-installed firewalls.
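As an illustration of why host-side logging can match what was typed, here is a minimal sketch of a relay hook that records each chunk of browser input with the caller's user_id before forwarding it to the SSH client. The class name `AuditedInput`, the JSON-lines record shape, and the `instance_id` field are assumptions made for the example; a real command log would also have to handle line editing and control characters.

```python
import json
import time
from typing import Callable


class AuditedInput:
    """Log every input chunk with its user_id, then pass it on unchanged."""

    def __init__(self, user_id: str, instance_id: str, sink,
                 forward: Callable[[bytes], object],
                 clock: Callable[[], float] = time.time):
        self.user_id = user_id
        self.instance_id = instance_id
        self.sink = sink          # any object with a write(str) method
        self.forward = forward    # e.g. the SSH client's stdin writer
        self.clock = clock

    def feed(self, data: bytes) -> None:
        record = {
            "ts": self.clock(),
            "user_id": self.user_id,
            "instance_id": self.instance_id,
            "input": data.decode("utf-8", errors="replace"),
        }
        # Write the audit record first so nothing reaches the VM unlogged.
        self.sink.write(json.dumps(record) + "\n")
        self.forward(data)
```

Since the relay sits on the host between the websocket and the SSH client, the log sees the same bytes the VM does, with no cooperation needed from inside the VM.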
If you've ever debugged a flaky ssm-agent on Amazon Linux 2 or chased an out-of-memory ec2-instance-connect, this trade-off probably makes sense to you.