Building a Cloud Platform From Scratch
2025-04-01
If you've ever tried to build your own cloud platform — from scratch — you know the sheer scale of things you have to understand and implement. What starts as "let's spin up some VMs" quickly becomes a deep dive into isolation, routing, DHCP, NAT, DNS, overlays, and more. This post is my attempt to document what it's like to walk into the fire of virtual networking using VMware, and how I'm carving out a path that works — sometimes painfully — for Carpathian Cloud.
The Beginning: Provisioning VMs with VMware
I’m using VMware on Ubuntu to provision and manage virtual machines. Storage was mapped, images were templated, and things were booting. But that was the easy part. Networking was the black box I was about to crack open.
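For the record, the provisioning step itself is the simple part: clone a templated image and boot it headless. A minimal sketch using VMware's vmrun CLI, with placeholder paths and names:
# Clone a template into a tenant directory and boot it without a GUI.
# The /vmstore paths and tenant name are placeholders, not my real layout.
vmrun -T ws clone /vmstore/templates/ubuntu-24.04.vmx \
      /vmstore/tenants/alice/vm1.vmx full
vmrun -T ws start /vmstore/tenants/alice/vm1.vmx nogui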
Initially, all VMs were on the same subnet. They could all see each other. It was fast. It worked. But I knew this wouldn’t scale — or be secure — especially as I started bringing in beta testers.
So I started looking into isolation.
Finding IPs Without VMware Tools
Without reliable VMware Tools integration inside the guest OS, I had to rely on tools like:
arp -a                                   # whatever the host's ARP cache already knows
nmap -sn 192.168.x.0/24                  # ping sweep of the guest subnet
arp-scan --interface=vmnetX --localnet   # layer-2 scan of a specific vmnet
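One way to turn those scans into a per-VM answer is to match the MAC address recorded in the VM's .vmx file against the scan output. A rough sketch, with the .vmx path and interface as placeholders:
#!/usr/bin/env bash
# Hypothetical helper: find a VM's IP by matching its .vmx MAC against arp-scan output.
VMX="/vmstore/tenants/alice/vm1.vmx"   # placeholder path
IFACE="vmnet9"                         # placeholder vmnet interface

# ethernet0.generatedAddress holds the auto-assigned MAC in a .vmx file
mac=$(grep -i 'ethernet0.generatedAddress ' "$VMX" | cut -d'"' -f2 | tr 'A-F' 'a-f')

# arp-scan rows look like "IP  MAC  Vendor"; print the IP whose MAC matches
sudo arp-scan --interface="$IFACE" --localnet \
  | awk -v mac="$mac" 'tolower($2) == mac { print $1 }'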
Useful, but hacky. This was a temporary band-aid. I wanted to know: How do I properly route and isolate traffic between tenants while still giving them internet access?
Enter the Virtual Network Maze
VMware gives you vmnet0 (bridged), vmnet1 (host-only), and vmnet8 (NAT). These are the basics. I needed multiple NAT-enabled, isolated networks. One per tenant.
Here’s what I learned:
- VMware Workstation/Fusion only supports one NAT network by default
- Custom NAT/DHCP setups require manually creating directories and files like /etc/vmware/vmnet9/nat/nat.conf
- VMware's tooling isn’t aware of these unless you wire everything up manually
So I did.
Manual Networking: A Deep Dive
I created new vmnets (like vmnet9) with their own NAT and DHCP:
sudo mkdir -p /etc/vmware/vmnet9/nat /etc/vmware/vmnet9/dhcpd
I wrote my own nat.conf and dhcpd.conf, carefully matching subnets and MAC addresses. Then I restarted the network services:
sudo /usr/bin/vmware-networks --stop
sudo /usr/bin/vmware-networks --start
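For reference, the DHCP side looked roughly like this. As far as I can tell, vmnet-dhcpd speaks standard ISC dhcpd syntax; the 192.168.9.x values below are examples, not my real allocations:
# Minimal sketch of /etc/vmware/vmnet9/dhcpd/dhcpd.conf (example subnet only)
cat <<'EOF' | sudo tee /etc/vmware/vmnet9/dhcpd/dhcpd.conf
subnet 192.168.9.0 netmask 255.255.255.0 {
    range 192.168.9.128 192.168.9.254;
    option routers 192.168.9.2;            # the NAT gateway on this vmnet
    option domain-name-servers 8.8.8.8;    # see the DNS note further down
    option broadcast-address 192.168.9.255;
    default-lease-time 1800;
    max-lease-time 7200;
}
EOF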
This... sometimes worked. Often I’d see:
Failed to start NAT service on vmnet9
Failed to start DHCP service on vmnet9
So I learned about vmnet-natd and vmnet-dhcpd. Turns out, you can launch these manually, if you know where the binaries live. In my case:
/usr/bin/vmnet-natd
/usr/bin/vmnet-dhcpd
Command-line flags were finicky. Case-sensitive config values. File permissions that had to be just right. At one point, I spent an hour figuring out that VMware was rejecting my NAT service because of a lowercase MAC address mismatch.
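One trick that helps here: VMware already runs these daemons for the stock vmnet8, so you can inspect its exact invocation and mirror it for the custom vmnet instead of guessing at flags:
# See how VMware launches its own NAT/DHCP daemons (pid files, config paths, flags),
# then adapt that invocation for vmnet9.
ps aux | grep -E 'vmnet-(natd|dhcpd)' | grep -v grep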
Router VMs vs Host Routing
I debated this for days:
- Should I spin up a router VM per tenant?
- Or should the hypervisor itself do NAT and routing for all vmnetX networks?
Ultimately, I chose to run NAT and DHCP directly on the hypervisor — for now. It simplified things, and I didn’t have to waste resources spinning up a full router per user.
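If you were doing this with plain Linux tooling instead of VMware's NAT daemon, "the hypervisor does the routing" boils down to roughly the following. This is illustrative only: interface names and the subnet are placeholders, and Carpathian itself leans on vmnet-natd.
# Plain-Linux equivalent of what vmnet-natd does for one isolated vmnet.
# eth0 is the host's uplink; 192.168.9.0/24 is an example tenant subnet.
sudo sysctl -w net.ipv4.ip_forward=1
sudo iptables -t nat -A POSTROUTING -s 192.168.9.0/24 -o eth0 -j MASQUERADE
sudo iptables -A FORWARD -i vmnet9 -o eth0 -s 192.168.9.0/24 -j ACCEPT
sudo iptables -A FORWARD -i eth0 -o vmnet9 -m state --state ESTABLISHED,RELATED -j ACCEPT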
That said, I know that long-term, per-tenant router VMs make user migration between hypervisors easier. It’s on the roadmap.
What’s Working Now
Today, every VM I provision:
- Gets assigned a vmnetX based on the user’s plan
- Uses host NAT to reach the internet
- Gets DNS from DHCP (Google’s 8.8.8.8 for now)
- Can’t expose random services to the internet (firewall + NGINX-controlled)
All ingress is through a central NGINX reverse proxy. I assign ports for app forwarding, or default to 443/80 for websites. Users can’t expose anything arbitrarily.
That means: no unsolicited SSH, no rogue FTP, no open databases.
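Concretely, each forwarded app is just a small server block on the proxy host. A hypothetical example; the domain, upstream IP, and certificate paths are placeholders:
# Hypothetical tenant app exposed through the central NGINX reverse proxy.
cat <<'EOF' | sudo tee /etc/nginx/sites-available/app1.conf
server {
    listen 443 ssl;
    server_name app1.example.com;

    ssl_certificate     /etc/ssl/certs/app1.pem;
    ssl_certificate_key /etc/ssl/private/app1.key;

    location / {
        proxy_pass http://192.168.9.10:8080;   # tenant VM on its vmnet
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
    }
}
EOF
sudo ln -sf /etc/nginx/sites-available/app1.conf /etc/nginx/sites-enabled/
sudo nginx -t && sudo systemctl reload nginx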
But There Are Still Problems
- Everything is still on the same subnet. For now, I trust my beta testers, but I know it’s not sustainable.
- I'm still using password SSH. This will change — I plan to switch to key-only very soon.
- No 2FA yet. It’s needed for the dashboard and terminal access web tools.
- VMs can sniff or scan each other. I haven't enforced egress or lateral firewalls yet.
That’s okay — because I know what to fix, when to fix it, and why.
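The SSH item, at least, is mostly mechanical once keys are distributed. The planned change on each guest, roughly:
# Planned hardening (not applied yet): disable password SSH on guest VMs.
# Assumes each user's public key is already in ~/.ssh/authorized_keys.
sudo sed -i 's/^#\?PasswordAuthentication.*/PasswordAuthentication no/' /etc/ssh/sshd_config
sudo systemctl restart ssh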
Security Roadmap
I’ve put together a Security Roadmap that walks through:
- Phase 1: Lock down SSH and traffic
- Phase 2: Per-tenant networks and DHCP automation
- Phase 3: Overlay networks + advanced monitoring
My rule is: once I hit 10 users — or paid customers — I lock down everything internally. For now, trust is my firewall. But that won’t last.
What I Learned (And Maybe You Will Too)
- VMware gives you building blocks — not a platform.
- Manual NAT and DHCP are possible — but brittle.
- Everything works, eventually, if you read every log and inspect every config.
- Networking isn't magic — it's just structured communication.
- You don’t need OpenStack to build something real.
Clouds aren’t born — they’re forged in hours of tcpdump sessions, failed NAT startups, and aha moments when you realize why ARP replies aren’t propagating.
Final Thoughts
Yes, it’s complex. Yes, I had to build more than I thought. But honestly? It’s kind of amazing to step back and say:
I built a cloud platform where users can deploy VMs, run apps, get isolated networks, and access the internet — all from a bash script and a vision.
Carpathian isn’t AWS. But it’s mine. And it works.
If you’re walking a similar road, I hope this helps you skip some of the potholes — or at least gives you comfort that someone else fell into them first.
Thanks for reading.
– Sam, Founder of Carpathian