Scaling to 10K Concurrent Users on a Budget

Edge caching, queues, and read replicas — practical patterns from a real project.

Rohit Iyer, Staff Engineer · Mar 28, 2026 · 10 min read
A fintech client needed to handle 10,000 concurrent users for a Friday product launch — on a $400/month infrastructure budget. We hit the target with room to spare. Here's exactly how, with the trade-offs we accepted.

Edge caching does 70% of the work

Before optimizing anything, we audited which routes were truly user-specific. Turns out only the dashboard and checkout flow were. Every marketing page, pricing card, and product image got pushed to Cloudflare's edge with aggressive cache headers. Origin traffic dropped 92% overnight.
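As a rough illustration of that split, here is a minimal sketch of a per-route cache policy. The path prefixes, file extensions, and TTL values are assumptions made for the example, not the project's actual configuration, and it assumes the edge is set up to honor origin headers for HTML pages.

```python
# Hypothetical route prefixes: the post only says that the dashboard and
# checkout flow were user-specific; the exact paths are assumed here.
USER_SPECIFIC_PREFIXES = ("/dashboard", "/checkout")

# Assumed set of static asset extensions.
STATIC_EXTENSIONS = (".png", ".jpg", ".webp", ".svg", ".css", ".js")


def cache_control_for(path: str) -> str:
    """Return a Cache-Control header value for a request path."""
    # User-specific responses must never be stored by a shared cache.
    if path.startswith(USER_SPECIFIC_PREFIXES):
        return "private, no-store"
    # Static assets: cache for a year (assumes fingerprinted file names).
    if path.lower().endswith(STATIC_EXTENSIONS):
        return "public, max-age=31536000, immutable"
    # Marketing and pricing pages: short browser TTL, longer edge TTL.
    return "public, max-age=60, s-maxage=3600"
```

The user-specific check runs first on purpose: if a route is ever misclassified, the failure mode is an uncached page rather than one customer's dashboard served to another.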

Queues for everything async

Sign-up emails, webhook deliveries, analytics events, PDF generation: all of it moved to a queue drained by a single worker process. Latency on the request path collapsed from 800ms to 40ms, and the queue absorbed bursts that the database never had to see.
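The shape of that pattern can be sketched with Python's standard library. This is an in-process stand-in, not the queue the project used (the post does not name it); a production setup would want a durable queue so jobs survive a crash. The names handle_signup, handle_job, and run_demo are hypothetical.

```python
import queue
import threading

job_queue = queue.Queue()
processed = []  # record of handled jobs, kept only for the demonstration


def handle_job(kind, payload):
    # Placeholder for the slow work (email, webhook, PDF, analytics).
    processed.append((kind, payload))


def worker():
    # Single worker: pulls jobs one at a time until it sees the None sentinel.
    while True:
        job = job_queue.get()
        try:
            if job is None:
                return
            handle_job(*job)
        finally:
            job_queue.task_done()


def handle_signup(email):
    # Request path: enqueue the slow work and return immediately.
    job_queue.put(("signup_email", {"to": email}))
    return {"status": "accepted"}


def run_demo():
    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    response = handle_signup("user@example.com")
    job_queue.put(None)  # tell the worker to stop after draining the queue
    job_queue.join()
    thread.join()
    return response, list(processed)
```

The request handler does nothing except enqueue, which is why its latency stops depending on how slow the email provider or PDF renderer happens to be.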

One read replica, used correctly

We added a single Postgres read replica and routed every analytics query, every dashboard widget, and every list view through it. Writes stayed on primary. The replica cost $60/month and absorbed 80% of database load.
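One way to express that routing rule is a small helper that picks a connection string per query. This is a sketch under assumptions: the DatabaseRouter class, its needs_fresh_data flag, and the DSNs are illustrative and not taken from the project. The flag exists because Postgres streaming replication is asynchronous by default, so a read issued right after a write may not yet see it on the replica.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DatabaseRouter:
    """Chooses between primary and replica connection strings."""

    primary_dsn: str
    replica_dsn: str

    def dsn_for(self, *, readonly: bool, needs_fresh_data: bool = False) -> str:
        # Only read-only queries that tolerate replication lag go to the replica.
        if readonly and not needs_fresh_data:
            return self.replica_dsn
        # Writes, and reads that must see the latest write, stay on primary.
        return self.primary_dsn


# Hypothetical hostnames, for illustration only.
router = DatabaseRouter(
    primary_dsn="postgresql://primary.internal/app",
    replica_dsn="postgresql://replica.internal/app",
)
```

Under this sketch an analytics query would call router.dsn_for(readonly=True), while a page that shows a balance right after a payment would pass needs_fresh_data=True and hit primary.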

What we deliberately didn't do

No Kubernetes. No service mesh. No multi-region active-active. These add complexity that small teams cannot operate. Boring infrastructure run well beats clever infrastructure run poorly — every single time.

Rohit Iyer

Staff Engineer at InfotechZone
