🚀 How Many Requests Per Second Can a Server or Database Handle?

🧠 The Big Question

How many requests per second (RPS) can a server or database handle?

At first, it feels like there should be a fixed number.

But in reality:

❌ There is no universal RPS number
✅ It depends on workload, architecture, and optimizations

⚙️ Why There’s No Fixed RPS

RPS depends on multiple factors:

Request complexity (simple vs heavy logic)
Number of DB queries per request
Query efficiency (indexed vs full scan)
Response size (small JSON vs large payload)
Caching (huge impact 🚀)
Tech stack (Node, Django, FastAPI, etc.)

🖥️ Server Capacity (2 CPU, 4GB RAM)

📊 Realistic Estimates

Scenario	Approx RPS
Simple API (no DB / cached)	1000 – 2000
Typical API (with DB calls)	200 – 800
Heavy logic / multiple queries	50 – 300

💡 Key Takeaway

A 2 CPU, 4GB server can handle anywhere from:

👉 ~50 to ~2000 RPS depending on workload

For typical DB-backed APIs:

👉 ~300–500 RPS is a safe assumption

🗄️ Database Capacity (2 CPU, 4GB RAM)

📊 Rough Estimates

Operation	Approx QPS
Reads (indexed)	500 – 1500
Writes	100 – 500
Complex queries	50 – 300

⚠️ Important Insight

💥 The database is usually the bottleneck

For example, if:

Server can handle → 1000 RPS
DB can handle → 200 RPS

👉 Your system is limited to 200 RPS, because every request has to wait for the slower component

🔁 Can Databases Be Horizontally Scaled?

👉 Yes, but it’s more complex than scaling app servers.

🔼 1. Vertical Scaling (Simplest)

Increase CPU / RAM
Easy to implement
Limited by hardware and cost

🔄 2. Read Replicas (Most Common First Step)

Writes → Primary DB
Reads → Replica DBs

Pros:

✅ Reduces read load

Cons:

⚠️ Eventual consistency: replicas can lag behind the primary, so a read may return stale data
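
The read/write split above can be sketched as a small router. This is only an illustration: the class name `ReplicaRouter`, the `require_fresh` flag, and the connection names are hypothetical, and detecting reads by checking for `SELECT` is a simplification that real drivers and proxies handle more carefully.

```python
import itertools


class ReplicaRouter:
    """Hypothetical router: writes go to the primary, reads rotate over replicas."""

    def __init__(self, primary, replicas):
        self.primary = primary
        # Round-robin over replicas; fall back to the primary if there are none.
        self._replicas = itertools.cycle(replicas) if replicas else None

    def target_for(self, sql, require_fresh=False):
        # Simplifying assumption: anything starting with SELECT is a read.
        is_read = sql.lstrip().upper().startswith("SELECT")
        if is_read and self._replicas is not None and not require_fresh:
            return next(self._replicas)
        # Writes, and reads that cannot tolerate replication lag, use the primary.
        return self.primary


router = ReplicaRouter("primary", ["replica-1", "replica-2"])
print(router.target_for("SELECT * FROM users"))           # replica-1
print(router.target_for("INSERT INTO users VALUES (1)"))  # primary
```

The `require_fresh` flag is one way to deal with replication lag: a read that must see the latest write (for example, right after the user updates their profile) is sent to the primary instead.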

🔀 3. Sharding (True Horizontal Scaling)

Example:

userId % 3 → DB1 / DB2 / DB3

Pros:

✅ Distributes data across multiple DBs

Challenges:

⚠️ Complex queries
⚠️ Cross-shard joins
⚠️ Rebalancing data
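
To make the rebalancing problem concrete, here is a minimal sketch of the `userId % 3` rule from the example, plus a check of how many users would change shards if a fourth DB were added. The function names and the sample user IDs are made up for illustration.

```python
def shard_for(user_id, shard_count):
    """Modulo sharding: userId % N picks the shard index (0-based)."""
    return user_id % shard_count


def moved_fraction(user_ids, old_count, new_count):
    """Fraction of users whose shard changes when the shard count changes."""
    moved = sum(
        1 for uid in user_ids
        if shard_for(uid, old_count) != shard_for(uid, new_count)
    )
    return moved / len(user_ids)


shards = ["DB1", "DB2", "DB3"]
print(shards[shard_for(7, len(shards))])  # DB2, because 7 % 3 == 1

# Going from 3 to 4 shards with plain modulo moves most of the data.
print(moved_fraction(range(12), 3, 4))  # 0.75
```

With plain modulo, three out of four users land on a different shard after adding one DB, which is why rebalancing is expensive.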

🚀 Real-World Scaling Strategy

1. Optimize queries
2. Add caching (Redis)
3. Add read replicas
4. Scale app servers
5. Sharding (last resort)
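
The caching step in the list above is often done with a read-through pattern usually called cache-aside: check the cache first, and only hit the DB on a miss. The sketch below uses a tiny in-memory `TTLCache` as a stand-in for Redis so it runs without any dependencies. `TTLCache`, `get_user`, and `load_from_db` are hypothetical names, not a Redis API.

```python
import time


class TTLCache:
    """In-memory stand-in for Redis (illustration only, not a Redis client)."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._store = {}

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self.clock() >= expires_at:
            # Entry is too old: drop it and report a miss.
            del self._store[key]
            return None
        return value

    def set(self, key, value):
        self._store[key] = (value, self.clock() + self.ttl)


def get_user(user_id, cache, load_from_db):
    """Cache-aside read: try the cache, fall back to the DB, then fill the cache."""
    key = f"user:{user_id}"
    cached = cache.get(key)
    if cached is not None:
        return cached
    value = load_from_db(user_id)  # the only place the DB is touched
    cache.set(key, value)
    return value
```

Every cache hit is a query the DB never sees, which is why caching comes before adding more DB hardware.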

💡 Why Sharding Is Last

Hard to maintain
Complex application logic
Expensive to rebalance

🧪 The Only Reliable Method: Load Testing

All numbers above are assumptions.

👉 Real capacity comes from testing

🛠️ Tools

k6
JMeter
Locust

🔬 Process

1. Deploy your system
2. Start with low RPS (e.g., 50)
3. Gradually increase load
4. Monitor:
- Latency
- CPU usage
- Error rate
- DB connections
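
The process above can be sketched with nothing but the standard library. This is a toy, not a replacement for k6, JMeter, or Locust: it ramps concurrency (worker count) rather than an exact RPS target, and it only records latency and error rate. CPU usage and DB connections still have to be watched on the server side. `send_request` is a hypothetical callable that performs one request against your system and raises on failure.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def run_step(send_request, total_requests, workers):
    """Send total_requests calls using `workers` threads and summarize the step."""

    def timed_call(_):
        start = time.perf_counter()
        try:
            send_request()
            ok = True
        except Exception:
            ok = False
        return time.perf_counter() - start, ok

    step_start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(timed_call, range(total_requests)))
    elapsed = max(time.perf_counter() - step_start, 1e-9)

    latencies = sorted(latency for latency, _ in results)
    failures = sum(1 for _, ok in results if not ok)
    return {
        "workers": workers,
        "achieved_rps": total_requests / elapsed,
        "p95_latency_s": latencies[int(0.95 * (len(latencies) - 1))],
        "error_rate": failures / total_requests,
    }


def ramp(send_request, worker_steps=(1, 2, 4, 8), requests_per_step=200):
    """Start low and gradually increase load, one step at a time."""
    return [run_step(send_request, requests_per_step, w) for w in worker_steps]
```

In practice, `send_request` would wrap an HTTP call to the deployed system, and you would stop ramping once latency or error rate crosses your limit.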

📈 Example

600 RPS → stable ✅
800 RPS → latency spikes ❌

👉 Safe capacity ≈ 500 RPS (the highest stable rate, minus some headroom)
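
One way to turn results like these into a number is to take the highest load step that stayed within your limits and subtract some headroom. The sketch below assumes a 20% headroom and made-up thresholds and measurements; none of these values come from a real test.

```python
def safe_capacity(steps, max_p95_ms, max_error_rate, headroom_percent=20):
    """Highest stable RPS minus headroom.

    steps: iterable of (rps, p95_latency_ms, error_rate) tuples, one per load step.
    """
    stable = 0
    for rps, p95_ms, error_rate in sorted(steps):
        if p95_ms > max_p95_ms or error_rate > max_error_rate:
            break  # first unstable step: everything above it is out of range
        stable = rps
    return stable * (100 - headroom_percent) // 100


# Hypothetical measurements: stable up to 600 RPS, latency spikes at 800 RPS.
results = [(200, 40, 0.0), (400, 55, 0.0), (600, 80, 0.001), (800, 900, 0.04)]
print(safe_capacity(results, max_p95_ms=200, max_error_rate=0.01))  # 480
```

480 RPS is in the same ballpark as the ~500 RPS above. The exact headroom is a judgment call that depends on how spiky your traffic is.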

🧠 Mental Model

Max RPS =
  min(
    App server capacity,
    DB capacity,
    Network limits
  )
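
As a rough sketch, the mental model above is a one-liner. The only addition here is converting DB capacity from queries per second into requests per second by dividing by the number of queries each request makes. That conversion, and the function and parameter names, are assumptions for illustration.

```python
def max_system_rps(app_rps, db_qps, queries_per_request, network_rps):
    """The slowest component sets the ceiling for the whole system."""
    if queries_per_request <= 0:
        db_rps = float("inf")  # requests that never touch the DB are not limited by it
    else:
        db_rps = db_qps / queries_per_request
    return min(app_rps, db_rps, network_rps)


# App handles 1000 RPS, DB handles 600 QPS, each request runs 3 queries.
print(max_system_rps(1000, 600, 3, 5000))  # 200.0
```

Reducing queries per request (or caching them away) raises the DB term without touching the hardware.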

💡 Final Takeaways

❌ Don’t say: “This system handles X RPS”
✅ Say: “It depends. We estimate, then validate with load testing”

🎯 Summary

A 2 CPU, 4GB server can typically handle around 300–500 RPS for DB-backed APIs, while lightweight endpoints can go higher. The database is usually the bottleneck; on the same 2 CPU, 4GB hardware it handles around:

100–500 writes/sec
500–1500 indexed reads/sec

These are rough estimates. Real capacity is determined through load testing.

For scaling, work in this order:
👉 Optimize queries → Caching → Read replicas → App scaling → Sharding (last)
