🚀 How Many Requests Per Second Can a Server or Database Handle?

🧠 The Big Question

How many requests per second (RPS) can a server or database handle?

At first, it feels like there should be a fixed number.

But in reality:

  • ❌ There is no universal RPS number

  • ✅ It depends on workload, architecture, and optimizations


βš™οΈ Why There’s No Fixed RPS

RPS depends on multiple factors:

  • Request complexity (simple vs heavy logic)

  • Number of DB queries per request

  • Query efficiency (indexed vs full scan)

  • Response size (small JSON vs large payload)

  • Caching (huge impact 🚀)

  • Tech stack (Node, Django, FastAPI, etc.)
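
To see how a few of these factors combine, here is a rough back-of-the-envelope sketch. The function name, its parameters, and the example numbers are illustrative assumptions, not measurements, and it assumes a cache hit serves the whole request without touching the database.

```python
def db_queries_per_second(request_rps: float,
                          queries_per_request: float,
                          cache_hit_rate: float = 0.0) -> float:
    """Estimate the query load that reaches the database.

    Assumes a cache hit serves the whole request without touching the DB.
    """
    if not 0.0 <= cache_hit_rate <= 1.0:
        raise ValueError("cache_hit_rate must be between 0 and 1")
    uncached_rps = request_rps * (1.0 - cache_hit_rate)
    return uncached_rps * queries_per_request


# 500 requests/sec, 3 queries each, no cache
print(db_queries_per_second(500, 3))        # 1500.0
# Same traffic with a 50% cache hit rate
print(db_queries_per_second(500, 3, 0.5))   # 750.0
```

The same request rate can mean very different database load, which is why a single RPS figure says little on its own.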


🖥️ Server Capacity (2 CPU, 4 GB RAM)

📊 Realistic Estimates

Scenario                        | Approx. RPS
Simple API (no DB / cached)     | 1000–2000
Typical API (with DB calls)     | 200–800
Heavy logic / multiple queries  | 50–300

💡 Key Takeaway

A 2 CPU, 4 GB server can handle anywhere from:

👉 ~50 to ~2000 RPS depending on workload

For typical DB-backed APIs:

👉 ~300–500 RPS is a reasonable planning assumption until you have load-test results


πŸ—„οΈ Database Capacity (2 CPU, 4GB RAM)

πŸ“Š Rough Estimates

Operation Approx QPS
Reads (indexed) 500 – 1500
Writes 100 – 500
Complex queries 50 – 300

⚠️ Important Insight

💥 The database is usually the bottleneck

Even if:

  • Server can handle → 1000 RPS

  • DB can handle → 200 RPS

👉 Your system is limited to 200 RPS for any request that needs the database


Can Databases Be Horizontally Scaled?

👉 Yes, but it's more complex than scaling servers.


🔼 1. Vertical Scaling (Simplest)

  • Increase CPU / RAM

  • Easy to implement

  • Limited by hardware and cost


🔄 2. Read Replicas (Most Common First Step)

  • Writes → Primary DB

  • Reads → Replica DBs

Pros:

  • ✅ Reduces read load on the primary

Cons:

  • ⚠️ Eventual consistency: replicas can briefly serve stale data (replication lag)
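
The read/write split can be sketched as a small router. This is a minimal illustration, not a production implementation: the `ReplicaRouter` class, the database names, and the `needs_fresh_read` flag are all hypothetical, and detecting reads by a `SELECT` prefix is a deliberate simplification.

```python
import itertools


class ReplicaRouter:
    """Route writes to the primary and spread reads across replicas."""

    def __init__(self, primary: str, replicas: list[str]):
        self.primary = primary
        # Fall back to the primary if no replicas are configured.
        self._read_cycle = itertools.cycle(replicas or [primary])

    def target_for(self, sql: str, *, needs_fresh_read: bool = False) -> str:
        """Return the name of the database that should run this statement."""
        is_read = sql.lstrip().upper().startswith("SELECT")
        if is_read and not needs_fresh_read:
            return next(self._read_cycle)  # round-robin over replicas
        # Writes, and reads that cannot tolerate replication lag, use the primary.
        return self.primary


router = ReplicaRouter("primary", ["replica-1", "replica-2"])
print(router.target_for("SELECT * FROM users WHERE id = 1"))          # replica-1
print(router.target_for("SELECT * FROM users WHERE id = 2"))          # replica-2
print(router.target_for("UPDATE users SET name = 'a' WHERE id = 1"))  # primary
print(router.target_for("SELECT balance FROM accounts",
                        needs_fresh_read=True))                       # primary
```

Sending lag-sensitive reads to the primary is one common way to work around replication lag, at the cost of putting some read load back on the primary.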

🔀 3. Sharding (True Horizontal Scaling)

Example:

userId % 3 → DB1 / DB2 / DB3

Pros:

  • ✅ Distributes data across multiple DBs

Challenges:

  • ⚠️ Complex queries

  • ⚠️ Cross-shard joins

  • ⚠️ Rebalancing data
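
A small sketch of the modulo scheme above, which also shows why rebalancing hurts. The function names are hypothetical, shard indexes are zero-based (index 0 corresponds to DB1), and the user IDs are synthetic.

```python
def shard_for(user_id: int, shard_count: int) -> int:
    """Return the zero-based shard index for a user (modulo sharding)."""
    return user_id % shard_count


def fraction_moved(user_ids, old_count: int, new_count: int) -> float:
    """Fraction of users whose shard changes when the shard count changes."""
    ids = list(user_ids)
    moved = sum(1 for uid in ids
                if shard_for(uid, old_count) != shard_for(uid, new_count))
    return moved / len(ids)


print(shard_for(10, 3))                      # 1, i.e. DB2
print(fraction_moved(range(12_000), 3, 4))   # 0.75
```

With plain modulo sharding, going from 3 to 4 shards relocates about three quarters of the users in this example. Schemes such as consistent hashing exist to reduce that movement, but they add their own complexity.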


🚀 Real-World Scaling Strategy

  1. Optimize queries

  2. Add caching (Redis)

  3. Add read replicas

  4. Scale app servers

  5. Sharding (last resort)
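
As an illustration of step 2, here is a minimal cache-aside sketch. To keep it self-contained it uses an in-memory dict as a stand-in for Redis; the `CacheAside` class, the `load_from_db` callable, and the 60-second TTL are assumptions made for this example.

```python
import time


class CacheAside:
    """Cache-aside lookup: check the cache first, fall back to the database."""

    def __init__(self, load_from_db, ttl_seconds: float = 60.0):
        self._load_from_db = load_from_db   # hypothetical callable: key -> value
        self._ttl = ttl_seconds
        self._store = {}                    # key -> (expires_at, value)
        self.db_calls = 0                   # counts how often the DB was hit

    def get(self, key):
        entry = self._store.get(key)
        now = time.monotonic()
        if entry is not None and entry[0] > now:
            return entry[1]                 # cache hit: no DB query
        self.db_calls += 1                  # cache miss or expired entry
        value = self._load_from_db(key)
        self._store[key] = (now + self._ttl, value)
        return value


cache = CacheAside(lambda user_id: {"id": user_id, "name": f"user-{user_id}"})
for _ in range(100):
    cache.get(42)
print(cache.db_calls)   # 1
```

One hundred reads of the same key cost a single database query here, which is why caching usually comes before replicas or sharding.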


💡 Why Sharding Is Last

  • Hard to maintain

  • Complex application logic

  • Expensive to rebalance


🧪 The Only Reliable Method: Load Testing

All numbers above are rough estimates, not measurements.

👉 Real capacity comes from testing


🛠️ Tools

  • k6

  • JMeter

  • Locust


🔬 Process

  1. Deploy your system

  2. Start with low RPS (e.g., 50)

  3. Gradually increase load

  4. Monitor:

    • Latency

    • CPU usage

    • Error rate

    • DB connections
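
The ramp-up loop above can be sketched as follows. This is only a skeleton under stated assumptions: `measure` is a hypothetical callback that would run one load step with a tool such as k6, JMeter, or Locust and report what was observed; the latency and error thresholds are example values; and CPU usage and DB connections would be checked the same way.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class StepResult:
    rps: int
    p95_latency_ms: float
    error_rate: float        # 0.0 to 1.0


def ramp_until_unstable(measure: Callable[[int], StepResult],
                        start_rps: int = 50,
                        step_rps: int = 50,
                        max_rps: int = 2000,
                        latency_limit_ms: float = 500.0,
                        error_limit: float = 0.01) -> Optional[int]:
    """Increase load step by step; return the highest RPS that stayed within limits."""
    highest_stable = None
    for rps in range(start_rps, max_rps + 1, step_rps):
        result = measure(rps)
        if (result.p95_latency_ms > latency_limit_ms
                or result.error_rate > error_limit):
            break                           # first unstable step: stop ramping
        highest_stable = rps
    return highest_stable


# Stand-in for a real load-test step: latency climbs sharply past 600 RPS.
def fake_measure(rps: int) -> StepResult:
    latency = 80.0 if rps <= 600 else 1200.0
    return StepResult(rps=rps, p95_latency_ms=latency, error_rate=0.0)


print(ramp_until_unstable(fake_measure))   # 600
```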


📈 Example

  • 800 RPS → latency spikes ❌

  • 600 RPS → stable ✅

👉 Safe capacity ≈ 500 RPS (the highest stable rate, minus headroom for traffic spikes)
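
One way to turn such results into a planning number is to take the highest stable rate below the first unstable one and keep some headroom. The 80% factor below is an assumption chosen for illustration, not a rule; with the example figures it gives 480 RPS, in line with the rounded figure above.

```python
def safe_capacity(step_results: dict[int, bool], headroom_pct: int = 80) -> int:
    """Return the highest stable RPS, scaled down by a headroom percentage.

    step_results maps each tested RPS to True (stable) or False (unstable).
    Only stable steps below the first unstable step are considered.
    """
    unstable = [rps for rps, ok in step_results.items() if not ok]
    ceiling = min(unstable) if unstable else None
    stable = [rps for rps, ok in step_results.items()
              if ok and (ceiling is None or rps < ceiling)]
    if not stable:
        raise ValueError("no stable load step was observed")
    return max(stable) * headroom_pct // 100


print(safe_capacity({600: True, 800: False}))   # 480
```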


🧠 Mental Model

Max RPS =
  min(
    App server capacity,
    DB capacity,
    Network limits
  )
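
The same model as a small runnable sketch. The `queries_per_request` conversion is an added assumption: database capacity is usually measured in queries per second, so it has to be divided by the number of queries each request makes before comparing it with request rates. The network figure is a made-up placeholder.

```python
def max_rps(app_rps: float, db_qps: float, queries_per_request: float,
            network_rps: float) -> float:
    """System throughput is capped by its slowest component."""
    # Convert DB capacity from queries/sec to requests/sec.
    if queries_per_request > 0:
        db_rps = db_qps / queries_per_request
    else:
        db_rps = float("inf")               # requests never touch the DB
    return min(app_rps, db_rps, network_rps)


# Server 1000 RPS, DB 200 QPS, one query per request
print(max_rps(app_rps=1000, db_qps=200, queries_per_request=1,
              network_rps=5000))            # 200.0
# A faster DB (600 QPS) but three queries per request: same ceiling
print(max_rps(app_rps=1000, db_qps=600, queries_per_request=3,
              network_rps=5000))            # 200.0
```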

💡 Final Takeaways

  • ❌ Don't say: "This system handles X RPS"

  • ✅ Say: "It depends; we estimate, then validate with load testing"


🎯 Summary

A 2 CPU, 4GB server can typically handle around 300–500 RPS for DB-backed APIs, while lightweight endpoints can go higher. Databases are usually the bottleneck, handling around:

  • 100–500 writes/sec

  • 500–1500 reads/sec

These are rough estimates; real capacity is determined through load testing.

For scaling, start with:
👉 Query optimization → Caching → Read replicas → App scaling → Sharding (last)