How Many Requests Per Second Can a Server or Database Handle?
The Big Question
How many requests per second (RPS) can a server or database handle?
At first, it feels like there should be a fixed number.
But in reality:
- There is no universal RPS number.
- It depends on workload, architecture, and optimizations.
Why There's No Fixed RPS
RPS depends on multiple factors:
- Request complexity (simple vs heavy logic)
- Number of DB queries per request
- Query efficiency (indexed vs full scan)
- Response size (small JSON vs large payload)
- Caching (huge impact)
- Tech stack (Node, Django, FastAPI, etc.)
Server Capacity (2 CPU, 4 GB RAM)
Realistic Estimates
| Scenario | Approx RPS |
|---|---|
| Simple API (no DB / cached) | 1000–2000 |
| Typical API (with DB calls) | 200–800 |
| Heavy logic / multiple queries | 50–300 |
Key Takeaway
A 2 CPU, 4 GB server can handle anywhere from ~100 to ~2000 RPS, depending on workload.
For typical DB-backed APIs, ~300–500 RPS is a reasonable starting assumption.
Database Capacity (2 CPU, 4 GB RAM)
Rough Estimates
| Operation | Approx QPS |
|---|---|
| Reads (indexed) | 500–1500 |
| Writes | 100–500 |
| Complex queries | 50–300 |
Important Insight
The database is usually the bottleneck.
Suppose that:
- The server can handle 1000 RPS
- The DB can handle 200 RPS
Then your system is limited to 200 RPS.
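The bottleneck arithmetic above can be sketched in a few lines. This is a simplified model, not a measurement: `system_rps`, `queries_per_request`, and `cache_hit_ratio` are hypothetical names, and the sketch assumes every request that misses the cache issues the same number of equally expensive DB queries.

```python
def system_rps(app_rps: float, db_qps: float,
               queries_per_request: float = 1.0,
               cache_hit_ratio: float = 0.0) -> float:
    """Estimate end-to-end RPS as the smaller of app and DB limits.

    Assumption: a cache hit skips the DB entirely, and a miss costs
    `queries_per_request` queries.
    """
    if not 0.0 <= cache_hit_ratio <= 1.0:
        raise ValueError("cache_hit_ratio must be between 0 and 1")
    if queries_per_request < 0:
        raise ValueError("queries_per_request must not be negative")

    # Average number of DB queries each incoming request actually causes.
    db_queries_per_request = queries_per_request * (1.0 - cache_hit_ratio)
    if db_queries_per_request == 0:
        # The DB is never touched, so only the app server limits throughput.
        return app_rps

    db_limited_rps = db_qps / db_queries_per_request
    return min(app_rps, db_limited_rps)


if __name__ == "__main__":
    print(system_rps(1000, 200))                         # one query per request
    print(system_rps(1000, 200, queries_per_request=4))  # four queries per request
```

With one query per request and no cache, this reproduces the 200 RPS limit from the example. Under the same assumptions, four queries per request would lower the estimate to 50 RPS.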
Can Databases Be Horizontally Scaled?
Yes, but it's more complex than scaling servers.
1. Vertical Scaling (Simplest)
- Increase CPU / RAM
- Easy to implement
- Limited by hardware and cost
2. Read Replicas (Most Common First Step)
- Writes → Primary DB
- Reads → Replica DBs
Pros:
- Reduces read load on the primary
Cons:
- Eventual consistency: replicas can lag behind the primary, so a read issued right after a write may return stale data
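A minimal sketch of the read/write split described above, under some loud assumptions: `ReplicaRouter` and the `read_your_writes` flag are hypothetical names, plain strings stand in for real database connections, and classifying a statement by its `SELECT` prefix is deliberately naive (it ignores cases such as `SELECT ... FOR UPDATE`).

```python
import itertools
from typing import Iterator, Optional, Sequence


class ReplicaRouter:
    """Send writes to the primary and spread reads across replicas."""

    def __init__(self, primary: str, replicas: Sequence[str]) -> None:
        self.primary = primary
        # Round-robin over replicas; None means "no replicas configured".
        self._replicas: Optional[Iterator[str]] = (
            itertools.cycle(replicas) if replicas else None
        )

    def route(self, sql: str, read_your_writes: bool = False) -> str:
        """Return the name of the node that should run this statement."""
        is_read = sql.lstrip().lower().startswith("select")
        # Writes always go to the primary. Reads that must see the caller's
        # own latest write also go there, to sidestep replication lag.
        if not is_read or read_your_writes or self._replicas is None:
            return self.primary
        return next(self._replicas)


if __name__ == "__main__":
    demo = ReplicaRouter("primary", ["replica-1", "replica-2"])
    for statement in ("INSERT INTO users VALUES (1)",
                      "SELECT * FROM users",
                      "SELECT * FROM users"):
        print(statement, "->", demo.route(statement))
```

Routing lag-sensitive reads to the primary is one common way to live with eventual consistency, at the cost of giving back some of the read offload.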
3. Sharding (True Horizontal Scaling)
Example:
userId % 3 → DB1 / DB2 / DB3
Pros:
- Distributes data across multiple DBs
Challenges:
- Complex queries
- Cross-shard joins
- Rebalancing data
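The modulo routing above, and the reason rebalancing is painful, can be illustrated with a short sketch. The function names are hypothetical, and mapping remainder 0 to `DB1` is an assumption about how the example labels its shards.

```python
from typing import Iterable


def shard_index(user_id: int, shard_count: int) -> int:
    """Modulo sharding: the remainder picks the shard."""
    return user_id % shard_count


def shard_for(user_id: int, shard_count: int = 3) -> str:
    """Name the shard for a user, labelling shards DB1..DBn (assumed)."""
    return f"DB{shard_index(user_id, shard_count) + 1}"


def fraction_moved(user_ids: Iterable[int], old_count: int, new_count: int) -> float:
    """Share of users whose shard changes when the shard count changes."""
    ids = list(user_ids)
    moved = sum(
        1 for uid in ids
        if shard_index(uid, old_count) != shard_index(uid, new_count)
    )
    return moved / len(ids)


if __name__ == "__main__":
    print(shard_for(4))                          # 4 % 3 == 1, so DB2
    print(fraction_moved(range(12000), 3, 4))    # going from 3 to 4 shards
```

For evenly spread IDs, going from 3 to 4 shards with plain modulo moves 75% of users to a different shard, which is why adding a shard is expensive. Schemes such as consistent hashing aim to reduce that movement, though they are outside what this example shows.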
Real-World Scaling Strategy
1. Optimize queries
2. Add caching (Redis)
3. Add read replicas
4. Scale app servers
5. Sharding (last resort)
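To make the caching step in the list above concrete, here is a minimal cache-aside sketch. It is an illustration only: an in-process dictionary stands in for Redis, and `TTLCache`, `get_or_load`, and the `loader` callback are hypothetical names rather than any library's API.

```python
import time
from typing import Any, Callable, Dict, Tuple


class TTLCache:
    """Cache-aside with expiry: check the cache first, hit the DB on a miss."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._store: Dict[str, Tuple[float, Any]] = {}
        self.hits = 0
        self.misses = 0

    def get_or_load(self, key: str, loader: Callable[[str], Any]) -> Any:
        now = self._clock()
        entry = self._store.get(key)
        if entry is not None and entry[0] > now:
            # Fresh entry: served without touching the database.
            self.hits += 1
            return entry[1]
        # Missing or expired: load from the source of truth and remember it.
        self.misses += 1
        value = loader(key)
        self._store[key] = (now + self._ttl, value)
        return value


if __name__ == "__main__":
    def slow_db_lookup(key: str) -> str:
        # Placeholder for a real query.
        return f"row-for-{key}"

    cache = TTLCache(ttl_seconds=30)
    cache.get_or_load("user:1", slow_db_lookup)
    cache.get_or_load("user:1", slow_db_lookup)
    print(cache.hits, cache.misses)
```

Every hit is a query the database never sees, which is why caching sits so early in the list. The trade-off is that readers may see data up to one TTL old.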
Why Sharding Is Last
- Hard to maintain
- Complex application logic
- Expensive to rebalance
The Only Reliable Method: Load Testing
All numbers above are assumptions.
Real capacity comes from testing.
Tools
- k6
- JMeter
- Locust
Process
1. Deploy your system
2. Start with low RPS (e.g., 50)
3. Gradually increase load
4. Monitor:
   - Latency
   - CPU usage
   - Error rate
   - DB connections
Example
800 RPS → latency spikes
600 RPS → stable
Safe capacity ≈ 500 RPS (the last stable level, minus some headroom)
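One way to turn load-test results like these into a number is sketched below. Everything specific here is an assumption for illustration: the `Step` record, the latency and error thresholds, the 0.8 headroom factor, and the sample measurements are all made up rather than taken from a real test.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class Step:
    rps: int            # offered load during this step
    p95_ms: float       # 95th percentile latency observed
    error_rate: float   # fraction of failed requests, 0.0 to 1.0


def safe_capacity(steps: Iterable[Step],
                  max_p95_ms: float = 300.0,
                  max_error_rate: float = 0.01,
                  headroom: float = 0.8) -> Optional[float]:
    """Highest stable RPS scaled by a headroom factor, or None if no step is stable.

    Steps are checked in increasing RPS order and the scan stops at the first
    unstable one, so a lucky stable step above a failure is ignored.
    """
    highest_stable = None
    for step in sorted(steps, key=lambda s: s.rps):
        if step.p95_ms > max_p95_ms or step.error_rate > max_error_rate:
            break
        highest_stable = step.rps
    if highest_stable is None:
        return None
    return highest_stable * headroom


if __name__ == "__main__":
    # Hypothetical measurements shaped like the example above.
    results = [
        Step(rps=50, p95_ms=40, error_rate=0.0),
        Step(rps=600, p95_ms=120, error_rate=0.0),
        Step(rps=800, p95_ms=900, error_rate=0.03),
    ]
    print(safe_capacity(results))
```

With these sample numbers the last stable step is 600 RPS, and a 0.8 headroom factor lands a little under 500 RPS, in the same range as the example's estimate. The right thresholds and headroom depend on your own latency targets.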
Mental Model
Max RPS =
min(
App server capacity,
DB capacity,
Network limits
)
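The same mental model as a tiny helper that also reports which component is the limit. The function name and the network figure in the demo are hypothetical; the app and DB figures reuse the earlier example.

```python
from typing import Dict, Tuple


def bottleneck(limits: Dict[str, float]) -> Tuple[str, float]:
    """Return the component with the lowest capacity and that capacity."""
    if not limits:
        raise ValueError("at least one component limit is required")
    name = min(limits, key=lambda component: limits[component])
    return name, limits[name]


if __name__ == "__main__":
    # The network value is a placeholder, not a measured figure.
    print(bottleneck({"app": 1000, "db": 200, "network": 5000}))
```

Knowing which component sets the minimum tells you where the next optimization should go; raising any other limit changes nothing.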
Final Takeaways
- Don't say: "This system handles X RPS"
- Say: "It depends; we estimate and validate with load testing"
Summary
A 2 CPU, 4 GB server can typically handle around 300–500 RPS for DB-backed APIs, while lightweight endpoints can go higher. Databases are usually the bottleneck, handling around:
- 100–500 writes/sec
- 500–1500 reads/sec
These are rough estimates; real capacity is determined through load testing.
For scaling, start with:
Query optimization → Caching → Read replicas → App scaling → Sharding (last)