π Rate Limiting Explained: Protecting Systems from Overload
π§ What is Rate Limiting?
Rate limiting is a mechanism used to control the number of requests a client can make to a server within a specific time frame.
It helps:
Prevent server overload
Protect against abuse (spam, brute force attacks)
Ensure fair usage across users
Rate limiting can be applied:
π Globally (server-level)
π€ Per user / per API key
βοΈ Why Rate Limiting Matters
Without rate limiting:
A single user could overwhelm the system
APIs could become unavailable
Infrastructure costs could spike
With rate limiting:
System stays stable β
Performance remains predictable β
π¦ Types of Rate Limiting Algorithms
1οΈβ£ Token Bucket Algorithm
π§© How it Works
Imagine a bucket with a maximum capacity of tokens
Tokens are added at a fixed rate (refill rate)
Each request consumes 1 token
If tokens are available β request is allowed
If empty β request is rejected
β Example
Bucket size = 5 tokens
Refill rate = 3 tokens/second
If requests come:
First 5 requests β allowed immediately
Next requests β allowed only as tokens refill
π Pros
Allows bursty traffic
Smooth handling of spikes
Simple to implement
π Cons
- Hard to tune bucket size & refill rate
2οΈβ£ Leaky Bucket Algorithm
π§© How it Works
Requests are added to a queue (bucket)
Processed at a constant rate (fixed outflow)
If the queue is full β new requests are rejected
π Behavior
Incoming traffic may be bursty
Outgoing traffic is always smooth and constant
π Pros
Ensures steady request processing rate
Prevents sudden spikes downstream
π Cons
Queue overflow β request drops
Increased latency during bursts
Requires tuning queue size & processing rate
3οΈβ£ Fixed Window Counter
π§© How it Works
Time is divided into fixed windows (e.g., 1 min)
A counter tracks requests in that window
Counter resets at the start of each new window
β Example
- Limit = 5 requests per 3 seconds
Requests:
Window 1 β 5 requests allowed
Window 2 β counter resets β again 5 allowed
β οΈ Problem: Boundary Issue
At window edges:
End of window β burst of requests
Start of next window β another burst
π This allows more requests than expected in a short time
π Pros
Very simple
Low memory usage
π Cons
- Unfair spikes at boundaries
4οΈβ£ Sliding Window Log
π§© How it Works
Store timestamps of each request
On every new request:
Remove timestamps older than the window
Count remaining requests
Allow/reject based on the limit
β Example
Limit: 2 requests per minute
1:00:10 β allowed (log stored)
1:00:20 β allowed (log stored)
1:00:30 β β rejected (limit reached, not logged)
1:01:40 β old logs removed β request allowed
π Pros
Accurate rate limiting
No boundary issue
π Cons
High memory usage (stores timestamps)
Expensive for large-scale systems
5οΈβ£ Sliding Window Counter (Optimized Version)
π§© How it Works
Instead of storing all timestamps:
Divide time into the current and previous windows
Use a weighted calculation
π Formula Insight
If request comes at 30% into current window:
Consider:
30% of current window requests
70% of previous window requests
π Approximate total requests:
effective_count =
(current_window_request_count) +
(previous_window_request_count Γ overlap_ratio)
Hereβs a clear, interview-ready example you can directly paste into your blog π
π Example: Sliding Window Counter
βοΈ Setup
Limit = 10 requests per minute
Window size = 60 seconds
Current request comes at 30% into the current window (i.e., at 18 seconds)
π¦ Data
Previous window (last 60s): 10 requests
Current window (so far): 3 requests
Current time position = 30% into window
π§ Calculation
We apply the formula:
effective_count =
(current_window_count) +
(previous_window_count Γ overlap_ratio)
Here:
- Overlap ratio = remaining portion of previous window = 70% = 0.7
So:
effective_count = 3 + (10 Γ 0.7)
= 3 + 7
= 10
π¦ Decision
Limit = 10
Effective count = 10
π This request will be β rejected (limit reached)
π― Intuition (Why this works)
Even though we are in a new window, recent traffic from the previous window still affects us
Instead of a sudden reset (like fixed window), we gradually decay past requests
This prevents burst attacks at window edges
β‘ Quick Variation
If current window had only 2 requests:
effective_count = 2 + (10 Γ 0.7)
= 2 + 7
= 9
π β Request allowed
π Pros
Memory efficient
More accurate than a fixed window
Scales better than the sliding log
π Cons
- Slight approximation (not 100% exact)
βοΈ Comparison Summary
| Algorithm | Accuracy | Memory | Burst Handling | Complexity |
|---|---|---|---|---|
| Token Bucket | Medium | Low | β Yes | Easy |
| Leaky Bucket | Medium | Low | β No (smooth) | Easy |
| Fixed Window | Low | Very Low | β No | Very Easy |
| Sliding Window Log | High | High | β Yes | Medium |
| Sliding Window Count | High | Medium | β Yes | Medium |
π Final Thoughts
Use Token Bucket β when bursts are acceptable
Use Leaky Bucket β when a constant processing rate is required
Use Fixed Window β when simplicity and low overhead matter more than accuracy
Use Sliding Window Log β when accuracy matters most
Use Sliding Window Counter β best balance for production systems
π‘ Real-World Tip
Most production systems (like APIs at scale) prefer:
Token Bucket or
Sliding Window Counter
Because they balance: π performance + memory + accuracy