🧠 What is Rate Limiting?

Rate limiting is a mechanism used to control the number of requests a client can make to a server within a specific time frame.

It helps:

Prevent server overload
Protect against abuse (spam, brute force attacks)
Ensure fair usage across users

Rate limiting can be applied:

🌍 Globally (server-level)
👤 Per user / per API key
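Per-user limiting means keeping separate limiter state for each key. Here is a minimal sketch that assumes a fixed-window counter (covered below) as the underlying limiter. The class `PerKeyLimiter` and its parameters are hypothetical names for illustration. A global, server-level limit works the same way with a single shared key.

```python
from collections import defaultdict


class PerKeyLimiter:
    """Hypothetical per-key limiter: an independent fixed-window counter per key."""

    def __init__(self, limit: int, window: float):
        self.limit = limit
        self.window = window
        # (key, window_index) -> requests accepted in that window.
        # Old windows are never evicted in this sketch; a real system would expire them.
        self.counts = defaultdict(int)

    def allow(self, key: str, now: float) -> bool:
        slot = (key, int(now // self.window))
        if self.counts[slot] >= self.limit:
            return False
        self.counts[slot] += 1
        return True


limiter = PerKeyLimiter(limit=2, window=60.0)
print([limiter.allow("alice", now=1.0) for _ in range(3)])  # [True, True, False]
print(limiter.allow("bob", now=1.0))                        # True: bob has his own counter
```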

⚙️ Why Rate Limiting Matters

Without rate limiting:

A single user could overwhelm the system
APIs could become unavailable
Infrastructure costs could spike

With rate limiting:

System stays stable ✅
Performance remains predictable ✅

📦 Types of Rate Limiting Algorithms

1️⃣ Token Bucket Algorithm

🧩 How it Works

Imagine a bucket that holds at most a fixed number of tokens (its capacity)
Tokens are added at a fixed rate (the refill rate); tokens beyond the capacity are discarded
Each request consumes 1 token
If tokens are available → request is allowed
If empty → request is rejected

✅ Example

Bucket size = 5 tokens
Refill rate = 3 tokens/second

If a burst of requests arrives:

First 5 requests → allowed immediately
Further requests → allowed only as tokens refill, i.e. about 3 per second
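The steps above can be sketched as a small Python class. This is a minimal, single-process sketch, not a production implementation; `TokenBucket` and its parameters are hypothetical names. Time can be passed in explicitly (`now`) so the example is deterministic; otherwise it falls back to `time.monotonic()`.

```python
import time
from typing import Optional


class TokenBucket:
    """Token bucket: up to `capacity` tokens, refilled at `refill_rate` tokens/second."""

    def __init__(self, capacity: int, refill_rate: float, now: Optional[float] = None):
        self.capacity = capacity
        self.refill_rate = refill_rate
        self.tokens = float(capacity)  # start full, so an initial burst is allowed
        self.last_refill = time.monotonic() if now is None else now

    def allow(self, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        # Add the tokens earned since the last call, capped at the bucket size
        elapsed = now - self.last_refill
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_rate)
        self.last_refill = now
        if self.tokens >= 1:
            self.tokens -= 1  # each request consumes one token
            return True
        return False


# The example above: bucket size 5, refill rate 3 tokens/second
bucket = TokenBucket(capacity=5, refill_rate=3, now=0.0)
print([bucket.allow(now=0.0) for _ in range(6)])  # [True, True, True, True, True, False]
print([bucket.allow(now=1.0) for _ in range(4)])  # [True, True, True, False]
```

Refilling lazily on each call avoids a background timer: the bucket only needs the token count and the time of the last refill.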

👍 Pros

Allows bursty traffic
Smooth handling of spikes
Simple to implement

👎 Cons

Hard to tune bucket size & refill rate

2️⃣ Leaky Bucket Algorithm

🧩 How it Works

Requests are added to a queue (bucket)
Processed at a constant rate (fixed outflow)
If the queue is full → new requests are rejected

📌 Behavior

Incoming traffic may be bursty
Outgoing traffic is always smooth and constant
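A minimal sketch of the queue-based leaky bucket, assuming a worker that removes one queued request every `1 / leak_rate` seconds (a "tick"). `LeakyBucket`, `offer`, and `processed` are hypothetical names, and time is passed in explicitly to keep the example deterministic.

```python
import math
from collections import deque


class LeakyBucket:
    """Leaky bucket as a bounded FIFO queue drained at a constant rate.

    Assumption for this sketch: a worker removes one queued request every
    1 / leak_rate seconds (a "tick"), counted from `start`.
    """

    def __init__(self, queue_size: int, leak_rate: float, start: float = 0.0):
        self.queue_size = queue_size
        self.leak_rate = leak_rate      # requests processed per second
        self.start = start
        self.ticks_done = 0
        self.queue = deque()
        self.processed = []             # requests handed downstream, in order

    def _drain(self, now: float) -> None:
        ticks_due = math.floor((now - self.start) * self.leak_rate)
        # One queued request leaves per tick; ticks with an empty queue are idle
        while self.ticks_done < ticks_due and self.queue:
            self.ticks_done += 1
            self.processed.append(self.queue.popleft())
        self.ticks_done = max(self.ticks_done, ticks_due)

    def offer(self, request, now: float) -> bool:
        self._drain(now)
        if len(self.queue) >= self.queue_size:
            return False                # bucket full: request is dropped
        self.queue.append(request)
        return True


bucket = LeakyBucket(queue_size=3, leak_rate=2.0)          # 2 requests/second downstream
print([bucket.offer(f"r{i}", now=0.0) for i in range(5)])  # [True, True, True, False, False]
bucket.offer("r5", now=1.0)                                 # two ticks have passed
print(bucket.processed)                                     # ['r0', 'r1']
```

The burst of five arrives at once, but requests leave the queue at a steady two per second, which is exactly the smoothing (and the added latency) described above.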

👍 Pros

Ensures steady request processing rate
Prevents sudden spikes downstream

👎 Cons

Queue overflow → request drops
Increased latency during bursts
Requires tuning queue size & processing rate

3️⃣ Fixed Window Counter

🧩 How it Works

Time is divided into fixed windows (e.g., 1 min)
A counter tracks requests in that window
Counter resets at the start of each new window

✅ Example

Limit = 5 requests per 3 seconds

Requests:

Window 1 (0 to 3 s) → the first 5 requests are allowed; any further ones are rejected
Window 2 (3 to 6 s) → counter resets → 5 more allowed
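The counter logic above fits in a few lines. This is a minimal sketch with a hypothetical class name `FixedWindowCounter`; time is passed in as `now` (in seconds) so the example is deterministic.

```python
class FixedWindowCounter:
    """Fixed window counter: at most `limit` requests per window of `window` seconds."""

    def __init__(self, limit: int, window: float):
        self.limit = limit
        self.window = window
        self.window_index = None   # which window the counter currently belongs to
        self.count = 0

    def allow(self, now: float) -> bool:
        index = int(now // self.window)
        if index != self.window_index:
            # A new window has started: reset the counter
            self.window_index = index
            self.count = 0
        if self.count < self.limit:
            self.count += 1
            return True
        return False


limiter = FixedWindowCounter(limit=5, window=3.0)
print([limiter.allow(now=1.0) for _ in range(6)])  # window 1: [True, True, True, True, True, False]
print([limiter.allow(now=4.0) for _ in range(6)])  # window 2: same pattern after the reset
```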

⚠️ Problem: Boundary Issue

At window edges:

End of window 1 → a burst of 5 requests (e.g., at 2.9 s)
Start of window 2 → another burst of 5 (e.g., at 3.1 s)

👉 10 requests get through in about 0.2 seconds, twice the intended limit of 5 per 3 seconds
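The boundary effect is easy to reproduce. In this minimal sketch, `fixed_window_accepts` is a hypothetical helper that replays request timestamps (in seconds) through a fixed-window counter and returns the ones it would accept.

```python
def fixed_window_accepts(timestamps, limit, window):
    """Return the timestamps a fixed-window counter would accept."""
    accepted, counts = [], {}
    for t in timestamps:
        index = int(t // window)            # which fixed window this request falls in
        if counts.get(index, 0) < limit:
            counts[index] = counts.get(index, 0) + 1
            accepted.append(t)
    return accepted


# Limit = 5 requests per 3 seconds; 5 requests just before t = 3 s and 5 just after
burst = [2.9] * 5 + [3.1] * 5
accepted = fixed_window_accepts(burst, limit=5, window=3.0)
print(len(accepted))                          # 10
print(round(max(accepted) - min(accepted), 2))  # 0.2
```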

👍 Pros

Very simple
Low memory usage

👎 Cons

Bursts of up to 2× the limit around window boundaries

4️⃣ Sliding Window Log

🧩 How it Works

Store the timestamp of each accepted request
On every new request:
1. Remove timestamps older than the window
2. Count remaining requests
3. Allow/reject based on the limit

✅ Example

Limit: 2 requests per minute

1:00:10 → allowed (log stored)
1:00:20 → allowed (log stored)
1:00:30 → ❌ rejected (limit reached, not logged)
1:01:40 → old logs removed → request allowed
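The three steps above map directly onto a deque of timestamps. This is a minimal sketch with a hypothetical class name `SlidingWindowLog`; like the example, it logs only accepted requests. Times are given in seconds after 1:00:00, so 1:01:40 becomes 100.

```python
from collections import deque


class SlidingWindowLog:
    """Sliding window log: at most `limit` requests in any `window`-second span."""

    def __init__(self, limit: int, window: float):
        self.limit = limit
        self.window = window
        self.log = deque()  # timestamps of accepted requests, oldest first

    def allow(self, now: float) -> bool:
        # 1. Remove timestamps that are no longer inside the window (now - window, now]
        while self.log and self.log[0] <= now - self.window:
            self.log.popleft()
        # 2-3. Count what remains and compare it with the limit
        if len(self.log) < self.limit:
            self.log.append(now)  # as in the example, rejected requests are not logged
            return True
        return False


# The example above, with times in seconds after 1:00:00
limiter = SlidingWindowLog(limit=2, window=60.0)
print([limiter.allow(t) for t in (10, 20, 30, 100)])  # [True, True, False, True]
```

The deque holds one entry per accepted request in the window, which is where the memory cost listed below comes from.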

👍 Pros

Accurate rate limiting
No boundary issue

👎 Cons

High memory usage (one timestamp per request in the window)
Costly at scale: memory and cleanup work grow with traffic

5️⃣ Sliding Window Counter (Optimized Version)

🧩 How it Works

Instead of storing all timestamps:

Keep only two counters: one for the current fixed window and one for the previous window
Estimate the requests in the sliding window with a weighted sum of the two

📌 Formula Insight

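If a request arrives 30% of the way into the current window, the sliding window that ends at that moment covers:

- all requests in the current window so far
- the last 70% of the previous window, estimated as 70% of that window's requests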

👉 Approximate total requests:

effective_count =
(current_window_request_count) +
(previous_window_request_count × overlap_ratio)

where overlap_ratio = 1 − (time elapsed in the current window ÷ window size)
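A minimal sketch of this formula as a class, assuming one limiter per client and time passed in as `now` (seconds). `SlidingWindowCounter` and its attributes are hypothetical names. The usage reproduces the numbers of the worked example that follows: 10 requests in the previous window [0, 60) and 3 in the current window [60, 120) before a request at 18 seconds in.

```python
class SlidingWindowCounter:
    """Sliding window counter: estimates the last `window` seconds from two fixed-window counts."""

    def __init__(self, limit: int, window: float):
        self.limit = limit
        self.window = window
        self.current_index = 0    # index of the current fixed window
        self.current_count = 0    # requests accepted in the current window
        self.previous_count = 0   # requests accepted in the window just before it

    def allow(self, now: float) -> bool:
        index = int(now // self.window)
        if index != self.current_index:
            # Roll over; if one or more windows were skipped, the previous window was empty
            self.previous_count = self.current_count if index == self.current_index + 1 else 0
            self.current_count = 0
            self.current_index = index
        elapsed_fraction = (now - index * self.window) / self.window  # e.g. 0.3
        overlap_ratio = 1.0 - elapsed_fraction                         # e.g. 0.7
        effective_count = self.current_count + self.previous_count * overlap_ratio
        if effective_count < self.limit:
            self.current_count += 1
            return True
        return False


limiter = SlidingWindowCounter(limit=10, window=60.0)
for t in range(0, 50, 5):       # 10 requests in the previous window [0, 60)
    limiter.allow(t)
for t in (61, 67, 73):          # 3 requests in the current window [60, 120)
    limiter.allow(t)
print(limiter.allow(78))        # 18 s in: 3 + 10 × 0.7 = 10 → False
```

Per client this keeps only two counts and a window index, instead of a log of timestamps.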


📊 Example: Sliding Window Counter

⚙️ Setup

Limit = 10 requests per minute
Window size = 60 seconds
A new request arrives 18 seconds into the current window (i.e., 30% of the way in)

📦 Data

Previous window (the full 60-second window just before the current one): 10 requests
Current window (so far): 3 requests

🧠 Calculation

We apply the formula:

effective_count =
(current_window_request_count) +
(previous_window_request_count × overlap_ratio)

Here:

Overlap ratio = share of the previous window still covered by the sliding window = 1 − 0.3 = 0.7

So:

effective_count = 3 + (10 × 0.7)
                = 3 + 7
                = 10

🚦 Decision

Limit = 10
Effective count = 10

👉 This request will be ❌ rejected: the effective count is already at the limit, so admitting it would push it past 10

🎯 Intuition (Why this works)

Even though we are in a new window, recent traffic from the previous window still counts
Instead of a sudden reset (as in a fixed window), the previous window's weight shrinks linearly to 0 as the current window progresses
This smooths out the bursts a fixed window allows at its edges

⚡ Quick Variation

If current window had only 2 requests:

effective_count = 2 + (10 × 0.7)
                = 2 + 7
                = 9

👉 ✅ Request allowed (9 < 10)
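Both decisions reduce to one comparison. This is a minimal sketch; `sliding_window_decision` is a hypothetical helper that takes the counts and the elapsed fraction of the current window directly.

```python
def sliding_window_decision(current_count: int, previous_count: int,
                            elapsed_fraction: float, limit: int) -> bool:
    """Allow only if the weighted estimate is still below the limit."""
    overlap_ratio = 1 - elapsed_fraction
    effective_count = current_count + previous_count * overlap_ratio
    return effective_count < limit


print(sliding_window_decision(3, 10, 0.3, limit=10))  # False: 3 + 7 = 10, limit reached
print(sliding_window_decision(2, 10, 0.3, limit=10))  # True:  2 + 7 = 9
```

Because the comparison is strict (`<`), a request is rejected when the estimate lands exactly on the limit, as in the first case.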

👍 Pros

Memory efficient
More accurate than a fixed window
Scales better than the sliding log

👎 Cons

Approximate: assumes the previous window's requests were spread evenly across it

⚖️ Comparison Summary

Algorithm                Accuracy   Memory     Burst Handling   Complexity
Token Bucket             Medium     Low        ✅ Yes            Easy
Leaky Bucket             Medium     Low        ❌ No (smooth)    Easy
Fixed Window             Low        Very Low   ❌ No             Very Easy
Sliding Window Log       High       High       ✅ Yes            Medium
Sliding Window Counter   High       Medium     ✅ Yes            Medium

🏁 Final Thoughts

Use Token Bucket → when bursts are acceptable
Use Leaky Bucket → when a constant processing rate is required
Use Fixed Window → when simplicity and low overhead matter more than accuracy
Use Sliding Window Log → when accuracy matters most
Use Sliding Window Counter → often the best balance of accuracy and memory for production systems

💡 Real-World Tip

Many production systems (such as large-scale APIs) favor:

Token Bucket or
Sliding Window Counter

Because they balance: 👉 performance + memory + accuracy
