Skip to main content

Command Palette

Search for a command to run...

πŸš€ Rate Limiting Explained: Protecting Systems from Overload

Updated
β€’6 min read

🧠 What is Rate Limiting?

Rate limiting is a mechanism used to control the number of requests a client can make to a server within a specific time frame.

It helps:

  • Prevent server overload

  • Protect against abuse (spam, brute force attacks)

  • Ensure fair usage across users

Rate limiting can be applied:

  • 🌍 Globally (server-level)

  • πŸ‘€ Per user / per API key


βš™οΈ Why Rate Limiting Matters

Without rate limiting:

  • A single user could overwhelm the system

  • APIs could become unavailable

  • Infrastructure costs could spike

With rate limiting:

  • System stays stable βœ…

  • Performance remains predictable βœ…


πŸ“¦ Types of Rate Limiting Algorithms


1️⃣ Token Bucket Algorithm

🧩 How it Works

  • Imagine a bucket with a maximum capacity of tokens

  • Tokens are added at a fixed rate (refill rate)

  • Each request consumes 1 token

  • If tokens are available β†’ request is allowed

  • If empty β†’ request is rejected

βœ… Example

  • Bucket size = 5 tokens

  • Refill rate = 3 tokens/second

If requests come:

  • First 5 requests β†’ allowed immediately

  • Next requests β†’ allowed only as tokens refill


πŸ‘ Pros

  • Allows bursty traffic

  • Smooth handling of spikes

  • Simple to implement

πŸ‘Ž Cons

  • Hard to tune bucket size & refill rate

2️⃣ Leaky Bucket Algorithm

🧩 How it Works

  • Requests are added to a queue (bucket)

  • Processed at a constant rate (fixed outflow)

  • If the queue is full β†’ new requests are rejected

πŸ“Œ Behavior

  • Incoming traffic may be bursty

  • Outgoing traffic is always smooth and constant


πŸ‘ Pros

  • Ensures steady request processing rate

  • Prevents sudden spikes downstream

πŸ‘Ž Cons

  • Queue overflow β†’ request drops

  • Increased latency during bursts

  • Requires tuning queue size & processing rate


3️⃣ Fixed Window Counter

🧩 How it Works

  • Time is divided into fixed windows (e.g., 1 min)

  • A counter tracks requests in that window

  • Counter resets at the start of each new window

βœ… Example

  • Limit = 5 requests per 3 seconds

Requests:

  • Window 1 β†’ 5 requests allowed

  • Window 2 β†’ counter resets β†’ again 5 allowed


⚠️ Problem: Boundary Issue

At window edges:

  • End of window β†’ burst of requests

  • Start of next window β†’ another burst

πŸ‘‰ This allows more requests than expected in a short time


πŸ‘ Pros

  • Very simple

  • Low memory usage

πŸ‘Ž Cons

  • Unfair spikes at boundaries

4️⃣ Sliding Window Log

🧩 How it Works

  • Store timestamps of each request

  • On every new request:

    1. Remove timestamps older than the window

    2. Count remaining requests

    3. Allow/reject based on the limit


βœ… Example

Limit: 2 requests per minute

  • 1:00:10 β†’ allowed (log stored)

  • 1:00:20 β†’ allowed (log stored)

  • 1:00:30 β†’ ❌ rejected (limit reached, not logged)

  • 1:01:40 β†’ old logs removed β†’ request allowed


πŸ‘ Pros

  • Accurate rate limiting

  • No boundary issue

πŸ‘Ž Cons

  • High memory usage (stores timestamps)

  • Expensive for large-scale systems


5️⃣ Sliding Window Counter (Optimized Version)

🧩 How it Works

Instead of storing all timestamps:

  • Divide time into the current and previous windows

  • Use a weighted calculation


πŸ“Œ Formula Insight

If request comes at 30% into current window:

  • Consider:

    • 30% of current window requests

    • 70% of previous window requests

πŸ‘‰ Approximate total requests:

effective_count =
(current_window_request_count) +
(previous_window_request_count Γ— overlap_ratio)

Here’s a clear, interview-ready example you can directly paste into your blog πŸ‘‡


πŸ“Š Example: Sliding Window Counter

βš™οΈ Setup

  • Limit = 10 requests per minute

  • Window size = 60 seconds

  • Current request comes at 30% into the current window (i.e., at 18 seconds)


πŸ“¦ Data

  • Previous window (last 60s): 10 requests

  • Current window (so far): 3 requests

  • Current time position = 30% into window


🧠 Calculation

We apply the formula:

effective_count =
(current_window_count) +
(previous_window_count Γ— overlap_ratio)

Here:

  • Overlap ratio = remaining portion of previous window = 70% = 0.7

So:

effective_count = 3 + (10 Γ— 0.7)
                = 3 + 7
                = 10

🚦 Decision

  • Limit = 10

  • Effective count = 10

πŸ‘‰ This request will be ❌ rejected (limit reached)


🎯 Intuition (Why this works)

  • Even though we are in a new window, recent traffic from the previous window still affects us

  • Instead of a sudden reset (like fixed window), we gradually decay past requests

  • This prevents burst attacks at window edges


⚑ Quick Variation

If current window had only 2 requests:

effective_count = 2 + (10 Γ— 0.7)
                = 2 + 7
                = 9

πŸ‘‰ βœ… Request allowed


πŸ‘ Pros

  • Memory efficient

  • More accurate than a fixed window

  • Scales better than the sliding log

πŸ‘Ž Cons

  • Slight approximation (not 100% exact)

βš–οΈ Comparison Summary

Algorithm Accuracy Memory Burst Handling Complexity
Token Bucket Medium Low βœ… Yes Easy
Leaky Bucket Medium Low ❌ No (smooth) Easy
Fixed Window Low Very Low ❌ No Very Easy
Sliding Window Log High High βœ… Yes Medium
Sliding Window Count High Medium βœ… Yes Medium

🏁 Final Thoughts

  • Use Token Bucket β†’ when bursts are acceptable

  • Use Leaky Bucket β†’ when a constant processing rate is required

  • Use Fixed Window β†’ when simplicity and low overhead matter more than accuracy

  • Use Sliding Window Log β†’ when accuracy matters most

  • Use Sliding Window Counter β†’ best balance for production systems


πŸ’‘ Real-World Tip

Most production systems (like APIs at scale) prefer:

  • Token Bucket or

  • Sliding Window Counter

Because they balance: πŸ‘‰ performance + memory + accuracy