
Rate Limiting and Backpressure Control

Introduction

On Double 11 (China's Singles' Day shopping festival) at midnight, hundreds of millions of users flood in simultaneously. Can the servers handle it? Every system has a processing capacity limit. When request volume exceeds that limit and nothing controls it, the system does not just slow down: it becomes unusable for everyone. Rate limiting and backpressure are the two lines of defense that protect systems from being "overwhelmed."

What will you learn in this article?

After reading this chapter, you will gain:

  • Why rate limiting is necessary: Understand why you need to proactively reject some requests to protect the system
  • Rate limiting algorithms: Master the principles and differences of three core algorithms — token bucket, leaky bucket, and sliding window
  • Backpressure mechanisms: Understand handling strategies when upstream speed exceeds downstream speed
  • Multi-layer rate limiting: Learn the multi-layer rate limiting architecture from client to gateway to service
  • Practical skills: Know which rate limiting strategy to choose for which scenario

| Chapter | Content | Core Concepts |
|---|---|---|
| Chapter 1 | Why rate limiting is needed | Cascading failure, service protection |
| Chapter 2 | Rate limiting algorithms | Token bucket, leaky bucket, sliding window |
| Chapter 3 | Backpressure control | Buffer, drop strategy, elastic scaling |
| Chapter 4 | Multi-layer rate limiting architecture | Client, gateway, server side |
| Chapter 5 | Practice and selection | Nginx, Redis, Sentinel |

0. The Big Picture: Why "Reject" Users?

This sounds counterintuitive — shouldn't we serve every user well? But the reality is: if you don't reject some requests, all requests will fail.

Imagine a restaurant that can only seat 100 people, and suddenly 1,000 people rush in. Without rate limiting, the result isn't that all 1,000 get to eat — it's that the kitchen crashes, the servers are overwhelmed, and nobody gets fed. The right approach is to queue and limit at the door, letting 100 people in first while the rest wait.

Core Goals of Rate Limiting

  • Protect the system: Prevent overload from causing complete service unavailability
  • Fair allocation: Ensure accepted requests can be processed normally
  • Graceful degradation: Rate-limited requests receive a clear 429 status code, rather than a timeout or 500 error

1. Rate Limiting Algorithms: Three Classic Approaches

The core question of rate limiting is: within a unit of time, what is the maximum number of requests allowed through? Different algorithms make different trade-offs in precision, burst traffic handling, and implementation complexity.

Token bucket
Adds tokens to the bucket at a fixed rate. Each request consumes one token, and extra tokens are discarded when the bucket is full. It allows bursts when stored tokens are available.
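
The token bucket described above can be sketched in a few lines. This is a minimal, single-threaded illustration and not a production limiter: the class name `TokenBucket`, the `allow` method, and the injectable `clock` parameter are all assumptions made for this example. Tokens are refilled lazily on each call instead of by a background timer, which gives the same result with less machinery.

```python
import time


class TokenBucket:
    """Token bucket: refills at `rate` tokens/second, holds at most `capacity`."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock              # injectable so the logic can be tested
        self.tokens = float(capacity)   # start full, so an initial burst passes
        self.last = clock()

    def allow(self, cost=1):
        now = self.clock()
        # Add the tokens earned since the last call; overflow is discarded.
        earned = (now - self.last) * self.rate
        self.tokens = min(self.capacity, self.tokens + earned)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

With `TokenBucket(rate=1, capacity=5)`, five back-to-back requests pass on stored tokens, the sixth is rejected, and afterwards roughly one request per second gets through.
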

| Algorithm | Principle | Burst Traffic | Precision | Implementation Complexity |
|---|---|---|---|---|
| Token bucket | Tokens added at a fixed rate; requests consume tokens | Allowed (when bucket has surplus) | High | Medium |
| Leaky bucket | Requests queue up; processed at a fixed rate | Not allowed (fully smoothed) | High | Medium |
| Sliding window | Counts requests within a time window | Partially allowed | Fairly high | Low |
| Fixed window | Counts by fixed time window | May burst at boundaries | Low | Lowest |
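
The leaky bucket row can be illustrated with a queue that drains at a fixed pace. The sketch below is one possible reading of "requests queue up; processed at a fixed rate"; the names `LeakyBucket`, `offer`, and `drain` are hypothetical, and a real service would call `drain` from a worker loop or timer.

```python
import time
from collections import deque


class LeakyBucket:
    """Queue-based leaky bucket: holds up to `capacity` waiting requests
    and releases them at a fixed `rate` (requests per second)."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.interval = 1.0 / rate
        self.capacity = capacity
        self.clock = clock
        self.queue = deque()
        self.next_leak = clock()  # earliest time the next request may leave

    def offer(self, request):
        """Enqueue a request; return False (reject) if the bucket is full."""
        if len(self.queue) >= self.capacity:
            return False
        if not self.queue:
            # After an idle period, unused release slots must not accumulate.
            self.next_leak = max(self.next_leak, self.clock())
        self.queue.append(request)
        return True

    def drain(self):
        """Return the requests due to leave by now, one per interval."""
        now = self.clock()
        released = []
        while self.queue and self.next_leak <= now:
            released.append(self.queue.popleft())
            self.next_leak += self.interval
        return released
```

Unlike a token bucket, a long idle period earns no credit here: however many requests arrive together, they leave one interval apart, which is why the output is fully smoothed.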

Which Algorithm to Choose?

  • API rate limiting: Token bucket is most commonly used, allowing reasonable burst traffic
  • Traffic shaping: Leaky bucket suits scenarios requiring constant output rate
  • Simple counting: Sliding window is easy to implement, suitable for most web applications
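
To make the difference between window-based counters concrete, here is a small sketch of a sliding window (implemented as a log of timestamps) next to a fixed window counter. Both class names and the `clock` parameter are assumptions for illustration. Note that the log variant stores one timestamp per accepted request, so memory grows with the limit.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Allow at most `limit` requests in any trailing `window` seconds."""

    def __init__(self, limit, window, clock=time.monotonic):
        self.limit, self.window, self.clock = limit, window, clock
        self.stamps = deque()  # timestamps of accepted requests

    def allow(self):
        now = self.clock()
        # Evict timestamps that have fallen out of the trailing window.
        while self.stamps and self.stamps[0] <= now - self.window:
            self.stamps.popleft()
        if len(self.stamps) < self.limit:
            self.stamps.append(now)
            return True
        return False


class FixedWindowLimiter:
    """Allow at most `limit` requests per aligned window; counter resets at each boundary."""

    def __init__(self, limit, window, clock=time.monotonic):
        self.limit, self.window, self.clock = limit, window, clock
        self.window_id, self.count = None, 0

    def allow(self):
        wid = int(self.clock() // self.window)
        if wid != self.window_id:      # entered a new window: reset the counter
            self.window_id, self.count = wid, 0
        if self.count < self.limit:
            self.count += 1
            return True
        return False
```

With a limit of 5 per second, sending 5 requests at t = 0.9 s and 5 more at t = 1.1 s lets all 10 through the fixed window (they fall into two different windows), while the sliding window accepts only the first 5. This is the boundary burst listed in the comparison table.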

2. Backpressure Control: When Upstream Is Faster Than Downstream

Rate limiting solves the problem of "too many external requests," while backpressure solves the problem of "internal component speed mismatch."

When a producer generates data faster than a consumer can process it, the intermediate buffer keeps growing, eventually leading to memory overflow or data loss. Backpressure mechanisms allow consumers to "notify upstream to slow down."

For example, with a producer at 6 items/s, a consumer at 3 items/s, and a buffer of 20 slots, the buffer grows by 3 items per second and is full in under 7 seconds. What happens next depends on the backpressure strategy.

Four Backpressure Strategies

  1. Drop: When the buffer is full, discard new or old data; suitable for scenarios with high real-time requirements but tolerable data loss. Example: log collection, real-time metrics
  2. Block: Make the producer wait until the consumer frees space in the buffer; suitable for scenarios where data cannot be lost. Example: Go channels, Java BlockingQueue
  3. Sample: Process only a portion of the data and skip the rest; suitable for high-frequency data streams. Example: downsampling high-frequency sensor data
  4. Elastic Scaling: Dynamically increase the number of consumers; suitable for cloud-native environments. Example: Kubernetes HPA autoscaling
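
The drop and block strategies can both be shown with a bounded queue from Python's standard library. The helper names `offer_or_drop` and `run_blocking` are made up for this sketch; the only library behavior relied on is that `queue.Queue(maxsize=n)` raises `queue.Full` from `put_nowait` and blocks in `put` when the buffer is full.

```python
import queue
import threading


def offer_or_drop(buf, item):
    """Drop strategy: discard the new item when the buffer is full."""
    try:
        buf.put_nowait(item)
        return True
    except queue.Full:
        return False


def run_blocking(n_items, buf_size):
    """Block strategy: the producer waits in put() until the consumer frees a slot."""
    buf = queue.Queue(maxsize=buf_size)
    consumed = []

    def consumer():
        while True:
            item = buf.get()
            if item is None:       # sentinel: producer is done
                break
            consumed.append(item)

    worker = threading.Thread(target=consumer)
    worker.start()
    for i in range(n_items):
        buf.put(i)                 # blocks while the buffer is full
    buf.put(None)
    worker.join()
    return consumed
```

With the drop helper, a buffer of size 3 accepts three items and rejects the rest until something is consumed. With the blocking version, every item arrives in order regardless of buffer size; the cost is that the producer is slowed to the consumer's pace.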

3. Multi-Layer Rate Limiting Architecture

In production environments, rate limiting at a single point is not enough — you need multi-layer protection, with each layer solving problems at a different granularity.

| Layer | Location | Rate Limiting Granularity | Tools |
|---|---|---|---|
| Client | Frontend/App | Button debounce, request throttling | lodash.throttle, debounce |
| CDN/WAF | Edge nodes | IP-level, region-level | Cloudflare Rate Limiting |
| API Gateway | Entry gateway | Route-level, user-level | Nginx limit_req, Kong |
| Server side | Inside application | Interface-level, resource-level | Sentinel, Resilience4j |
| Database | Storage layer | Connection count, QPS | Connection pool configuration, slow query circuit breaking |

HTTP Specification for Rate Limiting

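Rate-limited requests should return a 429 Too Many Requests status code (defined in RFC 6585). The response usually carries the following headers. Retry-After is a standard HTTP header; the X-RateLimit-* headers are a widely used convention rather than part of the HTTP standard, so their exact format varies between APIs.

  • Retry-After: How long the client should wait before retrying (a number of seconds or an HTTP date)
  • X-RateLimit-Limit: Rate limit ceiling
  • X-RateLimit-Remaining: Remaining quota
  • X-RateLimit-Reset: Quota reset time (some APIs send a Unix timestamp, others the number of seconds remaining)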
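
As a framework-neutral illustration, the response for a rejected request could be assembled like this. The function name `too_many_requests` and its parameters are hypothetical, and the sketch assumes X-RateLimit-Reset is sent as a Unix timestamp in seconds, which is only one of the formats in use.

```python
import math
import time


def too_many_requests(limit, reset_epoch, now=None):
    """Build (status, headers) for a request rejected by the rate limiter.

    `reset_epoch` is the Unix time (seconds) at which the quota resets.
    """
    now = time.time() if now is None else now
    # Round up so the client never retries before the quota has reset.
    retry_after = max(0, math.ceil(reset_epoch - now))
    headers = {
        "Retry-After": str(retry_after),
        "X-RateLimit-Limit": str(limit),
        "X-RateLimit-Remaining": "0",
        "X-RateLimit-Reset": str(int(reset_epoch)),  # assumption: epoch seconds
    }
    return 429, headers
```

A well-behaved client reads Retry-After and waits at least that long before retrying, which keeps rejected clients from immediately re-sending the same load.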

4. Practical Selection

| Scenario | Recommended Solution | Notes |
|---|---|---|
| Nginx entry rate limiting | limit_req_zone (with limit_req) | Based on leaky bucket algorithm, simple configuration |
| Distributed rate limiting | Redis + Lua script | Token bucket or sliding window, multi-instance shared counting |
| Java microservices | Sentinel / Resilience4j | Supports circuit breaking, degradation, hotspot rate limiting |
| Node.js API | express-rate-limit | Easy to use, supports Redis storage |
| Go services | golang.org/x/time/rate | Token bucket implementation maintained by the Go team (an extended golang.org/x package, not part of the standard library) |

Summary

Rate limiting and backpressure are two critical lines of defense for protecting system stability. Rate limiting controls the rate of incoming external traffic, while backpressure coordinates the processing speed of internal components.

Key takeaways from this chapter:

  1. Necessity of rate limiting: Without rejecting some requests, all requests will fail
  2. Three core algorithms: Token bucket (allows bursts), leaky bucket (fully smoothed), sliding window (simple and precise)
  3. Backpressure mechanisms: Four strategies — drop, block, sample, scale
  4. Multi-layer protection: From client to database, each layer solves problems at a different granularity
  5. 429 specification: Return standard status code and rate limit headers when rate-limited

Further Reading