Redis ZSets
Redis Sorted Sets, i.e. Redis ZSets, are collections of unique strings (members) ordered by an associated score. When ties occur, the strings are ordered lexicographically
The typical use cases for them are:
- Rate Limiters: You can use a sorted set to build a sliding window rate limiter
- Min Heap: You can do insertion and deletion while evicting stale entries (similar to rate limiter)
- Leaderboards: Maintaining an ordered list of highest scores in massive online games
Sorted Sets are a mix between a set and a hash: they're composed of unique, non-repeating string elements, but each element is also associated with a floating point value, the score, which allows them to be ordered. Elements from the set are taken in order:
- Items are not natively stored in order in the underlying set
- Upon request, the items are taken based on ordering of score
The implementation is done via a skip list and a hash table:
- Skip lists aid in ordered access
- Hash tables for lookup
- Adding is via ZADD
  - Inserting into the skip list is O(log n)
- Retrieving an element's score is O(1)
  - Checking the hash table
- Retrieving ranges is via ZRANGEBYSCORE and ZRANGE
  - Traversing the skip list is O(log n)
  - Returning k results is O(k)
- Deletion is via ZREM, which is O(log n) per member
- ZRANK gives the relative ranking of an item in the scoreboard
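A quick redis-py sketch of those operations, using a hypothetical leaderboard key and made-up scores:

import redis

r = redis.Redis(host='localhost', port=6379, decode_responses=True)

# ZADD: O(log n) insert into the skip list plus an O(1) hash table entry per member
r.zadd('leaderboard', {'alice': 120, 'bob': 95, 'carol': 150})

# O(1) score lookup via the hash table
print(r.zscore('leaderboard', 'alice'))  # 120.0

# Range reads walk the skip list: O(log n) to find the start, O(k) to return k items
print(r.zrange('leaderboard', 0, -1, withscores=True))    # ascending by score
print(r.zrevrange('leaderboard', 0, 1, withscores=True))  # top 2 scores

# ZRANK: relative ranking of a member (0-based, ordered by ascending score)
print(r.zrank('leaderboard', 'bob'))  # 0

# ZREM: O(log n) removal
r.zrem('leaderboard', 'bob')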
Rate Limiter
Implementing a rate limiter can be done easily in memory with a sliding window, but becomes increasingly complex when we hit the scale of millions of requests per minute (millions of entries in a list will explode local memory) and have to start distributing requests and keeping some sort of (eventually) consistent counter
The main trade-offs on strict consistency, memory limitations, and distribution lead to a wide range of solutions, and a fair amount of them are backed by Redis:
- Sliding window can be achieved with ZSets
- Fixed window counters are more efficient on memory, but can lead to edge cases where more than the allowed number of requests get through
  - INCR and EXPIRE calls are used to track counts per fixed window, but if there are 10 requests at 1:59 and 10 more at 2:01, we effectively allow 20 calls in 2 seconds
- Strict sliding windows can be implemented with ZSets, but expiry calls must be performed as an API call, so they can only occur on GET requests, or as a background process
  - If done as a background process, expiry still needs to occur during GET to ensure no concurrency issues
  - Range reads (ZRANGEBYSCORE) and deletions (ZREMRANGEBYSCORE) run efficiently in Redis
  - Expiry issues still matter, and periodic cleanup is needed. That being said, the score range ensures we check a strict window, and that old values aren't included in the count
- Token bucket can be achieved with counters based on fixed windows, or using INCR and EXPIRE as a generic counter (bucket)
  - Refills can only be done on check calls, or by a background process, so some calculations are needed to refill based on the last timestamp called (a sketch follows below)
- Lua scripting can be used to ensure consistency and mitigate concurrency issues, but at the cost of latency and potential blocking
Background removal of old scores can be done via ZREMRANGEBYSCORE, either during GET or as a background process, but ZRANGEBYSCORE should still count only active values even if old ones exist. Cleanup is required for memory and performance, not correctness
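As a rough illustration of that token bucket idea, here's a minimal redis-py sketch that refills lazily on each check based on the last-seen timestamp. The key layout, capacity, and refill rate are illustrative assumptions, and the read-modify-write here is not atomic without the Lua scripting mentioned above:

import time
import redis

r = redis.Redis(host='localhost', port=6379, decode_responses=True)

def allow_request(user_id, capacity=10, refill_per_sec=1.0):
    """Token bucket stored in a Redis hash; refill happens lazily on each check."""
    key = f"bucket:{user_id}"  # illustrative key layout
    now = time.time()
    state = r.hgetall(key)

    tokens = float(state.get('tokens', capacity))
    last = float(state.get('last', now))

    # Refill based on elapsed time since the last check (no background process needed)
    tokens = min(capacity, tokens + (now - last) * refill_per_sec)

    allowed = tokens >= 1
    if allowed:
        tokens -= 1

    r.hset(key, mapping={'tokens': tokens, 'last': now})
    r.expire(key, 3600)  # let idle buckets age out
    return allowed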
Sliding Window
Implementing a sliding window rate limiter utilizing Redis ultimately comes in a few flavors:
- True sliding window with ZSets, adding every request and expiring old entries (sketched after this list)
  - Pros:
    - Accurate (no edge cases). Strict deny
    - Can be consistent with Lua scripting
    - Efficient retrieval, addition, and deletion
    - Can shard based on userId or some identifier to reduce strain
      - Con: this could lead to potential hot keys
  - Cons:
    - Requires expiry calls on GET, or as a background process
    - Keeps all requests stored, so memory is proportional to requests
    - Redis can handle hundreds of thousands of RPS, but storing millions of events, i.e. linear space complexity, becomes extremely expensive at large scale
- Fixed Bucket Sliding Window
  - For each time bucket, like each minute, simply start a new counter (sketched after this list)
  - Pros:
    - Accurate to a certain degree, but has edge cases that allow for double overages
    - Efficient memory utilization, because we are just storing integer counts across some fixed slice of time windows (not storing all historic time series info, just the previous window and the current window)
    - Can shard based on userId or some identifier to reduce strain
      - Con: this could lead to potential hot keys
    - Time complexity is O(1) instead of O(log n), as it's a simple fixed counter
  - Cons:
    - Edge cases allow for double the rate limit. This occurs if a user makes requests at t:59 and more at t+1:01, which means double the allowed calls can be made in 2 seconds
- Weighted Estimate Sliding Window
  - Based on (prev_count * overlap_pct) + curr_window_count (sketched after this list)
  - Pros:
    - Accurate to a certain degree, but has edge cases that allow for potential overages (not nearly as bad as fixed bucket)
    - Efficient memory utilization, because we are just storing integer counts across some fixed slice of time windows (not storing all historic time series info, just the previous window and the current window)
    - Can shard based on userId or some identifier to reduce strain
      - Con: this could lead to potential hot keys
    - Time complexity is O(1) instead of O(log n), as it's a simple fixed counter
  - Cons:
    - We've removed the possible doubled calls by utilizing the overlap_pct, but there are still edge cases for calls at the boundaries
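A minimal sketch of the true sliding window flavor, using a Lua script so the trim, count, and add happen atomically. The key layout, limit, and window size are illustrative assumptions:

import time
import uuid
import redis

r = redis.Redis(host='localhost', port=6379, decode_responses=True)

# Trim old entries, count what's left, and record this request in one atomic script
SLIDING_WINDOW_LUA = """
local key    = KEYS[1]
local now    = tonumber(ARGV[1])
local window = tonumber(ARGV[2])
local limit  = tonumber(ARGV[3])
local member = ARGV[4]

redis.call('ZREMRANGEBYSCORE', key, '-inf', now - window)
if redis.call('ZCARD', key) < limit then
    redis.call('ZADD', key, now, member)
    redis.call('EXPIRE', key, window)
    return 1
end
return 0
"""
check_limit = r.register_script(SLIDING_WINDOW_LUA)

def allow_request(user_id, limit=100, window_secs=60):
    """Strict sliding window: every request timestamp is kept as a ZSet member."""
    key = f"ratelimit:{user_id}"  # illustrative key layout
    member = f"{time.time()}-{uuid.uuid4()}"  # unique member per request
    return check_limit(keys=[key], args=[time.time(), window_secs, limit, member]) == 1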
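The fixed bucket flavor is just an integer counter per window; a quick sketch (key naming and limits are assumptions, and the boundary edge case described above still applies):

import time
import redis

r = redis.Redis(host='localhost', port=6379, decode_responses=True)

def allow_request(user_id, limit=100, window_secs=60):
    """Fixed bucket: one integer counter per user per time window."""
    bucket = int(time.time() // window_secs)  # current window number
    key = f"fixed:{user_id}:{bucket}"  # illustrative key layout
    count = r.incr(key)
    if count == 1:
        r.expire(key, window_secs * 2)  # let old buckets age out
    return count <= limit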
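And the weighted estimate flavor, blending the previous bucket's count by the fraction of it that still overlaps the sliding window. Again, the key naming and limits are assumptions, and the two counter operations are not atomic without Lua:

import time
import redis

r = redis.Redis(host='localhost', port=6379, decode_responses=True)

def allow_request(user_id, limit=100, window_secs=60):
    """Weighted estimate: (prev_count * overlap_pct) + curr_window_count."""
    now = time.time()
    bucket = int(now // window_secs)
    curr_key = f"win:{user_id}:{bucket}"  # illustrative key layout
    prev_key = f"win:{user_id}:{bucket - 1}"

    elapsed = now - bucket * window_secs
    overlap_pct = 1 - (elapsed / window_secs)  # fraction of the previous window still inside the sliding window

    prev_count = int(r.get(prev_key) or 0)
    curr_count = r.incr(curr_key)
    if curr_count == 1:
        r.expire(curr_key, window_secs * 2)

    return prev_count * overlap_pct + curr_count <= limit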
Min Heap
References OneUpTime Article
Implementing priority queues / min heaps in a local process is fairly straightforward, but setting them up in a distributed manner utilizing another API is typically difficult. There aren't many solutions for distributing a min heap, but there are solutions for sharding min heaps and aggregating multiple min heaps
Redis ZSets allow for pushing the min-heap storage out of a local process to a single Redis cluster - ultimately, it's the same operations with the same time complexity, but no longer bounded by whatever local RAM we have
import redis
import time
import json

r = redis.Redis(host='localhost', port=6379, decode_responses=True)

class PriorityQueue:
    def __init__(self, name, redis_client):
        self.key = f"pq:{name}"
        self.redis = redis_client

    def push(self, item, priority):
        """
        Add item with priority (lower score = higher priority).
        Use negative priority if you want higher numbers = higher priority.
        """
        self.redis.zadd(self.key, {item: priority})

    def pop(self):
        """Remove and return highest priority item (lowest score)"""
        result = self.redis.zpopmin(self.key)
        if result:
            return result[0][0]  # Return item without score
        return None

    def peek(self):
        """View highest priority item without removing"""
        result = self.redis.zrange(self.key, 0, 0)
        if result:
            return result[0]
        return None

    def pop_with_priority(self):
        """Remove and return (item, priority) tuple"""
        result = self.redis.zpopmin(self.key)
        if result:
            return (result[0][0], result[0][1])
        return None

    def size(self):
        """Get queue size"""
        return self.redis.zcard(self.key)

    def is_empty(self):
        """Check if queue is empty"""
        return self.size() == 0

    def clear(self):
        """Remove all items"""
        self.redis.delete(self.key)

# Usage
pq = PriorityQueue('tasks', r)

# Add tasks with priorities (lower = more urgent)
pq.push('critical-alert', priority=1)
pq.push('normal-task', priority=5)
pq.push('low-priority-cleanup', priority=10)
pq.push('urgent-fix', priority=2)

# Process in priority order
while not pq.is_empty():
    task = pq.pop()
    print(f"Processing: {task}")
Distributed Heap
None of this covers sharding, which would still be bound to a similar setup as the K Sorted Streams: you have multiple "local" (or Redis-based) priority queues that handle add and remove, and you aggregate each of those into a "global" (or yet another Redis-based) aggregator
It's suboptimal to also store all local results in the global result, as that makes the local aggregators unnecessary at that point. If the global one is evicting stale information, and the local ones strictly handle sharded writes while also evicting old information, then there's a useful setup there. That being said, if the global aggregator is overwhelmed for a period of time, there's a chance it doesn't see a high score in a certain time frame, so the global and local aggregators need to be based on the same window
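One possible sketch of that setup (the shard keys, routing, and scatter-gather pop are all assumptions, not a prescribed design): writes go to a local shard ZSet, and the global minimum is found by peeking every shard's head:

import zlib
import redis

r = redis.Redis(host='localhost', port=6379, decode_responses=True)

# Illustrative shard keys; in a real deployment each could live on its own Redis node
SHARDS = [f"pq:shard:{i}" for i in range(4)]

def shard_for(item):
    """Route writes to a local shard by hashing the item."""
    return SHARDS[zlib.crc32(item.encode()) % len(SHARDS)]

def push(item, priority):
    r.zadd(shard_for(item), {item: priority})

def pop_global_min():
    """Peek every shard's local minimum, then pop from the shard holding the global minimum."""
    best_key, best = None, None
    for key in SHARDS:
        head = r.zrange(key, 0, 0, withscores=True)  # the shard's (member, score) minimum
        if head and (best is None or head[0][1] < best[1]):
            best_key, best = key, head[0]
    if best_key is None:
        return None
    r.zrem(best_key, best[0])  # not atomic across shards; concurrent poppers need coordination
    return best[0]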