I've read this great article about Consistent Hashing: Algorithmic Tradeoffs.

It's really thorough. What especially caught my eye is @dgryski's workflow and approach to research. At the end of the article there's a list of repos, all Go implementations of research papers and algos for Caching and various flavors of Consistent Hashing.

This "just code the paper" approach is something that's been on my mind. It's a well known way to deepen one's understanting of a subject.

I hope I'll actually become more diligent and try it myself.

Some context

I've been trying to implement a rate limiting system for the largest system I've had the chance to work with so far. It's doing around 30-60k queries per second (QPS).

There are many algorithms for the task and many ways to build such a system. To name a few algos, there are Leaky Bucket, Token Bucket, Sliding Window, Fixed Window, and others.
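
To make one of them concrete, here's a minimal single-node Token Bucket sketch in Go. The type and parameter names are mine, purely for illustration, not any particular library's API:

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// TokenBucket allows bursts of up to `capacity` requests and refills at
// `rate` tokens per second. Illustrative only, not production code.
type TokenBucket struct {
	mu       sync.Mutex
	tokens   float64
	capacity float64
	rate     float64 // tokens added per second
	last     time.Time
}

func NewTokenBucket(capacity, rate float64) *TokenBucket {
	return &TokenBucket{tokens: capacity, capacity: capacity, rate: rate, last: time.Now()}
}

// Allow reports whether one more request fits in the bucket right now.
func (b *TokenBucket) Allow() bool {
	b.mu.Lock()
	defer b.mu.Unlock()

	now := time.Now()
	// Refill proportionally to the time elapsed since the last call.
	b.tokens += now.Sub(b.last).Seconds() * b.rate
	if b.tokens > b.capacity {
		b.tokens = b.capacity
	}
	b.last = now

	if b.tokens < 1 {
		return false
	}
	b.tokens--
	return true
}

func main() {
	tb := NewTokenBucket(800, 800.0/30.0) // roughly 800 requests per 30-second window
	fmt.Println(tb.Allow())
}
```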

Software options that do this off-the-shelf: nginx, haproxy, kong (which is based on openresty, which is nginx at its core), envoy, etc.

If the available palette of software isn't ticking all the boxes, you can implement these algorithms yourself using a central store like Redis, a distributed store like Memcached, stream processing like Kinesis Analytics and many others.

Many implementations consist of key-value stores that track usage. You basically pick a set of parameters for which you want to track usage. That can be a single param like (ip) or (user_id), or multiple simultaneous params for more granularity (user_id AND project_id).

You might also want to track usage over different time spans. Say I'm looking to track usage coming from an IP over 30-second intervals and only allow up to X queries in that frame, or maybe I also want to track usage per day for a user. To account for this, we'd also include the window size as part of the key, so for each window size you have a different total request count.

Wrapping up, imagine keys of the form window=30seconds;ip=1.2.3.4;user_id=foobar holding values like {hits: 500, max_allowed: 800, expires_at: 2021-01-30T10:51:42+02:00 }
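
One common way to back counters like that is Redis's INCR + EXPIRE pattern. Here's a minimal fixed-window sketch assuming the go-redis client; the key layout mirrors the one above, and the function name and numbers are just for illustration:

```go
package main

import (
	"context"
	"fmt"
	"time"

	"github.com/go-redis/redis/v8"
)

// hit increments the fixed-window counter for the given key parts and
// reports whether the caller is still under maxAllowed for this window.
func hit(ctx context.Context, rdb *redis.Client, window time.Duration, ip, userID string, maxAllowed int64) (bool, error) {
	// Mirrors the key shape from the post: window=30s;ip=1.2.3.4;user_id=foobar
	key := fmt.Sprintf("window=%s;ip=%s;user_id=%s", window, ip, userID)

	hits, err := rdb.Incr(ctx, key).Result()
	if err != nil {
		return false, err
	}
	if hits == 1 {
		// First hit in this window: set the expiry so the counter resets.
		rdb.Expire(ctx, key, window)
	}
	return hits <= maxAllowed, nil
}

func main() {
	rdb := redis.NewClient(&redis.Options{Addr: "localhost:6379"})
	ok, err := hit(context.Background(), rdb, 30*time.Second, "1.2.3.4", "foobar", 800)
	fmt.Println(ok, err)
}
```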

One off-the-shelf ratelimiter: mailgun/gubernator

I've been playing with mailgun/gubernator which has quite a nice gRPC API. It can run in single node or cluster mode.

It's basically a distributed LRU cache that stores values computed by the rate limiting algorithms it implements. It's heavily inspired by Brad Fitzpatrick's groupcache.

It's able to shard the keyspace over multiple nodes using the Consistent Hashing algorithm.
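
If you're curious what that sharding boils down to, here's a toy hash ring in Go. It's the textbook construction (virtual nodes, first node clockwise from the key's hash), not Gubernator's actual implementation:

```go
package main

import (
	"fmt"
	"hash/fnv"
	"sort"
)

// Ring maps keys to nodes by hashing both onto the same 32-bit ring
// and picking the first node clockwise from the key's hash.
type Ring struct {
	hashes []uint32          // sorted hashes of virtual nodes
	nodes  map[uint32]string // hash -> node name
}

func hashOf(s string) uint32 {
	h := fnv.New32a()
	h.Write([]byte(s))
	return h.Sum32()
}

func NewRing(nodes []string, replicas int) *Ring {
	r := &Ring{nodes: make(map[uint32]string)}
	for _, n := range nodes {
		// Place each node at several points ("virtual nodes") to even out the load.
		for i := 0; i < replicas; i++ {
			h := hashOf(fmt.Sprintf("%s#%d", n, i))
			r.hashes = append(r.hashes, h)
			r.nodes[h] = n
		}
	}
	sort.Slice(r.hashes, func(i, j int) bool { return r.hashes[i] < r.hashes[j] })
	return r
}

// Get returns the node responsible for the given key.
func (r *Ring) Get(key string) string {
	h := hashOf(key)
	i := sort.Search(len(r.hashes), func(i int) bool { return r.hashes[i] >= h })
	if i == len(r.hashes) {
		i = 0 // wrap around the ring
	}
	return r.nodes[r.hashes[i]]
}

func main() {
	ring := NewRing([]string{"node-a", "node-b", "node-c"}, 64)
	fmt.Println(ring.Get("window=30s;ip=1.2.3.4;user_id=foobar"))
}
```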

It might seem pretty simple to implement an in-memory cache. It can be a simple map[Key]Value. Where it starts to show some difficulty is when you push performance. That's when more complex variants of a LRU cache become adequate, like dgraph's Ristretto. There's a really cool talk about how they achieve performance by minimizing locking so that threads can do their work fast without waiting on others to finish their business updating the store.
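
For reference, the "simple" version looks roughly like the sketch below, and that single mutex is exactly what starts to hurt once every request goes through it. This is the naive baseline, not Ristretto's design:

```go
package main

import (
	"container/list"
	"fmt"
	"sync"
)

// LRU is the naive map + doubly linked list cache. Every operation takes
// the same mutex, so under heavy concurrency goroutines queue up on it.
type LRU struct {
	mu    sync.Mutex
	cap   int
	items map[string]*list.Element
	order *list.List // front = most recently used
}

type entry struct {
	key   string
	value int64
}

func NewLRU(capacity int) *LRU {
	return &LRU{cap: capacity, items: make(map[string]*list.Element), order: list.New()}
}

func (c *LRU) Get(key string) (int64, bool) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if el, ok := c.items[key]; ok {
		c.order.MoveToFront(el)
		return el.Value.(*entry).value, true
	}
	return 0, false
}

func (c *LRU) Set(key string, value int64) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if el, ok := c.items[key]; ok {
		el.Value.(*entry).value = value
		c.order.MoveToFront(el)
		return
	}
	c.items[key] = c.order.PushFront(&entry{key, value})
	if c.order.Len() > c.cap {
		// Evict the least recently used entry.
		last := c.order.Back()
		c.order.Remove(last)
		delete(c.items, last.Value.(*entry).key)
	}
}

func main() {
	c := NewLRU(2)
	c.Set("a", 1)
	fmt.Println(c.Get("a"))
}
```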

I couldn't get a 3-4-5 node cluster to ingest the 60k QPS with Gubernator. Initially I thought it was the cache mutex, but then I fixated on cross-node performance. Bummer, end of story so far; I'm taking a different approach for now.

What I ended up doing

The AHA! moment was running redis-benchmark and seeing north of 600k-700k requests per second. That should cover my needs for now. Lots.

So I went to Redis Cluster, managed in ElastiCache.

The performance is stunning, just what you'd expect from Redis: P50 sub-millisecond, P95 < 5ms, P99 < 40ms under sustained traffic. Those values are measured on the client, so they're probably a bit inflated by client-side load.

It's also easy to scale your cluster: you can grow the shard count by adding master nodes, and you can scale each shard for reads and failover by setting the number of replicas.

Consistency deserves some attention: Redis Cluster is not strongly consistent, but that's not a deal-breaker for a rate limiting application.

Connection pooling issues

Another thing to pay attention to is connection pooling. I've had the pleasure of seeing spikes of thousands of NewConnections (per sec? per min? don't remember), but those were problems on the client side, not Redis.

Keyword: connection churn. Cause: ContextTimeout.

Why? Client overload, goroutines scheduled with significant delays, and contention on semaphores. All these issues are visible with pprof tracing.
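
For what it's worth, here's roughly what the tuning knobs look like if you're on go-redis. The option names are the library's; the numbers are made up and depend entirely on your fleet and traffic:

```go
package main

import (
	"context"
	"fmt"
	"time"

	"github.com/go-redis/redis/v8"
)

// Illustrative pool settings. The point is to keep enough warm connections
// around so a latency blip doesn't turn into a storm of new dials.
func newClient() *redis.Client {
	return redis.NewClient(&redis.Options{
		Addr:         "localhost:6379",
		PoolSize:     200,                    // upper bound on connections per client process
		MinIdleConns: 50,                     // keep warm connections ready instead of dialing under load
		PoolTimeout:  500 * time.Millisecond, // how long to wait for a free connection
		DialTimeout:  200 * time.Millisecond,
		ReadTimeout:  100 * time.Millisecond,
		WriteTimeout: 100 * time.Millisecond,
	})
}

func allow(rdb *redis.Client, key string) (bool, error) {
	// A request-scoped deadline: if the client is overloaded and this fires
	// too often, you get exactly the connection churn described above.
	ctx, cancel := context.WithTimeout(context.Background(), 50*time.Millisecond)
	defer cancel()

	hits, err := rdb.Incr(ctx, key).Result()
	if err != nil {
		return false, err
	}
	return hits <= 800, nil
}

func main() {
	rdb := newClient()
	fmt.Println(allow(rdb, "window=30s;ip=1.2.3.4;user_id=foobar"))
}
```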

A general solution that applies in many other microservice scenarios is Concurrency Limiting, or Adaptive Throttling. See what Netflix and Google have published on this subject.
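
As a very reduced sketch of the idea (a static limit, nothing adaptive), a buffered channel used as a semaphore already gets you fail-fast behavior on the client:

```go
package main

import (
	"errors"
	"fmt"
)

// ErrOverloaded is returned when the client already has too many
// requests in flight; the caller should shed load instead of queueing.
var ErrOverloaded = errors.New("concurrency limit reached")

// Limiter is a fixed concurrency limiter. Adaptive throttlers adjust the
// limit from observed latency or error rates; this one keeps it static.
type Limiter struct {
	slots chan struct{}
}

func NewLimiter(max int) *Limiter {
	return &Limiter{slots: make(chan struct{}, max)}
}

// Do runs fn only if a slot is free, otherwise it fails fast.
func (l *Limiter) Do(fn func() error) error {
	select {
	case l.slots <- struct{}{}:
		defer func() { <-l.slots }()
		return fn()
	default:
		return ErrOverloaded
	}
}

func main() {
	lim := NewLimiter(100)
	err := lim.Do(func() error {
		// call Redis / the downstream service here
		return nil
	})
	fmt.Println(err)
}
```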

This section is worth a post of its own. Until then,

Thanks for reading! 😄