Tidbits | Aug. 17, 2025

Pro-Tip – Sometimes LFU > LRU

We've seen a dramatic increase in malicious AI/web crawlers hitting our clients' sites.

These crawlers aren't honoring robots.txt or the headers that indicate a page shouldn't be crawled (think links to facets like those in search results), so they get lost in a web of mostly useless or duplicated content.

On top of that annoyance, they're making tens if not hundreds of requests per second for hours at a time.

These sites are reasonably fast and in many cases sit behind awesome cache systems like Cloudflare, so why are these crawlers impacting our clients negatively?

User sessions! These bots are creating thousands of new sessions that push real human users' sessions out of the cache.

We're using Django's sessions, but this technique works for any browser session token setup where you want to retain real human users' sessions for as long as possible and expire the robotic ones.

Let's talk about sessions

Our default Django site template uses the cached_db session backend. That backend tries to get the session from Django's default cache (Redis in most cases); if it's not found there, the session is retrieved from the database and stored in the cache for the next request.
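
For reference, that's a one-line setting in Django:

# Django's cached_db session backend: reads hit the cache first and
# fall back to the database (re-priming the cache) on a miss.
SESSION_ENGINE = "django.contrib.sessions.backends.cached_db"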

Because of these asshole crawlers, we've had to rethink this default for some of our customers and instead keep the session only in a cache.

On most sites it doesn't really matter whether the session data is retained; worst case, the user is forced to log in again.

With other sites, the user's session is absolutely crucial to the business. A simple example: we have e-commerce clients who want to provide a nice experience to their users. They want a random Internet user to be able to add items to their cart over time and only have to log in to actually check out.

You need a session for that and you need that session to persist in cache for a reasonable amount of time.
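
As a concrete illustration, here's a hypothetical bare-bones add-to-cart view (the view name and URL names are made up) where the cart lives entirely in the session, which is exactly why losing the session hurts:

from django.shortcuts import redirect

def add_to_cart(request, sku):
    # The cart is just a dict stored in the session; evict the
    # session and the cart silently disappears with it.
    cart = request.session.get("cart", {})
    cart[sku] = cart.get(sku, 0) + 1
    request.session["cart"] = cart  # reassign so Django saves the change
    return redirect("cart-detail")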

Default Redis

By default Redis will use as much memory as it's allowed and, with its default noeviction policy, won't evict keys early even if they have a TTL set. When it can no longer fit new keys in RAM, it refuses the write entirely. This often causes whatever session system you're using to raise an exception, and your site is broken.
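
If you haven't seen that failure mode, here's a sketch of roughly what it looks like from Python, assuming redis-py and a hypothetical Redis that's already at its maxmemory limit with the default noeviction policy:

import redis

r = redis.Redis(host="localhost", port=6379)

try:
    # With noeviction, Redis refuses writes once it hits maxmemory:
    # "OOM command not allowed when used memory > 'maxmemory'."
    r.set("session:new-user", "session-data", ex=3600)
except redis.exceptions.ResponseError as exc:
    # This is roughly where your session backend blows up.
    print(f"Session write failed: {exc}")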

LRU (Least Recently Used) cache expiration is probably the most commonly known way to expire data. It's built into Python and it is the default for systems like memcached. You could configure or customize your session handling to use an LRU mechanism to keep RAM usage under control, but then you still have a problem.
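
The built-in Python version is functools.lru_cache, and the eviction behavior is easy to see in a few lines:

from functools import lru_cache

@lru_cache(maxsize=2)  # keep only the 2 most recently used results
def lookup(key):
    print(f"cache miss for {key}")
    return key.upper()

lookup("a")
lookup("b")
lookup("a")  # hit: "a" becomes the most recently used
lookup("c")  # evicts "b", the least recently used
lookup("b")  # miss again: "b" was evicted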

When you get half a million bogus AI requests in an hour and you only have the RAM to hold, say, half of that, your real human customer, who painstakingly curated the perfect cart of high-dollar premium widgets but had the audacity to take a break to make dinner, returns to their browser to find an empty cart and a fresh session.

And unless you have a monopoly on widgets, they're likely to take their business elsewhere. I mean, if you can't keep my cart for an hour why would I trust you with more complicated things like widget manufacturing and shipping logistics!?!?

LFU to the Rescue

I didn't know this until recently, but Redis has "volatile" eviction policies, which evict keys that have a TTL set before the TTL actually expires when memory runs low. Redis provides several options here, one of which is volatile-lru, which is what most people would probably choose.

However, what we really need here is LFU (Least Frequently Used)! Let me explain.

Each of these bots generates a new session and uses it exactly once. This means that any regular human user who browses a couple of pages will see their session stick around far longer. Frequent users, such as the website staff, would likely never have their session expire if not for the TTL.

To illustrate this a bit, I used an LLM to help me quickly generate a demo of LRU vs. LFU expiration in Redis.

The demo shows that we can keep shoving single-use session keys into a Redis configured for LFU and still retain any session that gets used even just a second time.
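
That demo isn't reproduced verbatim here, but a minimal sketch of the idea looks like this, assuming redis-py and a local Redis on port 6001 already configured with a small maxmemory and volatile-lfu (see the next section for that config):

import redis

r = redis.Redis(host="localhost", port=6001)

# One "real" session that gets touched a second time, bumping its
# LFU access counter above the bots'.
r.set("session:real-human", "cart-full-of-widgets", ex=3600)
r.get("session:real-human")

# Flood Redis with single-use bot sessions until evictions kick in.
for i in range(100_000):
    r.set(f"session:bot-{i}", "x" * 512, ex=3600)

# Under volatile-lru the idle human session would have been evicted
# during the flood; under volatile-lfu it survives because the
# single-use bot keys always have the lowest access frequency.
print(r.get("session:real-human"))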

Configuring Redis for LFU

To switch Redis to LFU you need to set three parameters in your redis.conf:

maxmemory 9mb
maxmemory-policy volatile-lfu 
lazyfree-lazy-eviction yes

This constrains Redis to use at most 9MB of memory. Obviously you're going to want more session storage than that, so adjust it to as much RAM as makes sense for your situation. The second parameter turns on LFU eviction for keys that have a TTL set (that's the "volatile" part), and the third tells Redis to free evicted keys asynchronously rather than blocking during eviction.
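
If you'd rather experiment without editing the config file, the same three settings can be applied at runtime. A sketch using redis-py's CONFIG SET wrapper (note these won't survive a restart unless they're also in redis.conf):

import redis

r = redis.Redis(host="localhost", port=6379)

# Apply the same three settings at runtime via CONFIG SET.
r.config_set("maxmemory", "9mb")
r.config_set("maxmemory-policy", "volatile-lfu")
r.config_set("lazyfree-lazy-eviction", "yes")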

Doing this in Docker Compose is fairly easy as well. You just write your own config and mount it like this:

redis-lfu:
  image: "redis:8.2-alpine"
  command: redis-server /usr/local/redis/redis.conf
  ports:
    - "6001:6379"
  init: true
  stop_signal: SIGKILL
  volumes:
    - ./docker/redis-lfu.conf:/usr/local/redis/redis.conf:ro

Configuring Django Sessions for LFU

I ran into something I hadn't thought about when configuring Redis this way for a customer, and I want to point it out so you don't make the same mistake: many Django sites use Redis and the Django cache for other things and NOT just sessions. Celery is one good example of where this can go wrong; you really don't want your task queue's keys evicted under memory pressure. In that case, we actually deploy two small Redis instances: one configured "normally" and one configured for LFU specifically for user sessions.

So our local compose has services like this:

  redis:
    image: "redis:8.2-alpine"
    ports:
      - "6000:6379"
    init: true
    stop_signal: SIGKILL

  redis-sessions:
    image: "redis:8.2-alpine"
    command: redis-server /usr/local/redis/redis.conf
    ports:
      - "6001:6379"
    init: true
    stop_signal: SIGKILL
    volumes:
      - ./docker/redis-lfu.conf:/usr/local/redis/redis.conf:ro

Obviously you'll need to do something slightly different in Kubernetes or however you're ultimately deploying your production applications.

And our Django settings cache and session changes look like this:

# Define two different caches
CACHES = {
    "default": {
        "BACKEND": "django.core.cache.backends.redis.RedisCache",
        "LOCATION": f"redis://redis:6379/0",
    },
    "sessions": {
        "BACKEND": "django.core.cache.backends.redis.RedisCache",
        "LOCATION": f"redis://redis-sessions:6379/0",
    },
}

# Use cached sessions and specifically the LFU configured "sessions" cache 
SESSION_ENGINE = "django.contrib.sessions.backends.cache"
SESSION_CACHE_ALIAS = "sessions"

Hopefully knowing that Redis can support LFU key expiration proves helpful to you!

P.S. Dear AI/web bot crawlers, can we chat? I know you're in a hurry—gathering that big heap of data for your 'research' or racing to meet your boss's deadline—but maybe honor robots.txt? Or at least keep it to one request at a time?

The Internet might hate you less. Hell, we might even say something nice about your new cryptocurrency or whatever.

