Tidbits | May 6, 2015

Django Performance: 4 Simple Things

by Frank Wiles

Optimizing for performance often comes with a bunch of headaches and hard problems, but it doesn’t always have to be that way.

Below are four simple things you can do to quickly and easily improve the performance of your sites. They’re so simple you should make them part of your standard setup.

Persistent database connections

As of Django 1.6, Django has built-in database connection persistence. Most everyone uses PostgreSQL for their production systems, and while connecting to PG isn’t particularly slow, it is something we can optimize.

Without persistent connections, every request to your site also makes a connection to your database, authenticates the user, etc. This can easily take as much as 20–75ms if your database is on another host, even on a fast network.

To set up connection persistence, add the CONN_MAX_AGE parameter to your DATABASES setting so it looks like this:

DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.postgresql_psycopg2',
        'NAME': 'whoohoodb',
        'CONN_MAX_AGE': 600,
    }
}

With this, we’ve set up persistent connections to stay alive for 10 minutes at a time. Capping the lifetime keeps a memory leak or a flaky connection from turning into a long-term problem. You can certainly use a longer value, but I wouldn’t go much beyond an hour as there will be little benefit.
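If you want to tune the lifetime per environment without editing the settings file, one option is to read it from an environment variable. This is a minimal sketch, not something Django does for you: the DB_CONN_MAX_AGE variable name is an assumption made up for this example, while CONN_MAX_AGE itself is the real setting (0 closes the connection at the end of each request, None keeps it open indefinitely).

```python
import os

# DB_CONN_MAX_AGE is a hypothetical environment variable name chosen for
# this sketch; Django itself only looks at the CONN_MAX_AGE key below.
CONN_MAX_AGE = int(os.environ.get("DB_CONN_MAX_AGE", "600"))

DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.postgresql_psycopg2",
        "NAME": "whoohoodb",
        # Seconds a connection may be reused before Django closes it.
        "CONN_MAX_AGE": CONN_MAX_AGE,
    }
}
```

With this in place, a development machine could export DB_CONN_MAX_AGE=0 to get the old connect-per-request behavior while production keeps the 10 minute default.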

You can read more details about persistent database connections in the Django documentation.

Template Loading

By default Django comes configured to use the two standard template loaders:

TEMPLATE_LOADERS = (
    'django.template.loaders.filesystem.Loader',
    'django.template.loaders.app_directories.Loader',
)

These loaders search the file system and parse your templates on every request. Kinda surprising that it works as quickly as it does, right?

You can turn on cached loading, so Django only has to find and parse your templates one time, by simply changing your configuration to be:

TEMPLATE_LOADERS = (
    ('django.template.loaders.cached.Loader', (
        'django.template.loaders.filesystem.Loader',
        'django.template.loaders.app_directories.Loader',
    )),
)

Now, don’t be silly and turn on cached loading in your development environments as it will get annoying. You’ll be forced to restart runserver (or whatever you’re using locally) on each and every template change.

Optimize Django Sessions

If you poll Django users, nearly 82% of them have no idea where their sessions are stored. OK, I admit I made that number up, but in our experience it’s pretty close to true.

By default, Django stores your user sessions in your database and expects you to occasionally prune out old entries. Very few people do this.

So on each request you’re doing a SQL query to get the session data and another to grab the User object information. If you’re the sort of person who’s concerned about performance, you likely already have memcached or Redis set up to cache some things. You can switch to storing your sessions in your cache and easily remove a SQL query from every single request to your site with:

SESSION_ENGINE = 'django.contrib.sessions.backends.cache'

Now this assumes you can safely lose any data you store in your user sessions. If that’s not the case, you can still get some benefit from using:

SESSION_ENGINE = 'django.contrib.sessions.backends.cached_db'

This backend caches your data, but also writes any changes to your database for long term storage.
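Put together, the choice between the two backends might look like the settings fragment below. Treat it as a sketch: SESSION_DATA_IS_DISPOSABLE is a made-up flag for this example, not a Django setting. SESSION_ENGINE and SESSION_CACHE_ALIAS are real settings, and the alias is assumed to name an entry in a CACHES setting that already points at your memcached or Redis instance.

```python
# Hypothetical project-level flag, invented for this sketch: can you
# afford to lose session data if the cache is flushed or restarted?
SESSION_DATA_IS_DISPOSABLE = False

if SESSION_DATA_IS_DISPOSABLE:
    # Cache only: no session query at all, but data can vanish.
    SESSION_ENGINE = 'django.contrib.sessions.backends.cache'
else:
    # Write-through: reads come from the cache, writes also hit the database.
    SESSION_ENGINE = 'django.contrib.sessions.backends.cached_db'

# Which entry in CACHES holds the sessions (assumes 'default' is configured).
SESSION_CACHE_ALIAS = 'default'
```

Note that with cached_db the session table in your database still grows, so pruning old entries (Django ships a clearsessions management command for this) is still worth scheduling.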

select_related() and prefetch_related()

The first three tips above were super easy, right? Unfortunately this last one, while still easy compared to totally changing up your data storage, does require a bit more than a single configuration setting change.

select_related() and prefetch_related() aren’t in everyone’s bag of tricks but they are incredibly useful for cutting down on the number of SQL queries your ORM code is firing.

The situation that usually happens is you have some BlogPost model with a ForeignKey to User. Then in a list view you use queryset = BlogPost.objects.active(). Then in your templates you have something like:

<ul>
{% for post in object_list %}
  <li>{{ post.title }} - {{ post.user.email }}</li>
{% endfor %}
</ul>

This ends up doing the expected SELECT against the blog_post table, but then is forced in the template loop to do another SELECT against auth_user for each and every post in the list.

Instead, we can ask the Django ORM to go ahead and JOIN in the auth_user table to our results so that post.user is already an object for us. That takes us from BlogPost.objects.active().count() + 1 queries down to a single query.

We can fix it by simply changing the view’s queryset to be:

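# Name the relation explicitly and call active() first, so this works
# whether active() lives on the manager or on a custom QuerySet.
queryset = BlogPost.objects.active().select_related('user')

While not a simple settings change, it’s still pretty simple to accomplish.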
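The difference in query counts is easier to see outside the ORM. The sketch below is not Django code: it uses Python’s built-in sqlite3 module with two tables named after the ones above, and a hypothetical run() helper that counts every statement it executes. The column layout and the sample data are assumptions made for the example. The first function mimics what the template loop triggers, the second mimics the JOIN that select_related asks for.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE auth_user (id INTEGER PRIMARY KEY, email TEXT);
CREATE TABLE blog_post (
    id INTEGER PRIMARY KEY,
    title TEXT,
    user_id INTEGER REFERENCES auth_user(id)
);
""")
# Representative data: 100 users with one post each.
conn.executemany(
    "INSERT INTO auth_user (id, email) VALUES (?, ?)",
    [(i, f"user{i}@example.com") for i in range(1, 101)],
)
conn.executemany(
    "INSERT INTO blog_post (id, title, user_id) VALUES (?, ?, ?)",
    [(i, f"Post {i}", i) for i in range(1, 101)],
)

stats = {"queries": 0}


def run(sql, params=()):
    """Execute one statement, count it, and return all rows."""
    stats["queries"] += 1
    return conn.execute(sql, params).fetchall()


def list_posts_naive():
    """One SELECT for the posts, then one more SELECT per post for its user."""
    rows = []
    for _post_id, title, user_id in run(
        "SELECT id, title, user_id FROM blog_post ORDER BY id"
    ):
        (email,) = run("SELECT email FROM auth_user WHERE id = ?", (user_id,))[0]
        rows.append((title, email))
    return rows


def list_posts_joined():
    """A single SELECT that JOINs the user row onto each post."""
    return run(
        "SELECT p.title, u.email FROM blog_post p "
        "JOIN auth_user u ON u.id = p.user_id ORDER BY p.id"
    )


stats["queries"] = 0
list_posts_naive()
print("naive:", stats["queries"])   # 101 with the 100 posts inserted above

stats["queries"] = 0
list_posts_joined()
print("joined:", stats["queries"])  # 1
```

Both functions return the same rows; only the number of round trips to the database changes.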

prefetch_related is the same idea, but for ManyToManyField relationships (it also covers reverse ForeignKey relationships). Prefetch can’t use a single join here, so it runs a separate query for the related objects and the join happens in Python at the ORM level. This can be a HUGE performance win for situations where you have a small number of rows in your ManyToManyField() vs a large number of rows in the model itself. However, in general, it’s less useful and thus used less often than select_related().
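To make “the join happens in Python” concrete, here is a rough sketch of the pattern using the built-in sqlite3 module rather than Django. Everything in it is an assumption for illustration: the tag and blog_post_tags tables, the sample data, and the run() helper that counts statements. It approximates the idea behind prefetch_related (one query for the posts, one for all of their related rows, then stitching them together in a dictionary); it is not the ORM’s actual implementation.

```python
import sqlite3
from collections import defaultdict

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE blog_post (id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE tag (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE blog_post_tags (post_id INTEGER, tag_id INTEGER);
""")
conn.executemany(
    "INSERT INTO blog_post VALUES (?, ?)",
    [(i, f"Post {i}") for i in range(1, 51)],
)
conn.executemany(
    "INSERT INTO tag VALUES (?, ?)",
    [(1, "python"), (2, "django"), (3, "postgresql")],
)
# Every post gets tag 1, plus tag 2 (even ids) or tag 3 (odd ids).
conn.executemany(
    "INSERT INTO blog_post_tags VALUES (?, ?)",
    [(i, t) for i in range(1, 51) for t in (1, 2 + i % 2)],
)

stats = {"queries": 0}


def run(sql, params=()):
    """Execute one statement, count it, and return all rows."""
    stats["queries"] += 1
    return conn.execute(sql, params).fetchall()


def posts_with_tags():
    """Two queries total, with the post/tag matching done in Python."""
    posts = run("SELECT id, title FROM blog_post ORDER BY id")
    post_ids = [post_id for post_id, _title in posts]

    # Second query: all tags for all of those posts in one go.
    placeholders = ", ".join("?" for _ in post_ids)
    links = run(
        "SELECT pt.post_id, t.name FROM blog_post_tags pt "
        "JOIN tag t ON t.id = pt.tag_id "
        f"WHERE pt.post_id IN ({placeholders}) "
        "ORDER BY pt.post_id, t.id",
        post_ids,
    )

    # The "join in Python": group tag names by the post they belong to.
    tags_by_post = defaultdict(list)
    for post_id, name in links:
        tags_by_post[post_id].append(name)

    return [(title, tags_by_post[post_id]) for post_id, title in posts]


result = posts_with_tags()
print(stats["queries"])  # 2, no matter how many posts there are
```

Looping over the posts and querying tags for each one would instead cost one query per post plus the initial one, which is exactly the pattern the debug toolbar will surface.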

So when do you use these, you ask? The easiest thing to do is fire up django-debug-toolbar with your project and some representative data. By representative I mean you shouldn’t have a single BlogPost and a single User; try more like 100+ of each.

Then all you need to do is browse around your app locally with the debug toolbar open, looking specifically at the query counts and times. If you see anything over 5–10 queries going on, this is your clue to investigate.

Most of the time, with 100s of objects in your database, you’ll stumble across a view that is doing 200–500+ queries where one of these two options will be a huge win for you. Change your queryset, refresh, and make sure that both your query count and your overall time decreased. Occasionally you’ll have a situation where it’s actually faster to do the extra queries vs using joins. It happens rarely, but it’s still worth checking the overall query time to make sure you aren’t making things worse.



