Request / Response Cycle

Today on RevSys' weird world of holiday humor, we're talking about using asynchronous worker processes.

Browsers make a request to your site and it returns a response right? And as we have mentioned before, Step 1 for having good performance is to do less work.

One of the best ways to achieve less work in a web application is to get as much of the work out of the direct request/response cycle. You do this by pushing a message with some data in it into a message queue which will later be picked up and processed by a worker process out of band.

Let's start with another Twitter example. Assume a logged in Twitter user @susan has clicked the 'Follow' button on another user @steve. A lot of things need to happen now:

Record that @susan is following @steve, increasing @susan's Following count
Record that @steve has a new follower @susan, increasing @steve's Follower count
Email @steve about his new follower
Give feedback to @susan that she is now following @steve

But does it truly have to happen right now?

Traditional webapp code would likely do all of these steps synchronously as part of the request/response cycle. However, we can instead fire off 3 different messages for the first 3 items and much more quickly give @susan feedback and let her get on with her Tweeting.

Now, this does assume we're ok with a slight lag between clicking the 'Follow' button and our follower/following data being truly up to date. If our example was instead a financial transaction we would likely opt to have Steps 1, 2, and 4 happen synchronously and only send the email asynchronously. Your application's business rules are going to differ, but you get the concept.

Some likely candidates for using workers are:

Anytime you send email
Updating leader boards
Recording metrics
Deleting/deactivating data
Talking with a remote or third-party API
Long running reports
Processing images
Creating thumbnails
Anything that takes more than 200ms to accomplish

Poor Man's Queue

Batch processing is the simplest example of having a queue and using a worker.

If your needs are relatively small, you can get away with just using some batch processing instead of setting up a bunch of queue and worker infrastructure.

For example, it's common to send new users to service a welcome email. You could easily add a boolean database column named 'welcome_email_sent'. Then with a cron job every few minutes look for users that haven't received the welcome email, send them one, and mark those users as having received it.

This requires setting up a bunch of tracking fields in your database and due to the fact cron jobs can run at most once per minute, not as real time as you may need.

Celery

Queue the A-Team theme, Celery has a plan.

Celery is a distributed task queue for Python projects. This library abstracts away all of the details of how to fire and process messages so we don't have to worry about the minutae.

You simply write Celery tasks that takes some arguments and processes them. You configure a message broker and then run one or more workers to process the messages as quickly as possible.

Celery supports many different message brokers, from using a SQL database to Redis, RabbitMQ or even Amazon SQS. For acceptable performance, avoid using a database as your broker. Using a database for this means your worker processes will have to be constantly querying looking for new messages to processes which isn't very efficient. Ths simplest setup, and the one we typically recommend to clients, is to use Redis as your message broker.

It used to be a bit painful to setup Celery, but lucky for you they've improved things quite a bit and now this is all you really need is a single file, let's call it tasks.py:

from celery import Celery
from ourapp.utils import send_welcome_email

app = Celery('tasks', broker='redis://localhost:6379/0')

@app.task
def welcome(username):
    send_welcome_email(username)

We build a Celery app and configure it to talk to our Redis broker. Then we can fire up a Celery worker by running:

$ celery -A tasks worker --loglevel=info

Now to call your task in your app you just say:

from tasks import welcome

welcome.delay('fwiles')

Now if you're using Celery with Django, you'll want things setup slightly differently. The typical setup is to create a celery.py file in your project directory next to your main urls.py and settings. Here is an example from one of our projects:

from __future__ import absolute_import

import os
from celery import Celery
from django.conf import settings

# set the default Django settings module for the 'celery' program.
os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'ima.settings.dev')

app = Celery('ima')

app.config_from_object('django.conf:settings')
app.autodiscover_tasks(lambda: settings.INSTALLED_APPS)

Elsewhere in our settings we define:

BROKER_URL = 'redis://localhost:6379/10'  # Use DB 10

# Keep tasks results for one hour
BROKER_TRANSPORT_OPTIONS = {
    'visibility_timeout': 3600
}
CELERY_EAGER_PROPAGATES_EXCEPTIONS = True
CELERY_ALWAYS_EAGER = True

Because we have CELERY_ALWAYS_EAGER = True, this is setup for local development and debugging. This setting tells Celery to process all tasks synchronously which is perfect for running our tests and working locally so we don't have to run a separate worker process. You'll obviously want to turn that off in production.

Then in your various Django apps that need to define tasks, create a file named tasks.py. Since we're talking about a welcome email, our 'users' app names the most sense so in users/tasks.py we would have:

from celery import shared_task
from ourapp.utils import send_welcome_email

@shared_task
def welcome(username):
    send_welcome_email(username)

We want our Django apps to be reusable across multiple projects whenever possible. So instead of associating our tasks directly with a Celery app we define them as shared tasks. Celery's autodiscover_tasks method we used aboved then associates the tasks to the app we created for this project.

Worker Elves

Santa is only one person, so he uses a lot of worker Elves to make all those wonderful toys. How you configure your Celery workers depends on your needs.

The simplest setup is a single queue and a single worker. In this scenario all messages go into one big queue and our single worker processes them one at a time, roughly in order of when they were received. For systems that are going to generate only a few messages this is simple and perfect.

Sadly life isn't always so simple. We may have the need to try and process certain types of tasks more quickly than others or want to process one type of message on Server X and another type on Server Y. Luckily, Celery makes this easy for us by allowing us to use multiple message queues.

The docs on how to setup queues and route tasks to them are easy to understand so we won't rehash them here. The basic idea is to create named queues and set certain tasks to go into those queues. Then when worker processes are launched they define which of these queues, or possibly all of them, they should be processing.

If you feel your application is going to grow and get more complicated over time, one quick way to help ease the transition into multiple queues and workers is to break your tasks into three rough performance buckets. Let's call them 'low', 'normal', and 'high'.

We define these queues in our Celery configuration and then as we write our tasks we make a quick decision if this task is low, normal, or a high priority one. If we assume our worker server has 4 cores, we could launch our workers like this:

$ celery -A tasks worker -Q high --concurrency=2
$ celery -A tasks worker -Q normal --concurrency=1
$ celery -A tasks worker -Q low,normal --concurrency=1

Now we have 3 processes setup to handle tasks. The first will spawn two child processes, taking up 2 cores, solely to process messages from our high priority queue. We then setup a single process for normal priority items and a third that will process items from the normal and low queues.

The options here are endless, but there are some things to keep in mind. The more Celery worker processes setup to process a queue, the more simultaneous tasks can be processed. And if your tasks are particularly I/O bound, you may want to increase your concurrency to be 2X or more the number of cores you have. For example, if your tasks are grabbing info from remote systems and spend time waiting on the network this will ultimately let you processes more tasks more quickly.

If your traffic patterns are such that you have a ton of tasks to process during the day and considerably less at night, you can take advantage of the cloud. Just spin up additional cloud instances to processes your queues and then as task volume decreases you can scale back down to an appropriate number of instances/workers.

Advanced Usage

Using Celery Beat, Celery can be setup to process periodic tasks where you would otherwise use cron. This is nice if you have few periodic tasks and tons of async tasks and want to keep your configuration all in one system. Keep in mind that your ops folks are going to need to know these periodic tasks are defined here for when they see strange load spikes at the top of the hour and find no cron entries that are likely culprits.

There are also other more complicated tasks scenerios:

Groups – perform a set of tasks in parallel
Chain – perform a set of tasks in sequential order after each completes
Chord – run a group of tasks with a final body task after all the group tasks have completed
Chunks – break up a long list of arguments into chunks and call a task with each chunk

You can find more information here on these advanced setups. However, we generally recommend you do not use advanced work flows unless you really need them and you have a deep experience with Celery in general.

Monitoring your tasks

One of the things that often scares people away from using a system like this is visibility. If things blow up in your request/response cycle it will generate customer support emails, but no so much with async tasks.

Anyone using a system like this needs visibility into the process and luckily Celery makes this easy on us. There are several things we do to keep tabs on our tasks such as:

Use Flower to monitor our Celery cluster in real time. FYI it's pronounced Flow-er and not like the plant
We can monitor that our Celery worker processes are running using normal monitoring sytems like Nagios or Consul
We can still log any Exceptions from tasks into something like Sentry
We can monitor the number of jobs waiting in our queues to make sure we're processing tasks at all and processing them in a time frame that is reasonable for us
We can log to normal log locations and use other monitoring to ensure we see task related logs periodically

Keeping in mind that users can feel 100ms of performance difference on the web, start moving your non-critical processing into Celery Tasks and maybe next year you won't get so much coal. Happy Holidays!

Day 9 - Automate Common Tasks

Day 11 - Don't Break Your Cache

12 Days of Christmas Performance