Tidbits | July 19, 2018

Today I Learned – Celery and Django and Docker: Oh My!

by Lacey Williams Henschel

In this post, you will learn about how to:

  • create a Celery task
  • inside a Django project
  • in a Docker container

Versions: Django 1.11, Python 3.6, Celery 4.2.1, Redis 2.10.6, and Docker 17.12.

Note that especially for Celery, versions matter a lot. Celery changed the names of many of their settings between versions 3 and 4, so if internet tutorials have been tripping you up, that might be why. Be careful when Googling for advice and always check the version number if something isn’t working.

If you need a refresher on using Docker with Django, check out A Brief Intro to Docker for Djangonauts and Docker: Useful Command Line Stuff.

What is Celery?

Celery is a tool that helps you manage tasks that should occur outside the request/response cycle. It’s not specific to Django.

Celery is especially helpful for transforming blocking transactions on your site into non-blocking transactions. For example, you might have a site that takes payment information. Validating credit card information, processing a charge, and producing a receipt might take 3-15 seconds, during which time your user is waiting and other users’ requests are also held up. Celery can help by offloading that work to different tasks. Instead of waiting until the credit card has been processed to show your user a confirmation page, you can quickly show them a confirmation screen that assures them that a receipt is forthcoming in their email. Then, outside the request/response cycle in a series of Celery tasks, you can validate their credit card, charge it, create a receipt, and email the receipt to the user. This experience is much smoother for your user, a better use of your server resources, and increases the number of requests your website can process for other users.

You can use Celery to send email, update your database with side effects from the request that was just processed, query an API and store the result, and a lot more. Any task that takes more than half a second is a great candidate for turning into a Celery task.
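
To make this concrete, here is a minimal sketch of the email-a-receipt step as a Celery task. The task name and message are illustrative, not part of a real payment flow; send_mail is Django’s built-in email helper:

from celery import shared_task
from django.core.mail import send_mail


@shared_task
def email_receipt(to_address, amount):
    # Runs outside the request/response cycle, so the user never waits on it.
    send_mail(
        subject='Your receipt',
        message='Thanks for your order! We charged your card ${0}.'.format(amount),
        from_email='billing@example.com',
        recipient_list=[to_address],
    )

In a view, you would call email_receipt.delay(user.email, total) and return the confirmation page right away.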

Another thing Celery is helpful for is scheduling tasks to run at specific times. You might be familiar with cron jobs, which are tasks that run at specific intervals you define. Django doesn’t have the cleanest ways of handling scheduling jobs, but using Celery with Django to schedule jobs is pretty smooth. You might set up scheduled Celery tasks to send user notification emails, scrape a website, or process vendor payments.

This post focuses on getting a scheduled task to run inside Docker in a Django project.

Setting Up a Task

Assume this project has the following structure:

proj/
├── app/
│   ├── __init__.py
│   └── tasks.py
├── proj/
│   ├── __init__.py
│   ├── celery.py
│   └── settings.py
├── docker-compose.yml
├── Dockerfile
├── manage.py
└── requirements.txt

You should already have Django specified in your requirements file and a recent version of Docker installed on your computer. Since the Dockerfile takes care of installing packages for us, to access Celery and Redis we need to add the current versions of those libraries to the requirements.txt file:

celery==4.2.1
redis==2.10.6

Open proj/celery.py and add the following code. Most of it is boilerplate that you will see in all Celery configuration files.

import os
from celery import Celery


os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'proj.settings')

app = Celery('proj')
app.config_from_object('django.conf:settings', namespace='CELERY')
app.autodiscover_tasks()

@app.task(bind=True)
def debug_task(self):
    print('Request: {0!r}'.format(self.request))

In this code, you are identifying a default Django settings module to use and doing some configuration setup. You are also setting up Celery to “autodiscover” tasks from all apps in your project. (This project is, creatively, called proj.) Finally, you have a debug task.

Now let’s create a task. In app/tasks.py, add this code:

from celery import shared_task


@shared_task
def hello():
    print('Hello there!')

The task itself is the function hello(), which prints a greeting. The shared_task decorator creates an instance of the task for each app in your project, which makes the tasks easier to reuse; the Celery documentation explains shared_task in more detail.

In proj/__init__.py, add the following:

from .celery import app as celery_app

__all__ = ['celery_app']

This code ensures that Celery finds the tasks you’ve written when your Django application starts.

To test that your hello() task works, you can run it locally as a regular Python function. With docker-compose up running in one terminal tab, open a second tab and start a Python shell using docker-compose run web ./manage.py shell. Then run:

>>> from app.tasks import hello
>>> hello()
Hello there! 

If you would like to test running your task as a Celery task, run:

>>> hello.delay()
<AsyncResult: ba845cf3-e60b-4432-a9d8-9943621cb8a0>

Back in your first tab (the one running docker-compose up), you will see the output from your task. delay() tells Celery to execute the task asynchronously, so instead of seeing the output in your shell like you’re used to, you see it logged to the console where your Celery worker is running. (We’ll add that worker to docker-compose in a moment.)
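
As an aside, delay() is shorthand for apply_async() with the default options. If you need more control over how a task is queued, like a delayed start, you can call apply_async() directly:

>>> hello.apply_async(countdown=10)  # ask a worker to run this roughly 10 seconds from now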

Introducing: Celery Beat

So far, though, the task only executes when we explicitly call it, and we’d like to schedule it to run more often, like once a minute. We will use a feature called Celery beat to schedule our task to run periodically. Celery beat is the Celery scheduler. It executes tasks as often as you tell it to.

Open settings.py. Before we run our task through Celery, we need to configure some Django settings. All settings specific to Celery need the CELERY_ prefix, because in proj/celery.py we told Celery to look for its configuration in the CELERY namespace of our Django settings.

CELERY_BROKER_URL = 'redis://redis:6379'
CELERY_RESULT_BACKEND = 'redis://redis:6379'
CELERY_ACCEPT_CONTENT = ['application/json']
CELERY_TASK_SERIALIZER = 'json'
CELERY_RESULT_SERIALIZER = 'json'

For CELERY_BROKER_URL and CELERY_RESULT_BACKEND, you may see tutorials that instruct you to set these to something like redis://localhost:6379, but you should replace localhost with the service name defined in your docker-compose file, redis. (We’ll get to that in a moment.)

Import crontab in your settings file.

from celery.schedules import crontab

Now add the following variable below your other CELERY_ settings.

CELERY_BEAT_SCHEDULE = {
    'hello': {
        'task': 'app.tasks.hello',
        'schedule': crontab()  # execute every minute
    }
}

This code sets up a dictionary, CELERY_BEAT_SCHEDULE, that contains the names of your tasks as keys and a dictionary of information about your task and its schedule as the value. In the dictionary that contains the keys “task” and “schedule,” the value of “task” should be a string with the fully qualified path to your task. The value of “schedule” is the information about how often you want this task to run. The default is to execute every minute; check out the docs for examples on more complex schedules.
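
If you want something other than every minute, here are a couple of crontab schedules as a sketch; the argument patterns follow the Celery docs:

CELERY_BEAT_SCHEDULE = {
    'hello-every-monday-morning': {
        'task': 'app.tasks.hello',
        'schedule': crontab(hour=7, minute=30, day_of_week=1),  # Mondays at 7:30am
    },
    'hello-every-quarter-hour': {
        'task': 'app.tasks.hello',
        'schedule': crontab(minute='*/15'),  # every 15 minutes
    },
}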

Configuring Celery In Docker

The Dockerfile doesn’t need any changes in order to work with Celery.

The docker-compose.yml file, however, needs some new services:

version: '3'

services:
  db:
    image: postgres:9.6.5
    volumes:
      - postgres_data:/var/lib/postgresql/data/
  redis:
    image: "redis:alpine"
  web:
    build: .
    command: bash -c "python /code/manage.py migrate --noinput && python /code/manage.py runserver 0.0.0.0:8000"
    volumes:
      - .:/code
    ports:
      - "8000:8000"
    depends_on:
      - db
      - redis
  celery:
    build: .
    command: celery -A proj worker -l info
    volumes:
      - .:/code
    depends_on:
      - db
      - redis
  celery-beat:
    build: .
    command: celery -A proj beat -l info
    volumes:
      - .:/code
    depends_on:
      - db
      - redis

volumes:
  postgres_data:

Let’s walk through the services we’ve added.

Redis is a data store and message broker that works with Celery to manage storing and processing your messages.

redis:
    image: "redis:alpine"

In your web service, add redis to the depends_on section. This ensures that your db and redis services will start before the web service. (Note: this won't guarantee that the db and redis services will be fully ready before the web service starts; look into restart: on-failure and other options for making sure a service doesn't start until other services it needs are ready.)*

depends_on:
      - db
      - redis

In order to have that task execute without needing to explicitly trigger it from the command line, we added the celery service.

celery:
    build: .
    command: celery -A proj worker -l info
    volumes:
      - .:/code
    depends_on:
      - db
      - redis

This code adds a Celery worker to the list of services defined in docker-compose. Now our app can recognize and execute tasks automatically from inside the Docker container once we start Docker using docker-compose up.

The celery worker command starts an instance of the celery worker, which executes your tasks. -A proj passes in the name of your project, proj, as the app that Celery will run. -l info sets the log-level as info. The Django docs have more info on logging; the log-level you set won’t matter until you have some code to determine how the different levels should be handled.
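
If you don’t have any logging configuration yet, a minimal console setup in settings.py might look like the sketch below; adjust handlers and levels to taste:

LOGGING = {
    'version': 1,
    'disable_existing_loggers': False,
    'handlers': {
        'console': {'class': 'logging.StreamHandler'},
    },
    'root': {
        'handlers': ['console'],
        'level': 'INFO',
    },
}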

We also added a celery-beat service that will run this command automatically inside the Docker container.

celery-beat:
    build: .
    command: celery -A proj beat -l info
    volumes:
      - .:/code
    depends_on:
      - db
      - redis

The command is similar, but instead of celery -A proj worker we run celery -A proj beat to start the Celery beat service, which will run tasks on the schedule defined in CELERY_BEAT_SCHEDULE in settings.py.

Start Docker with docker-compose up. You should see the output from your task appear in the console once a minute (or on the schedule you specified).

celery_1 | [2018-02-01 22:44:00,957: WARNING/ForkPoolWorker-1] Hello there!

And there you have it! For even more fun, you might try:

  • Adding more complex tasks, like tasks that take arguments (there’s a sketch after this list)
  • Setting up different schedules
  • Setting up a non-periodic task
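
For the first two ideas, here’s a minimal sketch: a hypothetical add task that takes arguments, plus a beat schedule entry that passes them in through the args key:

# app/tasks.py
from celery import shared_task


@shared_task
def add(x, y):
    return x + y

# settings.py
CELERY_BEAT_SCHEDULE = {
    'add-every-minute': {
        'task': 'app.tasks.add',
        'schedule': crontab(),  # every minute
        'args': (2, 2),         # positional arguments passed to the task
    },
}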

What next?

Review the Celery Tasks Checklist for a great introduction to Celery best practices. In particular, pay attention to:

  • Set max_retries so that failed tasks don’t keep retrying forever.
  • Set a generous default task_time_limit so a stuck task can’t block a Celery worker indefinitely.
  • For tasks that need to take in a Django model object as a parameter, pass in a primary key and not the object itself (there’s a sketch after this list).
  • Do a lot of logging!
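
Here’s what the primary-key pattern from the third item might look like (the task name is hypothetical). Because we configured the JSON serializer above, a model instance wouldn’t serialize cleanly anyway; a primary key is just an integer, and the task fetches a fresh copy of the row when it actually runs:

from celery import shared_task
from django.contrib.auth import get_user_model


@shared_task
def send_welcome_email(user_id):
    # Look the user up at execution time so we never work from stale data.
    user = get_user_model().objects.get(pk=user_id)
    print('Would email {0}'.format(user.email))  # replace with send_mail, etc.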

You will also want to monitor your tasks for success or failure. A great tool for this is Flower, Celery’s monitoring tool. Flower will show you a dashboard of all your workers and tasks and let you drill down into specific tasks, show you task statistics, let you restart workers, and let you rate-limit tasks (among many other things).
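
If you would like to try Flower in this setup, one approach (assuming you add flower to requirements.txt) is to give it its own docker-compose service; here’s a sketch, using Flower’s default port:

flower:
    build: .
    command: celery -A proj flower --port=5555
    ports:
      - "5555:5555"
    depends_on:
      - redis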

If you use an error-tracking system like Rollbar or Sentry, you can also set Celery up to report exceptions to those services.

* Thanks to kurashu89 for their correction on an earlier version of this article.

