[Image: Christmas cookies]

Ingesting Data Fast

Most of the time as web developers we're concerned with receiving HTTP requests from a user and responding with the appropriate dynamic content. Little attention is paid in blog posts and HOWTOs online to how to optimize the reverse situation.

Santa has to eat A LOT of cookies tonight. What if we needed to track that?

What if we need to handle tons of incoming requests, store the data, and process it later? What happens if our response can simply be a "Cool, got it, thanks" ACK?

Well, pretty much everything changes, and you have a ton of options to play with to optimize large-scale data ingestion.

Applications

So where does this situation pop up? We've run into several scenarios such as:

  • Collecting metrics without using a third-party service
  • Voting or rating content
  • Logging
  • REST to statsd gateway
  • GPS tracking
  • Write heavy REST APIs in general

Any situation where you need to collect data and the remote end doesn't need any real feedback other than the assurance that it was received.

How to do it

How you go about this will vary with the stack you're using, but there is one simple rule:

Return the ACK response as quickly as you can

You can process the data later. So your job is to grab it, push it into something as quickly as possible, and return a 200 response.

That means:

  • No auth
  • No data validation
  • As little processing as possible

Do all that crap later, out of band, when you're post-processing it. Think of it as UDP for HTTP.

It also means:

  • Using a fast data store
  • Fast code
  • Worker processes to ultimately process the data

You may be tempted to reach for Go or NodeJS, which are particularly well suited for this task. However, you can scale up quite a bit further than you may realize with just Python.

Other bits of your stack will likely include nginx and a fast data store like Redis or Kafka. For our examples below we'll show just how far you can take Python and Redis for a task like this.

Django

We're big fans of Django obviously, so of course we're going to do an example using it. We'll be doing all of these tests using:

  • Vagrant
  • Ubuntu 14.04
  • Redis 2.8.4
  • Celery 3.1.17
  • Gunicorn 19.1.1
  • Django 1.7.1

In our Vagrant system, we've used all of our previous tips and tricks, such as tuning nginx and using Unix sockets between nginx, gunicorn, and Redis, to remove as many speed bottlenecks as we can.
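
For a rough idea of what that looks like on the gunicorn side, here is a minimal gunicorn config sketch. The socket path, worker count, and keepalive value are illustrative assumptions, not the exact values from our setup:

# gunicorn.conf.py (sketch; paths and counts are assumptions)
bind = 'unix:/tmp/gunicorn.sock'    # nginx proxies to this Unix socket
workers = 4                         # scale to your CPU cores
worker_class = 'gevent'             # async workers for lots of small requests
keepalive = 5                       # reuse connections from nginx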

We also removed as much from Django as we could, so no auth, sessions, messages, the admin, etc. The only app in INSTALLED_APPS is our simple 'ingest' app.
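
For reference, the stripped-down settings looked roughly like this. It's a sketch, not our exact settings file, and the project module name is made up:

# settings.py (sketch; project name and secret key are placeholders)
DEBUG = False
ALLOWED_HOSTS = ['*']
SECRET_KEY = 'not-a-real-secret'
ROOT_URLCONF = 'project.urls'
INSTALLED_APPS = ['ingest']
MIDDLEWARE_CLASSES = []   # no sessions, auth, messages, or CSRF middleware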

We set up a simple task that, for the purposes of this test, just logs the data:

import json
from celery import shared_task
from celery.utils.log import get_task_logger

logger = get_task_logger(__name__)

@shared_task
def ingest_task(data):
    logger.info(json.dumps(data, indent=4))

And a simple view to fire off the task:

from django.http import JsonResponse
from .tasks import ingest_task

def ingest(request):
    ingest_task.delay(request.POST)
    return JsonResponse(data={'message': 'OK'})
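
The URLconf wiring this up is equally minimal. Again a sketch, with a made-up module path:

# urls.py (sketch)
from django.conf.urls import url

from ingest.views import ingest

urlpatterns = [
    url(r'^$', ingest),
]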

Using Python 2.7, gunicorn, and gevent, I was able to consistently get 1100-1200 requests/second posting to this view with siege. Scaling those numbers out means we could handle about 4 million requests/hour on a Vagrant VM on a MacBook Pro, not too shabby. Real-world performance, say on AWS EC2, would be far better, but the important part here is the differences between these configurations/setups.

We don't need no stinking rabbits

So how can we make it faster? Well, first we can get rid of Celery. Don't get me wrong, Celery is great, but it adds a bit of overhead that isn't strictly necessary for ingesting this data. Besides, rabbits are for Easter and this is Christmas. So let's change our view to be:

import redis
from django.http import JsonResponse

r = redis.Redis(unix_socket_path='/tmp/redis.sock')

def ingest(request):
    r.rpush('testing', request.POST)
    return JsonResponse(data={'message': 'OK'})

This yields 1850 requests/sec on average, a boost of more than 50% over the Celery version. So we can now ingest about 6.5 million requests/hour. Nice! Swapping out Python 2.7 for PyPy 2.2 got us another 100 requests/sec, roughly another 5% boost.

NOTE: This means you would need to write your own worker code to process the items in the Redis list at some point, since there is no longer a Celery task for a Celery worker to run.
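
A bare-bones version of that worker might look something like this. It's a sketch: the list name matches the view above, and process() is just a placeholder for whatever post-processing you actually need:

import redis

r = redis.Redis(unix_socket_path='/tmp/redis.sock')

def process(raw_data):
    # Placeholder: parse, validate, and store the payload out of band
    pass

while True:
    item = r.blpop('testing', timeout=5)    # block until an item arrives
    if item is None:
        continue                            # timed out; block again
    _key, raw_data = item
    process(raw_data)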

Get rid of Django?

Blasphemy, I know, but what if we swap out Django for a faster WSGI framework like Falcon?

If we swap out Django for this Falcon app:

import falcon
import redis

r = redis.Redis(unix_socket_path='/tmp/redis.sock')

class ThingResource(object):
    def on_post(self, req, resp):
        # Push the raw request body onto a Redis list to process later
        r.rpush('testing', req.stream.read())
        resp.body = '{"message": "OK"}'
        resp.content_type = 'application/json'
        resp.status = falcon.HTTP_200


app = falcon.API()
things = ThingResource()

app.add_route('/', things)

We can achieve 2500 requests per second pretty easily, which takes us up to about 9 million requests per hour.

Running the Falcon app with PyPy instead of Python 2.7 bumped us up to a whopping 3500 requests/second, or about 12 million requests per hour.

That's about 288 million requests per day. Likely more than you will ever need!

But wait, there's more...

Let's go full-on stupid here and get rid of Falcon too. What do things look like when we just use a raw WSGI app? Well, let's find out! I used this WSGI app:

import redis

r = redis.Redis(unix_socket_path='/tmp/redis.sock')

def app(environ, start_response):
    # Push the raw request body onto a Redis list to process later
    r.rpush('testing', environ['wsgi.input'].read())
    start_response('200 OK', [('Content-Type', 'application/json')])
    return ['{"message": "OK"}']

Running this under PyPy I was able to get 4300-4500 requests/sec, which brings our volume up to about 15 million per hour, or 360 million POSTs per day. Not too bad for a Vagrant VM on a MBP.

Conclusion

First, a confession: I ran these tests using a Vagrant VM with 2GB of RAM and 2 CPU cores, when my MBP really has 4 to use. That means I could have had even better performance than the numbers quoted above.

So let's assume Santa visits all 7.21 billion people on earth tonight, flying with the sun to evenly spread across time zones, and the fat bastard eats one cookie per person.

We would need a whole 10-11 MBPs, or 1-2 c3.8xlarge EC2 instances. Let's call it two EC2 instances for redundancy's sake. That means we can track all of Santa's cookie consumption for about $80 in AWS charges, without even optimizing our cost using spot instances.
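
The back-of-the-envelope math behind that estimate looks roughly like this, assuming a 24-hour delivery window, the ~4400 requests/sec raw WSGI number above, and that a full 4-core MBP does about double what the 2-core VM did:

cookies = 7.21e9                      # one cookie per person
seconds = 24 * 60 * 60                # a full day, chasing the sun
needed_per_sec = cookies / seconds    # ~83,000 requests/sec
per_mbp = 4400 * 2                    # 2-core VM rate, doubled for 4 cores
print(needed_per_sec / per_mbp)       # ~9.5, so call it 10-11 MBPs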

Far, far less than Santa's diabetes treatment is going to cost him!

Sadly this is the last day of our 12 Days of Christmas Performance series. We hope you enjoyed it, but also learned something. Happy Holidays from all of us at Revolution Systems!