Three things you should never put in your database
As I've said in a few talks, the best way to improve your systems is by first not doing "dumb things". I don't mean you or your development staff are "dumb"; it's easy to overlook the implications of these types of decisions and not realize how bad they are for maintainability, let alone scaling. As a consultant I see this stuff all of the time, and I have yet to see it work out well for anyone.
Images, files, and binary data
Your database supports BLOBs so it must be a good idea to shove your files in there, right? No, it isn't! Hell, it isn't even very convenient to use with many DB language bindings.
There are a few problems with storing files in your database:
- reads and writes to a DB are generally slower than to a filesystem
- your DB backups grow to be huge and more time consuming
- access to the files now requires going through your app and DB layers
The last two are the real killers. Storing your thumbnail images in your database? Great, now you can't use nginx or another lightweight web server to serve them up.
Do yourself a favor and store a simple relative path to your files on disk in the database or use something like S3 or any CDN instead.
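To make the "relative path in the database" idea concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table name, the media root directory, and the save_file/load_file helpers are all made up for illustration; your schema and storage layout will differ.

```python
import sqlite3
import tempfile
from pathlib import Path


def save_file(conn, media_root, relative_path, data):
    """Write the bytes to disk and record only the relative path in the DB."""
    target = media_root / relative_path
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_bytes(data)
    cur = conn.execute(
        "INSERT INTO thumbnails (path) VALUES (?)", (relative_path,)
    )
    conn.commit()
    return cur.lastrowid


def load_file(conn, media_root, file_id):
    """Look up the relative path in the DB, then read the file from disk."""
    row = conn.execute(
        "SELECT path FROM thumbnails WHERE id = ?", (file_id,)
    ).fetchone()
    return (media_root / row[0]).read_bytes()


# Hypothetical setup: a temp directory stands in for your real media root.
media_root = Path(tempfile.mkdtemp())
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE thumbnails (id INTEGER PRIMARY KEY, path TEXT NOT NULL)"
)

thumb_id = save_file(conn, media_root, "thumbs/2012/05/cat.png", b"fake image bytes")
```

Because the row only holds a short string, the database stays small, and a web server pointed at the media root can serve the file without touching your app at all.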
Ephemeral data
Usage statistics, metrics, GPS locations, session data: anything that is only useful to you for a short period of time or that changes frequently. If you find yourself DELETEing an hour's, day's, or week's worth of some table with a cron job, you're using the wrong tool for the job.
Use Redis, statsd/graphite, Riak, or anything else that is better suited to that type of workload. The same advice goes for aggregations of ephemeral data that don't live for very long.
Sure it's possible to use a backhoe to plant some tomatoes in the garden, but it's far faster to grab the shovel in the garage than schedule time with a backhoe and have it arrive at your place and dig. Use the right tool(s) for the job at hand.
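What makes a store like Redis a better fit here is that keys can carry their own expiry, so nothing needs a cron job to clean up after them. The toy class below only illustrates that idea in plain Python; ExpiringStore and its methods are invented for this sketch, and it is not a substitute for any of the tools named above.

```python
import time


class ExpiringStore:
    """Tiny in-process key/value store where every entry has a time to live."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock  # injectable so the expiry can be demonstrated
        self._data = {}

    def set(self, key, value):
        # Store the value together with the moment it stops being valid.
        self._data[key] = (value, self.clock() + self.ttl)

    def get(self, key, default=None):
        item = self._data.get(key)
        if item is None:
            return default
        value, expires_at = item
        if self.clock() >= expires_at:
            # Expired entries are dropped lazily on read; no cron job needed.
            del self._data[key]
            return default
        return value


# A fake clock (seconds) so the example does not have to sleep.
now = [0.0]
sessions = ExpiringStore(ttl_seconds=1800, clock=lambda: now[0])
sessions.set("session:abc", {"user_id": 42})
```

Advance the fake clock past 1800 seconds and the session simply stops being returned, which is the behavior you end up hand-rolling with DELETE statements when this kind of data lives in your main database.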
Logs
This one seems ok on the surface and the "I might need to use a complex query on them at some point in the future" argument seems to win people over. Storing your logs in a database isn't a HORRIBLE idea, but storing them in the same database as your other production data is.
Maybe you're conservative with your logging and only emit one log line per web request normally. That is still generating a log INSERT for every action on your site that is competing for resources that your users could be using. Turn up your logging to a verbose or debug level and watch your production database catch on fire!
Instead, use something like Splunk, Loggly, or plain old rotating flat files for your logs. The few times you need to inspect them in odd ways, even to the point of having to write a bit of code to find your answers, are easily outweighed by the constant load that logging to your production database puts on your system.
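For the "plain old rotating flat files" option, Python's standard library already does the work. This is a minimal sketch; the logger name, the log location, and the tiny size limit are assumptions chosen so the rotation is visible in a short run, and real limits would be far larger.

```python
import logging
import logging.handlers
import os
import tempfile

# A temp directory stands in for wherever your logs really live.
log_dir = tempfile.mkdtemp()
log_path = os.path.join(log_dir, "app.log")

# Roll over before the file reaches 1 KB and keep at most three old files.
handler = logging.handlers.RotatingFileHandler(
    log_path, maxBytes=1024, backupCount=3
)
handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))

logger = logging.getLogger("myapp.requests")  # hypothetical logger name
logger.setLevel(logging.INFO)
logger.addHandler(handler)
logger.propagate = False  # keep these lines out of the root logger

# One log line per "request", with no INSERT competing with your users.
for request_id in range(200):
    logger.info("GET /thumbs/%d.png 200", request_id)

handler.close()
```

After the loop, the directory holds app.log plus app.log.1 through app.log.3, and older lines have been discarded. Disk usage stays bounded and the database never hears about any of it.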
But wait, you're a unique snowflake and your problem is SO different that it's ok for you to do one of these three. No you aren't and no it really isn't. Trust me.
Posted on May 1st, 2012 at 16:24:03 and has 32 comments
PEP712 - Proposal to make unittest2 more accurate
| PEP: | 712 |
|---|---|
| Title: | Proposal to make unittest2 more accurate |
| Version: | a58437babcaa |
| Last-Modified: | 2011-10-20T14:40:01.661119 (Thu, 20 Oct 2011) |
| Authors: | Frank Wiles <frank@revsys.com>, Jacob Kaplan-Moss <jacob@jacobian.org>, Jeff Triplett <jeff@revsys.com> |
| Status: | Draft |
| Type: | Humor |
| Created: | 20-Oct-2011 |
| Python-Version: | 2.7 |
Introduction
This PEP describes a proposal to make unittest2 output more accurate and fun.
The Proposed Solution
Upon having more than 7 failing tests in a test run, replace all 'F' character output with 'U's for the remainder of the test run.
Rationale
This output formatting more accurately describes the mental state of the developer.
Example
.....FFFFFFFUUUUUUUUUUUUUUUUU
CK
----------------------------------------------------------------------
Ran 25 tests in 3.14159s
FAILED (failures=19)
Optional Add-On
Replace next passing test after 19 failures with 'CK'.
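In the spirit of the proposal, here is one possible reading of the rules as a small Python function. It is only a sketch and does not touch unittest2 itself; format_results and its parameters are invented for illustration.

```python
def format_results(outcomes, u_threshold=7, ck_threshold=19):
    """Render a run of test outcomes (True = pass, False = fail).

    The first `u_threshold` failures print as 'F', later ones as 'U'.
    The first passing test after `ck_threshold` failures prints as 'CK'.
    """
    output = []
    failures = 0
    ck_used = False
    for passed in outcomes:
        if passed:
            if failures >= ck_threshold and not ck_used:
                output.append("CK")
                ck_used = True
            else:
                output.append(".")
        else:
            failures += 1
            output.append("F" if failures <= u_threshold else "U")
    return "".join(output)


# 5 passes, 19 failures, then 1 more pass: 25 tests in total.
run = [True] * 5 + [False] * 19 + [True]
line = format_results(run)
```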
Reference Implementations
See Reddit.com
References
Posted on October 20th, 2011 at 14:35:56 and has 3 comments
longjmp() 2011 - Impromptu Party
Lawrence has been having a bunch of downtown block parties lately, which are always great fun. The last couple of days the Kansas Relays have been going on, and the long jump event was set up directly outside our office windows. Since we had such a great view, we quickly threw together a little party and invited all of our geeky friends to come watch. We had a pretty good turnout (free beer works, apparently). Here are some photos I took from our office. You can find more professional quality photos of the event here.
Thanks to everyone for coming out!
Posted on April 22nd, 2011 at 13:30:35 and has 0 comments
We're hiring!
UPDATE: I completely forgot to update this post until now, but we have filled this position.
We're looking for a full-time developer. If you're reading this you probably know what we do, so you'll obviously need to know Django pretty well. Knowledge of PostgreSQL and devops skills are a big plus. We would prefer that you live in Lawrence, Kansas, obviously, but a few visits here may be fine. Only because we get lonely.
Lawrence is an awesome town, so don't let the surroundings (the rest of Kansas) scare you off. It's the hometown of Django and basketball. People often compare it to a smaller version of Austin or Portland, OR. For example, check out the local music happening this weekend on the great Lawrence.com.
Interested? Email frank@revsys.com with your resume and code samples. Links to public repositories are ideal. Also feel free to talk yourself up with respect to Open Source contributions. Docs, code, answering questions on a mailing list, your StackOverflow account, etc. are all good things to bring to our attention. We probably shouldn't need to tell you this, but if you've contributed code to Django you might want to lead with that!
Posted on April 7th, 2011 at 16:52:42 and has 2 comments
Django 1.3 is out - time to upgrade!
Nearly a year in the making, Django 1.3 is now shipping. It includes a ton of bugfixes along with a bunch of major new features:
- Class-based views.
- Better support of Python’s logging tools.
- A new tool to help with handling static files.
- Greatly improved testing utilities via the unittest2 library.
- Configurable on-delete behavior.
- And more!
To help people get a jump on upgrading, I'll be holding a webinar next week. We'll talk about the new features, go over the steps to follow for a safe and easy upgrade, and cover the "gotchas" to avoid as you upgrade.
You should join us — March 31 from 1-3 (central). It'll be a blast.
Posted on March 23rd, 2011 at 14:41:25 and has 0 comments
