Centralized logging for fun and profit!

Setting up a centralized log server using syslog isn't as hard as many may believe. Whether the logs come from Apache, nginx, email services, or even your own Python applications, having a central log server gives you several benefits:

Benefits of centralized logs

  • Reduces disk space usage and disk I/O on core servers that should be busy doing something else. This is especially true if you want to log all queries to your database. Doing this on the same disk as your actual database creates a write for every read and an extra write for every write.
  • Removes logs from the server in the event of an intrusion or system failure. By having the logs elsewhere you at least have a chance of finding something useful about what happened.
  • All of your logs are in one place, duh! This makes things like grepping through, say, Apache error logs across multiple webservers easier than bouncing around between boxes. Any log processing and log rotation can also be centralized, which may delay your sysadmin from finally snapping and killing everyone.

Syslog Review

In case you aren't terribly familiar with how syslog works, here's a quick primer. Syslog separates out various logs using two items: facilities and levels. Here are the standard facilities:

  • 0 kernel messages
  • 1 user-level messages
  • 2 mail system
  • 3 system daemons
  • 4 security/authorization messages
  • 5 messages generated internally by syslogd
  • 6 line printer subsystem
  • 7 network news subsystem
  • 8 UUCP subsystem
  • 9 clock daemon
  • 10 security/authorization messages
  • 11 FTP daemon
  • 12 NTP subsystem
  • 13 log audit
  • 14 log alert
  • 15 clock daemon
  • 16 local use 0 (local0)
  • 17 local use 1 (local1)
  • 18 local use 2 (local2)
  • 19 local use 3 (local3)
  • 20 local use 4 (local4)
  • 21 local use 5 (local5)
  • 22 local use 6 (local6)
  • 23 local use 7 (local7)

For each facility, logs are sent at a particular level. The levels are:

  • 0 Emergency: system is unusable
  • 1 Alert: action must be taken immediately
  • 2 Critical: critical conditions
  • 3 Error: error conditions
  • 4 Warning: warning conditions
  • 5 Notice: normal but significant condition
  • 6 Informational: informational messages
  • 7 Debug: debug-level messages

So for any given log message you set these two options to give a hint as to where the logs should be directed. For example, if an email server receives a new message it would likely be logged as mail.info, and a kernel panic would be logged as kern.emerg.
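If it helps to see the arithmetic, here's a minimal Python sketch of how a facility and a level combine into the single priority number that travels with each message (facility times 8, plus level). The dictionary and function names are my own, purely for illustration, and only a handful of facilities from the list above are included.

```python
# Subset of the facility codes listed above (names are the usual short forms).
FACILITIES = {"kern": 0, "user": 1, "mail": 2, "daemon": 3, "auth": 4,
              "local0": 16, "local1": 17}

# All eight levels, using the usual short names.
LEVELS = {"emerg": 0, "alert": 1, "crit": 2, "err": 3,
          "warning": 4, "notice": 5, "info": 6, "debug": 7}


def priority(selector):
    """Return the numeric priority for a 'facility.level' string."""
    facility, level = selector.split(".")
    return FACILITIES[facility] * 8 + LEVELS[level]


def split_priority(pri):
    """Return (facility_code, level_code) for a numeric priority."""
    return divmod(pri, 8)


print(priority("mail.info"))   # 2 * 8 + 6 = 22
print(priority("kern.emerg"))  # 0 * 8 + 0 = 0
print(split_priority(131))     # (16, 3), i.e. local0.err
```

On the wire that number shows up in angle brackets at the start of the message, for example <22> for mail.info.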

The receiving syslog server then can be configured to direct log messages of a certain facility and/or log level to various files. For example, a default Ubuntu system has some settings like this:

daemon.*        /var/log/daemon.log
kern.*          /var/log/kern.log
mail.*          /var/log/mail.log

But you can also separate things more finely. For example, you might want to copy mail.err messages (error level and anything more severe) into their own file, alongside the main mail log, to make new errors easier to spot:

mail.*        /var/log/mail.log
mail.err      /var/log/mail-errors.log
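One thing worth knowing about selectors: in classic syslog selector syntax, a level such as err matches that level and everything more severe. Here's a rough Python sketch of that rule. It is a simplification I wrote for illustration (the function name is made up) and it deliberately ignores extras such as the = and ! modifiers and comma-separated facility lists.

```python
# Level names mapped to their numeric codes; lower numbers are more severe.
LEVELS = {"emerg": 0, "alert": 1, "crit": 2, "err": 3,
          "warning": 4, "notice": 5, "info": 6, "debug": 7}


def selector_matches(selector, facility, level):
    """Return True if a simple 'facility.level' selector matches a message."""
    sel_facility, sel_level = selector.split(".")
    if sel_facility not in ("*", facility):
        return False
    if sel_level == "*":
        return True
    if sel_level == "none":
        return False
    # A named level matches itself and anything more severe.
    return LEVELS[level] <= LEVELS[sel_level]


print(selector_matches("mail.err", "mail", "crit"))  # True
print(selector_matches("mail.err", "mail", "info"))  # False
print(selector_matches("mail.*", "mail", "info"))    # True
```

So with the two lines above, a mail.crit message lands in both files, while a mail.info message only lands in /var/log/mail.log.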

Setting up your central server

Configuring the master log server is pretty easy. On Ubuntu the default syslog server is rsyslog, and that's what I'll be using as an example here. You'll need to edit /etc/rsyslog.conf and uncomment the UDP module. You could also use the TCP module, but that one binds to all of your interfaces, so you will need to restrict access to it with iptables (or some other mechanism) to keep hackers from filling up your disks remotely. Your configuration should now contain these uncommented lines, where x.x.x.x is an internal, protected IP address:

$ModLoad imudp
$UDPServerAddress x.x.x.x
$UDPServerRun 514

And then restart rsyslogd. See, that wasn't so hard...
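If you want a quick way to check that the server is actually listening, something like this Python sketch can fire a single test message at it over UDP. The function name, the "logtest" tag, and the message text are my own inventions, and the bare "<priority>tag: text" payload is a minimal form that I'm assuming your syslog daemon will accept and fill in with its own timestamp and host.

```python
import socket


def send_test_message(host, port=514, text="central logging test"):
    """Send one user.notice message over UDP and return the bytes sent."""
    pri = 1 * 8 + 5  # facility 1 (user) * 8 + level 5 (notice) = 13
    payload = f"<{pri}>logtest: {text}".encode("utf-8")
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload, (host, port))
    return payload


# Example call; replace the address with your log server's internal IP:
# send_test_message("x.x.x.x")
```

Since this is UDP, the sender gets no confirmation either way. The real test is whether the message turns up in the logs on the central server.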

Setting up the remote log sending servers

Setting up your remote servers is even easier. If you want to send ALL of your logs to the central server it's just a matter of adding one line to the top of /etc/rsyslog.d/50-default.conf. That line is:

*.* @x.x.x.x:514

This will send all logs of any facility and any level to the server. Note that the local syslog will, as configured by default, still log locally. If you don't want that, be sure to remove all of the other configuration in this file.

You can also get fancy here and keep some logs on the local server and only send some things remotely. For most of your custom apps and logs you'll want to be using the LOCAL[0-7] facilities. Let's say we're going to want to centrally log our Python logs and Apache error logs. We'll be using LOCAL0 and LOCAL1 for them respectively. That config would look like:

local0.* @x.x.x.x:514
local1.* @x.x.x.x:514
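On the Python side, a minimal sketch of an application logging to LOCAL0 might look like the following. It uses the standard library's logging.handlers.SysLogHandler; the build_logger helper, the "myapp" tag, and the assumption that your local syslog daemon listens on /dev/log (typical on Linux) are mine, so adjust to taste.

```python
import logging
import logging.handlers


def build_logger(name, address="/dev/log"):
    """Return a logger that sends records to syslog using facility LOCAL0.

    address may be a Unix socket path or a (host, port) tuple for UDP.
    """
    handler = logging.handlers.SysLogHandler(
        address=address,
        facility=logging.handlers.SysLogHandler.LOG_LOCAL0,
    )
    # The "myapp:" prefix acts as the tag syslog shows for each line.
    handler.setFormatter(logging.Formatter("myapp: %(levelname)s %(message)s"))
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.addHandler(handler)
    return logger


# Example use inside your application:
# log = build_logger("myapp")
# log.error("something broke")  # sent as local0.err
```

The application only talks to the local syslog daemon here. The local0.* line above is what forwards those messages on to the central server.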

Keep in mind, however, that most systems have *.info, *.debug, etc. configurations set up, so you might be duplicating your data. If you poke around this file you'll see lots of configurations ending in .none, which instructs rsyslog to not include those facilities in that particular file. So for example, you'd want to edit the /var/log/syslog line to resemble this:

*.*;auth,authpriv,local0,local1.none        /var/log/syslog

Additional help and features

While most applications are easy to set up for use with syslog, here are some pointers for more info on the subject:

  • Apache supports sending error logs to syslog via the ErrorLog syslog:local1 configuration option. However, it does not support sending access logs directly. To do that you'll need a small script to pipe your access logs through.
  • For more information on setting up your own Python code to use syslog, check out the logging.handlers.SysLogHandler handler for the logging module.
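For the access log case, the small script could be sketched in Python roughly like this: read lines from standard input and hand each one to syslog. The LOCAL2 facility, the "apache-access" tag, the function names, and the /dev/log socket path are all assumptions of mine, not anything Apache requires.

```python
import logging
import logging.handlers
import sys


def forward_lines(stream, logger):
    """Log each non-empty line from stream at INFO level; return the count."""
    count = 0
    for line in stream:
        line = line.rstrip("\n")
        if line:
            logger.info(line)
            count += 1
    return count


def main():
    """Forward everything arriving on stdin to the local syslog daemon."""
    handler = logging.handlers.SysLogHandler(
        address="/dev/log",  # assumed local syslog socket
        facility=logging.handlers.SysLogHandler.LOG_LOCAL2,  # assumed facility
    )
    handler.setFormatter(logging.Formatter("apache-access: %(message)s"))
    logger = logging.getLogger("apache_access")
    logger.setLevel(logging.INFO)
    logger.addHandler(handler)
    forward_lines(sys.stdin, logger)


# Call main() at the bottom of the file when saving this as a standalone script.
```

Apache can write a log to a program instead of a file by starting the CustomLog target with a pipe character, so you would point CustomLog at wherever you save the script (the location is up to you) and then add a local2.* forwarding line like the ones shown earlier.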

We've only really scratched the surface of the features of rsyslog with this setup. You can configure it to do some fairly advanced separation of logs based on the sending host, application name, and other various aspects of the message itself. Refer to the rsyslog documentation for more information on that.

Happy Logging!

Tags: django, programming, python, systems administration

Comments

Hey.

Very in-depth article.

I wanted to ask: Have you had a go at syslog-ng? http://www.balabit.com/network-securi...

I remember years ago when I was working on a project that collected logs from different devices/machines across a network for collation etc purposes, I ended up using syslog-ng. It is pretty flexible.

by ayaz on Aug 26, 2010 at 8:07 PM

Ayaz, I used syslog-ng several years ago, but at the moment I don't remember the feature set well enough to say how it differs from rsyslog.

I remember it being much more flexible than the standard syslog daemon. When we set up a central logging server for a client recently I honestly was reaching for syslog-ng until I realized the latest Ubuntu has rsyslog, which had all the features we needed on that particular project.

But I'll definitely give it another once over in the near future. Thanks for reminding me about it.

by Frank Wiles on Aug 26, 2010 at 8:14 PM

As a warning and a heads up, it's also possible for a very log-chatty set of servers to dominate network traffic. It can be very frustrating to set up a centralized logging infrastructure, only to discover that you've created a scaling bottleneck in the backend switching with syslog packets.

One solution is to configure with a third network layer dedicated to administrative traffic like logging and monitoring. That is, if you're still using physical servers somewhere.

by Judson on Aug 26, 2010 at 8:52 PM

Nice, thanks for sharing.

Tomas

by Tomas on Aug 27, 2010 at 1:20 AM

I've tinkered with Splunk [1] a little in the past. It's a pretty slick tool for identifying trends and searching *lots* of log data from multiple hosts. Not open source, but the free version is pretty full-featured and can be tucked behind an Nginx proxy to provide secure password protected access.

[1] http://www.splunk.com

by Peter Baumgartner on Aug 27, 2010 at 1:45 PM

Judson, excellent point. There is definitely a traffic level at which you'd want to explore using an entirely separate network for logs. However, in my experience this is usually only at a very, very large scale.

Peter, yeah Splunk is pretty awesome. And I haven't played with it in 4-5 years, so I'm sure it's even better today. We used it at my previous place of employment and it made sifting through Sendmail logs a breeze compared to grep!

by Frank Wiles on Aug 30, 2010 at 2:23 PM
