Andy's Unix FAQ

Monitoring your network with Syslog.
a.k.a. Learning about system & network problems before your users...
Also: Where are the priorities/facilities?
AMS - Syslog messages in real-time
In this FAQ: Overview, Monitoring, Configuration, Centralisation, Your own messages, Maintenance, Timestamps, Troubleshooting



Question;

Hi,

We recently had a problem with the RAID array on our E250 running Solaris 8, basically both filesystems on the array were corrupted and we had to restore the whole thing. The server was down for three days.

When the engineer arrived he said that two of the disks in the array had failed, one the day before. Worryingly he said that the other disk had been having problems for several weeks - this was news to us.

Also we keep getting problem reports from users that disks are full. These reports are usually the first we know about this.

Can you give me any advice as to how we can learn about these problems before they become a bigger problem?

Thanks,

Answer;

This is a very common problem with RAID arrays. They are designed to withstand a hardware failure, and frequently do so without anyone being aware there was ever a problem.

The one-word answer to your question is Syslog. Syslog is responsible for gathering and saving all the error messages from your system. It is important to look at syslog logs on a regular, if not continual, basis. Under Solaris the main syslog log file is /var/adm/messages. This is configured in the file /etc/syslog.conf.

[ Under Linux the main syslog log is usually /var/log/messages ]

Overview of Syslog

Syslog is best viewed as a router of system messages: it does very little with the messages it receives and doesn't (generally) generate any itself. It is designed to be simple and, above all, fast. Syslog messages are generated by programs, using the C library call syslog(3), and by the kernel itself.

The categorisation of syslog messages is fairly primitive by modern standards and consists simply of a 'Priority' and a 'Facility' code. Whilst both codes can be used to filter messages out to different destinations, it is unfortunate that many standard syslog implementations do not record either in their log files - see Where are the Priority/Facility codes? below for a discussion about this.

Priority, as the name implies, is simply a reflection of the urgency of a message and ranges from 0 ("Emergency") down to 7 ("Debug").

Priority  Name       Meaning
0         EMERGENCY  system is unusable
1         ALERT      action must be taken immediately
2         CRITICAL   critical conditions
3         ERROR      error conditions
4         WARNING    warning conditions
5         NOTICE     normal but significant condition
6         INFO       informational
7         DEBUG      debug-level messages

The Facility code is an attempt to identify the source of the message. Internally, on most Unix platforms (inc. Solaris and Linux) the facility is a number in the range 0-23. Generally they are referred to by their name;

Facility name  Source
KERNEL         The Unix kernel, including device drivers.
USER           User level programs - USER is the default facility code for the logger(1) program.
MAIL           The mail subsystem - usually sendmail.
DAEMON         Daemons (background programs).
AUTH           Authentication stuff - login, su, and the like.
SYSLOG         From syslog itself.
LPR            The printing system - lpd, lpsched, and CUPS use this.
NEWS           USENET related daemons.
CRON           From the cron daemon - see the Cron FAQ for more information about this.
LOCAL0-LOCAL7  Eight codes intended for custom use.

The limited number of predefined facility codes, and a lack of understanding on the part of some programmers, has considerably devalued their use for filtering and monitoring messages. One classic example: the internal code for CRON is 15 on Solaris, whereas Linux uses 9. This can cause some confusion when logging centrally, as the code->name mapping is performed on the destination system, when it is performed at all...
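The facility and priority do not travel separately: RFC3164 packs them into a single PRI number, facility * 8 + priority, written in angle brackets at the front of each message. The minimal Python sketch below shows the packing and unpacking. Its facility table is an assumption based on Linux (glibc) numbering; a Solaris sender would use 15 for CRON instead of 9, which is exactly how the mislabelling described above comes about.

```python
# Priority names in numeric order: index 0 is EMERGENCY, 7 is DEBUG.
PRIORITIES = ["emerg", "alert", "crit", "err", "warning", "notice", "info", "debug"]

# Facility numbers as used by Linux/glibc; other platforms (e.g. Solaris CRON=15) differ.
LINUX_FACILITIES = {"kern": 0, "user": 1, "mail": 2, "daemon": 3, "auth": 4,
                    "syslog": 5, "lpr": 6, "news": 7, "cron": 9}
LINUX_FACILITIES.update({f"local{i}": 16 + i for i in range(8)})


def encode_pri(facility: int, priority: int) -> int:
    """Pack a facility (0-23) and a priority (0-7) into one PRI number."""
    if not (0 <= facility <= 23 and 0 <= priority <= 7):
        raise ValueError("facility must be 0-23 and priority 0-7")
    return facility * 8 + priority


def decode_pri(pri: int) -> tuple[int, int]:
    """Split a PRI number back into (facility, priority)."""
    return divmod(pri, 8)


pri = encode_pri(LINUX_FACILITIES["cron"], PRIORITIES.index("err"))
print(pri)              # 75
print(decode_pri(pri))  # (9, 3)
```

The receiver only ever sees the number 75; whether it calls that cron.err depends entirely on the facility table it uses.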

syslog(3) itself simply takes its arguments, formats them into a string and hands that string to the local syslog daemon (usually via /dev/log). Messages forwarded from one syslog daemon to another travel over port 514, mostly UDP. AIX users should note that AIX uses TCP exclusively for syslog.
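For illustration, this is roughly what one forwarded message looks like on the wire: the PRI number, an RFC3164-style timestamp and hostname, a tag, then the text, all in a single UDP datagram. A minimal Python sketch; the function names and defaults (tag "myscript", facility USER, priority INFO) are hypothetical, and it is not a replacement for calling syslog(3) or logger.

```python
import socket
import time
from typing import Optional


def rfc3164_packet(message: str, facility: int = 1, priority: int = 6,
                   tag: str = "myscript", hostname: Optional[str] = None,
                   when: Optional[float] = None) -> bytes:
    """Build '<PRI>Mmm dd hh:mm:ss host tag: message' in the style of RFC3164."""
    pri = facility * 8 + priority          # e.g. user.info -> 1*8 + 6 = 14
    t = time.localtime(when)               # when=None means "now"
    # The day of the month is space-padded ("Jan  5"), not zero-padded.
    stamp = time.strftime("%b ", t) + f"{t.tm_mday:2d}" + time.strftime(" %H:%M:%S", t)
    host = hostname or socket.gethostname().split(".")[0]
    return f"<{pri}>{stamp} {host} {tag}: {message}".encode("utf-8")


def send_udp(packet: bytes, server: str, port: int = 514) -> None:
    """Fire one datagram at a syslog daemon; UDP gives no delivery guarantee."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(packet, (server, port))
```

For example, send_udp(rfc3164_packet("disk check ok", tag="diskcheck"), "centralhost") should arrive on centralhost as a user.info message, provided its syslog daemon accepts network messages.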

Monitoring Syslog

Syslog is a vital source of information on the health, well-being and security of both your systems and network in general. On all but the simplest of networks one should arrange for everything that generates syslog messages to send them to a central host.

I like to keep a continuous eye on syslog. It's truly enlightening to do this on a network that hasn't been continuously monitored before: full disks, failed 'su' attempts, unusual "root" logins, etc. All manner of small problems can be addressed before they become big ones.

The simplest way of monitoring syslog is to log on to your central syslog host and keep a

     tail -f /var/adm/messages

running in a terminal window.
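If you would rather have the machine point out the interesting lines, the same idea can be scripted. A minimal Python sketch of a 'tail -f' that flags matching lines; the patterns in ALERTS are hypothetical examples and should be replaced with the wording your own systems actually log. Note that follow() does not notice log rotation: it keeps reading the renamed file.

```python
import os
import re
import time
from typing import Iterator

# Hypothetical alert patterns: replace them with the wording your systems log.
ALERTS = {
    "disk full": re.compile(r"file system full|No space left on device", re.IGNORECASE),
    "su failure": re.compile(r"\bsu\b.*fail", re.IGNORECASE),
}


def classify(line: str) -> list[str]:
    """Names of all alert patterns that match the line (empty if none)."""
    return [name for name, pattern in ALERTS.items() if pattern.search(line)]


def follow(path: str, poll_seconds: float = 1.0) -> Iterator[str]:
    """Yield complete lines as they are appended to path, like 'tail -f'."""
    with open(path, "r", errors="replace") as fh:
        fh.seek(0, os.SEEK_END)  # start at the end, ignoring old entries
        pending = ""
        while True:
            chunk = fh.readline()
            if not chunk:
                time.sleep(poll_seconds)
                continue
            pending += chunk
            if pending.endswith("\n"):  # wait until the line is complete
                yield pending
                pending = ""


def watch(path: str = "/var/adm/messages") -> None:
    """Print new log lines, prefixing those that match an alert pattern."""
    for entry in follow(path):
        hits = classify(entry)
        prefix = ("*** " + ", ".join(hits) + " *** ") if hits else ""
        print(prefix + entry, end="")
```

Running watch() on the central host gives the same view as tail -f, with the lines worth a second look marked.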

You might also like to look at a package we created called AMS, which allows you to monitor syslog from a web browser or a standalone GUI. When used in conjunction with Syslog-ng, messages are coloured according to their priority, and facility code texts are customisable. The GUI runs on any system supporting Java.

Configuring Syslog

Most Unix systems come with a sensibly configured syslog, and there is rarely much to do. Syslog's configuration file is usually /etc/syslog.conf and consists of one or more lines specifying <facility>.<priority> pairs and the destination for matching messages, one line per destination. This example is from a Solaris machine but would work with any generic Unix syslog;

     *.err;kern.notice;auth.notice                       /dev/console
     *.err;kern.debug;daemon.notice;mail.crit;mark.info  /var/adm/messages
     *.debug                                             @centralhost

     *.alert                                             root
     mail.debug                                          /var/adm/mail.log
     kern.notice                                         |kernel_errors

Note: Not visible here are the TAB characters (ASCII code 9) separating the facility/priority pairs from the destination. Some syslogs are really fussy about this. Best practice is to keep at least one tab in there.
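Because the tabs are invisible, a quick script can check for them. A minimal Python sketch, assuming the simple one-rule-per-line format shown above; it reports every non-comment line containing no TAB at all, so lines that are not plain rules may be reported too.

```python
from typing import Iterable


def lines_without_tab(conf_lines: Iterable[str]) -> list[int]:
    """Line numbers (1-based) of non-comment lines that contain no TAB character."""
    suspect = []
    for number, line in enumerate(conf_lines, start=1):
        text = line.strip()
        if not text or text.startswith("#"):
            continue  # blank lines and comments need no tab
        if "\t" not in text:
            suspect.append(number)
    return suspect
```

For example, pass it the open /etc/syslog.conf file; an empty list means every rule line has at least one tab.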

The destination is usually a file, but can be a device (/dev/console), another system (@centralhost), a logged-in user (root) or a program (|kernel_errors). The least useful destinations are probably users and programs: users need to be logged in, and buffering means programs don't get messages in real time. /dev/console is probably the only useful device to send syslog messages to. You might like to know that banks and other security-sensitive sites use line printers (/dev/lp) for this purpose, partly because printed logs are generally court-admissible whereas electronic ones are not...

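In the above example /var/adm/messages receives any message at or above priority ERROR ("*.err"), every kernel message ("kern.debug") and daemon messages at or above NOTICE ("daemon.notice"). Mail system messages are only included if they are at or above CRITICAL ("mail.crit"). The /dev/console line, by contrast, adds Kernel and Authentication messages at or above NOTICE ("kern.notice;auth.notice") to everything at or above ERROR.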
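The selection logic can be modelled in a few lines of Python. This is a simplified sketch of the generic syntax only, assuming that a later entry for a facility overrides an earlier one (which is why "mail.crit" keeps out the mail messages at ERROR that "*.err" would otherwise let through). It ignores vendor extensions and any special treatment of mark.

```python
# Numeric priorities: smaller numbers are more urgent.
PRIORITIES = {"emerg": 0, "alert": 1, "crit": 2, "err": 3,
              "warning": 4, "notice": 5, "info": 6, "debug": 7}


def parse_selector(selector: str) -> dict[str, int]:
    """Map each facility to the least urgent priority it accepts (-1 for 'none').

    Entries are applied left to right, so a later entry for a facility
    overrides an earlier one; '*' covers every facility.
    """
    limits: dict[str, int] = {}
    for entry in selector.split(";"):
        entry = entry.strip()
        if not entry:
            continue
        facilities, level = entry.split(".", 1)
        limit = -1 if level == "none" else PRIORITIES[level]
        for facility in facilities.split(","):
            if facility == "*":
                # '*' overrides everything set so far and becomes the default
                limits = {name: limit for name in limits}
            limits[facility] = limit
    return limits


def matches(selector: str, facility: str, priority: str) -> bool:
    """True if a facility.priority message is sent to this rule's destination."""
    limits = parse_selector(selector)
    limit = limits.get(facility, limits.get("*", -1))
    return PRIORITIES[priority] <= limit
```

With the /var/adm/messages selector, matches(rule, "mail", "err") is False while matches(rule, "mail", "crit") is True.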

Mark: The 'mark.info' pair causes syslog to make a log entry at a regular interval (20 minutes is a common default; check your documentation). This doesn't seem of much use until you arrive one morning to find your server mysteriously hung. The '-- MARK --' log entries can help to pinpoint when the system died and may assist in the resolution of the problem. Note: Under later versions of Linux, MARK is enabled via the '-m <time>' runtime option; see syslogd(8) for more details.
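Finding that point in a long log can be automated. With MARK enabled, a healthy system should write something to the log at least once per MARK interval, so a much longer silence points at the time the system (or syslog) stopped. A minimal Python sketch, assuming classic 'Mmm dd hh:mm:ss' timestamps, which carry no year, so the caller supplies one. The mark_minutes default of 20 and the "twice the interval" threshold are assumptions; set them to match your syslogd. The year rolling over is not handled.

```python
from datetime import datetime, timedelta
from typing import Iterable


def silences(lines: Iterable[str], year: int,
             mark_minutes: int = 20) -> list[tuple[datetime, datetime]]:
    """(last entry, next entry) pairs further apart than twice the MARK interval."""
    limit = timedelta(minutes=2 * mark_minutes)  # slack for timer jitter
    gaps = []
    previous = None
    for line in lines:
        try:
            # 'Mmm dd hh:mm:ss' is 15 characters and has no year, so add one.
            stamp = datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")
        except ValueError:
            continue  # not a timestamped syslog line
        if previous is not None and stamp - previous > limit:
            gaps.append((previous, stamp))
        previous = stamp
    return gaps
```

The first element of each pair is the last thing the machine managed to log before going quiet.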

Centralising Syslog messages

If you are responsible for more than one machine you should arrange for all the machines to send their syslog records to a central host of your choosing. The simplest way of achieving this is to modify /etc/syslog.conf and to add a line of the form;
       *.debug        @centralhost

centralhost should be replaced by the name of the machine you wish to have syslog send to, and don't forget to put at least one tab character between debug and @centralhost.

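The only requirement of centralhost is that it is running a syslog daemon that accepts messages from the network - you can have Solaris systems log to Linux machines or vice versa. Some syslog daemons, including the classic Linux one, only accept network messages when started with the '-r' option; see syslogd(8). You probably want to make sure that centralhost has enough disk space to take all the messages.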

Note: Do not simply change /var/adm/messages to @centralhost, as then the system will not write its messages locally; you want the central log to be in addition to the local copy. Visiting engineers usually go straight to /var/adm/messages when called in to look at a problem.

I suggest using an alias in /etc/syslog.conf for your central host. If you can define an alias in whatever naming service your site uses (typically NIS/YP, or DNS) it becomes a simple matter to move the centralhost function to another machine should the need arise.

Making your own Syslog entries

The primary tool for making your own entries in syslog is logger(1). logger is simply a wrapper around the syslog(3C) library function. Anyone can use logger, and it might typically be run thus;
  % logger -p user.debug A message with facility USER at priority DEBUG

Logger is most useful when put to work in non-interactive scripts such as daemons and cron jobs. Perl users ought to use the module Sys::Syslog instead of logger.
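Python users have equivalents too: the standard library's syslog module wraps syslog(3) directly, and logging.handlers.SysLogHandler can send to a local or remote syslog daemon. A minimal sketch using the latter; the logger name and the UDP address are assumptions, and on Linux you would usually pass address="/dev/log" to reach the local daemon instead.

```python
import logging
import logging.handlers


def syslog_logger(name: str, address=("localhost", 514),
                  facility: int = logging.handlers.SysLogHandler.LOG_USER) -> logging.Logger:
    """Return a logger whose records go to syslog with the given facility.

    The handler maps Python's DEBUG/INFO/WARNING/ERROR/CRITICAL levels
    onto syslog's debug/info/warning/err/crit priorities.
    """
    logger = logging.getLogger(name)
    logger.setLevel(logging.DEBUG)
    handler = logging.handlers.SysLogHandler(address=address, facility=facility)
    # Put the logger name in front of the message, much like logger's -t tag.
    handler.setFormatter(logging.Formatter("%(name)s: %(message)s"))
    logger.addHandler(handler)
    return logger


# Roughly: logger -p user.debug A message with facility USER at priority DEBUG
# log = syslog_logger("nightly-backup")
# log.debug("A message with facility USER at priority DEBUG")
```

Call syslog_logger() once per name; each call adds another handler, so calling it twice would send every message twice.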

If you're coding in a compiled language such as C or Fortran you should be looking at the manual pages for syslog(3) and openlog(3). The include file /usr/include/sys/syslog.h is also a useful reference point.

When making your own syslog entries be sure to use appropriate priority and facility codes. An event *you* consider to be an EMERGENCY likely isn't one to the person looking at syslog on a regular basis. It's doubtful whether any custom entry should be logged above ERROR priority.

Maintenance

The only regular maintenance syslog needs is the rotation of its log files. This is an especially important operation on your central host, as its log files are likely to grow quite big.

Do not be tempted to delete or truncate live syslog files. Deleting a live file will not free any space (syslog still has it open), either action will likely lose syslog entries, and it may crash the daemon. The correct technique is to rename the live log file, create a new one and inform syslog by sending it a 'HUP' signal, which makes it close and reopen its log files. Only in this fashion can you be sure not to lose log entries.
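The rename, create and HUP sequence can be sketched in Python. A minimal sketch, assuming you already know syslogd's process ID (it is often recorded in a pid file such as /var/run/syslogd.pid on Linux, but the location varies) and that numbered copies messages.0, messages.1, ... are wanted. It does not preserve file ownership or compress old logs, which a production tool should also consider.

```python
import os
import signal
from pathlib import Path
from typing import Optional


def rotate_log(path: str, keep: int = 7, syslogd_pid: Optional[int] = None) -> None:
    """Rotate path to path.0, shifting older copies up and discarding path.<keep-1>."""
    if keep < 1:
        raise ValueError("keep must be at least 1")
    log = Path(path)

    def copy_name(n: int) -> Path:
        return log.with_name(f"{log.name}.{n}")

    # Discard the oldest copy, then shift path.N -> path.N+1, oldest first.
    if copy_name(keep - 1).exists():
        copy_name(keep - 1).unlink()
    for n in range(keep - 2, -1, -1):
        if copy_name(n).exists():
            copy_name(n).rename(copy_name(n + 1))

    # Rename (never copy or truncate) the live file; syslog keeps writing to it
    # through its open descriptor until told to reopen its files.
    mode = log.stat().st_mode & 0o777
    log.rename(copy_name(0))

    # Create the new, empty live file with the old permissions.
    fd = os.open(log, os.O_WRONLY | os.O_CREAT | os.O_EXCL, mode)
    os.close(fd)
    os.chmod(log, mode)  # os.open's mode is reduced by the umask

    # HUP makes syslogd close and reopen its log files (and reread syslog.conf).
    if syslogd_pid is not None:
        os.kill(syslogd_pid, signal.SIGHUP)
```

Any messages that arrive between the rename and the HUP still land safely in messages.0, which is why this order loses nothing.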

Most systems supply scripts or programs to perform this task: Solaris has the /usr/lib/newsyslog script and Linux the logrotate(8) program. Both platforms ship with these jobs already scheduled in cron. On systems with lots of syslog traffic or limited filesystem space you may need to increase the frequency of this cron job. See the Cron FAQ for more details about cron.

Syslog Timestamps

The times and dates appearing on syslog log entries are usually part of the message received by the syslog daemon, not added by syslog when the message is received. This point is significant for several reasons;

  • Whilst looking through a centralised log you may notice the Timestamps jump around. Most likely this is innocent, and probably the result of unsynchronised clocks on different systems.
  • On a busy network, or one in which syslog messages are sent over slow links, there may be a significant delay between the transmission and receipt of a syslog message.
  • Security: syslog timestamps can easily be faked by anyone with basic programming and network knowledge. Taken on their own, they cannot be considered an authoritative record of an event.
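Seen from the central host, clock error and transit delay are indistinguishable: all the receiver can measure is the difference between the timestamp inside a message and the time it arrived. A minimal Python sketch of that measurement, assuming RFC3164-style messages and that sender and receiver share a time zone and year (the timestamp carries no year). A large or erratic value for one host suggests its clock needs attention; remember the timestamp is whatever the sender chose to put there.

```python
from datetime import datetime


def apparent_delay(message: str, received: datetime) -> float:
    """Seconds from the timestamp inside an RFC3164-style message to its arrival.

    The result mixes the sender's clock error with network delay; the two
    cannot be separated from this measurement alone.
    """
    # Skip the '<PRI>' prefix if present.
    text = message.split(">", 1)[1] if message.startswith("<") else message
    # Classic timestamps ('Mmm dd hh:mm:ss') carry no year: assume the receiver's.
    stamp = datetime.strptime(f"{received.year} {text[:15]}", "%Y %b %d %H:%M:%S")
    return (received - stamp).total_seconds()
```

A negative result means the message claims to come from the future, a sure sign the sender's clock is ahead.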

Unsynchronised clocks quickly become quite irksome when you start looking at syslog, and they are probably causing network users some problems as well. You may want to consider implementing clock synchronisation (see the Time Synchronisation FAQ).

Troubleshooting

There is very little to go wrong with syslog. This section focuses on two main issues;

Log file stops growing or is empty.
Three likely causes, most likely first:
  1. The log's filesystem is full.
    Use 'df' to check the /var filesystem, or wherever /etc/syslog.conf points. Free up some space, either by rotating the log files or by removing something else.
  2. Somebody deleted the log file.
    In this case syslog is probably still writing to it, but you can't see it because there's no longer a directory entry for it. Deleting open files isn't a good idea because it doesn't free any space, and you can no longer see the huge file. Deleting syslog's live log file is downright stupid, because that huge file is still growing - Sadness..
    In practical terms there is very little you can do other than restart syslog (see Maintenance); the entries written to the deleted file will be lost, however.
    For a further discussion about the deletion of open files see the Filesystem FAQ - Inodes
  3. Syslog died.
    See if you can find a core file for syslog. Check for syslog patches with your vendor, etc..
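The first two checks can be scripted. A minimal Python sketch: free_fraction() uses the standard shutil.disk_usage() in place of df, and deleted_open_files() relies on Linux's /proc/<pid>/fd links, which end in " (deleted)" once the file has been removed. The second function is Linux-only, needs permission to read the syslog daemon's /proc entries, and the syslogd PID is something you must supply.

```python
import os
import shutil


def free_fraction(path: str) -> float:
    """Fraction of the filesystem holding path that is still free (like df)."""
    usage = shutil.disk_usage(path)
    return usage.free / usage.total


def deleted_open_files(pid: int) -> list[str]:
    """Linux only: paths that process pid still holds open although they were deleted."""
    fd_dir = f"/proc/{pid}/fd"
    marker = " (deleted)"
    found = []
    for fd in os.listdir(fd_dir):
        try:
            target = os.readlink(os.path.join(fd_dir, fd))
        except OSError:
            continue  # the descriptor was closed while we were looking
        if target.endswith(marker):
            found.append(target[: -len(marker)])
    return found
```

For example, a result of ["/var/adm/messages"] from deleted_open_files(syslogd_pid) confirms cause 2 above.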
 
Program targets don't receive syslog records
Syslog buffers its output to programs, so it tends to arrive in large blocks. Check that your program/script can actually cope with large records. Check also that it's running as a child of syslog. Most generic syslogs only start program targets (via fork/exec) when syslog itself starts - if the program dies you'll need to stop and restart syslog.
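A program target should therefore read its input record by record rather than assume each read delivers exactly one message. A minimal Python sketch of such a reader; run_target() and the handler it calls are hypothetical names, and the loop simply ends when syslog closes the pipe.

```python
from typing import Callable, Iterable


def run_target(stream: Iterable[str], handle: Callable[[str], None]) -> int:
    """Pass each record from stream to handle(); return the count once syslog closes the pipe."""
    count = 0
    # Iterating a text stream yields whole lines, however the data was split
    # into blocks by syslog's buffering, and places no limit on line length.
    for record in stream:
        handle(record.rstrip("\n"))
        count += 1
    return count
```

A real target would call run_target(sys.stdin, handler) and exit when it returns.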

Where are the Priority/Facility codes ?

Shortly after you start working with syslog you're likely to wonder where the Priority and Facility codes are for its messages. Well, the simple answer is that most generic syslog implementations do not keep them. Unbelievable, I know, but after syslog uses them to determine where to send any given message, the codes are simply dropped.

Whilst the priority/facility codes are generally not stored in log files, they are sometimes passed to programs specified using the "|program" syntax. Syslog-ng, in particular, passes the unparsed syslog record to an external program specified in its 'program()' directive.

If you want to keep the priority/facility pairs I'd suggest you consider replacing the syslog on your central host with syslog-ng, it's considerably more flexible and when combined with our AMS package makes monitoring your network a doddle.
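The codes are present in every message a syslog daemon receives over the network, in the PRI number at the front, so a collector that wants to keep them only has to decode that number before writing the line. A minimal Python sketch of such a collector, for illustration rather than as a substitute for syslog-ng. It assumes Linux facility numbering, and binding port 514 normally needs root and a port not already taken by the system's own syslogd.

```python
import re
import socket

# Facility names assume Linux numbering; adjust for other senders.
FACILITY_NAMES = {0: "kern", 1: "user", 2: "mail", 3: "daemon", 4: "auth",
                  5: "syslog", 6: "lpr", 7: "news", 8: "uucp", 9: "cron"}
FACILITY_NAMES.update({16 + i: f"local{i}" for i in range(8)})
PRIORITY_NAMES = ["emerg", "alert", "crit", "err", "warning", "notice", "info", "debug"]
PRI_RE = re.compile(r"<([0-9]{1,3})>(.*)", re.DOTALL)


def label(packet: bytes) -> str:
    """Turn '<PRI>rest' into 'facility.priority rest' so the codes survive in the log."""
    text = packet.decode("utf-8", errors="replace").rstrip("\x00\n")
    match = PRI_RE.match(text)
    if not match:
        return f"unknown.unknown {text}"
    facility, priority = divmod(int(match.group(1)), 8)
    name = FACILITY_NAMES.get(facility, f"facility{facility}")
    return f"{name}.{PRIORITY_NAMES[priority]} {match.group(2)}"


def serve(logfile: str, port: int = 514, host: str = "0.0.0.0") -> None:
    """Receive syslog datagrams forever, appending one labelled line per message."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.bind((host, port))
        with open(logfile, "a", buffering=1) as out:  # line-buffered
            while True:
                packet, _sender = sock.recvfrom(8192)
                out.write(label(packet) + "\n")
```

A packet starting "<75>" is written as a cron.err line, so the codes stay searchable after the fact.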


