Question;

Hi,

When the engineer arrived he said that two of the disks in the array had failed, one the day before. Worryingly he said that the other disk had been having problems for several weeks - this was news to us. Can you give me any advice as to how we can learn about these problems before they become a bigger problem?

Thanks,

Answer;

This is a very common problem with RAID arrays. They are designed to withstand a hardware failure, and frequently do so without anyone being aware there was ever a problem.

The one word answer to your question is Syslog. Syslog is responsible for gathering and saving all the error messages from your system. It is important to look at the syslog logs on a regular, if not continual, basis.

Under Solaris the main syslog log file is /var/adm/messages. This is configured in the file /etc/syslog.conf. [ Under Linux the main syslog log is usually /var/log/messages ]

Overview of Syslog

Syslog is best viewed as a router of system messages: it does very little with the messages it receives and doesn't (generally) generate any itself. It is designed to be simple and above all fast.

Syslog messages are generated by programs using the C library call syslog(3), and sometimes by the kernel itself.

The categorisation of syslog messages is fairly primitive by modern standards and consists simply of a 'Priority' and a 'Facility' code. Whilst both codes can be used to filter messages out to different destinations, it is unfortunate that many standard syslog implementations do not record either in their log files - see "Where are the Priority/Facility codes ?" below for a discussion about this.

Priority, as the name implies, is simply a reflection of the urgency of a message and ranges from 0 ("Emergency") down to 7 ("Debug").

The Facility code is an attempt to identify the source of the message. Internally, on most Unix platforms (including Solaris and Linux) the facility is a number in the range 0-23. Generally facilities are referred to by name (kern, mail, daemon, auth and cron, for example) rather than by number.

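On the wire the two codes travel as a single number: RFC 3164 describes the value at the start of each record (written between angle brackets) as the facility multiplied by 8, plus the priority. The following Python sketch splits such a value back into its parts. The function names decode_pri and encode_pri are made up for illustration, and only the priority names are spelled out because facility numbering is not identical across platforms.

```python
# Priority names in numeric order, 0 (most urgent) to 7 (least urgent).
PRIORITY_NAMES = ["emerg", "alert", "crit", "err",
                  "warning", "notice", "info", "debug"]


def decode_pri(pri):
    """Split a combined syslog value into (facility number, priority name).

    Assumes the RFC 3164 encoding: pri = facility * 8 + priority.
    """
    if not 0 <= pri <= 191:          # 23 * 8 + 7 is the largest valid value
        raise ValueError("value out of range: %r" % pri)
    facility, priority = divmod(pri, 8)
    return facility, PRIORITY_NAMES[priority]


def encode_pri(facility, priority_name):
    """Inverse of decode_pri, useful for checking the arithmetic."""
    return facility * 8 + PRIORITY_NAMES.index(priority_name)


if __name__ == "__main__":
    # <34> is the example value used in RFC 3164: facility 4, priority crit.
    print(decode_pri(34))
```

The facility number is deliberately left as a number here; turning it into a name needs a table for the platform that generated the message.
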
The limited number of predefined facility codes, and a lack of understanding on the part of some programmers, has considerably devalued their use for filtering & monitoring messages. One classic example of this is that whereas the internal code for CRON is 15 on Solaris, Linux uses 9. This can cause some confusion when logging centrally, as the code->name mapping is performed on the destination system, when it is performed at all...

Monitoring Syslog

Syslog is a vital source of information on the health, well-being and security of both your systems and your network in general. On all but the simplest of networks one should arrange for everything that generates syslog messages to send them to a central host.

I like to keep a continuous eye on syslog. It's truly enlightening to do this on a network that hasn't been continuously monitored before - full disks, failed 'su' attempts, unusual "root" logins, etc. All manner of small problems can be addressed before they become big ones.

The simplest way of monitoring syslog is to log on to your central
syslog host and to keep a

Configuring Syslog

Most Unix systems come with a sensibly configured syslog, and there is rarely much to do. Syslog's configuration file is usually /etc/syslog.conf and consists of one or more lines specifying <facility>.<priority> pairs and the destination for matching messages, one line per destination. This example is from a Solaris machine but would work with any generic Unix syslog;

*.err;kern.notice;auth.notice /dev/console
*.err;kern.debug;daemon.notice;mail.crit;mark.info /var/adm/messages
*.debug @centralhost
*.alert root
mail.debug /var/adm/mail.log
kern.notice |kernel_errors
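
To make the <facility>.<priority> syntax concrete, here is a rough Python model of how a list of pairs selects messages. It is a sketch under stated assumptions rather than a description of any particular daemon: a pair matches messages at or above the named priority, '*' stands for every facility, 'none' switches a facility off, and the last applicable pair on the line wins. Comma-separated facility lists are not handled, and the function names are invented for this example.

```python
# Priority keywords as used in syslog.conf, most urgent first.
PRIORITIES = ["emerg", "alert", "crit", "err",
              "warning", "notice", "info", "debug"]


def threshold_for(selectors, facility):
    """Return the least urgent priority index logged for a facility,
    or None if the selector list does not cover that facility."""
    threshold = None
    for pair in selectors.split(";"):
        name, _, level = pair.strip().partition(".")
        if name in ("*", facility):
            # Simplifying assumption: the last applicable pair wins.
            threshold = None if level == "none" else PRIORITIES.index(level)
    return threshold


def matches(selectors, facility, priority):
    """True if a message with this facility/priority would be selected."""
    threshold = threshold_for(selectors, facility)
    return threshold is not None and PRIORITIES.index(priority) <= threshold


if __name__ == "__main__":
    messages_line = "*.err;kern.debug;daemon.notice;mail.crit;mark.info"
    for fac, pri in [("kern", "info"), ("mail", "err"), ("auth", "err")]:
        print(fac, pri, matches(messages_line, fac, pri))
```

Real implementations differ in the details, so treat this as an aid to reading a configuration line, not as a substitute for testing it.
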

Note: Not visible here are the TAB characters (ASCII code 9) separating the facility/priority pairs from the destination. Some syslogs are really fussy about this. Best practice is to keep at least one tab in there.

The destination is usually a file, but can be a device (/dev/console), another system (@centralhost), a logged in user (root) or a program (|kernel_errors).

The least useful destinations are probably users and other programs - users need to be logged in, and buffering means programs don't get messages in real-time. /dev/console is probably the only useful device to send syslog messages to. You might like to know that banks and other security sensitive sites use lineprinters (/dev/lp) for this purpose, partly because printed logs are generally court admissible whereas electronic ones are not...

In the above example /var/adm/messages takes any message at or above priority ERROR ("*.err") as well as Kernel messages at any priority ("kern.debug") and Daemon messages at or above NOTICE ("daemon.notice"). Mail system messages are only included if they are at or above CRITICAL ("mail.crit").

Mark: The 'mark.info' pair causes syslog to make a log entry every 15 minutes (usually, check your documentation). This doesn't seem of much use until you arrive one morning to find your server mysteriously hung. The '--- MARK ---' log entries can help to pinpoint when the system died and may assist in the resolution of the problem.

Note: Under later versions of Linux, MARK is enabled via the '-m <time>' runtime option, see syslogd(8) for more details.

Centralising Syslog messages

If you are responsible for more than one machine you should arrange for all the machines to send their syslog records to a central host of your choosing. The simplest way of achieving this is to modify /etc/syslog.conf and to add a line of the form used in the example above;

*.debug @centralhost

The only requirement of centralhost is that it is running a syslog daemon - you can have Solaris systems log to Linux machines or vice versa. You probably want to make sure that centralhost has enough disk space to take all the messages.

Note: Do not simply change /var/adm/messages to @centralhost, as then the system will not write its messages locally; you want the central log to be additional to the local copy. Visiting engineers usually go straight to /var/adm/messages when called in to look at a problem.

I suggest using an alias in /etc/syslog.conf for your central
host. If you can define an alias in whatever naming service your site uses (typically NIS/YP, or DNS) it becomes a simple matter to move the centralhost function to another machine should the need arise.

Making your own Syslog entries

The primary tool for making your own entries in syslog is logger(1).
logger is simply a wrapper around the syslog(3c) library call. Anyone can use logger, and it might typically be run thus;

Logger is most useful when put to work in non-interactive scripts such as daemons and cron jobs. Perl users ought to use the module Sys::Syslog instead of logger. If you're coding in a compiled language such as C or Fortran you should be looking at the manual pages for syslog(3) and openlog(3). The include file /usr/include/sys/syslog.h is also a useful reference point.

When making your own syslog entries be sure to use appropriate priority and facility codes. An event *you* consider to be an EMERGENCY likely isn't one to the person looking at syslog on a regular basis. It's doubtful whether any custom entry should be logged above ERROR priority.

Maintenance

The only regular maintenance syslog needs is the rotation of its log
files. This is an especially important operation on your central host as its logfiles are likely to grow quite big.

Most systems have supplied scripts or programs to perform this task. Solaris has the /usr/lib/newsyslog script and Linux the logrotate(8) program. Both these platforms are shipped with these jobs running on a daily basis. On systems with lots of syslog traffic or limited filesystem space you may need to increase the frequency of this cronjob.

Syslog Timestamps

The times & dates appearing on syslog log entries are usually part of the message received by the syslog daemon, and not added by syslog when it is received. This point is significant for several reasons;

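One practical consequence is that a central log can reveal how far a sender's clock has drifted. The Python sketch below compares the timestamp carried in a record with the time the record arrived. It assumes the traditional 'Mmm dd hh:mm:ss' layout described in RFC 3164, which has no year or time zone, so the year is borrowed from the arrival time and both hosts are assumed to share a time zone. English month names are assumed, 29 February is not given any special handling, and the function names are invented for this example.

```python
from datetime import datetime, timedelta


def parse_stamp(stamp, received_at):
    """Parse a 'Mmm dd hh:mm:ss' timestamp, borrowing the year from
    the time the record was received (the stamp itself has none)."""
    parsed = datetime.strptime("%d %s" % (received_at.year, stamp),
                               "%Y %b %d %H:%M:%S")
    # A stamp far in the future most likely belongs to the previous year,
    # e.g. a 31 December record received just after midnight on 1 January.
    if parsed - received_at > timedelta(days=1):
        parsed = parsed.replace(year=parsed.year - 1)
    return parsed


def clock_skew(stamp, received_at):
    """Sender's clock minus receiver's clock, ignoring network delay
    and assuming both hosts use the same time zone."""
    return parse_stamp(stamp, received_at) - received_at


if __name__ == "__main__":
    received = datetime(2024, 10, 11, 22, 14, 15)
    print(clock_skew("Oct 11 22:19:15", received))
```

A positive result means the sender's clock is ahead of the receiver's; anything more than a few seconds is worth investigating.
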
Unsynchronised clocks quickly become quite irksome when you start looking at syslog. Most likely they're causing network users some problems as well. You may want to consider implementing clock synchronisation.

Troubleshooting

There is very little to go wrong with syslog. This section focuses on two main issues;

Where are the Priority/Facility codes ?

Shortly after you start working with syslog you're likely to wonder where the Priority and Facility codes are for syslog's messages. Well, the simple answer is that most generic syslog implementations do not keep them. Unbelievable I know, but after syslog uses them to determine where to send any given message the codes are just dumped.

Whilst the priority/facility codes are generally not stored in log files, they are sometimes passed to a program specified using the "|program" syntax. Syslog-ng in particular passes the unparsed syslog record to an external program specified in its 'program()' directive.

If you want to keep the priority/facility pairs I'd suggest you consider replacing the syslog on your central host with syslog-ng; it's considerably more flexible and, when combined with our AMS package, makes monitoring your network a doddle.

References: RFC3164 - technical details about the operation of Syslog.