How To Monitor A System With Sysstat On Centos 4.3
A common task for System Administrators is to monitor and care for a server. That's fairly easy to do at a moment's notice, but how to keep a record of this information over time? One way to monitor your server is to use the Sysstat package.
Sysstat is actually a collection of utilities designed to collect information about the performance of a linux installation, and record them over time.
It's fairly easy to install too, since it is included as a package on many distributions.
To install on Centos 4.3, just type the following:
yum install sysstat
We now have the sysstat scripts installed on the system. Lets try the sar command.
Linux 2.6.16-xen (xen30) 08/17/2006 11:00:02 AM CPU %user %nice %system %iowait %idle 11:10:01 AM all 0.00 0.00 0.00 0.00 99.99 Average: all 0.00 0.00 0.00 0.00 99.99
Several bits of information, such as Linux kernel, hostname, and date are reported.
More importantly, the various ways CPU time being spent on the system is shown.
%user, %nice, %system, %iowait, and %idle describe ways that the CPU may be utilized.
%user and %nice refer to your software programs, such as MySQL or Apache.
%system refers to the kernelâ€™s internal workings.
%iowait is time spent waiting for Input/Output, such as a disk read or write. Finally, since the kernel accounts for 100% of the runnable time it can schedule, any unused time goes into %idle.
The information above is shown for a 1 second interval. How can we keep track of that information over time?
If our system was consistently running heavy in %iowait, we might surmise that a disk was getting overloaded, or going bad.
At least, we would know to investigate.
So how do we track the information over time? We can schedule sar to run at regular intervals, say, every 10 minutes.
We then direct it to send the output to sysstatâ€™s special log files for later reports.
The way to do this is with the Cron daemon.
By creating a file called sysstat in /etc/cron.d, we can tell cron to run sar every day.
Fortunately, the Systat package that yum installed already did this step for us.
# run system activity accounting tool every 10 minutes */10 * * * * root /usr/lib/sa/sa1 1 1 # generate a daily summary of process accounting at 23:53 53 23 * * * root /usr/lib/sa/sa2 -A
The sa1 script logs sar output into sysstatâ€™s binary log file format, and sa2 reports it back in human readable format.
The report is written to a file in /var/log/sa.
sa17 is the binary sysstat log, sar17 is the report. (Todayâ€™s date is the 17th)
There is quite alot of information contained in the sar report, but there are a few values that can tell us how busy the server is.
Values to watch are swap usage, disk IO wait, and the run queue.
These can be obtained by running sar manually, which will report on those values.
Linux 2.6.16-xen (xen30) 08/17/2006 11:00:02 AM CPU %user %nice %system %iowait %idle 11:10:01 AM all 0.00 0.00 0.00 0.00 99.99 11:20:01 AM all 0.00 0.00 0.00 0.00 100.00 11:30:02 AM all 0.01 0.26 0.19 1.85 97.68 11:39:20 AM all 0.00 2.41 2.77 0.53 94.28 11:40:01 AM all 1.42 0.00 0.18 3.24 95.15 Average: all 0.03 0.62 0.69 0.64 98.02
There were a few moments where of disk activity was high in the %iowait column, but it didnt stay that way for too long. An average of 0.64 is pretty good.
How about my swap usage, am I running out of Ram? Being swapped out is normal for the Linux kernel, which will swap from time to time. Constant swapping is bad, and generally means you need more Ram.
Linux 2.6.16-xen (xen30) 08/17/2006 11:00:02 AM pswpin/s pswpout/s 11:10:01 AM 0.00 0.00 11:20:01 AM 0.00 0.00 11:30:02 AM 0.00 0.00 11:39:20 AM 0.00 0.00 11:40:01 AM 0.00 0.00 11:50:01 AM 0.00 0.00 Average: 0.00 0.00
Nope, we are looking good. No persistant swapping has taken place.
How about system load? Are my processes waiting too long to run on the CPU?
Linux 2.6.16-xen (xen30) 08/17/2006 11:00:02 AM runq-sz plist-sz ldavg-1 ldavg-5 ldavg-15 11:10:01 AM 0 47 0.00 0.00 0.00 11:20:01 AM 0 47 0.00 0.00 0.00 11:30:02 AM 0 47 0.28 0.21 0.08 11:39:20 AM 0 45 0.01 0.24 0.17 11:40:01 AM 0 46 0.07 0.22 0.17 11:50:01 AM 0 46 0.00 0.02 0.07 Average: 0 46 0.06 0.12 0.08
No, an average load of .06 is really good.
Notice that there is a 1, 5, and 15 minute interval on the right.
Having the three time intervals gives you a feel for how much load the system is carrying.
A 3 or 4 in the 1 minute average is ok, but the same number in the 15 minute
column may indicate that work is not clearing out, and that a closer look is warranted.
This was a short look at the Sysstat package.
We only looked at the out put of three of sarâ€™s attributes, but there are others.
Now, armed with sar in your toolbox, your system administration job just became a little easier.