Mardi

Latest release: 0.1

Overview

Mardi is a tool for tracking the values of system variables, logging them such that they can be graphed with Gnuplot, and sending alerts when any variable exceeds an upper or lower limit. Any variable that can be accessed by a shell command can be monitored, and any action that can be expressed as a shell command can be taken as an alarm.

Mardi 0.1 has only been tested on Linux (specifically, Fedora Core 3), but should work with little to no modification on other *nix systems. It can only monitor positive integer variables; real number (floating point) support may be added in a later version. For the moment, real numbers can be monitored by converting them to integers (by multiplying by a constant and truncating least significant digits).
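
As a rough illustration of the integer conversion described above, the following shell sketch scales a real number by a constant factor and truncates the result. The helper name scale_to_int and the factor of 100 are assumptions chosen for this example; they are not part of Mardi.

```shell
# Hypothetical helper (not part of Mardi): multiply a real number by a
# constant factor and truncate the fractional part, yielding an integer.
scale_to_int() {
    awk -v value="$1" -v factor="$2" 'BEGIN { printf "%d\n", value * factor }'
}

# Example: report 2.25 in hundredths.
scale_to_int 2.25 100    # prints 225
```

A similar awk one-liner could presumably serve as a tracker command. Note that the sample configuration in this document writes awk field references with a doubled dollar sign ($$2), so a command placed in a configuration file would likely need the same.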

Command-line usage

$ mardi [-c <config file>] [-D] [-h] [-v]
where

The default configuration file is /etc/mardi.conf.

Configuration file format

Configuration is stored in an XML document with doctype "mardi". The configuration file must have a root element of type mardi, which may have the following attributes:

The root element must contain one or more elements of type tracker, which represent values being tracked. tracker elements have the following attributes:

As content, each tracker element must contain a command to execute to retrieve the value being tracked.

If min or max is not set, or is set to zero, on a tracker, then no checking will be done against that limit. If neither is set (or both are set to zero), then the value will be logged, but no alerts will be sent.
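
The limit rules above can be sketched as a small shell function. This is only an illustration of the described behaviour, not Mardi's actual code; the function name check_limits and the use of strict comparisons are assumptions.

```shell
# Hypothetical sketch of the limit rules (not Mardi's implementation).
# Usage: check_limits VALUE [MIN [MAX]]
# A limit that is missing or zero is treated as "not set" and is skipped.
check_limits() {
    val=$1
    min=${2:-0}
    max=${3:-0}
    if [ "$max" -ne 0 ] && [ "$val" -gt "$max" ]; then
        echo over
    elif [ "$min" -ne 0 ] && [ "$val" -lt "$min" ]; then
        echo under
    else
        echo ok
    fi
}

check_limits 5000 10000 0    # min set, max disabled: prints "under"
check_limits 42              # no limits set: prints "ok"
```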

Example configuration file

<?xml version="1.0" encoding="UTF-8"?>
<!-- Sample configuration file for Mardi -->
<mardi 
    interval="60" 
    logfile="track.dat" 
    over="echo $name over limit $max: $val" 
    under="echo $name under limit $min: $val"
>

  <!-- Track the number of bytes received on eth0 -->
  <!-- when more than 10 MB are received in one minute, send mail to root -->
  <tracker 
      name="Byte count" 
      diff="true"
      max="10000000"
      over="echo -e &quot;Current value: $val\nLimit: $max\n&quot; | mail -s &quot;$name exceeded&quot; root"
  >
    awk 'BEGIN {FS="[ \t:]*"} {if ($$2 == "eth0") {printf("%s", $$3)}}' /proc/net/dev
  </tracker>
  
  <!-- Track user degraaf's disk usage -->
  <!-- if disk usage is over 10 GB, send a message to syslog -->
  <tracker 
      name="deGraaf's disk usage" 
      max="10000000"
      user="degraaf"
      over="logger &quot;degraaf's home directory is getting full: $val kB currently used&quot;"
  >
    du -sk ~ | cut -f 1
  </tracker>
  
  <!-- Track memory usage -->
  <!-- if there is less than 10 MB of free memory, kill the process with the highest memory usage -->
  <tracker
      name="Memory usage"
      min="10000"
      max="0"
      user="root"
      under="kill -9 `ps -e -o pid= --sort=-vsz | head -n 1`"
  >
    grep "^MemFree" /proc/meminfo | awk '{print $$2}'
  </tracker>
</mardi>        

In this configuration, all trackers are updated every 60 seconds, data is logged to track.dat in the current directory, and the default actions to take when a value exceeds its bounds are to print messages to the screen. Note that printing to the screen will not work in daemon mode. By default, when Mardi is run with superuser privileges, all commands are run as user and group "nobody".

The first tracker, "Byte count", logs the number of bytes received on the eth0 network interface; this data is read from /proc/net/dev. If more than 10 MB are received in any one-minute interval, then email is sent to root.

The second tracker, "deGraaf's disk usage", tracks the size of user "degraaf"'s home directory. If it exceeds 10 GB, then a warning message is written to syslog. All of this tracker's commands are run as the user "degraaf".

The third tracker, "Memory usage", monitors the amount of free memory, as reported in /proc/meminfo. If the total free memory ever drops below 10 MB, then the process with the highest memory usage is killed.

You can download the sample configuration here.

Commands

All commands must be able to run in a restricted Bourne shell (/bin/sh -r). Be careful not to use commands which can block for any significant amount of time, especially if a short update interval is used.
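
One possible way to guard against a blocking command is to bound its run time. The sketch below assumes the GNU coreutils timeout utility is installed; timeout is not part of Mardi, and the wrapper name run_bounded is hypothetical.

```shell
# Assumes GNU coreutils `timeout`; run_bounded is a hypothetical wrapper.
# Usage: run_bounded SECONDS COMMAND [ARGS...]
run_bounded() {
    limit=$1
    shift
    timeout "$limit" "$@"
}

# A command that finishes in time behaves as usual.
run_bounded 5 echo 42

# A command that blocks is terminated once the limit is reached; GNU timeout
# then exits with a non-zero status (documented as 124).
run_bounded 1 sleep 5 || echo "command timed out or failed, status $?"
```

In a tracker, the same idea would presumably be written inline, for example as timeout 5 followed by the monitoring command. Whether this works under the restricted shell on a given system should be checked before relying on it.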

The following macros are expanded at runtime in all commands:

Thus, the command

echo "current value: $val" | mail -s "$name is over limit" root@mydomail.zz

might expand to

echo "current value: 100" | mail -s "Memory usage is over limit" root@mydomail.zz
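
The substitution shown above can be imitated with sed. This is a minimal sketch of the idea, not Mardi's implementation: the function name expand_macros is hypothetical, only $name and $val are handled, and values containing "/" or "&" would need escaping.

```shell
# Hypothetical sketch (not Mardi's code): substitute $name and $val in a
# command template. Values containing "/" or "&" would confuse sed.
# Usage: expand_macros TEMPLATE NAME VALUE
expand_macros() {
    template=$1
    name=$2
    val=$3
    printf '%s\n' "$template" |
        sed -e 's/\$name/'"$name"'/g' -e 's/\$val/'"$val"'/g'
}

# Prints the template with both macros replaced by the given values.
expand_macros 'echo "current value: $val" | mail -s "$name is over limit" root' \
    'Memory usage' 100
```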

If Mardi is run with superuser privileges, then by default, all commands are run as user and group nobody. Otherwise, they are run as the current user. If a user and group are specified for a tracker, then this identity is used instead of nobody. To run a command requiring superuser privileges, set the user to root. Note that it may be a security risk to do so. Do not set the default user to root.

Log file

The log file consists of a set of records in tab-separated columns. Each record contains a timestamp and the values of all variables being tracked at that time. The first column contains the timestamp, as the number of seconds since the epoch (00:00:00 UTC, Jan 1st, 1970). Subsequent columns contain the values of each variable being monitored, in the order that the variables are given in the configuration file.

For instance, a portion of a log file generated using the sample configuration above might be:

1122412557	1019	15388353	32024
1122412557	67884	15388353	29916
1122412557	208354	15388353	27916
1122415970	212621	15388373	46284
1122415970	159082	15388373	46284
1122415970	269273	15388373	46284
1122415970	196967	15388373	46268
1122415970	299984	15388373	46284        
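
Because the log is plain tab-separated text, it can also be summarized with standard tools before graphing. The following sketch, which assumes the column layout described above (timestamp in column 1, first tracker in column 2, and so on), prints the largest value seen in one column; the function name column_max is hypothetical.

```shell
# Hypothetical helper: print the largest value found in one column of a
# tab-separated Mardi log file.
# Usage: column_max LOGFILE COLUMN
column_max() {
    awk -F '\t' -v col="$2" '
        NR == 1 || $col + 0 > max { max = $col + 0 }
        END { print max }
    ' "$1"
}

# Example: peak value recorded by the first tracker (column 2), if the
# log file from the sample configuration exists in the current directory.
if [ -f track.dat ]; then
    column_max track.dat 2
fi
```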

The results of the first tracker (bytes received on eth0) could be graphed using a command such as the following:

echo "set term postscript color
set output \"graph.ps\"
set ylabel \"Number of bytes received\"
set xlabel \"Time of day (seconds since the Epoch)\" 
set title \"Number of bytes received on eth0 per minute over a 30 minute period\"
plot \"track.dat\" using 1:2 title \"Bytes received on eth0\"  with linespoints" | gnuplot        

to produce a graph similar to this:

[Graph: number of bytes received on eth0 per minute over a 30 minute period]

Mardi does not necessarily write all logged data out to disk as soon as it is recorded; it may be cached in memory for some time. To force Mardi to flush all cached data out to disk without stopping it, send it a SIGHUP signal. This can be accomplished with a command such as killall -HUP mardi.

Error reporting

In normal (console) mode, Mardi prints all run-time error messages to the console. In daemon mode, run-time errors are reported to syslog, under the LOG_USER facility. If a command fails (exits with a status other than 0), then its exit status and any output produced on stdout will be reported.

Stopping Mardi

To stop Mardi, send it a SIGINT signal. In console mode, this can be done by pressing Ctrl-C. If Mardi is running in daemon mode, then this can be accomplished with a command such as killall -INT mardi.

Compilation and installation

See the file INSTALL in the top-level directory of the source tarball.

Name

In case you're curious about the name, "Mardi": I started with the name "Monitoring, Alerting, and Recording Daemon". This is a little unwieldy, so I abbreviated it to the acronym "MARD". If I left the name as that, people would most likely pronounce it to rhyme with "lard", which would be unfortunate. The intended pronunciation is "mar-dee", so I added the 'i' to the name to encourage the correct pronunciation.

Also, the initial public release (version 0.1) was made on a Tuesday.

License

Mardi is distributed under the terms of the GNU General Public License (GPL).

Contact information

Should you have questions, comments, or concerns about Mardi, contact me by email at degraaf-at-cpsc-dot-ucalgary-dot-ca. Please put the word "Mardi" in the subject of your message.

Contributing

If you wish to contribute to Mardi, please contact me and let me know what you wish to do. Or if you want to contribute but don't have anything in particular in mind, contact me and we'll find something suitable to your abilities. Please don't send me patches against anything older than my latest release.

Sorry, but my version control system is not publicly accessible at the moment. I use Subversion, which SourceForge doesn't yet support. I'll set up public SVN access whenever SourceForge supports it or I get my own server. If you need a copy of my latest development sources, please contact me.