EXIMSTATS

Section: EXIM (8)
Updated: 2004-10-07
 

NAME

eximstats - generates statistics from Exim mainlog files.  

SYNOPSIS

 eximstats [Options] mainlog1 mainlog2 ... > report.txt
 eximstats -merge [Options] report.1.txt report.2.txt ... > weekly_report.txt

Options:

-h<number>
Histogram divisions per hour. The default is 1, and 0 suppresses histograms. Valid values are:

0, 1, 2, 3, 5, 10, 15, 20, 30 or 60.

-ne
Don't display error information.
-nr
Don't display relaying information.
-nr/pattern/
Don't display relaying information that matches the pattern.
-nt
Don't display transport information.
-nt/pattern/
Don't display transport information that matches the pattern.
-q<list>
List of times for queuing information; a single 0 item suppresses it.
-t<number>
Display the top <number> sources/destinations. The default is 50; 0 suppresses the top listing.
-tnl
Omit local sources/destinations in top listing.
-t_remote_users
Include remote users in the top source/destination listings.
-byhost
Show results by sending host. This may be combined with -bydomain and/or -byemail and/or -byedomain. If none of these options are specified, then -byhost is assumed as a default.
-bydomain
Show results by sending domain. May be combined with -byhost and/or -byemail and/or -byedomain.
-byemail
Show results by sender's email address. May be combined with -byhost and/or -bydomain and/or -byedomain.
-byemaildomain or -byedomain
Show results by sender's email domain. May be combined with -byhost and/or -bydomain and/or -byemail.
-pattern Description /Pattern/
Look for the specified pattern and count the number of lines in which it appears. This option can be specified multiple times. Eg:

 -pattern 'Refused connections' '/refused connection/'

-merge
This option allows eximstats to merge old eximstat reports together. Eg:

 eximstats mainlog.sun > report.sun.txt
 eximstats mainlog.mon > report.mon.txt
 eximstats mainlog.tue > report.tue.txt
 eximstats mainlog.wed > report.wed.txt
 eximstats mainlog.thu > report.thu.txt
 eximstats mainlog.fri > report.fri.txt
 eximstats mainlog.sat > report.sat.txt
 eximstats -merge       report.*.txt > weekly_report.txt
 eximstats -merge -html report.*.txt > weekly_report.html

* You can merge text or html reports and output the results as text or html.
* You can use all the normal eximstat output options, but only data included in the original reports can be shown!
* When merging reports, some loss of accuracy may occur in the top n lists. This will be towards the ends of the lists.
* The order of items in the top n lists may vary when the data volumes round to the same value.
-html
Output the results in HTML.
-charts
Create graphical charts to be displayed in HTML output.

This requires the following modules which can be obtained from http://www.cpan.org/modules/01modules.index.html

GD
GDTextUtil
GDGraph

To install these, download and unpack them, then use the normal Perl installation procedure:

 perl Makefile.PL
 make
 make test
 make install

-chartdir <dir>
Create the charts in the directory <dir>.
-chartrel <dir>
Specify the relative directory for the "img src=" tags from which to include the charts.
-d
Debug flag. This outputs the eval()'d parser onto STDOUT, which makes it easier to trap errors in the eval section. Remember to add 1 to the line numbers to allow for the title!
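The -pattern option's counting rule can be sketched in Python as below. Each matching line counts once per pattern. This is not eximstats' own code, which is Perl. count_patterns() is a hypothetical name, and the sketch assumes the surrounding /.../ delimiters are stripped before compiling. Python's re syntax only approximates Perl regular expressions.

```python
import re
from typing import Dict, Iterable, List, Tuple


def count_patterns(lines: Iterable[str], patterns: List[Tuple[str, str]]) -> Dict[str, int]:
    """Count, for each (description, /regex/) pair, the lines that match."""
    compiled = []
    for description, rx in patterns:
        # Drop the surrounding /.../ delimiters used on the command line.
        if len(rx) >= 2 and rx.startswith("/") and rx.endswith("/"):
            rx = rx[1:-1]
        compiled.append((description, re.compile(rx)))

    counts = {description: 0 for description, _ in compiled}
    for line in lines:
        for description, regex in compiled:
            # A line counts once per pattern, however often it matches.
            if regex.search(line):
                counts[description] += 1
    return counts


# Made-up sample lines, for illustration only.
log = [
    "2004-10-07 10:00:00 SMTP refused connection from [192.0.2.1]",
    "2004-10-07 10:00:01 1CFsJt-0000Ab-00 <= alice@example.com",
    "2004-10-07 10:00:02 1CFsJt-0000Ab-00 Completed",
]
print(count_patterns(log, [("Refused connections", "/refused connection/"),
                           ("Received Messages", "/<=/")]))
```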
 

DESCRIPTION

Eximstats parses Exim mainlog files and outputs a statistical analysis of the messages processed. By default, a text analysis is generated, but you can request an HTML analysis by using the -html flag. See the help (-help) to learn how to create charts from the tables.

AUTHOR

There is a web site at http://www.exim.org - this contains details of the mailing list exim-users@exim.org.  

TO DO

This program does not perfectly handle messages whose received and delivered log lines are in different files, which can happen when you have multiple mail servers and a message cannot be immediately delivered. Fixing this could be tricky...

SUBROUTINES

The following section will only be of interest to the program maintainers:  

volume_rounded();

 $rounded_volume = volume_rounded($bytes,$gigabytes);

Given a data size in bytes, round it to KB, MB, or GB as appropriate.

Eg 12000 => 12KB, 15000000 => 14MB, etc.

Note: I've experimented with Math::BigInt and it results in a 33% performance degradation as opposed to storing numbers split into bytes and gigabytes.
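Below is a minimal Python sketch of the rounding described above. It keeps the same split into a byte count and a gigabyte count. The thresholds are assumptions chosen to reproduce the examples, and they may not match the exact cut-offs eximstats uses:

* plain bytes below 10000;
* KB below 10000000;
* MB up to 10GB;
* GB above that.

```python
GIG = 1024 ** 3  # one gigabyte, in bytes


def volume_rounded(nbytes: int, gigabytes: int = 0) -> str:
    """Render a volume held as (bytes, gigabytes) as a short rounded string."""
    # Normalise: move whole gigabytes out of the byte counter.
    extra, nbytes = divmod(nbytes, GIG)
    gigabytes += extra

    if gigabytes == 0:
        if nbytes < 10_000:
            return str(nbytes)                                # plain bytes
        if nbytes < 10_000_000:
            return f"{(nbytes + 512) // 1024}KB"              # nearest KB
        return f"{(nbytes + 512 * 1024) // (1024 * 1024)}MB"  # nearest MB
    if gigabytes < 10:
        # Between 1GB and 10GB, keep MB resolution.
        return f"{gigabytes * 1024 + (nbytes + 512 * 1024) // (1024 * 1024)}MB"
    return f"{gigabytes + (nbytes + GIG // 2) // GIG}GB"      # nearest GB
```

Adding half a unit before the integer division rounds to the nearest unit rather than truncating.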

un_round();

 un_round($rounded_volume,\$bytes,\$gigabytes);

Given a volume in KB, MB or GB, as generated by volume_rounded(), do the reverse transformation and convert it back into Bytes and Gigabytes. These are added to the $bytes and $gigabytes parameters.

EG: 500 => (500,0), 14GB => (0,14), etc.  
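The reverse direction might look like this in Python; it is a sketch, not the Perl original. It accepts the leading padding a rounded report column can carry. Carrying whole gigabytes out of the byte counter, so the result stays normalised, is an assumption of this sketch.

```python
import re
from typing import Tuple

GIG = 1024 ** 3
_MULTIPLIER = {"": 1, "KB": 1024, "MB": 1024 ** 2}


def un_round(rounded: str, nbytes: int = 0, gigabytes: int = 0) -> Tuple[int, int]:
    """Add a rounded volume such as '12KB' or '14GB' onto (nbytes, gigabytes)."""
    match = re.fullmatch(r"\s*(\d+)\s*(KB|MB|GB)?\s*", rounded)
    if match is None:
        raise ValueError(f"not a rounded volume: {rounded!r}")
    value, unit = int(match.group(1)), match.group(2) or ""

    if unit == "GB":
        gigabytes += value
    else:
        nbytes += value * _MULTIPLIER[unit]
    # Keep the byte counter below one gigabyte.
    extra, nbytes = divmod(nbytes, GIG)
    return nbytes, gigabytes + extra
```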

add_volume();

  add_volume(\$bytes,\$gigs,$size);

Add $size to $bytes/$gigs where this is a number split into bytes ($bytes) and gigabytes ($gigs). This is significantly faster than using Math::BigInt.  
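Python integers never overflow, so there the split is only needed to mirror the Perl representation. The sketch below returns the new pair instead of updating references, which adapts the original calling convention rather than copying it.

```python
from typing import Tuple

GIG = 1024 ** 3


def add_volume(nbytes: int, gigs: int, size: int) -> Tuple[int, int]:
    """Return (nbytes, gigs) after adding size bytes, carrying into gigs."""
    nbytes += size
    # Carry whole gigabytes so that nbytes stays below GIG.
    while nbytes >= GIG:
        gigs += 1
        nbytes -= GIG
    return nbytes, gigs


total = (0, 0)
for message_size in (1500, 2 * GIG, GIG - 1000):
    total = add_volume(*total, message_size)
```

Typical message sizes are far below a gigabyte, so the carry loop usually runs zero or one times.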

format_time();

 $formatted_time = format_time($seconds);

Given a time in seconds, break it down into weeks, days, hours, minutes, and seconds.

Eg 12005 => 3h20m5s  
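Here is a Python sketch of the breakdown. It assumes that zero-valued units are omitted and that a zero duration prints as 0s. Both assumptions are consistent with the 3h20m5s example, but neither is stated above.

```python
def format_time(seconds: int) -> str:
    """Break a duration in seconds into a string such as '1w2d3h20m5s'."""
    minutes, s = divmod(seconds, 60)
    hours, m = divmod(minutes, 60)
    days, h = divmod(hours, 24)
    w, d = divmod(days, 7)
    # Emit only the non-zero units, largest first.
    parts = [f"{value}{unit}"
             for value, unit in ((w, "w"), (d, "d"), (h, "h"), (m, "m"))
             if value > 0]
    if s > 0 or not parts:
        parts.append(f"{s}s")
    return "".join(parts)
```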

unformat_time();

 $seconds = unformat_time($formatted_time);

Given a time in weeks, days, hours, minutes, or seconds, convert it to seconds.

Eg 3h20m5s => 12005  
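The inverse can be sketched with a regular expression that picks up each number-and-unit pair. The sketch assumes every component carries one of the w, d, h, m or s suffixes. It silently ignores a bare number without a unit.

```python
import re

_UNIT_SECONDS = {"w": 7 * 86400, "d": 86400, "h": 3600, "m": 60, "s": 1}


def unformat_time(formatted: str) -> int:
    """Convert a duration such as '3h20m5s' back into seconds."""
    return sum(int(value) * _UNIT_SECONDS[unit]
               for value, unit in re.findall(r"(\d+)([wdhms])", formatted))
```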

seconds();

 $time = seconds($timestamp);

Given a time-of-day timestamp, convert it into a time() value using POSIX::mktime. We expect the timestamp to be of the form "$year-$mon-$day $hour:$min:$sec", with month going from 1 to 12, and the year to be absolute (we do the necessary conversions). The timestamp may be followed by an offset from UTC like "+$hh$mm"; if the offset is not present, and we have not been told that the log is in UTC (with the -utc option), then we adjust the time by the current local time offset so that it can be compared with the time recorded in message IDs, which is UTC.

To improve performance, we only use mktime on the date ($year-$mon-$day), and only calculate it if the date is different from the one seen on the previous call. We then add on seconds for the '$hour:$min:$sec'.

We also store the results of the last conversion done, and only recalculate if the date is different.

We used to have the '-cache' flag which would store the results of the mktime() call. However, the current way of just using mktime() on the date obsoletes this.  
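The caching scheme above is sketched below in Python, with one deliberate swap: calendar.timegm() replaces POSIX::mktime, so the sketch does not depend on the machine's time zone. The local offset is instead passed in explicitly as localtime_offset, using the sign convention of calculate_localtime_offset(). The log_is_utc flag stands in for the -utc option. TimestampConverter is a hypothetical name, and returning 0 for unparsable input is an assumption.

```python
import calendar
import re

_TIMESTAMP = re.compile(
    r"^(\d{4})-(\d\d)-(\d\d) (\d\d):(\d\d):(\d\d)(?: ([+-])(\d\d)(\d\d))?")


class TimestampConverter:
    """Convert 'YYYY-MM-DD hh:mm:ss [+hhmm]' log timestamps to UTC epoch seconds."""

    def __init__(self, localtime_offset: int = 0, log_is_utc: bool = False):
        self.localtime_offset = localtime_offset  # local = UTC + offset (east positive)
        self.log_is_utc = log_is_utc              # stands in for the -utc option
        self._last_date = None
        self._date_seconds = 0

    def seconds(self, timestamp: str) -> int:
        m = _TIMESTAMP.match(timestamp)
        if m is None:
            return 0
        year, mon, day, hour, minute, sec = (int(g) for g in m.group(1, 2, 3, 4, 5, 6))

        # Only redo the date conversion when the date changes.
        if (year, mon, day) != self._last_date:
            self._last_date = (year, mon, day)
            self._date_seconds = calendar.timegm((year, mon, day, 0, 0, 0))
        t = self._date_seconds + hour * 3600 + minute * 60 + sec

        if m.group(7):
            # Explicit offset from UTC in the log line: +hhmm or -hhmm.
            offset = (int(m.group(8)) * 60 + int(m.group(9))) * 60
            t -= offset if m.group(7) == "+" else -offset
        elif not self.log_is_utc:
            t -= self.localtime_offset
        return t
```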

id_seconds();

 $time = id_seconds($message_id);

Given a message ID, convert it into a time() value.  
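Exim's classic message IDs start with six base-62 characters that encode the arrival time in seconds since the epoch. The conversion can therefore be sketched as below. The alphabet order (digits, then upper case, then lower case) is an assumption here. Exim builds for case-insensitive file systems use base 36 instead, and this sketch does not handle that case.

```python
_BASE62 = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"


def id_seconds(message_id: str) -> int:
    """Decode the base-62 time field at the start of an Exim message ID."""
    t = 0
    for ch in message_id[:6]:
        t = t * 62 + _BASE62.index(ch)
    return t
```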

calculate_localtime_offset();

 $localtime_offset = calculate_localtime_offset();

Calculate the localtime offset from gmtime in seconds.

 $localtime = time() + $localtime_offset;

These are the same semantics as ISO 8601 and RFC 2822 timezone offsets. (West is negative, East is positive.)  
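Here is a Python sketch of the same calculation, not the Perl original. It converts the time to local broken-down time and reads those fields back as UTC with calendar.timegm(). Subtracting the real epoch time then gives the offset. The result is positive east of Greenwich, which matches the sign convention above.

```python
import calendar
import time
from typing import Optional


def calculate_localtime_offset(now: Optional[int] = None) -> int:
    """Seconds to add to UTC to get local time (east positive, west negative)."""
    if now is None:
        now = int(time.time())
    # Read the local wall-clock fields back as if they were UTC; the
    # difference from the real epoch time is the local offset at `now`.
    return calendar.timegm(time.localtime(now)) - now
```

The offset is computed for a specific instant, so it changes across daylight saving transitions.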

print_queue_times();

 $time = print_queue_times($message_type,\@queue_times,$queue_more_than);

Given the type of messages being output, the array of message queue times, and the number of messages which exceeded the queue times, print out a table.  

print_histogram();

 print_histogram('Deliveries|Messages received',@interval_count);

Print a histogram of the messages delivered/received per time slot (hour by default).  
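The valid -h values (other than 0) are exactly the divisors of 60, so every slot spans a whole number of minutes. Below is a hypothetical Python sketch of the slot arithmetic. histogram_slot() and count_per_slot() are illustrative names, not eximstats subroutines.

```python
from typing import Iterable, List, Tuple

VALID_DIVISIONS = (1, 2, 3, 5, 10, 15, 20, 30, 60)  # 0 means "no histogram"


def histogram_slot(hour: int, minute: int, divisions_per_hour: int) -> int:
    """Index of the histogram slot that a time of day falls into."""
    if divisions_per_hour not in VALID_DIVISIONS:
        raise ValueError(f"invalid divisions per hour: {divisions_per_hour}")
    minutes_per_slot = 60 // divisions_per_hour
    return hour * divisions_per_hour + minute // minutes_per_slot


def count_per_slot(times: Iterable[Tuple[int, int]], divisions_per_hour: int = 1) -> List[int]:
    """Count (hour, minute) events per slot over one day."""
    counts = [0] * (24 * divisions_per_hour)
    for hour, minute in times:
        counts[histogram_slot(hour, minute, divisions_per_hour)] += 1
    return counts
```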

print_league_table();

 print_league_table($league_table_type,\%message_count,\%message_data,\%message_data_gigs);

Given hashes of message count and message data, which are keyed by the table type (eg by the sending host), print a league table showing the top $topcount (defaults to 50).  

top_n_sort();

  @sorted_keys = top_n_sort($n,$href1,$href2,$href3);

Given a hash which has numerical values, return the sorted $n keys which point to the top values. The second and third hashes are used as tiebreakers. They all must have the same keys.

The idea behind this routine is that when you only want to see the top n members of a set, rather than sorting the entire set and then plucking off the top n, you keep only the current top n as you go, discarding any member which is lower than your current n'th highest member.

This proves to be an order of magnitude faster for large hashes. On 200,000 lines of mainlog it benchmarked 9 times faster. On 700,000 lines of mainlog it benchmarked 13.8 times faster.

We assume the values are > 0.  
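The same idea can be sketched in Python with a bounded min-heap. The weakest of the current top n sits at the root, and any entry that does not beat it is discarded immediately. Using heapq for the bounded set substitutes for the original Perl bookkeeping. Letting missing tiebreaker entries default to 0 is also an assumption.

```python
import heapq
from typing import Dict, List, Optional


def top_n_sort(n: int, primary: Dict[str, float],
               tiebreak1: Optional[Dict[str, float]] = None,
               tiebreak2: Optional[Dict[str, float]] = None) -> List[str]:
    """Return the keys of the n largest values, best first."""
    if n <= 0:
        return []
    tiebreak1 = tiebreak1 or {}
    tiebreak2 = tiebreak2 or {}
    heap = []  # min-heap of (value, tiebreak1, tiebreak2, key): the current top n
    for key, value in primary.items():
        entry = (value, tiebreak1.get(key, 0), tiebreak2.get(key, 0), key)
        if len(heap) < n:
            heapq.heappush(heap, entry)
        elif entry > heap[0]:
            # Beats the weakest member of the top n: swap it in.
            heapq.heapreplace(heap, entry)
    return [entry[3] for entry in sorted(heap, reverse=True)]
```

Each insertion costs O(log n), so the whole pass is O(N log n) rather than the O(N log N) of a full sort. When all three values tie, the key string itself decides the order.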

html_header();

 $header = html_header($title);

Print our HTML header and start the <body> block.  

help();

 help();

Display usage instructions and exit.  

generate_parser();

 $parser = generate_parser();

This subroutine generates the parsing routine which will be used to parse the mainlog. We take the base operation, and remove bits not in use. This improves performance depending on what bits you take out or add.

I've tested using study(), but this does not improve performance.

We store our parsing routine in a variable, and process it looking for #IFDEF (Expression) or #IFNDEF (Expression) statements and corresponding #ENDIF (Expression) statements. If the expression evaluates to true, the enclosed code is included (for #IFDEF) or excluded (for #IFNDEF); otherwise the reverse applies.
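The #IFDEF / #IFNDEF filtering can be sketched in Python as a line filter with a stack of include flags, which also allows nesting. The real subroutine evaluates Perl expressions. As a simplifying assumption, each expression here is just a name looked up in a flags dictionary. strip_conditionals() is a hypothetical name.

```python
import re
from typing import Dict, List

_DIRECTIVE = re.compile(r"^\s*#(IFDEF|IFNDEF|ENDIF)\s*\((.*)\)\s*$")


def strip_conditionals(template: str, flags: Dict[str, bool]) -> str:
    """Keep or drop the lines between #IF(N)DEF (...) and #ENDIF (...)."""
    out: List[str] = []
    stack: List[bool] = []  # one include flag per open #IFDEF/#IFNDEF
    for line in template.splitlines():
        m = _DIRECTIVE.match(line)
        if m is None:
            # Ordinary line: keep it only if every enclosing block is active.
            if all(stack):
                out.append(line)
            continue
        kind, expr = m.groups()
        if kind == "ENDIF":
            stack.pop()
        else:
            value = bool(flags.get(expr.strip(), False))
            stack.append(value if kind == "IFDEF" else not value)
    return "\n".join(out)
```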

parse();

 parse($parser,\*FILEHANDLE);

This subroutine accepts a parser and a filehandle from main and parses each line. We store the results into global variables.  

print_header();

 print_header();

Print our headers and contents.  

print_grandtotals();

 print_grandtotals();

Print the grand totals.  

print_user_patterns();

 print_user_patterns();

Print the counts of user specified patterns.  

print_transport();

 print_transport();

Print totals by transport.  

print_relay();

 print_relay();

Print our totals by relay.  

print_errors();

 print_errors();

Print our errors. In HTML, we display them as a list rather than a table - Netscape doesn't like large tables!  

parse_old_eximstat_reports();

 parse_old_eximstat_reports($fh);

Parse old eximstat output so we can merge daily stats to weekly stats and weekly to monthly etc.

To test that the merging still works after changes, do something like the following. All the diffs should produce no output.

 options='-bydomain -byemail -byhost -byedomain'
 options="$options -pattern 'Completed Messages' /Completed/"
 options="$options -pattern 'Received Messages' /<=/"

 ./eximstats $options mainlog > mainlog.txt
 ./eximstats $options -merge mainlog.txt > mainlog.2.txt
 diff mainlog.txt mainlog.2.txt

 ./eximstats $options -html mainlog > mainlog.html
 ./eximstats $options -merge -html mainlog.txt  > mainlog.2.html
 diff mainlog.html mainlog.2.html

 ./eximstats $options -merge mainlog.html > mainlog.3.txt
 diff mainlog.txt mainlog.3.txt

 ./eximstats $options -merge -html mainlog.html > mainlog.3.html
 diff mainlog.html mainlog.3.html

 ./eximstats $options -nvr   mainlog > mainlog.nvr.txt
 ./eximstats $options -merge mainlog.nvr.txt > mainlog.4.txt
 diff mainlog.txt mainlog.4.txt

 # double_mainlog.txt should have twice the values that mainlog.txt has.
 ./eximstats $options mainlog mainlog > double_mainlog.txt

 

update_relayed();

 update_relayed($count,$sender,$recipient);

Adds an entry into the %relayed hash. Currently only used when merging reports.  

add_to_totals();

 add_to_totals(\%totals,\@keys,$values);

Given a line of space-separated values, add them into the provided hash using @keys as the hash keys.

If the value contains a '%', then the value is set rather than added. Otherwise, we convert the value to bytes and gigs. The gigs get added to the "${key}-gigs" entry.

get_report_total();

 $total = get_report_total(\%hash,$key);

If %hash contains values split into Units and Gigs, we calculate and return

  $hash{$key} + 1024*1024*1024 * $hash{"${key}-gigs"}
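add_to_totals() and get_report_total() fit together as sketched below in Python. The helper _to_bytes() and the column names used in the test are hypothetical. The unit multipliers assume the binary KB/MB/GB used elsewhere in this document.

```python
import re
from typing import Dict, List, Union

GIG = 1024 ** 3
_MULTIPLIER = {"": 1, "KB": 1024, "MB": 1024 ** 2, "GB": GIG}


def _to_bytes(token: str) -> int:
    """Hypothetical helper: '12KB' -> 12288, '500' -> 500."""
    m = re.fullmatch(r"(\d+)(KB|MB|GB)?", token)
    if m is None:
        raise ValueError(f"unrecognised value: {token!r}")
    return int(m.group(1)) * _MULTIPLIER[m.group(2) or ""]


def add_to_totals(totals: Dict[str, Union[int, str]], keys: List[str], line: str) -> None:
    """Add each space-separated value in line into totals under the matching key."""
    for key, value in zip(keys, line.split()):
        if "%" in value:
            totals[key] = value  # percentages are set, not summed
            continue
        extra, nbytes = divmod(int(totals.get(key, 0)) + _to_bytes(value), GIG)
        totals[key] = nbytes
        totals[f"{key}-gigs"] = int(totals.get(f"{key}-gigs", 0)) + extra


def get_report_total(totals: Dict[str, Union[int, str]], key: str) -> int:
    """Recombine the split byte and gigabyte counts into one number."""
    return int(totals.get(key, 0)) + GIG * int(totals.get(f"{key}-gigs", 0))
```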

 

html2txt();

 $text_line = html2txt($html_line);

Convert a line from HTML to text. Currently we just convert HTML tags to spaces and convert the &gt;, &lt;, and &nbsp; entities back.
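A direct Python transcription of the rule above (a sketch, not the original code). Tags are replaced before entities are decoded, so a decoded &lt; cannot be mistaken for the start of a tag.

```python
import re


def html2txt(html_line: str) -> str:
    """Turn HTML tags into spaces, then decode &gt;, &lt; and &nbsp;."""
    text = re.sub(r"<[^>]*>", " ", html_line)
    return text.replace("&gt;", ">").replace("&lt;", "<").replace("&nbsp;", " ")
```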

get_next_arg();

 $arg = get_next_arg();

Because eximstats arguments are often passed as variables, we can't rely on shell parsing to deal with quotes. This subroutine returns $ARGV[1] and does a shift. If $ARGV[1] starts with a quote (' or "), and doesn't end in one, then we append the next argument to it and shift again. We repeat until we've got all of the argument.

This isn't perfect, as all white space gets reduced to one space, but it's as good as we can get! If it's essential that spacing be preserved precisely, then avoid passing the arguments through shell variables.
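Here is a Python sketch of the rejoining logic. It takes an explicit argument list instead of @ARGV. The description above does not say what happens to the outer quotes, so stripping them once the argument is complete is an assumption. The sample list mimics how an unquoted $options from the merge test recipe would be word-split.

```python
from typing import List


def get_next_arg(argv: List[str]) -> str:
    """Remove and return argv[1], rejoining a quoted argument split on spaces."""
    arg = argv.pop(1)
    if arg[:1] in ("'", '"'):
        quote = arg[0]
        # Keep appending words until the argument ends with the opening quote.
        while not (len(arg) > 1 and arg.endswith(quote)) and len(argv) > 1:
            arg += " " + argv.pop(1)
        if len(arg) > 1 and arg.endswith(quote):
            arg = arg[1:-1]  # assumption: the outer quotes are dropped
    return arg


argv = ["-pattern", "'Completed", "Messages'", "/Completed/"]
description = get_next_arg(argv)  # the rejoined description
```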


 

Index

NAME
SYNOPSIS
DESCRIPTION
AUTHOR
TO DO
SUBROUTINES
volume_rounded();
un_round();
add_volume();
format_time();
unformat_time();
seconds();
id_seconds();
calculate_localtime_offset();
print_queue_times();
print_histogram();
print_league_table();
top_n_sort();
html_header();
help();
generate_parser();
parse();
print_header();
print_grandtotals();
print_user_patterns();
print_transport();
print_relay();
print_errors();
parse_old_eximstat_reports();
update_relayed();
add_to_totals();
get_report_total();
html2txt();
get_next_arg();