dbolo - Distributed Bolo Monitoring Agent
bolo is a lightweight monitoring system kernel that aggregates counter
data and sample readings, maintains event and state information, and
broadcasts its findings to all connected *bolo subscribers*. These
subscribers can perform a wide variety of functions, from storing metric
data in RRDs to sending notifications on state changes.
dbolo is a standalone daemon that schedules and executes Bolo collectors
and submits the results up to a central Bolo core endpoint.
Print version and copyright information.
-e, --endpoint *tcp://host:port*
The bolo listener to connect to. Defaults to *tcp://bolo:2999*.
Supports DNS resolution for both IPv4 and IPv6 endpoints.
-c, --commands */path/to/dbolo.conf*
Path to a file containing the commands to run and their intervals.
Defaults to */etc/dbolo.conf*. See dbolo.conf(5) and the section
COMMANDS FILE, below, for details.
-s, --splay *FACTOR*
The initial run of each command will be randomized to start within
*INTERVAL* * *FACTOR* seconds. This helps more evenly distribute the
load on the local machine, without triggering freshness windows by
delaying too long.
Normally, you will want to specify a splay factor less than 1.0, to
avoid scheduling anomalies when dbolo is restarted.
See SCHEDULING CONCERNS, below, for more information.
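
The splay window described above can be sketched in Python. This is only an illustration: the helper name `initial_delay` and the choice of a uniform distribution are assumptions, not a description of how dbolo itself picks the delay.

```python
import random

def initial_delay(interval, factor, rng=random):
    """Return a random first-run delay, in seconds, within interval * factor.

    Hypothetical helper: assumes the delay is drawn uniformly from the window.
    """
    if interval <= 0 or factor < 0:
        raise ValueError("interval must be positive and factor non-negative")
    return rng.uniform(0, interval * factor)

# A command with a 60s interval and a splay factor of 0.5 would first run
# somewhere within the first 30 seconds after startup.
delay = initial_delay(60, 0.5)
print(f"first run in {delay:.1f}s")
```

With a factor below 1.0, the first run always lands inside one interval of startup, which is the property the recommendation above relies on.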
Increase logging verbosity. In daemon mode, this bumps up the syslog
logging level (e.g. from ERR to WARNING or from WARNING to INFO).
Suppress non-critical logging and output.
Normally, dbolo forks into the background and detaches from the
controlling terminal, so that it can run as a system service. This
flag disables that behavior. As a side effect, all log messages will
be printed to the screen, bypassing syslog entirely.
-p, --pidfile */path/to/pidfile*
Specify where dbolo should write its PID, for control by init
scripts. Defaults to /var/run/dbolo. Has no effect if -F is given.
-u, --user *USER*
-g, --group *GROUP*
User and group to drop privileges to. By default, dbolo will run as
root:root, which is probably not what you want.
Keep in mind that all commands executed by dbolo will be run by this
user and group.
-b, --beacon *tcp://host:port*
To enable beaconing (heartbeats), specify the 0mq beacon endpoint of
the Core server.
If no endpoint is specified, beaconing is disabled.
-r, --reconnects *INTEGER*
The maximum number of reconnects allowed when heartbeating is enabled.
Once this limit is exceeded, normal operation is suspended and the
agent waits until a beacon is received, or until it is manually
restarted.
Defaults to 4.
-t, --timeout *MILLISECONDS*
dbolo derives most of its behavior from its command file. This file
specifies what commands should be run, and how often.
Here's an example:
# run the `linux' collector every minute
# run these every 15s
@15s /usr/lib/bolo/collectors/process -n sshd
@15s /usr/lib/bolo/collectors/process -n dbolo
# run the log checks hourly
@1h /usr/local/collectors/log_check /var/log/messages
@1h /usr/local/collectors/log_check /var/log/syslog
@1h /usr/local/collectors/log_check /var/log/secure
Comments start with '#' and continue to the end of the line. Blank lines
are ignored.
Remaining lines specify how often to run a command (the
@-specification), and the command to run. The following time units are
recognized: h (hours), m (minutes) and s (seconds). Intervals must be
specified in whole numbers; for example, a half hour is *@30m*, not *@0.5h*.
Everything after the interval (excluding whitespace) is the full
command, with arguments, up to the newline.
Or, more rigorously:
Line continuation is not possible; each command must be specified on
exactly one line, with no newlines.
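
To make the format concrete, here is a minimal Python sketch of a parser for the rules described above. It is not dbolo's own parser: the names `parse_commands`, `UNITS` and `LINE_RE` are hypothetical, and the sketch assumes that blank lines are skipped and that every '#' starts a comment, even inside a command's arguments.

```python
import re

# Seconds per recognized time unit.
UNITS = {"h": 3600, "m": 60, "s": 1}

# "@<whole number><unit>", whitespace, then the command up to end of line.
LINE_RE = re.compile(r"^@(\d+)([hms])\s+(\S.*)$")

def parse_commands(text):
    """Return a list of (interval_seconds, command) tuples."""
    entries = []
    for lineno, raw in enumerate(text.splitlines(), start=1):
        # Assumption: every '#' starts a comment, and blank lines are skipped.
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        match = LINE_RE.match(line)
        if match is None:
            raise ValueError(f"line {lineno}: cannot parse {raw!r}")
        count, unit, command = match.groups()
        entries.append((int(count) * UNITS[unit], command))
    return entries

example = """
# run these every 15s
@15s /usr/lib/bolo/collectors/process -n sshd
@1h /usr/local/collectors/log_check /var/log/messages
"""
for interval, command in parse_commands(example):
    print(interval, command)
```

A fractional interval such as *@0.5h* does not match the pattern, so the sketch rejects it with an error rather than rounding it.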
The dbolo scheduler ensures that checks start properly, according to
their configured interval, regardless of the execution time of a single
run. If you configure a command to run every 60s, and it takes 5s to
execute, there will only be a 55s delay between the end of one run and
the start of another. This keeps metric submission more regular, and
avoids the problem of scheduling drift.
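
The difference between the two scheduling approaches can be shown with a short Python sketch. The function `start_times` is a hypothetical illustration, not dbolo code, and it assumes every run finishes before the next one is due.

```python
def start_times(interval, runtime, runs, from_start=True):
    """Return the start time of each run, in seconds.

    from_start=True schedules the next run relative to the start of the
    previous one (the behavior described above); from_start=False schedules
    it relative to the end, which drifts by `runtime` on every run.
    Assumes runtime < interval.
    """
    times, now = [], 0
    for _ in range(runs):
        times.append(now)
        now += interval if from_start else interval + runtime
    return times

print(start_times(60, 5, 4))                    # [0, 60, 120, 180]
print(start_times(60, 5, 4, from_start=False))  # [0, 65, 130, 195]
```

In the second case each run starts 5s later than the one before it relative to the intended schedule, which is the drift the scheduler is designed to avoid.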
It is possible to schedule command runs too close together. For example,
if a command takes 90s to time out when some network endpoint it deals
with is offline, running it every 60s would introduce an overlap
condition. In this case, the scheduler behaves as follows:
    0s    dbolo executes command
          dbolo schedules next run at 0s (now) + 60s (interval) = 60s
    90s   command exits
          dbolo processes results and submits them to bolo
          dbolo executes command (immediately)
            (should have run at 60s)
          dbolo schedules next run at 90s (now) + 60s (interval) = 150s
    95s   network issue clears up
          dbolo processes results and submits them to bolo
    150s  dbolo executes command
As you can see, at 0s, dbolo schedules the next run of the command to be
in *interval* seconds, at 60s. However, since the command is still
timing out (and hasn't exited yet), dbolo will delay execution until 90s
when the command does exit. At that point, execution is late, so dbolo
will immediately execute the command a second time, and reset the next
run to be in *interval* seconds, which is now at 150s.
The important thing to remember is that dbolo will wait until the
command exits before consulting the schedule, and will not get so far
behind that it cannot catch up. Imagine what happens to the schedule if
the command keeps timing out for another 10 or 20 runs.
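
The overlap handling described above can be modeled in a few lines of Python. This is a simplified sketch of the described behavior for a single command, not dbolo's scheduler; the function name `simulate` and its parameters are hypothetical.

```python
def simulate(interval, runtimes):
    """Return (start, end) pairs for consecutive runs of one command.

    runtimes lists how long each run takes, in seconds. A run starts at its
    scheduled time, or immediately after the previous run exits if that
    time has already passed.
    """
    runs = []
    scheduled = 0   # when the next run is due
    now = 0         # when the previous run exited
    for runtime in runtimes:
        start = max(scheduled, now)    # late runs start immediately
        scheduled = start + interval   # next run is due one interval later
        now = start + runtime
        runs.append((start, now))
    return runs

# The first run hangs for 90s; later runs take 5s.
print(simulate(60, [90, 5, 5]))    # [(0, 90), (90, 95), (150, 155)]

# The command keeps timing out: runs execute back to back.
print(simulate(60, [90, 90, 90]))  # [(0, 90), (90, 180), (180, 270)]
```

Under this model, missed runs are never queued. A command that keeps timing out simply runs back to back, and the schedule returns to the regular interval as soon as a run finishes in time.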
bolo(7) for general information,
bolo(1) and bolo.conf(5) for documentation on the CLI tools,
dbolo(1), dbolo.conf(5) for details on the distributed bolo agent,
and read about subscribers in bolo2rrd(8), bolo2pg(8), bolo2meta(8), and
Bolo was designed and written by James Hunt and Dan Molik.