author Sergey Poznyakoff <gray@gnu.org> 2018-07-18 12:40:47 +0300
committer Sergey Poznyakoff <gray@gnu.org> 2018-07-18 12:40:47 +0300
commit e7b4a06a27acce5ab555bac18d52485aa18df1d0
tree 527978aacde6c007c784fa6709469205c934b48d
parent 08048173d3201f5a000cfdbb67c564f0f9f8063a
Add documentation
-rw-r--r--  README           165
-rw-r--r--  src/Makefile.am    4
-rw-r--r--  src/config.c       4
-rw-r--r--  src/stevedore.8  238
-rw-r--r--  src/tallyman.1    86
-rw-r--r--  src/tallyman.c     2
6 files changed, 495 insertions, 4 deletions
diff --git a/README b/README
new file mode 100644
index 0000000..601cc63
--- /dev/null
+++ b/README
@@ -0,0 +1,165 @@
+* Overview
+
+Tallyman is a tool for monitoring health status of docker containers
+and reporting it via SNMP.
+
+* The Package
+
+The package provides two executable files:
+
+** tallyman
+
+ A helper program to be run as HEALTHCHECK CMD within containers
+
+** stevedore
+
+ SNMP agent for serving the collected statistics
+
+In addition, the file TALLYMAN-MIB.txt contains the Management
+Information Base for monitoring container status.
+
+* Container Configuration
+
+Each container is assumed to be responsible for a certain
+"service". Each service is assigned a name. Multiple containers can run
+the same service (for example you can have several database
+containers).
+
+Containers are configured to run tallyman as their healthcheck
+command. The utility takes two or more arguments. The first argument is
+the name of the service the container is responsible for. The remaining
+arguments supply the name of the actual health-checking program and
+its command line arguments. Tallyman will run this command, collect
+its standard error and standard output, pack them along with the
+program exit code in a JSON packet, and send this packet to the
+predefined address using an HTTP POST request. It will then exit with the
+same code as the health-checking program it ran. To the container, the
+effect of running tallyman is the same as if it ran the
+health-checking program itself: error code, standard error and
+standard output are all preserved. On the other hand, they are copied
+to the collector listening on the predefined address outside the
+container. This collector is the "stevedore" program, described below.
+
+Suppose, for example, that you run several database containers running
+MySQL and name the corresponding service "DB". You could then specify
+the following statement in the Dockerfile for creating these
+containers:
+
+ HEALTHCHECK CMD /sbin/tallyman DB mysqladmin ping
+
+* Stevedore: the Collector Daemon
+
+Stevedore performs two important tasks. First, it collects health
+reports coming from various containers and stores them in its cache.
+Secondly, it acts as a subagent of the snmpd daemon, serving these
+data on request.
+
+By default, stevedore listens on port 8990 on all available
+interfaces. On the other hand, tallyman sends its report to port 8990
+on the gateway address of the container it runs in. This means that
+as long as you have only one docker farm, you don't need to
+explicitly configure the IP address or port on either side.
+
+If you have several servers running docker containers, you can supply
+the address of the collector to tallyman using the -s (--server)
+option.
+
+Stevedore reads its configuration from the file /etc/stevedore.conf.
+The configuration consists of statements. Each statement begins with
+a keyword, followed by one or more arguments and is terminated with a
+semicolon. Whitespace characters (horizontal space, tabulation and
+newline) are ignored except as they serve to separate tokens. Comments
+can be introduced by '#' and '//', in which case they extend to the end of
+the physical line, or enclosed between '/*' and '*/', in which case they
+can occupy multiple lines. For a detailed discussion of the available
+keywords, see stevedore(8) or run
+
+ stevedore --config-help
+
+which will output a succinct summary. Here we will mention only the
+most important (and actually, the only required) statement:
+
+ service NAME ;
+
+This statement informs stevedore that it will be receiving updates
+about the service NAME. There must be a separate "service" statement
+for each service you are planning to monitor.
+
+Continuing the example from the previous section, after configuring
+the HEALTHCHECK in the container setup, you will need to add the line
+
+ service DB;
+
+to your /etc/stevedore.conf and restart the daemon.
+
+* Configuring snmpd
+
+Add the following statement to the /etc/snmp/snmpd.conf file:
+
+ master agentx
+
+Depending on the privileges with which stevedore is run, you might
+need to add the agentXPerms statement to fix up ownership and
+permissions of the agentx socket. Please refer to the snmpd(8) and
+snmpd.conf(5) documentation for details.
+
+* How to Build
+
+Prerequisites:
+
+ - Net-SNMP <http://www.net-snmp.org>
+ - libmicrohttpd <https://www.gnu.org/software/libmicrohttpd>
+
+Usual incantations apply:
+
+ ./configure
+ make
+ make install
+
+Obviously, the last command requires root privileges.
+
+Please refer to the file INSTALL for details about common options to
+configure. Apart from the ones discussed there, the following two are of
+interest:
+
+** --without-preprocessor
+
+By default, stevedore uses m4(1) to preprocess its configuration file
+prior to parsing. Use this option to disable preprocessing.
+
+** --with-mibdir=DIR
+
+Where to install the TALLYMAN-MIB.txt file. Default is
+
+ $(datarootdir)/snmp/mibs
+
+where $(datarootdir) is the directory for read-only architecture-independent
+data.
+
+* Note to the Packagers
+
+It is convenient to split the package into two installable packages:
+tallyman, to be used inside containers, and stevedore, to be used on
+the host server.
+
+
+* Copyright information:
+
+Copyright (C) 2018 Sergey Poznyakoff
+
+ Permission is granted to anyone to make or distribute verbatim copies
+ of this document as received, in any medium, provided that the
+ copyright notice and this permission notice are preserved,
+ thus giving the recipient permission to redistribute in turn.
+
+ Permission is granted to distribute modified versions
+ of this document, or of portions of it,
+ under the above conditions, provided also that they
+ carry prominent notices stating who last changed them.
+
+
+Local Variables:
+mode: outline
+paragraph-separate: "[ ]*$"
+version-control: never
+End:
diff --git a/src/Makefile.am b/src/Makefile.am
index a0e4249..23e9245 100644
--- a/src/Makefile.am
+++ b/src/Makefile.am
@@ -35,7 +35,7 @@ stevedore_LDADD=\
$(NET_SNMP_LIBS)
tallyman_LDADD=@GRECS_LDADD@ @RUNCAP_LDADD@
-tallyman_LDFLAGS=-static
+tallyman_LDFLAGS=
noinst_HEADERS = defs.h
@@ -59,3 +59,5 @@ mibdir=@MIBDIR@
mib_DATA = TALLYMAN-MIB.txt
EXTRA_DIST = TALLYMAN-MIB.txt tallyman_mib.mib2c
+
+dist_man_MANS=stevedore.8 tallyman.1
diff --git a/src/config.c b/src/config.c
index 784f3b8..78689d8 100644
--- a/src/config.c
+++ b/src/config.c
@@ -197,8 +197,8 @@ void
config_help(void)
{
static char docstring[] =
- "Configuration file structure for tallymand.\n"
- "For more information, see tallymand(8).";
+ "Configuration file structure for stevedore.\n"
+ "For more information, see stevedore(8).";
grecs_print_docstring(docstring, 0, stdout);
grecs_print_statement_array(tallymand_kw, 1, 0, stdout);
}
diff --git a/src/stevedore.8 b/src/stevedore.8
new file mode 100644
index 0000000..b6d5c21
--- /dev/null
+++ b/src/stevedore.8
@@ -0,0 +1,238 @@
+.TH STEVEDORE 8 "July 18, 2018" "TALLYMAN" "Tallyman User Reference"
+.SH NAME
+stevedore \- container state collector and SNMP agent daemon
+.SH SYNOPSIS
+.na
+.nh
+\fBstevedore\fR\
+ [\fB\-Fsd\fR]\
+ [\fB\-f\fR \fIFILE\fR]\
+ [\fB\-\-config\-file=\fIFILE\fR]\
+ [\fB\-\-foreground\fR]\
+ [\fB\-\-single\fR]\
+ [\fB\-\-debug\fR]
+.sp
+\fBstevedore\fR\
+ \fB\-?\fR | \fB\-\-help\fR | \fB\-\-config\-help\fR
+.ad
+.hy
+.SH DESCRIPTION
+Monitoring the health state of a collection of docker containers is
+based on the premise that each container is responsible for a certain
+.IR service ,
+which is assigned an identifier (\fISID\fR). In contrast to
+container IDs, service IDs are not necessarily unique for each
+container. It is quite OK (and even common) for several containers to
+have the same SID. This can happen, for example, if one runs a distributed
+database server, with one container running master server and the rest
+running its slaves.
+.PP
+Each container is supposed to run the
+.BR tallyman (1)
+command as part of its
+.B HEALTHCHECK
+configuration. This tool takes as its argument the command line that
+does the actual checking, collects its result and sends it over to the
+\fBstevedore\fR daemon that acts as a collector (see
+.BR tallyman (1),
+for details).
+.PP
+The purpose of \fBstevedore\fR is two-fold. First, it provides a
+RESTful service that collects health check reports from multiple
+containers, and secondly it acts as SNMP subagent, delivering the
+collected information.
+.SH CONFIGURATION
+The program reads its configuration from the file \fB/etc/stevedore.conf\fR
+(the exact location can differ depending on how the package was configured; if unsure, examine the output of
+.BR "stevedore --help" ).
+The file must exist and be readable.
+.PP
+The configuration consists of statements. Each statement begins with
+a keyword, followed by one or more arguments and is terminated with a
+semicolon. Arguments containing whitespace or special characters (
+.BR { ,
+.BR } ,
+or
+.BR ; )
+must be quoted.
+.PP
+Whitespace characters (horizontal space, tabulation and
+newline) are ignored except as they serve to separate tokens. Comments
+can be introduced by \fB#\fR and \fB//\fR, in which case they extend
+to the end of the physical line, or enclosed between
+.BR "/* " and " */" ,
+in which case they can occupy multiple lines. Comments may appear
+anywhere white space may appear in the configuration file.
+.PP
+.SS Statements
+The following statements can appear in the configuration file:
+.TP
+.BI "listen " IP : PORT
+Listen on this IP address and port. Default is \fB0.0.0.0:8990\fR,
+i.e. all available IP addresses, port 8990.
+.TP
+.BI "pidfile " FILE
+Store PID of the daemon process in \fIFILE\fR. If this statement is
+not supplied, no pidfile will be used.
+.TP
+.BI "user " UID
+Run as this user. \fIUID\fR is either the user login name or numeric
+UID prefixed with a plus sign.
+.TP
+.BI "group " GID
+Run with the privileges of this group. \fIGID\fR is either the group name or
+numeric GID prefixed with a plus sign. In the absence of this
+statement, the primary group of the \fIUID\fR specified with the \fBuser\fR
+statement will be used. Auxiliary groups of \fIUID\fR are always honored.
+.TP
+.BI "service " SID
+Define service to monitor. This is actually the only statement that
+must be present in the configuration file. It informs \fBstevedore\fR
+that it will be receiving updates about service ID \fISID\fR and
+instructs it to create SNMP OIDs for reporting the state of this
+service.
+.sp
+There should be as many \fBservice\fR statements as there are services
+to monitor.
+.TP
+.BI "instance-state-ttl " SECONDS
+Sets the time during which the state of the instance (container) is
+retained in cache. If no update arrives during the specified number of
+seconds, the container is marked as \fBexpired\fR. Default is 30 seconds.
+.SS Syslog configuration
+Unless the program is started in foreground mode (see the \fB\-F\fR
+option), its logging output goes to syslog facility \fBdaemon\fR. The
+syslog configuration can be changed using the following
+.IR "block statement" :
+.EX
+syslog {
+ facility NAME;
+ tag STRING;
+}
+.EE
+.PP
+The substatements are:
+.TP
+.BI "facility " NAME
+Set syslog facility. \fINAME\fR is one of:
+.BR user ,
+.BR daemon ,
+.BR auth ,
+.BR authpriv ,
+.BR mail ,
+.BR cron ,
+.B local0
+through
+.B local7
+(case-insensitive), or a decimal facility number.
+.TP
+.BI "tag " STRING
+Tag syslog messages with this string, instead of the program name.
+.SH OPTIONS
+.TP
+\fB\-f\fR, \fB\-\-config\-file=\fIFILE\fR
+Read configuration from \fIFILE\fR.
+.TP
+\fB\-\-config\-help\fR
+Describe configuration file syntax and variables.
+.TP
+\fB\-F\fR, \fB\-\-foreground\fR
+By default, \fBstevedore\fR disconnects itself from the controlling
+terminal and runs as a daemon. This option disables this behavior,
+instructing it to remain in foreground and print its diagnostic
+messages on standard error, instead of using the syslog interface. Use
+it for debugging.
+.TP
+\fB\-s\fR, \fB\-\-single\fR
+By default, the program runs in two-process mode: there is a top-level
+sentinel process that starts a single working process and restarts it
+if it exits on error or signal. The purpose of this design is to catch
+and recover from possible bugs.
+.sp
+This option instructs \fBstevedore\fR to start the worker process
+directly.
+.TP
+\fB\-d\fR, \fB\-\-debug\fR
+Increase debug verbosity.
+.TP
+\fB\-?\fR, \fB\-\-help\fR
+Display short usage summary.
+.SH MIB
+The MIB is kept in file \fBTALLYMAN-MIB.txt\fR which is normally
+installed to the location where \fBnet-snmp\fR tools expect to find
+their MIBs.
+.PP
+The following OIDs are defined:
+.TP
+.B servicesUpTime.0
+Total uptime of the Stevedore server.
+.TP
+.B servicesTotal.0
+Total number of configured services.
+.TP
+.B servicesRunning.0
+Number of running services, i.e. services that have at least one running
+container.
+.TP
+.B serviceTable
+This branch provides a conceptual table of services with the
+corresponding statistics. It is indexed by \fBserviceIndex\fR. Each row
+has the following elements:
+.RS
+.TP
+.B serviceName
+Name of the service.
+.TP
+.B serviceInstances
+Number of running instances (containers) in this service.
+.RE
+.TP
+.B instanceTable
+This branch provides a conceptual table of instances and is indexed by
+\fBinstanceIndex\fR. Each row has the following OIDs:
+.RS
+.TP
+.B instanceName
+Hostname of the instance.
+.TP
+.B instanceService
+Service name (ID) of the instance.
+.TP
+.B instanceState
+State of the instance. Possible values are:
+.BR stopped ,
+.BR running ,
+.BR expired ,
+and
+.BR error .
+.TP
+.B instanceTimeStamp
+Time of the last successful probe.
+.TP
+.B instanceErrorMessage
+Error message associated with this instance if \fBinstanceState\fR is
+\fBerror\fR.
+.RE
+.SH "SEE ALSO"
+.BR tallyman (1).
+.SH AUTHORS
+Sergey Poznyakoff
+.SH "BUG REPORTS"
+Report bugs to <gray@gnu.org>.
+.SH COPYRIGHT
+Copyright \(co 2018 Sergey Poznyakoff
+.br
+.na
+License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
+.br
+.ad
+This is free software: you are free to change and redistribute it.
+There is NO WARRANTY, to the extent permitted by law.
+.\" Local variables:
+.\" eval: (add-hook 'write-file-hooks 'time-stamp)
+.\" time-stamp-start: ".TH [A-Z_][A-Z0-9_.\\-]* [0-9] \""
+.\" time-stamp-format: "%:B %:d, %:y"
+.\" time-stamp-end: "\""
+.\" time-stamp-line-limit: 20
+.\" end:
+
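
The statements documented in stevedore.8 above can be combined into a single configuration file. The fragment below is only an illustrative sketch: the pidfile path, the user and group names, the second service name (WEB) and the chosen TTL and syslog values are assumptions, not defaults taken from the package. Arguments are quoted wherever they contain characters other than letters and digits, which is always permitted.

```
# Hypothetical /etc/stevedore.conf; all values are illustrative.
listen "0.0.0.0:8990";
pidfile "/var/run/stevedore.pid";
user daemon;
group daemon;

/* One "service" statement per monitored service. */
service DB;
service WEB;

// Mark a container as expired after 60 seconds without a report.
instance-state-ttl 60;

syslog {
    facility local1;
    tag "stevedore";
}
```

Only the service statements are required; the other statements fall back to the defaults described in the manual page.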
diff --git a/src/tallyman.1 b/src/tallyman.1
new file mode 100644
index 0000000..542ca24
--- /dev/null
+++ b/src/tallyman.1
@@ -0,0 +1,86 @@
+.TH TALLYMAN 1 "July 18, 2018" "TALLYMAN" "Tallyman User Reference"
+.SH NAME
+tallyman \- health state collector for docker containers
+.SH SYNOPSIS
+.na
+.nh
+\fBtallyman\fR\
+ [\fB\-d\fR]\
+ [\fB\-h\fR \fINAME\fR]\
+ [\fB\-s\fR \fIHOST:PORT\fR]\
+ [\fB\-v\fR \fIJSON\fR]\
+ [\fB\-\-connection\-timeout=\fISECONDS\fR]\
+ [\fB\-\-debug\fR]\
+ [\fB\-\-execution\-timeout=\fISECONDS\fR]\
+ [\fB\-\-hostname=\fINAME\fR]\
+ [\fB\-\-server=\fIHOST:PORT\fR]\
+ [\fB\-\-value=\fIJSON\fR]\
+ \fISRVID\fR\
+ \fICOMMAND\fR\
+ \fIARGS\fR...
+.sp
+\fBtallyman\fR\
+ \fB\-?\fR | \fB\-\-help\fR
+.ad
+.hy
+.SH DESCRIPTION
+Runs \fICOMMAND\fR with \fIARGS\fR and sends its return code, standard
+output and error to the remote data collector. Exits with the exit
+status of \fICOMMAND\fR.
+.PP
+The program must be configured to run periodically via the
+.B HEALTHCHECK CMD
+statement in the
+.BR Dockerfile .
+.PP
+The data collector program
+.BR stevedore (8)
+must be listening at \fIHOST:PORT\fR. See its manual for
+details. Container default gateway is the default \fIHOST\fR.
+Default port is 8990.
+.SH OPTIONS
+.TP
+\fB\-d\fR, \fB\-\-debug\fR
+Increase debug verbosity.
+.TP
+\fB\-h\fR, \fB\-\-hostname=\fINAME\fR
+Set this server hostname. By default it is determined automatically.
+.TP
+\fB\-s\fR, \fB\-\-server=\fIHOST:PORT\fR
+Address and port of the data collector. Default is \fIGW\fR:8990,
+where \fIGW\fR is the default gateway of the container.
+.TP
+\fB\-v\fR, \fB\-\-value=\fIJSON\fR
+Add \fIJSON\fR object to each report.
+.TP
+\fB\-\-connection\-timeout=\fISECONDS\fR
+Set timeout for initial connection to the collector. Default is 5 seconds.
+.TP
+\fB\-\-execution\-timeout=\fISECONDS\fR
+Set \fICOMMAND\fR execution timeout. Default is 5 seconds.
+.TP
+\fB\-?\fR, \fB\-\-help\fR
+Display short help text.
+.SH "SEE ALSO"
+.BR stevedore (8).
+.SH AUTHORS
+Sergey Poznyakoff
+.SH "BUG REPORTS"
+Report bugs to <gray@gnu.org>.
+.SH COPYRIGHT
+Copyright \(co 2018 Sergey Poznyakoff
+.br
+.na
+License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
+.br
+.ad
+This is free software: you are free to change and redistribute it.
+There is NO WARRANTY, to the extent permitted by law.
+.\" Local variables:
+.\" eval: (add-hook 'write-file-hooks 'time-stamp)
+.\" time-stamp-start: ".TH [A-Z_][A-Z0-9_.\\-]* [0-9] \""
+.\" time-stamp-format: "%:B %:d, %:y"
+.\" time-stamp-end: "\""
+.\" time-stamp-line-limit: 20
+.\" end:
+
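
The options documented in tallyman.1 above can be combined with Docker's own HEALTHCHECK settings. The Dockerfile fragment below is a sketch under stated assumptions: the collector address 192.0.2.10 and the timeout values are made up for illustration, and the service name DB and the mysqladmin command are carried over from the README example.

```
# Hypothetical Dockerfile fragment; address and timings are illustrative.
HEALTHCHECK --interval=20s --timeout=15s \
  CMD /sbin/tallyman --server=192.0.2.10:8990 --execution-timeout=10 DB mysqladmin ping
```

Docker's default health check interval is 30 seconds, the same as the default instance-state-ttl of stevedore. Choosing an interval shorter than the TTL, as above, presumably keeps an instance from being marked expired between two consecutive reports.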
diff --git a/src/tallyman.c b/src/tallyman.c
index b37ad8b..7181381 100644
--- a/src/tallyman.c
+++ b/src/tallyman.c
@@ -39,7 +39,7 @@ struct option longopts[] = {
{ "execution-timeout", required_argument, 0, OPT_EXECUTION_TIMEOUT },
{ NULL }
};
-static char shortopts[] = "?ds:h:v:";
+static char shortopts[] = "+?ds:h:v:";
void
help(void)
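
The leading "+" added to shortopts tells GNU getopt to stop at the first non-option argument, so options that belong to the wrapped health-checking command are not taken for options of tallyman itself. The sketch below illustrates that behavior in isolation. It assumes glibc; the helper count_parsed_options and its reduced option table are hypothetical and are not part of tallyman.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <getopt.h>
#include <stddef.h>

/*
 * Hypothetical helper (not part of tallyman): run getopt_long over argv
 * with the given short option string and return how many options it
 * consumed, counting unrecognized ones as well.
 */
static int
count_parsed_options(int argc, char **argv, const char *shortopts)
{
	/* Illustrative subset of long options; not tallyman's actual table. */
	static const struct option longopts[] = {
		{ "debug",  no_argument,       NULL, 'd' },
		{ "server", required_argument, NULL, 's' },
		{ NULL, 0, NULL, 0 }
	};
	int n = 0;

	assert(argc > 0 && argv != NULL);
	optind = 0;  /* glibc: force getopt to reinitialize its state */
	opterr = 0;  /* keep getopt from printing diagnostics */
	while (getopt_long(argc, argv, shortopts, longopts, NULL) != -1)
		n++;
	return n;
}
```

For the argument vector "tallyman -d DB mysqladmin -u root ping", the option string "+ds:" makes the helper return 1: parsing stops at DB, and -u is left for mysqladmin. With "ds:", glibc permutes the arguments and also consumes -u as an unrecognized option of the wrapper, so the helper returns 2 (unless POSIXLY_CORRECT is set in the environment).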
