author | Sergey Poznyakoff <gray@gnu.org> | 2018-07-18 12:40:47 +0300 |
---|---|---|
committer | Sergey Poznyakoff <gray@gnu.org> | 2018-07-18 12:40:47 +0300 |
commit | e7b4a06a27acce5ab555bac18d52485aa18df1d0 (patch) | |
tree | 527978aacde6c007c784fa6709469205c934b48d | |
parent | 08048173d3201f5a000cfdbb67c564f0f9f8063a (diff) | |
Add documentation
-rw-r--r-- | README | 165 |
-rw-r--r-- | src/Makefile.am | 4 |
-rw-r--r-- | src/config.c | 4 |
-rw-r--r-- | src/stevedore.8 | 238 |
-rw-r--r-- | src/tallyman.1 | 86 |
-rw-r--r-- | src/tallyman.c | 2 |
6 files changed, 495 insertions, 4 deletions
@@ -0,0 +1,165 @@
+* Overview
+
+Tallyman is a tool for monitoring health status of docker containers
+and reporting it via SNMP.
+
+* Tha Package
+
+The package provides two executable files:
+
+** tallyman
+
+ A helper program to be run as HEALTHCHECK CMD within containers
+
+** stevedore
+
+ SNMP agent for serving the collected statistics
+
+In addition, the file TALLYMAN-MIB.txt contains the Management
+Information Base for monitoring container status.
+
+* Container Configuration
+
+It is supposed that each container is responsible for certain
+"service". Each service is assigned a name. Multiple containers can run
+the same service (for example you can have several database
+containers).
+
+Containers are configured to run tallyman as their healthcheck
+command. The utility takes two or more arguments. First argument is
+the name of the service the container is responsible for. Rest of
+arguments supply the name of the actual health-checking program and
+its command line arguments. Tallyman will run this command, collect
+its standard error and standard output, pack them along with the
+program exit code in a JSON packet, and send this packat to the
+predefined address using HTTP POST request. It will then exit with the
+same code as the health-checking program it ran. To the container, the
+effect of running tallyman is the same as if it ran the
+health-checking program itself: error code, standard error and
+standard output are all preserved. On the other hand, they are copied
+to the collector listening on the predefined address outside the
+container. This collector is the "stevedore" program, described below.
+
+Suppose for example, that you run several database containers running
+MySQL and name the corresponding service "DB". You could then specify
+the following statement in the Dockerfile for creating these
+containers:
+
+ HEALTHCHECK CMD /sbin/tallyman DB mysqladmin ping
+
+* Stevedore: the Collector Daemon
+
+Stevedore performs two important tasks. First, it collects health
+reports coming from various containers and stores them in its cache.
+Secondly, it acts as a subagent of the snmpd daemon, serving these
+data on request.
+
+By default, stevedore listens on port 8990 on all available
+interfaces. On the other hand, tallyman sends its report to port 8990
+on the gateway address of the container it runs in. This means that
+for so long as you have only one docker farm, you don't need to
+explicitly configure IP address or port on either side.
+
+If you have several servers running docker containers, you can supply
+the address of the collector to tallyman using the -s (--server)
+option.
+
+Stevedore reads its configuration from file named /etc/stevedore.conf.
+The configuration consists of statements. Each statement begins with
+a keyword, followed by one or more arguments and is terminated with a
+semicolon. Whitespace characters (horizontal space, tabulation and
+newline) are ignored except as they serve to separate tokens. Comments
+can be introduced by '#' and '//', in which case they extend to the end of
+the physical line, or enclosed between '/*' and '*/', in which case they
+can occupy multiple lines. For a detailed discussion of the available
+keywords, see stevedore(3) or run
+
+ stevedore --config-help
+
+which will output a succinct summary. Here we will mention only the
+most important (and actually, the only required) statement:
+
+ service NAME ;
+
+This statement informs stevedore that it will be receiving updates
+about the service NAME. There must be a separate "service" statement
+for each service you are planning to monitor.
+
+Continuing the example from the previous section, after configuring
+the HEALTHCHECK in the container setup, you will need to add the line
+
+ service DB;
+
+to your /etc/stevedore.conf and restart the daemon.
+
+* Configuring snmpd
+
+Add the following statement to the /etc/snmp/snmpd.conf file:
+
+ master agentx
+
+Depending on the privileges with which stevedore is run, you might
+need to add the agentXPerms statement to fix up ownership and
+permissions of the agentx socket. Please refer to the snmpd(8) and
+snmpd.conf(5) documentation for details.
+
+* How to Build
+
+Prerequisites:
+
+ - Net-SNMP <http://www.net-snmp.org>
+ - libmicrohttpd <https://www.gnu.org/software/libmicrohttpd>
+
+Usual incantations apply:
+
+ ./configure
+ make
+ make install
+
+Obviously, the last command requires root privileges.
+
+Please, refer to the file INSTALL for details about common options to
+configure. Apart from ones discussed there, the following two are of
+interest:
+
+** --without-preprocessor
+
+By default, stevedore uses m4(1) to preprocess its configuration file
+prior to parsing. Use this option to disable preprocessing.
+
+** --with-mibdir=DIR
+
+Where to install the TALLYMAN-MIB.txt file. Default is
+
+ $(datarootdir)/snmp/mibs
+
+where $(datarootdir) is the directory for read-only architecture-independent
+data.
+
+* Note to the Packagers
+
+It is convenient to split the package into two installable packages:
+tallyman, to be used inside containers, and stevedore, to be used on
+the host server.
+
+
+* Copyright information:
+
+Copyright (C) 2018 Sergey Poznyakoff
+
+ Permission is granted to anyone to make or distribute verbatim copies
+ of this document as received, in any medium, provided that the
+ copyright notice and this permission notice are preserved,
+ thus giving the recipient permission to redistribute in turn.
+
+ Permission is granted to distribute modified versions
+ of this document, or of portions of it,
+ under the above conditions, provided also that they
+ carry prominent notices stating who last changed them.
+
+
+Local Variables:
+mode: outline
+paragraph-separate: "[ ]*$"
+version-control: never
+End:
diff --git a/src/Makefile.am b/src/Makefile.am
index a0e4249..23e9245 100644
--- a/src/Makefile.am
+++ b/src/Makefile.am
@@ -35,7 +35,7 @@ stevedore_LDADD=\
 $(NET_SNMP_LIBS)
 tallyman_LDADD=@GRECS_LDADD@ @RUNCAP_LDADD@
-tallyman_LDFLAGS=-static
+tallyman_LDFLAGS=
 noinst_HEADERS = defs.h
@@ -59,3 +59,5 @@ mibdir=@MIBDIR@
 mib_DATA = TALLYMAN-MIB.txt
 EXTRA_DIST = TALLYMAN-MIB.txt tallyman_mib.mib2c
+
+dist_man_MANS=stevedore.8 tallyman.1
diff --git a/src/config.c b/src/config.c
index 784f3b8..78689d8 100644
--- a/src/config.c
+++ b/src/config.c
@@ -197,8 +197,8 @@ void
 config_help(void)
 {
 static char docstring[] =
- "Configuration file structure for tallymand.\n"
- "For more information, see tallymand(8).";
+ "Configuration file structure for stevedore.\n"
+ "For more information, see stevedore(8).";
 grecs_print_docstring(docstring, 0, stdout);
 grecs_print_statement_array(tallymand_kw, 1, 0, stdout);
 }
diff --git a/src/stevedore.8 b/src/stevedore.8
new file mode 100644
index 0000000..b6d5c21
--- /dev/null
+++ b/src/stevedore.8
@@ -0,0 +1,238 @@
+.TH STEVEDORE 1 "July 18, 2018" "TALLYMAN" "Tallyman User Reference"
+.SH NAME
+stevedore \- container state collector and SNMP agent daemon
+.SH SYNOPSIS
+.na
+.nh
+\fBstevedore\fR\
+ [\fB\-Fsd\fR]\
+ [\fB\-f\fR \fIFILE\fR]\
+ [\fB\-\-config\-file=\fIFILE\fR]\
+ [\fB\-\-foreground\fR]\
+ [\fB\-\-single\fR]\
+ [\fB\-\-debug\fR]
+.sp
+\fBstevedore\fR\
+ \fB\-?\fR | \fB\-\-help\fR | \fB\-\-config\-help\fR
+.ad
+.hy
+.SH DESCRIPTION
+Monitoring the health state of a collection of docker containers is
+based on the premise that each container is responsible for a certain
+.IR service ,
+which is assigned an identifier (\fISID\fR). In the contrast to
+container IDs, service IDs are not necessarily unique for each
+container. It is quite OK (and even common) for several containers to
+have same SID. This can happen, for example, if one runs a distributed
+database server, with one container running master server and the rest
+running its slaves.
+.PP
+Each container is supposed to run the
+.BR tallyman (1)
+command as part of its
+.B HEALTHCHECK
+configuration. This tool takes as its argument the command line that
+does the actual checking, collects its return and sends it over to the
+\fBstevedore\fR daemon that acts as a collector (see
+.BR tallyman (1),
+for details).
+.PP
+The purpose of \fBstevedore\fR is two-fold. First, it provides a
+RESTful service that collects health check reports from multiple
+containers, and secondly it acts as SNMP subagent, delivering the
+collected information.
+.SH CONFIGURATION
+The program reads its configuration from file \fB/etc/stevedore.conf\fR
+(exact location can differ depending on how the package was
+configured; if unsure, examine the output of
+.BR "stevedore --help" ). The file must exist and be readable.
+.PP
+The configuration consists of statements. Each statement begins with
+a keyword, followed by one or more arguments and is terminated with a
+semicolon. Arguments containing whitespace or special characters (
+.BR { ,
+.BR } ,
+or
+.BR ; )
+must be quoted.
+.PP
+Whitespace characters (horizontal space, tabulation and
+newline) are ignored except as they serve to separate tokens. Comments
+can be introduced by \fB#\fR and \fB//\fR, in which case they extend
+to the end of the physical line, or enclosed between
+.BR "/* " and " */" ,
+in which case they can occupy multiple lines. Comments may appear
+anywhere where white space may appear in the configuration file.
+.PP
+.SS Statements
+The following statements can appear in the configuration file:
+.TP
+.BI "listen " IP : PORT
+Listen on this IP address and port. Default is \fB0.0.0.0:8990\fR,
+i.e. all available IP addresses, port 8990.
+.TP
+.BI "pidfile " FILE
+Store PID of the daemon process in \fIFILE\fR. If this statement is
+not supplied, no pidfile will be used.
+.TP
+.BI "user " UID
+Run as this user. \fIUID\fR is either the user login name or numeric
+UID prefixed with a plus sign.
+.TP
+.BI "group " GID
+Run with this group privileges. \fIGID\fR is either the group name or
+numeric GID prefixed with a plus sign. In the absence of this
+statement, the primary group of the \fIUID\fR specified with the \fBuser\fR
+statement will be used. Auxiliary groups of \fIUID\fR are always honored.
+.TP
+.BI "service " SID
+Define service to monitor. This is actually the only statement that
+must be present in the configuration file. It informs \fBstevedore\fR
+that it will be receiving updates about service ID \fISID\fR and
+instructs it to create SNMP OIDs for reporting the state of this
+service.
+.sp
+There should be as many \fBservice\fR statements as there are services
+to monitor.
+.TP
+.BI "instance-state-ttl " SECONDS
+Sets the time during which the state of the instance (container) is
+retained in cache. If no update arrives during the specified number of
+seconds, the container is marked as \fBexpired\fR. Default is 30 seconds.
+.SS Syslog configuration
+Unless the program is started in foreground mode (see the \fB\-F\fR
+option), its logging output goes to syslog facility \fBdaemon\fR. The
+syslog configuration can be changed using the following
+.IR "block statement" :
+.EX
+syslog {
+ facility NAME;
+ tag STRING;
+}
+.EE
+.PP
+The substatements are:
+.TP
+.BI "facility " NAME
+Set syslog facility. \fINAME\fR is one of:
+.BR user ,
+.BR daemon ,
+.BR auth ,
+.BR authpriv ,
+.BR mail ,
+.BR cron ,
+.B local0
+through
+.B local7
+(case-insensitive), or a decimal facility number.
+.TP
+.BI "tag " STRING
+Tag syslog messages with this string, instead of the program name.
+.SH OPTIONS
+.TP
+\fB\-f\fR, \fB\-\-config\-file=\fIFILE\fR
+Read configuration from \fIFILE\fR.
+.TP
+\fB\-\-config\-help\fR
+Describe configuration file syntax and variables.
+.TP
+\fB\-F\fR, \fB\-\-foreground\fR
+By default, \fBstevedore\fR disconnects itself from the controlling
+terminal and runs as a daemon. This option disables this behavior,
+instructing it to remain in foreground and print its diagnostic
+messages on standard error, instead of using the syslog interface. Use
+it for debugging.
+.TP
+\fB\-s\fR, \fB\-\-single\fR
+By default, the program runs in two-process mode: there is a top-level
+sentinel process that starts a single working process and restarts it
+if it exits on error or signal. The purpose of this design is to catch
+and recover from possible bugs.
+.sp
+This option instructs \fBstevedore\fR to start the worker process
+directly.
+.TP
+\fB\-d\fR, \fB\-\-debug
+Increase debug verbosity.
+.TP
+\fB\-?\fR, \fB\-\-help\fR
+Display short usage summary.
+.SH MIB
+The MIB is kept in file \fBTALLYMAN-MIB.txt\fR which is normally
+installed to the location where \fBnet-snmp\fR tools expect to find
+their MIBs.
+.PP
+The following OIDs are defined:
+.TP
+.B servicesUpTime.0
+Total uptime of the Stevedore server.
+.TP
+.B servicesTotal.0
+Total number of configured services.
+.TP
+.B servicesRunning.0
+Number of running services, i.e. services that have at least one running
+container.
+.TP
+.B serviceTable
+This branch provides a conceptual table of services with the
+corresponding statistics. It is indexed by \fBserviceIndex\fR. Each row
+has the following elements:
+.RS
+.TP
+.B serviceName
+Name of the service.
+.TP
+.B serviceInstances
+Number of running instances (containers) in this service.
+.RE
+.TP
+.B instanceTable
+This branch provides a conceptual table of instances and is indexed by
+\fBinstanceIndex\fR. Each row has the following OIDs:
+.RS
+.TP
+.B instanceName
+Hostname of the instance.
+.TP
+.B instanceService
+Service name (ID) of the instance.
+.TP
+.B instanceState
+State of the instance. Possible values are:
+.BR stopped ,
+.BR running ,
+.BR expired ,
+and
+.BR error .
+.TP
+.B instanceTimeStamp
+Time of the last successful probe.
+.TP
+.B instanceErrorMessage
+Error message associated with this instance if \fBinstanceState\fR is
+\fBerror\fR.
+.RE
+.SH "SEE ALSO"
+.BR tallyman (1).
+.SH AUTHORS
+Sergey Poznyakoff
+.SH "BUG REPORTS"
+Report bugs to <gray@gnu.org>.
+.SH COPYRIGHT
+Copyright \(co 2018 Sergey Poznyakoff
+.br
+.na
+License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
+.br
+.ad
+This is free software: you are free to change and redistribute it.
+There is NO WARRANTY, to the extent permitted by law.
+.\" Local variables:
+.\" eval: (add-hook 'write-file-hooks 'time-stamp)
+.\" time-stamp-start: ".TH [A-Z_][A-Z0-9_.\\-]* [0-9] \""
+.\" time-stamp-format: "%:B %:d, %:y"
+.\" time-stamp-end: "\""
+.\" time-stamp-line-limit: 20
+.\" end:
+
diff --git a/src/tallyman.1 b/src/tallyman.1
new file mode 100644
index 0000000..542ca24
--- /dev/null
+++ b/src/tallyman.1
@@ -0,0 +1,86 @@
+.TH TALLYMAN 1 "July 18, 2018" "TALLYMAN" "Tallyman User Reference"
+.SH NAME
+tallyman \- health state collector for docker containers
+.SH SYNOPSIS
+.na
+.nh
+\fBtallyman\fR\
+ [\fB\-d\fR]\
+ [\fB\-h\fR \fINAME\fR]\
+ [\fB\-s\fR \fIHOST:PORT\fR]\
+ [\fB\-v\fR \fIJSON\fR]\
+ [\fB\-\-connection\-timeout=\fISECONDS\fR]\
+ [\fB\-\-debug\fR]\
+ [\fB\-\-execution\-timeout=\fISECONDS\fR]\
+ [\fB\-\-hostname=\fINAME\fR]\
+ [\fB\-\-server=\fIHOST:PORT\fR]\
+ [\fB\-\-value=\fIJSON\fR]\
+ \fISRVID\fR\
+ \fICOMMAND\fR\
+ \fIARGS\fR...
+.sp
+\fBtallyman\fR\
+ \fB\-?\fR | \fB\-\-help\fR
+.ad
+.hy
+.SH DESCRIPTION
+Runs \fICOMMAND\fR with \fIARGS\fR and sends its return code, standard
+output and error to the remote data collector. Exits with the exit
+status of \fICOMMAND\fR.
+.PP
+The program must be configured to run periodically via the
+.B HEALTHCHECK CMD
+statement in the
+.BR Dockerfile .
+.PP
+The data collector program
+.BR stevedore (8)
+must be listening at \fIHOST:PORT\fR. See its manual for
+details. Container default gateway is the default \fIHOST\fR.
+Default port is 8990.
+.SH OPTIONS
+.TP
+\fB\-d\fR, \fB\-\-debug\fR
+Increase debug verbosity.
+.TP
+\fB\-h\fR, \fB\-\-hostname=\fINAME\fR
+Set this server hostname. By default it is determined automatically.
+.TP
+\fB\-s\fR, \fB\-\-server=\fIHOST:PORT\fR
+Address and port of the data collector. Default is \fIGW\fR:8990,
+where \fIGW\fR is the default gateway of the container.
+.TP
+\fB\-v\fR, \fB\-\-value=\fIJSON\fR
+Add \fIJSON\fR object to each report.
+.TP
+\fB\-\-connection\-timeout=\fISECONDS\fR
+Set timeout for initial connection to the collector. Default is 5 seconds.
+.TP
+\fB\-\-execution\-timeout=\fISECONDS\fR
+Set \fICOMMAND\fR execution timeout. Default is 5 seconds.
+.TP
+\fB\-?\fR, \fB\-\-help\fR
+Display short help text.
+.SH "SEE ALSO"
+.BR stevedore (8).
+.SH AUTHORS
+Sergey Poznyakoff
+.SH "BUG REPORTS"
+Report bugs to <gray@gnu.org>.
+.SH COPYRIGHT
+Copyright \(co 2018 Sergey Poznyakoff
+.br
+.na
+License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
+.br
+.ad
+This is free software: you are free to change and redistribute it.
+There is NO WARRANTY, to the extent permitted by law.
+.\" Local variables:
+.\" eval: (add-hook 'write-file-hooks 'time-stamp)
+.\" time-stamp-start: ".TH [A-Z_][A-Z0-9_.\\-]* [0-9] \""
+.\" time-stamp-format: "%:B %:d, %:y"
+.\" time-stamp-end: "\""
+.\" time-stamp-line-limit: 20
+.\" end:
+
diff --git a/src/tallyman.c b/src/tallyman.c
index b37ad8b..7181381 100644
--- a/src/tallyman.c
+++ b/src/tallyman.c
@@ -39,7 +39,7 @@ struct option longopts[] = {
 { "execution-timeout", required_argument, 0, OPT_EXECUTION_TIMEOUT },
 { NULL }
 };
-static char shortopts[] = "?ds:h:v:";
+static char shortopts[] = "+?ds:h:v:";
 void
 help(void)
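
For orientation, the statements documented in the stevedore.8 page added by this commit could be combined into a configuration file along the following lines. This is only a sketch and is not part of the commit: the pidfile path, the user name, the second service name (WEB), the TTL value and the syslog settings are illustrative assumptions, not package defaults. Arguments are quoted wherever they contain punctuation, which the documented syntax always permits.

```
# Hypothetical /etc/stevedore.conf; every value below is illustrative.

# Accept reports on all interfaces, port 8990 (the documented default).
listen "0.0.0.0:8990";

/* The path and the account name are assumptions,
   not defaults shipped with the package. */
pidfile "/var/run/stevedore.pid";
user daemon;

# One "service" statement per monitored service; at least one is required.
service DB;
service WEB;

// Mark a container as expired after 60 seconds without a report
// (the documented default is 30 seconds).
instance-state-ttl 60;

syslog {
    facility local1;
    tag "stevedore";
}
```

With such a file in place, containers built with a HEALTHCHECK CMD line like the README's /sbin/tallyman DB mysqladmin ping would report under the DB service. Running stevedore --config-help, as the README suggests, prints the authoritative list of keywords for the installed version.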