author | Sergey Poznyakoff <gray@gnu.org.ua> | 2014-10-30 16:58:00 +0200 |
---|---|---|
committer | Sergey Poznyakoff <gray@gnu.org.ua> | 2015-12-17 15:26:28 +0200 |
commit | 59d4374b24e9f9f077f2e2e973fa75f3c3d505e0 | |
tree | 0ac1fef3e04c45554b1f893fbc28a8fc2ba019ac /doc | |
parent | 56a02e741cd8d8b9dce27a79ae9bbcaf1713c4f7 | |
Finish wordsplit docs, improve tests
Diffstat (limited to 'doc')
-rw-r--r-- | doc/wordsplit.3 | 298 |
1 files changed, 286 insertions, 12 deletions
diff --git a/doc/wordsplit.3 b/doc/wordsplit.3
index 2f0cced..123bfb7 100644
--- a/doc/wordsplit.3
+++ b/doc/wordsplit.3
@@ -14,7 +14,7 @@
 .\" You should have received a copy of the GNU General Public License
 .\" along with Grecs. If not, see <http://www.gnu.org/licenses/>.
 .\"
-.TH WORDSPLIT 3 "October 28, 2014" "GRECS" "Grecs User Reference"
+.TH WORDSPLIT 3 "October 30, 2014" "GRECS" "Grecs User Reference"
 .SH NAME
 wordsplit \- split string into words
 .SH SYNOPSIS
@@ -39,8 +39,7 @@ wordsplit \- split string into words
 \fBvoid wordsplit_clearerr (wordsplit_t *\fIws\fB);\fR
 .SH DESCRIPTION
 The function \fBwordsplit\fR splits the string \fIs\fR into words
-using a set of rules governed by \fIflags\fR and stores the result
-in the memory location pointed to by \fIws\fR. Depending on
+using a set of rules governed by \fIflags\fR. Depending on
 \fIflags\fR, the function performs the following: whitespace trimming,
 tilde expansion, variable expansion, quote removal, command
 substitution, and path expansion. On success, the function returns 0
@@ -96,11 +95,39 @@ not try to alter or deallocate it.
 The function
 .B wordsplit_clearerr
 clears the error condition associated with \fIws\fR.
+.SH INCREMENTAL MODE
+In incremental mode \fBwordsplit\fR parses one word per invocation.
+It returns \fBWRDSF_OK\fR on success and \fBWRDSF_NOINPUT\fR when it
+has processed entire input string.
+.PP
+This mode is enabled if the flag \fBWRDSF_INCREMENTAL\fR is set in
+the \fIflags\fR argument. Subsequent calls to \fBwordsplit\fR must
+have \fBNULL\fR as first argument. Each successful
+call will return exactly one word in \fBws.ws_wordv[0]\fR.
+.PP
+An example usage:
+.PP
+.EX
+wordsplit_t ws;
+int rc;
+flags = WRDSF_DEFFLAGS|WRDSF_INCREMENTAL;
+
+for (rc = wordsplit(s, &ws, flags); rc == WRDSF_OK;
+     rc = wordsplit(NULL, &ws, flags)) {
+    process(ws.ws_wordv[0]);
+}
+
+if (rc != WRDSE_NOINPUT)
+    wordsplit_perror(&ws);
+
+wordsplit_free(&ws);
+.EE
 .SH EXPANSION
-The number of expansions performed on the input is controlled by
-appropriate bits set in the \fIflags\fR argument. Whatever expansions
-are enabled, they are always run in the same order as described in this
-section.
+Expansion is performed on the input after it has been split into
+words. There are several kinds of expansion, which of them are
+performed is controlled by appropriate bits set in the \fIflags\fR
+argument. Whatever expansion kinds are enabled, they are always run
+in the same order as described in this section.
 .SS Whitespace trimming
 Whitespace trimming removes any leading and trailing whitespace from the
 initial word array. It is enabled by the
@@ -206,8 +233,153 @@ Otherwise, the value of \fIvariable\fR is substituted.
 If \fIvariable\fR is null or unset, nothing is substituted, otherwise the
 expansion of \fIword\fR is substituted.
 .SS Quote removal
+Quote removal translates unquoted escape sequences into corresponding bytes.
+An escape sequence is a backslash followed by one or more characters. By
+default, each sequence \fB\\\fIC\fR appearing in unquoted words is
+replaced with the character \fIC\fR. In doubly-quoted strings, two
+backslash sequences are recognized: \fB\\\\\fR translates to a single
+backslash, and \fB\\\(dq\fR translates to a double-quote.
+.PP
+Two flags are provided to modify this behavior. If
+.I WRDSF_CESCAPES
+flag is set, the following escape sequences are recognized:
+.sp
+.nf
+.ta 8n 18n 42n
+.ul
+	Sequence	Expansion	ASCII
+	\fB\\\\\fR	\fB\\\fR	134
+	\fB\\\(dq\fR	\fB\(dq\fR	042
+	\fB\\a\fR	audible bell	007
+	\fB\\b\fR	backspace	010
+	\fB\\f\fR	form-feed	014
+	\fB\\n\fR	new line	012
+	\fB\\r\fR	charriage return	015
+	\fB\\t\fR	horizontal tabulation	011
+	\fB\\v\fR	vertical tabulation	013
+.fi
+.sp
+The sequence \fB\\x\fINN\fR or \fB\\X\fINN\fR, where \fINN\fR stands
+for a two-digit hex number is replaced with ASCII character \fINN\fR.
+The sequence \fB\\0\fINNN\fR, where \fINNN\fR stands for a three-digit
+octal number is replaced with ASCII character whose code is \fINNN\fR.
+.PP
+The \fBWRDSF_ESCAPE\fR flag allows the caller to customize escape
+sequences. If it is set, the \fBws_escape\fR member must be
+initialized. This member provides escape tables for unquoted words
+(\fBws_escape[0]\fR) and quoted strings (\fBws_escape[1]\fR). Each
+table is a string consisting of even number of charactes. In each
+pair of characters, the first one is a character that can appear after
+backslash, and the following one is its translation. For example, the
+above table of C escapes is represented as
+\fB\(dqa\\ab\\bf\\fn\\nr\\rt\\tv\\v\(dq\fR.
+.PP
+It is valid to initialize \fBws_escape\fR elements to zero. In this
+case, no backslash translation occurs.
+.PP
+The handling if octal and hex escapes is controlled by the following
+bits in \fBws_options\fR:
+.TP
+.B WRDSO_BSKEEP_WORD
+When an unrecognized escape sequence is encountered in a word,
+preserve it on output. If that bit is not set, the backslash is
+removed from such sequences.
+.TP
+.B WRDSO_OESC_WORD
+Handle octal escapes in words.
+.TP
+.B WRDSO_XESC_WORD
+Handle hex escapes in words.
+.TP
+.B WRDSO_BSKEEP_QUOTE
+When an unrecognized escape sequence is encountered in a doubly-quoted
+string, preserve it on output. If that bit is not set, the backslash is
+removed from such sequences.
+.TP
+.B WRDSO_OESC_QUOTE
+Handle octal escapes in doubly-quoted strings.
+.TP
+.B WRDSO_XESC_QUOTE
+Handle hex escapes in doubly-quoted strings.
 .SS Command substitution
-.SS Path expansion
+During \fIcommand substitution\fR, each word is scanned for commands.
+Each command found is executed and replaced by the output it creates.
+.PP
+The syntax is:
+.PP
+.RS +4
+.BI $( command )
+.RE
+.PP
+Command substitutions may be nested.
+.PP
+Unless the substitution appears within double quotes, word splitting and
+pathname expansion are performed on its result.
+.PP
+To enable command substitution, the caller must initialize the
+.I ws_command
+member with the address of the substitution function and make sure the
+.B WRDSF_NOCMD
+flag is not set.
+.PP
+The substitution function should be defined as follows:
+.PP
+.RS +4
+\fBint \fIcommand\fB\
+ (char **\fIret\fB,\
+ const char *\fIcmd\fB,\
+ size_t \fIlen,\fB\
+ char **\fIargv\fB,\
+ void *\fIclos\fB);\fR
+.RE
+.PP
+First \fIlen\fR bytes of \fIcmd\fR contain the command invocation as
+it appeared between
+.BR $( and ),
+with all expansions performed. If the
+.I WRDSO_ARGV
+option is set, the parameter \fIargv\fR contains the command line split into
+words using the same settings as the input \fIws\fR structure.
+Otherwise, \fIargv\fR is \fBNULL\fR.
+.PP
+The \fIclos\fR parameter supplies user-specific data, passed in the
+\fIws_closure\fR member).
+.PP
+On success, the function stores a pointer to the
+output string in the memory location pointed to by \fIret\fR and
+returns \fBWRDSE_OK\fR (\fB0\fR). On error, it must return one of the
+error codes described in the section
+.BR "ERROR CODES" .
+If
+.BR WRDSE_USERERR ,
+is returned, a pointer to the error description string must be stored in
+.BR *ret .
+.PP
+When \fBWRDSE_OK\fR or \fBWRDSE_USERERR\fR is returned, the
+data stored in \fB*ret\fR must be allocated using
+.BR malloc (3).
+.SS Pathname expansion
+Pathname expansion is performed if the \fBWRDSF_PATHEXPAND\fR flag is
+set. Each unquoted word is scanned for characters
+.BR * , ? ", and " [ .
+If one of these appears, the word is considered a \fIpattern\fR (in
+the sense of
+.BR glob (3))
+and is replaced with an alphabetically sorted list of file names matching the
+pattern.
+.PP
+If no matches are found for a word
+and the \fIws_options\fR member has the
+.B WRDSO_NULLGLOB
+bit set, the word is removed.
+.PP
+If the \fBWRDSO_FAILGLOB\fR option is set, an error message is output
+for each such word using
+.IR ws_error .
+.PP
+When matching a pattern, the dot at the start of a name or immediately
+following a slash must be matched explicitly, unless
+the \fBWRDSO_DOTGLOB\fR option is set,
 .SH WORDSPLIT_T STRUCTURE
 The data type \fBwordsplit_t\fR has three members that contain
 output data upon return from \fBwordsplit\fR or \fBwordsplit_len\fR,
@@ -264,8 +436,15 @@ If initialized on input, the
 .B WRDSF_COMMENT
 flag must be set. By default, it's value is \fB\(dq#\(dq\fR.
 .TP
-.BI "const char *" ws_escape
-Characters to be escaped with backslash. The
+.BI "const char *" ws_escape [2]
+Escape tables for unquoted words (\fBws_escape[0]\fR) and quoted
+strings (\fBws_escape[1]\fR). These are used to translate escape
+sequences (\fB\\\fIC\fR) into characters. Each table is a string
+consisting of even number of charactes. In each pair of characters,
+the first one is a character that can appear after backslash, and the
+following one is its representation. For example, the string
+\fB\(dqt\\tn\\n\(dq\fR translates \fB\\t\fR into horisontal
+tabulation character and \fB\\n\fR into newline.
 .B WRDSF_ESCAPE
 flag must be set if this member is initialized.
 .TP
@@ -367,7 +546,7 @@ flag must be set.
  const char *cmd,\
  size_t len,\
  char **argv,\
- void *clos)
+ void *clos)\fR
 Pointer to the function that performs command substitution. It treats
 the first \fIlen\fR bytes of the string \fIcmd\fR as a command
 (whatever it means for the caller) and attempts to execute it. On
@@ -376,7 +555,7 @@ in the memory location pointed to by \fBret\fR and \fB0\fR is returned.
 On error, the function must return one of the error codes described
 in the section
 .BR "ERROR CODES" .
-If \fIws_getvar\fR returns
+If \fIws_command\fR returns
 .BR WRDSE_USERERR ,
 it must store the pointer to the error description string in
 .BR *ret .
@@ -555,7 +734,102 @@ for details.
 The
 .I ws_options
 member is initialized.
+.SH OPTIONS
+The
+.I ws_options
+member is consulted if the
+.B WRDSF_OPTIONS
+flag is set. It contains a bitwise \fBOR\fR of one or more of the
+following options:
+.TP
+.B WRDSO_NULLGLOB
+Remove the words that produce empty string after pathname expansion.
+.TP
+.B WRDSO_FAILGLOB
+Output error message if pathname expansion produces empty string.
+.TP
+.B WRDSO_DOTGLOB
+During pathname expansion allow a leading period to be matched by
+metacharacters.
+.TP
+.B WRDSO_ARGV
+Split command invocation into words and pass the result to the
+\fIws_command\fR function in \fIargv\fR parameter.
+.PP
+.TP
+.B WRDSO_BSKEEP_WORD
+Quote removal: when an unrecognized escape sequence is encountered in a word,
+preserve it on output. If that bit is not set, the backslash is
+removed from such sequences.
+.TP
+.B WRDSO_OESC_WORD
+Quote removal: handle octal escapes in words.
+.TP
+.B WRDSO_XESC_WORD
+Quote removal: handle hex escapes in words.
+.TP
+.B WRDSO_BSKEEP_QUOTE
+Quote removal: when an unrecognized escape sequence is encountered in
+a doubly-quoted string, preserve it on output. If that bit is not
+set, the backslash is removed from such sequences.
+.TP
+.B WRDSO_OESC_QUOTE
+Quote removal: handle octal escapes in doubly-quoted strings.
+.TP
+.B WRDSO_XESC_QUOTE
+Quote removal: handle hex escapes in doubly-quoted strings.
+.SH "ERROR CODES"
+.TP
+.BR WRDSE_OK ", " WRDSE_EOF
+Successful return.
+.TP
+.B WRDSE_QUOTE
+Missing closing quote. The \fIws_endp\fR points to the position in
+the input string where the error occurred.
+.TP
+.B WRDSE_NOSPACE
+Memory exhausted.
+.TP
+.B WRDSE_USAGE
+Invalid wordsplit usage.
+.TP
+.B WRDSE_CBRACE
+Unbalanced curly brace.
+.TP
+.B WRDSE_UNDEF
+Undefined variable. This error is returned only if the
+\fBWRDSF_UNDEF\fR flag is set.
+.TP
+.B WRDSE_NOINPUT
+Input exhausted. This is not acually an error. This code is returned
+if \fBwordsplit\fR (or \fBwordsplit_len\fR) is invoked in incremental
+mode and encounters end of input string. See the section
+.BR "INCREMENTAL MODE" .
+.TP
+.B WRDSE_PAREN
+Unbalanced parenthesis.
+.TP
+.B WRDSE_GLOBERR
+An error occurred during pattern matching.
+.TP
+.B WRDSE_USERERR
+User-defined error. Normally it is returned by \fBws_getvar\fR or
+\fBws_command\fR. Use the function
+.B wordsplit_strerror
+to get textual description of the error.
 .SH "RETURN VALUE"
+Both
+.B wordsplit
+and
+.B wordsplit_len
+return \fB0\fR on success, and a non-zero error code on
+error (see the section
+.BR "ERROR CODES" ).
+.PP
+.B wordsplit_strerror
+returns a pointer to the constant string describing the last error
+condition that occurred in
+.IR ws .
 .SH EXAMPLE
 .SH "SEE ALSO"
 .SH AUTHORS
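The pair-table convention that this patch documents for \fBws_escape\fR (each table is a string of character pairs, the first being the character that may follow a backslash and the second its translation) can be illustrated with a small standalone sketch. This is not the grecs implementation and does not use the wordsplit API: the names `escape_lookup` and `unescape`, and the `keep_unknown` parameter (loosely modelled on the described effect of `WRDSO_BSKEEP_WORD`), are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not part of wordsplit): look up character C in a
   pair table such as "t\tn\n".  Returns the translation, or -1 if C is
   not listed or the table is NULL. */
static int
escape_lookup(const char *table, int c)
{
    if (!table)
        return -1;
    for (; table[0] && table[1]; table += 2)
        if ((unsigned char) table[0] == c)
            return (unsigned char) table[1];
    return -1;
}

/* Hypothetical helper: return a malloc'ed copy of SRC in which every
   backslash sequence \C listed in TABLE is replaced by its translation.
   For sequences not listed, the backslash is dropped unless KEEP_UNKNOWN
   is nonzero (an assumption modelled on the WRDSO_BSKEEP_WORD text).
   A lone trailing backslash is copied unchanged.  Returns NULL if
   memory is exhausted. */
static char *
unescape(const char *src, const char *table, int keep_unknown)
{
    char *dst, *p;

    assert(src != NULL);
    /* The result is never longer than the input. */
    dst = malloc(strlen(src) + 1);
    if (!dst)
        return NULL;
    p = dst;
    while (*src) {
        if (src[0] == '\\' && src[1]) {
            int t = escape_lookup(table, (unsigned char) src[1]);
            if (t >= 0)
                *p++ = (char) t;
            else {
                if (keep_unknown)
                    *p++ = '\\';
                *p++ = src[1];
            }
            src += 2;
        } else
            *p++ = *src++;
    }
    *p = '\0';
    return dst;
}
```

With the table `"t\tn\n"` (the example string given for the `ws_escape` member), calling `unescape("a\\tb\\q", "t\tn\n", 0)` turns `\t` into a tab and strips the backslash from the unlisted `\q`; passing a nonzero `keep_unknown` leaves `\q` intact. The caller frees the result with `free`.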