author Sergey Poznyakoff <gray@gnu.org.ua> 2014-10-28 15:40:20 +0200
committer Sergey Poznyakoff <gray@gnu.org.ua> 2015-12-17 15:26:28 +0200
commit 56a02e741cd8d8b9dce27a79ae9bbcaf1713c4f7
tree 1153a7dd5c15ab80f1ffa7da4eb9091685222445 /doc
parent 8383ec3a522a944969b3fc44069a3ff056da554a
Improve wordsplit
* src/wordsplit.c: Implement default assignment, word expansion in
variable defaults, distinction between ${variable:-word} and
${variable-word}.
* doc/wordsplit.3: New file.
* src/wordsplit.h (wordsplit)<ws_envbuf,ws_envidx>
<ws_envsiz>: New members.
(WRDSF_ARGV): Remove.
(WRDSF_OPTIONS): New flag.
(WRDSO_ARGV): New option bit.
* tests/wordsplit.at: Add new tests.
* tests/wsp.c: Set WRDSF_OPTIONS flag if one of the options is
requested.
Diffstat (limited to 'doc')
-rw-r--r-- doc/wordsplit.3 585
1 files changed, 585 insertions, 0 deletions
diff --git a/doc/wordsplit.3 b/doc/wordsplit.3
new file mode 100644
index 0000000..2f0cced
--- /dev/null
+++ b/doc/wordsplit.3
@@ -0,0 +1,585 @@
+.\" This file is part of grecs -*- nroff -*-
+.\" Copyright (C) 2007, 2009-2014 Sergey Poznyakoff
+.\"
+.\" Grecs is free software; you can redistribute it and/or modify
+.\" it under the terms of the GNU General Public License as published by
+.\" the Free Software Foundation; either version 3, or (at your option)
+.\" any later version.
+.\"
+.\" Grecs is distributed in the hope that it will be useful,
+.\" but WITHOUT ANY WARRANTY; without even the implied warranty of
+.\" MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+.\" GNU General Public License for more details.
+.\"
+.\" You should have received a copy of the GNU General Public License
+.\" along with Grecs. If not, see <http://www.gnu.org/licenses/>.
+.\"
+.TH WORDSPLIT 3 "October 28, 2014" "GRECS" "Grecs User Reference"
+.SH NAME
+wordsplit \- split string into words
+.SH SYNOPSIS
+.B #include <wordsplit.h>
+.sp
+\fBint wordsplit (const char *\fIs\fB,\
+ wordsplit_t *\fIws\fB, int \fIflags\fB);\fR
+.sp
+\fBint wordsplit_len (const char *\fIs\fB,\
+ size_t \fIlen\fB,\
+ wordsplit_t *\fIws\fB,\
+ int \fIflags\fB);\fR
+.sp
+\fBvoid wordsplit_free (wordsplit_t *\fIws\fB);\fR
+.sp
+\fBvoid wordsplit_free_words (wordsplit_t *\fIws\fB);\fR
+.sp
+\fBvoid wordsplit_perror (wordsplit_t *\fIws\fB);\fR
+.sp
+\fBconst char *wordsplit_strerror (wordsplit_t *\fIws\fB);\fR
+.sp
+\fBvoid wordsplit_clearerr (wordsplit_t *\fIws\fB);\fR
+.SH DESCRIPTION
+The function \fBwordsplit\fR splits the string \fIs\fR into words
+using a set of rules governed by \fIflags\fR and stores the result
+in the memory location pointed to by \fIws\fR. Depending on
+\fIflags\fR, the function performs the following: whitespace trimming,
+tilde expansion, variable expansion, quote removal, command
+substitution, and path expansion. On success, the function returns 0
+and stores the words found in the member \fBws_wordv\fR and the number
+of words in the member \fBws_wordc\fR. On error, it returns a nonzero
+error code, which is also stored in \fBws_errno\fR.
+.PP
+The function \fBwordsplit_len\fR acts similarly, except that it
+accesses only the first \fIlen\fR bytes of the string \fIs\fR, which
+need not be null-terminated.
+.PP
+When no longer needed, the resources allocated by a call to one of
+these functions must be freed using
+.BR wordsplit_free .
+.PP
+The function
+.B wordsplit_free_words
+frees only the memory allocated for elements of
+.I ws_wordv
+and initializes
+.I ws_wordc
+to zero.
+.PP
+The usual calling sequence is:
+.PP
+.EX
+wordsplit_t ws;
+size_t i;
+
+if (wordsplit(s, &ws, WRDSF_DEFFLAGS)) {
+ wordsplit_perror(&ws);
+ return;
+}
+for (i = 0; i < ws.ws_wordc; i++) {
+ /* do something with ws.ws_wordv[i] */
+}
+wordsplit_free(&ws);
+.EE
+.PP
+The function
+.B wordsplit_perror
+prints the error message from the last invocation of \fBwordsplit\fR. It
+uses the function pointed to by the
+.I ws_error
+member. By default, the message is printed on the standard error.
+.PP
+For more sophisticated error reporting, the function
+.B wordsplit_strerror
+can be used. It returns a pointer to a string describing the error.
+The caller must treat this string as read-only: it must not modify
+or free it.
+.PP
+The function
+.B wordsplit_clearerr
+clears the error condition associated with \fIws\fR.
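Both wordsplit_perror and the WRDSF_SHOWERR flag report errors through the ws_error member, which takes a printf(3)-style function. The following is a minimal sketch of a replacement reporter. It assumes only the (const char *, ...) prototype given in WORDSPLIT_T STRUCTURE. The names my_error, last_error and PROGNAME are hypothetical. Appending a newline to each message is an assumption of this sketch, not documented behaviour.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical program name used as a message prefix. */
#define PROGNAME "myprog"

/* Most recent message, kept for later inspection (truncated to fit). */
char last_error[256];

/* printf-style reporter matching the ws_error prototype
   void (*)(const char *, ...). */
void
my_error(const char *fmt, ...)
{
  va_list ap;

  /* Format once into the buffer... */
  va_start(ap, fmt);
  vsnprintf(last_error, sizeof last_error, fmt, ap);
  va_end(ap);

  /* ...then print it, prefixed, on the standard error. */
  fprintf(stderr, "%s: %s\n", PROGNAME, last_error);
}
```

To install it, set ws.ws_error = my_error and add WRDSF_ERROR to the flags. Add WRDSF_SHOWERR as well if wordsplit should report errors by itself.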
+.SH EXPANSION
+Which expansions are performed on the input is controlled by the
+corresponding bits in the \fIflags\fR argument. Regardless of which
+expansions are enabled, they are always performed in the order in
+which they are described in this section.
+.SS Whitespace trimming
+Whitespace trimming removes any leading and trailing whitespace from
+the initial word array. It is enabled by the
+.B WRDSF_WS
+flag. Whitespace trimming is needed only if you redefine
+word delimiters (\fIws_delim\fR member) so that they don't contain
+whitespace characters (\fB\(dq \\t\\n\(dq\fR).
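Suppose ws_delim is set to a comma only. The fields then keep the spaces that surround them, and trimming is what removes those spaces. The standalone sketch below reproduces that combined effect. It is not the Grecs implementation, and the functions trim_span and split_trim are hypothetical. It splits on any character from a delimiter set and then trims leading and trailing whitespace from each field.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Return a pointer to the first non-whitespace character of the
   LEN-byte span at S and store the trimmed length in *OUTLEN. */
const char *
trim_span(const char *s, size_t len, size_t *outlen)
{
  /* Skip leading whitespace. */
  while (len > 0 && isspace((unsigned char) *s)) {
    s++;
    len--;
  }
  /* Drop trailing whitespace. */
  while (len > 0 && isspace((unsigned char) s[len - 1]))
    len--;
  *outlen = len;
  return s;
}

/* Split S on any character from DELIM and trim each field.  At most
   MAXW fields are stored as (START, LENGTH) pairs; the return value
   is the number of fields found, which may exceed MAXW. */
size_t
split_trim(const char *s, const char *delim,
           const char **start, size_t *length, size_t maxw)
{
  size_t n = 0;

  for (;;) {
    size_t flen = strcspn(s, delim);   /* bytes up to the next delimiter */
    if (n < maxw)
      start[n] = trim_span(s, flen, &length[n]);
    n++;
    if (s[flen] == '\0')
      break;
    s += flen + 1;                     /* step over the delimiter */
  }
  return n;
}
```

For example, splitting "  alpha , beta,gamma  " on "," yields the three fields alpha, beta and gamma, with no surrounding spaces.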
+.SS Tilde expansion
+Tilde expansion is enabled if the
+.B WRDSF_PATHEXPAND
+bit is set. It expands all words that begin with an unquoted tilde
+character (`\fB~\fR'). If the tilde is followed immediately by a slash,
+it is replaced with the home directory of the current user (as
+determined by the user's \fBpasswd\fR entry). A tilde alone is handled
+the same way. Otherwise, the characters between the tilde and the first
+slash (or the end of the word, if it contains no slash) are treated as
+a login name and are replaced, along with the tilde itself, by the
+home directory of that user. If there is no user with such a login
+name, the word is left unchanged.
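The tilde rules above map directly onto the POSIX getpwuid(3) and getpwnam(3) calls. The following is a minimal sketch under these assumptions:

- the word is a single, already-split word;
- quoting is ignored;
- the function name tilde_expand is hypothetical and is not the Grecs implementation.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pwd.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Expand a leading tilde in WORD.  Returns a malloc'd string (a copy
   of WORD if no expansion applies), or NULL if memory is exhausted.
   Quoting is not handled in this sketch. */
char *
tilde_expand(const char *word)
{
  const char *slash, *rest;
  struct passwd *pw;
  size_t namelen;
  char *res;

  if (word[0] != '~')
    return strdup(word);

  /* The login name runs from after the tilde up to the first slash,
     or to the end of the word if there is no slash. */
  slash = strchr(word, '/');
  namelen = slash ? (size_t) (slash - word - 1) : strlen(word + 1);

  if (namelen == 0) {
    /* "~" or "~/...": home directory of the current user. */
    pw = getpwuid(getuid());
  } else {
    char *name = strndup(word + 1, namelen);
    if (!name)
      return NULL;
    pw = getpwnam(name);
    free(name);
  }
  if (!pw)
    return strdup(word);           /* unknown user: leave the word as is */

  /* Replace "~" or "~login" with the home directory, keep the rest. */
  rest = word + 1 + namelen;
  res = malloc(strlen(pw->pw_dir) + strlen(rest) + 1);
  if (res) {
    strcpy(res, pw->pw_dir);
    strcat(res, rest);
  }
  return res;
}
```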
+.SS Variable expansion
+Variable expansion replaces each occurrence of
+.BI $ NAME
+or
+.BI ${ NAME }
+with the value of the variable \fINAME\fR. It is enabled unless the
+\fBWRDSF_NOVAR\fR flag is set. The caller is responsible for
+supplying the table of available variables. Two mechanisms are
+provided: an environment array and a callback function.
+.PP
+The environment array is a \fBNULL\fR-terminated array of variables,
+stored in the \fIws_env\fR member. The \fBWRDSF_ENV\fR flag must be
+set in order to instruct \fBwordsplit\fR to use this array.
+.PP
+By default, elements of the \fIws_env\fR array have the form
+.IR NAME = VALUE .
+An alternative format is enabled by the
+.B WRDSF_ENV_KV
+flag. When it is set, each variable is described by two consecutive
+elements in the array:
+.IR ws_env [ n ]
+contains the variable name, and
+.IR ws_env [ "n+1" ]
+contains its value.
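The two ws_env layouts differ only in how a name is matched. The sketch below looks a name up in either layout. It uses the "first len bytes" convention that the ws_getvar callback also follows. The function env_lookup and the sample values are hypothetical, and this is not how Grecs itself is coded.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Look up the variable named by the first LEN bytes of NAME in the
   NULL-terminated array ENV.  If KV is zero, entries have the form
   "NAME=VALUE"; otherwise (the WRDSF_ENV_KV layout) names and values
   alternate.  Returns a pointer to the value, or NULL if not found. */
const char *
env_lookup(const char **env, int kv, const char *name, size_t len)
{
  size_t i;

  if (kv) {
    /* env[i] is a name, env[i + 1] its value. */
    for (i = 0; env[i] && env[i + 1]; i += 2)
      if (strlen(env[i]) == len && memcmp(env[i], name, len) == 0)
        return env[i + 1];
  } else {
    /* Match "NAME" followed immediately by '='. */
    for (i = 0; env[i]; i++)
      if (strncmp(env[i], name, len) == 0 && env[i][len] == '=')
        return env[i] + len + 1;
  }
  return NULL;
}
```

With LEN = 4, the name "USERNAME" matches the entry USER, because only the first four bytes are examined.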
+.PP
+More sophisticated variable tables can be implemented using a
+callback function. The \fIws_getvar\fR member should be set to point
+to that function and the \fBWRDSF_GETVAR\fR flag must be set. The
+function itself should be defined as
+.EX
+int getvar (char **ret, const char *var, size_t len, void *clos);
+.EE
+.PP
+The function should look up the variable identified by the first
+\fIlen\fR bytes of the string \fIvar\fR. If such a variable is found,
+the function stores a copy of its value (allocated using
+\fBmalloc\fR(3)) in the memory location pointed to by \fIret\fR, and
+returns \fBWRDSE_OK\fR. If the variable is not found, the function
+returns \fBWRDSE_UNDEF\fR. On any other error, it returns another
+nonzero error code.
+.PP
+If \fIws_getvar\fR returns
+.BR WRDSE_USERERR ,
+it must store the pointer to the error description string in
+.BR *ret .
+In any case (whether returning \fB0\fR or \fBWRDSE_USERERR\fR), the
+data returned in \fBret\fR must be allocated using
+.BR malloc (3).
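A callback following this contract might look like the sketch below. The following are assumptions of the sketch, not part of Grecs:

- The table type struct vartab and the function name getvar are hypothetical.
- The numeric WRDSE_* values are placeholders, so that the file compiles without <wordsplit.h>.
- WRDSE_NOSPACE is an assumed name for an out-of-memory code.

A NULL value in the table is used only to exercise the WRDSE_USERERR path.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Placeholder result codes so the sketch compiles on its own; a real
   program takes WRDSE_* from <wordsplit.h>.  WRDSE_NOSPACE (out of
   memory) is an assumed name not described on this page. */
#ifndef WRDSE_OK
# define WRDSE_OK       0
# define WRDSE_NOSPACE  2
# define WRDSE_UNDEF    5
# define WRDSE_USERERR  9
#endif

/* Hypothetical variable table handed to the callback as CLOS
   (via ws_closure).  A NULL value marks a variable the program
   refuses to disclose. */
struct vartab {
  const char *name;
  const char *value;
};

/* ws_getvar-style callback: only the first LEN bytes of VAR form
   the name; VAR itself need not be null-terminated there. */
int
getvar(char **ret, const char *var, size_t len, void *clos)
{
  const struct vartab *t;

  for (t = clos; t->name; t++) {
    if (strlen(t->name) != len || memcmp(t->name, var, len) != 0)
      continue;
    if (t->value == NULL) {
      /* The error text must be malloc'd as well. */
      *ret = strdup("variable is not accessible");
      return *ret ? WRDSE_USERERR : WRDSE_NOSPACE;
    }
    /* Found: hand back a malloc'd copy of the value. */
    *ret = strdup(t->value);
    return *ret ? WRDSE_OK : WRDSE_NOSPACE;
  }
  return WRDSE_UNDEF;
}
```

It would be installed with ws.ws_getvar = getvar and ws.ws_closure pointing to the table, with WRDSF_GETVAR and WRDSF_CLOSURE set.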
+.PP
+If both
+.I ws_env
+and
+.I ws_getvar
+are used, the variable is first looked up in
+.IR ws_env ,
+and if not found there, the
+.I ws_getvar
+function is called.
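The lookup order just described can be sketched as a single resolver: the environment array is searched first, and the callback is consulted only on a miss. The following are hypothetical or placeholders, not part of Grecs:

- lookup_var, demo_getvar and getvar_fn are hypothetical names.
- The WRDSE_* numbers are placeholders.
- Only the NAME=VALUE layout is handled.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Placeholder result codes; a real program uses <wordsplit.h>.
   WRDSE_NOSPACE is an assumed name for the out-of-memory code. */
#ifndef WRDSE_OK
# define WRDSE_OK       0
# define WRDSE_NOSPACE  2
# define WRDSE_UNDEF    5
#endif

/* Same prototype as the ws_getvar member. */
typedef int (*getvar_fn) (char **ret, const char *var, size_t len,
                          void *clos);

/* Resolve the first LEN bytes of VAR: search ENV ("NAME=VALUE" form)
   first and call GETVAR only if the name is not there. */
int
lookup_var(char **ret, const char *var, size_t len,
           const char **env, getvar_fn getvar, void *clos)
{
  size_t i;

  for (i = 0; env && env[i]; i++)
    if (strncmp(env[i], var, len) == 0 && env[i][len] == '=') {
      *ret = strdup(env[i] + len + 1);
      return *ret ? WRDSE_OK : WRDSE_NOSPACE;
    }
  if (getvar)
    return getvar(ret, var, len, clos);   /* fall back to the callback */
  return WRDSE_UNDEF;
}

/* Hypothetical callback that knows a single variable, VERSION. */
int
demo_getvar(char **ret, const char *var, size_t len, void *clos)
{
  (void) clos;
  if (len == 7 && memcmp(var, "VERSION", 7) == 0) {
    *ret = strdup("1.0");
    return *ret ? WRDSE_OK : WRDSE_NOSPACE;
  }
  return WRDSE_UNDEF;
}
```

A name defined in both places therefore resolves to the ws_env value, and the callback's value is used only when ws_env lacks the name.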
+.PP
+During variable expansion, the forms below cause
+.B wordsplit
+to test for a variable that is unset or null. Omitting the
+colon results in a test only for a variable that is unset.
+.TP
+.BI ${ variable :- word }
+.BR "Use Default Values" .
+If \fIvariable\fR is unset or null, the expansion of \fIword\fR is substituted.
+Otherwise, the value of \fIvariable\fR is substituted.
+.TP
+.BI ${ variable := word }
+.BR "Assign Default Values" .
+If \fIvariable\fR is unset or null, the expansion of \fIword\fR is
+assigned to \fIvariable\fR. The value of \fIvariable\fR is then substituted.
+.TP
+.BI ${ variable :? word }
+.BR "Display Error if Null or Unset" .
+If \fIvariable\fR is null or unset, the expansion of \fIword\fR (or a
+message to that effect if \fIword\fR is not present) is output using
+.IR ws_error .
+Otherwise, the value of \fIvariable\fR is substituted.
+.TP
+.BI ${ variable :+ word }
+.BR "Use Alternate Value" .
+If \fIvariable\fR is null or unset, nothing is substituted, otherwise the
+expansion of \fIword\fR is substituted.
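The only difference between the colon and no-colon forms is what counts as "missing". The sketch below models one variable slot as a string pointer, where NULL means unset and "" means null. It applies each operator to an already-expanded word. Everything in it is illustrative:

- struct variable and expand_default are hypothetical names.
- Writing to stderr stands in for ws_error.
- The fallback error text is made up.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* A variable slot: VALUE is NULL when the variable is unset and ""
   when it is set but null. */
struct variable {
  const char *value;
};

/* Expand ${variable OP word}, where OP is '-', '=', '?' or '+'.
   COLON selects the ":OP" forms, which also treat a null value as
   missing.  WORD is assumed to be already expanded; it may be NULL
   only for '?'.  Returns the text to substitute (not a copy), or NULL
   after reporting an error for '?'. */
const char *
expand_default(struct variable *v, int colon, char op, const char *word)
{
  int missing = v->value == NULL || (colon && v->value[0] == '\0');

  switch (op) {
  case '-':                     /* Use Default Values */
    return missing ? word : v->value;
  case '=':                     /* Assign Default Values */
    if (missing)
      v->value = word;          /* the slot now points to WORD */
    return v->value;
  case '?':                     /* Display Error if Null or Unset */
    if (missing) {
      /* stderr stands in for ws_error; the fallback text is made up */
      fprintf(stderr, "%s\n", word ? word : "variable null or not set");
      return NULL;
    }
    return v->value;
  case '+':                     /* Use Alternate Value */
    return missing ? "" : word;
  }
  return NULL;                  /* unknown operator */
}
```

For a variable that is set but null, ${variable:-word} yields word, while ${variable-word} yields the empty value.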
+.SS Quote removal
+.SS Command substitution
+.SS Path expansion
+.SH WORDSPLIT_T STRUCTURE
+The data type \fBwordsplit_t\fR has three members that contain
+output data upon return from \fBwordsplit\fR or \fBwordsplit_len\fR,
+and a number of members that the caller can initialize on input in
+order to customize the function behavior. Most input members have a
+corresponding flag bit, which must be set in the \fIflags\fR argument
+in order to instruct the \fBwordsplit\fR function to use it.
+.SS OUTPUT
+.TP
+.BI "size_t " ws_wordc
+Number of words in \fIws_wordv\fR. Accessible upon successful return
+from \fBwordsplit\fR.
+.TP
+.BI "char **" ws_wordv
+Array of resulting words. Accessible upon successful return
+from \fBwordsplit\fR.
+.TP
+.BI "int " ws_errno
+Error code, if the invocation of \fBwordsplit\fR or
+\fBwordsplit_len\fR failed. This is the same value as returned from
+the function in that case.
+.SS INPUT
+.TP
+.BI "size_t " ws_offs
+If the
+.B WRDSF_DOOFFS
+flag is set, this member specifies the number of initial elements in
+.I ws_wordv
+to fill with NULLs. These elements are not counted in the returned
+.IR ws_wordc .
+.TP
+.BI "int " ws_flags
+Contains flags passed to wordsplit on entry. Can be used as a
+read-only member when using \fBwordsplit\fR in incremental mode or
+in a loop with
+.B WRDSF_REUSE
+flag set.
+.TP
+.BI "int " ws_options
+Additional options used when
+.B WRDSF_OPTIONS
+is set.
+.TP
+.BI "const char *" ws_delim
+Word delimiters. If initialized on input, the
+.B WRDSF_DELIM
+flag must be set. Otherwise, it is initialized on entry to
+.B wordsplit
+with the string \fB\(dq \\t\\n\(dq\fR.
+.TP
+.BI "const char *" ws_comment
+A zero-terminated string of characters that begin an inline comment.
+If initialized on input, the
+.B WRDSF_COMMENT
+flag must be set. By default, its value is \fB\(dq#\(dq\fR.
+.TP
+.BI "const char *" ws_escape
+Characters to be escaped with backslash. The
+.B WRDSF_ESCAPE
+flag must be set if this member is initialized.
+.TP
+.BI "void (*" ws_alloc_die ") (wordsplit_t *)"
+This function is called when
+.B wordsplit
+is unable to allocate memory and the
+.B WRDSF_ENOMEMABRT
+flag was set. The default function prints a
+message on standard error and aborts. This member can be used
+to customize error handling. If initialized, the
+.B WRDSF_ALLOC_DIE
+flag must be set.
+.TP
+.BI "void (*" ws_error ") (const char *, ...)"
+Pointer to function used for error reporting. The invocation
+convention is the same as for
+.BR printf (3).
+The default function formats and prints the message on the standard
+error.
+
+If this member is initialized, the
+.B WRDSF_ERROR
+flag must be set.
+.TP
+.BI "void (*" ws_debug ") (const char *, ...)"
+Pointer to function used for debugging output. By default it points
+to the same function as
+.BR ws_error .
+If initialized, the
+.B WRDSF_DEBUG
+flag must be set.
+.TP
+.BI "const char **" ws_env
+A \fBNULL\fR-terminated array of environment variables. It is used
+during variable expansion. If set, the
+.B WRDSF_ENV
+flag must be set. Variable expansion is enabled only if either
+.B WRDSF_ENV
+or
+.B WRDSF_GETVAR
+(see below) is set, and
+.B WRDSF_NOVAR
+flag is not set.
+
+Each element of
+.I ws_env
+must have the form \fINAME\fB=\fIVALUE\fR, where \fINAME\fR is
+the name of the variable, and \fIVALUE\fR is its value.
+Alternatively, if the \fBWRDSF_ENV_KV\fR flag is set, each variable is
+described by two elements of
+.IR ws_env :
+one containing the variable name, and the next one containing its
+value.
+.TP
+.BI "int (*" ws_getvar ") (char **ret, const char *var, size_t len, void *clos)"
+Points to the function that will be used during variable expansion to
+look up the value of the variable named by \fIvar\fR.
+This function is used if variable expansion is enabled (i.e. the
+.B WRDSF_NOVAR
+flag is not set), and the \fBWRDSF_GETVAR\fR flag is set.
+
+If both
+.B WRDSF_ENV
+and
+.B WRDSF_GETVAR
+are set, the variable is first looked up in the
+.I ws_env
+array and, if not found there,
+.I ws_getvar
+is called.
+
+The name of the variable is specified by the first \fIlen\fR bytes of
+the string \fIvar\fR. The \fIclos\fR parameter supplies
+user-specific data (see the description of the \fIws_closure\fR
+member below) and the \fIret\fR parameter points to the memory location
+used for output data. On success, the function must store there a
+pointer to the string with the value of the variable and return 0. On
+error, it must return one of the error codes described in the section
+.BR "ERROR CODES" .
+If \fIws_getvar\fR returns
+.BR WRDSE_USERERR ,
+it must store the pointer to the error description string in
+.BR *ret .
+In any case (whether returning \fB0\fR or \fBWRDSE_USERERR\fR), the
+data returned in \fIret\fR must be allocated using
+.BR malloc (3).
+.TP
+.BI "void *" ws_closure
+Additional user-specific data passed as the last argument to
+.I ws_getvar
+or
+.I ws_command
+(see below). If defined, the
+.B WRDSF_CLOSURE
+flag must be set.
+.TP
+\fBint (*\fIws_command\fB)\
+ (char **ret,\
+ const char *cmd,\
+ size_t len,\
+ char **argv,\
+ void *clos)\fR
+Pointer to the function that performs command substitution. It treats
+the first \fIlen\fR bytes of the string \fIcmd\fR as a command
+(whatever that means to the caller) and attempts to execute it. On
+success, a pointer to the string with the command output is stored
+in the memory location pointed to by \fIret\fR and \fB0\fR is
+returned. On error,
+the function must return one of the error codes described in the section
+.BR "ERROR CODES" .
+If \fIws_command\fR returns
+.BR WRDSE_USERERR ,
+it must store the pointer to the error description string in
+.BR *ret .
+In any case (whether returning \fB0\fR or \fBWRDSE_USERERR\fR), the
+data returned in \fIret\fR must be allocated using
+.BR malloc (3).
+
+If the
+.B WRDSO_ARGV
+option bit is set in \fIws_options\fR, the parameter \fIargv\fR
+contains the command split into words using the same settings as the
+input \fIws\fR structure, with command substitution disabled.
+
+The \fIclos\fR parameter supplies user-specific data (see the
+description of \fIws_closure\fR member).
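The ws_command contract can be illustrated with popen(3). The sketch below assumes that a "command" is a /bin/sh command line, which is only one possible meaning for the caller. The following are also assumptions, not part of Grecs:

- The function name run_command is hypothetical.
- The WRDSE_* numbers are placeholders, and WRDSE_NOSPACE is an assumed name.
- Stripping trailing newlines is borrowed from shell command substitution rather than taken from this page.

The argv and clos parameters are ignored.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Placeholder result codes; a real program takes them from
   <wordsplit.h>.  WRDSE_NOSPACE is an assumed name. */
#ifndef WRDSE_OK
# define WRDSE_OK       0
# define WRDSE_NOSPACE  2
# define WRDSE_USERERR  9
#endif

/* ws_command-style callback: runs the first LEN bytes of CMD through
   /bin/sh with popen(3) and returns its standard output in *RET
   (malloc'd), minus trailing newlines.  ARGV and CLOS are unused. */
int
run_command(char **ret, const char *cmd, size_t len, char **argv,
            void *clos)
{
  char *command, *buf = NULL;
  size_t size = 0, n = 0;
  FILE *fp;
  int c;

  (void) argv;
  (void) clos;

  /* CMD need not be null-terminated, so copy exactly LEN bytes. */
  command = strndup(cmd, len);
  if (!command)
    return WRDSE_NOSPACE;
  fp = popen(command, "r");
  free(command);
  if (!fp) {
    *ret = strdup("cannot start command");
    return *ret ? WRDSE_USERERR : WRDSE_NOSPACE;
  }

  /* Collect the output, keeping one spare byte for the terminator. */
  while ((c = getc(fp)) != EOF) {
    if (n + 1 >= size) {
      size_t nsize = size ? 2 * size : 64;
      char *p = realloc(buf, nsize);
      if (!p) {
        free(buf);
        pclose(fp);
        return WRDSE_NOSPACE;
      }
      buf = p;
      size = nsize;
    }
    buf[n++] = (char) c;
  }
  pclose(fp);

  if (!buf && !(buf = malloc(1)))   /* the command printed nothing */
    return WRDSE_NOSPACE;
  /* Drop trailing newlines, as sh(1) does for $(...). */
  while (n > 0 && buf[n - 1] == '\n')
    n--;
  buf[n] = '\0';
  *ret = buf;
  return WRDSE_OK;
}
```

Such a callback would be stored in ws.ws_command, with WRDSF_NOCMD left unset.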
+.SH FLAGS
+The following macros are defined for use in the \fIflags\fR argument.
+.TP
+.B WRDSF_DEFFLAGS
+Default flags. This is a shortcut for:
+
+\fB(WRDSF_NOVAR |\
+ WRDSF_NOCMD |\
+ WRDSF_QUOTE |\
+ WRDSF_SQUEEZE_DELIMS |\
+ WRDSF_CESCAPES)\fR,
+
+i.e.: disable variable expansion and command substitution, perform quote
+removal, treat any number of consecutive delimiters as a single
+delimiter, and replace \fBC\fR escapes appearing in the input string
+with the corresponding characters.
+.TP
+.B WRDSF_APPEND
+Append the words found to the array resulting from a previous call to
+\fBwordsplit\fR.
+.TP
+.B WRDSF_DOOFFS
+Insert
+.I ws_offs
+initial
+.BR NULL s
+in the array
+.IR ws_wordv .
+These are not counted in the returned
+.IR ws_wordc .
+.TP
+.B WRDSF_NOCMD
+Don't do command substitution.
+.TP
+.B WRDSF_REUSE
+The parameter \fIws\fR resulted from a previous call to
+\fBwordsplit\fR, and \fBwordsplit_free\fR was not called. Reuse the
+allocated storage.
+.TP
+.B WRDSF_SHOWERR
+Print errors using
+.BR ws_error .
+.TP
+.B WRDSF_UNDEF
+Consider it an error if an undefined variable is expanded.
+.TP
+.B WRDSF_NOVAR
+Don't do variable expansion.
+.TP
+.B WRDSF_ENOMEMABRT
+Abort on
+.B ENOMEM
+error. By default, out-of-memory errors are treated like any other
+errors: the error is reported using \fIws_error\fR if the
+.B WRDSF_SHOWERR
+flag is set, and an error code is returned. If this flag is set, the
+.I ws_alloc_die
+function is called instead. This function is not expected to return.
+.TP
+.B WRDSF_WS
+Trim off any leading and trailing whitespace from the returned
+words. This flag is useful if the \fIws_delim\fR member does not
+contain whitespace characters.
+.TP
+.B WRDSF_SQUOTE
+Handle single quotes.
+.TP
+.B WRDSF_DQUOTE
+Handle double quotes.
+.TP
+.B WRDSF_QUOTE
+A shortcut for \fB(WRDSF_SQUOTE|WRDSF_DQUOTE)\fR.
+.TP
+.B WRDSF_SQUEEZE_DELIMS
+Replace each input sequence of repeated delimiters with a single
+delimiter.
+.TP
+.B WRDSF_RETURN_DELIMS
+Return delimiters.
+.TP
+.B WRDSF_SED_EXPR
+Treat
+.BR sed (1)
+expressions as words.
+.TP
+.B WRDSF_DELIM
+.I ws_delim
+member is initialized.
+.TP
+.B WRDSF_COMMENT
+.I ws_comment
+member is initialized.
+.TP
+.B WRDSF_ALLOC_DIE
+.I ws_alloc_die
+member is initialized.
+.TP
+.B WRDSF_ERROR
+.I ws_error
+member is initialized.
+.TP
+.B WRDSF_DEBUG
+.I ws_debug
+member is initialized.
+.TP
+.B WRDSF_ENV
+.I ws_env
+member is initialized.
+.TP
+.B WRDSF_GETVAR
+.I ws_getvar
+member is initialized.
+.TP
+.B WRDSF_SHOWDBG
+Enable debugging.
+.TP
+.B WRDSF_NOSPLIT
+Don't split input into words. This flag is useful for side
+effects, e.g. to perform variable expansion within a string.
+.TP
+.B WRDSF_KEEPUNDEF
+Keep undefined variables in place, instead of expanding them to
+empty strings.
+.TP
+.B WRDSF_WARNUNDEF
+Warn about undefined variables.
+.TP
+.B WRDSF_CESCAPES
+Handle \fBC\fR-style escapes in the input string.
+.TP
+.B WRDSF_CLOSURE
+.I ws_closure
+is set.
+.TP
+.B WRDSF_ENV_KV
+Each two consecutive elements in the
+.I ws_env
+array describe a single variable:
+.IR ws_env [ n ]
+contains variable name, and
+.IR ws_env [ "n+1" ]
+contains its value.
+.TP
+.B WRDSF_ESCAPE
+.I ws_escape
+is set.
+.TP
+.B WRDSF_INCREMENTAL
+Incremental mode. Each subsequent call to \fBwordsplit\fR with
+\fBNULL\fR as its first argument parses the next word from the input.
+See the section
+.B INCREMENTAL MODE
+for a detailed discussion.
+.TP
+.B WRDSF_PATHEXPAND
+Perform pathname and tilde expansion. If this flag is set, the
+\fIws_options\fR member must also be initialized. See the
+subsection
+.B "Path expansion"
+for details.
+.TP
+.B WRDSF_OPTIONS
+The
+.I ws_options
+member is initialized.
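Several of these flags have direct counterparts in POSIX wordexp(3): WRDE_APPEND, WRDE_DOOFFS, WRDE_NOCMD, WRDE_REUSE, WRDE_SHOWERR and WRDE_UNDEF. The layout that WRDSF_DOOFFS is documented to produce can therefore be shown with the POSIX function. The sketch uses wordexp, not Grecs. It assumes that WRDSF_DOOFFS behaves like WRDE_DOOFFS, as the description of WRDSF_DOOFFS above suggests. The wrapper name split_with_offs is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <string.h>
#include <wordexp.h>

/* Split S with POSIX wordexp(3), reserving OFFS leading NULL slots in
   we->we_wordv.  Command substitution is refused (WRDE_NOCMD).
   Returns wordexp's status; on success the caller must call
   wordfree(we). */
int
split_with_offs(const char *s, size_t offs, wordexp_t *we)
{
  we->we_offs = offs;   /* read by wordexp because of WRDE_DOOFFS */
  return wordexp(s, we, WRDE_DOOFFS | WRDE_NOCMD);
}
```

Splitting "alpha beta" with two reserved slots yields a vector of NULL, NULL, "alpha", "beta", NULL. The word count we_wordc is 2, because the reserved slots are not counted.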
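+.SH "RETURN VALUE"
+On success, \fBwordsplit\fR and \fBwordsplit_len\fR return 0. On
+error, they return a nonzero value and store the error code in
+\fIws_errno\fR.
+.PP
+\fBwordsplit_strerror\fR returns a pointer to a constant string
+describing the error.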
+.SH EXAMPLE
+.SH "SEE ALSO"
+.SH AUTHORS
+Sergey Poznyakoff
+.SH "BUG REPORTS"
+Report bugs to <gray+grecs@gnu.org.ua>.
+.SH COLOPHON
+The \fBGrecs\fR library is constantly changing, so this manual page
+may be incorrect or out-of-date. For the latest copy of \fBGrecs\fR
+documentation, visit <http://www.gnu.org.ua/software/grecs>.
+.SH COPYRIGHT
+Copyright \(co 2011 Sergey Poznyakoff
+.br
+.na
+License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
+.br
+.ad
+This is free software: you are free to change and redistribute it.
+There is NO WARRANTY, to the extent permitted by law.
+.\" Local variables:
+.\" eval: (add-hook 'write-file-hooks 'time-stamp)
+.\" time-stamp-start: ".TH [A-Z_][A-Z0-9_]* [0-9] \""
+.\" time-stamp-format: "%:B %:d, %:y"
+.\" time-stamp-end: "\""
+.\" time-stamp-line-limit: 20
+.\" end:
+
