-rw-r--r--  README       298
-rwxr-xr-x  bootstrap    162
-rw-r--r--  wordsplit.3  332
3 files changed, 533 insertions, 259 deletions
diff --git a/README b/README
new file mode 100644
index 0000000..7383739
--- /dev/null
+++ b/README
@@ -0,0 +1,298 @@
+* Overview
+
+This package provides a set of C functions for splitting a string into
+words. The splitting process is highly configurable and allows for
+considerable flexibility. The default splitting rules are similar to
+those used in Bourne shell. The splitting process includes tilde
+expansion, variable expansion, quote removal, command substitution,
+and path expansion. Each of these phases can be turned off by the caller.
+
+The following code fragment shows the basic usage:
+
+ /* This variable controls the splitting */
+ wordsplit_t ws;
+ int rc;
+
+ /* Provide variable definitions */
+ ws.ws_env = (const char **) environ;
+ /* Provide a function for expanding commands */
+ ws.ws_command = runcom;
+ /* Split input_string into words */
+ rc = wordsplit(input_string, &ws,
+ WRDSF_QUOTE /* Handle single and double quoted strings as words */
+ | WRDSF_ENV /* Use the variables supplied in ws.ws_env */
+ | WRDSF_SQUEEZE_DELIMS /* Compress adjacent delimiters */
+ | WRDSF_PATHEXPAND /* Expand pathnames */
+ | WRDSF_SHOWERR); /* Show errors */
+ if (rc == 0) {
+ /* Success. The resulting words are returned in the NULL-terminated
+ array ws.ws_wordv. Number of words is in ws.ws_wordc */
+ }
+ /* Reclaim the allocated memory */
+ wordsplit_free(&ws);
+
+For a detailed discussion, please see the man page wordsplit.3 included
+in the package.
+
+* Description
+
+The package is designed as a drop-in facility for use in larger
+programs. It consists of the following files:
+
+ wordsplit.h - Interface header.
+ wordsplit.c - Main source file.
+ wordsplit.3 - Documentation.
+
+For most uses, you will need only these three. The rest of the files
+are for building the autotest-based testsuite:
+
+ wsp.c - Auxiliary test program.
+ wordsplit.at - The source for the testsuite.
+
+* Incorporating wordsplit into your project
+
+The project is designed to be used as a git submodule. First, select
+the location DIR for the wordsplit directory within your project. Then
+add the submodule:
+
+ git submodule add git://git.gnu.org.ua/wordsplit.git DIR
+
+The rest is quite straightforward: you need to add wordsplit.c to your
+sources and add both wordsplit.c and wordsplit.h to the distributed files.
+
+There are two methods of doing so: direct incorporation and
+incorporation via VPATH. The discussion below describes both
+methods, assuming that your project uses the GNU autotools
+framework. If you are using plain makefiles, these instructions
+are easy to adapt.
+
+** Direct incorporation
+
+Add the subdir-objects option to the invocation of AM_INIT_AUTOMAKE macro in
+your configure.ac:
+
+ AM_INIT_AUTOMAKE([subdir-objects])
+
+In your Makefile.am, add both wordsplit/wordsplit.c and wordsplit/wordsplit.h
+to the sources and -Iwordsplit to the cpp flags. For example:
+
+ program_SOURCES = main.c \
+ wordsplit/wordsplit.c \
+ wordsplit/wordsplit.h
+ AM_CPPFLAGS = -I$(srcdir)/wordsplit
+
+You can also put wordsplit.h in the noinst_HEADERS variable, if you like:
+
+ program_SOURCES = main.c \
+ wordsplit/wordsplit.c
+ noinst_HEADERS = wordsplit/wordsplit.h
+ AM_CPPFLAGS = -I$(srcdir)/wordsplit
+
+If you are building an installable library and wish to make wordsplit functions
+available, install wordsplit.h to $(pkgincludedir), e.g.
+
+ lib_LTLIBRARIES = libmy.la
+ libmy_la_SOURCES = main.c \
+ wordsplit/wordsplit.c
+ AM_CPPFLAGS = -I$(srcdir)/wordsplit
+ pkginclude_HEADERS = wordsplit/wordsplit.h
+
+** Vpath-based incorporation
+
+Modify the VPATH variable in your Makefile.am:
+
+ VPATH += $(srcdir)/wordsplit
+
+Notice the use of "+=": it is necessary for the vpath builds to work.
+
+Add wordsplit.o to the name_LIBADD or name_LDADD variable, depending on
+the nature of the object being built.
+
+Modify AM_CPPFLAGS as shown in the previous section:
+
+ AM_CPPFLAGS = -I$(srcdir)/wordsplit
+
+Add both wordsplit/wordsplit.c and wordsplit/wordsplit.h to the EXTRA_DIST
+variable.
+
+An example Makefile.am:
+
+ program_SOURCES = main.c
+ LDADD = wordsplit.o
+ noinst_HEADERS = wordsplit/wordsplit.h
+ VPATH += $(srcdir)/wordsplit
+ EXTRA_DIST = wordsplit/wordsplit.c wordsplit/wordsplit.h
+
+* The testsuite
+
+The package contains two files for building the testsuite: wsp.c,
+which is used to build the auxiliary binary wsp, and wordsplit.at,
+which is translated by GNU autotest into a testsuite shell script.
+
+The discussion below is for those who wish to include the wordsplit
+testsuite in their project. It assumes the following layout of the
+hosting project:
+
+ lib/
+ Directory holding the library that incorporates wordsplit.o.
+ This discussion assumes the library name is libmy.a
+ lib/wordsplit
+ Wordsplit sources.
+
+The testsuite will be built in lib.
+
+** Additional files
+
+Three additional files are necessary for the testsuite: atlocal.in,
+wordsplit-version.h, and package.m4.
+
+The file atlocal.in is a simple shell script that sets the PATH
+environment variable for the testsuite. It contains just one line:
+
+ PATH=$srcdir/wordsplit:$PATH
+
+The file wordsplit-version.h provides the version definition for the
+test program wsp.c. Use the following script to create it:
+
+ version=$(cd wordsplit; git describe)
+ cat > wordsplit-version.h <<EOF
+ #define WORDSPLIT_VERSION "$version"
+ EOF
+
+The file package.m4 contains the package description, which allows
+the testsuite to generate an accurate report. To create it, use:
+
+ cat > package.m4 <<EOF
+ m4_define([AT_PACKAGE_NAME], [wordsplit])
+ m4_define([AT_PACKAGE_TARNAME], [wordsplit])
+ m4_define([AT_PACKAGE_VERSION], [$version])
+ m4_define([AT_PACKAGE_STRING], [AT_PACKAGE_NAME AT_PACKAGE_VERSION])
+ m4_define([AT_PACKAGE_BUGREPORT], [gray@gnu.org])
+ EOF
+
+Here, $version is the same variable you used for wordsplit-version.h.
+
+After creating the three files, list them in the EXTRA_DIST variable in
+lib/Makefile.am to make sure they will be distributed with the tarball.
+
+** configure.ac
+
+Add the following lines to your configure.ac:
+
+ AM_MISSING_PROG([AUTOM4TE], [autom4te])
+
+ AC_CONFIG_TESTDIR([lib])
+ AC_CONFIG_FILES([lib/Makefile lib/atlocal])
+
+** lib/Makefile.am
+
+The makefile in lib must be modified to build the auxiliary program
+wsp and create the testsuite script. This is done by the following
+fragment:
+
+ EXTRA_DIST = testsuite wordsplit/wordsplit.at package.m4
+ DISTCLEANFILES = atconfig
+ MAINTAINERCLEANFILES = Makefile.in $(TESTSUITE)
+
+ TESTSUITE = $(srcdir)/testsuite
+ M4=m4
+ AUTOTEST = $(AUTOM4TE) --language=autotest
+ $(TESTSUITE): wordsplit/wordsplit.at
+ $(AM_V_GEN)$(AUTOTEST) -I $(srcdir) wordsplit/wordsplit.at \
+ -o $(TESTSUITE).tmp
+ $(AM_V_at)mv $(TESTSUITE).tmp $(TESTSUITE)
+
+ noinst_PROGRAMS = wsp
+ wsp_SOURCES = wordsplit/wsp.c wordsplit-version.h
+ wsp_LDADD = ./libmy.a
+
+ atconfig: $(top_builddir)/config.status
+ cd $(top_builddir) && ./config.status $@
+
+ clean-local:
+ @test ! -f $(TESTSUITE) || $(SHELL) $(TESTSUITE) --clean
+
+ check-local: atconfig atlocal $(TESTSUITE)
+ @$(SHELL) $(TESTSUITE)
+
+* History
+
+The first version of wordsplit appeared in March 2009 as a part of the
+Wydawca[1] project. Its main use there was to assist in
+configuration file parsing. The parser subsystem proved to be quite
+useful and was soon forked into a separate project - Grecs[2]. This
+package has since been used (as a git submodule) in a number of other
+projects, such as GNU Dico[3] and Direvent[4], to name a few.
+
+In 2010 the wordsplit sources were incorporated into the GNU
+Mailutils[5] package, where they replaced the obsolete argcv module.
+Mailutils uses its own configuration package, which meant that using
+Grecs was not expedient. Therefore the sources were exported from
+Grecs and are kept in sync with the changes in it.
+
+Several other projects, such as GNU Rush[6] and fileserv[7], followed
+suit. It was therefore decided that it would be advisable to
+have wordsplit as a separate package which could be easily included in
+another project without incurring unnecessary overhead.
+
+Work is currently underway on incorporating it into existing
+projects.
+
+* References
+
+[1] Wydawca - an automatic release submission daemon
+ Home: <http://puszcza.gnu.org.ua/software/wydawca>
+ Git: <http://git.gnu.org.ua/cgit/wydawca.git>
+[2] Grecs - a library for parsing structured configuration files
+ Home: <https://puszcza.gnu.org.ua/projects/grecs>
+ Git: <http://git.gnu.org.ua/cgit/grecs.git>
+[3] GNU Dico - a dictionary server
+ Home: <https://puszcza.gnu.org.ua/projects/dico>
+ Git: <http://git.gnu.org.ua/cgit/dico.git>
+[4] GNU Direvent - filesystem event watching daemon
+ Home: <http://puszcza.gnu.org.ua/software/direvent>
+ Git: <http://git.gnu.org.ua/cgit/direvent.git>
+[5] GNU Mailutils - a general-purpose mail package
+ Home: <http://mailutils.org>
+ Git: <http://git.savannah.gnu.org/cgit/mailutils.git>
+[6] GNU Rush - a restricted user shell for remote access
+ Home: <http://puszcza.gnu.org.ua/software/rush>
+ Git: <http://git.gnu.org.ua/cgit/rush.git>
+[7] fileserv - simple http server for serving static files
+ Home: <https://puszcza.gnu.org.ua/projects/fileserv>
+ Git: <http://git.gnu.org.ua/cgit/fileserv.git>
+[8] vmod-dbrw - Database-driven rewrite rules for Varnish Cache
+ Home: <http://puszcza.gnu.org.ua/software/vmod-dbrw>
+ Git: <http://git.gnu.org.ua/cgit/vmod-dbrw.git>
+
+* Bug reporting
+
+Please send bug reports, questions, suggestions and criticism to
+<gray@gnu.org>. When sending bug reports, please make sure to provide
+the following information:
+
+ 1. Wordsplit invocation flags.
+ 2. Input string.
+ 3. Produced output.
+ 4. Expected output.
+
+* Copying
+
+Copyright (C) 2009-2019 Sergey Poznyakoff
+
+Permission is granted to anyone to make or distribute verbatim copies
+of this document as received, in any medium, provided that the
+copyright notice and this permission notice are preserved,
+thus giving the recipient permission to redistribute in turn.
+
+Permission is granted to distribute modified versions
+of this document, or of portions of it, under the above conditions,
+provided also that they carry prominent notices stating who last
+changed them.
+
+Local Variables:
+mode: outline
+paragraph-separate: "[ ]*$"
+version-control: never
+End:
+
diff --git a/bootstrap b/bootstrap
deleted file mode 100755
index 4cc4178..0000000
--- a/bootstrap
+++ /dev/null
@@ -1,162 +0,0 @@
-#! /bin/sh
-cd $(dirname $0)
-version=$(git describe)
-
-function genfiles() {
- cat > wordsplit-version.h <<EOF
-#define WORDSPLIT_VERSION "$version"
-EOF
- cat > package.m4 <<EOF
-m4_define([AT_PACKAGE_NAME], [wordsplit])
-m4_define([AT_PACKAGE_TARNAME], [wordsplit])
-m4_define([AT_PACKAGE_VERSION], [$version])
-m4_define([AT_PACKAGE_STRING], [AT_PACKAGE_NAME AT_PACKAGE_VERSION])
-m4_define([AT_PACKAGE_BUGREPORT], [gray@gnu.org])
-EOF
-}
-
-function mk_atlocal() {
- cat <<\EOF
-# @configure_input@ -*- shell-script -*-
-# Configurable variable values for wordsplit test suite.
-# Copyright (C) 2016-2019 Sergey Poznyakoff
-
-PATH=@abs_builddir@:$srcdir:$PATH
-EOF
-} > atlocal.in
-
-
-function mk_testsuite() {
- sed -e 's|MODDIR|$moddir|' <<\EOF
-# ##################
-# Testsuite
-# ##################
-EXTRA_DIST = testsuite wordsplit.at package.m4
-DISTCLEANFILES = atconfig
-MAINTAINERCLEANFILES = Makefile.in $(TESTSUITE)
-
-TESTSUITE = $(srcdir)/testsuite
-M4=m4
-AUTOTEST = $(AUTOM4TE) --language=autotest
-$(TESTSUITE): wordsplit.at
- $(AM_V_GEN)$(AUTOTEST) -I $(srcdir) wordsplit.at -o $(TESTSUITE).tmp
- $(AM_V_at)mv $(TESTSUITE).tmp $(TESTSUITE)
-
-atconfig: $(top_builddir)/config.status
- cd $(top_builddir) && ./config.status MODDIR/$@
-
-clean-local:
- @test ! -f $(TESTSUITE) || $(SHELL) $(TESTSUITE) --clean
-
-check-local: atconfig atlocal $(TESTSUITE)
- @$(SHELL) $(TESTSUITE)
-
-noinst_PROGRAMS = wsp
-wsp_SOURCES = wsp.c wordsplit-version.h
-EOF
- echo "wsp_LDADD = $1"
-}
-
-function common_notice() {
- cat <<EOF
-Add the following to your configure.ac:
-
- AC_CONFIG_TESTDIR($moddir)
- AC_CONFIG_FILES([$moddir/Makefile $moddir/atlocal])
-
-EOF
-}
-
-function mk_installed() {
- (cat <<EOF
-lib_LTLIBRARIES = libwordsplit.la
-libwordsplit_la_SOURCES = wordsplit.c
-include_HEADERS = wordsplit.h
-
-EOF
- mk_testsuite ./libwordsplit.la) > Makefile.am
- mk_atlocal
- common_notice
-}
-
-function mk_shared() {
- (cat <<EOF
-noinst_LTLIBRARIES = libwordsplit.la
-libwordsplit_la_SOURCES = wordsplit.c wordsplit.h
-
-EOF
- mk_testsuite ./libwordsplit.la) > Makefile.am
- mk_atlocal
- common_notice
-}
-
-function mk_static() {
- (cat <<EOF
-noinst_LIBRARIES = libwordsplit.a
-libwordsplit_a_SOURCES = wordsplit.c wordsplit.h
-
-EOF
- mk_testsuite ./libwordsplit.a) > Makefile.am
- mk_atlocal
- common_notice
-}
-
-function mk_embedded() {
- (mk_testsuite wordsplit.o
- echo "AM_CPPFLAGS = "
- )> Makefile.am
- mk_atlocal
- cat <<EOF
-Add the following to the _SOURCES variable of your top-level Makefile.am:
-
- wordsplit/wordsplit.c\\
- wordsplit/wordsplit.h
-
-If test framework is enabled, add also the line
-
- SUBDIRS = . wordsplit
-
-and the following lines to your configure.ac:
-
- AC_CONFIG_TESTDIR($moddir)
- AC_CONFIG_FILES([$moddir/Makefile $moddir/atlocal])
-
-Replace ellipsis with the leading path components to the embedded wordsplit
-sources.
-EOF
-}
-
-function usage() {
- cat <<EOF
-usage: $0 MODE MODDIR
-
-MODE is any of:
- installed standalone installable library
- shared shared convenience library (lt)
- static static convenience library
- embedded embedded into another library
-
-EOF
-}
-#
-if [ $# -ne 2 ]; then
- usage >&2
- exit 1
-fi
-
-moddir=$2
-
-case $1 in
- installed|shared|static|standalone|embedded)
- genfiles
- mk_$1
- ;;
- clean)
- rm -f Makefile.am package.m4 wordsplit-version.h atlocal.in
- ;;
- *)
- usage
- ;;
-esac
-
-
diff --git a/wordsplit.3 b/wordsplit.3
index 400c2ee..139c73e 100644
--- a/wordsplit.3
+++ b/wordsplit.3
@@ -14,7 +14,7 @@
.\" You should have received a copy of the GNU General Public License
.\" along with wordsplit. If not, see <http://www.gnu.org/licenses/>.
.\"
-.TH WORDSPLIT 3 "July 7, 2019" "WORDSPLIT" "Wordsplit User Reference"
+.TH WORDSPLIT 3 "July 9, 2019" "WORDSPLIT" "Wordsplit User Reference"
.SH NAME
wordsplit \- split string into words
.SH SYNOPSIS
@@ -62,7 +62,10 @@ The function
.B wordsplit_free_words
frees only the memory allocated for elements of
.I ws_wordv
-and initializes
+after which it resets
+.IR ws_wordv " to"
+.B NULL
+and
.I ws_wordc
to zero.
.PP
@@ -73,15 +76,17 @@ wordsplit_t ws;
int rc;
-if (wordsplit(s, &ws, WRDSF_DEFFLAGS)) {
+if (wordsplit(s, &ws, WRDSF_DEFFLAGS) == 0) {
- wordsplit_perror(&ws);
- return;
-}
-for (i = 0; i < ws.ws_wordc; i++) {
- /* do something with ws.ws_wordv[i] */
+ for (i = 0; i < ws.ws_wordc; i++) {
+ /* do something with ws.ws_wordv[i] */
+ }
}
wordsplit_free(&ws);
.EE
.PP
+Notice that \fBwordsplit_free\fR must be called after each invocation
+of \fBwordsplit\fR or \fBwordsplit_len\fR, even if it resulted in
+error.
+.PP
The function
.B wordsplit_getwords
returns in \fIwordv\fR an array of words, and in \fIwordc\fR the number
@@ -135,49 +140,37 @@ wordsplit_free(&ws);
.EE
.SH OPTIONS
The number of flags is limited to 32 (the width of \fBuint32_t\fR data
-type) and each bit is occupied by a corresponding flag. However, the
-number of features \fBwordsplit\fR provides required still
-more. Additional features can be requested by setting a corresponding
-\fIoption bit\fR in the \fBws_option\fR field of the \fBstruct
-wordsplit\fR argument. To inform wordsplit functions that this field
-is initialized the \fBWRDSF_OPTIONS\fR flag must be set.
+type). By the time of this writing each bit is already occupied by a
+corresponding flag. However, the number of features \fBwordsplit\fR
+provides requires still more. Additional features can be requested by
+setting a corresponding \fIoption bit\fR in the \fBws_option\fR field
+of the \fBstruct wordsplit\fR argument. To inform wordsplit functions
+that this field is initialized the \fBWRDSF_OPTIONS\fR flag must be set.
.PP
Option symbolic names begin with \fBWRDSO_\fR. They are discussed in
detail in the subsequent chapters.
.SH EXPANSION
Expansion is performed on the input after it has been split into
-words. There are several kinds of expansion, which of them are
-performed is controlled by appropriate bits set in the \fIflags\fR
-argument. Whatever expansion kinds are enabled, they are always run
-in the same order as described in this section.
+words. The kinds of expansion to be performed are controlled by the
+appropriate bits set in the \fIflags\fR argument. Whatever expansion
+kinds are enabled, they are always run in the order described in this
+section.
.SS Whitespace trimming
Whitespace trimming removes any leading and trailing whitespace from
the initial word array. It is enabled by the
.B WRDSF_WS
-flag. Whitespace trimming is needed only if you redefine
-word delimiters (\fIws_delim\fR member) so that they don't contain
-whitespace characters (\fB\(dq \\t\\n\(dq\fR).
-.SS Tilde expansion
-Tilde expansion is enabled if the
-.B WRDSF_PATHEXPAND
-bit is set. It expands all words that begin with an unquoted tilde
-character (`\fB~\fR'). If tilde is followed immediately by a slash,
-it is replaced with the home directory of the current user (as
-determined by his \fBpasswd\fR entry). A tilde alone is handled the
-same way. Otherwise, the characters between the tilde and first slash
-character (or end of string, if it doesn't contain any) are treated as
-a login name. and are replaced (along with the tilde itself) with the
-home directory of that user. If there is no user with such login
-name, the word is left unchanged.
+flag. Whitespace trimming is enabled automatically if the word
+delimiters (\fIws_delim\fR member) contain whitespace characters
+(\fB\(dq \\t\\n\(dq\fR), which is the default.
.SS Variable expansion
Variable expansion replaces each occurrence of
.BI $ NAME
or
.BI ${ NAME }
-with the value of the variable \fINAME\fR. It is enabled if the
-flag \fBWRDSF_NOVAR\fR is not set. The caller is responsible for
-supplying the table of available variables. Two mechanisms are
-provided: environment array and a callback function.
+with the value of the variable \fINAME\fR. It is enabled by default
+and can be disabled by setting the \fBWRDSF_NOVAR\fR flag. The caller
+is responsible for supplying the table of available variables. Two
+mechanisms are provided: environment array and a callback function.
.PP
Environment array is a \fBNULL\fR-terminated array of variables,
stored in the \fIws_env\fR member. The \fBWRDSF_ENV\fR flag must be
@@ -204,8 +197,8 @@ function itself shall be defined as
int getvar (char **ret, const char *var, size_t len, void *clos);
.EE
.PP
-The function shall look up for the variable identified by the first
-\fIlen\fR bytes of the string \fIvar\fR. If such variable is found,
+The function shall look up the variable identified by the first
+\fIlen\fR bytes of the string \fIvar\fR. If the variable is found,
the function shall store a copy of its value (allocated using
\fBmalloc\fR(3)) in the memory location pointed to by \fBret\fR, and
return \fBWRDSE_OK\fR. If the variable is not found, the function shall
@@ -216,7 +209,7 @@ If \fIws_getvar\fR returns
.BR WRDSE_USERERR ,
it must store the pointer to the error description string in
.BR *ret .
-In any case (whether returning \fB0\fR or \fBWRDSE_USERERR\fR) , the
+In any case (whether returning \fB0\fR or \fBWRDSE_USERERR\fR), the
data returned in \fBret\fR must be allocated using
.BR malloc (3).
.PP
@@ -225,10 +218,11 @@ If both
and
.I ws_getvar
are used, the variable is first looked up in
-.IR ws_env ,
-and if not found there, the
+.IR ws_env .
+If it is not found there, the
.I ws_getvar
-function is called.
+callback is invoked.
+This order is reversed if the \fBWRDSO_GETVARPREF\fR option is set.
.PP
During variable expansion, the forms below cause
.B wordsplit
@@ -255,14 +249,61 @@ Otherwise, the value of \fIvariable\fR is substituted.
.BI ${ variable :+ word }
.BR "Use Alternate Value" .
If \fIvariable\fR is null or unset, nothing is substituted, otherwise the
-expansion of \fIword\fR is substituted.
+expansion of \fIword\fR is substituted.
+.PP
+Unless the above forms are used, a reference to an undefined variable
+expands to an empty string. Three flags affect this behavior. If the
+\fBWRDSF_UNDEF\fR flag is set, expanding an undefined variable triggers
+a \fBWRDSE_UNDEF\fR error. If the \fBWRDSF_WARNUNDEF\fR flag is set,
+a non-fatal warning is emitted for each undefined variable. Finally,
+if the \fBWRDSF_KEEPUNDEF\fR flag is set, references to undefined
+variables are left unexpanded.
+.PP
+If two or three of these flags are set simultaneously, the behavior is
+undefined.
+.SS Positional argument expansion
+\fIPositional arguments\fR are special parameters that can be
+referenced in the input string by their ordinal number. The numbering
+begins at \fB0\fR. The syntax for referencing positional arguments is
+the same as for variables, except that the argument index is used
+instead of the variable name. If the index is between 0 and 9, the
+\fB$\fIN\fR form is acceptable. Otherwise, the index must be enclosed
+in curly braces: \fB${\fIN\fB}\fR.
+.PP
+During argument expansion, references to positional arguments are
+replaced with the corresponding values.
+.PP
+Argument expansion is requested by the \fBWRDSO_PARAMV\fR option bit.
+The NULL-terminated array of arguments shall be supplied in the
+.I ws_paramv
+member. The
+.I ws_paramc
+member shall be initialized to the number of elements in
+.IR ws_paramv .
+.PP
+Setting the \fBWRDSO_PARAM_NEGIDX\fR option together with
+\fBWRDSO_PARAMV\fR enables negative positional argument references.
+A negative reference has the form \fB${-\fIN\fB}\fR. It is expanded
+to the value of the argument with index \fB\fIws_paramc\fR \- \fIN\fR.
.SS Quote removal
-Quote removal translates unquoted escape sequences into corresponding bytes.
-An escape sequence is a backslash followed by one or more characters. By
-default, each sequence \fB\\\fIC\fR appearing in unquoted words is
-replaced with the character \fIC\fR. In doubly-quoted strings, two
-backslash sequences are recognized: \fB\\\\\fR translates to a single
-backslash, and \fB\\\(dq\fR translates to a double-quote.
+During quote removal, single or double quotes surrounding a sequence
+of characters are removed and the sequence itself is treated as a
+single word. Characters within single quotes are treated verbatim.
+Characters within double quotes undergo variable expansion and
+backslash interpretation (see below).
+.PP
+Recognition of single quoted strings is enabled by the
+\fBWRDSF_SQUOTE\fR flag. Recognition of double quotes is enabled by
+the \fBWRDSF_DQUOTE\fR flag. The macro \fBWRDSF_QUOTE\fR enables both.
+.SS Backslash interpretation
+Backslash interpretation translates unquoted
+.I escape sequences
+into corresponding characters. An escape sequence is a backslash followed
+by one or more characters. By default, each sequence \fB\\\fIC\fR
+appearing in unquoted words is replaced with the character \fIC\fR. In
+doubly-quoted strings, two backslash sequences are recognized:
+\fB\\\\\fR translates to a single backslash, and \fB\\\(dq\fR
+translates to a double-quote.
.PP
Two flags are provided to modify this behavior. If
.I WRDSF_CESCAPES
@@ -292,16 +333,16 @@ The \fBWRDSF_ESCAPE\fR flag allows the caller to customize escape
sequences. If it is set, the \fBws_escape\fR member must be
initialized. This member provides escape tables for unquoted words
(\fBws_escape[0]\fR) and quoted strings (\fBws_escape[1]\fR). Each
-table is a string consisting of even number of charactes. In each
+table is a string consisting of an even number of characters. In each
pair of characters, the first one is a character that can appear after
backslash, and the following one is its translation. For example, the
above table of C escapes is represented as
-\fB\(dqa\\ab\\bf\\fn\\nr\\rt\\tv\\v\(dq\fR.
+\fB\(dq\\\\\\\\"\\"a\\ab\\bf\\fn\\nr\\rt\\tv\\v\(dq\fR.
.PP
It is valid to initialize \fBws_escape\fR elements to zero. In this
case, no backslash translation occurs.
.PP
-The handling of octal and hex escapes is controlled by the following
+Interpretation of octal and hex escapes is controlled by the following
bits in \fBws_options\fR:
.TP
.B WRDSO_BSKEEP_WORD
@@ -357,9 +398,9 @@ The substitution function should be defined as follows:
void *\fIclos\fB);\fR
.RE
.PP
-First \fIlen\fR bytes of \fIcmd\fR contain the command invocation as
-it appeared between
-.BR $( and ),
+On input, the first \fIlen\fR bytes of \fIcmd\fR contain the command
+invocation as it appeared between
+.BR $( " and " ),
with all expansions performed.
.PP
The \fIargv\fR parameter contains the command
@@ -381,11 +422,27 @@ is returned, a pointer to the error description string must be stored in
When \fBWRDSE_OK\fR or \fBWRDSE_USERERR\fR is returned, the
data stored in \fB*ret\fR must be allocated using
.BR malloc (3).
-.SS Pathname expansion
-Pathname expansion is performed if the \fBWRDSF_PATHEXPAND\fR flag is
-set. Each unquoted word is scanned for characters
-.BR * , ? ", and " [ .
-If one of these appears, the word is considered a \fIpattern\fR (in
+.SS Tilde and pathname expansion
+Both expansions are performed if the
+.B WRDSF_PATHEXPAND
+flag is set.
+.PP
+.I Tilde expansion
+affects any word that begins with an unquoted tilde
+character (\fB~\fR). If the tilde is followed immediately by a slash,
+it is replaced with the home directory of the current user (as
+determined by his \fBpasswd\fR entry). A tilde alone is handled the
+same way. Otherwise, the characters between the tilde and first slash
+character (or end of string, if it doesn't contain any) are treated as
+a login name, and are replaced (along with the tilde itself) with the
+home directory of that user. If there is no user with such login
+name, the word is left unchanged.
+.PP
+During
+.I pathname expansion
+each unquoted word is scanned for characters
+.BR * ", " ? ", and " [ .
+If any of these appears, the word is considered a \fIpattern\fR (in
the sense of
.BR glob (3))
and is replaced with an alphabetically sorted list of file names matching the
@@ -429,9 +486,9 @@ the last word. For example, if the input to the above fragment were
The data type \fBwordsplit_t\fR has three members that contain
output data upon return from \fBwordsplit\fR or \fBwordsplit_len\fR,
and a number of members that the caller can initialize on input in
-order to customize the function behavior. Each its member has a
-corresponding flag bit, which must be set in the \fIflags\fR argument
-in order to instruct the \fBwordsplit\fR function to use it.
+order to customize the function behavior. For each input member there
+is a corresponding flag bit, which must be set in the \fIflags\fR argument
+in order to instruct the \fBwordsplit\fR function to use the member.
.SS OUTPUT
.TP
.BI size_t " ws_wordc"
@@ -441,6 +498,17 @@ from \fBwordsplit\fR.
.BI "char ** " ws_wordv
Array of resulting words. Accessible upon successful return
from \fBwordsplit\fR.
+.PP
+The caller should not attempt to free or reallocate \fIws_wordv\fR or
+any elements thereof, nor to modify \fIws_wordc\fR.
+.PP
+To store away the words for use after freeing \fIws\fR with
+.BR wordsplit_free ,
+the caller should use
+.BR wordsplit_getwords .
+It is more efficient than copying the contents of
+.I ws_wordv
+manually.
.TP
.BI "size_t " ws_wordi
Total number of words processed. This field is intended for use with
@@ -452,17 +520,41 @@ flag. If that flag is not set, the following relation holds:
Error code, if the invocation of \fBwordsplit\fR or
\fBwordsplit_len\fR failed. This is the same value as returned from
the function in that case.
+.TP
+.BI "char *" ws_errctx
+On error, context in which the error occurred. For
+.BR WRDSE_UNDEF ,
+it is the name of the undefined variable. For
+.B WRDSE_GLOBERR
+it is the pattern that caused the error.
+.sp
+The caller should treat this member as
+.BR "const char *" .
.PP
-The caller should not attempt to free or reallocate \fIws_wordv\fR or
-any elements thereof, nor to modify \fIws_wordc\fR.
+The following members are used if the variable expansion was requested
+and the input string contained an
+.B Assign Default Values
+form (\fB${\fIvariable\fB:=\fIword\fB}\fR).
+.TP
+.BI "char **" ws_envbuf
+Modified environment. It follows the same arrangement as \fIws_env\fR
+on input (see the \fBWRDSF_ENV_KV\fR flag). If \fIws_env\fR was NULL (or
+\fBWRDSF_ENV\fR was not set), but the \fIws_getvar\fR callback was
+used, the \fIws_envbuf\fR array will contain only the modified variables.
+.TP
+.BI "size_t " ws_envidx
+Number of entries in
+.IR ws_envbuf .
.PP
-To store away the words for use after freeing \fIws\fR with
-.BR wordsplit_free ,
-the caller should use
-.BR wordsplit_getwords .
-It is more effective than copying the contents of
-.I ws_wordv
-manually.
+If positional parameters were used (see the \fBWRDSO_PARAMV\fR option)
+and any of them were modified during processing, the following two
+members supply the modified parameter array.
+.TP
+.BI "char ** " ws_parambuf
+Array of positional parameters.
+.TP
+.BI "size_t " ws_paramidx
+Number of positional parameters.
.SS INPUT
.TP
.BI "size_t " ws_offs
@@ -569,12 +661,12 @@ one containing variable name, and the next one with its
value.
.TP
.BI "int (*" ws_getvar ") (char **ret, const char *var, size_t len, void *clos)"
-Points to the function that will be used during variable expansion to
-look up for the value of the environment variable named \fBvar\fR.
+Points to the function that will be used during variable expansion for
+environment variable lookups.
This function is used if the variable expansion is enabled (i.e. the
.B WRDSF_NOVAR
flag is not set), and the \fBWRDSF_GETVAR\fR flag is set.
-
+.sp
If both
.B WRDSF_ENV
and
@@ -583,14 +675,15 @@ are set, the variable is first looked up in the
.I ws_env
array and, if not found there,
.I ws_getvar
-is called.
-
+is called. If the \fBWRDSO_GETVARPREF\fR option is set, this order is
+reversed.
+.sp
The name of the variable is specified by the first \fIlen\fR bytes of
the string \fIvar\fR. The \fIclos\fR parameter supplies the
user-specific data (see below the description of \fIws_closure\fR
member) and the \fBret\fR parameter points to the memory location
where output data is to be stored. On success, the function must
-store ther a pointer to the string with the value of the variable and
+store there a pointer to the string with the value of the variable and
return 0. On error, it must return one of the error codes described
in the section
.BR "ERROR CODES" .
@@ -598,7 +691,7 @@ If \fIws_getvar\fR returns
.BR WRDSE_USERERR ,
it must store the pointer to the error description string in
.BR *ret .
-In any case (whether returning \fB0\fR or \fBWRDSE_USERERR\fR) , the
+In any case (whether returning \fB0\fR or \fBWRDSE_USERERR\fR), the
data returned in \fBret\fR must be allocated using
.BR malloc (3).
.TP
@@ -629,7 +722,7 @@ If \fIws_command\fR returns
.BR WRDSE_USERERR ,
it must store the pointer to the error description string in
.BR *ret .
-In any case (whether returning \fB0\fR or \fBWRDSE_USERERR\fR) , the
+In any case (whether returning \fB0\fR or \fBWRDSE_USERERR\fR), the
data returned in \fBret\fR must be allocated using
.BR malloc (3).
@@ -639,6 +732,17 @@ command substitution disabled.
The \fIclos\fR parameter supplies user-specific data (see the
description of \fIws_closure\fR member).
+.PP
+The following two members are consulted if the \fBWRDSO_PARAMV\fR
+option is set. They provide an array of positional parameters.
+.TP
+.BI "char const **" ws_paramv
+Positional parameters. These are accessible in the input string using
+the notation \fB$\fIN\fR or \fB${\fIN\fB}\fR, where \fIN\fR is the
+0-based parameter number.
+.TP
+.BI "size_t " ws_paramc
+Number of positional parameters.
.SH FLAGS
The following macros are defined for use in the \fBflags\fR argument.
.TP
@@ -657,7 +761,7 @@ delimiter, replace \fBC\fR escapes appearing in the input string with
the corresponding characters.
.TP
.B WRDSF_APPEND
-Append the words found to the array resulting from a previous call to
+Append the resulting words to the array left from a previous call to
\fBwordsplit\fR.
.TP
.B WRDSF_DOOFFS
@@ -671,7 +775,9 @@ These are not counted in the returned
.IR ws_wordc .
.TP
.B WRDSF_NOCMD
-Don't do command substitution.
+Don't do command substitution. The \fBWRDSO_NOCMDSPLIT\fR option set
+together with this flag prevents splitting command invocations
+into separate words (see the \fBOPTIONS\fR section).
.TP
.B WRDSF_REUSE
The parameter \fIws\fR resulted from a previous call to
@@ -686,7 +792,9 @@ Print errors using
Consider it an error if an undefined variable is expanded.
.TP
.B WRDSF_NOVAR
-Don't do variable expansion.
+Don't do variable expansion. The \fBWRDSO_NOVARSPLIT\fR option set
+together with this flag prevents variable references from being split
+into separate words (see the \fBOPTIONS\fR section).
.TP
.B WRDSF_ENOMEMABRT
Abort on
@@ -721,7 +829,8 @@ Return delimiters.
.TP
.B WRDSF_SED_EXPR
Treat
-.BR sed (1) expressions as words.
+.BR sed (1)
+expressions as words.
.TP
.B WRDSF_DELIM
.I ws_delim
@@ -792,8 +901,7 @@ See the section
for a detailed discussion.
.TP
.B WRDSF_PATHEXPAND
-Perform pathname and tilde expansion. If this flag is set, the
-\fIws_options\fR member must also be initialized. See the
+Perform pathname and tilde expansion. See the
subsection
.B "Pathname expansion"
for details.
@@ -822,32 +930,60 @@ metacharacters.
.PP
.TP
.B WRDSO_BSKEEP_WORD
-Quote removal: when an unrecognized escape sequence is encountered in a word,
-preserve it on output. If that bit is not set, the backslash is
-removed from such sequences.
+Backslash interpretation: when an unrecognized escape sequence is
+encountered in a word, preserve it on output. If that bit is not set,
+the backslash is removed from such sequences.
.TP
.B WRDSO_OESC_WORD
-Quote removal: handle octal escapes in words.
+Backslash interpretation: handle octal escapes in words.
.TP
.B WRDSO_XESC_WORD
-Quote removal: handle hex escapes in words.
+Backslash interpretation: handle hex escapes in words.
.TP
.B WRDSO_BSKEEP_QUOTE
-Quote removal: when an unrecognized escape sequence is encountered in
-a doubly-quoted string, preserve it on output. If that bit is not
-set, the backslash is removed from such sequences.
+Backslash interpretation: when an unrecognized escape sequence is
+encountered in a doubly-quoted string, preserve it on output. If that
+bit is not set, the backslash is removed from such sequences.
.TP
.B WRDSO_OESC_QUOTE
-Quote removal: handle octal escapes in doubly-quoted strings.
+Backslash interpretation: handle octal escapes in doubly-quoted strings.
.TP
.B WRDSO_XESC_QUOTE
-Quote removal: handle hex escapes in doubly-quoted strings.
+Backslash interpretation: handle hex escapes in doubly-quoted strings.
.TP
.B WRDSO_MAXWORDS
The \fBws_maxwords\fR member is initialized. This is used to control
the number of words returned by a call to \fBwordsplit\fR. For a
detailed discussion, refer to the chapter
.BR "LIMITING THE NUMBER OF WORDS" .
+.TP
+.B WRDSO_NOVARSPLIT
+When \fBWRDSF_NOVAR\fR is set, don't split variable references, even
+if they contain whitespace. E.g.
+.B ${VAR:-foo bar}
+will be treated as a single word.
+.TP
+.B WRDSO_NOCMDSPLIT
+When \fBWRDSF_NOCMD\fR is set, don't split whatever looks like command
+invocation, even if it contains whitespace. E.g.
+.B $(command arg)
+will be treated as a single word.
+.TP
+.B WRDSO_PARAMV
+Positional arguments are supplied in
+.I ws_paramv
+and
+.IR ws_paramc .
+See the subsection
+.B Position