author: Sergey Poznyakoff <gray@gnu.org> 2021-07-31 00:32:38 +0300
committer: Sergey Poznyakoff <gray@gnu.org> 2021-07-31 00:48:02 +0300
commit: 0e79aa787877cdbffc8900952115de9173f41732
tree: db92715d7ec3c5ca90866f68491933f8b069d69c
parent: 257bb9882aaf9ec7acc2401fd875a8d0b0a69c2e
Update the documentation
-rw-r--r-- README_crash_tolerance.txt 197
-rw-r--r-- doc/gdbm.3 9
-rw-r--r-- doc/gdbm.texi 188
-rw-r--r-- doc/gdbmtool.1 13
4 files changed, 190 insertions, 217 deletions
diff --git a/README_crash_tolerance.txt b/README_crash_tolerance.txt
deleted file mode 100644
index 5aaf483..0000000
--- a/README_crash_tolerance.txt
+++ /dev/null
@@ -1,197 +0,0 @@
-
-Crash Tolerance for GNU dbm
-===========================
-
-This file describes a new (as of release 1.21) feature that can be
-enabled at compile time and used in environments with appropriate
-support from the OS (currently Linux) and filesystem (currently XFS,
-BtrFS, and OCFS2). The feature is a "pure opt-in," in the sense that
-it has no effect whatsoever unless it is explicitly enabled at
-compile time and used by applications. It has been tested on
-late-2020-vintage Fedora Linux and XFS.
-
-See the "Drill Bits" column in the July/August 2021 issue of ACM
-_Queue_ magazine for a broader discussion of crash-tolerant GNU dbm.
-If for whatever reason you can't access this column, contact the
-author (Kelly).
-
-Read and thoroughly understand this file before attempting to use the
-new feature. Address questions/feedback to the maintainer(s) and to
-Terence Kelly, tpkelly@{acm.org, cs.princeton.edu, eecs.umich.edu}.
-
-- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
-
-Background:
-
-Historically GNU dbm did not tolerate crashes: An ill-timed crash due
-to a power outage, an operating system kernel panic, or an abnormal
-application process termination could corrupt or destroy data in the
-database file. Corruption is likely if a crash occurs during updates
-to the GDBM database file, e.g., during a gdbm_store() or gdbm_sync()
-call. Therefore GNU dbm was not suitable for applications that
-require the ability to recover an up-to-date consistent state of
-their persistent data following a crash. Such applications resorted
-instead to alternative "transactional" NoSQL data stores such as
-BerkeleyDB or Kyoto Cabinet, or even full-blown SQL databases such as
-MySQL or SQLite. Which is unfortunate if all the application really
-needs is a crash-tolerant GDBM.
-
-New crash-tolerance feature:
-
-GNU dbm now includes an optional crash-tolerance mechanism that, when
-used correctly, guarantees that a consistent recent state of
-application data can be recovered followng a crash. Specifically, it
-guarantees that the state of the database file corresponding to the
-most recent successful gdbm_sync() call can be recovered. Crash
-tolerance must be enabled when the GNU dbm library is compiled, and
-applications must request crash tolerance for each GDBM_FILE by
-calling a new API.
-
-If the new mechanism is used correctly, crashes such as power
-outages, OS kernel panics, and (some) application process crashes
-will be tolerated. Non-tolerated failures include physical
-destruction of storage devices and corruption due to bugs in
-application logic. For example, the new mechanism won't help if a
-pointer bug in your application corrupts gdbm's private in-memory
-data which in turn corrupts the database file.
-
-Using crash tolerance:
-
-(1) The GNU dbm library must be built with an additional C compiler
-#define flag. After unpacking the tarball, from the C-shell command
-line it suffices to do the following before running make:
-
- % setenv CFLAGS -DGDBM_FAILURE_ATOMIC
- % ./configure CFLAGS=-DGDBM_FAILURE_ATOMIC >& configure.out
-
-(2) You must use a filesystem that supports reflink copying.
-Currently XFS, BtrFS, and OCFS2 support reflink. You can create such
-a filesystem if you don't have one already. (Note that reflink
-support may require that special options be specified at the time of
-filesystem creation; this is true of XFS.) The most conventional way
-to create a filesystem is on a dedicated storage device. However it
-is also possible to create a filesystem *within an ordinary file* on
-some other filesystem. For example, executing the following commands
-from the C-shell command line will create a smallish XFS filesystem
-inside a file on an ext4 filesystem:
-
- % mkdir XFS
- % cd XFS
- % sudo truncate --size 512m XFSfile
- % sudo mkfs.xfs -m crc=1 -m reflink=1 XFSfile
- % sudo mkdir XFSmountpoint
- % sudo mount -o loop XFSfile XFSmountpoint
- % sudo xfs_info XFSmountpoint
- % cd XFSmountpoint
- % sudo mkdir test
- % set me = `whoami`':'`whoami`
- % sudo chown $me test
- % cd test
- % echo foo > bar
- % ls -l bar
-
-After executing the commands above, from the diretory where you
-started you should see a directory XFS/XFSmountpoint/test/ where your
-unprivileged user account may create and delete files. Reflink
-copying via ioctl(FICLONE) should work for files in and below this
-directory. You can test reflink copying using the GNU "cp"
-command-line utility: "cp --reflink=always file1 file2". Read the
-manpage for the Linux-specific API "ioctl_ficlone(2)" for additional
-information.
-
-Your GNU dbm database file and two other files described below must
-all reside on the same reflink-capable filesystem.
-
-(3) In your application source code, #define GDBM_FAILURE_ATOMIC
-before you #include <gdbm.h>.
-
-(4) Open a GNU dbm database with gdbm_open(). Unless you know what
-you are doing, do *not* specify the GDBM_SYNC flag when opening the
-database. The reason is that you want your application to explicitly
-control when gdbm_sync() is called; you don't want an implicit sync
-on every database operation.
-
-(5) Request crash tolerance by invoking the following new interface:
-
- gdbm_failure_atomic(GDBM_FILE dbf, const char *even, const char *odd);
-
-"even" and "odd" are the pathnames of two files that will be created
-and filled with snapshots of the database file. These two files must
-*not* exist when gdbm_failure_atomic() is called and must reside on the
-same filesystem as the database file. The filesystem must support
-reflink copying, i.e., ioctl(FICLONE) must work.
-
-After you call gdbm_failure_atomic(), every call to gdbm_sync() will
-make an efficient reflink snapshot of the database file in either the
-"even" or the "odd" snapshot file; consecutive gdbm_sync() calls
-alternate between the two, hence the names. The permission bits and
-last-mod timestamps on the snapshot files determine which one
-contains the state of the database file corresponding to the most
-recent successful gdbm_sync(). Post-crash recovery is described
-below.
-
-(6) When your application knows that the state of the database is
-consistent (i.e., all relevant application-level invariants hold),
-you may call gdbm_sync(). For example, if your application manages
-bank accounts, transferring money from one account to another should
-maintain the invariant that the sum of the two accounts is the same
-before and after the transfer: It is correct to decrement account A
-by $7, increment account B by $7, and then call gdbm_sync(). However
-it is *not* correct to call gdbm_sync() *between* the decrement of A
-and the increment of B, because a crash immediately after that call
-would destroy money. The general rule is simple, sensible, and
-memorable: Call gdbm_sync() only when the database is in a state from
-which you are willing and able to recover following a crash. (If you
-think about it you'll realize that there's never any other moment
-when you'd really want to call gdbm_sync(), regardless of whether
-crash-tolerance is enabled. Why on earth would you push the state of
-an inconsistent unrecoverable database down to durable media?).
-
-(7) If a crash occurs, the snapshot file ("even" or "odd") containing
-the database state reflecting the most recent successful gdbm_sync()
-call is the snapshot file whose permission bits are read-only and
-whose last-modification timestamp is greatest. If both snapshot
-files are readable, we choose the one with the most recent
-last-modification timestamp. Following a crash, *do not* do anything
-that could change the file permissions or last-mod timestamp on
-either snapshot file!
-
-The gdbm_latest() function takes two filename arguments---the "even"
-and "odd" snapshot filenames---and tells you which is the most recent
-readable file. That's the snapshot file that should replace the
-original database file, which may have been corrupted by the crash.
-
-Return values:
-
-Both new functions, gdbm_failure_atomic() and gdbm_latest(), pinpoint
-mishaps by returning the *negation* of the source code line number on
-which something went wrong: "return (-1 * __LINE__)". So to diagnose
-problems, "use the Source, Luke!"
-
-Note that the values returned by the gdbm_sync() function may change
-as a result of enabling crash tolerance. Applications unprepared for
-the new return values might become confused.
-
-Performance:
-
-The purpose of a parachute is not to hasten descent. Crash tolerance
-is a safety mechanism, not a performance accelerator. Reflink
-copying is designed to be as efficient as possible, but making
-snapshots of the GNU dbm database file on every gdbm_sync() call
-entails overheads. The performance impact of GDBM crash tolerance
-will depend on many factors including the type and configuration of
-the underlying storage system, how often the application calls
-gdbm_sync(), and the extent of changes to the database file between
-consecutive calls to gdbm_sync().
-
-Availability:
-
-To ensure that application data can survive the failure of one or
-more storage devices, replicated storage (e.g., RAID) may be used
-beneath the reflink-capable filesystem. Some cloud providers offer
-block storage services that mimic the interface of individual storage
-devices but that are implemented as high-availability fault-tolerant
-replicated distributed storage systems. Installing a reflink-capable
-filesystem atop a high-availability storage system is a good starting
-point for a high-availability crash-tolerant GDBM.
-
diff --git a/doc/gdbm.3 b/doc/gdbm.3
index 963b9f0..6f569dc 100644
--- a/doc/gdbm.3
+++ b/doc/gdbm.3
@@ -13,7 +13,7 @@
.\"
.\" You should have received a copy of the GNU General Public License
.\" along with GDBM. If not, see <http://www.gnu.org/licenses/>. */
-.TH GDBM 3 "June 25, 2021" "GDBM" "GDBM User Reference"
+.TH GDBM 3 "July 31, 2021" "GDBM" "GDBM User Reference"
.SH NAME
GDBM \- The GNU database manager. Includes \fBdbm\fR and \fBndbm\fR
compatibility.
@@ -446,9 +446,10 @@ of the underlying database. This mechanism requires OS and
filesystem support and must be requested when \fBgdbm\fR is compiled.
The crash-tolerance mechanism is a "pure opt-in" feature, in the
sense that it has no effects whatsoever except on those applications
-that explicitly request it. See file "README_crash_tolerance.txt"
-in the distribution tarball for details.
-
+that explicitly request it. For details, see the chapter
+.B "Crash Tolerance"
+in the
+.BR "GDBM manual" .
.SH LINKING
This library is accessed by specifying \fI\-lgdbm\fR as the last
parameter to the compile line, e.g.:
diff --git a/doc/gdbm.texi b/doc/gdbm.texi
index 84cc3aa..7a9198c 100644
--- a/doc/gdbm.texi
+++ b/doc/gdbm.texi
@@ -107,6 +107,7 @@ Functions:
* Sequential:: Sequential access to records.
* Reorganization:: Database reorganization.
* Sync:: Insure all writes to disk have competed.
+* Database format:: GDBM database formats.
* Flat files:: Export and import to Flat file format.
* Errors:: Error handling.
* Recovery:: Recovery from fatal errors.
@@ -404,9 +405,6 @@ the database and wants it created if it does not already exist. If
created, regardless of whether one existed, and wants read and write
access to the new database.
-@kwindex GDBM_SYNC
-@kwindex GDBM_NOLOCK
-@kwindex GDBM_NOMMAP
The following constants may also be logically or'd into the database
flags:
@@ -423,6 +421,14 @@ A reverse of @code{GDBM_SYNC}. Synchronize writes only when needed.
This is the default. The flag is provided for compatibility with
previous versions of @command{GDBM}.
+@kwindex GDBM_NUMSYNC
+@item GDBM_NUMSYNC
+Useful only together with @code{GDBM_NEWDB}, this bit instructs
+@code{gdbm_open} to create new database in @dfn{extended database
+format}, suitable for effective crash recovery. @xref{Numsync}, for a
+detailed discussion of this format, and @ref{Crash Tolerance}, for a
+discussion of crash recovery.
+
@kwindex GDBM_NOLOCK
@item GDBM_NOLOCK
Don't lock the database file. Use this flag if you intend to do
@@ -870,6 +876,46 @@ immediately after the set of changes have been made.
describing the error and returns -1.
@end deftypefn
+@node Database format
+@chapter Changing database format
+As of version @value{VERSION}, @command{GDBM} supports databases in
+two formats: @dfn{standard} and @dfn{extended}. The standard format
+is used most often. The @dfn{extended} database format is used to
+provide additional crash resistance (@pxref{Crash Tolerance}).
+
+Depending on the value of the @var{flags} parameter in a call to
+@code{gdbm_open} (@pxref{Open}), a database can be created in either
+format.
+
+The format of an existing database can be changed using the
+@code{gdbm_convert} function:
+
+@deftypefn {gdbm interface} int gdbm_convert (GDBM_FILE @var{dbf}, @
+ int @var{flag})
+Changes the format of the database file @var{dbf}. Allowed values for
+@var{flag} are:
+
+@table @code
+@item 0
+Convert database to the standard format.
+
+@kwindex GDBM_NUMSYNC
+@item GDBM_NUMSYNC
+Convert database to the extended @dfn{numsync} format (@pxref{Numsync}).
+@end table
+
+On success, the function returns 0. In this case, it should be
+followed by a call to @code{gdbm_sync} (@pxref{Sync}) or
+@code{gdbm_close} (@pxref{Close}) to ensure the changes are written to
+the disk.
+
+On error, returns -1 and sets the @code{gdbm_errno} variable
+(@pxref{Variables, gdbm_errno}).
+
+If the database is already in the requested format, the function
+returns success (0) without doing anything.
+@end deftypefn
+
@node Flat files
@chapter Export and Import
@cindex Flat file format
@@ -1345,11 +1391,11 @@ support from the OS and the filesystem. As of version
@value{VERSION}, this means a Linux kernel 5.12.12 or later and
a filesystem that supports reflink copying, such as XFS, BtrFS, or
OCFS2. If these prerequisites are met, crash tolerance code will
-be enabled automaticaly by the @command{configure} script when
+be enabled automatically by the @command{configure} script when
building the package.
The crash-tolerance mechanism, when used correctly, guarantees that a
-consistent recent state of application data can be recovered followng
+consistent recent state of application data can be recovered following
a crash. Specifically, it guarantees that the state of the database
file corresponding to the most recent successful gdbm_sync() call can
be recovered.
@@ -1359,7 +1405,7 @@ outages, OS kernel panics, and (some) application process crashes
will be tolerated. Non-tolerated failures include physical
destruction of storage devices and corruption due to bugs in
application logic. For example, the new mechanism won't help if a
-pointer bug in your application corrupts gdbm's private in-memory
+pointer bug in your application corrupts @command{GDBM} private in-memory
data which in turn corrupts the database file.
To enable crash tolerance in your application, follow these steps.
@@ -1391,7 +1437,7 @@ The XFS filesystem is now available in directory
unprivileged user account may create and delete files:
@example
-mkdir XFSmountpoint
+cd XFSmountpoint
mkdir test
chown @var{user}:@var{group} test
@end example
@@ -1415,11 +1461,14 @@ all reside on the same reflink-capable filesystem.
@heading Enabling crash tolerance
-Open a GNU dbm database with @code{gdbm_open}. Unless you know what
-you are doing, do not specify the @code{GDBM_SYNC} flag when opening the
-database. The reason is that you want your application to explicitly
-control when @code{gdbm_sync} is called; you don't want an implicit sync
-on every database operation.
+Open a GNU dbm database with @code{gdbm_open}. Whenever possible, use
+the extended @command{GDBM} format. Generally speaking, this means
+using the @code{GDBM_NUMSYNC} flag when creating the database
+(@pxref{Numsync}). Unless you know what you are doing, do not specify
+the @code{GDBM_SYNC} flag when opening the database. The reason is that
+you want your application to explicitly control when @code{gdbm_sync}
+is called; you don't want an implicit sync on every database
+operation.
Request crash tolerance by invoking the following interface:
@@ -1470,9 +1519,11 @@ containing the database state reflecting the most recent successful
@code{gdbm_sync} call is the snapshot file whose permission bits are
read-only and whose last-modification timestamp is greatest. If both
snapshot files are readable, we choose the one with the most recent
-last-modification timestamp. Following a crash, @emph{do not} do
-anything that could change the file permissions or last-mod timestamp on
-either snapshot file!
+last-modification timestamp@footnote{The experimental @dfn{numsync}
+extension is provided to handle such case gracefully. @xref{Numsync},
+for details.}. Following a crash, @emph{do not} do anything that
+could change the file permissions or last-mod timestamp on either
+snapshot file!
The @code{gdbm_latest_snapshot} function is provided, that selects the
right snapshot among the two. Invoke it as:
@@ -1502,6 +1553,19 @@ switch (gdbm_latest_snapshot (even, odd, &recovery_file))
case GDBM_SNAPSHOT_SAME:
fprintf (stderr, "Both snapshots have the same date!\n);
exit (1);
+
+ case GDBM_SNAPSHOT_SUSPICIOUS:
+ /*
+ * That can occur only in databases with extended numsync header
+ * enabled. @xref{Numsync}.
+ */
+ fprintf (stderr, "returned snapshot %s is suspicious\n", recovery_file);
+ fprintf (stderr, "examine it and take action\n");
+ /*
+ * Switch to interactive mode letting the user examine the
+ * snapshot and take appropriate action
+ */
+
@}
@end group
@end example
@@ -1529,6 +1593,76 @@ replicated distributed storage systems. Installing a reflink-capable
filesystem atop a high-availability storage system is a good starting
point for a high-availability crash-tolerant GDBM.
+@node Numsync
+@section Numsync Extension
+
+In @ref{Crash recovery}, we have shown that for database recovery,
+one should select the snapshot whose permission bits are read-only and
+whose last-modification timestamp is greatest. However, there may be
+cases when a crash occurs at such a time that both snapshot files
+remain readable. It may also happen, that their permissions and/or
+modification times are inadvertently changed before recovery. To
+make it possible to select the right snapshot in such cases, a new
+@dfn{extended database format} was introduced in @command{GDBM}
+version 1.21. This format adds to the database header the
+@code{numsync} field, that holds the number of synchronizations the
+database underwent before being closed or abandoned due to a crash.
+
+Each snapshot is an exact copy of the database at a given point of
+time. Thus, if both snapshots of a database in extended format are
+readable, it will suffice to examine their @code{numsync} counters
+and select the one whose @code{numsync} is greater. That's what
+the @code{gdbm_latest_snapshot} function does in this case.
+
+It is worth noticing, that the two counters should differ exactly by
+one. If the difference is greater than that, @code{gdbm_latest_snapshot}
+will still select the snapshot with the greater @code{numsync} value,
+but will return a special status code, @code{GDBM_SNAPSHOT_SUSPICIOUS},
+indicating that the proposed snapshot file has been chosen based on
+suspicious or unreliable data. If, during a recovery attempt, you get
+this status code, we recommend to proceed with the manual recovery,
+e.g. by examining both snapshot files using @command{gdbmtool -r}
+(@pxref{gdbmtool}).
+
+To create a database in extended format, call @code{gdbm_open} with
+both @code{GDBM_NEWDB} and @code{GDBM_NUMSYNC} flags:
+
+@example
+dbf = gdbm_open(dbfile, 0, GDBM_NEWDB|GDBM_NUMSYNC, 0600, NULL);
+@end example
+
+@noindent
+Notice, that this flag must always be used together with
+@code{GDBM_NEWDB} (@pxref{Open}).
+
+A standard @command{GDBM} database can be converted to the extended
+format. To convert an existing database to the extended format, use the
+@code{gdbm_convert} function (@pxref{Database format}):
+
+@example
+ rc = gdbm_convert(dbf, GDBM_NUMSYNC);
+@end example
+
+You can do the same using the @command{gdbmtool} utility
+(@pxref{commands, upgrade}):
+
+@example
+gdbmtool @var{dbname} upgrade
+@end example
+
+The conversion is reversible. To convert a database from extended
+format back to the standard @command{GDBM} format, do:
+
+@example
+ rc = gdbm_convert(dbf, 0);
+@end example
+
+To do the from the command line:
+
+@example
+gdbmtool @var{dbname} downgrade
+@end example
+
@node Crash Tolerance API
@section Crash Tolerance API
@@ -1581,6 +1715,16 @@ select between the two snapshots (this means they are both readable
and have exactly the same @code{mtime} timestamp), the function returns
@code{GDBM_SNAPSHOT_SAME}.
+@kwindex GDBM_SNAPSHOT_SUSPICIOUS
+If the @samp{numsync} extension is enabled (@pxref{Numsync}), the
+function can also return the @code{GDBM_SNAPSHOT_SUSPICIOUS} status
+code. This happens when the @code{numsync} counters in the two
+snapshots differ by more than one. In this case, the function selects
+the snapshot with the greater @code{numsync} value. If you get this
+status code when recovering from a crash, it is recommended to switch
+to manual recovery procedure, letting the user examine the snapshots
+and take the appropriate action.
+
If any value other than @code{GDBM_SNAPSHOT_OK} is returned, it is
guaranteed that the function don't touch @var{retval}.
@end deftypefn
@@ -2911,6 +3055,11 @@ Delete record with the given @var{key}
Print hash directory.
@end deffn
+@deffn {command verb} downgrade
+Downgrade the database from extended to the standard database format.
+@xref{Numsync}.
+@end deffn
+
@anchor{gdbmtool export}
@deffn {command verb} export @var{file-name} [truncate] [binary|ascii]
Export the database to the flat file @var{file-name}. @xref{Flat files},
@@ -3077,6 +3226,15 @@ Store the @var{data} with @var{key} in the database. If @var{key}
already exists, its data will be replaced.
@end deffn
+@deffn {command verb} sync
+Synchronize the database with the disk storage (@pxref{Sync}).
+@end deffn
+
+@deffn {command verb} upgrade
+Upgrade the database from standard to extended database format.
+@xref{Numsync}.
+@end deffn
+
@deffn {command verb} version
Print the version of @command{gdbm}.
@end deffn
diff --git a/doc/gdbmtool.1 b/doc/gdbmtool.1
index d15b7cd..20c7c27 100644
--- a/doc/gdbmtool.1
+++ b/doc/gdbmtool.1
@@ -13,7 +13,7 @@
.\"
.\" You should have received a copy of the GNU General Public License
.\" along with GDBM. If not, see <http://www.gnu.org/licenses/>. */
-.TH GDBMTOOL 1 "June 27, 2018" "GDBM" "GDBM User Reference"
+.TH GDBMTOOL 1 "July 31, 2021" "GDBM" "GDBM User Reference"
.SH NAME
gdbmtool \- examine and modify a GDBM database
.SH SYNOPSIS
@@ -179,6 +179,10 @@ Delete record with the given \fIKEY\fR.
.BR dir
Print hash directory.
.TP
+.BR downgrade
+Downgrade the database from the extended \fInumsync\fR format to the
+standard format.
+.TP
\fBexport\fR \fIFILE\-NAME\fR [\fBtruncate\fR] [\fBbinary\fR|\fBascii\fR]
Export the database to the flat file \fIFILE\-NAME\fR. This is equivalent to
.BR gdbm_dump (1).
@@ -270,6 +274,13 @@ Print current program status.
Store the \fIDATA\fR with the given \fIKEY\fR in the database. If the
\fIKEY\fR already exists, its data will be replaced.
.TP
+.B sync
+Synchronize the database file with the disk storage.
+.TP
+.B upgrade
+Upgrade the database from the standard to the extended \fInumsync\fR
+format.
+.TP
\fBunset\fR \fIVARIABLE\fR...
Unsets listed variables.
.TP
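
The snapshot selection rule that this patch documents (read-only permission bits, then the numsync counter, then the last-modification time) can be modelled in a few lines of C. This is only a sketch under stated assumptions, not the library's implementation: `struct snapshot_info`, `enum pick_status`, and `pick_snapshot` are hypothetical names invented for illustration and are not part of the GDBM API. The real entry point is `gdbm_latest_snapshot`. The sketch ignores counter wrap-around. It also reports the chosen snapshot in the suspicious case, as the Numsync section describes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical summary of one snapshot file; not a GDBM type. */
struct snapshot_info {
    bool readable;      /* permission bits are read-only (snapshot completed) */
    bool has_numsync;   /* database is in the extended (numsync) format */
    unsigned numsync;   /* numsync counter read from the header */
    time_t mtime;       /* last-modification time */
};

/* Hypothetical status codes, loosely mirroring GDBM_SNAPSHOT_*. */
enum pick_status { PICK_OK, PICK_BAD, PICK_SAME, PICK_SUSPICIOUS };

/*
 * Choose between the "even" (0) and "odd" (1) snapshots.
 * *choice is written only for PICK_OK and PICK_SUSPICIOUS.
 */
enum pick_status
pick_snapshot(const struct snapshot_info *even,
              const struct snapshot_info *odd, int *choice)
{
    assert(even != NULL && odd != NULL && choice != NULL);

    /* Neither snapshot was completed: nothing to recover from. */
    if (!even->readable && !odd->readable)
        return PICK_BAD;

    /* Exactly one snapshot is readable: it is the only candidate. */
    if (even->readable != odd->readable) {
        *choice = even->readable ? 0 : 1;
        return PICK_OK;
    }

    /* Both readable: prefer the numsync counters when they are available. */
    if (even->has_numsync && odd->has_numsync
        && even->numsync != odd->numsync) {
        bool odd_newer = odd->numsync > even->numsync;
        unsigned gap = odd_newer ? odd->numsync - even->numsync
                                 : even->numsync - odd->numsync;
        *choice = odd_newer ? 1 : 0;
        /* Consecutive syncs should differ by exactly one. */
        return gap == 1 ? PICK_OK : PICK_SUSPICIOUS;
    }

    /* Fall back to modification times. */
    if (even->mtime == odd->mtime)
        return PICK_SAME;
    *choice = odd->mtime > even->mtime ? 1 : 0;
    return PICK_OK;
}
```

A caller would fill both structures from `stat` and the database headers, then treat any result other than `PICK_OK` as a signal to stop and examine the snapshots manually, for example with `gdbmtool -r`.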
