|
|
|
The implementation of _gdbm_cache_flush becomes prohibitively
inefficient during extensive updates of large databases. The
bug was reported at https://github.com/Perl/perl5/issues/19306.
To fix it, make sure that all changed cache entries are placed at
the head of the cache_mru list, forming a contiguous sequence.
This way a potentially long iteration over all cache entries can be
cut off at the first entry with ca_changed == FALSE.
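
A minimal sketch of the idea, not the actual GDBM code: as long as changed
entries always form a contiguous run at the head of the MRU list, the
flush loop can stop at the first clean entry. The types and function names
below (struct elem, struct cache, make_current, mark_current_changed) are
hypothetical stand-ins for struct cache_elem, the cache_mru list and the
ca_changed flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a cache element. */
struct elem
{
  struct elem *prev, *next;   /* neighbours in the MRU list */
  bool changed;               /* plays the role of ca_changed */
};

struct cache
{
  struct elem *mru;           /* head: most recently used (current) */
  struct elem *lru;           /* tail: least recently used */
};

static void
unlink_elem (struct cache *c, struct elem *e)
{
  if (e->prev) e->prev->next = e->next; else c->mru = e->next;
  if (e->next) e->next->prev = e->prev; else c->lru = e->prev;
  e->prev = e->next = NULL;
}

/* Link E at the head.  To keep the invariant, a new element linked
   ahead of changed ones must itself be changed. */
static void
link_head (struct cache *c, struct elem *e)
{
  e->prev = NULL;
  e->next = c->mru;
  if (c->mru) c->mru->prev = e; else c->lru = e;
  c->mru = e;
}

/* Flush changed elements and return their number.  The loop ends at
   the first clean element instead of walking the whole list. */
static int
cache_flush (struct cache *c)
{
  int n = 0;
  struct elem *e;

  for (e = c->mru; e && e->changed; e = e->next)
    {
      /* A real implementation would write the bucket to disk here. */
      e->changed = false;
      n++;
    }
  return n;
}

/* Make E the current element.  If E is clean, flush first, so that a
   clean element never ends up in front of changed ones.  Returns the
   number of elements flushed. */
static int
make_current (struct cache *c, struct elem *e)
{
  int flushed = 0;

  if (c->mru == e)
    return 0;
  if (!e->changed)
    flushed = cache_flush (c);
  unlink_elem (c, e);
  link_head (c, e);
  return flushed;
}

/* Only the current (head) element is ever marked as changed. */
static void
mark_current_changed (struct cache *c)
{
  assert (c->mru != NULL);
  c->mru->changed = true;
}
```

With this arrangement the cost of a flush is proportional to the number of
changed entries rather than to the cache size.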
This commit also gets rid of several superfluous fields in
struct gdbm_file_info:
- cache_entry
Not needed, because the most recently used cache entry
(cache_mru) is always the current one.
- bucket_changed
dbf->cache_mru->ca_changed reflects the status of the current
bucket.
- second_changed
Not needed because _gdbm_cache_flush, which flushes all changed
buckets, is now invoked unconditionally by _gdbm_end_update (and
also whenever dbf->cache_mru changes).
* src/gdbmdefs.h (struct gdbm_file_info): Remove cache_entry. The
current cache entry is cache_mru.
Remove bucket_changed and second_changed.
All uses changed.
* src/proto.h (_gdbm_current_bucket_changed): New inline function.
* src/bucket.c (_gdbm_cache_flush): Assume all changed elements form
a contiguous sequence beginning with dbf->cache_mru.
(set_cache_entry): Remove. All callers changed.
(lru_link_elem,lru_unlink_elem): Update dbf->bucket as necessary.
(cache_lookup): If the obtained bucket is not changed and is going
to become current, flush all changed cache elements.
* src/update.c (_gdbm_end_update): Call _gdbm_cache_flush unconditionally.
* src/findkey.c: Use dbf->cache_mru instead of the removed dbf->cache_entry.
* src/gdbmseq.c: Likewise.
* tools/gdbmshell.c (_gdbm_print_bucket_cache): Likewise.
* src/falloc.c: Use _gdbm_current_bucket_changed to mark the current
bucket as changed.
* src/gdbmstore.c: Likewise.
* src/gdbmdelete.c: Likewise.
* tests/gtcacheopt.c: Fix typo.
* tests/gtload.c: New option: -cachesize
|
|
|
|
* src/cachetree.c: Remove.
* src/Makefile.am: Remove cachetree.c
* doc/gdbm.texi: Document the changes.
* src/bucket.c (cache_tab_lookup_slot)
(cache_tab_resize): New functions.
(cache_elem_new): Initialize ca_coll.
(cache_elem_free, cache_lookup)
(_gdbm_cache_init,_gdbm_cache_free): Rewrite with hash-based cache lookup.
(_gdbm_fetch_data): Remove unused function.
* src/gdbm.h.in (GDBM_GETDBFORMAT, GDBM_GETDIRDEPTH)
(GDBM_GETBUCKETSIZE, GDBM_GETCACHEAUTO, GDBM_SETCACHEAUTO): New option codes.
* src/gdbmdefs.h (cache_node): Remove.
(cache_elem): Remove ca_node. Add ca_coll (collision resolution pointer).
(gdbm_file_info): New members: cache_auto, cache_bits, cache.
* src/gdbmopen.c (gdbm_fd_open): Change cache initialization.
* src/gdbmsetopt.c (GDBM_GETDBFORMAT,GDBM_GETDIRDEPTH)
(GDBM_GETBUCKETSIZE,GDBM_GETCACHEAUTO)
(GDBM_SETCACHEAUTO): Implement new options.
(setopt_gdbm_getflags): Reflect the state of GDBM_CLOEXEC and GDBM_NUMSYNC.
* src/proto.h (_gdbm_fetch_data,_gdbm_cache_tree_alloc)
(_gdbm_cache_tree_destroy,_gdbm_cache_tree_delete)
(_gdbm_cache_tree_lookup): Remove protos.
* src/recover.c (_gdbm_finish_transfer): Restore original cache settings.
* tests/Makefile.am: Add new test.
* tests/testsuite.at: Likewise.
* tests/gtcacheopt.c: New file.
* tests/setopt02.at: New test case.
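
For illustration only, a hash-based lookup with chained collision
resolution might look like the sketch below. It is not the code from
src/bucket.c: the names (struct hentry, struct htable, htable_*), the
multiplicative hash function and the limit on the table size are all
assumptions. Only the roles of the coll pointer (cf. ca_coll) and of the
bits field (cf. cache_bits) follow the entries above.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical cache element keyed by bucket address. */
struct hentry
{
  int64_t adr;            /* bucket address, the lookup key */
  struct hentry *coll;    /* next element in the same slot (cf. ca_coll) */
};

struct htable
{
  unsigned bits;          /* the table has 1 << bits slots (cf. cache_bits) */
  struct hentry **slot;
};

/* Fibonacci multiplicative hash; the choice of hash is an assumption. */
static size_t
hash_adr (int64_t adr, unsigned bits)
{
  assert (bits >= 1 && bits < 64);
  uint64_t h = (uint64_t) adr * UINT64_C (11400714819323198485);
  return (size_t) (h >> (64 - bits));
}

/* Allocate an empty table.  The upper limit of 16 bits is arbitrary. */
static int
htable_init (struct htable *t, unsigned bits)
{
  if (bits < 1 || bits > 16)
    return -1;
  t->bits = bits;
  t->slot = calloc ((size_t) 1 << bits, sizeof t->slot[0]);
  return t->slot ? 0 : -1;
}

/* Walk the collision chain of the slot ADR hashes to. */
static struct hentry *
htable_lookup (struct htable *t, int64_t adr)
{
  struct hentry *e;

  for (e = t->slot[hash_adr (adr, t->bits)]; e; e = e->coll)
    if (e->adr == adr)
      return e;
  return NULL;
}

/* Push E onto the front of its collision chain. */
static void
htable_insert (struct htable *t, struct hentry *e)
{
  struct hentry **head = &t->slot[hash_adr (e->adr, t->bits)];

  e->coll = *head;
  *head = e;
}

/* Unlink E from its collision chain, if present. */
static void
htable_remove (struct htable *t, struct hentry *e)
{
  struct hentry **pp = &t->slot[hash_adr (e->adr, t->bits)];

  while (*pp && *pp != e)
    pp = &(*pp)->coll;
  if (*pp)
    {
      *pp = e->coll;
      e->coll = NULL;
    }
}
```

Compared with a balanced tree, a table like this needs no rebalancing on
insert or delete, at the price of having to resize when it fills up.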
|
|
* src/recover.c (_gdbm_finish_transfer): Remove call to _gdbmsync_done.
* doc/gdbm.texi: Reflect the changes.
|
|
* src/recover.c (_gdbm_finish_transfer): Reuse memory mapping
from the intermediate dbm structure.
|
|
* src/gdbmdefs.h (SAVE_ERRNO): Preserve both gdbm_errno and errno.
* src/recover.c (_gdbm_finish_transfer): Transfer all cache fields
(cache_mru was missing).
|
|
* src/recover.c (_gdbm_finish_transfer): Close snapshot descriptors,
if any.
Restore xheader, avail, and avail_size members.
|
|
These address https://puszcza.gnu.org.ua/bugs/?503
* src/gdbmdefs.h (gdbm_avail_block_valid_p): Remove.
* src/gdbmopen.c (gdbm_avail_block_validate): Use inline conditional
instead of gdbm_avail_block_valid_p.
(gdbm_fd_open): Revert to reading master avail_block in two passes (as
was before fd5cf245ea).
(validate_header): Add back master avail block consistency check.
* src/gdbmtool.c (_gdbm_avail_list_size): Use _gdbm_avail_block_read.
* src/recover.c (_gdbm_finish_transfer): Reset dbf->file_size.
|
|
* src/recover.c (_gdbm_finish_transfer): Free the cache.
|
|
|
|
|
|
The new bucket cache uses the least recently used replacement
policy (instead of the previously implemented least recently read).
It also allows for quick bucket lookups by the corresponding
disk address. To this effect the cache entries form a red-black
tree sorted by bucket address.
Additionally, data buckets are also cached.
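
The replacement policy can be sketched as follows. This is a simplified
illustration, not the code of this commit: the names (struct lru_elem,
struct lru_cache, lru_get, CACHE_MAX) are hypothetical, and a linear scan
of the list stands in for the red-black tree lookup by address.

```c
#include <assert.h>
#include <stddef.h>

#define CACHE_MAX 3   /* hypothetical, tiny on purpose */

struct lru_elem
{
  long adr;                       /* bucket address */
  struct lru_elem *prev, *next;   /* neighbours in the usage list */
};

struct lru_cache
{
  struct lru_elem pool[CACHE_MAX];
  size_t used;                    /* number of pool slots handed out */
  struct lru_elem *mru, *lru;     /* head and tail of the usage list */
};

static void
lru_unlink (struct lru_cache *c, struct lru_elem *e)
{
  if (e->prev) e->prev->next = e->next; else c->mru = e->next;
  if (e->next) e->next->prev = e->prev; else c->lru = e->prev;
}

static void
lru_link_head (struct lru_cache *c, struct lru_elem *e)
{
  e->prev = NULL;
  e->next = c->mru;
  if (c->mru) c->mru->prev = e; else c->lru = e;
  c->mru = e;
}

/* Return the element for ADR and make it the most recently used one.
   *HIT is set to 1 if ADR was already cached, to 0 otherwise.  On a
   miss with a full cache the least recently used element is reused. */
static struct lru_elem *
lru_get (struct lru_cache *c, long adr, int *hit)
{
  struct lru_elem *e;

  assert (hit != NULL);
  /* The linear scan stands in for the lookup by disk address. */
  for (e = c->mru; e; e = e->next)
    if (e->adr == adr)
      break;
  *hit = e != NULL;
  if (e)
    lru_unlink (c, e);
  else if (c->used < CACHE_MAX)
    e = &c->pool[c->used++];
  else
    {
      e = c->lru;                 /* evict the least recently used */
      lru_unlink (c, e);
    }
  e->adr = adr;
  lru_link_head (c, e);
  return e;
}
```

The cache structure must be zero-initialized before the first call, e.g.
with struct lru_cache c = { 0 }.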
* README: Describe the new branch.
* src/bucket.c: Rewrite cache support.
* src/cachetree.c: New file.
* src/Makefile.am: Add new file.
* src/findkey.c (_gdbm_read_entry): Use _gdbm_fetch_data.
This ensures data pages are cached as well as buckets.
* src/gdbm.h.in (GDBM_BUCKET_CACHE_CORRUPTED): New error code.
(gdbm_cache_stat): New struct.
(gdbm_get_cache_stats): New proto.
* src/gdbmclose.c (gdbm_close): Call _gdbm_cache_free to dispose
of the cache.
* src/gdbmdefs.h (cache_elem_color): New data type.
(cache_elem): New members: ca_left, ca_right, ca_node, and
ca_hits.
(cache_tree): New typedef.
(gdbm_file_info): Remove bucket_cache and last_read.
New fields: cache_num, cache_tree, cache_mru, cache_lru,
cache_avail, cache_access_count.
* src/gdbmerrno.c: Handle GDBM_BUCKET_CACHE_CORRUPTED.
* src/gdbmopen.c (gdbm_fd_open): Change cache initialization.
(_gdbm_init_cache, _gdbm_cache_entry_invalidate): Remove.
* src/gdbmsetopt.c (setopt_gdbm_setcachesize): Cache can be
re-initialized on the fly.
* src/gdbmtool.c: Change bucket printing routines.
* src/proto.h (_gdbm_read_bucket_at): Remove.
(_gdbm_fetch_data,_gdbm_cache_init,_gdbm_cache_free)
(_gdbm_cache_flush,_gdbm_cache_elem_new)
(_gdbm_cache_tree_alloc,_gdbm_cache_tree_destroy)
(_gdbm_cache_tree_delete,_gdbm_rbt_remove_node)
(_gdbm_cache_tree_lookup): New protos.
(_gdbm_init_cache,_gdbm_cache_entry_invalidate): Remove.
* src/recover.c (_gdbm_finish_transfer): Adapt to the new
cache structure.
* src/update.c: Likewise.
* tests/setopt00.at: Fix second GDBM_SETCACHESIZE test.
|
|
|
|
* src/recover.c (_gdbm_finish_transfer): Preserve locking type.
|
|
* src/gdbmopen.c (validate_header): Return GDBM_NEED_RECOVERY
if next_block is invalid.
(_gdbm_validate_header): New function.
(gdbm_fd_open): Set need_recovery depending on return from validate_header.
(gdbm_open): Bail out on invalid value of GDBM_OPENMASK bits.
* src/proto.h (_gdbm_validate_header): New proto.
* src/recover.c (check_db): Re-validate the header.
* src/gdbmtool.c (export_handler): Fix option processing.
|
|
* src/recover.c (backup_name): Fix memory overwrite.
* src/gdbmtool.c (recover_handler): New option "force".
|
|
Rename: __read to gdbm_file_read
__write to gdbm_file_write
__lseek to gdbm_file_seek
__fsync to gdbm_file_sync
|
|
|
|
* NEWS: Update.
* THANKS: Update.
* src/bucket.c (_gdbm_get_bucket): Check if directory entry is
valid. Don't cache invalid buckets.
* src/gdbm.h.in (GDBM_BAD_DIR_ENTRY): New error code.
* src/gdbmerrno.c: Likewise.
* src/gdbmopen.c (validate_header): Compute expected
number of bucket elements based on the bucket size, not on
the block size.
(_gdbm_init_cache_entry): New function.
* src/proto.h (_gdbm_init_cache_entry): New proto.
* src/recover.c (gdbm_recover): Clear error state after return
from check_db indicating failure.
|
|
* Makefile.am (set-dist-date): New rule.
(dist-hook): Catch FIXMEs in NEWS.
* NEWS: Updated.
* src/findkey.c (gdbm_bucket_element_valid_p): New function.
(_gdbm_read_entry): Validate the retrieved bucket element.
* src/gdbm.h.in (gdbm_recovery): New member: duplicate_keys.
(GDBM_BAD_HASH_TABLE): New error code.
* src/gdbmdefs.h (TYPE_WIDTH,SIGNED_TYPE_MAXIMUM)
(OFF_T_MAX): New defines.
(off_t_sum_ok): New function.
(gdbm_bucket_element_valid_p): New prototype.
* src/gdbmerrno.c: Support for GDBM_BAD_HASH_TABLE code.
* src/gdbmtool.c (recover_handler): Fix argument counting.
New argument 'summary' prints statistics summary at the end
of the run.
(export_handler,import_handler): Fix argument counting.
* src/mmap.c (SUM_FILE_SIZE): Rewrite as inlined function.
Add error checking.
(_gdbm_mapped_remap): More error checking.
* src/recover.c (run_recovery): Don't bail out on GDBM_CANNOT_REPLACE.
(gdbm_recover): Initialize duplicate_keys.
* src/systems.h: Include limits.h
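
An overflow-safe check of the kind that off_t_sum_ok provides could be
written along these lines. The macro and function bodies are assumptions
reconstructed from the names listed above, not copied from
src/gdbmdefs.h, and off_t_sum is a purely hypothetical helper added to
show the intended use.

```c
#include <assert.h>
#include <limits.h>
#include <sys/types.h>

/* Width of type T in bits. */
#define TYPE_WIDTH(t) (sizeof (t) * CHAR_BIT)

/* Maximum value of signed type T, computed without overflowing T. */
#define SIGNED_TYPE_MAXIMUM(t) \
  ((t) ((((t) 1 << (TYPE_WIDTH (t) - 2)) - 1) * 2 + 1))

#define OFF_T_MAX SIGNED_TYPE_MAXIMUM (off_t)

/* Return 1 if A + B can be computed without overflow.  Both operands
   must be non-negative, as file offsets and sizes are. */
static int
off_t_sum_ok (off_t a, off_t b)
{
  return a >= 0 && b >= 0 && OFF_T_MAX - a >= b;
}

/* Hypothetical helper: add two offsets already known to be safe. */
static off_t
off_t_sum (off_t a, off_t b)
{
  assert (off_t_sum_ok (a, b));
  return a + b;
}
```

The check is done by subtraction from the maximum, so the potentially
overflowing sum is never evaluated.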
|
|
|
|
|
|
Use the GDBM_SET_ERRNO and GDBM_SET_ERRNO2 macros to make
sure the error gets reported in debug output.
* src/fullio.c (_gdbm_full_read)
(_gdbm_full_write): Return -1 and set gdbm_errno
on error.
* src/bucket.c: Use GDBM_SET_ERRNO, GDBM_SET_ERRNO2, or
GDBM_DEBUG where necessary.
* src/falloc.c: Likewise.
* src/findkey.c: Likewise.
* src/gdbmdefs.h: Likewise.
* src/gdbmopen.c: Likewise.
* src/gdbmstore.c: Likewise.
* src/mmap.c: Likewise.
* src/recover.c: Likewise.
* src/update.c: Likewise.
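
The convention described for _gdbm_full_read (return -1 and record an
error code on failure) can be illustrated by the sketch below. It is not
the code from src/fullio.c: the last_error variable and the ERR_* codes
are hypothetical stand-ins for gdbm_errno and its values, and retrying on
EINTR is an assumption.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical stand-ins for gdbm_errno and its error codes. */
enum { ERR_NONE, ERR_READ, ERR_EOF };
static int last_error = ERR_NONE;

/* Read exactly SIZE bytes from FD into BUF, looping over short reads.
   Return 0 on success.  On a read error or a premature end of file,
   set last_error and return -1. */
static int
full_read (int fd, void *buf, size_t size)
{
  char *p = buf;

  assert (buf != NULL || size == 0);
  while (size > 0)
    {
      ssize_t n = read (fd, p, size);

      if (n == -1)
        {
          if (errno == EINTR)
            continue;             /* interrupted: retry */
          last_error = ERR_READ;
          return -1;
        }
      if (n == 0)
        {
          last_error = ERR_EOF;   /* fewer bytes available than requested */
          return -1;
        }
      p += n;
      size -= (size_t) n;
    }
  return 0;
}
```

A caller only has to test the return value against -1 and can then consult
the recorded code to tell an I/O error from a truncated file.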
|
|
* configure.ac: New option --enable-debug.
Print feature summary at the end of the run.
* src/debug.c: New file.
* src/Makefile.am [GDBM_COND_DEBUG_ENABLE]: Build debug.o
Define GDBM_DEBUG_ENABLE.
* src/gdbmdefs.h [GDBM_DEBUG_ENABLE] (_gdbm_debug_hook_install)
(_gdbm_debug_hook_remove,_gdbm_debug_hook_check)
(_gdbm_debug_hook_val): New protos.
(GDBM_DEBUG_HOOK, GDBM_DEBUG_OVERRIDE)
(GDBM_DEBUG_ALLOC): New defines.
* src/gdbm.h.in (GDBM_RCVR_FORCE): New flag.
* src/recover.c (gdbm_recover): Check database before attempting
recovery, unless GDBM_RCVR_FORCE flag is set.
* doc/gdbm.texi: Document GDBM_RCVR_FORCE.
* src/gdbmreorg.c (gdbm_reorganize): Use GDBM_RCVR_FORCE.
* src/gdbmtool.c (main): Always allocate file_name.
* src/bucket.c: Put GDBM_DEBUG_OVERRIDE and GDBM_DEBUG_ALLOC
in critical places.
* src/falloc.c: Likewise.
* src/findkey.c: Likewise.
* src/gdbmopen.c: Likewise.
* src/gdbmstore.c: Likewise.
* src/update.c: Likewise.
* tests/Makefile.am [GDBM_COND_DEBUG_ENABLE]: Define GDBM_DEBUG_ENABLE.
* tests/gtload.c: New options -hook, -recover, -verbose,
-backup, -max-failures, -max-failed-keys,
and -max-failed-buckets.
Attempt recovery after errors.
|
|
* configure.ac: Don't check for rename.
* src/Makefile.am (libgdbm_la_SOURCES): Add recover.c
* src/recover.c: New file.
* src/bucket.c (_gdbm_get_bucket): Remove extra space before [
* src/err.c (prerror): Take additional argument
(gdbm_perror): Print system errno if necessary.
* src/gdbm.h.in (GDBM_CLOERROR): New flag.
(gdbm_fd_open, gdbm_copy_meta): New protos.
(gdbm_last_syserr,gdbm_db_strerror,gdbm_recover): New protos.
(gdbm_syserr): New extern.
(gdbm_recovery): New struct.
(GDBM_RCVR_DEFAULT,GDBM_RCVR_ERRFUN)
(GDBM_RCVR_MAX_FAILED_KEYS)
(GDBM_RCVR_MAX_FAILED_BUCKETS)
(GDBM_RCVR_MAX_FAILURES)
(GDBM_RCVR_BACKUP): New flags.
(GDBM_BACKUP_FAILED): New error code.
* src/gdbmclose.c (gdbm_close): Work correctly if dbf->desc == -1.
* src/gdbmcount.c (gdbm_count): Remove spurious sorting.
Use _gdbm_next_bucket_dir for iterating over the buckets.
* src/gdbmdefs.h (struct gdbm_file_info)<last_syserror>
<last_errstr>: New members.
* src/gdbmerrno.c (gdbm_set_errno): Set last_syserror as well.
(gdbm_clear_error): Reset last_syserror.
(gdbm_last_syserr): New function.
(gdbm_errlist): New entry for GDBM_BACKUP_FAILED.
(gdbm_db_strerror): New function.
(gdbm_syserr): New global.
* src/gdbmload.c (get_parms): Buffer can be NULL.
* src/gdbmopen.c (gdbm_fd_open): New function.
(gdbm_open): Rewrite as a wrapper over gdbm_fd_open.
* src/gdbmreorg.c (gdbm_reorganize): Rewrite as a wrapper
over gdbm_recover.
* src/proto.h (_gdbm_next_bucket_dir): New proto.
* src/gdbmtool.c: New command: recover.
* tests/.gitignore: Add gtrecover
* tests/gtrecover.c: New test program.
* tests/Makefile.am: Build gtrecover
|