
Moving to GitHub, slowly

The software on this page will slowly be moved to GitHub https://github.com/hilbix/. The CVS repository will be migrated to GIT as well, so the history will be preserved, at least partially. See FAQ.

Scylla and Charybdis, dbm - Tools

The tools are developed under Linux with ESR's paradigm "release early, release often" in mind.
So you can consider this beta software, or alpha, or pre-alpha, or even worse ;)

Have a look in the download directory for all downloads.
As always here, all you get is the source. No binaries here.

dbm 0.11.0-20080519-114741

A little tool to access gdbm files from shell.


⇒ ⇒ ⇒ Development has moved. All future versions will be published at https://github.com/hilbix/dbm   ⇐ ⇐ ⇐



If you have an Online Virus Scanner under Windows, never use options -t, -q or -a. With these options and a Virus Scanner you will observe strange things happening. Under Linux (or Unix) there is no such problem.

If a DBM process cannot lock a DBM file, it enters a rapid open/close busy cycle (GDBM lacks support for blocking locks) until the timeout occurs. This triggers a race condition with Online Virus Protection Solutions for Windows, such that it becomes unlikely that DBM can get a successful lock on the DBM file.

Note that DBM works as designed; this is not a bug in DBM. However, the incompatibility with Online Virus Protection cannot be fixed without rendering it nearly useless. So the bug is in the idea of Online Virus Protection itself. This means: under Windows you currently cannot have both DBM processes concurrently accessing a DBM file and an Online Filesystem Antivirus.

If only a single DBM process accesses a DBM file (so you do not need a timeout option at all), you do not need to worry about this.


Note that DBM is deprecated, but I have no time to create a replacement

DBM began as a quick hack just to access gdbm files from the command line, and it is still evolving as a quick hack.

It now has some batch and filter functions to be used in shell pipes, but these are barely tested so far.

When you compile it, just ignore the error that make cannot cd into "tino". That's ugly, but as this is just a quick hack, no tinolib shall be required to compile it, and thus the current version of tinolib is not part of this distribution. Also note that there currently is no tinolib distribution, as it's far from being ready. For now it just comes with some of my other sources.

A short example of what DBM can do for you, a JPG file checker:

./dbm create db.ok
./dbm create db.fail
find / -type f -name '*.jpg' -print |
./dbm filter db.ok |
./dbm filter db.fail |
sort |
while read -r a
do
	if djpeg "$a" >/dev/null 2>&1
	then
		./dbm insert db.ok "$a" 1
	else
		./dbm insert db.fail "$a" 1
	fi
done
The "sort" is a hack to "buffer" the output: sort reads all of its input before it writes anything, so both "filter" processes have closed the database files before the loop starts, and the "insert" commands can then write to them. You don't need this hack if you give '-a-1' to each of the dbm commands.

History:

version 0.11.0-20080519-114741

download (30744 bytes) sig

Functions sadd and sdir removed for re-write

The "find" function now returns FALSE in case it does not find anything. This makes it easier to distinguish this case from a blank result, which looks the same from the shell.
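Why the FALSE exit status matters can be shown with a small bash helper: command substitution yields the same empty string for "nothing found" and "found an empty value", so only the exit status separates them. The helper name classify is hypothetical; it simply runs whatever command it is given.

```shell
#!/bin/bash
# classify (hypothetical helper): run a lookup command and report whether
# it failed, succeeded with empty output, or printed something.
# $(...) looks the same for "nothing found" and "found an empty value";
# only the exit status tells them apart.
classify() {
	local out
	if out=$("$@")
	then
		if [ -z "$out" ]
		then echo "found, but empty"
		else echo "found: $out"
		fi
	else
		echo "nothing found"
	fi
}

# Usage, e.g.: classify ./dbm get DB something
```

The same pattern works for "get", which also returns non-zero when the key is missing.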

The "sadd" and "sdir" functions have proven to be badly designed, so they will be re-invented with another algorithm. This is an intermediate version which removes the old "sadd" and "sdir" functions. If you need the old behavior, stick to version 0.10.x.

version 0.10.1-20080407-040731

download (31239 bytes) sig

Improved: option -a. Added: options -n and -u.

Option -a was still too slow on output. Now output is processed in "batches", too.

Option -u flushes output as soon as meaningful. Together with Option -a this nearly brings back the old behavior (db close on each line).

Option -n suppresses fsync() on the GDBM file. Changes to the DB are many times faster on machines with a heavy IO load, which is especially useful with option -a and batch commands. However, the DBM is then no longer synced, so a sudden power loss can have an even worse impact than it already has (I have often observed that a sudden power loss corrupts the DBM).

version 0.10.0-20070806-101555

download (30382 bytes)

Improved: option -a. Started to add a "sorted keys directory structure".

Option -a was very slow. Now input is processed in "batches", that is, the database is closed only between actual read() calls.

Also new experimental "sadd" and "sdir" commands introduced. "sget" and "sfind" are still missing.

version 0.9.2-20070407-210740

download (28935 bytes)

Bugfix: Now exit(11) in case the output pipe goes away.

When output is futile, DBM shall exit, as further processing is pointless (no command that outputs data also alters data).

And to stress it again:

Currently I use DBM in a fairly complex environment with several dozen scripts running in parallel on a handful of DBM files. So I was able to hunt down some really weird race conditions and pitfalls, and I am now pretty sure that I caught most of them. But one thing cannot be fixed:

Read the virus scanner warning when using DBM under CygWin. If you have several concurrently running processes under Windows which use DBM on the same file, you will see that they either all starve or a lot of DBM processes pile up until your system becomes totally unusable. In this case do the following:

  • Disable the online virus scanner.
  • Set the priority of the DBM processes to lowest possible value.
  • Kill each of the processes. After some hours they might go away.
  • Uninstall the online virus scanner component and reboot your PC.

Again, this is no fault of DBM. DBM works correctly as designed. It has no means to detect or circumvent a case where online virus scanners lock files for some time after they were opened or closed by an application. DBM only(!) tries 10 times a second to get a lock on the DBM file. Note that this rate is *low*: with N processes concurrently accessing the file at this rate, you need a timeout of about N seconds to have a high enough chance of getting the lock on the DBM file. Now, if a virus scanner is able to scan the file 100 times a second and you have 5 DBM processes concurrently trying to lock the file, this means that you have a 50:50 chance of getting into trouble.
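The polling behavior described above can be sketched in bash. This is a hedged illustration, not DBM's source: the function name retry_lock is hypothetical, the fixed 0.1 s interval mirrors the "10 times a second" figure, and the flock usage in the comment assumes util-linux on Linux. Note that since 0.9.0 DBM randomizes its sleep time rather than using a fixed interval.

```shell
#!/bin/bash
# Sketch only, not DBM's source: poll an immediate (non-blocking) attempt
# about 10 times per second until TIMEOUT seconds have passed.
# retry_lock is a hypothetical helper; "$@" is any command that succeeds
# or fails at once, e.g. a non-blocking lock attempt.
# Needs a sleep that accepts fractions (GNU coreutils does).
retry_lock() {
	local timeout=$1
	shift
	local tries=$(( timeout * 10 ))		# 10 retries per second of timeout
	while :
	do
		"$@" && return 0			# got it
		(( tries-- > 0 )) || return 255		# give up, like DBM's timeout code
		sleep 0.1
	done
}

# Usage with util-linux flock (Linux), holding the lock on FD 9:
#   exec 9>>db.lock
#   retry_lock 5 flock -n 9 || echo "timeout" >&2
```

With a scanner that keeps the file busy more often than this loop polls, most attempts land while the file is held, which is the starvation described above.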

version 0.9.1-20070326-202709

download (26860 bytes)

Bugfix: balter did not work (bug in db_cmp() removed)

version 0.9.0-20070326-181947

(28601 bytes archive)

Bugfix: Storing an empty key via batch/nbatch did not work (OOPS!)

Major overhaul (for some time it's probably safer to use 0.8.2!):
- import and export commands.
- "Advanced timeout", which closes the DB while in blocking reads/writes.
- A little change in the "bget" command to make it more intuitive.
- "bdel" now can check for the data value like "delete".

CygWin bugfix: Open may fail with EBUSY for some reason. In this case the timeout was not effective. The sleep time for timeouts is now randomized using rand() to distribute the waiting.

Import/Export:

XML-like import and export allow easy copying of a DB's contents:

dbm create DB2; dbm export DB1 | dbm import DB2
You can see this as a replacement for the missing "reorg" under CygWin.

Option -a for advanced timeout added:

This works like -q, but keeps the DB closed during most operations. It has a higher overhead, as it must open the DB for each operation, but it allows better batch integration, for example:

# The most missing feature in BASH: piped coprocesses
# (open zillions of pipes to zillions of coprocesses each)
# You cannot help with <(proc) and >(proc) as this needs /dev/fd
# Idea:
# 3| creates a pipe from FD3 to STDIN
# |5 creates a pipe from STDOUT to FD5
# exec 3| process |4 exec 2|5 exec
# creates a coprocess, listening on FD3, writing to FD4 and FD5

{
# Save STDOUT to FD3; note the unintuitive IO redirection below
exec 3>&1

{
	{
	find dir -type f -print |
	dbm -a-1 filter DB/check |
	while read -r a
	do
		if check "$a"
		then
			echo "$a" >&4
		else
			echo "$a" >&5
		fi
	# Trick: bring back STDOUT from FD3
	done >&3
	# FD4 is the pipe to "ok"
	} 4>&1 |
	dbm -a-1 nbatch DB/check ok >&3
# FD5 is the pipe to "ko"
} 5>&1 |
dbm -a-1 nbatch DB/check ko
}

Alternatively, for filenames without TABs:

find dir -type f -print |
dbm -a-1 filter DB/check |
# Read files from STDIN and output
# FILENAME TAB ok|ko
check |
# $'\t' is a TAB
dbm -a-1 brep DB/check $'\t'

The "bget" change is as follows:

Previously the following did not print anything:

dbm list DB | dbm bget DB
As "list" did not print the last LF, the correct script was:
{ dbm list DB; echo; } | dbm bget DB
Now both work identically. However, if the last line is empty, it must be terminated by an LF, else "bget" ignores it.

Note that the old behavior is intended and is still present in bget0 (so a line not terminated by NUL is considered invalid).

Important Windows bug with online virus scanners

The virus scanners can prevent dbm from opening the file. In such cases there is no other way than to switch off the virus scanner.

It seems to happen when the virus scanner is too slow to keep pace with DBM while several DBM processes try to open the DB concurrently. In this case the online protection does a first virus scan on the file and keeps it open, so that it cannot be locked by another DBM process in the meantime. The other processes sleep and reopen the file later. However, for this open the virus scanner starts a second(!) parallel virus scan on the file (it must do so before the open, else it would not be able to protect). But when the sleep time is not long enough for the first virus scan to finish, we now have a deadlock, as no DBM can ever successfully lock the DBM file: it is always held open by one or more processes scanning it for viruses. This is even worse if the virus scanner runs under the thread priority of the process, as then slow processes can lead to hours (literally!) of unusable DBMs.

That's not a bug of DBM. It's a fundamental design flaw of online virus scanners under Windows. So the only advice I can give is: under Windows, never even try to open DBM files concurrently while an online virus scanner is active. Period.

Side-Advice: If you do not have a virus but some other computer problem, try switching off the online virus scanner first. Often the problem vanishes. For example, switching off my virus scanner speeds up "find.exe" from CygWin by a factor of 1000 or even much more. (This is because the virus scanner *must* scan all files find.exe touches. Again, this is a fundamental flaw of online virus scanner technology under Windows.)

version 0.8.2-20070113-020406

download (21724 bytes)

Small improvement in error handling

The database is now closed before anything is printed. This helps in the "stopped terminal" situation.

version 0.8.1-20060812-142632

download (21554 bytes)

Option -q for quiet timeouts added

It is like option -t but inhibits the sleep warning.

version 0.8.0-20060812-004531

download (21564 bytes)

Batch version of "alter" commands balter and balter0 added

There was a bug in the alter command in case the third argument was not present. This shall be fixed now.

The code is becoming more and more hacked. This is not good, so it shall be rewritten from scratch with a cleaner structure. However, I do not have the time for a clean rewrite (sigh).

version 0.7.0-20060722-025556

(20554 bytes archive)

Memory leaks fixed, "bget"/"bget0" added and "list" improved

There was a memory leak as the allocated pointers were not freed. Most commands did not have any problem, but all list commands which read data from the database were affected:

filter, find, search, list, dump

Also there is a new command "bget" for a batch type get:

find . -type f -print |
dbm bget DB/files '
' '
' |
while read -r file && read -r data
do
: whatever
done

Beware of filenames containing LF. As find knows "-print0", dbm knows "bget0", too.
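When names may contain LF, NUL-terminated records are the safe choice in bash. A minimal sketch of the reading side, assuming the producer writes each record as key NUL data NUL (check how your bget0 call delimits its output); read_pairs is a hypothetical name:

```shell
#!/bin/bash
# Sketch: consume NUL-terminated key/data pairs in bash.
# ASSUMPTION: each record arrives as key NUL data NUL, so keys that
# contain LF survive intact. read -d '' stops at NUL instead of LF.
read_pairs() {
	local key data
	while IFS= read -r -d '' key && IFS= read -r -d '' data
	do
		printf '%s => %s\n' "$key" "$data"
	done
}

# Usage, e.g. at the end of a  find -print0 | dbm bget0 ...  pipeline
```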

Also the "list" command now is able to output NUL or other strings as line terminator.

version 0.6.1-20060715-165210

(19731 bytes archive)

"alter" command should work now

version 0.6.0-20060611-080944

(19853 bytes archive)

New feature, nearly untested as always.

Better return value and messaging added.

Return value now is

  • 0=ok
  • 1=key missing or database empty
  • 2=key exists or cannot store
  • 10=general other error
  • 255=timeout

The main reason for this change is "dbm delete", where I can now distinguish whether the key was deleted (0), the key was missing (1), the key exists but has the wrong data (2), or there was a retryable timeout (-1, which shows up as 255).

There is now only one case where dbm does not return true but prints no error to stderr: "dbm get" when the key is missing. This way you can still run `dbm get DB something` without stderr clutter, and it will stay this way.
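A script can branch on these codes directly. The following bash sketch maps them to messages; the function name dbm_status is hypothetical, and the mapping is just the list above.

```shell
#!/bin/bash
# dbm_status (hypothetical helper): translate DBM's exit codes
# (0.6.0 and later) into text. A timeout of -1 shows up as 255.
dbm_status() {
	case "$1" in
	0)   echo "ok" ;;
	1)   echo "key missing or database empty" ;;
	2)   echo "key exists or cannot store" ;;
	10)  echo "general error" ;;
	255) echo "timeout (retryable)" ;;
	*)   echo "unexpected status $1" ;;
	esac
}

# Usage:
#   ./dbm delete test.db a b
#   dbm_status "$?"
```

Only 255 is worth retrying automatically; 1 and 2 are answers, not failures.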

version 0.5.0-20060606-224940

(19891 bytes archive)

All features nearly untested, as always.

- Alter command added.
- Timeout and delayed database open added.
- The delete command now can check against the data of a key.
- nbatch and nbatch0 added.

You can now delete keys if the data is matching. So if the data is not matching, no delete happens.

Also the "alter" command was added. It works like "update", but it does not update if the old data does not match.

Example:

./dbm create test.db
./dbm insert test.db a b
./dbm alter  test.db a b c
./dbm alter  test.db a b c || echo "must fail now"
./dbm delete test.db a b || echo "must fail"
./dbm delete test.db a c
rm test.db
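The compare-before-write rules of "alter" and of "delete" with a data argument can be modeled with a bash associative array (bash 4 or later). This is a model of the documented behavior, not DBM's code: model_alter and model_delete are hypothetical names, the exit codes follow the 0.6.0 convention, and returning 1 from model_alter for a missing key is an assumption.

```shell
#!/bin/bash
# Model (not DBM's code) of compare-before-write semantics, with a bash 4+
# associative array standing in for the GDBM file.
# Exit codes follow the 0.6.0 convention: 0 ok, 1 key missing,
# 2 key exists but holds other data.
# ASSUMPTION: model_alter reports 1 for a missing key.
declare -A db=()

# model_alter KEY OLD NEW: store NEW only if KEY currently holds OLD
model_alter() {
	[ -n "${db[$1]+set}" ] || return 1
	[ "${db[$1]}" = "$2" ] || return 2
	db[$1]=$3
}

# model_delete KEY [DATA]: delete KEY; if DATA is given, only when it matches
model_delete() {
	[ -n "${db[$1]+set}" ] || return 1
	if [ $# -ge 2 ] && [ "${db[$1]}" != "$2" ]
	then
		return 2
	fi
	unset 'db[$1]'
}

# Mirrors the example above:
db[a]=b
model_alter a b c                           # ok, a now holds c
model_alter a b c || echo "must fail now"   # a no longer holds b
model_delete a b  || echo "must fail"       # a holds c, not b
model_delete a c                            # ok, a is gone
```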

Also timeout and delayed open added. This allows simpler pipes:

find . |
./dbm filter DB |
sort |
./dbm nbatch DB add
The trick is that sort reads all of its input before it writes anything, so the first dbm has finished by then. The second dbm delays opening the database until the first data arrives.

The new commands nbatch and nbatch0 repair "the empty key" phenomenon when you use dbm in pipes. However, they ignore the last line if its line terminator is missing. This is helpful in situations where you kill some process, so incomplete lines are not processed at all. However, in situations where the last line terminator might be missing, you must use the commands "batch" and "batch0".
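The two reading policies can be reproduced in plain bash: a bare `while read` loop drops a final line that lacks its terminator (the nbatch policy), while the common `|| [ -n "$line" ]` idiom also accepts it (closer to the batch policy). A sketch with hypothetical function names, not DBM's implementation:

```shell
#!/bin/bash
# read_strict: like nbatch, process only lines that end with LF;
# read returns false on an unterminated last line, so it is dropped.
read_strict() {
	local line
	while IFS= read -r line
	do
		printf '[%s]\n' "$line"
	done
}

# read_lenient: like batch, also accept an unterminated last line.
read_lenient() {
	local line
	while IFS= read -r line || [ -n "$line" ]
	do
		printf '[%s]\n' "$line"
	done
}
```

Feeding both with `printf 'a\nb'` shows the difference: read_strict yields only `[a]`, read_lenient yields `[a]` and `[b]`.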

version 0.4.3-20060604-153317

download (18426 bytes)

Bugfix for filter.

Previously, with filter 000 a possible data match was done but then ignored (always fail). So the matching code introduced in 0.4 was plainly wrong, sorry! That shall be fixed now. As this is a real bug, I removed the downloads of the 0.4.x versions before this one.

version 0.4.2-20060412-012656

(17690 bytes archive)

Bugfix for dump and additional commands nfind and nsearch

Also filter has been changed, such that 0xx works as OR (while 1xx works as AND)

version 0.4.1.1-deprecated

This tool will be replaced by tinodb soon.

version 0.4.1-20041214-012107

(17479 bytes archive)

Cosmetic changes, still nearly untested.

version 0.4.0-20041214-005812

(17434 bytes archive)

Added "filter".

Nearly untested.

version 0.3.0-20041119-062214

download (16366 bytes)

Added "update" and the key/data batch variants.

Nearly untested.

version 0.2.0-20040905-004429

download (15337 bytes)

Added "find" and "search" option (slow).

Now compiles under CygWin, too.

version 0.1.1-20040723-212810

download (11461 bytes)

Batch adding of keys and diagnostic dump added.

GPL preamble added to the source to make the license clearer. Additional minor updates.

version 0.1.0-20040723-204854

(11210 bytes archive)

Batch adding of keys and diagnostic dump added.

version 0.0.1-20040721-221619

download (10046 bytes)

First version.

It works.

License and Disclaimer

All you can see here is free software according to the GNU GPL.
Copyright (C)2000-2011 by Valentin Hilbig
Note that the software comes with absolutely no warranty of any kind.
You use the software at your own risk.
Valentin Hilbig cannot be held responsible for any unintended damage,
lost data or malfunction of the software you can find here.

Last modified: 2011-09-12 by Valentin Hilbig