-*-text-*-
$Header: /CVSROOT/public/scylla-charybdis/md5backup/README,v 1.25 2006/10/03 21:16:33 tino Exp $

Too tired to read docs?  Just run bin/dobackup.sh

For migration from 0.3.x please read MIGRATION.

FROM VERSION 0.4.x ON YOU WILL NEED SQLITE 3.2.1 or higher TO COMPILE!
Read doc/sqlite.txt on how to prepare sqlite for this.

WARNING!  THE NEW INTERMEDIATE ARCHIVE FORMAT OF 0.4.0 WILL BE HIGHLY
EXPERIMENTAL.  EXPECT MAJOR INTERNAL FAILURE CONDITIONS (nothing should
break badly, but it may even fail silently).

IMPORTANT!  There is an experimental restore script, bin/md5restore.sh,
which cannot restore metadata.  There is also bin/compare.sh, which
compares the data and could be adapted into another kind of restore.
Both need access to the backup archive, which probably means you must
be root.

PLEASE HAVE A LOOK INTO DIRECTORY doc/ FOR MORE INFORMATION.
This README will be reorganized when I find the time.


Changes
=======

(see also doc/historic.txt)

200x-xx-xx  0.4.0   sqlite, see doc/archive_format.txt
2006-10-03  0.3.15  Some new scripts (md5restore.sh, sc-setup.sh)
2005-03-06  0.3.14  multiple file store added, internal restructuring
2005-02-20  0.3.13  wildcard matching for ignores added
2004-10-05  0.3.12  bin/compare.sh, minor improvements
2004-09-30  0.3.11  sc-loop.sh corrected
2004-07-06  0.3.7   first freshmeat release, see ChangeLog
2004-05-04  0.3.1   bin/sc-move.sh and documentation restructuring
2004-05-01  0.3.0   see doc/linked_file_store.txt
2004-01-17  0.2.0   added: Excludes (based on autoignore)
2004-01-11  0.1.0   added: Autoignore & Stay in same filesystem
2003-12-24  0.0.1   Initial version


md5backup
=========

This is a quick hack: I needed it, and I had no time to finish some
other tools to get the job done in time.  V0.0.1 was written in a few
spare hours, just to do what it does.  It works as expected now (that
is, as *I* expect it to work).

The major disadvantages:

- It's considered ALPHA code.  It's released EARLY.
  It has not been tested much.
- There is no real restore yet; bin/md5restore.sh is experimental and
  cannot restore metadata.  (As always with my backup tools, I don't find
  the time to write a restore: machines currently go out of service
  faster than I can replace them, and one per year is too fast for me,
  ok?  So I only move data from one machine to the next until that one
  crashes, too.)
- Once my machine crashed while writing the database.  The database
  (dbm file) was corrupted and gdbm-lib failed fatally!  There should be
  an option to recreate the database somehow, but this is still
  completely missing.  In this case just remove the database.  The next
  run then behaves like a full backup, but luckily no new files need to
  be written.
- There are no sanity checks.  Instead, because the file names are MD5
  sums, you can use check.sh to find out which backed-up files have
  become inconsistent.  (Sometimes even the most expensive hard drives
  behave oddly.)  In this case just remove the database, too.
- All scripts currently assume the backup directories are
  /backup/md5backup (for md5backup) and /backup/charybis (for the
  bin/sc-*.sh scripts).
- Networking is not built in and needs Scylla-Charybdis, which cannot
  yet copy files of 4 GB or larger.
- Sparse file handling is not implemented; the algorithm does not
  handle databases well anyway.
- Frequently changing and growing files are not yet supported very
  well.  Look into ign/ for default ignore lists.

The major advantages:

- It's open source under the GNU GPL v2 or higher.
- If you run it locally on a machine, from one hard drive to a dedicated
  backup hard drive, it is, compared to my other tools, incredibly fast
  even without optimization.
- You can do forensics after the weirdest things you can imagine, as
  the backup file names contain their md5 sum!
- It supports additional networked backup via bin/sc-backup.sh.

Notes:

- It was not tested much.  I run it as follows:
    1) From a local hard drive to an external hard drive.
    2) From the local hard drive to the local hard drive, plus
       sc-backup.sh to a networked backup server.
- It is designed to be able to run over NFS, however this feature is
  not tested!  It is much slower then, because the database is on NFS,
  too.  So make dbm/ a softlink to a directory on a local disk (for
  example below /usr/local) to improve speed!
- To do a full backup with a full compare, just delete the database OR
  use a fresh, previously unused database name (the second argument).
- Since 0.1.0 md5backup tries to ignore all the backup objects
  automatically.  However, as files with identical content are never
  stored more than once, no harm is done if backup objects get in the
  way (except for the database, which always changes).
- You *must*not* try to back up /proc and similar virtual filesystems.
- Since 0.1.0 md5backup no longer crosses mount points, so you can
  safely back up the root filesystem (/).
- Since 0.2.0 md5backup supports target excludes, given as -path on the
  command line.  You should always exclude things like databases
  (-/var/lib/mysql).
- Since 0.3.0 md5backup can hardlink new files in out/ into a secondary
  directory (=ln1).  Only hardlinks are supported: if that directory is
  on a different filesystem than out/, it will not work (and this case
  is not detected yet).  This is needed for seamless networked backup,
  see bin/sc-backup.sh or doc/sc-backup.txt.
- Since 0.3.13, ignores listed in files may contain wildcards if the
  line starts with a ?.  See tino/memwild.h for an explanation of the
  wildcards.
- Since 0.3.14 multiple file stores are supported.  This support is
  incomplete: file stores must be switched manually, and strange things
  might happen if a linked file store is not online, so disconnected
  file stores are not supported.
  Multiple file stores are very easy to set up for now (this might
  change in future): create directories, or softlinks to directories,
  named outN, where N is a number starting with 0.  Move the data you
  want to get rid of into these other directories.
  md5backup will read/search files in these directories, but it will
  never write into them, so you can mount them read-only.  If you have
  mounted a drive onto out/ and the drive becomes full, you can mount it
  on out0/ instead and mount another, empty drive onto out/.  You can
  also move the data from out/ to out0/ manually if you like, but be
  careful not to change the directory structure.


Rationale
=========

see doc/rationale.txt


Example:
========

bin/dobackup.sh         (just call it)
bin/sc-backup.sh        (read doc/sc-backup.txt)

    TARG=/backup
    mkdir -p "$TARG"
    mount 192.168.71.1:/backup "$TARG"
    allfs="`df -P | awk -v TARG="$TARG" '/^\// && $6!=TARG { print $6 }'`"
    md5backup "$TARG"/ `hostname -f` $allfs

If hostname -f prints my.host.test, you will then find the following in
the backup directory:

out/              where the files are stored, named after their md5 sum
outN/             where N is a number starting with 0.  Softlinks to
                  directories are supported.  Additional read-only file
                  stores, searched for already backed-up files.
tmp/              used for temporary files (should be empty afterwards)
dbm/my.host.test  0.3.x and below: the database file for my.host.test
                  0.4.x: the database to migrate from
                  0.5.x and above: no longer used
sql/my.host.test  0.4.x and above: the sqlite database file for
                  my.host.test
log/my.host.test  the activity log for my.host.test
md5/my.host.test  a file suitable for md5sum --check.  It is only ever
                  appended to, so files that no longer exist may show up
                  as missing.

Note that the following is not built into md5backup, but into the
script bin/backup.sh:

ign/              Directory with files that list additional ignore
                  targets, one per line.  You can also put softlinks
                  there which point to the targets to ignore.  The
                  contents of this directory are added to the command
                  line as "-name".
lnX/              where X is a number.  Optional directories which get
                  hardlinks of the files newly created in out/.  Files
                  in them can be deleted, so you can find out what was
                  newly backed up.  See doc/linked_file_store.txt for
                  details.
                  These directory names are passed as command line
                  arguments "=name".


Notes:
======

- Be sure to add a trailing / to the first parameter if it's a
  directory!
- YOU HAVE TO CREATE THE DIRECTORIES IN THE BACKUP DIRECTORY YOURSELF!
    cd "first parameter"; mkdir out tmp dbm log md5
  (bin/dobackup.sh now does this for you)
- Paths are taken exactly as you wrote them.  So if you mix absolute
  and relative file paths, *you* will get in trouble, not md5backup ;)
- Never run two instances of md5backup in parallel on the same
  md5backup database (the one below dbm/, or below sql/ from 0.4.x on).
- The md5backup database contains the following:
    Key:      the file name
    Content:  the last modification timestamp of the file
              a count
              the MD5 sum of the file
- Read the source as further documentation.


Internals:
==========

see doc/internals.txt


Install:
========

Just type
    make
You need gdbm and probably openssl to get it compiled (and, from 0.4.x
on, sqlite 3.2.1 or higher, see above).  It was tested under RedHat 9
and SuSE 9.

To run it like I do, call
    bin/dobackup.sh

In case you wonder: I hacked odysseus (see S&C) to become this tool,
hence all the funny "sc" references.  Since 0.3.0 it uses tinolib (see
subdirectory tino/), so most of the sc references have vanished.


Restore:
========

WARNING!  THERE IS NO REAL RESTORE YET!
see doc/restore.txt


Output:
=======

See doc/output.txt


DISCLAIMER:
===========

USE AT YOUR OWN RISK!  I CANNOT ACCEPT ANY LIABILITY FOR ANYTHING!
THIS *IS* RELEASE-EARLY CODE!  IT IS CONSIDERED TO BE UNSTABLE!
DON'T TRUST SOURCES!  READ THEM!  READ THEM AGAIN!  AND BE SURE THERE
ARE NO KNOWN BACKDOORS IN THEM!  IT'S OPEN SOURCE, SO YOU CAN CHECK!

I tried my best.  However, I am human, so I make mistakes.  Be
prepared.  All I guarantee is that I never do anything to harm *you* on
purpose.

Copyright (C)2003,2004 by Valentin Hilbig
md5backup may be distributed freely under the conditions of the GPL2
(GNU General Public License version 2) or higher.

Note that this is an intermediate utility until my new full-featured
backup suite starts to work.
As always, you will find it at http://www.scylla-charybdis.com/ and on
freshmeat .. sometimes.

-Tino
webmaster@scylla-charybdis.com

$Log: README,v $
Revision 1.25  2006/10/03 21:16:33  tino
See Changelog, commit for dist

Revision 1.24  2005/07/21 19:03:55  tino
changed to reflect next release

Revision 1.23  2005/03/06 00:39:37  tino
Information corrected according to version 0.3.14

Revision 1.22  2005/03/04 00:47:01  tino
preparing new distribution

Revision 1.21  2005/03/02 23:32:11  tino
first version for multi filestore

Revision 1.20  2005/02/20 15:43:09  tino
commit for release

Revision 1.19  2004/10/05 03:02:13  tino
"nice", security lack fixed, new sparse files handling, bin/compare.sh
Details see ChangeLog

Revision 1.18  2004/09/29 00:02:57  tino
prepared new distribution

Revision 1.17  2004/08/22 05:58:02  Administrator
Bug removed: CMP falsely returned true in case one file was truncated.

Revision 1.16  2004/07/05 23:53:52  tino
va_copy not defined in all systems

Revision 1.15  2004/07/05 17:18:29  tino
working version

Revision 1.14  2004/06/18 23:51:16  tino
see ChangeLog

Revision 1.13 to beginning:
cleaned up, log got too long
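
Checking a file store (sketch):
===============================

The "There are no sanity checks" note says that, because stored files
are named after their MD5 sum, backed-up files which became inconsistent
can be found by a check.  Below is a minimal POSIX shell sketch of such a
check; it is NOT the check.sh mentioned there.  It ASSUMES (this README
does not specify the layout of out/) that the basename of every stored
file starts with the 32-digit lowercase hex MD5 sum of its content, and
that md5sum is the GNU coreutils tool.  check_store is a hypothetical
name.

```shell
#!/bin/sh
# HYPOTHETICAL sketch, not the check.sh shipped with md5backup.
# ASSUMPTION: the basename of every file stored below out/ (and outN/)
# starts with the 32-digit lowercase hex MD5 sum of the file's content.
# Requires md5sum from GNU coreutils.
#
# Usage: check_store out out0 ...
# Prints "BAD <path>" for every file whose content no longer matches
# its name; returns 0 if all files match, 1 otherwise.
check_store()
{
    # -L follows softlinked outN/ directories, which md5backup allows.
    find -L "$@" -type f |
    (
        status=0
        while IFS= read -r f
        do
            # Expected sum: first 32 characters of the file name.
            want=$(basename "$f" | cut -c1-32)
            # Actual sum: md5sum prints "<32 hex digits>  -" for stdin.
            have=$(md5sum < "$f" | cut -c1-32)
            if [ "$want" != "$have" ]
            then
                echo "BAD $f"
                status=1
            fi
        done
        # The loop runs in an explicit subshell, so hand the result
        # back as the subshell's (and thus the function's) exit status.
        exit $status
    )
}
```

Run it from the backup directory, for example "check_store out out0";
tmp/ is left out on purpose, as it only holds temporary files.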