[RFC] Incremental backup v3: incremental PoC - Mailing list pgsql-hackers
From:    Marco Nenciarini
Subject: [RFC] Incremental backup v3: incremental PoC
Date:
Msg-id:  543D5AA7.9@2ndquadrant.it
Responses:
  Re: [RFC] Incremental backup v3: incremental PoC
  Re: [RFC] Incremental backup v3: incremental PoC
  Re: [RFC] Incremental backup v3: incremental PoC
List:    pgsql-hackers
Hi Hackers,

following the advice gathered on the list, I've prepared a third partial patch on the way to implementing incremental pg_basebackup, as described here:

https://wiki.postgresql.org/wiki/Incremental_backup

== Changes

Compared to the previous version I've made the following changes:

* The backup_profile is not optional anymore. Generating it is cheap
  enough not to bother the user with such a choice.

* I've isolated the code which detects the maxLSN of a segment in a
  separate getMaxLSN function. At the moment it works by scanning the
  whole file, but I'm looking to replace it in the next versions.

* I've made it possible to request an incremental backup by passing a
  "-I <LSN>" option to pg_basebackup. It is probably too "raw" to
  remain as is, but it is useful at this stage to test the code.

* I've modified the backup label to report the fact that the backup
  was taken with the incremental option. The result will be something
  like:

START WAL LOCATION: 0/52000028 (file 000000010000000000000052)
CHECKPOINT LOCATION: 0/52000060
INCREMENTAL FROM LOCATION: 0/51000028
BACKUP METHOD: streamed
BACKUP FROM: master
START TIME: 2014-10-14 16:05:04 CEST
LABEL: pg_basebackup base backup

== Testing it

At this stage you can make an incremental file-level backup using this procedure:

pg_basebackup -v -F p -D /tmp/x -x
LSN=$(awk '/^START WAL/{print $4}' /tmp/x/backup_profile)
pg_basebackup -v -F p -D /tmp/y -I $LSN -x

The result will be an incremental backup in /tmp/y based on the full backup in /tmp/x.
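As an aside on the getMaxLSN approach mentioned above: a minimal Python sketch of the "scan the whole segment" idea might look like the following. This is not the patch's code, only an illustration; it assumes the standard 8kB page layout, where the first 8 bytes of each page header (pd_lsn) are two 32-bit words, xlogid then xrecoff.

```python
import struct

BLCKSZ = 8192  # PostgreSQL default block size


def get_max_lsn(path):
    """Scan a relation segment and return the highest page LSN found.

    Illustrative sketch only: reads each BLCKSZ-sized page and decodes
    pd_lsn (xlogid, xrecoff) from the first 8 bytes of the page header.
    """
    max_lsn = 0
    with open(path, 'rb') as f:
        while True:
            page = f.read(BLCKSZ)
            if len(page) < 8:
                break
            xlogid, xrecoff = struct.unpack('=II', page[:8])
            lsn = (xlogid << 32) | xrecoff
            if lsn > max_lsn:
                max_lsn = lsn
    return max_lsn
```

Scanning every page like this is O(size of the cluster), which is exactly why the "What next" section below proposes tracking per-segment maxLSN incrementally instead.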
You can "reintegrate" the incremental backup in the /tmp/z directory with the following little Python script, calling it as

./recover.py /tmp/x /tmp/y /tmp/z

----
#!/usr/bin/env python
# recover.py
import os
import shutil
import sys

if len(sys.argv) != 4:
    print >> sys.stderr, "usage: %s base incremental destination" % sys.argv[0]
    sys.exit(1)

base = sys.argv[1]
incr = sys.argv[2]
dest = sys.argv[3]

if os.path.exists(dest):
    print >> sys.stderr, "error: destination must not exist (%s)" % dest
    sys.exit(1)

# Skip the profile header, up to the start of the file list
profile = open(os.path.join(incr, 'backup_profile'), 'r')
for line in profile:
    if line.strip() == 'FILE LIST':
        break

# Start from a copy of the incremental backup ...
shutil.copytree(incr, dest)

# ... then fetch every file the incremental backup skipped from the
# base backup.
for line in profile:
    tblspc, lsn, sent, date, size, path = line.strip().split('\t')
    if sent == 't' or lsn == '\\N':
        continue
    base_file = os.path.join(base, path)
    dest_file = os.path.join(dest, path)
    shutil.copy2(base_file, dest_file)
----

It obviously has to be replaced by a full-fledged user tool, but it is enough to test the concept.

== What next

I would like to replace the getMaxLSN function with a more-or-less persistent structure which contains the maxLSN for each data segment. To make it work I would hook into the ForwardFsyncRequest() function in src/backend/postmaster/checkpointer.c and update an in-memory hash every time a block is going to be fsynced. The structure could be persisted on disk at some point (probably on checkpoint). I think a good key for the hash would be a BufferTag with blocknum "rounded" to the start of the segment.

I'm here asking for comments and advice on how to implement it in an acceptable way.

== Disclaimer

The code here is an intermediate step; it does not contain any documentation besides the code comments and will be subject to deep and radical changes. However, I believe it can be a base to allow PostgreSQL to have its file-based incremental backup, and a block-based incremental backup after it.
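The per-segment hash proposed in "What next" could be modeled, very roughly, as below. This is a Python toy model of the idea only; the real implementation would be C inside the checkpointer, and the key/function names here are illustrative, not from the patch. It assumes the default RELSEG_SIZE of 131072 blocks (1GB segments with 8kB blocks).

```python
RELSEG_SIZE = 131072  # blocks per 1GB segment with 8kB blocks (configure-time constant)


def segment_tag(rel_node, fork_num, block_num):
    """Build a hash key identifying the segment a block belongs to.

    Mirrors the idea of a BufferTag whose blockNum is "rounded" down
    to the first block of its segment.
    """
    return (rel_node, fork_num, (block_num // RELSEG_SIZE) * RELSEG_SIZE)


def record_fsync_request(max_lsn_map, rel_node, fork_num, block_num, page_lsn):
    """Update the per-segment maxLSN map, as a hook in
    ForwardFsyncRequest() might do for each block queued for fsync."""
    key = segment_tag(rel_node, fork_num, block_num)
    if page_lsn > max_lsn_map.get(key, 0):
        max_lsn_map[key] = page_lsn
```

With such a map, an incremental backup could skip any segment whose recorded maxLSN is older than the threshold LSN, without scanning the file at all.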
Regards,
Marco

--
Marco Nenciarini - 2ndQuadrant Italy
PostgreSQL Training, Services and Support
marco.nenciarini@2ndQuadrant.it | www.2ndQuadrant.it