trying again to get incremental backup - Mailing list pgsql-hackers
From: Robert Haas
Subject: trying again to get incremental backup
Msg-id: CA+TgmoYOYZfMCyOXFyC-P+-mdrZqm5pP2N7S-r0z3_402h9rsA@mail.gmail.com
List: pgsql-hackers
A few years ago, I sketched out a design for incremental backup, but no patch for incremental backup ever got committed. Instead, the whole thing evolved into a project to add backup manifests, which are nice, but not as nice as incremental backup would be. So I've decided to have another go at incremental backup itself. Attached are some WIP patches. Let me summarize the design and some open questions and problems with it that I've discovered. I welcome problem reports and test results from others, as well.

The basic design of this patch set is pretty simple, and there are three main parts. First, there's a new background process called the walsummarizer which runs all the time. It reads the WAL and generates WAL summary files. WAL summary files are extremely small compared to the original WAL and contain only the minimal amount of information that we need in order to determine which parts of the database need to be backed up. They tell us about files getting created, destroyed, or truncated, and they tell us about modified blocks. Naturally, we don't find out about blocks that were modified without any write-ahead log record, e.g. hint bit updates, but those are of necessity not critical for correctness, so it's OK.

Second, pg_basebackup has a mode where it can take an incremental backup. You must supply a backup manifest from a previous full backup. We read the WAL summary files that have been generated between the start of the previous backup and the start of this one, and use that to figure out which relation files have changed and how much. Non-relation files are sent normally, just as they would be in a full backup. Relation files can either be sent in full or be replaced by an incremental file, which contains a subset of the blocks in the file plus a bit of information to handle truncations properly.

Third, there's now a pg_combinebackup utility which takes a full backup and one or more incremental backups, performs a bunch of sanity checks, and if everything works out, writes out a new, synthetic full backup, aka a data directory. Simple usage example:

pg_basebackup -cfast -Dx
pg_basebackup -cfast -Dy --incremental x/backup_manifest
pg_combinebackup x y -o z

The part of all this with which I'm least happy is the WAL summarization engine. Actually, the core process of summarizing the WAL seems totally fine, and the file format is very compact thanks to some nice ideas from my colleague Dilip Kumar. Someone may of course wish to argue that the information should be represented in some other file format instead, and that can be done if it's really needed, but I don't see a lot of value in tinkering with it, either. Where I do think there's a problem is deciding how much WAL ought to be summarized in one WAL summary file.

Summary files cover a certain range of WAL records - they have names like $TLI${START_LSN}${END_LSN}.summary. It's not too hard to figure out where a file should start - generally, it's wherever the previous file ended, possibly on a new timeline - but figuring out where the summary should end is trickier. You always have the option to either read another WAL record and fold it into the current summary, or end the current summary where you are, write out the file, and begin a new one. So how do you decide what to do? I originally had the idea of summarizing a certain number of MB of WAL per WAL summary file, and so I added a GUC wal_summarize_mb for that purpose.
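For concreteness, here's a toy, in-memory model of what one of these summary files logically records. Every name and field in it is invented for illustration; the real on-disk format is far more compact and looks nothing like this:

/*
 * Toy model of the logical content of one WAL summary file: which files
 * were created, dropped, or truncated in a WAL range, and which blocks
 * were modified.  All names here are made up for illustration.
 */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

typedef struct SummaryEntrySketch
{
    unsigned    tablespace_oid;
    unsigned    database_oid;
    unsigned    relfilenode;
    int         forknum;
    bool        is_created;          /* file created in this WAL range? */
    bool        is_dropped;          /* file dropped in this WAL range? */
    long        truncated_to;        /* new length in blocks, or -1 */
    unsigned    nmodified;
    const unsigned *modified_blocks; /* block numbers touched by WAL */
} SummaryEntrySketch;

typedef struct WalSummarySketch
{
    unsigned    tli;                 /* timeline covered */
    uint64_t    start_lsn;           /* where the previous summary ended */
    uint64_t    end_lsn;             /* the part that is hard to choose */
    int         nentries;
    const SummaryEntrySketch *entries;
} WalSummarySketch;

int
main(void)
{
    const unsigned blocks[] = {0, 7, 8, 1042};
    const SummaryEntrySketch entry = {
        .tablespace_oid = 1663, .database_oid = 5, .relfilenode = 16384,
        .forknum = 0, .is_created = false, .is_dropped = false,
        .truncated_to = -1, .nmodified = 4, .modified_blocks = blocks,
    };
    const WalSummarySketch summary = {
        .tli = 1, .start_lsn = 0x01000028, .end_lsn = 0x02000060,
        .nentries = 1, .entries = &entry,
    };

    printf("TLI %u, %08X/%08X .. %08X/%08X\n", summary.tli,
           (unsigned) (summary.start_lsn >> 32), (unsigned) summary.start_lsn,
           (unsigned) (summary.end_lsn >> 32), (unsigned) summary.end_lsn);
    for (unsigned i = 0; i < entry.nmodified; i++)
        printf("rel %u, block %u modified\n",
               entry.relfilenode, entry.modified_blocks[i]);
    return 0;
}

In terms of this sketch, the open question is really just where end_lsn ought to fall.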
But then I realized that actually, you really want WAL summary file boundaries to line up with possible redo points, because when you do an incremental backup, you need a summary that stretches from the redo point of the checkpoint written at the start of the prior backup to the redo point of the checkpoint written at the start of the current backup. The block modifications that happen in that range of WAL records are the ones that need to be included in the incremental.

Unfortunately, there's no indication in the WAL itself that you've reached a redo point, but I wrote code that tries to notice when we've reached the redo point stored in shared memory and stops the summary there. But I eventually realized that's not good enough either, because if summarization zooms past the redo point before noticing the updated redo point in shared memory, then the backup would sit around waiting for the next summary file to be generated, so that it had enough summaries to proceed, while the summarizer, in no hurry to finish up the current file, would just sit there waiting for more WAL to be generated. Eventually the incremental backup would just time out.

I tried to fix that by making it so that if somebody's waiting for a summary file to be generated, they can let the summarizer know about that and it can write a summary file ending at the LSN up to which it has read and then begin a new file from there. That seems to fix the hangs, but now I've got three overlapping, interconnected systems for deciding where to end the current summary file, and maybe that's OK, but I have a feeling there might be a better way.

Dilip had an interesting potential solution to this problem, which was to always emit a special WAL record at the redo pointer. That is, when we fix the redo pointer for the checkpoint record we're about to write, also insert a WAL record there. That way, when the summarizer reaches that sentinel record, it knows it should stop the summary just before it. I'm not sure whether this approach is viable, especially from a performance and concurrency perspective, and I'm not sure whether people here would like it, but it does seem like it would make things a whole lot simpler for this patch set.
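It may help to spell out the invariant the backup side depends on: the summary files it has must cover, with no gaps, every byte of WAL from the prior backup's redo point up to the current backup's redo point, and if they don't, it has to wait for the summarizer to produce more - which is where the hang came from. Here's a rough sketch of that check, again with all names invented for illustration:

/*
 * Do the available summary files cover, without gaps, the WAL range
 * [prior_redo, current_redo)?  Names are made up for this sketch.
 */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

typedef struct SummaryRangeSketch
{
    uint64_t    start_lsn;
    uint64_t    end_lsn;
} SummaryRangeSketch;

static int
cmp_by_start(const void *a, const void *b)
{
    const SummaryRangeSketch *ra = a;
    const SummaryRangeSketch *rb = b;

    if (ra->start_lsn < rb->start_lsn)
        return -1;
    if (ra->start_lsn > rb->start_lsn)
        return 1;
    return 0;
}

static bool
summaries_cover_range(SummaryRangeSketch *ranges, int n,
                      uint64_t prior_redo, uint64_t current_redo)
{
    uint64_t    covered_to = prior_redo;

    qsort(ranges, n, sizeof(SummaryRangeSketch), cmp_by_start);
    for (int i = 0; i < n && covered_to < current_redo; i++)
    {
        if (ranges[i].start_lsn > covered_to)
            return false;       /* gap: some WAL has no summary yet */
        if (ranges[i].end_lsn > covered_to)
            covered_to = ranges[i].end_lsn;
    }
    return covered_to >= current_redo;
}

int
main(void)
{
    SummaryRangeSketch ranges[] = {
        {0x01000028, 0x02000060},
        {0x02000060, 0x030000A0},
    };

    /* covered: the second summary ends past the current redo point */
    printf("%d\n", summaries_cover_range(ranges, 2, 0x01000028, 0x02FF0000));
    /* not covered: the summarizer hasn't gotten that far yet, so wait */
    printf("%d\n", summaries_cover_range(ranges, 2, 0x01000028, 0x04000000));
    return 0;
}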
Another thing that I'm not too sure about is: what happens if we find a relation file on disk that doesn't appear in the backup_manifest for the previous backup and isn't mentioned in the WAL summaries either? The fact that said file isn't mentioned in the WAL summaries seems like it ought to mean that the file is unchanged, in which case perhaps this ought to be an error condition. But I'm not too sure about that treatment. I have a feeling that there might be some subtle problems here, especially if databases or tablespaces get dropped and then new ones get created that happen to have the same OIDs. And what about wal_level=minimal? I'm not at a point where I can say I've gone through and plugged up these kinds of corner-case holes tightly yet, and I'm worried that there may be still other scenarios of which I haven't even thought. Happy to hear your ideas about what the problem cases are or how any of the problems should be solved.

A related design question is whether we should really be sending the whole backup manifest to the server at all. If it turns out that we don't really need anything except for the LSN of the previous backup, we could send that one piece of information instead of everything. On the other hand, if we need the list of files from the previous backup, then sending the whole manifest makes sense.

Another big and rather obvious problem with the patch set is that it doesn't currently have any automated test cases, or any real documentation. Those are obviously things that need a lot of work before there could be any thought of committing this. And probably a lot of bugs will be found along the way, too.

A few less-serious problems with the patch:

- We don't have an incremental JSON parser, so if you have a backup_manifest > 1GB, pg_basebackup --incremental is going to fail. That's also true of the existing code in pg_verifybackup, and for the same reason. I talked to Andrew Dunstan at one point about adapting our JSON parser to support incremental parsing, and he had a patch for that, but I think he found some problems with it and I'm not sure what the current status is.

- The patch does support differential backup, aka an incremental atop another incremental. There's no particular limit to how long a chain of backups can be. However, pg_combinebackup currently requires that the first backup is a full backup and all the later ones are incremental backups. So if you have a full backup a, an incremental backup b, and a differential backup c, you can combine a, b, and c to get a full backup equivalent to one you would have gotten if you had taken a full backup at the time you took c. However, you can't combine b and c with each other without combining them with a, and that might be desirable in some situations. You might want to collapse a bunch of older differential backups into a single one that covers the whole time range of all of them. I think that the file format can support that, but the tool is currently too dumb. (See the block-merge sketch near the end of this mail for the kind of operation that would involve.)

- We only know how to operate on directories, not tar files. I thought about that when working on pg_verifybackup as well, but I didn't do anything about it. It would be nice to go back and make that tool work on tar-format backups, and this one, too. I don't think there would be a whole lot of point trying to operate on compressed tar files because you need random access and that seems hard on a compressed file, but on uncompressed files it seems at least theoretically doable. I'm not sure whether anyone would care that much about this, though, even though it does sound pretty cool.

In the attached patch series, patches 1 through 6 are various refactoring patches, patch 7 is the main event, and patch 8 adds a useful inspection tool.
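To picture the reconstruction step, here's a throwaway sketch of the block-level merge it boils down to - start from the full copy, then walk the chain from oldest to newest, applying each incremental's truncation and overwriting whatever blocks it carries, so the newest copy of each block wins. This toy operates on tiny in-memory arrays and has nothing to do with the actual incremental file format or pg_combinebackup's code:

/*
 * Toy block-level merge of a full copy plus a chain of incrementals,
 * each of which carries a subset of blocks plus the post-backup length
 * so that truncation can be handled.  Names are made up for this sketch.
 */
#include <stdio.h>
#include <string.h>

#define BLCKSZ_SKETCH 8     /* absurdly small "block", just for the demo */
#define MAX_BLOCKS    16

typedef struct IncrementalFileSketch
{
    unsigned    new_nblocks;        /* relation length after this backup */
    unsigned    nblocks_present;    /* how many blocks this file carries */
    unsigned    block_numbers[MAX_BLOCKS];
    char        block_data[MAX_BLOCKS][BLCKSZ_SKETCH];
} IncrementalFileSketch;

/*
 * Start from the full copy, then apply each incremental from oldest to
 * newest: adjust the length, then overwrite the blocks it carries.
 */
static unsigned
combine_sketch(char result[][BLCKSZ_SKETCH], unsigned nblocks,
               const IncrementalFileSketch *chain, int chainlen)
{
    for (int i = 0; i < chainlen; i++)
    {
        const IncrementalFileSketch *inc = &chain[i];

        /* extend with zeroed blocks, or shrink to handle truncation */
        for (unsigned b = nblocks; b < inc->new_nblocks; b++)
            memset(result[b], 0, BLCKSZ_SKETCH);
        nblocks = inc->new_nblocks;

        for (unsigned j = 0; j < inc->nblocks_present; j++)
        {
            unsigned    blkno = inc->block_numbers[j];

            if (blkno < nblocks)
                memcpy(result[blkno], inc->block_data[j], BLCKSZ_SKETCH);
        }
    }
    return nblocks;
}

int
main(void)
{
    char        result[MAX_BLOCKS][BLCKSZ_SKETCH];
    IncrementalFileSketch chain[] = {
        /* incremental b: block 1 changed, still 3 blocks long */
        {.new_nblocks = 3, .nblocks_present = 1,
         .block_numbers = {1}, .block_data = {"b:blk1"}},
        /* incremental c: truncated to 2 blocks, block 0 changed */
        {.new_nblocks = 2, .nblocks_present = 1,
         .block_numbers = {0}, .block_data = {"c:blk0"}},
    };
    unsigned    nblocks;

    /* full backup a: three blocks */
    strcpy(result[0], "a:blk0");
    strcpy(result[1], "a:blk1");
    strcpy(result[2], "a:blk2");

    nblocks = combine_sketch(result, 3, chain, 2);
    for (unsigned b = 0; b < nblocks; b++)
        printf("block %u: %s\n", b, result[b]);
    return 0;
}

Collapsing b and c into a single incremental, as mentioned above, would presumably amount to doing this same merge while keeping only the blocks that some incremental in the chain carries, rather than materializing every block.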
Thanks,

--
Robert Haas
EDB: http://www.enterprisedb.com

Attachment
- v1-0001-In-basebackup.c-refactor-to-create-verify_page_ch.patch
- v1-0005-Change-how-a-base-backup-decides-which-files-have.patch
- v1-0003-Change-struct-tablespaceinfo-s-oid-member-from-ch.patch
- v1-0002-In-basebackup.c-refactor-to-create-read_file_data.patch
- v1-0004-Refactor-parse_filename_for_nontemp_relation-to-p.patch
- v1-0006-Move-src-bin-pg_verifybackup-parse_manifest.c-int.patch
- v1-0008-Add-new-pg_walsummary-tool.patch
- v1-0007-Prototype-patch-for-incremental-and-differential-.patch