Re: Proposal: Incremental Backup - Mailing list pgsql-hackers
From: Simon Riggs
Subject: Re: Proposal: Incremental Backup
Date:
Msg-id: CA+U5nMLkXrizPXg9Yf=uUGMu8zFyvXpnKTop_HaZUxJuwv9khw@mail.gmail.com
In response to: Re: Proposal: Incremental Backup (Claudio Freire <klaussfreire@gmail.com>)
Responses:
  Re: Proposal: Incremental Backup
  Re: Proposal: Incremental Backup
  Re: Proposal: Incremental Backup
List: pgsql-hackers
On 5 August 2014 22:38, Claudio Freire <klaussfreire@gmail.com> wrote:
>> * When we take an incremental backup we need the WAL from the backup
>> start LSN through to the backup stop LSN. We do not need the WAL
>> between the last backup stop LSN and the new incremental start LSN.
>> That is a huge amount of WAL in many cases and we'd like to avoid
>> that, I would imagine. (So the space savings aren't just the delta
>> from the main data files, we should also look at WAL savings).
>
> Yes, probably something along the lines of removing redundant FPW and
> stuff like that.

Not what I mean at all, sorry for the confusion.

Each backup has a start LSN and a stop LSN. You need all the WAL
between those two points (-X option). But if you have an incremental
backup (b2), it depends upon an earlier backup (b1). You don't need
the WAL between b1.stop_lsn and b2.start_lsn.

In typical cases, start to stop will be a few hours or less, whereas
we'd be doing backups at most daily. Which would mean we'd only need
to store at most 10% of the WAL files, because we don't need the WAL
between backups.

>> * For me, file based incremental is a useful and robust feature.
>> Block-level incremental is possible, but requires either significant
>> persistent metadata (1 MB per GB file) or access to the original
>> backup. One important objective here is to make sure we do NOT have to
>> re-read the last backup when taking the next backup; this helps us to
>> optimize the storage costs for backups. Plus, block-level recovery
>> requires us to have a program that correctly re-writes data into the
>> correct locations in a file, which seems likely to be a slow and bug
>> ridden process to me. Nice, safe, solid file-level incremental backup
>> first please. Fancy, bug prone, block-level stuff much later.
>
> Ok. You could do incremental first without any kind of optimization,

Yes, that is what makes sense to me. Fast, simple, robust and most of
the benefit.

We should call this INCREMENTAL FILE LEVEL.

> then file-level optimization by keeping a file-level LSN range, and
> then extend that to block-segment-level LSN ranges. That sounds like a
> plan to me.

Thinking some more, this whole store-multiple-LSNs idea seems like too
much. We can still do block-level incrementals just by using a single
LSN as the reference point. We'd still need a complex file format and
a complex file reconstruction program, so I think that is still "next
release".

We can call that INCREMENTAL BLOCK LEVEL.

> But, I don't see how you'd do the one without optimization without
> reading the previous backup for comparing deltas. Remember checksums
> are deemed not trustworthy, not just by me, so that (which was the
> original proposition) doesn't work.

Every incremental backup refers to an earlier backup as a reference
point, which may in turn refer to an earlier one, in a chain. Each
backup has a single LSN associated with it, as stored in the
backup_label. (So we don't need the profile stage now, AFAICS.)

To decide whether we need to re-copy a file, we read the file until we
find a block with a later LSN. If we read the whole file without
finding a later LSN then we don't need to re-copy it.

That means we read each file twice, which is slower, but the file is
at most 1GB in size, which we can assume will be mostly in memory for
the second read. As Marco says, that can be optimized using filesystem
timestamps instead.

--
Simon Riggs                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
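[Illustrative sketch, not part of the original message] The per-file
re-copy check described above can be made concrete as follows. This
rough, untested C sketch assumes the single reference LSN from the
earlier backup's backup_label has already been parsed into a 64-bit
value; it reads each 8kB block of a relation file and compares the
page LSN stored in the first 8 bytes of the page header against that
reference. The function names page_lsn and file_needs_copy are made
up for illustration.

    #include <stdio.h>
    #include <stdint.h>
    #include <stdbool.h>
    #include <string.h>

    #define BLCKSZ 8192

    /* The page header begins with the page LSN: two 32-bit words,
     * high word first, stored in host byte order on disk. */
    static uint64_t
    page_lsn(const char *page)
    {
        uint32_t hi, lo;

        memcpy(&hi, page, sizeof(uint32_t));
        memcpy(&lo, page + sizeof(uint32_t), sizeof(uint32_t));
        return ((uint64_t) hi << 32) | lo;
    }

    /* Return true if any block in the file has an LSN newer than
     * ref_lsn, i.e. the whole file must go into the incremental
     * backup; false means the file can be skipped. */
    static bool
    file_needs_copy(const char *path, uint64_t ref_lsn)
    {
        char    page[BLCKSZ];
        FILE   *fp = fopen(path, "rb");

        if (fp == NULL)
            return true;        /* be conservative: copy it */

        while (fread(page, 1, BLCKSZ, fp) == BLCKSZ)
        {
            if (page_lsn(page) > ref_lsn)
            {
                fclose(fp);
                return true;
            }
        }
        fclose(fp);
        return false;
    }

The scan stops at the first block newer than the reference LSN, so in
the common case of a heavily modified file only a prefix is read; only
unchanged files are read in full, which matches the file-level
granularity of INCREMENTAL FILE LEVEL.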