Re: block-level incremental backup - Mailing list pgsql-hackers
From | Jehan-Guillaume de Rorthais |
---|---|
Subject | Re: block-level incremental backup |
Date | |
Msg-id | 20190410202145.425d4aa6@firost Whole thread Raw |
In response to | Re: block-level incremental backup (Robert Haas <robertmhaas@gmail.com>) |
Responses |
Re: block-level incremental backup
|
List | pgsql-hackers |
Hi, First thank you for your answer! On Wed, 10 Apr 2019 12:21:03 -0400 Robert Haas <robertmhaas@gmail.com> wrote: > On Wed, Apr 10, 2019 at 10:57 AM Jehan-Guillaume de Rorthais > <jgdr@dalibo.com> wrote: > > My idea would be create a new tool working on archived WAL. No burden > > server side. Basic concept is: > > > > * parse archives > > * record latest relevant FPW for the incr backup > > * write new WALs with recorded FPW and removing/rewriting duplicated > > walrecords. > > > > It's just a PoC and I hadn't finished the WAL writing part...not even > > talking about the replay part. I'm not even sure this project is a good > > idea, but it is a good educational exercice to me in the meantime. > > > > Anyway, using real life OLTP production archives, my stats were: > > > > # WAL xlogrec kept Size WAL kept > > 127 39% 50% > > 383 22% 38% > > 639 20% 29% > > > > Based on this stats, I expect this would save a lot of time during recovery > > in a first step. If it get mature, it might even save a lot of archives > > space or extend the retention period with degraded granularity. It would > > even help taking full backups with a lower frequency. > > > > Any thoughts about this design would be much appreciated. I suppose this > > should be offlist or in a new thread to avoid polluting this thread as this > > is a slightly different subject. > > Interesting idea, but I don't see how it can work if you only deal > with the FPWs and not the other records. For instance, suppose that > you take a full backup at time T0, and then at time T1 there are two > modifications to a certain block in quick succession. That block is > then never touched again. Since no checkpoint intervenes between the > modifications, the first one emits an FPI and the second does not. > Capturing the FPI is fine as far as it goes, but unless you also do > something with the non-FPI change, you lose that second modification. > You could fix that by having your tool replicate the effects of WAL > apply outside the server, but that sounds like a ton of work and a ton > of possible bugs. In my current design, the scan is done backward from end to start and I keep all the records appearing after the last occurrence of their respective FPI. The next challenge I have to achieve is to deal with multiple blocks records where some need to be removed and other are FPI to keep (eg. UPDATE). > I have a related idea, though. Suppose that, as Peter says upthread, > you have a replication slot that prevents old WAL from being removed. > You also have a background worker that is connected to that slot. It > decodes WAL and produces summary files containing all block-references > extracted from those WAL records and the associated LSN (or maybe some > approximation of the LSN instead of the exact value, to allow for > compression and combining of nearby references). Then you hold onto > those summary files after the actual WAL is removed. Now, when > somebody asks the server for all blocks changed since a certain LSN, > it can use those summary files to figure out which blocks to send > without having to read all the pages in the database. Although I > believe that a simple system that finds modified blocks by reading > them all is good enough for a first version of this feature and useful > in its own right, a more efficient system will be a lot more useful, > and something like this seems to me to be probably the best way to > implement it. Summary files looks like what Andrey Borodin described as delta-files and change maps. > With an approach based > on WAL-scanning, the work is done in the background and nobody has to > wait for it. Agree with this.
pgsql-hackers by date: