Re: block-level incremental backup - Mailing list pgsql-hackers

From: Ashwin Agrawal
Subject: Re: block-level incremental backup
Msg-id: CALfoeitO-vkfjubMFQRmgyXghL-uUnZLNxbr=obrQQsm8kFO4A@mail.gmail.com
In response to: Re: block-level incremental backup (Robert Haas <robertmhaas@gmail.com>)
Responses: Re: block-level incremental backup
List: pgsql-hackers

On Wed, Apr 10, 2019 at 9:21 AM Robert Haas <robertmhaas@gmail.com> wrote:
> I have a related idea, though.  Suppose that, as Peter says upthread,
> you have a replication slot that prevents old WAL from being removed.
> You also have a background worker that is connected to that slot.  It
> decodes WAL and produces summary files containing all block-references
> extracted from those WAL records and the associated LSN (or maybe some
> approximation of the LSN instead of the exact value, to allow for
> compression and combining of nearby references).  Then you hold onto
> those summary files after the actual WAL is removed.  Now, when
> somebody asks the server for all blocks changed since a certain LSN,
> it can use those summary files to figure out which blocks to send
> without having to read all the pages in the database.  Although I
> believe that a simple system that finds modified blocks by reading
> them all is good enough for a first version of this feature and useful
> in its own right, a more efficient system will be a lot more useful,
> and something like this seems to me to be probably the best way to
> implement it.
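
A minimal sketch of the lookup side of this idea, assuming a summary is simply an array of (relation, fork, block, LSN) references decoded from WAL. `BlockRef` and `blocks_changed_since` are hypothetical names, not PostgreSQL APIs, and the real pieces (the background worker, the on-disk summary format, compression) are not shown.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical summary entry: one block reference extracted from a WAL record. */
typedef struct BlockRef
{
    uint32_t relnode;   /* relation file number (simplified: no db/tablespace) */
    uint8_t  forknum;   /* fork: main, FSM, visibility map, ... */
    uint32_t blkno;     /* block number within the fork */
    uint64_t lsn;       /* LSN of the WAL record that touched the block */
} BlockRef;

/* Order by block identity only; the LSN is ignored. */
static int
blockref_cmp(const void *a, const void *b)
{
    const BlockRef *x = a;
    const BlockRef *y = b;

    if (x->relnode != y->relnode)
        return x->relnode < y->relnode ? -1 : 1;
    if (x->forknum != y->forknum)
        return x->forknum < y->forknum ? -1 : 1;
    if (x->blkno != y->blkno)
        return x->blkno < y->blkno ? -1 : 1;
    return 0;
}

/*
 * Write to 'out' every distinct block referenced at or after since_lsn,
 * sorted by block identity, keeping the newest LSN seen for each block.
 * 'out' must have room for n entries. Returns the number of distinct blocks.
 */
static size_t
blocks_changed_since(const BlockRef *summary, size_t n,
                     uint64_t since_lsn, BlockRef *out)
{
    size_t m = 0;
    size_t k = 0;

    /* Keep only references at or after the requested LSN. */
    for (size_t i = 0; i < n; i++)
        if (summary[i].lsn >= since_lsn)
            out[m++] = summary[i];

    qsort(out, m, sizeof(BlockRef), blockref_cmp);

    /* Collapse repeated references to the same block into one entry. */
    for (size_t i = 0; i < m; i++)
    {
        if (k > 0 && blockref_cmp(&out[k - 1], &out[i]) == 0)
        {
            if (out[i].lsn > out[k - 1].lsn)
                out[k - 1].lsn = out[i].lsn;
        }
        else
            out[k++] = out[i];
    }
    return k;
}
```

For example, with references to block 7 at LSN 100 and 200, block 3 at LSN 150, and a block of another relation at LSN 90, asking for changes since LSN 120 yields blocks 3 and 7, with 200 as block 7's newest LSN. If LSNs are stored only approximately, as suggested above, rounding each one up (never down) keeps the answer safe: an unchanged block may occasionally be sent, but a changed block is never missed.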

Not to fork the conversation away from incremental backups, but a similar approach is what we have been thinking about for pg_rewind. Currently, pg_rewind requires all the WAL on the source side, from the point of divergence onward, to be present in order to rewind. Instead, we could just parse the WAL and keep the changed-block information around on the source. Then the WAL would not need to be retained, but we could still rewind using the changed-block map. So, after performing the rewind activity using only target-side WAL, rewind becomes very similar to the incremental backup proposed here.
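
One way to read this proposal, sketched under stated assumptions: the target's own WAL still identifies the blocks the target changed after divergence, while the source's changed-block map stands in for the source WAL when identifying the blocks the source changed; the set of blocks to copy from the source is then the union of the two. `BlockId` and `blocks_to_copy` are hypothetical names, and the sketch ignores other work a real rewind must do, such as handling files that were created, removed, or truncated.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical block identity; not a PostgreSQL type. */
typedef struct BlockId
{
    uint32_t relnode;   /* relation file number (simplified) */
    uint8_t  forknum;   /* fork number */
    uint32_t blkno;     /* block number within the fork */
} BlockId;

static int
blockid_cmp(const BlockId *x, const BlockId *y)
{
    if (x->relnode != y->relnode)
        return x->relnode < y->relnode ? -1 : 1;
    if (x->forknum != y->forknum)
        return x->forknum < y->forknum ? -1 : 1;
    if (x->blkno != y->blkno)
        return x->blkno < y->blkno ? -1 : 1;
    return 0;
}

/*
 * Merge two sorted, duplicate-free lists into their sorted union:
 *   target: blocks the target changed after divergence (from target WAL)
 *   source: blocks the source changed after divergence (from a
 *           hypothetical changed-block map kept on the source)
 * Each block in 'out' would be fetched from the source. 'out' needs room
 * for nt + ns entries. Returns the number of entries written.
 */
static size_t
blocks_to_copy(const BlockId *target, size_t nt,
               const BlockId *source, size_t ns, BlockId *out)
{
    size_t i = 0, j = 0, k = 0;

    while (i < nt && j < ns)
    {
        int c = blockid_cmp(&target[i], &source[j]);

        if (c < 0)
            out[k++] = target[i++];
        else if (c > 0)
            out[k++] = source[j++];
        else
        {
            out[k++] = target[i++];   /* changed on both sides: copy once */
            j++;
        }
    }
    while (i < nt)
        out[k++] = target[i++];
    while (j < ns)
        out[k++] = source[j++];
    return k;
}
```

A block changed on both sides appears once in the result, since copying the source's version covers both cases.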
