Re: Extending BASE_BACKUP in replication protocol: incremental backup and backup format - Mailing list pgsql-hackers

From Magnus Hagander
Subject Re: Extending BASE_BACKUP in replication protocol: incremental backup and backup format
Date
Msg-id CABUevEzw3rUR7oV8nhpMSwGivtrdME2QbmV+zkrKEjVOyh=bjw@mail.gmail.com
In response to Re: Extending BASE_BACKUP in replication protocol: incremental backup and backup format  (Michael Paquier <michael.paquier@gmail.com>)
Responses Re: Extending BASE_BACKUP in replication protocol: incremental backup and backup format
List pgsql-hackers
On Tue, Jan 14, 2014 at 2:16 PM, Michael Paquier <michael.paquier@gmail.com> wrote:
> On Tue, Jan 14, 2014 at 10:01 PM, Heikki Linnakangas
> <hlinnakangas@vmware.com> wrote:
>>> - Addition of an option called INCREMENTAL to send an incremental
>>> backup to the client. This option uses as input an LSN, and sends back
>>> to client relation pages (in the shape of reduced relation files) that
>>> are newer than the LSN specified by looking at pd_lsn of
>>> PageHeaderData. In this case the LSN needs to be determined by client
>>> based on the latest full backup taken. This option is particularly
>>> interesting to reduce the amount of data taken between two backups,
>>> even if it increases the restore time as client needs to reconstitute
>>> a base backup depending on the recovery target and the pages modified.
>>> Client would be in charge of rebuilding pages from incremental backup
>>> by scanning all the blocks that need to be updated based on the full
>>> backup as the LSN from which incremental backup is taken is known. But
>>> this is not really something the server cares about... Such things are
>>> actually done by pg_rman as well.
>>
>>
>> How does the server find all the pages with LSN > the threshold? If it needs
>> to scan the whole database, it's not all that useful. I guess it would be
>> better than nothing, but I think you might as well just use rsync.
> Yes, it would be necessary to scan the whole database as the LSN to be
> checked is kept in PageHeaderData :). Perhaps it is not that
> performant, but my initial thought was that perhaps the amount of data
> necessary to maintain incremental backups could balance with the
> amount of WAL necessary to keep and limit the whole amount on disk.

It wouldn't be worse performance-wise than a full backup. That one also has to read all the blocks, after all... You're decreasing network traffic and client storage, with the same I/O on the server side. Seems worthwhile.
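
To make the trade-off concrete, here is a rough sketch (not code from the proposal or from PostgreSQL) of the scan being discussed: every block of a relation file is read, but only blocks whose pd_lsn is newer than the threshold are kept for sending. It assumes the default 8192-byte block size, a little-endian platform, and that pd_lsn occupies the first 8 bytes of the page header as two 32-bit halves (high, then low). The names page_lsn, parse_lsn and scan_relation are made up for illustration.

```python
import struct

BLCKSZ = 8192  # assumed: PostgreSQL's default block size


def parse_lsn(text):
    """Convert an LSN written as 'hi/lo' in hex (e.g. '0/2000') to an integer."""
    hi, lo = text.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def page_lsn(page):
    """Read pd_lsn from the start of a page header.

    Assumption: two unsigned 32-bit values (high half, low half),
    little-endian byte order.
    """
    hi, lo = struct.unpack_from("<II", page, 0)
    return (hi << 32) | lo


def scan_relation(f, threshold):
    """Read every block from binary file object f.

    Returns (blocks_read, changed), where changed is a list of
    (block_number, page_bytes) for pages with pd_lsn > threshold.
    """
    blocks_read = 0
    changed = []
    while True:
        page = f.read(BLCKSZ)
        if len(page) < BLCKSZ:
            break  # end of file (or a trailing partial block, ignored here)
        if page_lsn(page) > threshold:
            changed.append((blocks_read, page))
        blocks_read += 1
    return blocks_read, changed
```

The server-side read cost is blocks_read * BLCKSZ no matter what the threshold is, which matches a full backup; only len(changed) * BLCKSZ would go over the network and onto client storage.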

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/
