Re: block-level incremental backup - Mailing list pgsql-hackers

From vignesh C
Subject Re: block-level incremental backup
Date
Msg-id CALDaNm310fUZ72nM2n=cD0eSHKRAoJPuCyvvR0dhTEZ9Oytyzg@mail.gmail.com
Whole thread Raw
In response to Re: block-level incremental backup  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: block-level incremental backup  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
On Tue, Jul 30, 2019 at 1:58 AM Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Wed, Jul 10, 2019 at 2:17 PM Anastasia Lubennikova
> <a.lubennikova@postgrespro.ru> wrote:
> > In attachments, you can find a prototype of incremental pg_basebackup,
> > which consists of 2 features:
> >
> > 1) To perform incremental backup one should call pg_basebackup with a
> > new argument:
> >
> > pg_basebackup -D 'basedir' --prev-backup-start-lsn 'lsn'
> >
> > where lsn is a start_lsn of parent backup (can be found in
> > "backup_label" file)
> >
> > It calls BASE_BACKUP replication command with a new argument
> > PREV_BACKUP_START_LSN 'lsn'.
> >
> > For datafiles, only pages with LSN > prev_backup_start_lsn will be
> > included in the backup.
>>
One thought, if the file is not modified no need to check the lsn.
>>
> > They are saved into 'filename.partial' file, 'filename.blockmap' file
> > contains an array of BlockNumbers.
> > For example, if we backuped blocks 1,3,5, filename.partial will contain
> > 3 blocks, and 'filename.blockmap' will contain array {1,3,5}.
>
> I think it's better to keep both the information about changed blocks
> and the contents of the changed blocks in a single file.  The list of
> changed blocks is probably quite short, and I don't really want to
> double the number of files in the backup if there's no real need. I
> suspect it's just overall a bit simpler to keep everything together.
> I don't think this is a make-or-break thing, and welcome contrary
> arguments, but that's my preference.
>
I feel Robert's suggestion is good.
We can probably keep one meta file for each backup with some basic information
of all the files being backed up, this metadata file will be useful in the
below case:
Table dropped before incremental backup
Table truncated and Insert/Update/Delete operations before incremental backup

I feel if we have the metadata, we can add some optimization to decide the
above scenario with the metadata information to identify the file deletion
and avoiding write and delete for pg_combinebackup which Jeevan has told in
his previous mail.

Probably it can also help us to decide which work the worker needs to do
if we are planning to backup in parallel.

Regards,
vignesh
EnterpriseDB: http://www.enterprisedb.com



pgsql-hackers by date:

Previous
From: Andres Freund
Date:
Subject: Re: Remove HeapTuple and Buffer dependency for predicate lockingfunctions
Next
From: Ashwin Agrawal
Date:
Subject: Re: Remove HeapTuple and Buffer dependency for predicate locking functions