Re: block-level incremental backup - Mailing list pgsql-hackers

From vignesh C
Subject Re: block-level incremental backup
Date
Msg-id CALDaNm01DxcHwZ8f5N7gXv8iGer1jY+i-AuzkS4TxtmRowrLKQ@mail.gmail.com
In response to Re: block-level incremental backup  (Jeevan Ladhe <jeevan.ladhe@enterprisedb.com>)
List pgsql-hackers
On Fri, Jul 26, 2019 at 11:21 AM Jeevan Ladhe <jeevan.ladhe@enterprisedb.com> wrote:
Hi Vignesh,

Please find my comments inline below:

1) If a relation file has changed due to truncate or vacuum,
    the new files will be copied during the incremental backup.
    Both the old file and the new file may then be present.
    I'm not sure whether cleanup of the old file is handled.

When an incremental backup is taken, each file is either copied in its entirety,
if more than 90% of it has changed, or written as a .partial file containing a
bitmap of the changed blocks and the actual data of those blocks. For unchanged
files, a 0-byte .partial file is still created. This means there is a .partial
file for every file that has to be looked up in the full backup.
While composing a synthetic backup from the incremental backup, the
pg_combinebackup tool looks up only those relation files in the full (parent)
backup that have a .partial file in the incremental backup. So, if a vacuum or
truncate happened between the full and incremental backups, the incremental
backup image will not have a 0-length .partial file for that relation, and so
the synthetic backup restored using pg_combinebackup will not have that file
either.

Thanks Jeevan for the update; I think this logic is good.
It will handle deleting the old relation files.
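
To make the cleanup argument concrete, here is a toy Python model of the
scheme Jeevan describes. Everything in it is hypothetical: the names
take_incremental, combine, FullCopy and Partial are made up for illustration
and are not the patch's code, relation files are modelled as in-memory lists
of blocks, and changed blocks are detected by comparing with the parent's
copy, which is a simplification rather than how the real tool finds them.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# "Changed more than 90%" -> copy the whole file instead of writing a .partial.
FULL_COPY_THRESHOLD = 0.90


@dataclass
class FullCopy:
    """The file is copied in its entirety."""
    blocks: list[bytes]


@dataclass
class Partial:
    """A .partial file: block count, changed-blocks bitmap, changed data.

    bitmap == 0 with no data models the 0-byte .partial written for an
    unchanged file.
    """
    nblocks: int
    bitmap: int = 0
    data: list[bytes] = field(default_factory=list)


def take_incremental(parent: dict[str, list[bytes]],
                     current: dict[str, list[bytes]]) -> dict[str, FullCopy | Partial]:
    backup: dict[str, FullCopy | Partial] = {}
    for name, blocks in current.items():
        old = parent.get(name)
        if old is None:
            # File did not exist in the parent backup: nothing to patch against.
            backup[name] = FullCopy(list(blocks))
            continue
        # Simplification: detect changes by comparing with the parent's copy.
        changed = [i for i, blk in enumerate(blocks) if i >= len(old) or old[i] != blk]
        if blocks and len(changed) / len(blocks) > FULL_COPY_THRESHOLD:
            backup[name] = FullCopy(list(blocks))
        else:
            bitmap = 0
            for i in changed:
                bitmap |= 1 << i
            backup[name] = Partial(len(blocks), bitmap, [blocks[i] for i in changed])
    return backup


def combine(parent: dict[str, list[bytes]],
            incremental: dict[str, FullCopy | Partial]) -> dict[str, list[bytes]]:
    synthetic: dict[str, list[bytes]] = {}
    for name, entry in incremental.items():
        if isinstance(entry, FullCopy):
            synthetic[name] = list(entry.blocks)
            continue
        # Only files that have a .partial are looked up in the parent backup.
        base = parent[name]
        new_data = iter(entry.data)  # changed blocks, in ascending block order
        synthetic[name] = [next(new_data) if (entry.bitmap >> i) & 1 else base[i]
                           for i in range(entry.nblocks)]
    # Files present only in the parent (dropped or replaced between the two
    # backups) have no entry in `incremental`, so they are never carried over.
    return synthetic
```

For example, with full = {"16384": [b"a", b"b", b"c"], "16385": [b"x"]} and
current = {"16384": [b"a", b"B", b"c"], "16390": [b"y"]} (block 1 of 16384
changed, 16385 replaced by 16390), combine(full, take_incremental(full, current))
returns exactly the current files; 16385 is absent because the incremental
backup has no .partial for it.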
 
2) Just a small thought on building the bitmap:
    can the bitmap be built and maintained as
    and when changes happen in the system?
    If we build the bitmap while taking the incremental backup,
    scanning through each file might take more time.
    This could be a configurable parameter: by default the system
    would run without capturing this information, but users who
    take incremental backups frequently could enable it to
    track the modified blocks.

IIUC, this will need changes in the backend. Honestly, I think backup is a
maintenance task, and hampering the backend for it does not look like a good
idea. That said, even if we have to provide this as a switch for some users,
it will need different infrastructure from what we are building here for
constructing the bitmap, where we scan all the files one by one. Maybe for the
initial version we can go with the current proposal that Robert has suggested,
and add this switch later as an enhancement.

That sounds fair to me.
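
For point 2, the per-file scan that the current proposal relies on can be
sketched roughly as below. This is a hypothetical illustration, not the patch:
it assumes a block counts as changed when its page LSN (pd_lsn, the first
8 bytes of the page header, read here in native byte order) is newer than some
cutoff LSN such as the parent backup's start LSN, and that blocks are
BLCKSZ = 8192 bytes. changed_block_bitmap and make_page are made-up names.

```python
import io
import struct
from typing import BinaryIO

BLCKSZ = 8192  # PostgreSQL's default block size (assumed)


def page_lsn(page: bytes) -> int:
    # pd_lsn is the first field of the page header, stored as two uint32
    # halves (xlogid, xrecoff); native byte order is assumed here.
    xlogid, xrecoff = struct.unpack_from("=II", page, 0)
    return (xlogid << 32) | xrecoff


def changed_block_bitmap(f: BinaryIO, since_lsn: int) -> tuple[int, int]:
    """Read a relation file block by block; return (nblocks, bitmap).

    Bit i of the bitmap is set when block i's page LSN is newer than
    since_lsn (hypothetical change criterion).
    """
    nblocks = 0
    bitmap = 0
    while True:
        page = f.read(BLCKSZ)
        if not page:
            break
        if len(page) != BLCKSZ:
            raise ValueError(f"short read at block {nblocks}")
        if page_lsn(page) > since_lsn:
            bitmap |= 1 << nblocks
        nblocks += 1
    return nblocks, bitmap


def make_page(lsn: int) -> bytes:
    # Hypothetical helper for the demo: a zeroed page with only pd_lsn set.
    return struct.pack("=II", lsn >> 32, lsn & 0xFFFFFFFF) + bytes(BLCKSZ - 8)


if __name__ == "__main__":
    relation = io.BytesIO(make_page(100) + make_page(300) + make_page(50))
    print(changed_block_bitmap(relation, since_lsn=200))  # (3, 2): only block 1 changed
```

The cost is one sequential read of every relation file per incremental
backup; tracking modified blocks in the backend would replace that scan with
a lookup, at the price of extra work on every write.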


Regards,
vignesh
EnterpriseDB: http://www.enterprisedb.com
