Re: block-level incremental backup - Mailing list pgsql-hackers

From	vignesh C
Subject	Re: block-level incremental backup
Date	July 26, 2019 10:53:57
Msg-id	CALDaNm01DxcHwZ8f5N7gXv8iGer1jY+i-AuzkS4TxtmRowrLKQ@mail.gmail.com
In response to	Re: block-level incremental backup (Jeevan Ladhe <jeevan.ladhe@enterprisedb.com>)
List	pgsql-hackers

On Fri, Jul 26, 2019 at 11:21 AM Jeevan Ladhe <jeevan.ladhe@enterprisedb.com> wrote:

Hi Vignesh,

Please find my comments inline below:

1) If a relation file has changed due to truncate or vacuum,
the new files will be copied during incremental backup.
There is a chance that both the old file and the new file
will be present. I'm not sure whether cleanup of the
old file is handled.

When an incremental backup is taken, it either copies a file in its entirety if
more than 90% of the file has changed, or writes a .partial file containing a
bitmap of the changed blocks and the actual block data. For files that are
unchanged, it still creates a .partial file, but with 0 bytes. This means there
is a .partial file for every file that has to be looked up in the full backup.
While composing a synthetic backup from an incremental backup, the pg_combinebackup
tool will only look in the full (parent) backup for those relation files that
have .partial files in the incremental backup. So, if vacuum/truncate happened
between the full and incremental backups, the incremental backup image will not
have even a 0-length .partial file for the old relation file, and so the synthetic
backup that is restored using pg_combinebackup will not have that file either.

Thanks Jeevan for the update, I feel this logic is good.

It will handle the case of deleting the old relation files.
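
To make sure I have understood the lookup rule, here is a minimal sketch of it in Python. It is not the actual pg_combinebackup code: the in-memory dictionaries, the `combine` function and its `block_size` parameter are all hypothetical, chosen only for illustration. A backup is modelled as a mapping from file name to contents, where an incremental entry is either whole-file bytes or a dict of changed blocks standing in for a .partial file (an empty dict models the 0-byte .partial).

```python
from typing import Dict, Union

# Hypothetical model: whole-file bytes, or {block number: block bytes}
# standing in for a .partial file.
IncrementalEntry = Union[bytes, Dict[int, bytes]]


def combine(full: Dict[str, bytes],
            incremental: Dict[str, IncrementalEntry],
            block_size: int = 8192) -> Dict[str, bytes]:
    """Build a synthetic backup, driven only by the incremental backup."""
    result: Dict[str, bytes] = {}
    for name, entry in incremental.items():
        if isinstance(entry, bytes):
            # File was copied in its entirety; the parent is not consulted.
            result[name] = entry
            continue
        # .partial entry: start from the parent's copy, overlay changed blocks.
        base = bytearray(full[name])
        for blkno, data in entry.items():
            start = blkno * block_size
            end = start + len(data)
            if len(base) < end:
                base.extend(b"\0" * (end - len(base)))
            base[start:end] = data
        result[name] = bytes(base)
    # Files present only in the parent never enter the loop, so they are
    # absent from the synthetic backup.
    return result
```

Because the loop walks only the incremental backup, a relation file that exists in the full backup but has no entry in the incremental one is simply never copied, which is the cleanup behaviour discussed above.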

2) Just a small thought on building the bitmap:
can the bitmap be built and maintained as
and when changes happen in the system?
If we build the bitmap while doing the incremental backup,
scanning through each file might take more time.
This could be a configurable parameter: the system can run
without capturing this information by default, but users
who take incremental backups frequently can enable this
configuration, which would track the modified blocks.

IIUC, this will need changes in the backend. Honestly, I think backup is a
maintenance task, and hampering the backend for this does not look like a good
idea. That said, even if we have to provide this as a switch for some users,
it will need different infrastructure from what we are building here for
constructing the bitmap, where we scan all the files one by one. Maybe for
the initial version, we can go with the current proposal that Robert has suggested,
and add this switch at a later point as an enhancement.

That sounds fair to me.
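
For reference, the scan-at-backup-time approach can be sketched roughly as below. This is only an illustration under assumptions, not code from the patch: `plan_file_backup` and `threshold_pct` are hypothetical names, and `is_changed` is a caller-supplied predicate, because how a block is judged to be modified is not covered in this mail.

```python
from typing import Callable, List, Tuple


def plan_file_backup(num_blocks: int,
                     is_changed: Callable[[int], bool],
                     threshold_pct: int = 90) -> Tuple[str, List[int]]:
    """Scan every block of one file and decide how to back it up.

    Returns ("full", []) when more than threshold_pct percent of the
    blocks changed, otherwise ("partial", [changed block numbers]).
    An unchanged file yields ("partial", []), i.e. a 0-byte .partial.
    """
    # The whole file is visited block by block; this is the cost that
    # tracking changes as they happen would avoid.
    changed = [blkno for blkno in range(num_blocks) if is_changed(blkno)]
    # Integer arithmetic avoids floating-point surprises at the boundary.
    if num_blocks > 0 and len(changed) * 100 > num_blocks * threshold_pct:
        return ("full", [])
    return ("partial", changed)
```

For example, `plan_file_backup(10, lambda b: b in {2, 5})` returns `("partial", [2, 5])`, while a file whose blocks have all changed returns `("full", [])`. The cost is proportional to the number of blocks scanned, whether or not anything changed.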

Regards,
vignesh
EnterpriseDB: http://www.enterprisedb.com
