Re: block-level incremental backup - Mailing list pgsql-hackers

From Ibrar Ahmed
Subject Re: block-level incremental backup
Date
Msg-id CALtqXTebEahx3c63ttst0z1PPdzj1T6Lo6nS2phGF93fZAtNuQ@mail.gmail.com
In response to Re: block-level incremental backup  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers


On Tue, Jul 30, 2019 at 1:28 AM Robert Haas <robertmhaas@gmail.com> wrote:
On Wed, Jul 10, 2019 at 2:17 PM Anastasia Lubennikova
<a.lubennikova@postgrespro.ru> wrote:
> In attachments, you can find a prototype of incremental pg_basebackup,
> which consists of 2 features:
>
> 1) To perform incremental backup one should call pg_basebackup with a
> new argument:
>
> pg_basebackup -D 'basedir' --prev-backup-start-lsn 'lsn'
>
> where lsn is a start_lsn of parent backup (can be found in
> "backup_label" file)
>
> It calls BASE_BACKUP replication command with a new argument
> PREV_BACKUP_START_LSN 'lsn'.
>
> For datafiles, only pages with LSN > prev_backup_start_lsn will be
> included in the backup.
> They are saved into a 'filename.partial' file, and a 'filename.blockmap' file
> contains an array of BlockNumbers.
> For example, if we backed up blocks 1, 3, and 5, filename.partial will contain
> 3 blocks, and 'filename.blockmap' will contain the array {1,3,5}.

I think it's better to keep both the information about changed blocks
and the contents of the changed blocks in a single file.  The list of
changed blocks is probably quite short, and I don't really want to
double the number of files in the backup if there's no real need. I
suspect it's just overall a bit simpler to keep everything together.
I don't think this is a make-or-break thing, and welcome contrary
arguments, but that's my preference.

I have experience working on a similar product, and I agree with Robert: keeping
the changed-block info and the changed blocks themselves in a single file makes more sense.
+1
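
Just to illustrate what I have in mind (a rough sketch only, not taken from the
posted patch; the struct name, field names, and constants below are hypothetical),
the combined file could start with a small header listing the changed block numbers,
followed by the page images in the same order:

#include <stdint.h>

#define BLCKSZ 8192             /* PostgreSQL's default block size */

typedef uint32_t BlockNumber;

typedef struct PartialFileHeader
{
    uint32_t    magic;          /* format identifier / version */
    uint32_t    nblocks;        /* number of changed blocks in this file */
    /*
     * Followed on disk by:
     *   BlockNumber blocknos[nblocks];      -- sorted changed block numbers
     *   char        pages[nblocks][BLCKSZ]; -- page images, in the same order
     */
} PartialFileHeader;

/* Byte offset of the i'th page image within the file (hypothetical helper). */
static inline int64_t
partial_page_offset(uint32_t nblocks, uint32_t i)
{
    return (int64_t) sizeof(PartialFileHeader)
         + (int64_t) nblocks * sizeof(BlockNumber)
         + (int64_t) i * BLCKSZ;
}

With a layout like this the reader knows from the header exactly how many block
numbers and page images follow, so no second file is needed.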

> 2) To merge incremental backup into a full backup call
>
> pg_basebackup -D 'basedir' --incremental-pgdata 'incremental_basedir'
> --merge-backups
>
> It will move all files from 'incremental_basedir' to 'basedir' handling
> '.partial' files correctly.

This, to me, looks like it's much worse than the design that I
proposed originally.  It means that:

1. You can't take an incremental backup without having the full backup
available at the time you want to take the incremental backup.

2. You're always storing a full backup, which means that you need more
disk space, and potentially much more I/O while taking the backup.
You save on transfer bandwidth, but you add a lot of disk reads and
writes, costs which have to be paid even if the backup is never
restored.

> 1) Whether we collect block maps using a simple "read everything page by
> page" approach, WAL scanning, or any other page-tracking algorithm, we must
> choose a map format.
> I implemented the simplest one, while there are more ideas:

I think we should start simple.

I haven't had a chance to look at Jeevan's patch at all, or yours in
any detail, as yet, so these are just some very preliminary comments.
It will be good, however, if we can agree on who is going to do what
part of this as we try to drive this forward together.  I'm sorry that
I didn't communicate EDB's plans to work on this more clearly;
duplicated effort serves nobody well.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




--
Ibrar Ahmed
