Re: block-level incremental backup - Mailing list pgsql-hackers

From Ibrar Ahmed
Subject Re: block-level incremental backup
Date
Msg-id CALtqXTebEahx3c63ttst0z1PPdzj1T6Lo6nS2phGF93fZAtNuQ@mail.gmail.com
In response to Re: block-level incremental backup  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers


On Tue, Jul 30, 2019 at 1:28 AM Robert Haas <robertmhaas@gmail.com> wrote:
On Wed, Jul 10, 2019 at 2:17 PM Anastasia Lubennikova
<a.lubennikova@postgrespro.ru> wrote:
> In attachments, you can find a prototype of incremental pg_basebackup,
> which consists of 2 features:
>
> 1) To perform incremental backup one should call pg_basebackup with a
> new argument:
>
> pg_basebackup -D 'basedir' --prev-backup-start-lsn 'lsn'
>
> where lsn is a start_lsn of parent backup (can be found in
> "backup_label" file)
>
> It calls BASE_BACKUP replication command with a new argument
> PREV_BACKUP_START_LSN 'lsn'.
>
> For datafiles, only pages with LSN > prev_backup_start_lsn will be
> included in the backup.
> They are saved into a 'filename.partial' file, and a 'filename.blockmap' file
> contains an array of BlockNumbers.
> For example, if we backed up blocks 1, 3, and 5, filename.partial will contain
> 3 blocks, and 'filename.blockmap' will contain the array {1,3,5}.

I think it's better to keep both the information about changed blocks
and the contents of the changed blocks in a single file.  The list of
changed blocks is probably quite short, and I don't really want to
double the number of files in the backup if there's no real need. I
suspect it's just overall a bit simpler to keep everything together.
I don't think this is a make-or-break thing, and welcome contrary
arguments, but that's my preference.

I have experience working on a similar product, and I agree with Robert: keeping
the changed-block info and the changed blocks themselves in a single file makes more sense.
+1
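
Just to illustrate what I have in mind (a rough sketch only, not taken from the
posted patch; the struct name, field names, and constants below are hypothetical),
the combined file could start with a small header listing the changed block numbers,
followed by the page images in the same order:

#include <stdint.h>

#define BLCKSZ 8192             /* PostgreSQL's default block size */

typedef uint32_t BlockNumber;

typedef struct PartialFileHeader
{
    uint32_t    magic;          /* format identifier / version */
    uint32_t    nblocks;        /* number of changed blocks in this file */
    /*
     * Followed on disk by:
     *   BlockNumber blocknos[nblocks];      -- sorted changed block numbers
     *   char        pages[nblocks][BLCKSZ]; -- page images, in the same order
     */
} PartialFileHeader;

/* Byte offset of the i'th page image within the file (hypothetical helper). */
static inline int64_t
partial_page_offset(uint32_t nblocks, uint32_t i)
{
    return (int64_t) sizeof(PartialFileHeader)
         + (int64_t) nblocks * sizeof(BlockNumber)
         + (int64_t) i * BLCKSZ;
}

With a layout like this the reader knows from the header exactly how many block
numbers and page images follow, so no second file is needed.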

> 2) To merge incremental backup into a full backup call
>
> pg_basebackup -D 'basedir' --incremental-pgdata 'incremental_basedir'
> --merge-backups
>
> It will move all files from 'incremental_basedir' to 'basedir' handling
> '.partial' files correctly.

This, to me, looks like it's much worse than the design that I
proposed originally.  It means that:

1. You can't take an incremental backup without having the full backup
available at the time you want to take the incremental backup.

2. You're always storing a full backup, which means that you need more
disk space, and potentially much more I/O while taking the backup.
You save on transfer bandwidth, but you add a lot of disk reads and
writes, costs which have to be paid even if the backup is never
restored.

> 1) Whether we collect block maps using a simple "read everything page by
> page" approach, WAL scanning, or any other page-tracking algorithm, we must
> choose a map format.
> I implemented the simplest one, while there are more ideas:

I think we should start simple.

I haven't had a chance to look at Jeevan's patch at all, or yours in
any detail, as yet, so these are just some very preliminary comments.
It will be good, however, if we can agree on who is going to do what
part of this as we try to drive this forward together.  I'm sorry that
I didn't communicate EDB's plans to work on this more clearly;
duplicated effort serves nobody well.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




--
Ibrar Ahmed
