Re: block-level incremental backup - Mailing list pgsql-hackers

From Anastasia Lubennikova
Subject Re: block-level incremental backup
Msg-id bc1b3253-8deb-a8f4-7bf3-4e5cef3d3fd6@postgrespro.ru
In response to Re: block-level incremental backup  (Anastasia Lubennikova <a.lubennikova@postgrespro.ru>)
List pgsql-hackers
23.04.2019 14:08, Anastasia Lubennikova wrote:
> I'm volunteering to write a draft patch or, more likely, set of 
> patches, which
> will allow us to discuss the subject in more detail.
> And to do that I wish we agree on the API and data format (at least 
> broadly).
> Looking forward to hearing your thoughts. 

Though the previous discussion stalled,
I still hope that we can agree on basic points such as the map file
format and the protocol extension,
which are necessary to start implementing the feature.

--------- Proof Of Concept patch ---------

Attached is a prototype of incremental pg_basebackup,
which consists of two features:

1) To perform an incremental backup, call pg_basebackup with a
new argument:

pg_basebackup -D 'basedir' --prev-backup-start-lsn 'lsn'

where lsn is the start_lsn of the parent backup (it can be found in the
"backup_label" file).

It calls the BASE_BACKUP replication command with a new argument,
PREV_BACKUP_START_LSN 'lsn'.

For datafiles, only pages with LSN > prev_backup_start_lsn are
included in the backup.
They are saved into a 'filename.partial' file, and a 'filename.blockmap'
file contains an array of their BlockNumbers.
For example, if we backed up blocks 1, 3 and 5, 'filename.partial' will
contain 3 blocks and 'filename.blockmap' will contain the array {1,3,5}.
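
To illustrate the format, here is a minimal standalone sketch (not code
from the patch). It assumes the whole datafile is already in memory and
that the page LSN sits in the first 8 bytes of each page as two 32-bit
halves; collect_changed_blocks and page_get_lsn are hypothetical names
used only for this example.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BLCKSZ 8192             /* default PostgreSQL page size */

typedef uint32_t BlockNumber;   /* stand-ins for the PostgreSQL typedefs */
typedef uint64_t XLogRecPtr;

/* Assumption: the page starts with its LSN, high 32 bits first. */
static XLogRecPtr
page_get_lsn(const char *page)
{
    uint32_t hi, lo;

    memcpy(&hi, page, 4);
    memcpy(&lo, page + 4, 4);
    return ((XLogRecPtr) hi << 32) | lo;
}

/*
 * Copy every page whose LSN is greater than prev_backup_start_lsn into
 * "partial" and record its block number in "blockmap". Both output
 * buffers must have room for nblocks entries. Returns the number of
 * blocks copied.
 */
static size_t
collect_changed_blocks(const char *file, BlockNumber nblocks,
                       XLogRecPtr prev_backup_start_lsn,
                       char *partial, BlockNumber *blockmap)
{
    size_t n = 0;

    for (BlockNumber blkno = 0; blkno < nblocks; blkno++)
    {
        const char *page = file + (size_t) blkno * BLCKSZ;

        if (page_get_lsn(page) > prev_backup_start_lsn)
        {
            memcpy(partial + n * BLCKSZ, page, BLCKSZ);
            blockmap[n++] = blkno;
        }
    }
    return n;
}
```

With a six-block file in which only blocks 1, 3 and 5 carry a newer LSN,
this fills the partial buffer with 3 pages and the map with {1,3,5}.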

Non-datafiles use the same format as before.

2) To merge an incremental backup into a full backup, call

pg_basebackup -D 'basedir' --incremental-pgdata 'incremental_basedir' 
--merge-backups

It will move all files from 'incremental_basedir' to 'basedir', handling
'.partial' files correctly.
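
As a rough sketch of what handling a '.partial' file could look like
(an assumption for illustration, not the code from the patch), the
function below overwrites blocks of a full file image with the blocks
listed in a block map. apply_partial is a hypothetical name, and both
files are assumed to be in memory.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BLCKSZ 8192             /* default PostgreSQL page size */

typedef uint32_t BlockNumber;   /* stand-in for the PostgreSQL typedef */

/*
 * Overwrite blocks of "full" with the pages stored in "partial".
 * blockmap[i] is the block number of the i-th page in "partial".
 * "capacity" is the size of the "full" buffer in blocks.
 * Returns 0 on success, or -1 (without touching "full") if any block
 * number does not fit into the buffer.
 */
static int
apply_partial(char *full, BlockNumber capacity,
              const char *partial, const BlockNumber *blockmap, size_t nmap)
{
    /* Validate first so that a bad map never leaves a half-merged file. */
    for (size_t i = 0; i < nmap; i++)
        if (blockmap[i] >= capacity)
            return -1;

    for (size_t i = 0; i < nmap; i++)
        memcpy(full + (size_t) blockmap[i] * BLCKSZ,
               partial + i * BLCKSZ, BLCKSZ);
    return 0;
}
```

A real merge would also have to deal with files that grew, shrank or
were removed between the two backups; the sketch ignores that.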


--------- Questions to discuss ---------

Please note that it is just a proof-of-concept patch and it can be 
optimized in many ways.
Let's concentrate on issues that affect the protocol or data format.

1) Whether we collect block maps using the simple "read everything page
by page" approach,
WAL scanning, or any other page tracking algorithm, we must choose a
map format.
I implemented the simplest one, but there are more ideas:

- We can have a map not per file, but per relation or maybe per tablespace,
which would make the implementation more complex, but probably more efficient.
The only problem I see with the existing implementation is that even if
only a few blocks changed,
we still must pad the block map to 512 bytes per tar format requirements.

- We can save LSNs into the block map.

typedef struct BlockMapItem {
     BlockNumber blkno;
     XLogRecPtr lsn;
} BlockMapItem;
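
To put a number on the tar padding mentioned in the first item: a tar
member is a 512-byte header followed by data rounded up to a multiple of
512 bytes. The helpers below are a small sketch of that arithmetic;
their names are made up for this example.

```c
#include <stddef.h>

#define TAR_BLOCK 512           /* tar header size and data granularity */

/* Size of the data area of a tar member holding "len" bytes. */
static size_t
tar_padded_size(size_t len)
{
    return (len + TAR_BLOCK - 1) / TAR_BLOCK * TAR_BLOCK;
}

/* Total bytes one member occupies in the archive: header plus data. */
static size_t
tar_member_size(size_t len)
{
    return TAR_BLOCK + tar_padded_size(len);
}
```

A block map with three 4-byte BlockNumbers is 12 bytes long, but
tar_member_size(12) is 1024, so the per-file overhead dominates when
only a few blocks changed.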

In my implementation, an invalid prev_backup_start_lsn means falling back
to a regular basebackup
without any block maps. Alternatively, we can define another meaning for
this value and send a block map for all files.
Backup utilities can use these maps to speed up backup merge or restore.
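
One way a backup utility might use LSNs stored in the maps (purely an
assumption for illustration, not something the patch does): when several
backups contain the same block, pick the copy with the highest LSN
without opening the page data. newest_version is a hypothetical helper.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint32_t BlockNumber;   /* stand-ins for the PostgreSQL typedefs */
typedef uint64_t XLogRecPtr;

typedef struct BlockMapItem {
    BlockNumber blkno;
    XLogRecPtr  lsn;
} BlockMapItem;

/*
 * maps[m] is the block map of backup m and lens[m] its length.
 * Returns the index of the backup holding the copy of "blkno" with the
 * highest LSN, or -1 if no map mentions the block.
 */
static int
newest_version(const BlockMapItem *const *maps, const size_t *lens,
               size_t nmaps, BlockNumber blkno)
{
    int        best = -1;
    XLogRecPtr best_lsn = 0;

    for (size_t m = 0; m < nmaps; m++)
        for (size_t i = 0; i < lens[m]; i++)
            if (maps[m][i].blkno == blkno &&
                (best < 0 || maps[m][i].lsn > best_lsn))
            {
                best = (int) m;
                best_lsn = maps[m][i].lsn;
            }
    return best;
}
```

The linear scan keeps the sketch short; sorted maps would allow a binary
search instead.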

2) We can implement a BASE_BACKUP SEND_FILELIST replication command,
which will return a list of filenames with file sizes, plus block maps if
an lsn was provided.

To avoid changing the format, we can simply send tar headers for each file:
- tarHeader("filename.blockmap") followed by the blockmap for relation files
if prev_backup_start_lsn is provided;
- tarHeader("filename") without actual file content for non-relation
files or for all files in a "FULL" backup.

The caller can parse messages and use them for any purpose, for example, 
to perform a parallel backup.
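
For reference, a tar header already carries the two things a file list
needs, the name and the size. The sketch below builds and reads back a
minimal ustar header; it is standalone illustration code, not
PostgreSQL's own tar routines, and the function names and the example
file name are made up.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define TAR_BLOCK 512

/* Header checksum: byte sum with the checksum field read as spaces. */
static unsigned
tar_checksum(const char *hdr)
{
    unsigned sum = 0;

    for (int i = 0; i < TAR_BLOCK; i++)
        sum += (i >= 148 && i < 156) ? ' ' : (unsigned char) hdr[i];
    return sum;
}

/* Fill a ustar header for a regular file. Returns 0, or -1 if the
 * name or size does not fit into the fixed-width fields. */
static int
make_tar_header(char *hdr, const char *name, uint64_t size)
{
    if (strlen(name) > 99 || size > 077777777777ULL)
        return -1;

    memset(hdr, 0, TAR_BLOCK);
    strcpy(hdr, name);                                  /* name   @ 0   */
    snprintf(hdr + 100, 8, "%07o", 0600u);              /* mode   @ 100 */
    snprintf(hdr + 108, 8, "%07o", 0u);                 /* uid    @ 108 */
    snprintf(hdr + 116, 8, "%07o", 0u);                 /* gid    @ 116 */
    snprintf(hdr + 124, 12, "%011llo",
             (unsigned long long) size);                /* size   @ 124 */
    snprintf(hdr + 136, 12, "%011o", 0u);               /* mtime  @ 136 */
    hdr[156] = '0';                                     /* regular file */
    memcpy(hdr + 257, "ustar", 6);                      /* magic + NUL  */
    memcpy(hdr + 263, "00", 2);                         /* version      */
    snprintf(hdr + 148, 7, "%06o", tar_checksum(hdr));  /* chksum @ 148 */
    hdr[155] = ' ';
    return 0;
}

/* Read the octal size field back, as a client parsing the list would. */
static uint64_t
tar_header_size(const char *hdr)
{
    return strtoull(hdr + 124, NULL, 8);
}
```

For instance, make_tar_header(hdr, "base/1/16384.blockmap", 12) yields a
header from which a client can recover both the name and the size 12
without receiving any file content.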

Thoughts?

-- 
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

