Re: block-level incremental backup - Mailing list pgsql-hackers
From: Anastasia Lubennikova
Subject: Re: block-level incremental backup
Msg-id: bc1b3253-8deb-a8f4-7bf3-4e5cef3d3fd6@postgrespro.ru
In response to: Re: block-level incremental backup (Anastasia Lubennikova <a.lubennikova@postgrespro.ru>)
Responses: Re: block-level incremental backup
           Re: block-level incremental backup
List: pgsql-hackers
23.04.2019 14:08, Anastasia Lubennikova wrote:
> I'm volunteering to write a draft patch or, more likely, set of
> patches, which will allow us to discuss the subject in more detail.
> And to do that I wish we agree on the API and data format (at least
> broadly).
> Looking forward to hearing your thoughts.

Though the previous discussion stalled, I still hope that we can agree on
basic points such as the map file format and the protocol extension, which
are necessary to start implementing the feature.

--------- Proof Of Concept patch ---------

In the attachments, you can find a prototype of incremental pg_basebackup,
which consists of two features:

1) To perform an incremental backup, call pg_basebackup with a new argument:

    pg_basebackup -D 'basedir' --prev-backup-start-lsn 'lsn'

where 'lsn' is the start_lsn of the parent backup (it can be found in that
backup's "backup_label" file).

This invokes the BASE_BACKUP replication command with a new argument,
PREV_BACKUP_START_LSN 'lsn'. For data files, only pages with LSN >
prev_backup_start_lsn are included in the backup. They are saved into a
'filename.partial' file, and a 'filename.blockmap' file contains an array
of their BlockNumbers. For example, if we backed up blocks 1, 3, and 5,
'filename.partial' will contain those 3 blocks, and 'filename.blockmap'
will contain the array {1,3,5}. Non-data files use the same format as
before. (A minimal sketch of this page-by-page filtering step is included
after the message.)

2) To merge an incremental backup into a full backup, call:

    pg_basebackup -D 'basedir' --incremental-pgdata 'incremental_basedir' --merge-backups

It will move all files from 'incremental_basedir' to 'basedir', handling
'.partial' files correctly. (A sketch of this merge step also follows.)

--------- Questions to discuss ---------

Please note that this is just a proof-of-concept patch and it can be
optimized in many ways. Let's concentrate on the issues that affect the
protocol or the data format.

1) Whether we collect block maps using the simple "read everything page by
page" approach, WAL scanning, or any other page-tracking algorithm, we must
choose a map format. I implemented the simplest one, but there are more
ideas:

- We can have a map not per file, but per relation or maybe per tablespace,
  which will make the implementation more complex, but probably more
  efficient. The only problem I see with the existing implementation is
  that even if only a few blocks changed, we still must pad the map to 512
  bytes per the tar format requirements (e.g., a map of three BlockNumbers
  is only 12 bytes, yet occupies a full 512-byte tar data block).

- We can save LSNs into the block map:

      typedef struct BlockMapItem
      {
          BlockNumber blkno;
          XLogRecPtr  lsn;
      } BlockMapItem;

In my implementation, an invalid prev_backup_start_lsn means falling back
to a regular basebackup without any block maps. Alternatively, we can
define another meaning for this value and send a block map for all files.
Backup utilities can use these maps to speed up backup merge or restore
(see the chain-merge sketch below).

2) We can implement a BASE_BACKUP SEND_FILELIST replication command, which
will return a list of filenames with file sizes, and block maps if an LSN
was provided. To avoid changing the format, we can simply send tar headers
for each file:

- tarHeader("filename.blockmap") followed by the blockmap, for relation
  files, if prev_backup_start_lsn is provided;

- tarHeader("filename") without the actual file content, for non-relation
  files or for all files in a "FULL" backup.

The caller can parse these messages and use them for any purpose, for
example, to perform a parallel backup (see the header-parsing sketch
below).

Thoughts?

--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
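--------- Illustrative sketches ---------

A minimal sketch of the page-by-page filtering step from (1), under stated
assumptions: it re-declares BLCKSZ, BlockNumber, and XLogRecPtr instead of
including PostgreSQL headers, reads pd_lsn from the first 8 bytes of each
page (pd_lsn is the first field of PageHeaderData), uses hypothetical
helper names, and omits error handling. It is not the patch's actual code.

    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    #define BLCKSZ 8192
    typedef uint32_t BlockNumber;
    typedef uint64_t XLogRecPtr;

    /* pd_lsn is stored on disk as xlogid (high half), then xrecoff */
    static XLogRecPtr
    page_lsn(const char *page)
    {
        uint32_t    hi, lo;

        memcpy(&hi, page, 4);
        memcpy(&lo, page + 4, 4);
        return ((XLogRecPtr) hi << 32) | lo;
    }

    /*
     * Copy every page of 'datafile' whose LSN is newer than
     * prev_start_lsn into 'datafile.partial', and append its block
     * number to 'datafile.blockmap'.
     */
    static void
    write_incremental_file(const char *datafile, XLogRecPtr prev_start_lsn)
    {
        char        path[1024];
        char        page[BLCKSZ];
        FILE       *in, *partial, *blockmap;
        BlockNumber blkno = 0;

        in = fopen(datafile, "rb");
        snprintf(path, sizeof(path), "%s.partial", datafile);
        partial = fopen(path, "wb");
        snprintf(path, sizeof(path), "%s.blockmap", datafile);
        blockmap = fopen(path, "wb");

        while (fread(page, 1, BLCKSZ, in) == BLCKSZ)
        {
            if (page_lsn(page) > prev_start_lsn)
            {
                fwrite(page, 1, BLCKSZ, partial);
                fwrite(&blkno, sizeof(blkno), 1, blockmap);
            }
            blkno++;
        }

        fclose(in);
        fclose(partial);
        fclose(blockmap);
    }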
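The merge step from (2), applied to one data file: the i-th entry of
'file.blockmap' gives the block number of the i-th page in 'file.partial',
so each page is written back at offset blkno * BLCKSZ in the base backup's
copy of the file. Again only a sketch (hypothetical helper name, POSIX
fseeko, no error handling), not the patch's actual code:

    #include <stdint.h>
    #include <stdio.h>
    #include <sys/types.h>      /* off_t */

    #define BLCKSZ 8192
    typedef uint32_t BlockNumber;

    static void
    apply_partial_file(const char *basefile)
    {
        char        path[1024];
        char        page[BLCKSZ];
        FILE       *target, *partial, *blockmap;
        BlockNumber blkno;

        target = fopen(basefile, "r+b");
        snprintf(path, sizeof(path), "%s.partial", basefile);
        partial = fopen(path, "rb");
        snprintf(path, sizeof(path), "%s.blockmap", basefile);
        blockmap = fopen(path, "rb");

        while (fread(&blkno, sizeof(blkno), 1, blockmap) == 1 &&
               fread(page, 1, BLCKSZ, partial) == BLCKSZ)
        {
            /* place the page at its original offset in the full file */
            fseeko(target, (off_t) blkno * BLCKSZ, SEEK_SET);
            fwrite(page, 1, BLCKSZ, target);
        }

        fclose(blockmap);
        fclose(partial);
        fclose(target);
    }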
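The chain-merge sketch mentioned above: if the map stores BlockMapItem
entries rather than bare block numbers, a backup utility merging a chain of
incremental backups can pick the newest copy of each block by LSN instead
of applying the backups strictly in order. The struct is the one proposed
above; the lookup helper is hypothetical:

    #include <stdint.h>

    typedef uint32_t BlockNumber;
    typedef uint64_t XLogRecPtr;

    typedef struct BlockMapItem
    {
        BlockNumber blkno;
        XLogRecPtr  lsn;
    } BlockMapItem;

    /*
     * Given the concatenated map entries for one file across all
     * backups in the chain, return the index of the newest copy of
     * 'blkno', or -1 if no backup in the chain contains that block.
     */
    static int
    newest_copy(const BlockMapItem *items, int nitems, BlockNumber blkno)
    {
        int         best = -1;

        for (int i = 0; i < nitems; i++)
        {
            if (items[i].blkno != blkno)
                continue;
            if (best < 0 || items[i].lsn > items[best].lsn)
                best = i;
        }
        return best;
    }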
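The header-parsing sketch for (2): SEND_FILELIST as proposed would return a
stream of 512-byte tar headers, so a client only needs the standard ustar
layout to recover each file's name and size, e.g. to split the file list
across parallel backup workers. SEND_FILELIST is the proposal above, not an
existing command; the field offsets below are from the ustar specification:

    #include <stdlib.h>
    #include <string.h>

    #define TAR_BLOCK 512

    /*
     * Parse one ustar header block. Returns 0 on the all-zero block
     * that terminates a tar stream, 1 otherwise. 'name' must have
     * room for 101 bytes.
     */
    static int
    parse_tar_header(const unsigned char *hdr, char *name, size_t *size)
    {
        if (hdr[0] == '\0')
            return 0;
        memcpy(name, hdr, 100);         /* 'name': offset 0, 100 bytes */
        name[100] = '\0';
        /* 'size': offset 124, 12 bytes of octal ASCII */
        *size = (size_t) strtoul((const char *) hdr + 124, NULL, 8);
        return 1;
    }

If the stream also carries blockmap contents after a header, the reader
would skip ((size + TAR_BLOCK - 1) / TAR_BLOCK) * TAR_BLOCK bytes to reach
the next header, which is exactly the 512-byte padding issue noted in (1).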