Re: block-level incremental backup - Mailing list pgsql-hackers
From | Anastasia Lubennikova |
---|---|
Subject | Re: block-level incremental backup |
Date | |
Msg-id | 3e15314c-b8de-2c81-6722-80c33423bc85@postgrespro.ru |
In response to | block-level incremental backup (Robert Haas <robertmhaas@gmail.com>) |
Responses | Re: block-level incremental backup |
List | pgsql-hackers |
09.04.2019 18:48, Robert Haas writes:
> Thoughts?

Hi,

Thank you for bringing that up.
In-core support of incremental backups is a long-awaited feature.
Hopefully, this take will end up committed in PG13.

Speaking of UI:

1) I agree that it should be implemented as a new replication command.

2) There should be a command to get only a map of changes, without the actual data.

Most backup tools establish a server connection, so they can use this protocol to get the list of changed blocks. Then they can use this information for any purpose: for example, to distribute files between parallel workers that copy the data, to estimate the backup size before any data is sent, or to store metadata separately from the data itself.

Most methods (except straightforward LSN comparison) consist of two steps: get a map of changes, then read the blocks. So it won't add much extra work.

example commands:
    GET_FILELIST [lsn]
    returning json (or whatever) with filenames and maps of changed blocks

The map format is also a subject for discussion. In pg_probackup we currently reuse code from pg_rewind/datapagemap; I am not sure whether this format is good for sending data via the protocol, though.

3) The API should provide functions to request data at the granularity of a file and of a block. It will be useful for parallelism and for various future projects.

example commands:
    GET_DATAFILE [filename [map of blocks] ]
    GET_DATABLOCK [filename] [blkno]
    returning data in some format

4) The algorithm for collecting changed blocks is another topic. Its API, though, should be discussed here:

Do we want to have multiple implementations?
Personally, I think that it's good to provide several strategies, since they have different requirements and fit different workloads. Maybe we can add a hook to allow custom implementations.

Do we want to allow the backup client to tell the server which block collection method to use?

example commands:
    GET_FILELIST [lsn] [METHOD lsn | page | ptrack | etc]

Or should it be a server-side, cost-based decision?

5) The method based on LSN comparison stands out - it can be done in one pass. So it probably requires special protocol commands.

for example:
    GET_DATAFILES [lsn]
    GET_DATAFILE [filename] [lsn]

This is pretty simple to implement, and pg_basebackup can use this method, at least until we have something more advanced in-core.

I'll be happy to help with design, code, review, and testing.
Hope that my experience with pg_probackup will be useful.

-- 
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
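To illustrate why the LSN-comparison method in point 5 needs only one pass, here is a minimal Python sketch. It is not taken from pg_probackup or from any proposed patch. The function names (`parse_lsn`, `page_lsn`, `changed_blocks`) are hypothetical, the block size is assumed to be the default 8192 bytes, and the strict "newer than" comparison is an assumption. It relies only on the fact that every page header begins with `pd_lsn`, stored as two 32-bit halves in the server's native byte order. Handling of new (all-zero) pages, checksums, and torn reads is deliberately left out.

```python
import struct

BLOCK_SIZE = 8192  # assumption: default PostgreSQL page size (BLCKSZ)


def parse_lsn(text):
    """Convert the textual LSN form 'X/Y' (two hex numbers) to a 64-bit integer."""
    hi, lo = text.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def page_lsn(page):
    """Read pd_lsn from the first 8 bytes of a page.

    pd_lsn is two native-endian uint32 values: high half, then low half.
    This sketch assumes it runs on the same architecture as the server.
    """
    hi, lo = struct.unpack_from("=II", page, 0)
    return (hi << 32) | lo


def changed_blocks(data, since_lsn, block_size=BLOCK_SIZE):
    """Return the block numbers whose page LSN is newer than since_lsn.

    A single sequential pass over the file contents is enough: each page
    carries its own LSN, so no separate change map has to be built first.
    """
    changed = []
    for blkno in range(len(data) // block_size):
        page = data[blkno * block_size:(blkno + 1) * block_size]
        if page_lsn(page) > since_lsn:  # assumption: strict comparison
            changed.append(blkno)
    return changed
```

A client could call `changed_blocks(file_bytes, parse_lsn("0/2000"))` on the contents of one relation file and copy only the returned blocks. The cost is that every page of every file still has to be read, which is what the map-based methods in point 4 try to avoid.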