Re: Proposal: Incremental Backup - Mailing list pgsql-hackers
From | Simon Riggs |
---|---|
Subject | Re: Proposal: Incremental Backup |
Date | |
Msg-id | CA+U5nMK2gZgYuRF=-gv1aWxvDWhXZkJSyZZCC9XO_j4DLTKrzw@mail.gmail.com |
In response to | Re: Proposal: Incremental Backup (Bruce Momjian <bruce@momjian.us>) |
Responses | Re: Proposal: Incremental Backup |
List | pgsql-hackers |
On 6 August 2014 17:27, Bruce Momjian <bruce@momjian.us> wrote:
> On Wed, Aug 6, 2014 at 01:15:32PM -0300, Claudio Freire wrote:
>> On Wed, Aug 6, 2014 at 12:20 PM, Bruce Momjian <bruce@momjian.us> wrote:
>> >
>> > Well, for file-level backups we have:
>> >
>> > 1) use file modtime (possibly inaccurate)
>> > 2) use file modtime and checksums (heavy read load)
>> >
>> > For block-level backups we have:
>> >
>> > 3) accumulate block numbers as WAL is written
>> > 4) read previous WAL at incremental backup time
>> > 5) read data page LSNs (high read load)
>> >
>> > The question is which of these do we want to implement? #1 is very easy
>> > to implement, but incremental _file_ backups are larger than block-level
>> > backups. If we have #5, would we ever want #2? If we have #3, would we
>> > ever want #4 or #5?
>>
>> You may want to implement both #3 and #2. #3 would need a config
>> switch to enable updating the bitmap. That would make it optional to
>> incur the I/O cost of updating the bitmap. When the bitmap isn't
>> there, the backup would use #2. Slow, but effective. If slowness is a
>> problem for you, you enable the bitmap and do #3.
>>
>> Sounds reasonable IMO, and it means you can start by implementing #2.
>
> Well, Robert Haas had the idea of a separate process that accumulates
> the changed WAL block numbers, making it low overhead. I question
> whether we need #2 just to handle cases where they didn't enable #3
> accounting earlier. If that is the case, just do a full backup and
> enable #3.

Well, there is a huge difference between file-level and block-level
backup. Designing, writing and verifying block-level backup to the
point that it is acceptable is a huge effort.

(Plus, I don't think accumulating block numbers as they are written
will be "low overhead". Perhaps there was a misunderstanding there and
what is being suggested is to accumulate file names that change as
they are written, since we already do that in the checkpointer
process, which would be an option between 2 and 3 on the above list.)

What is being proposed here is file-level incremental backup that
works in a general way for various backup management tools. It's the
80/20 first step on the road. We get most of the benefit, and it can
be delivered in this release as robust, verifiable code. Plus, that is
all we have budget for, a fairly critical consideration.

Big features need to be designed incrementally across multiple
releases, delivering incremental benefit (or at least that is what I
have learned). Yes, working block-level backup would be wonderful, but
if we hold out for that as the first step then we'll get nothing
anytime soon.

I would also point out that the more specific we make our backup
solution, the less likely it is to integrate with external backup
providers. Oracle's RMAN requires specific support in external
software. 10 years after Postgres PITR we still see many vendors
showing "PostgreSQL Backup Supported" as meaning pg_dump only.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
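For readers following the thread, options #1 and #2 from the quoted list (the file-level approach argued for above) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not PostgreSQL code and not the proposed patch: the manifest format (relative path mapped to mtime plus an optional SHA-256 digest) and the names `changed_files` and `sha256_of` are hypothetical. It also ignores deleted files and the consistency issues a real backup of a running cluster must handle. With `use_checksums=False` it is option #1; with `True` it also reads every file, which is the "heavy read load" of option #2, but it catches modifications that left the mtime unchanged.

```python
import hashlib
import os
from typing import Dict, List, Optional, Tuple

# Hypothetical manifest format: relative path -> (mtime in ns, SHA-256 hex or None).
Manifest = Dict[str, Tuple[int, Optional[str]]]


def sha256_of(path: str) -> str:
    """Read the whole file to checksum it (the read cost of option #2)."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def changed_files(datadir: str, previous: Manifest,
                  use_checksums: bool) -> Tuple[List[str], Manifest]:
    """Return (files to copy, new manifest) for a file-level incremental backup.

    Option #1: copy a file if it is new or its mtime differs from the last backup.
    Option #2: also checksum files whose mtime looks unchanged, so a change that
    did not update the mtime is still detected.
    Deleted files are not tracked in this sketch.
    """
    to_copy: List[str] = []
    current: Manifest = {}
    for root, _dirs, files in os.walk(datadir):
        for name in files:
            full = os.path.join(root, name)
            rel = os.path.relpath(full, datadir)
            mtime = os.stat(full).st_mtime_ns
            digest = sha256_of(full) if use_checksums else None
            current[rel] = (mtime, digest)
            old = previous.get(rel)
            if old is None or old[0] != mtime:
                to_copy.append(rel)  # new file, or modtime changed
            elif use_checksums and old[1] != digest:
                to_copy.append(rel)  # modtime unchanged, contents differ
    return sorted(to_copy), current
```

The manifest returned by one run is passed as `previous` to the next. A file that was rewritten but had its mtime restored is skipped by option #1 and caught by option #2, which illustrates the "possibly inaccurate" caveat on #1.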