Re: Proposal: Incremental Backup - Mailing list pgsql-hackers

From Bruce Momjian
Subject Re: Proposal: Incremental Backup
Msg-id 20140806152050.GJ13302@momjian.us
In response to Re: Proposal: Incremental Backup  (Simon Riggs <simon@2ndquadrant.com>)
List pgsql-hackers
On Wed, Aug  6, 2014 at 06:48:55AM +0100, Simon Riggs wrote:
> On 6 August 2014 03:16, Bruce Momjian <bruce@momjian.us> wrote:
> > On Wed, Aug  6, 2014 at 09:17:35AM +0900, Michael Paquier wrote:
> >> On Wed, Aug 6, 2014 at 9:04 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
> >> >
> >> > On 5 August 2014 22:38, Claudio Freire <klaussfreire@gmail.com> wrote:
> >> > Thinking some more, there seems like this whole store-multiple-LSNs
> >> > thing is too much. We can still do block-level incrementals just by
> >> > using a single LSN as the reference point. We'd still need a complex
> >> > file format and a complex file reconstruction program, so I think that
> >> > is still "next release". We can call that INCREMENTAL BLOCK LEVEL.
> >>
> >> Yes, that's the approach taken by pg_rman for its block-level
> >> incremental backup. Btw, I don't think that the CPU cost to scan all
> >> the relation files added to the one to rebuild the backups is worth
> >> doing it on large instances. File-level backup would cover most of the
> >
> > Well, if you scan the WAL files from the previous backup, that will tell
> > you which pages need incremental backup.
> 
> That would require you to store that WAL, which is something we hope
> to avoid. Plus if you did store it, you'd need to retrieve it from
> long term storage, which is what we hope to avoid.

Well, for file-level backups we have:
    1) use file modtime (possibly inaccurate)
    2) use file modtime and checksums (heavy read load)
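
As a rough illustration of options 1 and 2, the sketch below compares each file's modification time against a manifest saved by the previous backup and, optionally, re-reads files whose modtime is unchanged to compare checksums. The manifest format and the names changed_files and file_sha256 are assumptions made for this example; nothing here is an existing PostgreSQL tool or API.

```python
import hashlib
import os


def file_sha256(path, chunk_size=1 << 20):
    """Hash a whole file. This full read is the heavy read load of option 2."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def changed_files(data_dir, manifest, use_checksums=False):
    """Return relative paths that look changed since the previous backup.

    manifest maps relative path -> (mtime_ns, sha256 hex digest), as recorded
    by the previous backup (a hypothetical format for this sketch).
    """
    changed = []
    for root, _dirs, files in os.walk(data_dir):
        for name in files:
            path = os.path.join(root, name)
            rel = os.path.relpath(path, data_dir)
            previous = manifest.get(rel)
            if previous is None:
                # File did not exist at the previous backup.
                changed.append(rel)
                continue
            old_mtime, old_sum = previous
            if os.stat(path).st_mtime_ns != old_mtime:
                # Option 1: trust the modification time.
                changed.append(rel)
            elif use_checksums and file_sha256(path) != old_sum:
                # Option 2: modtime unchanged, so verify the contents as well.
                changed.append(rel)
    return sorted(changed)
```

With use_checksums=False a file rewritten without a modtime change is missed, which is the inaccuracy noted for option 1; with use_checksums=True every file with an unchanged modtime has to be read in full.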

For block-level backups we have:
    3) accumulate block numbers as WAL is written
    4) read previous WAL at incremental backup time
    5) read data page LSNs (high read load)
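
To make option 5 concrete, here is a minimal sketch that scans a relation file page by page and reports the blocks whose page LSN is newer than a reference LSN taken at the previous backup. It assumes the default 8192-byte block size, a little-endian machine, and that the page LSN is stored in the first 8 bytes of each page as two 32-bit halves (high, then low). The function names are hypothetical, and the sketch ignores torn reads of a live file and pages that have never been initialized.

```python
import struct

BLCKSZ = 8192  # assumed: PostgreSQL's default block size


def page_lsn(page, byte_order="<"):
    """Decode the LSN from the first 8 bytes of a page (high half, low half)."""
    xlogid, xrecoff = struct.unpack_from(byte_order + "II", page, 0)
    return (xlogid << 32) | xrecoff


def blocks_newer_than(path, reference_lsn, byte_order="<"):
    """Return block numbers in one relation file with page LSN > reference_lsn.

    Every page is read, which is the high read load of this approach.
    """
    changed = []
    with open(path, "rb") as f:
        block_no = 0
        while True:
            page = f.read(BLCKSZ)
            if len(page) < BLCKSZ:
                break  # end of file (a trailing partial page is skipped)
            if page_lsn(page, byte_order) > reference_lsn:
                changed.append(block_no)
            block_no += 1
    return changed
```

Only the reference LSN has to be kept from the previous backup, so no old WAL needs to be stored or retrieved; the cost is moved to reading every data page at backup time.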
 

The question is which of these do we want to implement?  #1 is very easy
to implement, but incremental _file_ backups are larger than block-level
backups.  If we have #5, would we ever want #2?  If we have #3, would we
ever want #4 or #5?
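
For comparison, option 3 could look roughly like the following: a tracker that is told about each modified block as WAL is generated and hands over the accumulated set at incremental backup time. This is a toy in-memory sketch; the class, its methods, and the relation file names in the usage example are hypothetical, and a real implementation would also have to persist the set across restarts.

```python
from collections import defaultdict


class ChangedBlockTracker:
    """Accumulate (relation file, block number) pairs as changes are logged."""

    def __init__(self):
        self._blocks = defaultdict(set)

    def record(self, relfile, block_no):
        # Called once per block touched by a WAL record (assumed hook).
        self._blocks[relfile].add(block_no)

    def drain(self):
        """Return the changed blocks per file and start a new tracking period."""
        snapshot = {rel: sorted(blocks) for rel, blocks in self._blocks.items()}
        self._blocks.clear()
        return snapshot


tracker = ChangedBlockTracker()
tracker.record("base/1/1234", 7)
tracker.record("base/1/1234", 2)
tracker.record("base/1/1234", 7)  # repeated writes to a block are stored once
tracker.record("base/1/5678", 0)
to_back_up = tracker.drain()
```

At backup time the changed blocks are already known, so neither the old WAL (option 4) nor every data page (option 5) has to be read.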

> > I am thinking we need a wiki page to outline all these options.
> 
> There is a Wiki page.

I would like to see that wiki page have a more open approach to
implementations.

I do think this is a very important topic for us.

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + Everyone has their own god. +


