Re: Proposal: Incremental Backup - Mailing list pgsql-hackers

From Amit Kapila
Subject Re: Proposal: Incremental Backup
Msg-id CAA4eK1JOfrmurgzYhGoz8GMkVGb2tgERW_Fs0nOWLJ_qTCesZA@mail.gmail.com
In response to Re: Proposal: Incremental Backup  (desmodemone <desmodemone@gmail.com>)
Responses Re: Proposal: Incremental Backup
List pgsql-hackers
On Thu, Jul 31, 2014 at 1:56 PM, desmodemone <desmodemone@gmail.com> wrote:
>
> Hi Amit, thank you for your comments.
> However, about the drawbacks:
> a) It's not clear to me why the method needs checksums enabled. When the bgwriter or another process flushes a dirty buffer, it only has to record in the map that the blocks changed, by updating the value from 0 to 1. It does not need to verify the block's checksum; we can assume that when a dirty buffer is flushed, the block has changed [or better, in my idea, the chunk of N blocks].
> We could consider an advanced setting that verifies the checksum, but I think it would be heavier.

I was thinking of enabling it for hint bit updates: if an operation
changes a page only by setting hint bits, it will not mark the buffer
dirty unless wal_log_hints or checksums are enabled.  Now I think that
if we don't want to track page changes due to hint bit updates, then
this will not be required.


> b) Yes, the backends need to update the map, but it's in memory and, as I showed, it could be very small if we use chunks of blocks. If we don't compress the map, I don't think it would be a bottleneck.

This map has to reside in shared memory, so how will you estimate
its size during startup?  Even if you have some way to do that, I
think you still need to detail how your chunk scheme will work in
case multiple backends try to flush pages that are part of the same
chunk.
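
To make the sizing and concurrency questions concrete, here is a minimal
sketch of one possible chunk map. It is not taken from the proposal and
is not PostgreSQL code: the names ChangeMap, change_map_bytes,
change_map_mark and change_map_chunk_dirty, and the value of
BLOCKS_PER_CHUNK, are all hypothetical. It uses plain C11 atomics and
calloc as a stand-in for a fixed-size shared memory area, and assumes
the number of blocks is known up front, which is exactly the part that
is hard to guarantee at startup.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical chunk size: one map bit covers this many blocks. */
#define BLOCKS_PER_CHUNK 128u

typedef struct ChangeMap
{
    uint64_t      nchunks;  /* number of chunks (bits) tracked */
    atomic_uchar *bytes;    /* one bit per chunk, 8 chunks per byte */
} ChangeMap;

/* Number of chunks needed to cover nblocks, rounding up. */
static uint64_t
change_map_chunks(uint64_t nblocks)
{
    return (nblocks + BLOCKS_PER_CHUNK - 1) / BLOCKS_PER_CHUNK;
}

/* Bytes of map needed for nblocks; this is the startup size estimate. */
static uint64_t
change_map_bytes(uint64_t nblocks)
{
    return (change_map_chunks(nblocks) + 7) / 8;
}

/* Allocate and zero the map.  calloc stands in for shared memory here. */
static bool
change_map_init(ChangeMap *map, uint64_t nblocks)
{
    uint64_t nbytes = change_map_bytes(nblocks);

    map->nchunks = change_map_chunks(nblocks);
    map->bytes = calloc(nbytes, sizeof(atomic_uchar));
    if (map->bytes == NULL)
        return false;
    for (uint64_t i = 0; i < nbytes; i++)
        atomic_init(&map->bytes[i], 0);
    return true;
}

/*
 * Record that a block was flushed.  atomic_fetch_or makes the 0 -> 1
 * transition safe when several processes flush pages belonging to the
 * same chunk (or to chunks sharing a byte): no update can be lost.
 */
static void
change_map_mark(ChangeMap *map, uint64_t block)
{
    uint64_t chunk = block / BLOCKS_PER_CHUNK;

    assert(chunk < map->nchunks);
    atomic_fetch_or(&map->bytes[chunk / 8],
                    (unsigned char) (1u << (chunk % 8)));
}

/* Report whether the chunk containing this block has been marked. */
static bool
change_map_chunk_dirty(ChangeMap *map, uint64_t block)
{
    uint64_t chunk = block / BLOCKS_PER_CHUNK;

    assert(chunk < map->nchunks);
    return (atomic_load(&map->bytes[chunk / 8]) >> (chunk % 8)) & 1u;
}
```

Under these assumptions, 1 TB of data in 8 kB blocks is 134,217,728
blocks, and change_map_bytes gives 131,072 bytes (128 kB) of map. The
sketch only covers the race on setting a bit; it does not answer how a
fixed shared memory area copes with relations that grow after startup,
nor how clearing the map at backup time is synchronized with concurrent
flushes.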

Also, as I mentioned previously, there are some operations which
are done without the use of shared buffers, so you need to think
about how to track the changes done by those operations.

> c) The map is not crash-safe by design, because it is needed only for incremental backup, to track which blocks need to be backed up, not for consistency or recovery of the whole cluster, so it is not a heavy cost for the cluster to maintain. We could consider an option (but it's heavy) to write it to a file at every flush to get a crash-safe map, but I don't think that is very useful. I think it's acceptable, and probably better, to enforce this rule: "if your db crashes, you need a full backup".

I am not sure this assumption is right or acceptable. How can we
say that in such a case users will be okay with taking a full backup?
In general, taking a full backup is a very heavy operation and we
should try to avoid such a situation.


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
