Re: Patch for fail-back without fresh backup - Mailing list pgsql-hackers

From Pavan Deolasee
Subject Re: Patch for fail-back without fresh backup
Msg-id CABOikdPw=kcS4TM+akxtZNn3VuC42SOF4_cF0KYce1+czd48jg@mail.gmail.com
In response to Re: Patch for fail-back without fresh backup  (Heikki Linnakangas <hlinnakangas@vmware.com>)
Responses Re: Patch for fail-back without fresh backup  (Sawada Masahiko <sawada.mshk@gmail.com>)
List pgsql-hackers
On Tue, Oct 8, 2013 at 9:22 PM, Heikki Linnakangas <hlinnakangas@vmware.com> wrote:


> Yeah, I definitely think we should work on the pg_rewind approach instead of this patch. It's a lot more flexible. The performance hit of WAL-logging hint bit updates is the price you have to pay, but a lot of people were OK with that to get page checksum, so I think a lot of people would be OK with it for this purpose too. As long as it's optional, of course. And anyone using page checksums are already paying that price.

Not that I can find any flaw in the OP's patch, but given the major objections and my own nervousness about documenting this new "failback safe" standby mode, I am also inclined to improve pg_rewind, or do whatever it takes to get it working. Clearly, we first need an optional mechanism to WAL-log hint bit updates. There seem to be two ways to do that:

a. Add a new GUC which can be turned on/off and requires a server restart to take effect
b. Add another option for the wal_level setting.

(b) looks better, but I am not sure if we want to support this new level both with and without hot standby. If we want both, we will need multiple new levels to differentiate all those cases. I am OK with supporting it only with hot standby, which is probably what most people use with streaming replication anyway.
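
To make the difference between (a) and (b) concrete, here is a rough sketch of how each option would decide whether hint bit updates get WAL-logged. Everything in it is hypothetical and only for illustration: the boolean `wal_log_hint_bits` and the level `HOT_STANDBY_HINTS` are made-up names, not existing settings. The sketch assumes the new level sits above hot_standby, i.e. the "only with hot standby" variant.

```python
from enum import IntEnum


class WalLevel(IntEnum):
    """Existing wal_level values plus one hypothetical new level."""
    MINIMAL = 0
    ARCHIVE = 1
    HOT_STANDBY = 2
    HOT_STANDBY_HINTS = 3  # hypothetical: hot_standby + WAL-logged hint bits


def hint_bits_logged_option_a(wal_log_hint_bits: bool) -> bool:
    """Option (a): a separate on/off GUC (hypothetical name), independent of wal_level."""
    return wal_log_hint_bits


def hint_bits_logged_option_b(level: WalLevel) -> bool:
    """Option (b): hint bits are logged only at the hypothetical new level."""
    return level >= WalLevel.HOT_STANDBY_HINTS
```

With (a), the switch is orthogonal to wal_level, so it also covers setups without hot standby. With (b), supporting those setups would need further levels, which is the multiplication of levels mentioned above.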

The other issue is how to optimally WAL-log hint bit updates:

a. Should we have separate WAL records just for this purpose, or should we piggyback them on heap update/delete/prune etc. WAL records? Of course, there will be occasions when a simple SELECT also updates hint bits, so most likely we will need a separate WAL record anyhow.
b. Does it make sense to try to set all hint bits in a page if we are WAL-logging it anyway? I think we have discussed this idea before, just to minimize the number of writes a heap page receives when hint bits of different tuples are set at different times, each update triggering a fresh write. I don't remember what the consensus was, but it might be worthwhile to reconsider that option if we are WAL-logging the hint bit updates.
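
As a toy illustration of (b), the sketch below counts WAL records under two policies: setting a hint bit only on the tuple being examined, versus setting every settable hint bit on the page in one pass. This is a simplified model, not PostgreSQL code. `HeapTuple`, `Page`, and both functions are hypothetical names, and the model assumes one WAL record per logged hint-bit change.

```python
from dataclasses import dataclass


@dataclass
class HeapTuple:
    xmin_committed: bool    # true commit status of the inserting transaction
    hint_set: bool = False  # whether the "committed" hint bit is already set


@dataclass
class Page:
    tuples: list
    wal_records: int = 0    # WAL records emitted for hint-bit changes


def set_hint_lazily(page: Page, index: int) -> None:
    """Set the hint bit only on the examined tuple; one WAL record per change."""
    t = page.tuples[index]
    if t.xmin_committed and not t.hint_set:
        t.hint_set = True
        page.wal_records += 1


def set_all_hints(page: Page) -> None:
    """Set every settable hint bit in one pass; at most one WAL record per pass."""
    changed = False
    for t in page.tuples:
        if t.xmin_committed and not t.hint_set:
            t.hint_set = True
            changed = True
    if changed:
        page.wal_records += 1


lazy = Page([HeapTuple(True) for _ in range(4)])
for i in range(4):
    set_hint_lazily(lazy, i)   # tuples visited at different times

eager = Page([HeapTuple(True) for _ in range(4)])
set_all_hints(eager)           # one pass over the whole page
```

In this model the lazy page ends up with four WAL records and the eager page with one. The real saving would depend on how many tuples on a page already have a known transaction status when the page is first touched, which is what the benchmarks would need to show.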

We will definitely need some performance benchmarks even if this is optional. But are there other things to worry about? Any strong objections to this idea, or any other show stopper for pg_rewind itself?

Thanks,
Pavan

--
Pavan Deolasee
http://www.linkedin.com/in/pavandeolasee
