Re: Patch for fail-back without fresh backup - Mailing list pgsql-hackers
From | Simon Riggs |
---|---|
Subject | Re: Patch for fail-back without fresh backup |
Date | |
Msg-id | CA+U5nMJQyk6OYewL0pozmGOQC_VMDAm_b6=vRv5eNDg7N8rbLw@mail.gmail.com |
In response to | Patch for fail-back without fresh backup (Samrat Revagade <revagade.samrat@gmail.com>) |
Responses | Re: Patch for fail-back without fresh backup; Re: Patch for fail-back without fresh backup |
List | pgsql-hackers |
On 14 June 2013 10:11, Samrat Revagade <revagade.samrat@gmail.com> wrote:
> We have already started a discussion on pgsql-hackers for the problem of
> taking fresh backup during the failback operation here is the link for that:
>
> http://www.postgresql.org/message-id/CAF8Q-Gxg3PQTf71NVECe-6OzRaew5pWhk7yQtbJgWrFu513s+Q@mail.gmail.com
> So our proposal on this problem is that we must ensure that master should
> not make any file system level changes without confirming that the
> corresponding WAL record is replicated to the standby.
> 1. The main objection was raised by Tom and others is that we should not add
> this feature and should go with traditional way of taking fresh backup using
> the rsync, because he was concerned about the additional complexity of the
> patch and the performance overhead during normal operations.
>
> 2. Tom and others were also worried about the inconsistencies in the crashed
> master and suggested that its better to start with a fresh backup. Fujii
> Masao and others correctly countered that suggesting that we trust WAL
> recovery to clear all such inconsistencies and there is no reason why we
> can't do the same here.
> So the patch is showing 1-2% performance overhead.

Let's have a look at this...

The objections you summarise that Tom has made are ones that I agree with. I also don't think that Fujii "correctly countered" those objections.

My perspective is that if the master crashed, assuming that you know everything about that and suddenly jumping back on seems like a recipe for disaster. Attempting that is currently blocked by the technical obstacles you've identified, but that doesn't mean they are the only ones - we don't yet understand what all the lurking problems might be. Personally, I won't be following you onto that minefield anytime soon.

So I strongly object to calling this patch anything to do with "failback safe". You simply don't have enough data to make such a bold claim.
(Which is why we call it synchronous replication and not "zero data loss", for example).

But that's not the whole story. I can see some utility in a patch that makes all WAL transfer synchronous, rather than just commits. Some name like synchronous_transfer might be appropriate, e.g. synchronous_transfer = all | commit (default).

The idea of another slew of parameters that are very similar to synchronous replication but yet somehow different seems weird. I can't see a reason why we'd want a second lot of parameters. Why not just use the existing ones for sync rep? (I'm surprised the Parameter Police haven't visited you in the night...) Sure, we might want to expand the design for how we specify multi-node sync rep, but that is a different patch.

I'm worried to see that adding this feature and yet turning it off causes a measurable drop in performance. I don't think we want that at all. That clearly needs more work and thought.

I also think your performance results are somewhat bogus. Fast transaction workloads were already mostly commit waits - measurements of what happens to large loads, index builds etc. would likely reveal something quite different.

I'm tempted by the thought that we should put the WaitForLSN inside XLogFlush, rather than scatter additional calls everywhere and then have us inevitably miss one.

-- 
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
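The suggestion of a synchronous_transfer = all | commit setting, with the standby wait placed inside XLogFlush, can be illustrated with a toy model. This is a minimal Python sketch, not PostgreSQL code: `SyncTransfer`, `ToyPrimary`, `wait_for_lsn`, `xlog_flush`, `write_data_page` and `commit` are hypothetical names, and synchronous_transfer is only the setting proposed in this mail, not an existing parameter. The model assumes the write-ahead rule (a data page's WAL is flushed before the page itself is written) and a standby that catches up instantly whenever it is waited on.

```python
from enum import Enum


class SyncTransfer(Enum):
    """Hypothetical values for the proposed synchronous_transfer setting."""
    COMMIT = "commit"  # proposed default: standby waits happen at commit only
    ALL = "all"        # every WAL flush also waits for the standby


class ToyPrimary:
    """Toy model of a primary with one synchronous standby (not PostgreSQL code)."""

    def __init__(self, mode: SyncTransfer) -> None:
        self.mode = mode
        self.local_flush_lsn = 0     # WAL durably flushed on the primary
        self.standby_flush_lsn = 0   # WAL the standby has confirmed
        self.unsafe_page_writes = 0  # pages written before the standby had their WAL

    def wait_for_lsn(self, lsn: int) -> None:
        # Stand-in for waiting on the standby; here it "catches up" at once.
        self.standby_flush_lsn = max(self.standby_flush_lsn, lsn)

    def xlog_flush(self, lsn: int) -> None:
        self.local_flush_lsn = max(self.local_flush_lsn, lsn)
        # The idea from the mail: do the standby wait inside the flush routine
        # itself, so no individual caller can forget to call it.
        if self.mode is SyncTransfer.ALL:
            self.wait_for_lsn(lsn)

    def write_data_page(self, page_lsn: int) -> None:
        # Write-ahead rule: flush the page's WAL before writing the page.
        self.xlog_flush(page_lsn)
        if self.standby_flush_lsn < page_lsn:
            # The data file changes before the standby has the matching WAL.
            self.unsafe_page_writes += 1

    def commit(self, commit_lsn: int) -> None:
        self.xlog_flush(commit_lsn)
        if self.mode is SyncTransfer.COMMIT:
            # Existing synchronous replication behaviour: wait at commit only.
            self.wait_for_lsn(commit_lsn)


for mode in SyncTransfer:
    primary = ToyPrimary(mode)
    primary.write_data_page(100)
    primary.commit(150)
    print(mode.value, primary.unsafe_page_writes, primary.standby_flush_lsn)
```

In "commit" mode the data page reaches disk before the standby has its WAL, which is the situation that forces a fresh backup before failback; in "all" mode the single wait inside xlog_flush covers both the page write and the commit, with no extra call needed at either site.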