On Tue, 2010-09-07 at 16:31 +0200, Markus Wanner wrote:
> On 09/07/2010 04:15 PM, Robert Haas wrote:
> > In theory, that's true, but if we do that, then there's an even bigger
> > problem: the slave might have replayed WAL ahead of the master
> > location; therefore the slave is now corrupt and a new base backup
> > must be taken.
>
> The slave isn't corrupt. It would suffice to "late abort" committed
> transactions the master doesn't know about.

The slave *might* be ahead of the master. And if it is, the case we're
discussing is where the master just crashed and *might* not even be
coming back at all, at least for a while. The standby does differ from
master, but with the master down I don't regard that as a useful
statement.

If we wait for fsync on master and then transfer to standby the times
are additive. If we do them concurrently the response times will be the
maximum response time of fsync/transfer, as Markus observes.

ISTM that most people would be more interested in reducing response
times by ~50% rather than in being exactly correct in an edge case. So
we should be planning that as a robustness option, not "it cannot be
done", which seems to be echoing around too much for my liking.
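
To put rough numbers on the additive-versus-maximum point, here is a
minimal sketch of the arithmetic. The function names and the timings
are made up for illustration; nothing here is measured or taken from
the server code. Note that the ~50% saving is the best case, reached
when fsync and transfer take about the same time.

```python
def commit_latency(fsync_ms: float, transfer_ms: float, concurrent: bool) -> float:
    """Hypothetical model of commit response time in milliseconds.

    Serial: fsync on the master, then transfer to the standby (times add).
    Concurrent: both start together, so we wait only for the slower one.
    """
    if concurrent:
        return max(fsync_ms, transfer_ms)
    return fsync_ms + transfer_ms


def saving(fsync_ms: float, transfer_ms: float) -> float:
    """Fraction of response time saved by overlapping fsync and transfer."""
    serial = commit_latency(fsync_ms, transfer_ms, concurrent=False)
    overlapped = commit_latency(fsync_ms, transfer_ms, concurrent=True)
    return 1.0 - overlapped / serial


if __name__ == "__main__":
    # Illustrative timings only (assumed, not benchmarked).
    for fsync_ms, transfer_ms in [(5.0, 5.0), (5.0, 1.0), (2.0, 8.0)]:
        print(fsync_ms, transfer_ms, round(saving(fsync_ms, transfer_ms), 3))
```

When one of the two steps dominates, the saving shrinks towards zero,
so the benefit depends on how close the fsync and network times are.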
-- 
Simon Riggs           www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Training and Services