On Thu, 2008-09-11 at 18:17 +0300, Heikki Linnakangas wrote:
> Fujii Masao wrote:
> > I think that this case would often happen. So, we should establish a certain
> > solution or procedure to the case where TLI of the master doesn't match
> > TLI of the slave. If we only allow the case where TLI of both servers is the
> > same, the configuration after failover always needs to get the base backup
> > on the new master. It's unacceptable for many users. But, I think that it's
> > the role of admin or external tools to copy history files to the slave from
> > the master.
>
> Hmm. There's more problems than the TLI with that. For the original
> master to catch up by replaying WAL from the new slave, without
> restoring from a full backup, the original master must not write to disk
> *any* WAL that hasn't made it to the slave yet. That is certainly not
> true for asynchronous replication, but it also throws off the idea of
> flushing the WAL concurrently to the local disk and to the slave in
> synchronous mode.
>
> I agree that having to get a new base backup to get the old master catch
> up with the new master sucks, so I hope someone sees a way around that.

If we were going to recover from the failed-over standby back to the
original master using only WAL, we would need every WAL file from the
point of failover onward. So you'd need to store all WAL files just in
case the old master recovers. I can't believe that would be the common
case, because it's so impractical: most people would run out of disk
space and need to delete WAL files.
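
To put rough numbers on the disk-space point, here is a back-of-envelope
sketch. The function names and the example rates are hypothetical and
chosen only for illustration; the one fixed input is the default WAL
segment size of 16 MB.

```python
import math

# Default WAL segment size in PostgreSQL (16 MB unless changed at build time).
WAL_SEGMENT_BYTES = 16 * 1024 * 1024


def retained_wal_bytes(segments_per_hour, outage_hours,
                       segment_bytes=WAL_SEGMENT_BYTES):
    """Bytes of WAL that must be kept to cover an outage of outage_hours.

    segments_per_hour is an assumed, measured-elsewhere WAL generation rate.
    Partial segments are rounded up, since whole files are retained.
    """
    segments = math.ceil(segments_per_hour * outage_hours)
    return segments * segment_bytes


def hours_until_full(free_bytes, segments_per_hour,
                     segment_bytes=WAL_SEGMENT_BYTES):
    """Hours of WAL that fit in free_bytes at the given generation rate."""
    return free_bytes / (segments_per_hour * segment_bytes)


if __name__ == "__main__":
    # Hypothetical workload: one 16 MB segment per minute, old master down a day.
    needed = retained_wal_bytes(segments_per_hour=60, outage_hours=24)
    print(f"WAL to retain: {needed / 2**30:.1f} GiB")

    # Hypothetical archive volume with 100 GiB free.
    print(f"Hours until full: {hours_until_full(100 * 2**30, 60):.1f}")
```

Even at that modest assumed rate, the retained WAL grows without bound
for as long as the old master stays down, which is why keeping it all
"just in case" does not look practical.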

It should be clear that to make this work you must run with a base
backup that was derived correctly on the current master. You can do that
by re-copying everything, or you can do that by just shipping changed
blocks (rsync etc). So I don't see a problem in the first place.

--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support