> The real problem here is that we're sending records to the slave which
> might cease to exist on the master if it unexpectedly reboots. I
> believe that what we need to do is make sure that the master only
> sends WAL it has already fsync'd
How about this:
- pg records somewhere the xlog position of the last record synced to
disk. I don't remember the variable name, so let's just call it
xlog_synced_recptr
- pg always writes the xlog first, i.e. before writing any page it checks
that the page's xlog recptr < xlog_synced_recptr, and if that's not the
case it has to wait before it can write the page.
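
To make the rule above concrete, here is a toy sketch in Python (not actual
pg code). Recptrs are modelled as plain integers, and WalState,
fsync_wal and can_write_page are names I made up for illustration; only
xlog_synced_recptr is the placeholder name used above. The comparison is
copied as written above; real code might need <= depending on how the
recptr is defined.

```python
class WalState:
    """Toy model of one server's WAL; recptrs are plain integers."""

    def __init__(self):
        # Position of the last xlog record known to be synced to disk
        # (placeholder name from the text, not a real pg variable).
        self.xlog_synced_recptr = 0

    def fsync_wal(self, upto):
        # The synced position only ever moves forward.
        self.xlog_synced_recptr = max(self.xlog_synced_recptr, upto)


def can_write_page(page_recptr, wal):
    """WAL-before-data check: the page may go to disk only once the
    xlog covering its last change has been synced."""
    return page_recptr < wal.xlog_synced_recptr


wal = WalState()
wal.fsync_wal(100)
early_page_ok = can_write_page(50, wal)    # xlog for this page is on disk
late_page_ok = can_write_page(150, wal)    # xlog not synced yet: must wait
```

A page that fails the check is not lost, it just stays in memory until a
later fsync moves xlog_synced_recptr past it.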
Now:
- master sends messages to slave with the xlog_synced_recptr after each
fsync
- slave gets these messages and records the master_xlog_synced_recptr
- slave doesn't write any page to disk until BOTH the slave's local WAL
copy AND the master's WAL have reached the recptr of this page
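
The slave side of the three points above could be sketched like this, again
as a toy Python model rather than real pg code. The Slave class and its
method names are hypothetical; master_xlog_synced_recptr is the placeholder
name used above, and recptrs are plain integers. I read "reached" as >=,
which is an assumption.

```python
class Slave:
    """Toy model of a slave tracking its own and the master's synced xlog."""

    def __init__(self):
        self.local_xlog_synced_recptr = 0    # slave's own synced WAL copy
        self.master_xlog_synced_recptr = 0   # last position reported by master

    def on_local_fsync(self, recptr):
        # Called after the slave syncs its local WAL copy.
        self.local_xlog_synced_recptr = max(self.local_xlog_synced_recptr, recptr)

    def on_master_sync_message(self, recptr):
        # Called when the master reports its xlog_synced_recptr after an
        # fsync. Stale or reordered messages never move the position back.
        self.master_xlog_synced_recptr = max(self.master_xlog_synced_recptr, recptr)

    def can_write_page(self, page_recptr):
        # A page may go to disk only when BOTH WAL copies have reached it.
        safe_upto = min(self.local_xlog_synced_recptr,
                        self.master_xlog_synced_recptr)
        return page_recptr <= safe_upto


slave = Slave()
slave.on_local_fsync(300)            # slave has synced WAL up to 300
slave.on_master_sync_message(200)    # master has only confirmed up to 200
page_200_ok = slave.can_write_page(200)   # covered by both copies
page_250_ok = slave.can_write_page(250)   # ahead of the master: must wait
```

With this rule the slave's on-disk pages can never be ahead of what the
master has synced, even if the slave's in-memory pages are.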
If a master crashes or the slave loses connection, then the in-memory
pages of the slave could be in a state that is "in the future" compared to
the master's state when it comes up.
Therefore, when a slave detects that the master has crashed, it could shoot
itself and recover from WAL, at which point the slave would no longer be
"in the future" relative to the master; rather, it would be in the past,
which is a lot less problematic...
Of course, this wouldn't speed up the failover process!
> I think we should also change the slave to panic and shut down
> immediately if its xlog position is ahead of the master. That can
> never be a watertight solution because you can always advance the xlog
> position on the master and mask the problem. But I think we should
> do it anyway, so that we at least have a chance of noticing that we're
> hosed. I wish I could think of something a little more watertight...
If a slave is "in the future" relative to the master, then the only way to
keep using this slave could be to make it the new master...