Simon Riggs wrote:
> On Thu, 2007-04-19 at 22:37 +0200, Florian G. Pflug wrote:
>> The problem is that after a crash, the master might complete incomplete
>> actions via rm_cleanup() - but since it won't wal-log those changes,
>> the slave won't know about this. This will at least prevent the creation
>> of any further restart points on the slave (because safe_restartpoint)
>> will never return true again - it it might even cause data corruption,
>> if subsequent wal records are interpreted wrongly by the slave because
>> it sees other data than the master did when it generated them.
>
> I agree the problem exists. It is somewhat rare because the idea is that
> if the Primary crashes we would failover to the Standby, which would
> mean that both Primary and Standby have executed rm_cleanup(), if
> needed.
>
> So in the case where the Primary fails and we choose *not* to failover,
> there is a potential difficulty on the Standby.
It occured to me today that this might plague 8.1 too. As you explain below,
the problem is not really connected to restartpoints - failing to create
them is merely a sympton of this. On 8.1, this might still lead to rm_cleanup()
being called much "later" (if you consider the wal position to be the "time")
on the slave than on the master. I dunno if this really causes trouble - I
don't yet understand the btree code well enough to judge this.
> The rationale for this fix could be described somewhat differently:
>
> When we shutdown, we know for certain that safe_restartpoint() is true.
> However, we don't know whether it is true because we successfully did a
> clean shutdown, or because we crashed, recovered and then issued a
> shutdown checkpoint following recovery. In the latter case we *must*
> execute rm_cleanup() on the standby because it has been executed on the
> primary. Not doing so at this point *might* be safe, but is not certain
> to be safe. We don't *need* to log a restartpoint at this time, but it
> seems sensible to do so.
While creating the patch, I've been thinking if it might be worthwile
to note that we just did recovery in the ShutdownCheckpoint
(or create a new checkpoint type RecoveryCheckpoint). This wouldl allow
for more error checking, because then the slave could check that
safe_restartpoint() is true for all ShutdownCheckpoints that were not
after recovering.
> We need to check that rm_cleanup() routines don't assume that they will
> only ever be called once or this will clearly fail. There is also no
> need to call rm_cleanup() unless rm_safe_restartpoint() is false.
But a non-idempotent rm_cleanup() routine will cause trouble anyway,
if postgres crashes after having called rm_cleanup() but before creating
the ShutdownCheckpoint.
greetings, Florian Pflug