On Sun, 2006-07-16 at 10:51 -0400, Tom Lane wrote:
> Andreas Seltenreich <andreas+pg@gate450.dyndns.org> writes:
> > Simon Riggs <simon@2ndquadrant.com> writes:
> >> [2. text/x-patch; restartableRecovery.patch]
>
> > Hmm, wouldn't you have to reboot the resource managers at each
> > checkpoint? I'm afraid otherwise things like postponed page splits
> > could get lost on restart from a later checkpoint.
>
> Ouch. That's a bit nasty. You can't just apply a postponed split at
> checkpoint time, because the WAL record could easily be somewhere after
> the checkpoint, leading to duplicate insertions. Right offhand I don't
> see how to make this work :-(

Yes, ouch. So much for gung-ho code sprints; thanks Andreas.

To do this, we would need another rmgr-specific routine that is called
at each recovery checkpoint. It would write the current state of any
incomplete multi-WAL actions to disk in some form. During startup we
would check for any pre-existing state files and use them to initialise
the incomplete-action cache. Cleanup would then discard all state files.
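
A rough sketch of that save/reload/discard cycle, to make the idea
concrete. Everything here is hypothetical: IncompleteActionCache,
at_recovery_checkpoint(), at_startup(), cleanup() and the per-rmgr
"<rmgr>.state" file are invented names, not existing PostgreSQL code.
It assumes LSNs are plain integers, that action keys are strings, and
that actions are JSON-serialisable; the real thing would be C in the
rmgr and would use our own file format.

```python
import json
import os
import tempfile


class IncompleteActionCache:
    """Hypothetical per-rmgr cache of multi-WAL actions still in progress
    (e.g. a btree page split whose parent insertion is not yet replayed)."""

    def __init__(self, rmgr_name, state_dir):
        self.rmgr_name = rmgr_name
        self.state_dir = state_dir
        self.pending = {}  # string key -> JSON-serialisable action

    def _state_path(self):
        return os.path.join(self.state_dir, f"{self.rmgr_name}.state")

    def remember(self, key, action):
        self.pending[key] = action

    def forget(self, key):
        self.pending.pop(key, None)

    def at_recovery_checkpoint(self, checkpoint_lsn):
        """Persist pending actions, written to a temp file then renamed."""
        state = {"checkpoint_lsn": checkpoint_lsn, "pending": self.pending}
        tmp = self._state_path() + ".tmp"
        with open(tmp, "w") as f:
            json.dump(state, f)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, self._state_path())

    def at_startup(self):
        """Reload pending actions from a surviving state file, if any.

        Returns the checkpoint LSN the state was saved at. WAL after that
        LSN gets replayed again, so it marks the start of the potential
        double-action zone. Returns None if there is no state file."""
        path = self._state_path()
        if not os.path.exists(path):
            return None
        with open(path) as f:
            state = json.load(f)
        self.pending = state["pending"]
        return state["checkpoint_lsn"]

    def cleanup(self):
        """End of recovery: discard the state file."""
        try:
            os.remove(self._state_path())
        except FileNotFoundError:
            pass


if __name__ == "__main__":
    state_dir = tempfile.mkdtemp()
    cache = IncompleteActionCache("btree", state_dir)
    cache.remember("rel16384/blk42", {"split_lsn": 1000, "right": 43})
    cache.at_recovery_checkpoint(1200)

    # Simulate a crash and restart: a fresh cache reloads the saved state.
    restarted = IncompleteActionCache("btree", state_dir)
    print(restarted.at_startup(), restarted.pending)
    restarted.cleanup()
```

The temp-file-then-rename step is only there so that a crash in the
middle of writing never leaves a half-written state file behind.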

That stops us forgetting actions, but it doesn't help if repeating an
action twice causes problems. We would at least know that we are in a
potential double-action zone and could raise different kinds of errors
or handle it differently.

Alternatively, we could simply mark as incomplete-needs-rebuild any
index that had a page split during the overlap between the last known
good recovery checkpoint and the following one. But that makes recovery
time unpredictable, since it then includes the index rebuilds, and we
might have been better off starting from scratch anyway.
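
The selection step for that option is simple enough to sketch. This
assumes a hypothetical SplitRecord(index_name, lsn) collected during
replay, models LSNs as plain integers, and treats the overlap as the
half-open window (last_good_ckpt_lsn, next_ckpt_lsn]; whether the
boundaries should be inclusive is an assumption, not something decided
here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SplitRecord:
    """Hypothetical: one replayed page-split WAL record."""
    index_name: str
    lsn: int


def indexes_needing_rebuild(splits, last_good_ckpt_lsn, next_ckpt_lsn):
    """Names of indexes that had a page split in the overlap window
    (last_good_ckpt_lsn, next_ckpt_lsn], sorted and de-duplicated."""
    return sorted({
        s.index_name
        for s in splits
        if last_good_ckpt_lsn < s.lsn <= next_ckpt_lsn
    })


if __name__ == "__main__":
    replayed = [SplitRecord("a_idx", 90), SplitRecord("b_idx", 150),
                SplitRecord("b_idx", 180), SplitRecord("c_idx", 200)]
    print(indexes_needing_rebuild(replayed, 100, 200))
```

The cost of this approach is that every index returned here must be
rebuilt before it can be used, which is where the unpredictable
recovery time comes from.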

Given the time available for 8.2, neither option is a quick fix.
--
Simon Riggs
EnterpriseDB http://www.enterprisedb.com