Simon Riggs wrote:
> Falling back to the secondary checkpoint implies we have a corrupted or
> absent WAL file, so making recovery startup work correctly won't avoid
> the need to re-run the base backup. We'll end with an unrecoverable
> error in either case, so it doesn't seem worth attempting to improve
> this in the way you suggest.
That's true whenever you have to fall back to a secondary checkpoint,
but we still try to get the database up. One could argue that we
shouldn't, of course.
Anyway, the point is that the patch relies on a non-obvious assumption.
Even if the secondary checkpoint issue is a non-issue, it's not obvious
(to me at least) that there aren't other similar scenarios. And someone
might inadvertently break the assumption in a future patch, because it's
not an obvious one; calling ReadRecord looks very innocent. We shouldn't
introduce an assumption like that when we don't have to.
> I think we should completely prevent access to secondary checkpoints
> during archive recovery, because if the primary checkpoint isn't present
> or is corrupt we aren't ever going to get past it to get to the
> pg_stop_backup() point. Hence an archive recovery can never be valid in
> that case. I'll do a separate patch for that because they are unrelated
> issues.
Well, we already don't use the secondary checkpoint if a backup label
file is present. And you can take a base backup without
pg_start_backup()/pg_stop_backup() if you shut down the system first (a
cold backup).
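
To make the behavior I'm describing concrete, here is a rough sketch of
the checkpoint selection at startup. It is not the actual xlog.c code:
StartupState, CheckpointChoice and choose_checkpoint() are hypothetical
names made up for illustration, and the real logic has more cases.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical summary of what startup can see; not PostgreSQL's real structs. */
typedef struct
{
    bool have_backup_label;     /* a backup label file is present */
    bool label_ckpt_valid;      /* checkpoint named by the label is readable */
    bool primary_ckpt_valid;    /* primary checkpoint record is readable */
    bool secondary_ckpt_valid;  /* secondary checkpoint record is readable */
} StartupState;

typedef enum
{
    CKPT_FROM_LABEL,    /* start from the checkpoint in the backup label */
    CKPT_PRIMARY,       /* start from the primary checkpoint */
    CKPT_SECONDARY,     /* fell back to the secondary checkpoint */
    CKPT_NONE           /* nothing usable, startup has to fail */
} CheckpointChoice;

/*
 * Pick the checkpoint to start recovery from.
 *
 * With a backup label there is no fallback: either the checkpoint it names
 * can be read, or startup fails.  Without one (normal crash recovery, or a
 * cold backup taken after shutdown), try the primary checkpoint and then
 * fall back to the secondary.
 */
CheckpointChoice
choose_checkpoint(const StartupState *st)
{
    assert(st != NULL);

    if (st->have_backup_label)
        return st->label_ckpt_valid ? CKPT_FROM_LABEL : CKPT_NONE;

    if (st->primary_ckpt_valid)
        return CKPT_PRIMARY;
    if (st->secondary_ckpt_valid)
        return CKPT_SECONDARY;
    return CKPT_NONE;
}
```

Note that a cold backup has no backup label file, so it takes the second
path, and the fallback to the secondary checkpoint is still reachable
there.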
--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com