Re: Verified fix for Bug 4137 - Mailing list pgsql-patches

From Heikki Linnakangas
Subject Re: Verified fix for Bug 4137
Date
Msg-id 48206471.6080402@enterprisedb.com
Whole thread Raw
In response to Re: Verified fix for Bug 4137  (Simon Riggs <simon@2ndquadrant.com>)
Responses Re: Verified fix for Bug 4137
List pgsql-patches
Simon Riggs wrote:
> Falling back to the secondary checkpoint implies we have a corrupted or
> absent WAL file, so making recovery startup work correctly won't avoid
> the need to re-run the base backup. We'll end with an unrecoverable
> error in either case, so it doesn't seem worth attempting to improve
> this in the way you suggest.

That's true whenever you have to fall back to a secondary checkpoint,
but we still try to get the database up. One could argue that we
shouldn't, of course.

Anyway, the point is that the patch relies on a non-obvious assumption.
Even if the secondary checkpoint issue is a non-issue, it's not obvious
(to me at least) that there isn't other similar scenarios. And someone
might inadvertently break the assumption in a future patch, because it's
not an obvious one; calling ReadRecord looks very innocent. We shouldn't
introduce an assumption like that when we don't have to.

> I think we should completely prevent access to secondary checkpoints
> during archive recovery, because if the primary checkpoint isn't present
> or is corrupt we aren't ever going to get passed it to get to the
> pg_stop_backup() point. Hence an archive recovery can never be valid in
> that case. I'll do a separate patch for that because they are unrelated
> issues.

Well, we already don't use the secondary checkpoint if a backup label
file is present. And you can take a base backup without
pg_start_backup()/pg_stop_backup() if you shut down the system first (a
cold backup).

--
   Heikki Linnakangas
   EnterpriseDB   http://www.enterprisedb.com

pgsql-patches by date:

Previous
From: Simon Riggs
Date:
Subject: Re: Verified fix for Bug 4137
Next
From: Simon Riggs
Date:
Subject: Re: Verified fix for Bug 4137