Re: Verified fix for Bug 4137 - Mailing list pgsql-patches

From Simon Riggs
Subject Re: Verified fix for Bug 4137
Date
Msg-id 1210081444.4435.317.camel@ebony.site
Whole thread Raw
In response to Re: Verified fix for Bug 4137  ("Heikki Linnakangas" <heikki@enterprisedb.com>)
Responses Re: Verified fix for Bug 4137
List pgsql-patches
On Tue, 2008-05-06 at 12:02 +0100, Heikki Linnakangas wrote:
> Simon Riggs wrote:
> > The problem was that at the very start of archive recovery the %r
> > parameter in restore_command could be set to a filename later than the
> > currently requested filename (%f). This could lead to early truncation
> > of the archived WAL files and would cause warm standby replication to
> > fail soon afterwards, in certain specific circumstances.
> >
> > Fix applied to both core server in generating correct %r filenames and
> > also to pg_standby to prevent acceptance of out-of-sequence filenames.
>
> So the core problem is that we use ControlFile->checkPointCopy.redo in
> RestoreArchivedFile to determine the safe truncation point, but when
> there's a backup label file, that's still coming from pg_control file,
> which is wrong.
>
> The patch fixes that by determining the safe truncation point as
> Min(checkPointCopy.redo, xlogfname), where xlogfname is the xlog file
> being restored. That depends on the assumption that everything before
> the first file we (try to) restore is safe to truncate. IOW, we never
> try to restore file B first, and then A, where A < B.
>
> I'm not totally convinced that's a safe assumption. As an example,
> consider doing an archive recovery, but without a backup label, and the
> latest checkpoint record is broken. We would try to read the latest
> (broken) checkpoint record first, and call RestoreArchivedFile to get
> the xlog file containing that. But because that record is broken, we
> fall back to using the previous checkpoint, and will need the xlog file
> where the previous checkpoint record is in.

> That's a pretty contrived example, but the point is that assumption
> feels fragile. At the very least it should be noted explicitly in the
> comments. A less fragile approach would be to use something dummy, like
> "000000000000000000000000" as the truncation point until we've
> successfully read the checkpoint/restartpoint record and started the replay.

Thanks for thinking about this.

Falling back to the secondary checkpoint implies we have a corrupted or
absent WAL file, so making recovery startup work correctly won't avoid
the need to re-run the base backup. We'll end with an unrecoverable
error in either case, so it doesn't seem worth attempting to improve
this in the way you suggest.

I think we should completely prevent access to secondary checkpoints
during archive recovery, because if the primary checkpoint isn't present
or is corrupt we aren't ever going to get passed it to get to the
pg_stop_backup() point. Hence an archive recovery can never be valid in
that case. I'll do a separate patch for that because they are unrelated
issues.

--
  Simon Riggs
  2ndQuadrant  http://www.2ndQuadrant.com


pgsql-patches by date:

Previous
From: Simon Riggs
Date:
Subject: Re: Verified fix for Bug 4137
Next
From: "Heikki Linnakangas"
Date:
Subject: Re: Verified fix for Bug 4137