Re: Verified fix for Bug 4137 - Mailing list pgsql-patches

From Simon Riggs
Subject Re: Verified fix for Bug 4137
Date
Msg-id 1210072983.4435.263.camel@ebony.site
Whole thread Raw
In response to Re: Verified fix for Bug 4137  ("Heikki Linnakangas" <heikki@enterprisedb.com>)
List pgsql-patches
On Tue, 2008-05-06 at 12:02 +0100, Heikki Linnakangas wrote:
> Simon Riggs wrote:
> > The problem was that at the very start of archive recovery the %r
> > parameter in restore_command could be set to a filename later than the
> > currently requested filename (%f). This could lead to early truncation
> > of the archived WAL files and would cause warm standby replication to
> > fail soon afterwards, in certain specific circumstances.
> >
> > Fix applied to both core server in generating correct %r filenames and
> > also to pg_standby to prevent acceptance of out-of-sequence filenames.
>
> So the core problem is that we use ControlFile->checkPointCopy.redo in
> RestoreArchivedFile to determine the safe truncation point, but when
> there's a backup label file, that's still coming from pg_control file,
> which is wrong.
>
> The patch fixes that by determining the safe truncation point as
> Min(checkPointCopy.redo, xlogfname), where xlogfname is the xlog file
> being restored. That depends on the assumption that everything before
> the first file we (try to) restore is safe to truncate. IOW, we never
> try to restore file B first, and then A, where A < B.
>
> I'm not totally convinced that's a safe assumption. As an example,
> consider doing an archive recovery, but without a backup label, and the
> latest checkpoint record is broken. We would try to read the latest
> (broken) checkpoint record first, and call RestoreArchivedFile to get
> the xlog file containing that. But because that record is broken, we
> fall back to using the previous checkpoint, and will need the xlog file
> where the previous checkpoint record is in.
>
> That's a pretty contrived example, but the point is that assumption
> feels fragile. At the very least it should be noted explicitly in the
> comments. A less fragile approach would be to use something dummy, like
> "000000000000000000000000" as the truncation point until we've
> successfully read the checkpoint/restartpoint record and started the replay.

Your comments are interesting and I'll think through that some more to
see if I can see if that leads to bugs. Will talk more later.

The only valid starting place, in all cases, is the checkpoint written
by pg_start_backup(). The user doesn't need to have been archiving files
prior to that so could legitimately have had a null archive_command.

--
  Simon Riggs
  2ndQuadrant  http://www.2ndQuadrant.com


pgsql-patches by date:

Previous
From: "Heikki Linnakangas"
Date:
Subject: Re: Verified fix for Bug 4137
Next
From: Simon Riggs
Date:
Subject: Re: Verified fix for Bug 4137