Home > mailing lists

Re: Verified fix for Bug 4137 - Mailing list pgsql-patches

From	Heikki Linnakangas
Subject	Re: Verified fix for Bug 4137
Date	May 6, 2008 08:05:51
Msg-id	48203AE0.2080508@enterprisedb.com Whole thread Raw
In response to	Verified fix for Bug 4137 (Simon Riggs <simon@2ndquadrant.com>)
Responses	Re: Verified fix for Bug 4137 Re: Verified fix for Bug 4137
List	pgsql-patches

Tree view

Simon Riggs wrote:
> The problem was that at the very start of archive recovery the %r
> parameter in restore_command could be set to a filename later than the
> currently requested filename (%f). This could lead to early truncation
> of the archived WAL files and would cause warm standby replication to
> fail soon afterwards, in certain specific circumstances.
>
> Fix applied to both core server in generating correct %r filenames and
> also to pg_standby to prevent acceptance of out-of-sequence filenames.

So the core problem is that we use ControlFile->checkPointCopy.redo in
RestoreArchivedFile to determine the safe truncation point, but when
there's a backup label file, that's still coming from pg_control file,
which is wrong.

The patch fixes that by determining the safe truncation point as
Min(checkPointCopy.redo, xlogfname), where xlogfname is the xlog file
being restored. That depends on the assumption that everything before
the first file we (try to) restore is safe to truncate. IOW, we never
try to restore file B first, and then A, where A < B.

I'm not totally convinced that's a safe assumption. As an example,
consider doing an archive recovery, but without a backup label, and the
latest checkpoint record is broken. We would try to read the latest
(broken) checkpoint record first, and call RestoreArchivedFile to get
the xlog file containing that. But because that record is broken, we
fall back to using the previous checkpoint, and will need the xlog file
where the previous checkpoint record is in.

That's a pretty contrived example, but the point is that assumption
feels fragile. At the very least it should be noted explicitly in the
comments. A less fragile approach would be to use something dummy, like
"000000000000000000000000" as the truncation point until we've
successfully read the checkpoint/restartpoint record and started the replay.

--
   Heikki Linnakangas
   EnterpriseDB   http://www.enterprisedb.com

pgsql-patches by date:

From: Simon Riggs
Date: 06 May 2008, 05:29:26
Subject: Verified fix for Bug 4137

From: Simon Riggs
Date: 06 May 2008, 08:21:56
Subject: Re: Verified fix for Bug 4137

Re: Verified fix for Bug 4137 - Mailing list pgsql-patches

Previous

Next