Re: Failing start-up archive recovery at Standby mode in PG9.2.4 - Mailing list pgsql-hackers

From Heikki Linnakangas
Subject Re: Failing start-up archive recovery at Standby mode in PG9.2.4
Msg-id 517A3688.7090506@vmware.com
In response to Re: Failing start-up archive recovery at Standby mode in PG9.2.4  (Amit Langote <amitlangote09@gmail.com>)
List pgsql-hackers
On 26.04.2013 07:47, Amit Langote wrote:
>   How would code after applying this patch behave if a recycled segment gets
> renamed using the newest timeline (say 3) while we are still recovering from
> a lower timeline (say 2)? In that case, since XLogFileReadAnyTLI returns
> that recycled segment as the next segment to recover from, we get the error.
> And since XLogFileReadAnyTLI iterates over expectedTLIs (whose head seems to
> be recoveryTargetTLI at all times, is that right?), it will return that
> wrong (recycled segment) in the first iteration itself.

As long as the right segment is present in the archive, that's OK. Even
if a recycled segment with higher TLI is in pg_xlog, with the patch
we'll still read the segment with lower TLI from the archive. But there
is a corner case where a recycled segment with a higher TLI masks a
segment with lower TLI in pg_xlog. For example, if you try to recover by
copying all the required WAL files directly into pg_xlog, without using
restore_command, you could run into problems.
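To make the masking concrete, here is a toy model of the lookup, not the actual server code. It assumes only what was said above: the lookup walks the expected timelines newest first and takes the first file it finds. The function names, the dict standing in for pg_xlog, and the segment numbers are made up for illustration.

```python
def wal_file_name(tli, log, seg):
    """WAL segment file name: timeline, log id, segment id, 8 hex digits each."""
    return "%08X%08X%08X" % (tli, log, seg)


def read_any_tli(pg_xlog, expected_tlis, log, seg):
    """Toy stand-in for XLogFileReadAnyTLI: try timelines newest first
    and return the first file name that exists in pg_xlog."""
    for tli in expected_tlis:
        name = wal_file_name(tli, log, seg)
        if name in pg_xlog:
            return name
    return None


# Hypothetical pg_xlog contents: the real segment on timeline 2, plus a
# recycled file that was renamed using the recovery target timeline 3.
pg_xlog = {
    wal_file_name(2, 0, 5): "valid WAL for timeline 2",
    wal_file_name(3, 0, 5): "recycled file with stale contents",
}

expected_tlis = [3, 2, 1]  # head of the list is the recovery target TLI
picked = read_any_tli(pg_xlog, expected_tlis, 0, 5)
print(picked, "->", pg_xlog[picked])
```

In this model the TLI 3 file is found first, so the valid TLI 2 segment sitting next to it in pg_xlog is never opened.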

So yeah, I think you're right and we need to rethink the recycling. The
first question is, do we have to recycle WAL segments during recovery at
all? It's pointless when we're restoring from archive with
restore_command; the recycled files will just get replaced with files
from the archive. It does help when walreceiver is active, but I wonder
how significant it is in practice.

I guess the safest, smallest change is to use a lower TLI when
installing the recycled files. So, instead of using the current recovery
target timeline, use the ID of the timeline we're currently recovering.
That way the recycled segments will never have a higher TLI than other
files that recovery will try to replay. See attached patch.
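As a rough sketch of the effect, again a toy model rather than the patch itself: the only thing that changes between the two runs below is which TLI is used to name the recycled file. All names here are hypothetical, and the "install at the first free segment number" behaviour is a simplifying assumption that ignores segment number wraparound.

```python
def wal_file_name(tli, log, seg):
    """WAL segment file name: timeline, log id, segment id, 8 hex digits each."""
    return "%08X%08X%08X" % (tli, log, seg)


def install_recycled(pg_xlog, tli, log, seg):
    """Toy model: name the recycled file for the first free segment >= seg."""
    while wal_file_name(tli, log, seg) in pg_xlog:
        seg += 1
    name = wal_file_name(tli, log, seg)
    pg_xlog[name] = "recycled"
    return name


def read_any_tli(pg_xlog, expected_tlis, log, seg):
    """Toy lookup: try timelines newest first, return the first existing file."""
    for tli in expected_tlis:
        name = wal_file_name(tli, log, seg)
        if name in pg_xlog:
            return name
    return None


def simulate(recycle_tli):
    """Recycle one file using recycle_tli, then look up segment 5."""
    pg_xlog = {wal_file_name(2, 0, 5): "valid"}  # real segment on timeline 2
    install_recycled(pg_xlog, recycle_tli, 0, 5)
    picked = read_any_tli(pg_xlog, [3, 2, 1], 0, 5)
    return pg_xlog[picked]


recovery_target_tli = 3  # what recycling uses today
replay_tli = 2           # timeline we are currently recovering

print("recycle with target TLI:", simulate(recovery_target_tli))
print("recycle with replay TLI:", simulate(replay_tli))
```

With the target TLI, the recycled file sorts ahead of the real segment and gets picked. With the replay TLI, the recycled file cannot outrank anything recovery is about to read, so the lookup lands on the valid segment.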

- Heikki

