Re: [BUG] Archive recovery failure on 9.3+. - Mailing list pgsql-hackers

From Heikki Linnakangas
Subject Re: [BUG] Archive recovery failure on 9.3+.
Msg-id 52FCF73A.3040208@vmware.com
In response to Re: [BUG] Archive recovery failure on 9.3+.  (Heikki Linnakangas <hlinnakangas@vmware.com>)
Responses Re: [BUG] Archive recovery failure on 9.3+.  (Heikki Linnakangas <hlinnakangas@vmware.com>)
List pgsql-hackers
On 02/13/2014 02:42 PM, Heikki Linnakangas wrote:
> The behavior where we prefer a segment from archive with lower TLI over
> a file with higher TLI in pg_xlog actually changed in commit
> a068c391ab0. Arguably changing it wasn't a good idea, but the problem
> your test script demonstrates can be fixed by not archiving the partial
> segment, with no change to the preference of archive/pg_xlog. As
> discussed, archiving a partial segment seems like a bad idea anyway, so
> let's just stop doing that.

After some further thought, while not archiving the partial segment 
fixes your test script, it's not enough to fix all variants of the 
problem. Even if archive recovery doesn't archive the last, partial, 
segment, if the original master server is still running, it's entirely 
possible that it fills the segment and archives it. In that case, 
archive recovery will again prefer the archived segment with lower TLI 
over the segment with newer TLI in pg_xlog.
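
To make the two orderings concrete, here is a toy model, not the actual recovery code. The function names and the (source, tli) tuples are made up for illustration, and the tie-break in favor of the archive for equal TLIs is an assumption:

```python
# Toy model: each candidate is (source, tli) for the same segment number.
# All names here are hypothetical; this is not PostgreSQL code.

def pick_archive_first(candidates):
    """Prefer anything found in the archive, even with a lower TLI."""
    archived = [c for c in candidates if c[0] == "archive"]
    pool = archived if archived else candidates
    return max(pool, key=lambda c: c[1])

def pick_highest_tli(candidates):
    """Prefer the highest TLI; the archive only wins ties (assumed)."""
    return max(candidates, key=lambda c: (c[1], c[0] == "archive"))

# The old master filled and archived the segment on TLI 1, while
# pg_xlog holds the same segment number on TLI 2.
candidates = [("archive", 1), ("pg_xlog", 2)]
chosen_old = pick_archive_first(candidates)
chosen_new = pick_highest_tli(candidates)
```

With the first ordering the TLI 1 segment from the archive is chosen, which is the failure described above; with the second, the TLI 2 segment in pg_xlog is chosen.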

So I agree we should commit the patch you posted (or something to that 
effect). The change to not archive the last segment still seems like a 
good idea, but perhaps we should only do that in master.

Even after that patch, you can have a problem in more complicated 
scenarios involving both an archive and streaming replication. For 
example, imagine a timeline history like this:

TLI

1 ----+--------------------------->
      |
2     +--------------------------->


Now imagine that timeline 1 has been fully archived, and there are WAL 
segments much higher than the points where the timeline switch occurred 
present in the archive. But none of the WAL segments for timeline 2 have 
been archived; they are only present in a master server. You want to 
perform recovery to timeline 2, using the archived WAL segments for 
timeline 1, and streaming replication to catch up to the tip of timeline 2.

Whether we prefer files from pg_xlog or archive will make no difference 
in this case, as there are no files in pg_xlog. But it will merrily 
apply all the WAL for timeline 1 from the archive that it can find, past 
the timeline switch point. After that, when it tries to connect to the 
server with streaming replication, it will fail, because it has already 
replayed timeline 1 beyond the point where timeline 2 branched off.

There's not much we can do about that in 9.2 and below, but in 9.3 the 
timeline history file contains the exact timeline switch points, so we 
could be more careful and not apply any extra WAL on the old timeline 
past the switch point. We could also be more exact in which files we try 
to restore from the archive, instead of just polling every future TLI in 
the history.
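
A rough sketch of what that check could look like, written in Python rather than C for brevity. It assumes the 9.3 history file layout of one line per ancestor timeline (parent TLI, switch point LSN, reason); the helper names and the sample switch point 0/5000090 are made up for illustration and are not taken from the backend code:

```python
# Hypothetical helpers; not the PostgreSQL implementation.

def parse_lsn(text):
    """Convert an LSN written as 'hi/lo' in hex into an integer."""
    hi, lo = text.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)

def parse_history(content, target_tli):
    """Return [(tli, begin, end)], oldest first; end is None for the target."""
    entries = []
    begin = 0
    for line in content.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split(None, 2)
        tli = int(fields[0])
        end = parse_lsn(fields[1])  # point where this timeline was left
        entries.append((tli, begin, end))
        begin = end
    entries.append((target_tli, begin, None))
    return entries

def should_replay(entries, tli, lsn):
    """True if WAL at lsn on timeline tli belongs to the target's history."""
    for t, begin, end in entries:
        if t == tli:
            return lsn >= begin and (end is None or lsn < end)
    return False  # timeline is not an ancestor of the target at all

# Assumed content of 00000002.history for the diagram above.
history = "1\t0/5000090\tno recovery target specified\n"
entries = parse_history(history, 2)
```

Here should_replay(entries, 1, parse_lsn("0/6000000")) is false, so archived timeline 1 WAL past the switch point would be skipped instead of applied.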

- Heikki


