Re: [BUG] Archive recovery failure on 9.3+. - Mailing list pgsql-hackers

From Heikki Linnakangas
Subject Re: [BUG] Archive recovery failure on 9.3+.
Msg-id 52D3CAFF.3010807@vmware.com
In response to Re: [BUG] Archive recovery failure on 9.3+.  (Josh Berkus <josh@agliodbs.com>)
Responses Re: [BUG] Archive recovery failure on 9.3+.  (Tomonari Katsumata <katsumata.tomonari@po.ntts.co.jp>)
Re: [BUG] Archive recovery failure on 9.3+.  (Christoph Berg <christoph.berg@credativ.de>)
List pgsql-hackers
On 01/09/2014 10:55 PM, Josh Berkus wrote:
> On 01/09/2014 12:05 PM, Heikki Linnakangas wrote:
>
>> Actually, why is the partially-filled 000000010000000000000002 file
>> archived in the first place? Looking at the code, it's been like that
>> forever, but it seems like a bad idea. If the original server is still
>> up and running, and writing more data to that file, what will happen is
>> that when the original server later tries to archive it, it will fail
>> because the partial version of the file is already in the archive. Or
>> worse, the partial version overwrites a previously archived more
>> complete version.
>
> Oh!  This explains some transient errors I've seen.
>
>> Wouldn't it be better to not archive the old segment, and instead switch
>> to a new segment after writing the end-of-recovery checkpoint, so that
>> the segment on the new timeline is archived sooner?
>
> It would be better to zero-fill and switch segments, yes.  We should
> NEVER be in a position of archiving two different versions of the same
> segment.

Ok, I think we're in agreement that that's the way to go for master.

Now, what to do about back-branches? On one hand, I'd like to apply the 
same fix to all stable branches, as the current behavior is silly and 
always has been. On the other hand, we haven't heard any complaints 
about it, so we probably shouldn't fix what ain't broken. Perhaps we 
should apply it to 9.3, as that's where we have the acute problem the OP 
reported. Thoughts?

In summary, I propose that we change master and REL9_3_STABLE to not 
archive the partial segment from the previous timeline. Older branches will 
keep the current behavior.

- Heikki


