Re: [BUG] Archive recovery failure on 9.3+. - Mailing list pgsql-hackers

From Heikki Linnakangas
Subject Re: [BUG] Archive recovery failure on 9.3+.
Date
Msg-id 52FC9468.4050602@vmware.com
In response to Re: [BUG] Archive recovery failure on 9.3+.  (Christoph Berg <christoph.berg@credativ.de>)
Responses Re: [BUG] Archive recovery failure on 9.3+.  (Christoph Berg <christoph.berg@credativ.de>)
List pgsql-hackers

On 02/12/2014 01:24 PM, Christoph Berg wrote:
> Re: Heikki Linnakangas 2014-01-13 <52D3CAFF.3010807@vmware.com>
>>>> Actually, why is the partially-filled 000000010000000000000002 file
>>>> archived in the first place? Looking at the code, it's been like that
>>>> forever, but it seems like a bad idea. If the original server is still
>>>> up and running, and writing more data to that file, what will happen is
>>>> that when the original server later tries to archive it, it will fail
>>>> because the partial version of the file is already in the archive. Or
>>>> worse, the partial version overwrites a previously archived more
>>>> complete version.
>>>
>>> Oh!  This explains some transient errors I've seen.
>>>
>>>> Wouldn't it be better to not archive the old segment, and instead switch
>>>> to a new segment after writing the end-of-recovery checkpoint, so that
>>>> the segment on the new timeline is archived sooner?
>>>
>>> It would be better to zero-fill and switch segments, yes.  We should
>>> NEVER be in a position of archiving two different versions of the same
>>> segment.
>>
>> Ok, I think we're in agreement that that's the way to go for master.
>>
>> Now, what to do about back-branches? On one hand, I'd like to apply
>> the same fix to all stable branches, as the current behavior is
>> silly and always has been. On the other hand, we haven't heard any
>> complaints about it, so we probably shouldn't fix what ain't broken.
>> Perhaps we should apply it to 9.3, as that's where we have the acute
>> problem the OP reported. Thoughts?
>>
>> In summary, I propose that we change master and REL9_3_STABLE to not
>> archive the partial segment from previous timeline. Older branches
>> will keep the current behavior.
>
> I've seen the "can't archive file from the old timeline" problem on
> 9.2 and 9.3 slaves after promotion. The problem is in conjunction with
> the proposed archive_command in the default postgresql.conf comments:
>
> # e.g. 'test ! -f /mnt/server/archivedir/%f && cp %p /mnt/server/archivedir/%f'
>
> With 9.1, it works, but 9.2 and 9.3 don't archive anything until I
> remove the "test ! -f" part. (An alternative fix would be to declare
> the behavior ok and adjust that example in the config.)

Hmm, the behavior is the same in 9.1 and 9.2. Did you use a different 
archive_command in 9.1, without the "test"?
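
For reference, the interaction between the partial segment and the sample archive_command can be reproduced outside the server. This is only a minimal sketch under stated assumptions: the scratch directories, the archive() helper and the file contents are hypothetical stand-ins, not anything the server does itself. Only the "test ! -f ... && cp" logic is taken from the postgresql.conf comment quoted above.

```shell
#!/bin/sh
# Hypothetical scratch directories standing in for pg_xlog and the archive.
work=$(mktemp -d)
mkdir "$work/pg_xlog" "$work/archivedir"
seg=000000010000000000000002

# Stand-in for the sample archive_command; $1 plays the role of %p, $2 of %f.
archive() {
    test ! -f "$work/archivedir/$2" && cp "$1" "$work/archivedir/$2"
}

# 1. The partially-filled segment is archived first.
printf 'partial' > "$work/pg_xlog/$seg"
if archive "$work/pg_xlog/$seg" "$seg"; then first_rc=0; else first_rc=$?; fi

# 2. More WAL lands in the same segment and archiving it is attempted again.
printf 'partial plus more' > "$work/pg_xlog/$seg"
if archive "$work/pg_xlog/$seg" "$seg"; then second_rc=0; else second_rc=$?; fi

echo "first attempt exit status:  $first_rc"
echo "second attempt exit status: $second_rc"
echo "archived contents: $(cat "$work/archivedir/$seg")"
```

The second attempt returns a nonzero status because the file name already exists in the archive, and the archived copy stays the partial one. Since the archiver retries a segment whose archive_command fails before moving on, this would be consistent with nothing further being archived until the "test ! -f" part is removed.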

- Heikki


