Thread: pg_stat_archiver issue with aborted archiver

pg_stat_archiver issue with aborted archiver

From: Julien Rouhaud
Hello,

I just noticed that if the archiver aborts (for instance if the
archive_command exited with a return code > 127), pg_stat_archiver won't
report those failed attempts. This happens with both 9.4 and 9.5 branches.

Please find attached a patch that fixes this issue, based on current head.

Regards.
--
Julien Rouhaud
http://dalibo.com - http://dalibo.org


Re: pg_stat_archiver issue with aborted archiver

From: Michael Paquier
On Sun, Jun 7, 2015 at 1:11 AM, Julien Rouhaud
<julien.rouhaud@dalibo.com> wrote:
> I just noticed that if the archiver aborts (for instance if the
> archive_command exited with a return code > 127), pg_stat_archiver won't
> report those failed attempts. This happens with both 9.4 and 9.5 branches.
>
> Please find attached a patch that fixes this issue, based on current head.

The current code seems right to me. When the archive command dies
because of a signal (exit code > 128), the server should fail
immediately with FATAL and should not do any extra processing. After
restart, it will try to archive the same segment file again. If that
new attempt also fails, but this time not because of a signal, the
failure will be reported to pg_stat_archiver.
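
To illustrate the distinction discussed here, the following C sketch
classifies the wait status returned by system(). The names
ArchiveResult, classify_wait_status and ABORT_EXIT_THRESHOLD are
hypothetical and are not taken from pgarch.c. The threshold of 127
follows the figure given in the original report and is an assumption,
not necessarily the value the server uses.

```c
#include <stdlib.h>
#include <sys/wait.h>

/* Hypothetical outcome categories for one archive_command run. */
typedef enum
{
    ARCHIVE_OK,       /* command exited with status 0 */
    ARCHIVE_FAILED,   /* ordinary failure: would be counted and retried */
    ARCHIVE_ABORTED   /* signal or high exit code: archiver would exit */
} ArchiveResult;

/* Assumption: exit codes above this value are treated like a signal. */
#define ABORT_EXIT_THRESHOLD 127

/* Classify the raw status value returned by system(). */
ArchiveResult
classify_wait_status(int rc)
{
    if (rc == -1)
        return ARCHIVE_ABORTED;     /* system() could not run the shell */

    if (WIFSIGNALED(rc))
        return ARCHIVE_ABORTED;     /* the shell itself died on a signal */

    if (WIFEXITED(rc))
    {
        int code = WEXITSTATUS(rc);

        if (code == 0)
            return ARCHIVE_OK;
        if (code > ABORT_EXIT_THRESHOLD)
            return ARCHIVE_ABORTED; /* shells report 128 + signo here */
        return ARCHIVE_FAILED;
    }

    return ARCHIVE_FAILED;          /* stopped/continued: not expected */
}
```

For instance, classify_wait_status(system("exit 1")) falls into the
ordinary failure category, while a command killed by a signal falls
into the aborted one, which is the case this thread is about.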
-- 
Michael



Re: pg_stat_archiver issue with aborted archiver

From: Julien Rouhaud
On 08/06/2015 05:56, Michael Paquier wrote:
> On Sun, Jun 7, 2015 at 1:11 AM, Julien Rouhaud 
> <julien.rouhaud@dalibo.com> wrote:
>> I just noticed that if the archiver aborts (for instance if the 
>> archive_command exited with a return code > 127),
>> pg_stat_archiver won't report those failed attempts. This happens
>> with both 9.4 and 9.5 branches.
>> 
>> Please find attached a patch that fixes this issue, based on
>> current head.
> 
> The current code seems right to me. When the archive command dies 
> because of a signal (exit code > 128), the server should fail 
> immediately with FATAL and should not do any extra processing.

OK. It may be worth documenting, though.

> It will also try to archive again the same segment file after
> restart. When trying again, if this time the failure is not caused
> by a signal but still fails it will be reported to
> pg_stat_archiver.
> 

Yes, my comment was only about failures going unreported in some
special cases.

Thanks for your response.
-- 
Julien Rouhaud
http://dalibo.com - http://dalibo.org



Re: pg_stat_archiver issue with aborted archiver

From: Fujii Masao
On Mon, Jun 8, 2015 at 5:17 PM, Julien Rouhaud
<julien.rouhaud@dalibo.com> wrote:
> On 08/06/2015 05:56, Michael Paquier wrote:
>> On Sun, Jun 7, 2015 at 1:11 AM, Julien Rouhaud
>> <julien.rouhaud@dalibo.com> wrote:
>>> I just noticed that if the archiver aborts (for instance if the
>>> archive_command exited with a return code > 127),
>>> pg_stat_archiver won't report those failed attempts. This happens
>>> with both 9.4 and 9.5 branches.
>>>
>>> Please find attached a patch that fixes this issue, based on
>>> current head.
>>
>> The current code seems right to me. When the archive command dies
>> because of a signal (exit code > 128), the server should fail
>> immediately with FATAL and should not do any extra processing.

In that case, ISTM that the archiver process dies with FATAL but
the server does not. No? The archiver is then restarted by the
postmaster. If my understanding is right, it seems worth applying
something like Julien's patch.

Regards,

--
Fujii Masao



Re: pg_stat_archiver issue with aborted archiver

From: Michael Paquier
On Tue, Jun 9, 2015 at 4:23 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
> On Mon, Jun 8, 2015 at 5:17 PM, Julien Rouhaud
> <julien.rouhaud@dalibo.com> wrote:
>> On 08/06/2015 05:56, Michael Paquier wrote:
>>> On Sun, Jun 7, 2015 at 1:11 AM, Julien Rouhaud
>>> <julien.rouhaud@dalibo.com> wrote:
>>>> I just noticed that if the archiver aborts (for instance if the
>>>> archive_command exited with a return code > 127),
>>>> pg_stat_archiver won't report those failed attempts. This happens
>>>> with both 9.4 and 9.5 branches.
>>>>
>>>> Please find attached a patch that fixes this issue, based on
>>>> current head.
>>>
>>> The current code seems right to me. When the archive command dies
>>> because of a signal (exit code > 128), the server should fail
>>> immediately with FATAL and should not do any extra processing.
>
> In that case, ISTM that the archiver process dies with FATAL but
> the server not. No? Then the archiver is restarted by postmaster.
> If my understanding is right, it seems worth applying something like
> Julien's patch.

Er, sure. Please understand the archiver process... My point is that
3ad0728 introduced the behavior that we now have in pgarch.c: we
should bail out of the archiver process immediately without
interacting with pgstat, let the archiver come back to archiving this
file after restart, and only call pgstat_send_archiver when
pgarch_archiveXlog() returns a status.
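
The control flow described above can be modelled with a small C
sketch. It is a simplified simulation, not the pgarch.c code: the
ArchiverStats struct, the AttemptOutcome enum and handle_attempt() are
hypothetical names that stand in for the pg_stat_archiver counters and
the archiver loop.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the counters shown by pg_stat_archiver. */
typedef struct
{
    long archived_count;
    long failed_count;
} ArchiverStats;

/* Hypothetical outcome of one archive attempt. */
typedef enum
{
    ATTEMPT_OK,       /* the archive function returned success */
    ATTEMPT_FAILED,   /* the archive function returned failure */
    ATTEMPT_ABORTED   /* FATAL path: the function never returns */
} AttemptOutcome;

/*
 * Handle one attempt. Returns true if the simulated archiver keeps
 * running, false if it must exit and wait to be restarted.
 */
bool
handle_attempt(AttemptOutcome outcome, ArchiverStats *stats)
{
    switch (outcome)
    {
        case ATTEMPT_OK:
            stats->archived_count++;    /* status available: report it */
            return true;
        case ATTEMPT_FAILED:
            stats->failed_count++;      /* status available: report it */
            return true;
        case ATTEMPT_ABORTED:
            return false;               /* bail out, stats untouched */
    }
    return false;
}
```

In this model an aborted attempt leaves failed_count unchanged, which
is the behavior the original report observed; the same file would be
retried once the archiver has been restarted.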
--
Michael