Re: [HACKERS] Walsender timeouts and large transactions - Mailing list pgsql-hackers

From Petr Jelinek
Subject Re: [HACKERS] Walsender timeouts and large transactions
Date
Msg-id 97e5e70a-87d2-dee1-0482-7e3609822095@2ndquadrant.com
Whole thread Raw
In response to Re: [HACKERS] Walsender timeouts and large transactions  (Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp>)
Responses Re: [HACKERS] Walsender timeouts and large transactions  (Petr Jelinek <petr.jelinek@2ndquadrant.com>)
List pgsql-hackers
On 30/05/17 11:02, Kyotaro HORIGUCHI wrote:
> At Thu, 25 May 2017 17:52:50 +0200, Petr Jelinek <petr.jelinek@2ndquadrant.com> wrote in
<e082a56a-fd95-a250-3bae-0fff93832510@2ndquadrant.com>
>> Hi,
>>
>> We have had issue with walsender timeout when used with logical decoding
>> and the transaction is taking long time to be decoded (because it
>> contains many changes)
>>
>> I was looking today at the walsender code and realized that it's because
>> if the network and downstream are fast enough, we'll always take fast
>> path in WalSndWriteData which does not do reply or keepalive processing
>> and is only reached once the transaction has finished by other code. So
>> paradoxically we die of timeout because everything was fast enough to
>> never fall back to slow code path.
>>
>> I propose we only use fast path if the last processed reply is not older
>> than half of walsender timeout, if it is then we'll force the slow code
>> path to process the replies again. This is similar logic that we use to
>> determine if to send keepalive message. I also added CHECK_INTERRUPRS
>> call to fast code path because otherwise walsender might ignore them for
>> too long on large transactions.
>>
>> Thoughts?
> 
> +    TimestampTz    now = GetCurrentTimestamp();
> 
> I think it is not recommended to read the current time too
> frequently, especially within a loop that hates slowness. (I
> suppose that a loop that can fill up a send queue falls into that

Yeah that was my main worry for the patch as well, although given that
the loop does tuple processing it might not be very noticeable.

> category.)  If you don't mind a certain amount of additional
> complexity for eliminating the possible slowdown by the check,
> timeout would be usable. Attached patch does almost the same
> thing with your patch but without busy time check.
> 
> What do you think about this?
> 

I think we could do it that way.

> # I saw that SIGQUIT doens't work for active publisher, which I
> # think mention in another thread.

Ah missed that email I guess, but that's the missing CHECK_INTERRUPTS();
in the fast-path which btw your updated patch is missing as well.

--  Petr Jelinek                  http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training &
Services



pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: [HACKERS] [COMMITTERS] Re: pgsql: Code review focused on new node types added by partitioning supp
Next
From: Robert Haas
Date:
Subject: Re: [HACKERS] [PATCH] relocation truncated to fit: citus buildfailure on s390x