Thread: [HACKERS] Walsender timeouts and large transactions

[HACKERS] Walsender timeouts and large transactions

From
Petr Jelinek
Date:
Hi,

We have had issue with walsender timeout when used with logical decoding
and the transaction is taking long time to be decoded (because it
contains many changes)

I was looking today at the walsender code and realized that it's because
if the network and downstream are fast enough, we'll always take fast
path in WalSndWriteData which does not do reply or keepalive processing
and is only reached once the transaction has finished by other code. So
paradoxically we die of timeout because everything was fast enough to
never fall back to slow code path.

I propose we only use fast path if the last processed reply is not older
than half of walsender timeout, if it is then we'll force the slow code
path to process the replies again. This is similar logic that we use to
determine if to send keepalive message. I also added CHECK_INTERRUPRS
call to fast code path because otherwise walsender might ignore them for
too long on large transactions.

Thoughts?

-- 
  Petr Jelinek                  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Attachment

Re: [HACKERS] Walsender timeouts and large transactions

From
Kyotaro HORIGUCHI
Date:
At Thu, 25 May 2017 17:52:50 +0200, Petr Jelinek <petr.jelinek@2ndquadrant.com> wrote in
<e082a56a-fd95-a250-3bae-0fff93832510@2ndquadrant.com>
> Hi,
> 
> We have had issue with walsender timeout when used with logical decoding
> and the transaction is taking long time to be decoded (because it
> contains many changes)
> 
> I was looking today at the walsender code and realized that it's because
> if the network and downstream are fast enough, we'll always take fast
> path in WalSndWriteData which does not do reply or keepalive processing
> and is only reached once the transaction has finished by other code. So
> paradoxically we die of timeout because everything was fast enough to
> never fall back to slow code path.
> 
> I propose we only use fast path if the last processed reply is not older
> than half of walsender timeout, if it is then we'll force the slow code
> path to process the replies again. This is similar logic that we use to
> determine if to send keepalive message. I also added CHECK_INTERRUPRS
> call to fast code path because otherwise walsender might ignore them for
> too long on large transactions.
> 
> Thoughts?

+    TimestampTz    now = GetCurrentTimestamp();

I think it is not recommended to read the current time too
frequently, especially within a loop that hates slowness. (I
suppose that a loop that can fill up a send queue falls into that
category.)  If you don't mind a certain amount of additional
complexity for eliminating the possible slowdown by the check,
timeout would be usable. Attached patch does almost the same
thing with your patch but without busy time check.

What do you think about this?

# I saw that SIGQUIT doens't work for active publisher, which I
# think mention in another thread.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
index 49cce38..ec33357 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -194,6 +194,15 @@ static volatile sig_atomic_t replication_active = false;static LogicalDecodingContext
*logical_decoding_ctx= NULL;static XLogRecPtr logical_startptr = InvalidXLogRecPtr;
 
+/*
+ * For logical replication, WalSndWriteData needs to process replies from the
+ * client to check if keepalive to be sent.  WAL send loop may skip check
+ * replies only while check_replies_needed = false.
+ */
+#define REPLY_SEND_TIMEOUT USER_TIMEOUT
+static bool    keepalive_timeout_initialized = false;
+static bool    check_replies_needed = false;
+/* A sample associating a WAL location with the time it was written. */typedef struct{
@@ -1175,12 +1184,12 @@ WalSndWriteData(LogicalDecodingContext *ctx, XLogRecPtr lsn, TransactionId xid,
memcpy(&ctx->out->data[1+ sizeof(int64) + sizeof(int64)],           tmpbuf.data, sizeof(int64));
 
-    /* fast path */    /* Try to flush pending output to the client */    if (pq_flush_if_writable() != 0)
WalSndShutdown();
-    if (!pq_is_send_pending())
+     /* fast path: return immediately if possible */
+    if (!check_replies_needed && !pq_is_send_pending())        return;    for (;;)
@@ -1216,10 +1225,6 @@ WalSndWriteData(LogicalDecodingContext *ctx, XLogRecPtr lsn, TransactionId xid,        if
(pq_flush_if_writable()!= 0)            WalSndShutdown();
 
-        /* If we finished clearing the buffered data, we're done here. */
-        if (!pq_is_send_pending())
-            break;
-        now = GetCurrentTimestamp();        /* die if timeout was reached */
@@ -1228,6 +1233,10 @@ WalSndWriteData(LogicalDecodingContext *ctx, XLogRecPtr lsn, TransactionId xid,        /* Send
keepaliveif the time has come */        WalSndKeepaliveIfNecessary(now);
 
+        /* If we finished clearing the buffered data, we're done here. */
+        if (!pq_is_send_pending())
+            break;
+        sleeptime = WalSndComputeSleeptime(now);        wakeEvents = WL_LATCH_SET | WL_POSTMASTER_DEATH |
@@ -1562,6 +1571,13 @@ exec_replication_command(const char *cmd_string)    return true;}
+static void
+LogicalDecodeReplyTimeoutHandler(void)
+{
+    check_replies_needed = true;
+}
+
+/* * Process any incoming messages while streaming. Also checks if the remote * end has closed the connection.
@@ -1662,6 +1678,22 @@ ProcessRepliesIfAny(void)    {        last_reply_timestamp = GetCurrentTimestamp();
waiting_for_ping_response= false;
 
+
+        if (wal_sender_timeout > 0)
+        {
+            if (!keepalive_timeout_initialized)
+            {
+                RegisterTimeout(REPLY_SEND_TIMEOUT,
+                                LogicalDecodeReplyTimeoutHandler);
+                keepalive_timeout_initialized = true;
+            }
+
+            check_replies_needed = false;
+            enable_timeout_at(REPLY_SEND_TIMEOUT,
+                          TimestampTzPlusMilliseconds(last_reply_timestamp,
+                                                      wal_sender_timeout / 2));
+        }
+            }}

Re: [HACKERS] Walsender timeouts and large transactions

From
Petr Jelinek
Date:
On 30/05/17 11:02, Kyotaro HORIGUCHI wrote:
> At Thu, 25 May 2017 17:52:50 +0200, Petr Jelinek <petr.jelinek@2ndquadrant.com> wrote in
<e082a56a-fd95-a250-3bae-0fff93832510@2ndquadrant.com>
>> Hi,
>>
>> We have had issue with walsender timeout when used with logical decoding
>> and the transaction is taking long time to be decoded (because it
>> contains many changes)
>>
>> I was looking today at the walsender code and realized that it's because
>> if the network and downstream are fast enough, we'll always take fast
>> path in WalSndWriteData which does not do reply or keepalive processing
>> and is only reached once the transaction has finished by other code. So
>> paradoxically we die of timeout because everything was fast enough to
>> never fall back to slow code path.
>>
>> I propose we only use fast path if the last processed reply is not older
>> than half of walsender timeout, if it is then we'll force the slow code
>> path to process the replies again. This is similar logic that we use to
>> determine if to send keepalive message. I also added CHECK_INTERRUPRS
>> call to fast code path because otherwise walsender might ignore them for
>> too long on large transactions.
>>
>> Thoughts?
> 
> +    TimestampTz    now = GetCurrentTimestamp();
> 
> I think it is not recommended to read the current time too
> frequently, especially within a loop that hates slowness. (I
> suppose that a loop that can fill up a send queue falls into that

Yeah that was my main worry for the patch as well, although given that
the loop does tuple processing it might not be very noticeable.

> category.)  If you don't mind a certain amount of additional
> complexity for eliminating the possible slowdown by the check,
> timeout would be usable. Attached patch does almost the same
> thing with your patch but without busy time check.
> 
> What do you think about this?
> 

I think we could do it that way.

> # I saw that SIGQUIT doens't work for active publisher, which I
> # think mention in another thread.

Ah missed that email I guess, but that's the missing CHECK_INTERRUPTS();
in the fast-path which btw your updated patch is missing as well.

--  Petr Jelinek                  http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training &
Services



Re: [HACKERS] Walsender timeouts and large transactions

From
Petr Jelinek
Date:
On 30/05/17 15:44, Petr Jelinek wrote:
> On 30/05/17 11:02, Kyotaro HORIGUCHI wrote:
>>
>> +    TimestampTz    now = GetCurrentTimestamp();
>>
>> I think it is not recommended to read the current time too
>> frequently, especially within a loop that hates slowness. (I
>> suppose that a loop that can fill up a send queue falls into that
> 
> Yeah that was my main worry for the patch as well, although given that
> the loop does tuple processing it might not be very noticeable.
> 

I realized we actually call GetCurrentTimestamp() there anyway (for the
pq_sendint64). So I just modified the patch to use the now variable
there instead which means there are no additional GetCurrentTimestamp()
calls compared to state before patch now.

-- 
  Petr Jelinek                  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Attachment

Re: [HACKERS] Walsender timeouts and large transactions

From
Yura Sokolov
Date:
The following review has been posted through the commitfest application:
make installcheck-world:  tested, passed
Implements feature:       not tested
Spec compliant:           not tested
Documentation:            not tested

There is no check for (last_reply_timestamp <= 0 || wal_sender_timeout <= 0) as in other places
(in WalSndKeepaliveIfNecessary for example).

I don't think moving update of 'now' down to end of loop body is correct:
there are calls to ProcessConfigFile with SyncRepInitConfig, ProcessRepliesIfAny that can
last non-negligible time. It could lead to over sleeping due to larger computed sleeptime.
Though I could be mistaken.

I'm not sure about moving `if (!pg_is_send_pending())` in a body loop after WalSndKeepaliveIfNecessary.
Is it necessary? But it looks harmless at least.

Could patch be reduced to check after first `if (!pg_is_sendpending())` ? like:
if (!pq_is_send_pending())
-        return;
+    {
+        if (last_reply_timestamp <= 0 || wal_sender_timeout <= 0)
+        {
+            CHECK_FOR_INTERRUPTS();
+            return;
+        }
+        if (now <= TimestampTzPlusMilliseconds(last_reply_timestamp, wal_sender_timeout / 2))
+            return;
+    }

If not, what problem prevents?

The new status of this patch is: Waiting on Author

Re: [HACKERS] Walsender timeouts and large transactions

From
Petr Jelinek
Date:
On 02/08/17 19:35, Yura Sokolov wrote:
> The following review has been posted through the commitfest application:
> make installcheck-world:  tested, passed
> Implements feature:       not tested
> Spec compliant:           not tested
> Documentation:            not tested
> 
> There is no check for (last_reply_timestamp <= 0 || wal_sender_timeout <= 0) as in other places
> (in WalSndKeepaliveIfNecessary for example).
> 
> I don't think moving update of 'now' down to end of loop body is correct:
> there are calls to ProcessConfigFile with SyncRepInitConfig, ProcessRepliesIfAny that can
> last non-negligible time. It could lead to over sleeping due to larger computed sleeptime.
> Though I could be mistaken.
> 
> I'm not sure about moving `if (!pg_is_send_pending())` in a body loop after WalSndKeepaliveIfNecessary.
> Is it necessary? But it looks harmless at least.
> 

We also need to do actual timeout handing so that the timeout is not
deferred to the end of the transaction (Which is why I moved `if
(!pg_is_send_pending())` under WalSndCheckTimeOut() and
WalSndKeepaliveIfNecessary() calls).

> Could patch be reduced to check after first `if (!pg_is_sendpending())` ? like:
> 
>     if (!pq_is_send_pending())
> -        return;
> +    {
> +        if (last_reply_timestamp <= 0 || wal_sender_timeout <= 0)
> +        {
> +            CHECK_FOR_INTERRUPTS();
> +            return;
> +        }
> +        if (now <= TimestampTzPlusMilliseconds(last_reply_timestamp, wal_sender_timeout / 2))
> +            return;
> +    }
> 
> If not, what problem prevents?

We should do CHECK_FOR_INTERRUPTS() independently of pq_is_send_pending
so that it's possible to stop walsender while it's processing large
transaction.

--  Petr Jelinek                  http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training &
Services



Re: [HACKERS] Walsender timeouts and large transactions

From
Craig Ringer
Date:
On 9 August 2017 at 21:23, Petr Jelinek <petr.jelinek@2ndquadrant.com> wrote:
On 02/08/17 19:35, Yura Sokolov wrote:
> The following review has been posted through the commitfest application:
> make installcheck-world:  tested, passed
> Implements feature:       not tested
> Spec compliant:           not tested
> Documentation:            not tested
>
> There is no check for (last_reply_timestamp <= 0 || wal_sender_timeout <= 0) as in other places
> (in WalSndKeepaliveIfNecessary for example).
>
> I don't think moving update of 'now' down to end of loop body is correct:
> there are calls to ProcessConfigFile with SyncRepInitConfig, ProcessRepliesIfAny that can
> last non-negligible time. It could lead to over sleeping due to larger computed sleeptime.
> Though I could be mistaken.
>
> I'm not sure about moving `if (!pg_is_send_pending())` in a body loop after WalSndKeepaliveIfNecessary.
> Is it necessary? But it looks harmless at least.
>

We also need to do actual timeout handing so that the timeout is not
deferred to the end of the transaction (Which is why I moved `if
(!pg_is_send_pending())` under WalSndCheckTimeOut() and
WalSndKeepaliveIfNecessary() calls).

> Could patch be reduced to check after first `if (!pg_is_sendpending())` ? like:
>
>       if (!pq_is_send_pending())
> -             return;
> +     {
> +             if (last_reply_timestamp <= 0 || wal_sender_timeout <= 0)
> +             {
> +                     CHECK_FOR_INTERRUPTS();
> +                     return;
> +             }
> +             if (now <= TimestampTzPlusMilliseconds(last_reply_timestamp, wal_sender_timeout / 2))
> +                     return;
> +     }
>
> If not, what problem prevents?

We should do CHECK_FOR_INTERRUPTS() independently of pq_is_send_pending
so that it's possible to stop walsender while it's processing large
transaction.


Is there any chance of getting this bugfix into Pg 10?

We've just cut back branches, so it'd be a sensible time. 



--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: [HACKERS] Walsender timeouts and large transactions

From
Sokolov Yura
Date:
On 2017-08-09 16:23, Petr Jelinek wrote:
> On 02/08/17 19:35, Yura Sokolov wrote:
>> The following review has been posted through the commitfest 
>> application:
>> make installcheck-world:  tested, passed
>> Implements feature:       not tested
>> Spec compliant:           not tested
>> Documentation:            not tested
>> 
>> There is no check for (last_reply_timestamp <= 0 || wal_sender_timeout 
>> <= 0) as in other places
>> (in WalSndKeepaliveIfNecessary for example).
>> 
>> I don't think moving update of 'now' down to end of loop body is 
>> correct:
>> there are calls to ProcessConfigFile with SyncRepInitConfig, 
>> ProcessRepliesIfAny that can
>> last non-negligible time. It could lead to over sleeping due to larger 
>> computed sleeptime.
>> Though I could be mistaken.
>> 
>> I'm not sure about moving `if (!pg_is_send_pending())` in a body loop 
>> after WalSndKeepaliveIfNecessary.
>> Is it necessary? But it looks harmless at least.
>> 
> 
> We also need to do actual timeout handing so that the timeout is not
> deferred to the end of the transaction (Which is why I moved `if
> (!pg_is_send_pending())` under WalSndCheckTimeOut() and
> WalSndKeepaliveIfNecessary() calls).
> 

If standby really stalled, then it will not read from socket, and then
`pg_is_sendpending` eventually will return false, and timeout will be 
checked.
If standby doesn't stall, then `last_reply_timestamp` will be updated in
`ProcessRepliesIfAny`, and so timeout will not be triggered.
Do I understand correctly, or I missed something?

>> Could patch be reduced to check after first `if 
>> (!pg_is_sendpending())` ? like:
>> 
>>     if (!pq_is_send_pending())
>> -        return;
>> +    {
>> +        if (last_reply_timestamp <= 0 || wal_sender_timeout <= 0)
>> +        {
>> +            CHECK_FOR_INTERRUPTS();
>> +            return;
>> +        }
>> +        if (now <= TimestampTzPlusMilliseconds(last_reply_timestamp, 
>> wal_sender_timeout / 2))
>> +            return;
>> +    }
>> 
>> If not, what problem prevents?
> 
> We should do CHECK_FOR_INTERRUPTS() independently of pq_is_send_pending
> so that it's possible to stop walsender while it's processing large
> transaction.

In this case CHECK_FOR_INTERRUPTS will be called after 
wal_sender_timeout/2.
(This diff is for first appearance of `pq_is_send_pending` in a 
function).

If it should be called more often, then patch could be simplier:
     if (!pq_is_send_pending())
-        return;
+    {
+        CHECK_FOR_INTERRUPTS();
+        if (last_reply_timestamp <= 0 || wal_sender_timeout <= 0 ||
+                now <= TimestampTzPlusMilliseconds(last_reply_timestamp, 
wal_sender_timeout / 2))
+            return;
+    }

(Still, I could be mistaken, it is just suggestion).

Is it hard to add test for case this patch fixes?

With regards,
-- 
Sokolov Yura aka funny_falcon
Postgres Professional: https://postgrespro.ru
The Russian Postgres Company



Re: [HACKERS] Walsender timeouts and large transactions

From
Sokolov Yura
Date:
On 2017-05-25 17:52 Petr Jelinek wrote:
> Hi,
> 
> We have had issue with walsender timeout when used with logical
> decoding and the transaction is taking long time to be decoded
> (because it contains many changes)
> 
> I was looking today at the walsender code and realized that it's
> because if the network and downstream are fast enough, we'll always
> take fast path in WalSndWriteData which does not do reply or
> keepalive processing and is only reached once the transaction has
> finished by other code. So paradoxically we die of timeout because
> everything was fast enough to never fall back to slow code path.
> 
> I propose we only use fast path if the last processed reply is not
> older than half of walsender timeout, if it is then we'll force the
> slow code path to process the replies again. This is similar logic
> that we use to determine if to send keepalive message. I also added
> CHECK_INTERRUPRS call to fast code path because otherwise walsender
> might ignore them for too long on large transactions.
> 
> Thoughts?
> 

On 2017-08-10 14:20 Sokolov Yura wrote:
> On 2017-08-09 16:23, Petr Jelinek wrote:
> > On 02/08/17 19:35, Yura Sokolov wrote:  
> >> The following review has been posted through the commitfest 
> >> application:
> >> make installcheck-world:  tested, passed
> >> Implements feature:       not tested
> >> Spec compliant:           not tested
> >> Documentation:            not tested
> >> 
> >> There is no check for (last_reply_timestamp <= 0 ||
> >> wal_sender_timeout <= 0) as in other places
> >> (in WalSndKeepaliveIfNecessary for example).
> >> 
> >> I don't think moving update of 'now' down to end of loop body is 
> >> correct:
> >> there are calls to ProcessConfigFile with SyncRepInitConfig, 
> >> ProcessRepliesIfAny that can
> >> last non-negligible time. It could lead to over sleeping due to
> >> larger computed sleeptime.
> >> Though I could be mistaken.
> >> 
> >> I'm not sure about moving `if (!pg_is_send_pending())` in a body
> >> loop after WalSndKeepaliveIfNecessary.
> >> Is it necessary? But it looks harmless at least.
> >>   
> > 
> > We also need to do actual timeout handing so that the timeout is not
> > deferred to the end of the transaction (Which is why I moved `if
> > (!pg_is_send_pending())` under WalSndCheckTimeOut() and
> > WalSndKeepaliveIfNecessary() calls).
> >   
> 
> If standby really stalled, then it will not read from socket, and then
> `pg_is_sendpending` eventually will return false, and timeout will be 
> checked.
> If standby doesn't stall, then `last_reply_timestamp` will be updated
> in `ProcessRepliesIfAny`, and so timeout will not be triggered.
> Do I understand correctly, or I missed something?
> 
> >> Could patch be reduced to check after first `if 
> >> (!pg_is_sendpending())` ? like:
> >> 
> >>     if (!pq_is_send_pending())
> >> -        return;
> >> +    {
> >> +        if (last_reply_timestamp <= 0 ||
> >> wal_sender_timeout <= 0)
> >> +        {
> >> +            CHECK_FOR_INTERRUPTS();
> >> +            return;
> >> +        }
> >> +        if (now <=
> >> TimestampTzPlusMilliseconds(last_reply_timestamp,
> >> wal_sender_timeout / 2))
> >> +            return;
> >> +    }
> >> 
> >> If not, what problem prevents?  
> > 
> > We should do CHECK_FOR_INTERRUPTS() independently of
> > pq_is_send_pending so that it's possible to stop walsender while
> > it's processing large transaction.  
> 
> In this case CHECK_FOR_INTERRUPTS will be called after 
> wal_sender_timeout/2.
> (This diff is for first appearance of `pq_is_send_pending` in a 
> function).
> 
> If it should be called more often, then patch could be simplier:
> 
>       if (!pq_is_send_pending())
> -        return;
> +    {
> +        CHECK_FOR_INTERRUPTS();
> +        if (last_reply_timestamp <= 0 || wal_sender_timeout
> <= 0 ||
> +                now <=
> TimestampTzPlusMilliseconds(last_reply_timestamp,
> wal_sender_timeout / 2))
> +            return;
> +    }
> 
> (Still, I could be mistaken, it is just suggestion).
> 
> Is it hard to add test for case this patch fixes?
> 
> With regards,

Tom, Robert,

I believe this bug have to be fixed in Pg10, and I don't wonna
be that guy who prevents it from being fixed.
What should/could I do?
( https://commitfest.postgresql.org/14/1151/ )

--
With regards,
Sokolov Yura aka funny_falcon
Postgres Professional: https://postgrespro.ru
The Russian Postgres Company



Re: [HACKERS] Walsender timeouts and large transactions

From
Yura Sokolov
Date:
I've changed to "need review" to gain more attention from other.

Re: [HACKERS] Walsender timeouts and large transactions

From
Kyotaro HORIGUCHI
Date:
Hello,

At Wed, 06 Sep 2017 13:46:16 +0000, Yura Sokolov <funny.falcon@postgrespro.ru> wrote in
<20170906134616.18925.88390.pgcf@coridan.postgresql.org>
> I've changed to "need review" to gain more attention from other.

I understand that the problem here is too fast network prohibits
walsender from sending replies.

In physical replication, WAL records are sent as soon as written
and the timeout is handled in the topmost loop in WalSndLoop. In
logical replication, data is sent at once at commit time in most
cases. So it can take a long time in ReorderBufferCommit without
returning to WalSndLoop (or even XLogSendLogical).

One problem here is that WalSndWriteData waits for the arrival of
the next *WAL record* in the slow-ptah because it is called by
cues of ReorderBuffer* functions (mainly *Commit) irrelevantly to
WAL insertion. This is I think the root cause of this problem.

On the other hand, it ought to take a sleep when network is
stalled, in other words, data to send remains after a flush. We
don't have a means to signal when the socket queue gets a new
room for another bytes. However, I suppose that such slow network
allows us to sleep several or several tens of milliseconds. Or,
if we could know how many bytes ps_flush_if_writable() pushed,
it's enough to wait only when the function returns pushing
nothing.

As the result, I think that the functions should be modified as
the following.

- Forcing slow-path if time elapses a half of a ping period is right. (GetCurrentTimestamp is anyway requried.)

- The slow-path should not sleep waiting Latch. It should just pg_usleep() for maybe 1-10ms.

- We should go to the fast path just after keepalive or response message has been sent. In other words, the "if (now <"
blockshould be in the "for (;;)" loop. This avoids needless runs on the slow-path.
 


It would be refactorable as the following.
 prepare for the send buffer;
 for (;;) {   now = GetCurrentTimeStamp();   if (now < )...   {     fast-path   }   else   {     slow-path   }   return
iffinished   sleep for 1ms? }
 


What do you think about this?

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Walsender timeouts and large transactions

From
Sokolov Yura
Date:
On 2017-09-12 11:28, Kyotaro HORIGUCHI wrote:
> Hello,
> 
> At Wed, 06 Sep 2017 13:46:16 +0000, Yura Sokolov
> <funny.falcon@postgrespro.ru> wrote in
> <20170906134616.18925.88390.pgcf@coridan.postgresql.org>
>> I've changed to "need review" to gain more attention from other.
> 
> I understand that the problem here is too fast network prohibits
> walsender from sending replies.
> 
> In physical replication, WAL records are sent as soon as written
> and the timeout is handled in the topmost loop in WalSndLoop. In
> logical replication, data is sent at once at commit time in most
> cases. So it can take a long time in ReorderBufferCommit without
> returning to WalSndLoop (or even XLogSendLogical).
> 
> One problem here is that WalSndWriteData waits for the arrival of
> the next *WAL record* in the slow-ptah because it is called by
> cues of ReorderBuffer* functions (mainly *Commit) irrelevantly to
> WAL insertion. This is I think the root cause of this problem.
> 
> On the other hand, it ought to take a sleep when network is
> stalled, in other words, data to send remains after a flush. We
> don't have a means to signal when the socket queue gets a new
> room for another bytes. However, I suppose that such slow network
> allows us to sleep several or several tens of milliseconds. Or,
> if we could know how many bytes ps_flush_if_writable() pushed,
> it's enough to wait only when the function returns pushing
> nothing.
> 
> As the result, I think that the functions should be modified as
> the following.
> 
> - Forcing slow-path if time elapses a half of a ping period is
>   right. (GetCurrentTimestamp is anyway requried.)
> 
> - The slow-path should not sleep waiting Latch. It should just
>   pg_usleep() for maybe 1-10ms.
> 
> - We should go to the fast path just after keepalive or response
>   message has been sent. In other words, the "if (now <" block
>   should be in the "for (;;)" loop. This avoids needless runs on
>   the slow-path.
> 
> 
> It would be refactorable as the following.
> 
>   prepare for the send buffer;
> 
>   for (;;)
>   {
>     now = GetCurrentTimeStamp();
>     if (now < )...
>     {
>       fast-path
>     }
>     else
>     {
>       slow-path
>     }
>     return if finished
>     sleep for 1ms?
>   }
> 
> 
> What do you think about this?
> 
> regards,
> 
> --
> Kyotaro Horiguchi
> NTT Open Source Software Center

Good day, Petr, Kyotaro

I've created failing test for issue (0001-Add-failing-test...) .
It tests insertion of 20000 rows with 10ms wal_sender_timeout
(it fails in WalSndWriteData on master) and then deletion of
those rows with 1ms wal_sender_timeout (it fails in WalSndLoop).

Both Peter's patch and my simplified suggestion didn't pass the
test. I didn't checked Kyotaro's suggestion, though, cause I
didn't understand it well.

I've made patch that passes the test (0002-Fix-walsender...) .
(I've used Petr's commit message. Don't you mind, Petr?)

In WalSndWriteData it adds CHECK_FOR_INTERRUPTS to fastpath and
falls through to slow path after half of wal_sender_timeout as
were discussed.
In a slow path, it just skips fast exit on `!pq_is_send_pending()`
and check for timeout for the first loop iteration. And it sets
sleeptime to 1ms even if timeout were reached. It gives a chance
to receiver's response to arrive.

In WalSndLoop it also skips check for timeout first iteration after
send_data were called, and also sleeps at least 1 ms.

I'm not sure about correctness of my patch. Given test exists,
you may suggest better solutions, or improve this solution.

I'll set commitfest topic status to 'Need review' assuming
my patch could be reviewed.

-- 
Sokolov Yura aka funny_falcon
Postgres Professional: https://postgrespro.ru
The Russian Postgres Company
-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Attachment

Re: [HACKERS] Walsender timeouts and large transactions

From
Kyotaro HORIGUCHI
Date:
Hello,

At Wed, 27 Sep 2017 14:28:37 +0300, Sokolov Yura <funny.falcon@postgrespro.ru> wrote in
<90bb67da7131e6186b50897c4b0f0ec3@postgrespro.ru>
> On 2017-09-12 11:28, Kyotaro HORIGUCHI wrote:
> > Hello,
> > At Wed, 06 Sep 2017 13:46:16 +0000, Yura Sokolov
> > <funny.falcon@postgrespro.ru> wrote in
> > <20170906134616.18925.88390.pgcf@coridan.postgresql.org>
> > As the result, I think that the functions should be modified as
> > the following.
> > - Forcing slow-path if time elapses a half of a ping period is
> >   right. (GetCurrentTimestamp is anyway requried.)
> > - The slow-path should not sleep waiting Latch. It should just
> >   pg_usleep() for maybe 1-10ms.
> > - We should go to the fast path just after keepalive or response
> >   message has been sent. In other words, the "if (now <" block
> >   should be in the "for (;;)" loop. This avoids needless runs on
> >   the slow-path.
> > It would be refactorable as the following.
> >   prepare for the send buffer;
> >   for (;;)
> >   {
> >     now = GetCurrentTimeStamp();
> >     if (now < )...
> >     {
> >       fast-path
> >     }
> >     else
> >     {
> >       slow-path
> >     }
> >     return if finished
> >     sleep for 1ms?
> >   }
> > What do you think about this?
> > regards,
> > --
> > Kyotaro Horiguchi
> > NTT Open Source Software Center
> 
> Good day, Petr, Kyotaro
> 
> I've created failing test for issue (0001-Add-failing-test...) .
> It tests insertion of 20000 rows with 10ms wal_sender_timeout
> (it fails in WalSndWriteData on master) and then deletion of
> those rows with 1ms wal_sender_timeout (it fails in WalSndLoop).
> 
> Both Peter's patch and my simplified suggestion didn't pass the
> test. I didn't checked Kyotaro's suggestion, though, cause I
> didn't understand it well.

Mmm. The test seems broken. wal_sender_timeout = 10ms with
wal_receiver_status_interval=10s immediately causes a
timeout. Avoiding the timeout is just breaking the sane code.

wal_sender_timeout = 3s and wal_receiver_status_interval=1s
effectively causes the problem with about 1000000 lines of (int)
insertion on UNIX socket connection, on my poor box.

The original complain here came from the fact that
WalSndWriteData skips processing of replies for a long time on a
fast network. However Petr's patch fixed the problem, I pointed
that just letting the function take the slow path leads to
another problem, that is, waiting for new WAL records can result
in a unwanted pause in the slow path.

Combining the solutions for the two problem is my proposal sited
above. The sentence seems in quite bad style but the attached
file is the concorete patch of that.

Any thoughts?

regards,
-- 
Kyotaro Horiguchi
NTT Open Source Software Center
*** a/src/backend/replication/walsender.c
--- b/src/backend/replication/walsender.c
***************
*** 1151,1156 **** static void
--- 1151,1158 ---- WalSndWriteData(LogicalDecodingContext *ctx, XLogRecPtr lsn, TransactionId xid,                 bool
last_write){
 
+     TimestampTz now = GetCurrentTimestamp();
+      /* output previously gathered data in a CopyData packet */     pq_putmessage_noblock('d', ctx->out->data,
ctx->out->len);
 
***************
*** 1160,1235 **** WalSndWriteData(LogicalDecodingContext *ctx, XLogRecPtr lsn, TransactionId xid,      * several
releasesby streaming physical replication.      */     resetStringInfo(&tmpbuf);
 
!     pq_sendint64(&tmpbuf, GetCurrentTimestamp());     memcpy(&ctx->out->data[1 + sizeof(int64) + sizeof(int64)],
     tmpbuf.data, sizeof(int64)); 
 
-     /* fast path */
-     /* Try to flush pending output to the client */
-     if (pq_flush_if_writable() != 0)
-         WalSndShutdown();
- 
-     if (!pq_is_send_pending())
-         return;
-      for (;;)     {         int            wakeEvents;         long        sleeptime;         TimestampTz now;
 /*
 
!          * Emergency bailout if postmaster has died.  This is to avoid the
!          * necessity for manual cleanup of all postmaster children.          */
!         if (!PostmasterIsAlive())
!             exit(1);
! 
!         /* Clear any already-pending wakeups */
!         ResetLatch(MyLatch);
! 
!         CHECK_FOR_INTERRUPTS();
! 
!         /* Process any requests or signals received recently */
!         if (ConfigReloadPending)
!         {
!             ConfigReloadPending = false;
!             ProcessConfigFile(PGC_SIGHUP);
!             SyncRepInitConfig();
!         }
! 
!         /* Check for input from the client */
!         ProcessRepliesIfAny();
! 
!         /* Try to flush pending output to the client */
!         if (pq_flush_if_writable() != 0)
!             WalSndShutdown();
! 
!         /* If we finished clearing the buffered data, we're done here. */
!         if (!pq_is_send_pending())
!             break;
!          now = GetCurrentTimestamp();
- 
-         /* die if timeout was reached */
-         WalSndCheckTimeOut(now);
- 
-         /* Send keepalive if the time has come */
-         WalSndKeepaliveIfNecessary(now);
- 
-         sleeptime = WalSndComputeSleeptime(now);
- 
-         wakeEvents = WL_LATCH_SET | WL_POSTMASTER_DEATH |
-             WL_SOCKET_WRITEABLE | WL_SOCKET_READABLE | WL_TIMEOUT;
- 
-         /* Sleep until something happens or we time out */
-         WaitLatchOrSocket(MyLatch, wakeEvents,
-                           MyProcPort->sock, sleeptime,
-                           WAIT_EVENT_WAL_SENDER_WRITE_DATA);     }
- 
-     /* reactivate latch so WalSndLoop knows to continue */
-     SetLatch(MyLatch); }  /*
--- 1162,1238 ----      * several releases by streaming physical replication.      */     resetStringInfo(&tmpbuf);
!     pq_sendint64(&tmpbuf, now);     memcpy(&ctx->out->data[1 + sizeof(int64) + sizeof(int64)],
tmpbuf.data,sizeof(int64));      for (;;)     {         int            wakeEvents;         long        sleeptime;
 TimestampTz now; 
 
+         if (now <= TimestampTzPlusMilliseconds(last_reply_timestamp,
+                                                wal_sender_timeout / 2))
+         {
+             /* fast path */
+             /* Try to flush pending output to the client */
+             if (pq_flush_if_writable() != 0)
+                 WalSndShutdown();
+         }
+         else
+         {
+             /*
+              * Emergency bailout if postmaster has died.  This is to avoid the
+              * necessity for manual cleanup of all postmaster children.
+              */
+             if (!PostmasterIsAlive())
+                 exit(1);
+ 
+             CHECK_FOR_INTERRUPTS();
+ 
+             /* Process any requests or signals received recently */
+             if (ConfigReloadPending)
+             {
+                 ConfigReloadPending = false;
+                 ProcessConfigFile(PGC_SIGHUP);
+                 SyncRepInitConfig();
+             }
+ 
+             /* Check for input from the client */
+             ProcessRepliesIfAny();
+ 
+             /* Try to flush pending output to the client */
+             if (pq_flush_if_writable() != 0)
+                 WalSndShutdown();
+ 
+             /* If we finished clearing the buffered data, we're done here. */
+             if (!pq_is_send_pending())
+                 break;
+ 
+             /* die if timeout was reached */
+             WalSndCheckTimeOut(now);
+ 
+             /* Send keepalive if the time has come */
+             WalSndKeepaliveIfNecessary(now);
+         }
+ 
+         /* return if finished */
+         if (!pq_is_send_pending())
+             return;
+          /*
!          * Send buffer is filled up. Being different from WalSndLoop, since
!          * this function is called at a commit time for all the modifications
!          * within the commit, we should continue sending data without waiting
!          * for the next WAL record.
!          *
!          * Not sure how long we should wait for a room to send more data, but
!          * 1ms would be sufficient.          */
!         pg_usleep(1000);         now = GetCurrentTimestamp();     } }  /*

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Walsender timeouts and large transactions

From
Sokolov Yura
Date:
Good day, Kyoutaro

On 2017-09-29 11:26, Kyotaro HORIGUCHI wrote:
> Hello,
> 
> At Wed, 27 Sep 2017 14:28:37 +0300, Sokolov Yura
> <funny.falcon@postgrespro.ru> wrote in
> <90bb67da7131e6186b50897c4b0f0ec3@postgrespro.ru>
>> On 2017-09-12 11:28, Kyotaro HORIGUCHI wrote:
>> > Hello,
>> > At Wed, 06 Sep 2017 13:46:16 +0000, Yura Sokolov
>> > <funny.falcon@postgrespro.ru> wrote in
>> > <20170906134616.18925.88390.pgcf@coridan.postgresql.org>
>> > As the result, I think that the functions should be modified as
>> > the following.
>> > - Forcing slow-path if time elapses a half of a ping period is
>> >   right. (GetCurrentTimestamp is anyway requried.)
>> > - The slow-path should not sleep waiting Latch. It should just
>> >   pg_usleep() for maybe 1-10ms.
>> > - We should go to the fast path just after keepalive or response
>> >   message has been sent. In other words, the "if (now <" block
>> >   should be in the "for (;;)" loop. This avoids needless runs on
>> >   the slow-path.
>> > It would be refactorable as the following.
>> >   prepare for the send buffer;
>> >   for (;;)
>> >   {
>> >     now = GetCurrentTimeStamp();
>> >     if (now < )...
>> >     {
>> >       fast-path
>> >     }
>> >     else
>> >     {
>> >       slow-path
>> >     }
>> >     return if finished
>> >     sleep for 1ms?
>> >   }
>> > What do you think about this?
>> > regards,
>> > --
>> > Kyotaro Horiguchi
>> > NTT Open Source Software Center
>> 
>> Good day, Petr, Kyotaro
>> 
>> I've created failing test for issue (0001-Add-failing-test...) .
>> It tests insertion of 20000 rows with 10ms wal_sender_timeout
>> (it fails in WalSndWriteData on master) and then deletion of
>> those rows with 1ms wal_sender_timeout (it fails in WalSndLoop).
>> 
>> Both Peter's patch and my simplified suggestion didn't pass the
>> test. I didn't checked Kyotaro's suggestion, though, cause I
>> didn't understand it well.
> 
> Mmm. The test seems broken. wal_sender_timeout = 10ms with
> wal_receiver_status_interval=10s immediately causes a
> timeout. Avoiding the timeout is just breaking the sane code.

I think you're not right. `wal_receiver_status_interval` is just
for status update, not for replies. Before I made my patch version,
I've added logging to every `now` and `last_reply_timestamp` during
test run. `last_reply_timestamp` definitely updates more often than
once in 10s. So, `wal_receiver_status_interval = 10s` has nothing
in common with receiver's replies, as I see.

(btw, logging slows down sender enough to "fix" test :-) )

And my patch doesn't avoid timeout check, so it doesn't break
anything. It just delays timeout on 1ms. Given, it is unpractical
to set wal_sender_timeout less than 50ms in production, this 1ms
increase in timeout check is negligible.

And I've checked just now that my patch passes test even with
wal_receiver_status_interval = 10s.

> 
> wal_sender_timeout = 3s and wal_receiver_status_interval=1s
> effectively causes the problem with about 1000000 lines of (int)
> insertion on UNIX socket connection, on my poor box.

I don't want to make test to lasts so long and generate so many data.
That is why I used such small timeouts for tests.

> The original complain here came from the fact that
> WalSndWriteData skips processing of replies for a long time on a
> fast network. However Petr's patch fixed the problem, I pointed
> that just letting the function take the slow path leads to
> another problem, that is, waiting for new WAL records can result
> in a unwanted pause in the slow path.
> 
> Combining the solutions for the two problem is my proposal sited
> above. The sentence seems in quite bad style but the attached
> file is the concorete patch of that.

Test is failing if there is "short quit" after `!pq_is_send_pending()`,
so I doubt your patch will pass the test.
And you've change calculated sleep time with sane waiting on all
insteresting events (using WaitLatchOrSocket) to semi-busy loop.
It at least could affect throughput.

And why did you remove `SetLatch(MyLatch)` in the end of function?
Probably this change is correct, but not obvious.

> Any thoughts?

It certainly could be my test and my patch is wrong. But my point
is that test should be written first. Especially for such difficult
case. Without test it is impossible to say does our patches fix
something. And it is impossible to say if patch does something
wrong. And impossible to say if patch fixes this problem but
introduce new problem.

Please, write test for your remarks. If you think, my patch breaks
something, write test for the case my patch did broke. If you think
my test is wrong, write your test that is more correct.

Without tests it will be just bird's hubbub.

> regards,

regards,
-- 
Sokolov Yura aka funny_falcon
Postgres Professional: https://postgrespro.ru
The Russian Postgres Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Walsender timeouts and large transactions

From
Kyotaro HORIGUCHI
Date:
Hello Sokolov.

At Fri, 29 Sep 2017 15:19:23 +0300, Sokolov Yura <funny.falcon@postgrespro.ru> wrote in
<d076dae18b437be89c787a854034f3f2@postgrespro.ru>
> I don't want to make test to lasts so long and generate so many data.
> That is why I used such small timeouts for tests.

I understand your point, but still *I* think such a short timeout
is out of expectation by design. (But it can be set.)

Does anyone have opinions on this?

> Test is failing if there is "short quit" after
> `!pq_is_send_pending()`,
> so I doubt your patch will pass the test.

It is because I think that the test "should" fail since the
timeout is out of expected range. I (and perhaps also Petr) is
thinking that the problem is just that a large transaction causes
a timeout with an ordinary timeout. My test case is based on the
assumption.

Your test is for a timeout during replication-startup with
extremely short timeout. This may be a different problem to
discuss, but perhaps better to be solved together.

I'd like to have opnions from others on this point.

> And you've change calculated sleep time with sane waiting on all
> insteresting events (using WaitLatchOrSocket) to semi-busy loop.
> It at least could affect throughput.

Uggh! I misunderstood there. It wais for writing socket so the
sleep is wrong and WaitLatchOrSocket is right.

After all, I put +1 for Petr's latest patch. Sorry for my
carelessness.

> And why did you remove `SetLatch(MyLatch)` in the end of function?
> Probably this change is correct, but not obvious.

Latch is needless there if it waited a fixed duration, but if it
waits writefd events there, also latch should be waited.


> > Any thoughts?
> 
> It certainly could be my test and my patch is wrong. But my point
> is that test should be written first. Especially for such difficult
> case. Without test it is impossible to say does our patches fix
> something. And it is impossible to say if patch does something
> wrong. And impossible to say if patch fixes this problem but
> introduce new problem.
> 
> Please, write test for your remarks. If you think, my patch breaks
> something, write test for the case my patch did broke. If you think
> my test is wrong, write your test that is more correct.
> 
> Without tests it will be just bird's hubbub.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Walsender timeouts and large transactions

From
Petr Jelinek
Date:
Hi,

sorry for the delay but I didn't have much time in past weeks to follow
this thread.

On 02/10/17 05:44, Kyotaro HORIGUCHI wrote:
> Hello Sokolov.
> 
> At Fri, 29 Sep 2017 15:19:23 +0300, Sokolov Yura <funny.falcon@postgrespro.ru> wrote in
<d076dae18b437be89c787a854034f3f2@postgrespro.ru>
>> I don't want to make test to lasts so long and generate so many data.
>> That is why I used such small timeouts for tests.
> 
> I understand your point, but still *I* think such a short timeout
> is out of expectation by design. (But it can be set.)
> 
> Does anyone have opinions on this?

Yes, it's not practically useful to have such small wal_sender_timeout
given that the main purpose of that is to detect network issues.

> 
>> Test is failing if there is "short quit" after
>> `!pq_is_send_pending()`,
>> so I doubt your patch will pass the test.
> 
> It is because I think that the test "should" fail since the
> timeout is out of expected range. I (and perhaps also Petr) is
> thinking that the problem is just that a large transaction causes
> a timeout with an ordinary timeout. My test case is based on the
> assumption.
> 
> Your test is for a timeout during replication-startup with
> extremely short timeout. This may be a different problem to
> discuss, but perhaps better to be solved together.
> 
> I'd like to have opnions from others on this point.

I think it's different problem and because of what I wrote above it does
not seem to me like something we should spend out time on trying to fix.
> 
> Uggh! I misunderstood there. It wais for writing socket so the
> sleep is wrong and WaitLatchOrSocket is right.
> 
> After all, I put +1 for Petr's latest patch. Sorry for my
> carelessness.
> 

Great, I attached version 3 which is just rebased on current master and
also does not move the GetCurrentTimestamp() call because the concern
about skewing the timestamp during config reload (and also network flush
as I realized later) is valid.

It's rather hard to test all the scenarios that this patch fixes in
automated fashion without generating a lot of wal or adding sleeps to
the processing. That's why I didn't produce usable TAP test.

Since it seems like some of my reasoning is unclear I will try to
describe it once more.

The main problem we have is that unless we call the
ProcessRepliesIfAny() before the wal_sender_timeout expires we'll die
because of timeout eventually. The current coding will skip that call if
there is a long transaction being processed (if network is fast enough).
This is what the first part (first 2 hunks) of the patch solves. There
is also issue that while this is happening the walsender ignores signals
so it's impossible to stop it which is why I added the
CHECK_FOR_INTERRUPTS to the fast path.

The second problem is that if we solved just the first one, then if
downstream (and again network) is fast enough, the ProcessRepliesIfAny()
will not do anything useful because downstream is not going to send any
response while the network buffer contains any data. This is caused by
the fact that we normally code the receiver side to receive until there
is something and only send reply when there is a "pause" in the stream.
To get around this problem we also need to make sure that we send
WalSndKeepaliveIfNecessary() periodically and that will not happen on
fast network unless we do the second part of the patch (3rd hunk), ie,
move the pq_is_send_pending() after the keepalive handling.

This code is specific to logical decoding walsender interface so it only
happens for logical decoding/replication (which means it should be
backported all the way to 9.4). The physical one

These two issues happen quite normally in the wild as all we need is big
data load in single transaction, or update of large part of an table or
something similar for this to happen with default settings.

-- 
  Petr Jelinek                  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Attachment

Re: [HACKERS] Walsender timeouts and large transactions

From
Robert Haas
Date:
On Wed, Nov 1, 2017 at 8:19 PM, Petr Jelinek
<petr.jelinek@2ndquadrant.com> wrote:
> sorry for the delay but I didn't have much time in past weeks to follow
> this thread.

+    TimestampTz now = GetCurrentTimestamp();
+    /* output previously gathered data in a CopyData packet */    pq_putmessage_noblock('d', ctx->out->data,
ctx->out->len);
    /*     * Fill the send timestamp last, so that it is taken as late as possible.     * This is somewhat ugly, but
theprotocol is set as it's already used for     * several releases by streaming physical replication.     */
resetStringInfo(&tmpbuf);
-    pq_sendint64(&tmpbuf, GetCurrentTimestamp());
+    pq_sendint64(&tmpbuf, now);    memcpy(&ctx->out->data[1 + sizeof(int64) + sizeof(int64)],           tmpbuf.data,
sizeof(int64));

This change falsifies the comments.  Maybe initialize now just after
resetSetringInfo() is done.

-    /* fast path */
-    /* Try to flush pending output to the client */
-    if (pq_flush_if_writable() != 0)
-        WalSndShutdown();
+    /* Try taking fast path unless we get too close to walsender timeout. */
+    if (now < TimestampTzPlusMilliseconds(last_reply_timestamp,
+                                          wal_sender_timeout / 2))
+    {
+        CHECK_FOR_INTERRUPTS();

-    if (!pq_is_send_pending())
-        return;
+        /* Try to flush pending output to the client */
+        if (pq_flush_if_writable() != 0)
+            WalSndShutdown();
+
+        if (!pq_is_send_pending())
+            return;
+    }

I think it's only the if (!pq_is_send_pending()) return; that needs to
be conditional here, isn't it?  The pq_flush_if_writable() can be done
unconditionally.

Other than that this looks like a reasonable change to me, but I'm not
an expert on this code.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Walsender timeouts and large transactions

From
Petr Jelinek
Date:
Hi,

thanks for checking.

On 02/11/17 10:00, Robert Haas wrote:
> On Wed, Nov 1, 2017 at 8:19 PM, Petr Jelinek
> <petr.jelinek@2ndquadrant.com> wrote:
>> sorry for the delay but I didn't have much time in past weeks to follow
>> this thread.
> 
> +    TimestampTz now = GetCurrentTimestamp();
> +
>      /* output previously gathered data in a CopyData packet */
>      pq_putmessage_noblock('d', ctx->out->data, ctx->out->len);
> 
>      /*
>       * Fill the send timestamp last, so that it is taken as late as possible.
>       * This is somewhat ugly, but the protocol is set as it's already used for
>       * several releases by streaming physical replication.
>       */
>      resetStringInfo(&tmpbuf);
> -    pq_sendint64(&tmpbuf, GetCurrentTimestamp());
> +    pq_sendint64(&tmpbuf, now);
>      memcpy(&ctx->out->data[1 + sizeof(int64) + sizeof(int64)],
>             tmpbuf.data, sizeof(int64));
> 
> This change falsifies the comments.  Maybe initialize now just after
> resetSetringInfo() is done.

Eh, right, I can do that.

> 
> -    /* fast path */
> -    /* Try to flush pending output to the client */
> -    if (pq_flush_if_writable() != 0)
> -        WalSndShutdown();
> +    /* Try taking fast path unless we get too close to walsender timeout. */
> +    if (now < TimestampTzPlusMilliseconds(last_reply_timestamp,
> +                                          wal_sender_timeout / 2))
> +    {
> +        CHECK_FOR_INTERRUPTS();
> 
> -    if (!pq_is_send_pending())
> -        return;
> +        /* Try to flush pending output to the client */
> +        if (pq_flush_if_writable() != 0)
> +            WalSndShutdown();
> +
> +        if (!pq_is_send_pending())
> +            return;
> +    }
> 
> I think it's only the if (!pq_is_send_pending()) return; that needs to
> be conditional here, isn't it?  The pq_flush_if_writable() can be done
> unconditionally.
> 

Well, even the CHECK_FOR_INTERRUPTS() can be called unconditionally yes.
It just seems like it's needless call as we'll call both in for loop
anyway if we take the "slow" path. I admit it's not exactly big win
though. If you think it would improve readability I can move it.

--  Petr Jelinek                  http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training &
Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Walsender timeouts and large transactions

From
Kyotaro HORIGUCHI
Date:
Hello,

At Fri, 3 Nov 2017 15:54:09 +0100, Petr Jelinek <petr.jelinek@2ndquadrant.com> wrote in
<e2939d26-f5cb-6581-0ca3-a1b0556ed729@2ndquadrant.com>
> Hi,
> 
> thanks for checking.
> 
> On 02/11/17 10:00, Robert Haas wrote:
> > On Wed, Nov 1, 2017 at 8:19 PM, Petr Jelinek
> > <petr.jelinek@2ndquadrant.com> wrote:
> >> sorry for the delay but I didn't have much time in past weeks to follow
> >> this thread.
> > 
> > +    TimestampTz now = GetCurrentTimestamp();
> > +
> >      /* output previously gathered data in a CopyData packet */
> >      pq_putmessage_noblock('d', ctx->out->data, ctx->out->len);
> > 
> >      /*
> >       * Fill the send timestamp last, so that it is taken as late as possible.
> >       * This is somewhat ugly, but the protocol is set as it's already used for
> >       * several releases by streaming physical replication.
> >       */
> >      resetStringInfo(&tmpbuf);
> > -    pq_sendint64(&tmpbuf, GetCurrentTimestamp());
> > +    pq_sendint64(&tmpbuf, now);
> >      memcpy(&ctx->out->data[1 + sizeof(int64) + sizeof(int64)],
> >             tmpbuf.data, sizeof(int64));
> > 
> > This change falsifies the comments.  Maybe initialize now just after
> > resetSetringInfo() is done.
> 
> Eh, right, I can do that.
> 
> > 
> > -    /* fast path */
> > -    /* Try to flush pending output to the client */
> > -    if (pq_flush_if_writable() != 0)
> > -        WalSndShutdown();
> > +    /* Try taking fast path unless we get too close to walsender timeout. */
> > +    if (now < TimestampTzPlusMilliseconds(last_reply_timestamp,
> > +                                          wal_sender_timeout / 2))
> > +    {
> > +        CHECK_FOR_INTERRUPTS();
> > 
> > -    if (!pq_is_send_pending())
> > -        return;
> > +        /* Try to flush pending output to the client */
> > +        if (pq_flush_if_writable() != 0)
> > +            WalSndShutdown();
> > +
> > +        if (!pq_is_send_pending())
> > +            return;
> > +    }
> > 
> > I think it's only the if (!pq_is_send_pending()) return; that needs to
> > be conditional here, isn't it?  The pq_flush_if_writable() can be done
> > unconditionally.
> > 
> 
> Well, even the CHECK_FOR_INTERRUPTS() can be called unconditionally yes.
> It just seems like it's needless call as we'll call both in for loop
> anyway if we take the "slow" path. I admit it's not exactly big win
> though. If you think it would improve readability I can move it.

I think this is the last message in this thread so I changed the
status of the CF entry to "Waiting for Author".

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



Re: [HACKERS] Walsender timeouts and large transactions

From
Kyotaro HORIGUCHI
Date:
Hello,

At Fri, 17 Nov 2017 13:24:08 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> wrote in
<20171117.132408.85564852.horiguchi.kyotaro@lab.ntt.co.jp>
> > Well, even the CHECK_FOR_INTERRUPTS() can be called unconditionally yes.
> > It just seems like it's needless call as we'll call both in for loop
> > anyway if we take the "slow" path. I admit it's not exactly big win
> > though. If you think it would improve readability I can move it.
> 
> I think this is the last message in this thread so I changed the
> status of the CF entry to "Waiting for Author".

Hmm. Somehow the last patch and Robert's comment * which is the
base of the patch * has been reached me in reverse order.

I found that the patch is the latest one and will look this
soon. Sorry for the ignorance.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



Re: [HACKERS] Walsender timeouts and large transactions

From
Kyotaro HORIGUCHI
Date:
Ouch.. I'd doubly mistaked.

> I found that the patch is the latest one and will look this
> soon. Sorry for the ignorance.

Thats...wrong. Sorry. There's no new patch since the Reboer's
comment.

I think this is just a bug fix and needs no more argument on its
functionality.  (and might ought to be backpatched?)

At Fri, 3 Nov 2017 15:54:09 +0100, Petr Jelinek <petr.jelinek@2ndquadrant.com> wrote in
<e2939d26-f5cb-6581-0ca3-a1b0556ed729@2ndquadrant.com>
> > This change falsifies the comments.  Maybe initialize now just after
> > resetSetringInfo() is done.
> 
> Eh, right, I can do that.

It is reasonable. (Or rewrite the comment?)

At Fri, 3 Nov 2017 15:54:09 +0100, Petr Jelinek <petr.jelinek@2ndquadrant.com> wrote in
<e2939d26-f5cb-6581-0ca3-a1b0556ed729@2ndquadrant.com>
> > 
> > -    /* fast path */
> > -    /* Try to flush pending output to the client */
> > -    if (pq_flush_if_writable() != 0)
> > -        WalSndShutdown();
> > +    /* Try taking fast path unless we get too close to walsender timeout. */
> > +    if (now < TimestampTzPlusMilliseconds(last_reply_timestamp,
> > +                                          wal_sender_timeout / 2))
> > +    {
> > +        CHECK_FOR_INTERRUPTS();
> > 
> > -    if (!pq_is_send_pending())
> > -        return;
> > +        /* Try to flush pending output to the client */
> > +        if (pq_flush_if_writable() != 0)
> > +            WalSndShutdown();
> > +
> > +        if (!pq_is_send_pending())
> > +            return;
> > +    }
> > 
> > I think it's only the if (!pq_is_send_pending()) return; that needs to
> > be conditional here, isn't it?  The pq_flush_if_writable() can be done
> > unconditionally.
> > 
> 
> Well, even the CHECK_FOR_INTERRUPTS() can be called unconditionally yes.
> It just seems like it's needless call as we'll call both in for loop
> anyway if we take the "slow" path. I admit it's not exactly big win
> though. If you think it would improve readability I can move it.

Moving around the code allow us to place ps_is_send_pending() in
the while condition, which seems to be more proper place to do
that. I haven't added test for this particular case.

I tested this that

- cleanly applies on the current master HEAD and passes make check and subscription test.

- walsender properly chooses the slow-path even if pq_is_send_pending() is always false. (happens on a fast enough
network)

- walsender waits properly waits on socket and process-reply time in WaitLatchOrSocket.

- walsender exits by timeout on network stall.

So, I think the patch is functionally perfect.

I'm a reviewer of this patch but I think I'm not allowed to mark
this "Ready for Commiter" since the last change is made by me.


regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
From 508c43ef43eebee7de89094658b74f9cef8e4342 Mon Sep 17 00:00:00 2001
From: Petr Jelinek <pjmodos@pjmodos.net>
Date: Tue, 12 Sep 2017 17:31:28 +0900
Subject: [PATCH] Fix walsender timeouts when decoding large transaction

The logical slots have fast code path for sending data in order to not
impose too high per message overhead. The fast path skips checks for
interrupts and timeouts. However, the existing coding failed to consider
the fact that transaction with large number of changes may take very long
to be processed and sent to the client. This causes walsender to ignore
interrupts for potentially long time and more importantly it will cause
walsender being killed due to timeout at the end of such transaction.

This commit changes the fast path to also check for interrupts and only
allows calling the fast path when last keeplaive check happened less
than half of walsender timeout ago, otherwise the slower code path will
be taken.
---src/backend/replication/walsender.c | 58 ++++++++++++++++++++-----------------1 file changed, 32 insertions(+), 26
deletions(-)

diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
index fa1db74..5a4c31f 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -1151,6 +1151,8 @@ static voidWalSndWriteData(LogicalDecodingContext *ctx, XLogRecPtr lsn, TransactionId xid,
       bool last_write){
 
+    TimestampTz    now;
+    /* output previously gathered data in a CopyData packet */    pq_putmessage_noblock('d', ctx->out->data,
ctx->out->len);
@@ -1160,23 +1162,46 @@ WalSndWriteData(LogicalDecodingContext *ctx, XLogRecPtr lsn, TransactionId xid,     * several
releasesby streaming physical replication.     */    resetStringInfo(&tmpbuf);
 
-    pq_sendint64(&tmpbuf, GetCurrentTimestamp());
+    now = GetCurrentTimestamp();
+    pq_sendint64(&tmpbuf, now);    memcpy(&ctx->out->data[1 + sizeof(int64) + sizeof(int64)],           tmpbuf.data,
sizeof(int64));
-    /* fast path */
+    CHECK_FOR_INTERRUPTS();
+    /* Try to flush pending output to the client */    if (pq_flush_if_writable() != 0)        WalSndShutdown();
-    if (!pq_is_send_pending())
-        return;
+    /* Try taking fast path unless we get too close to walsender timeout. */
+    if (now < TimestampTzPlusMilliseconds(last_reply_timestamp,
+                                          wal_sender_timeout / 2))
+    {
+        if (!pq_is_send_pending())
+            return;
+    }
-    for (;;)
+    /* If we have pending write here, go to slow path */
+    while (pq_is_send_pending())    {        int            wakeEvents;        long        sleeptime;
-        TimestampTz now;
+
+        /* die if timeout was reached */
+        WalSndCheckTimeOut(now);
+
+        /* Send keepalive if the time has come */
+        WalSndKeepaliveIfNecessary(now);
+
+        sleeptime = WalSndComputeSleeptime(now);
+
+        wakeEvents = WL_LATCH_SET | WL_POSTMASTER_DEATH |
+            WL_SOCKET_WRITEABLE | WL_SOCKET_READABLE | WL_TIMEOUT;
+
+        /* Sleep until something happens or we time out */
+        WaitLatchOrSocket(MyLatch, wakeEvents,
+                          MyProcPort->sock, sleeptime,
+                          WAIT_EVENT_WAL_SENDER_WRITE_DATA);        /*         * Emergency bailout if postmaster has
died. This is to avoid the
 
@@ -1205,27 +1230,8 @@ WalSndWriteData(LogicalDecodingContext *ctx, XLogRecPtr lsn, TransactionId xid,        if
(pq_flush_if_writable()!= 0)            WalSndShutdown();
 
-        /* If we finished clearing the buffered data, we're done here. */
-        if (!pq_is_send_pending())
-            break;
-
+        /* Update the timestamp for the next trial */        now = GetCurrentTimestamp();
-
-        /* die if timeout was reached */
-        WalSndCheckTimeOut(now);
-
-        /* Send keepalive if the time has come */
-        WalSndKeepaliveIfNecessary(now);
-
-        sleeptime = WalSndComputeSleeptime(now);
-
-        wakeEvents = WL_LATCH_SET | WL_POSTMASTER_DEATH |
-            WL_SOCKET_WRITEABLE | WL_SOCKET_READABLE | WL_TIMEOUT;
-
-        /* Sleep until something happens or we time out */
-        WaitLatchOrSocket(MyLatch, wakeEvents,
-                          MyProcPort->sock, sleeptime,
-                          WAIT_EVENT_WAL_SENDER_WRITE_DATA);    }    /* reactivate latch so WalSndLoop knows to
continue*/
 
-- 
2.9.2


Re: [HACKERS] Walsender timeouts and large transactions

From
Michael Paquier
Date:
On Fri, Nov 17, 2017 at 4:35 PM, Kyotaro HORIGUCHI
<horiguchi.kyotaro@lab.ntt.co.jp> wrote:
> I'm a reviewer of this patch but I think I'm not allowed to mark
> this "Ready for Commiter" since the last change is made by me.

Yes, it is a better idea to wait for reviews here.
-- 
Michael


Re: [HACKERS] Walsender timeouts and large transactions

From
Petr Jelinek
Date:
Hi,

On 17/11/17 08:35, Kyotaro HORIGUCHI wrote:
> 
> Moving around the code allow us to place ps_is_send_pending() in
> the while condition, which seems to be more proper place to do
> that. I haven't added test for this particular case.
> 
> I tested this that
> 
> - cleanly applies on the current master HEAD and passes make
>   check and subscription test.
> 
> - walsender properly chooses the slow-path even if
>   pq_is_send_pending() is always false. (happens on a fast enough
>   network)
> 
> - walsender waits properly waits on socket and process-reply time
>   in WaitLatchOrSocket.
> 
> - walsender exits by timeout on network stall.
> 
> So, I think the patch is functionally perfect.
> 
> I'm a reviewer of this patch but I think I'm not allowed to mark
> this "Ready for Commiter" since the last change is made by me.
> 

Thanks for working on this, but there are couple of problems with your
modifications which mean that it does not actually fix the original
issue anymore (transaction taking long time to decode while sending
changes over network works fine will result in walsender timout).

The firs one is that you put pq_is_send_pending() in the while so the
while is again never executed if there is no network send pending which
makes the if above meaningless. Also you missed ProcessRepliesIfAny()
when moving code around. That's needed for timeout calculations to work
correctly.

So one more revision attached with those things fixed. This version
fixes the original issue as well.

-- 
  Petr Jelinek                  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services

Attachment

Re: [HACKERS] Walsender timeouts and large transactions

From
Craig Ringer
Date:
On 29 November 2017 at 23:59, Petr Jelinek wrote: > Hi, > > On 17/11/17 08:35, Kyotaro HORIGUCHI wrote: > > > > Moving around the code allow us to place ps_is_send_pending() in > > the while condition, which seems to be more proper place to do > > that. I haven't added test for this particular case. > > > > I tested this that > > > > - cleanly applies on the current master HEAD and passes make > > check and subscription test. > > > > - walsender properly chooses the slow-path even if > > pq_is_send_pending() is always false. (happens on a fast enough > > network) > > > > - walsender waits properly waits on socket and process-reply time > > in WaitLatchOrSocket. > > > > - walsender exits by timeout on network stall. > > > > So, I think the patch is functionally perfect. > > > > I'm a reviewer of this patch but I think I'm not allowed to mark > > this "Ready for Commiter" since the last change is made by me. > > > > Thanks for working on this, but there are couple of problems with your > modifications which mean that it does not actually fix the original > issue anymore (transaction taking long time to decode while sending > changes over network works fine will result in walsender timout). > > The firs one is that you put pq_is_send_pending() in the while so the > while is again never executed if there is no network send pending which > makes the if above meaningless. Also you missed ProcessRepliesIfAny() > when moving code around. That's needed for timeout calculations to work > correctly. > > So one more revision attached with those things fixed. This version > fixes the original issue as well. > > I'm happy with what I see here. Commit message needs tweaking, but any committer would do that anyway. To me it looks like it's time to get this committed, marking as such. -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services

Re: [HACKERS] Walsender timeouts and large transactions

From
Robert Haas
Date:
On Mon, Dec 4, 2017 at 10:59 PM, Craig Ringer <craig@2ndquadrant.com> wrote:
> To me it looks like it's time to get this committed, marking as such.

This version has noticeably more code rearrangement than before, and
I'm not sure that is actually buying us anything.  Why not keep the
changes minimal?

Also, TBH, this doesn't seem to have been carefully reviewed for style:

-    if (!pq_is_send_pending())
-        return;
+    /* Try taking fast path unless we get too close to walsender timeout. */
+    if (now < TimestampTzPlusMilliseconds(last_reply_timestamp,
+                                          wal_sender_timeout / 2))
+    {
+        if (!pq_is_send_pending())
+            return;
+    }

Generally we write if (a && b) { ... } not if (a) { if (b) .. }

-    }
+    };

It's hard to understand how this got through review.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: [HACKERS] Walsender timeouts and large transactions

From
Craig Ringer
Date:
On 6 December 2017 at 04:07, Robert Haas <robertmhaas@gmail.com> wrote:
On Mon, Dec 4, 2017 at 10:59 PM, Craig Ringer <craig@2ndquadrant.com> wrote:
> To me it looks like it's time to get this committed, marking as such.

This version has noticeably more code rearrangement than before, and
I'm not sure that is actually buying us anything.  Why not keep the
changes minimal?

Also, TBH, this doesn't seem to have been carefully reviewed for style:

-    if (!pq_is_send_pending())
-        return;
+    /* Try taking fast path unless we get too close to walsender timeout. */
+    if (now < TimestampTzPlusMilliseconds(last_reply_timestamp,
+                                          wal_sender_timeout / 2))
+    {
+        if (!pq_is_send_pending())
+            return;
+    }

Generally we write if (a && b) { ... } not if (a) { if (b) .. }

-    }
+    };

It's hard to understand how this got through review.
 

Entirely my fault - I tend to forget to look at code style when I'm focused on functionality. My apologies, and thanks for the reminder.

--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: [HACKERS] Walsender timeouts and large transactions

From
Petr Jelinek
Date:
On 05/12/17 21:07, Robert Haas wrote:
> On Mon, Dec 4, 2017 at 10:59 PM, Craig Ringer <craig@2ndquadrant.com> wrote:
>> To me it looks like it's time to get this committed, marking as such.
> 
> This version has noticeably more code rearrangement than before, and
> I'm not sure that is actually buying us anything.  Why not keep the
> changes minimal?
> 

Yeah we moved things around in the end, the main reason would be that it
actually works closer to how it was originally designed to work. Meaning
that the slow path is not so slow when !pq_is_send_pending() which seems
to have been the reasoning for original coding.

It's not completely necessary to do it for fixing the bug, but why make
things slower than they need to be.

> Also, TBH, this doesn't seem to have been carefully reviewed for style:
> 
> -    if (!pq_is_send_pending())
> -        return;
> +    /* Try taking fast path unless we get too close to walsender timeout. */
> +    if (now < TimestampTzPlusMilliseconds(last_reply_timestamp,
> +                                          wal_sender_timeout / 2))
> +    {
> +        if (!pq_is_send_pending())
> +            return;
> +    }
> 
> Generally we write if (a && b) { ... } not if (a) { if (b) .. }
> 

It's rather ugly with && because one of the conditions is two line, but
okay here you go. I am keeping the brackets even if normally don't for
one-liners because it's completely unreadable without them IMHO.

> -    }
> +    };
> 

Oops.

-- 
  Petr Jelinek                  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services

Attachment

Re: [HACKERS] Walsender timeouts and large transactions

From
Craig Ringer
Date:
On 7 December 2017 at 01:22, Petr Jelinek <petr.jelinek@2ndquadrant.com> wrote:
On 05/12/17 21:07, Robert Haas wrote:

> Generally we write if (a && b) { ... } not if (a) { if (b) .. }
>

It's rather ugly with && because one of the conditions is two line, but
okay here you go. I am keeping the brackets even if normally don't for
one-liners because it's completely unreadable without them IMHO.


 
Yeah, that's why I passed on that FWIW. Sometimes breaking up a condition is nice. Personally I intensely dislike the convention of 


if (big_condition
    && big_condition)
   one_linerdo_something;


as awfully unreadable, but I guess code convention means you live with things you don't like.
 

Anyway, I've just hit this bug in the wild for the umpteenth time this year, and I'd like to know what I can do to help progress it to commit+backport.

--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: [HACKERS] Walsender timeouts and large transactions

From
Andrew Dunstan
Date:

On 12/14/2017 01:46 AM, Craig Ringer wrote:
> On 7 December 2017 at 01:22, Petr Jelinek
> <petr.jelinek@2ndquadrant.com <mailto:petr.jelinek@2ndquadrant.com>>
> wrote:
>
>     On 05/12/17 21:07, Robert Haas wrote:
>     > 
>     > Generally we write if (a && b) { ... } not if (a) { if (b) .. }
>     >
>
>     It's rather ugly with && because one of the conditions is two
>     line, but
>     okay here you go. I am keeping the brackets even if normally don't for
>     one-liners because it's completely unreadable without them IMHO.
>
>
>  
> Yeah, that's why I passed on that FWIW. Sometimes breaking up a
> condition is nice. Personally I intensely dislike the convention of 
>
>
> if (big_condition
>     && big_condition)
>    one_linerdo_something;
>
>
> as awfully unreadable, but I guess code convention means you live with
> things you don't like.
>  
>
> Anyway, I've just hit this bug in the wild for the umpteenth time this
> year, and I'd like to know what I can do to help progress it to
> commit+backport.
>
>


Ask and ye shall receive. I've just committed it.

cheers

andrew

-- 
Andrew Dunstan                https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] Walsender timeouts and large transactions

From
Craig Ringer
Date:
On 15 December 2017 at 00:36, Andrew Dunstan <andrew.dunstan@2ndquadrant.com> wrote:


On 12/14/2017 01:46 AM, Craig Ringer wrote:
> On 7 December 2017 at 01:22, Petr Jelinek
> <petr.jelinek@2ndquadrant.com <mailto:petr.jelinek@2ndquadrant.com>>
> wrote:
>
>     On 05/12/17 21:07, Robert Haas wrote:
>     > 
>     > Generally we write if (a && b) { ... } not if (a) { if (b) .. }
>     >
>
>     It's rather ugly with && because one of the conditions is two
>     line, but
>     okay here you go. I am keeping the brackets even if normally don't for
>     one-liners because it's completely unreadable without them IMHO.
>
>
>  
> Yeah, that's why I passed on that FWIW. Sometimes breaking up a
> condition is nice. Personally I intensely dislike the convention of 
>
>
> if (big_condition
>     && big_condition)
>    one_linerdo_something;
>
>
> as awfully unreadable, but I guess code convention means you live with
> things you don't like.
>  
>
> Anyway, I've just hit this bug in the wild for the umpteenth time this
> year, and I'd like to know what I can do to help progress it to
> commit+backport.
>
>


Ask and ye shall receive. I've just committed it.


Brilliant, thanks. Backpatched too, great.

Now I'm going to shamelessly point you at the other nasty recurring logical decoding bug while you're fresh from thinking about replication. I can hope, right ;)

https://commitfest.postgresql.org/16/1397/ causes errors or bad data to be decoded, so it's a serious bug.

I'll see if I can rustle up some review attention first though.


--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services