Thread: [HACKERS] Walsender timeouts and large transactions
Hi, We have had issue with walsender timeout when used with logical decoding and the transaction is taking long time to be decoded (because it contains many changes) I was looking today at the walsender code and realized that it's because if the network and downstream are fast enough, we'll always take fast path in WalSndWriteData which does not do reply or keepalive processing and is only reached once the transaction has finished by other code. So paradoxically we die of timeout because everything was fast enough to never fall back to slow code path. I propose we only use fast path if the last processed reply is not older than half of walsender timeout, if it is then we'll force the slow code path to process the replies again. This is similar logic that we use to determine if to send keepalive message. I also added CHECK_INTERRUPRS call to fast code path because otherwise walsender might ignore them for too long on large transactions. Thoughts? -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Attachment
At Thu, 25 May 2017 17:52:50 +0200, Petr Jelinek <petr.jelinek@2ndquadrant.com> wrote in <e082a56a-fd95-a250-3bae-0fff93832510@2ndquadrant.com> > Hi, > > We have had issue with walsender timeout when used with logical decoding > and the transaction is taking long time to be decoded (because it > contains many changes) > > I was looking today at the walsender code and realized that it's because > if the network and downstream are fast enough, we'll always take fast > path in WalSndWriteData which does not do reply or keepalive processing > and is only reached once the transaction has finished by other code. So > paradoxically we die of timeout because everything was fast enough to > never fall back to slow code path. > > I propose we only use fast path if the last processed reply is not older > than half of walsender timeout, if it is then we'll force the slow code > path to process the replies again. This is similar logic that we use to > determine if to send keepalive message. I also added CHECK_INTERRUPRS > call to fast code path because otherwise walsender might ignore them for > too long on large transactions. > > Thoughts? + TimestampTz now = GetCurrentTimestamp(); I think it is not recommended to read the current time too frequently, especially within a loop that hates slowness. (I suppose that a loop that can fill up a send queue falls into that category.) If you don't mind a certain amount of additional complexity for eliminating the possible slowdown by the check, timeout would be usable. Attached patch does almost the same thing with your patch but without busy time check. What do you think about this? # I saw that SIGQUIT doens't work for active publisher, which I # think mention in another thread. regards, -- Kyotaro Horiguchi NTT Open Source Software Center diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c index 49cce38..ec33357 100644 --- a/src/backend/replication/walsender.c +++ b/src/backend/replication/walsender.c @@ -194,6 +194,15 @@ static volatile sig_atomic_t replication_active = false;static LogicalDecodingContext *logical_decoding_ctx= NULL;static XLogRecPtr logical_startptr = InvalidXLogRecPtr; +/* + * For logical replication, WalSndWriteData needs to process replies from the + * client to check if keepalive to be sent. WAL send loop may skip check + * replies only while check_replies_needed = false. + */ +#define REPLY_SEND_TIMEOUT USER_TIMEOUT +static bool keepalive_timeout_initialized = false; +static bool check_replies_needed = false; +/* A sample associating a WAL location with the time it was written. */typedef struct{ @@ -1175,12 +1184,12 @@ WalSndWriteData(LogicalDecodingContext *ctx, XLogRecPtr lsn, TransactionId xid, memcpy(&ctx->out->data[1+ sizeof(int64) + sizeof(int64)], tmpbuf.data, sizeof(int64)); - /* fast path */ /* Try to flush pending output to the client */ if (pq_flush_if_writable() != 0) WalSndShutdown(); - if (!pq_is_send_pending()) + /* fast path: return immediately if possible */ + if (!check_replies_needed && !pq_is_send_pending()) return; for (;;) @@ -1216,10 +1225,6 @@ WalSndWriteData(LogicalDecodingContext *ctx, XLogRecPtr lsn, TransactionId xid, if (pq_flush_if_writable()!= 0) WalSndShutdown(); - /* If we finished clearing the buffered data, we're done here. */ - if (!pq_is_send_pending()) - break; - now = GetCurrentTimestamp(); /* die if timeout was reached */ @@ -1228,6 +1233,10 @@ WalSndWriteData(LogicalDecodingContext *ctx, XLogRecPtr lsn, TransactionId xid, /* Send keepaliveif the time has come */ WalSndKeepaliveIfNecessary(now); + /* If we finished clearing the buffered data, we're done here. */ + if (!pq_is_send_pending()) + break; + sleeptime = WalSndComputeSleeptime(now); wakeEvents = WL_LATCH_SET | WL_POSTMASTER_DEATH | @@ -1562,6 +1571,13 @@ exec_replication_command(const char *cmd_string) return true;} +static void +LogicalDecodeReplyTimeoutHandler(void) +{ + check_replies_needed = true; +} + +/* * Process any incoming messages while streaming. Also checks if the remote * end has closed the connection. @@ -1662,6 +1678,22 @@ ProcessRepliesIfAny(void) { last_reply_timestamp = GetCurrentTimestamp(); waiting_for_ping_response= false; + + if (wal_sender_timeout > 0) + { + if (!keepalive_timeout_initialized) + { + RegisterTimeout(REPLY_SEND_TIMEOUT, + LogicalDecodeReplyTimeoutHandler); + keepalive_timeout_initialized = true; + } + + check_replies_needed = false; + enable_timeout_at(REPLY_SEND_TIMEOUT, + TimestampTzPlusMilliseconds(last_reply_timestamp, + wal_sender_timeout / 2)); + } + }}
On 30/05/17 11:02, Kyotaro HORIGUCHI wrote: > At Thu, 25 May 2017 17:52:50 +0200, Petr Jelinek <petr.jelinek@2ndquadrant.com> wrote in <e082a56a-fd95-a250-3bae-0fff93832510@2ndquadrant.com> >> Hi, >> >> We have had issue with walsender timeout when used with logical decoding >> and the transaction is taking long time to be decoded (because it >> contains many changes) >> >> I was looking today at the walsender code and realized that it's because >> if the network and downstream are fast enough, we'll always take fast >> path in WalSndWriteData which does not do reply or keepalive processing >> and is only reached once the transaction has finished by other code. So >> paradoxically we die of timeout because everything was fast enough to >> never fall back to slow code path. >> >> I propose we only use fast path if the last processed reply is not older >> than half of walsender timeout, if it is then we'll force the slow code >> path to process the replies again. This is similar logic that we use to >> determine if to send keepalive message. I also added CHECK_INTERRUPRS >> call to fast code path because otherwise walsender might ignore them for >> too long on large transactions. >> >> Thoughts? > > + TimestampTz now = GetCurrentTimestamp(); > > I think it is not recommended to read the current time too > frequently, especially within a loop that hates slowness. (I > suppose that a loop that can fill up a send queue falls into that Yeah that was my main worry for the patch as well, although given that the loop does tuple processing it might not be very noticeable. > category.) If you don't mind a certain amount of additional > complexity for eliminating the possible slowdown by the check, > timeout would be usable. Attached patch does almost the same > thing with your patch but without busy time check. > > What do you think about this? > I think we could do it that way. > # I saw that SIGQUIT doens't work for active publisher, which I > # think mention in another thread. Ah missed that email I guess, but that's the missing CHECK_INTERRUPTS(); in the fast-path which btw your updated patch is missing as well. -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On 30/05/17 15:44, Petr Jelinek wrote: > On 30/05/17 11:02, Kyotaro HORIGUCHI wrote: >> >> + TimestampTz now = GetCurrentTimestamp(); >> >> I think it is not recommended to read the current time too >> frequently, especially within a loop that hates slowness. (I >> suppose that a loop that can fill up a send queue falls into that > > Yeah that was my main worry for the patch as well, although given that > the loop does tuple processing it might not be very noticeable. > I realized we actually call GetCurrentTimestamp() there anyway (for the pq_sendint64). So I just modified the patch to use the now variable there instead which means there are no additional GetCurrentTimestamp() calls compared to state before patch now. -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Attachment
The following review has been posted through the commitfest application: make installcheck-world: tested, passed Implements feature: not tested Spec compliant: not tested Documentation: not tested There is no check for (last_reply_timestamp <= 0 || wal_sender_timeout <= 0) as in other places (in WalSndKeepaliveIfNecessary for example). I don't think moving update of 'now' down to end of loop body is correct: there are calls to ProcessConfigFile with SyncRepInitConfig, ProcessRepliesIfAny that can last non-negligible time. It could lead to over sleeping due to larger computed sleeptime. Though I could be mistaken. I'm not sure about moving `if (!pg_is_send_pending())` in a body loop after WalSndKeepaliveIfNecessary. Is it necessary? But it looks harmless at least. Could patch be reduced to check after first `if (!pg_is_sendpending())` ? like: if (!pq_is_send_pending()) - return; + { + if (last_reply_timestamp <= 0 || wal_sender_timeout <= 0) + { + CHECK_FOR_INTERRUPTS(); + return; + } + if (now <= TimestampTzPlusMilliseconds(last_reply_timestamp, wal_sender_timeout / 2)) + return; + } If not, what problem prevents? The new status of this patch is: Waiting on Author
On 02/08/17 19:35, Yura Sokolov wrote: > The following review has been posted through the commitfest application: > make installcheck-world: tested, passed > Implements feature: not tested > Spec compliant: not tested > Documentation: not tested > > There is no check for (last_reply_timestamp <= 0 || wal_sender_timeout <= 0) as in other places > (in WalSndKeepaliveIfNecessary for example). > > I don't think moving update of 'now' down to end of loop body is correct: > there are calls to ProcessConfigFile with SyncRepInitConfig, ProcessRepliesIfAny that can > last non-negligible time. It could lead to over sleeping due to larger computed sleeptime. > Though I could be mistaken. > > I'm not sure about moving `if (!pg_is_send_pending())` in a body loop after WalSndKeepaliveIfNecessary. > Is it necessary? But it looks harmless at least. > We also need to do actual timeout handing so that the timeout is not deferred to the end of the transaction (Which is why I moved `if (!pg_is_send_pending())` under WalSndCheckTimeOut() and WalSndKeepaliveIfNecessary() calls). > Could patch be reduced to check after first `if (!pg_is_sendpending())` ? like: > > if (!pq_is_send_pending()) > - return; > + { > + if (last_reply_timestamp <= 0 || wal_sender_timeout <= 0) > + { > + CHECK_FOR_INTERRUPTS(); > + return; > + } > + if (now <= TimestampTzPlusMilliseconds(last_reply_timestamp, wal_sender_timeout / 2)) > + return; > + } > > If not, what problem prevents? We should do CHECK_FOR_INTERRUPTS() independently of pq_is_send_pending so that it's possible to stop walsender while it's processing large transaction. -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On 9 August 2017 at 21:23, Petr Jelinek <petr.jelinek@2ndquadrant.com> wrote:
On 02/08/17 19:35, Yura Sokolov wrote:
> The following review has been posted through the commitfest application:
> make installcheck-world: tested, passed
> Implements feature: not tested
> Spec compliant: not tested
> Documentation: not tested
>
> There is no check for (last_reply_timestamp <= 0 || wal_sender_timeout <= 0) as in other places
> (in WalSndKeepaliveIfNecessary for example).
>
> I don't think moving update of 'now' down to end of loop body is correct:
> there are calls to ProcessConfigFile with SyncRepInitConfig, ProcessRepliesIfAny that can
> last non-negligible time. It could lead to over sleeping due to larger computed sleeptime.
> Though I could be mistaken.
>
> I'm not sure about moving `if (!pg_is_send_pending())` in a body loop after WalSndKeepaliveIfNecessary.
> Is it necessary? But it looks harmless at least.
>
We also need to do actual timeout handing so that the timeout is not
deferred to the end of the transaction (Which is why I moved `if
(!pg_is_send_pending())` under WalSndCheckTimeOut() and
WalSndKeepaliveIfNecessary() calls).
> Could patch be reduced to check after first `if (!pg_is_sendpending())` ? like:
>
> if (!pq_is_send_pending())
> - return;
> + {
> + if (last_reply_timestamp <= 0 || wal_sender_timeout <= 0)
> + {
> + CHECK_FOR_INTERRUPTS();
> + return;
> + }
> + if (now <= TimestampTzPlusMilliseconds(last_reply_timestamp, wal_sender_timeout / 2))
> + return;
> + }
>
> If not, what problem prevents?
We should do CHECK_FOR_INTERRUPTS() independently of pq_is_send_pending
so that it's possible to stop walsender while it's processing large
transaction.
Is there any chance of getting this bugfix into Pg 10?
We've just cut back branches, so it'd be a sensible time.
On 2017-08-09 16:23, Petr Jelinek wrote: > On 02/08/17 19:35, Yura Sokolov wrote: >> The following review has been posted through the commitfest >> application: >> make installcheck-world: tested, passed >> Implements feature: not tested >> Spec compliant: not tested >> Documentation: not tested >> >> There is no check for (last_reply_timestamp <= 0 || wal_sender_timeout >> <= 0) as in other places >> (in WalSndKeepaliveIfNecessary for example). >> >> I don't think moving update of 'now' down to end of loop body is >> correct: >> there are calls to ProcessConfigFile with SyncRepInitConfig, >> ProcessRepliesIfAny that can >> last non-negligible time. It could lead to over sleeping due to larger >> computed sleeptime. >> Though I could be mistaken. >> >> I'm not sure about moving `if (!pg_is_send_pending())` in a body loop >> after WalSndKeepaliveIfNecessary. >> Is it necessary? But it looks harmless at least. >> > > We also need to do actual timeout handing so that the timeout is not > deferred to the end of the transaction (Which is why I moved `if > (!pg_is_send_pending())` under WalSndCheckTimeOut() and > WalSndKeepaliveIfNecessary() calls). > If standby really stalled, then it will not read from socket, and then `pg_is_sendpending` eventually will return false, and timeout will be checked. If standby doesn't stall, then `last_reply_timestamp` will be updated in `ProcessRepliesIfAny`, and so timeout will not be triggered. Do I understand correctly, or I missed something? >> Could patch be reduced to check after first `if >> (!pg_is_sendpending())` ? like: >> >> if (!pq_is_send_pending()) >> - return; >> + { >> + if (last_reply_timestamp <= 0 || wal_sender_timeout <= 0) >> + { >> + CHECK_FOR_INTERRUPTS(); >> + return; >> + } >> + if (now <= TimestampTzPlusMilliseconds(last_reply_timestamp, >> wal_sender_timeout / 2)) >> + return; >> + } >> >> If not, what problem prevents? > > We should do CHECK_FOR_INTERRUPTS() independently of pq_is_send_pending > so that it's possible to stop walsender while it's processing large > transaction. In this case CHECK_FOR_INTERRUPTS will be called after wal_sender_timeout/2. (This diff is for first appearance of `pq_is_send_pending` in a function). If it should be called more often, then patch could be simplier: if (!pq_is_send_pending()) - return; + { + CHECK_FOR_INTERRUPTS(); + if (last_reply_timestamp <= 0 || wal_sender_timeout <= 0 || + now <= TimestampTzPlusMilliseconds(last_reply_timestamp, wal_sender_timeout / 2)) + return; + } (Still, I could be mistaken, it is just suggestion). Is it hard to add test for case this patch fixes? With regards, -- Sokolov Yura aka funny_falcon Postgres Professional: https://postgrespro.ru The Russian Postgres Company
On 2017-05-25 17:52 Petr Jelinek wrote: > Hi, > > We have had issue with walsender timeout when used with logical > decoding and the transaction is taking long time to be decoded > (because it contains many changes) > > I was looking today at the walsender code and realized that it's > because if the network and downstream are fast enough, we'll always > take fast path in WalSndWriteData which does not do reply or > keepalive processing and is only reached once the transaction has > finished by other code. So paradoxically we die of timeout because > everything was fast enough to never fall back to slow code path. > > I propose we only use fast path if the last processed reply is not > older than half of walsender timeout, if it is then we'll force the > slow code path to process the replies again. This is similar logic > that we use to determine if to send keepalive message. I also added > CHECK_INTERRUPRS call to fast code path because otherwise walsender > might ignore them for too long on large transactions. > > Thoughts? > On 2017-08-10 14:20 Sokolov Yura wrote: > On 2017-08-09 16:23, Petr Jelinek wrote: > > On 02/08/17 19:35, Yura Sokolov wrote: > >> The following review has been posted through the commitfest > >> application: > >> make installcheck-world: tested, passed > >> Implements feature: not tested > >> Spec compliant: not tested > >> Documentation: not tested > >> > >> There is no check for (last_reply_timestamp <= 0 || > >> wal_sender_timeout <= 0) as in other places > >> (in WalSndKeepaliveIfNecessary for example). > >> > >> I don't think moving update of 'now' down to end of loop body is > >> correct: > >> there are calls to ProcessConfigFile with SyncRepInitConfig, > >> ProcessRepliesIfAny that can > >> last non-negligible time. It could lead to over sleeping due to > >> larger computed sleeptime. > >> Though I could be mistaken. > >> > >> I'm not sure about moving `if (!pg_is_send_pending())` in a body > >> loop after WalSndKeepaliveIfNecessary. > >> Is it necessary? But it looks harmless at least. > >> > > > > We also need to do actual timeout handing so that the timeout is not > > deferred to the end of the transaction (Which is why I moved `if > > (!pg_is_send_pending())` under WalSndCheckTimeOut() and > > WalSndKeepaliveIfNecessary() calls). > > > > If standby really stalled, then it will not read from socket, and then > `pg_is_sendpending` eventually will return false, and timeout will be > checked. > If standby doesn't stall, then `last_reply_timestamp` will be updated > in `ProcessRepliesIfAny`, and so timeout will not be triggered. > Do I understand correctly, or I missed something? > > >> Could patch be reduced to check after first `if > >> (!pg_is_sendpending())` ? like: > >> > >> if (!pq_is_send_pending()) > >> - return; > >> + { > >> + if (last_reply_timestamp <= 0 || > >> wal_sender_timeout <= 0) > >> + { > >> + CHECK_FOR_INTERRUPTS(); > >> + return; > >> + } > >> + if (now <= > >> TimestampTzPlusMilliseconds(last_reply_timestamp, > >> wal_sender_timeout / 2)) > >> + return; > >> + } > >> > >> If not, what problem prevents? > > > > We should do CHECK_FOR_INTERRUPTS() independently of > > pq_is_send_pending so that it's possible to stop walsender while > > it's processing large transaction. > > In this case CHECK_FOR_INTERRUPTS will be called after > wal_sender_timeout/2. > (This diff is for first appearance of `pq_is_send_pending` in a > function). > > If it should be called more often, then patch could be simplier: > > if (!pq_is_send_pending()) > - return; > + { > + CHECK_FOR_INTERRUPTS(); > + if (last_reply_timestamp <= 0 || wal_sender_timeout > <= 0 || > + now <= > TimestampTzPlusMilliseconds(last_reply_timestamp, > wal_sender_timeout / 2)) > + return; > + } > > (Still, I could be mistaken, it is just suggestion). > > Is it hard to add test for case this patch fixes? > > With regards, Tom, Robert, I believe this bug have to be fixed in Pg10, and I don't wonna be that guy who prevents it from being fixed. What should/could I do? ( https://commitfest.postgresql.org/14/1151/ ) -- With regards, Sokolov Yura aka funny_falcon Postgres Professional: https://postgrespro.ru The Russian Postgres Company
I've changed to "need review" to gain more attention from other.
Hello, At Wed, 06 Sep 2017 13:46:16 +0000, Yura Sokolov <funny.falcon@postgrespro.ru> wrote in <20170906134616.18925.88390.pgcf@coridan.postgresql.org> > I've changed to "need review" to gain more attention from other. I understand that the problem here is too fast network prohibits walsender from sending replies. In physical replication, WAL records are sent as soon as written and the timeout is handled in the topmost loop in WalSndLoop. In logical replication, data is sent at once at commit time in most cases. So it can take a long time in ReorderBufferCommit without returning to WalSndLoop (or even XLogSendLogical). One problem here is that WalSndWriteData waits for the arrival of the next *WAL record* in the slow-ptah because it is called by cues of ReorderBuffer* functions (mainly *Commit) irrelevantly to WAL insertion. This is I think the root cause of this problem. On the other hand, it ought to take a sleep when network is stalled, in other words, data to send remains after a flush. We don't have a means to signal when the socket queue gets a new room for another bytes. However, I suppose that such slow network allows us to sleep several or several tens of milliseconds. Or, if we could know how many bytes ps_flush_if_writable() pushed, it's enough to wait only when the function returns pushing nothing. As the result, I think that the functions should be modified as the following. - Forcing slow-path if time elapses a half of a ping period is right. (GetCurrentTimestamp is anyway requried.) - The slow-path should not sleep waiting Latch. It should just pg_usleep() for maybe 1-10ms. - We should go to the fast path just after keepalive or response message has been sent. In other words, the "if (now <" blockshould be in the "for (;;)" loop. This avoids needless runs on the slow-path. It would be refactorable as the following. prepare for the send buffer; for (;;) { now = GetCurrentTimeStamp(); if (now < )... { fast-path } else { slow-path } return iffinished sleep for 1ms? } What do you think about this? regards, -- Kyotaro Horiguchi NTT Open Source Software Center -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
On 2017-09-12 11:28, Kyotaro HORIGUCHI wrote: > Hello, > > At Wed, 06 Sep 2017 13:46:16 +0000, Yura Sokolov > <funny.falcon@postgrespro.ru> wrote in > <20170906134616.18925.88390.pgcf@coridan.postgresql.org> >> I've changed to "need review" to gain more attention from other. > > I understand that the problem here is too fast network prohibits > walsender from sending replies. > > In physical replication, WAL records are sent as soon as written > and the timeout is handled in the topmost loop in WalSndLoop. In > logical replication, data is sent at once at commit time in most > cases. So it can take a long time in ReorderBufferCommit without > returning to WalSndLoop (or even XLogSendLogical). > > One problem here is that WalSndWriteData waits for the arrival of > the next *WAL record* in the slow-ptah because it is called by > cues of ReorderBuffer* functions (mainly *Commit) irrelevantly to > WAL insertion. This is I think the root cause of this problem. > > On the other hand, it ought to take a sleep when network is > stalled, in other words, data to send remains after a flush. We > don't have a means to signal when the socket queue gets a new > room for another bytes. However, I suppose that such slow network > allows us to sleep several or several tens of milliseconds. Or, > if we could know how many bytes ps_flush_if_writable() pushed, > it's enough to wait only when the function returns pushing > nothing. > > As the result, I think that the functions should be modified as > the following. > > - Forcing slow-path if time elapses a half of a ping period is > right. (GetCurrentTimestamp is anyway requried.) > > - The slow-path should not sleep waiting Latch. It should just > pg_usleep() for maybe 1-10ms. > > - We should go to the fast path just after keepalive or response > message has been sent. In other words, the "if (now <" block > should be in the "for (;;)" loop. This avoids needless runs on > the slow-path. > > > It would be refactorable as the following. > > prepare for the send buffer; > > for (;;) > { > now = GetCurrentTimeStamp(); > if (now < )... > { > fast-path > } > else > { > slow-path > } > return if finished > sleep for 1ms? > } > > > What do you think about this? > > regards, > > -- > Kyotaro Horiguchi > NTT Open Source Software Center Good day, Petr, Kyotaro I've created failing test for issue (0001-Add-failing-test...) . It tests insertion of 20000 rows with 10ms wal_sender_timeout (it fails in WalSndWriteData on master) and then deletion of those rows with 1ms wal_sender_timeout (it fails in WalSndLoop). Both Peter's patch and my simplified suggestion didn't pass the test. I didn't checked Kyotaro's suggestion, though, cause I didn't understand it well. I've made patch that passes the test (0002-Fix-walsender...) . (I've used Petr's commit message. Don't you mind, Petr?) In WalSndWriteData it adds CHECK_FOR_INTERRUPTS to fastpath and falls through to slow path after half of wal_sender_timeout as were discussed. In a slow path, it just skips fast exit on `!pq_is_send_pending()` and check for timeout for the first loop iteration. And it sets sleeptime to 1ms even if timeout were reached. It gives a chance to receiver's response to arrive. In WalSndLoop it also skips check for timeout first iteration after send_data were called, and also sleeps at least 1 ms. I'm not sure about correctness of my patch. Given test exists, you may suggest better solutions, or improve this solution. I'll set commitfest topic status to 'Need review' assuming my patch could be reviewed. -- Sokolov Yura aka funny_falcon Postgres Professional: https://postgrespro.ru The Russian Postgres Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Attachment
Hello, At Wed, 27 Sep 2017 14:28:37 +0300, Sokolov Yura <funny.falcon@postgrespro.ru> wrote in <90bb67da7131e6186b50897c4b0f0ec3@postgrespro.ru> > On 2017-09-12 11:28, Kyotaro HORIGUCHI wrote: > > Hello, > > At Wed, 06 Sep 2017 13:46:16 +0000, Yura Sokolov > > <funny.falcon@postgrespro.ru> wrote in > > <20170906134616.18925.88390.pgcf@coridan.postgresql.org> > > As the result, I think that the functions should be modified as > > the following. > > - Forcing slow-path if time elapses a half of a ping period is > > right. (GetCurrentTimestamp is anyway requried.) > > - The slow-path should not sleep waiting Latch. It should just > > pg_usleep() for maybe 1-10ms. > > - We should go to the fast path just after keepalive or response > > message has been sent. In other words, the "if (now <" block > > should be in the "for (;;)" loop. This avoids needless runs on > > the slow-path. > > It would be refactorable as the following. > > prepare for the send buffer; > > for (;;) > > { > > now = GetCurrentTimeStamp(); > > if (now < )... > > { > > fast-path > > } > > else > > { > > slow-path > > } > > return if finished > > sleep for 1ms? > > } > > What do you think about this? > > regards, > > -- > > Kyotaro Horiguchi > > NTT Open Source Software Center > > Good day, Petr, Kyotaro > > I've created failing test for issue (0001-Add-failing-test...) . > It tests insertion of 20000 rows with 10ms wal_sender_timeout > (it fails in WalSndWriteData on master) and then deletion of > those rows with 1ms wal_sender_timeout (it fails in WalSndLoop). > > Both Peter's patch and my simplified suggestion didn't pass the > test. I didn't checked Kyotaro's suggestion, though, cause I > didn't understand it well. Mmm. The test seems broken. wal_sender_timeout = 10ms with wal_receiver_status_interval=10s immediately causes a timeout. Avoiding the timeout is just breaking the sane code. wal_sender_timeout = 3s and wal_receiver_status_interval=1s effectively causes the problem with about 1000000 lines of (int) insertion on UNIX socket connection, on my poor box. The original complain here came from the fact that WalSndWriteData skips processing of replies for a long time on a fast network. However Petr's patch fixed the problem, I pointed that just letting the function take the slow path leads to another problem, that is, waiting for new WAL records can result in a unwanted pause in the slow path. Combining the solutions for the two problem is my proposal sited above. The sentence seems in quite bad style but the attached file is the concorete patch of that. Any thoughts? regards, -- Kyotaro Horiguchi NTT Open Source Software Center *** a/src/backend/replication/walsender.c --- b/src/backend/replication/walsender.c *************** *** 1151,1156 **** static void --- 1151,1158 ---- WalSndWriteData(LogicalDecodingContext *ctx, XLogRecPtr lsn, TransactionId xid, bool last_write){ + TimestampTz now = GetCurrentTimestamp(); + /* output previously gathered data in a CopyData packet */ pq_putmessage_noblock('d', ctx->out->data, ctx->out->len); *************** *** 1160,1235 **** WalSndWriteData(LogicalDecodingContext *ctx, XLogRecPtr lsn, TransactionId xid, * several releasesby streaming physical replication. */ resetStringInfo(&tmpbuf); ! pq_sendint64(&tmpbuf, GetCurrentTimestamp()); memcpy(&ctx->out->data[1 + sizeof(int64) + sizeof(int64)], tmpbuf.data, sizeof(int64)); - /* fast path */ - /* Try to flush pending output to the client */ - if (pq_flush_if_writable() != 0) - WalSndShutdown(); - - if (!pq_is_send_pending()) - return; - for (;;) { int wakeEvents; long sleeptime; TimestampTz now; /* ! * Emergency bailout if postmaster has died. This is to avoid the ! * necessity for manual cleanup of all postmaster children. */ ! if (!PostmasterIsAlive()) ! exit(1); ! ! /* Clear any already-pending wakeups */ ! ResetLatch(MyLatch); ! ! CHECK_FOR_INTERRUPTS(); ! ! /* Process any requests or signals received recently */ ! if (ConfigReloadPending) ! { ! ConfigReloadPending = false; ! ProcessConfigFile(PGC_SIGHUP); ! SyncRepInitConfig(); ! } ! ! /* Check for input from the client */ ! ProcessRepliesIfAny(); ! ! /* Try to flush pending output to the client */ ! if (pq_flush_if_writable() != 0) ! WalSndShutdown(); ! ! /* If we finished clearing the buffered data, we're done here. */ ! if (!pq_is_send_pending()) ! break; ! now = GetCurrentTimestamp(); - - /* die if timeout was reached */ - WalSndCheckTimeOut(now); - - /* Send keepalive if the time has come */ - WalSndKeepaliveIfNecessary(now); - - sleeptime = WalSndComputeSleeptime(now); - - wakeEvents = WL_LATCH_SET | WL_POSTMASTER_DEATH | - WL_SOCKET_WRITEABLE | WL_SOCKET_READABLE | WL_TIMEOUT; - - /* Sleep until something happens or we time out */ - WaitLatchOrSocket(MyLatch, wakeEvents, - MyProcPort->sock, sleeptime, - WAIT_EVENT_WAL_SENDER_WRITE_DATA); } - - /* reactivate latch so WalSndLoop knows to continue */ - SetLatch(MyLatch); } /* --- 1162,1238 ---- * several releases by streaming physical replication. */ resetStringInfo(&tmpbuf); ! pq_sendint64(&tmpbuf, now); memcpy(&ctx->out->data[1 + sizeof(int64) + sizeof(int64)], tmpbuf.data,sizeof(int64)); for (;;) { int wakeEvents; long sleeptime; TimestampTz now; + if (now <= TimestampTzPlusMilliseconds(last_reply_timestamp, + wal_sender_timeout / 2)) + { + /* fast path */ + /* Try to flush pending output to the client */ + if (pq_flush_if_writable() != 0) + WalSndShutdown(); + } + else + { + /* + * Emergency bailout if postmaster has died. This is to avoid the + * necessity for manual cleanup of all postmaster children. + */ + if (!PostmasterIsAlive()) + exit(1); + + CHECK_FOR_INTERRUPTS(); + + /* Process any requests or signals received recently */ + if (ConfigReloadPending) + { + ConfigReloadPending = false; + ProcessConfigFile(PGC_SIGHUP); + SyncRepInitConfig(); + } + + /* Check for input from the client */ + ProcessRepliesIfAny(); + + /* Try to flush pending output to the client */ + if (pq_flush_if_writable() != 0) + WalSndShutdown(); + + /* If we finished clearing the buffered data, we're done here. */ + if (!pq_is_send_pending()) + break; + + /* die if timeout was reached */ + WalSndCheckTimeOut(now); + + /* Send keepalive if the time has come */ + WalSndKeepaliveIfNecessary(now); + } + + /* return if finished */ + if (!pq_is_send_pending()) + return; + /* ! * Send buffer is filled up. Being different from WalSndLoop, since ! * this function is called at a commit time for all the modifications ! * within the commit, we should continue sending data without waiting ! * for the next WAL record. ! * ! * Not sure how long we should wait for a room to send more data, but ! * 1ms would be sufficient. */ ! pg_usleep(1000); now = GetCurrentTimestamp(); } } /* -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Good day, Kyoutaro On 2017-09-29 11:26, Kyotaro HORIGUCHI wrote: > Hello, > > At Wed, 27 Sep 2017 14:28:37 +0300, Sokolov Yura > <funny.falcon@postgrespro.ru> wrote in > <90bb67da7131e6186b50897c4b0f0ec3@postgrespro.ru> >> On 2017-09-12 11:28, Kyotaro HORIGUCHI wrote: >> > Hello, >> > At Wed, 06 Sep 2017 13:46:16 +0000, Yura Sokolov >> > <funny.falcon@postgrespro.ru> wrote in >> > <20170906134616.18925.88390.pgcf@coridan.postgresql.org> >> > As the result, I think that the functions should be modified as >> > the following. >> > - Forcing slow-path if time elapses a half of a ping period is >> > right. (GetCurrentTimestamp is anyway requried.) >> > - The slow-path should not sleep waiting Latch. It should just >> > pg_usleep() for maybe 1-10ms. >> > - We should go to the fast path just after keepalive or response >> > message has been sent. In other words, the "if (now <" block >> > should be in the "for (;;)" loop. This avoids needless runs on >> > the slow-path. >> > It would be refactorable as the following. >> > prepare for the send buffer; >> > for (;;) >> > { >> > now = GetCurrentTimeStamp(); >> > if (now < )... >> > { >> > fast-path >> > } >> > else >> > { >> > slow-path >> > } >> > return if finished >> > sleep for 1ms? >> > } >> > What do you think about this? >> > regards, >> > -- >> > Kyotaro Horiguchi >> > NTT Open Source Software Center >> >> Good day, Petr, Kyotaro >> >> I've created failing test for issue (0001-Add-failing-test...) . >> It tests insertion of 20000 rows with 10ms wal_sender_timeout >> (it fails in WalSndWriteData on master) and then deletion of >> those rows with 1ms wal_sender_timeout (it fails in WalSndLoop). >> >> Both Peter's patch and my simplified suggestion didn't pass the >> test. I didn't checked Kyotaro's suggestion, though, cause I >> didn't understand it well. > > Mmm. The test seems broken. wal_sender_timeout = 10ms with > wal_receiver_status_interval=10s immediately causes a > timeout. Avoiding the timeout is just breaking the sane code. I think you're not right. `wal_receiver_status_interval` is just for status update, not for replies. Before I made my patch version, I've added logging to every `now` and `last_reply_timestamp` during test run. `last_reply_timestamp` definitely updates more often than once in 10s. So, `wal_receiver_status_interval = 10s` has nothing in common with receiver's replies, as I see. (btw, logging slows down sender enough to "fix" test :-) ) And my patch doesn't avoid timeout check, so it doesn't break anything. It just delays timeout on 1ms. Given, it is unpractical to set wal_sender_timeout less than 50ms in production, this 1ms increase in timeout check is negligible. And I've checked just now that my patch passes test even with wal_receiver_status_interval = 10s. > > wal_sender_timeout = 3s and wal_receiver_status_interval=1s > effectively causes the problem with about 1000000 lines of (int) > insertion on UNIX socket connection, on my poor box. I don't want to make test to lasts so long and generate so many data. That is why I used such small timeouts for tests. > The original complain here came from the fact that > WalSndWriteData skips processing of replies for a long time on a > fast network. However Petr's patch fixed the problem, I pointed > that just letting the function take the slow path leads to > another problem, that is, waiting for new WAL records can result > in a unwanted pause in the slow path. > > Combining the solutions for the two problem is my proposal sited > above. The sentence seems in quite bad style but the attached > file is the concorete patch of that. Test is failing if there is "short quit" after `!pq_is_send_pending()`, so I doubt your patch will pass the test. And you've change calculated sleep time with sane waiting on all insteresting events (using WaitLatchOrSocket) to semi-busy loop. It at least could affect throughput. And why did you remove `SetLatch(MyLatch)` in the end of function? Probably this change is correct, but not obvious. > Any thoughts? It certainly could be my test and my patch is wrong. But my point is that test should be written first. Especially for such difficult case. Without test it is impossible to say does our patches fix something. And it is impossible to say if patch does something wrong. And impossible to say if patch fixes this problem but introduce new problem. Please, write test for your remarks. If you think, my patch breaks something, write test for the case my patch did broke. If you think my test is wrong, write your test that is more correct. Without tests it will be just bird's hubbub. > regards, regards, -- Sokolov Yura aka funny_falcon Postgres Professional: https://postgrespro.ru The Russian Postgres Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Hello Sokolov. At Fri, 29 Sep 2017 15:19:23 +0300, Sokolov Yura <funny.falcon@postgrespro.ru> wrote in <d076dae18b437be89c787a854034f3f2@postgrespro.ru> > I don't want to make test to lasts so long and generate so many data. > That is why I used such small timeouts for tests. I understand your point, but still *I* think such a short timeout is out of expectation by design. (But it can be set.) Does anyone have opinions on this? > Test is failing if there is "short quit" after > `!pq_is_send_pending()`, > so I doubt your patch will pass the test. It is because I think that the test "should" fail since the timeout is out of expected range. I (and perhaps also Petr) is thinking that the problem is just that a large transaction causes a timeout with an ordinary timeout. My test case is based on the assumption. Your test is for a timeout during replication-startup with extremely short timeout. This may be a different problem to discuss, but perhaps better to be solved together. I'd like to have opnions from others on this point. > And you've change calculated sleep time with sane waiting on all > insteresting events (using WaitLatchOrSocket) to semi-busy loop. > It at least could affect throughput. Uggh! I misunderstood there. It wais for writing socket so the sleep is wrong and WaitLatchOrSocket is right. After all, I put +1 for Petr's latest patch. Sorry for my carelessness. > And why did you remove `SetLatch(MyLatch)` in the end of function? > Probably this change is correct, but not obvious. Latch is needless there if it waited a fixed duration, but if it waits writefd events there, also latch should be waited. > > Any thoughts? > > It certainly could be my test and my patch is wrong. But my point > is that test should be written first. Especially for such difficult > case. Without test it is impossible to say does our patches fix > something. And it is impossible to say if patch does something > wrong. And impossible to say if patch fixes this problem but > introduce new problem. > > Please, write test for your remarks. If you think, my patch breaks > something, write test for the case my patch did broke. If you think > my test is wrong, write your test that is more correct. > > Without tests it will be just bird's hubbub. regards, -- Kyotaro Horiguchi NTT Open Source Software Center -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Hi, sorry for the delay but I didn't have much time in past weeks to follow this thread. On 02/10/17 05:44, Kyotaro HORIGUCHI wrote: > Hello Sokolov. > > At Fri, 29 Sep 2017 15:19:23 +0300, Sokolov Yura <funny.falcon@postgrespro.ru> wrote in <d076dae18b437be89c787a854034f3f2@postgrespro.ru> >> I don't want to make test to lasts so long and generate so many data. >> That is why I used such small timeouts for tests. > > I understand your point, but still *I* think such a short timeout > is out of expectation by design. (But it can be set.) > > Does anyone have opinions on this? Yes, it's not practically useful to have such small wal_sender_timeout given that the main purpose of that is to detect network issues. > >> Test is failing if there is "short quit" after >> `!pq_is_send_pending()`, >> so I doubt your patch will pass the test. > > It is because I think that the test "should" fail since the > timeout is out of expected range. I (and perhaps also Petr) is > thinking that the problem is just that a large transaction causes > a timeout with an ordinary timeout. My test case is based on the > assumption. > > Your test is for a timeout during replication-startup with > extremely short timeout. This may be a different problem to > discuss, but perhaps better to be solved together. > > I'd like to have opnions from others on this point. I think it's different problem and because of what I wrote above it does not seem to me like something we should spend out time on trying to fix. > > Uggh! I misunderstood there. It wais for writing socket so the > sleep is wrong and WaitLatchOrSocket is right. > > After all, I put +1 for Petr's latest patch. Sorry for my > carelessness. > Great, I attached version 3 which is just rebased on current master and also does not move the GetCurrentTimestamp() call because the concern about skewing the timestamp during config reload (and also network flush as I realized later) is valid. It's rather hard to test all the scenarios that this patch fixes in automated fashion without generating a lot of wal or adding sleeps to the processing. That's why I didn't produce usable TAP test. Since it seems like some of my reasoning is unclear I will try to describe it once more. The main problem we have is that unless we call the ProcessRepliesIfAny() before the wal_sender_timeout expires we'll die because of timeout eventually. The current coding will skip that call if there is a long transaction being processed (if network is fast enough). This is what the first part (first 2 hunks) of the patch solves. There is also issue that while this is happening the walsender ignores signals so it's impossible to stop it which is why I added the CHECK_FOR_INTERRUPTS to the fast path. The second problem is that if we solved just the first one, then if downstream (and again network) is fast enough, the ProcessRepliesIfAny() will not do anything useful because downstream is not going to send any response while the network buffer contains any data. This is caused by the fact that we normally code the receiver side to receive until there is something and only send reply when there is a "pause" in the stream. To get around this problem we also need to make sure that we send WalSndKeepaliveIfNecessary() periodically and that will not happen on fast network unless we do the second part of the patch (3rd hunk), ie, move the pq_is_send_pending() after the keepalive handling. This code is specific to logical decoding walsender interface so it only happens for logical decoding/replication (which means it should be backported all the way to 9.4). The physical one These two issues happen quite normally in the wild as all we need is big data load in single transaction, or update of large part of an table or something similar for this to happen with default settings. -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Attachment
On Wed, Nov 1, 2017 at 8:19 PM, Petr Jelinek <petr.jelinek@2ndquadrant.com> wrote: > sorry for the delay but I didn't have much time in past weeks to follow > this thread. + TimestampTz now = GetCurrentTimestamp(); + /* output previously gathered data in a CopyData packet */ pq_putmessage_noblock('d', ctx->out->data, ctx->out->len); /* * Fill the send timestamp last, so that it is taken as late as possible. * This is somewhat ugly, but theprotocol is set as it's already used for * several releases by streaming physical replication. */ resetStringInfo(&tmpbuf); - pq_sendint64(&tmpbuf, GetCurrentTimestamp()); + pq_sendint64(&tmpbuf, now); memcpy(&ctx->out->data[1 + sizeof(int64) + sizeof(int64)], tmpbuf.data, sizeof(int64)); This change falsifies the comments. Maybe initialize now just after resetSetringInfo() is done. - /* fast path */ - /* Try to flush pending output to the client */ - if (pq_flush_if_writable() != 0) - WalSndShutdown(); + /* Try taking fast path unless we get too close to walsender timeout. */ + if (now < TimestampTzPlusMilliseconds(last_reply_timestamp, + wal_sender_timeout / 2)) + { + CHECK_FOR_INTERRUPTS(); - if (!pq_is_send_pending()) - return; + /* Try to flush pending output to the client */ + if (pq_flush_if_writable() != 0) + WalSndShutdown(); + + if (!pq_is_send_pending()) + return; + } I think it's only the if (!pq_is_send_pending()) return; that needs to be conditional here, isn't it? The pq_flush_if_writable() can be done unconditionally. Other than that this looks like a reasonable change to me, but I'm not an expert on this code. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Hi, thanks for checking. On 02/11/17 10:00, Robert Haas wrote: > On Wed, Nov 1, 2017 at 8:19 PM, Petr Jelinek > <petr.jelinek@2ndquadrant.com> wrote: >> sorry for the delay but I didn't have much time in past weeks to follow >> this thread. > > + TimestampTz now = GetCurrentTimestamp(); > + > /* output previously gathered data in a CopyData packet */ > pq_putmessage_noblock('d', ctx->out->data, ctx->out->len); > > /* > * Fill the send timestamp last, so that it is taken as late as possible. > * This is somewhat ugly, but the protocol is set as it's already used for > * several releases by streaming physical replication. > */ > resetStringInfo(&tmpbuf); > - pq_sendint64(&tmpbuf, GetCurrentTimestamp()); > + pq_sendint64(&tmpbuf, now); > memcpy(&ctx->out->data[1 + sizeof(int64) + sizeof(int64)], > tmpbuf.data, sizeof(int64)); > > This change falsifies the comments. Maybe initialize now just after > resetSetringInfo() is done. Eh, right, I can do that. > > - /* fast path */ > - /* Try to flush pending output to the client */ > - if (pq_flush_if_writable() != 0) > - WalSndShutdown(); > + /* Try taking fast path unless we get too close to walsender timeout. */ > + if (now < TimestampTzPlusMilliseconds(last_reply_timestamp, > + wal_sender_timeout / 2)) > + { > + CHECK_FOR_INTERRUPTS(); > > - if (!pq_is_send_pending()) > - return; > + /* Try to flush pending output to the client */ > + if (pq_flush_if_writable() != 0) > + WalSndShutdown(); > + > + if (!pq_is_send_pending()) > + return; > + } > > I think it's only the if (!pq_is_send_pending()) return; that needs to > be conditional here, isn't it? The pq_flush_if_writable() can be done > unconditionally. > Well, even the CHECK_FOR_INTERRUPTS() can be called unconditionally yes. It just seems like it's needless call as we'll call both in for loop anyway if we take the "slow" path. I admit it's not exactly big win though. If you think it would improve readability I can move it. -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Hello, At Fri, 3 Nov 2017 15:54:09 +0100, Petr Jelinek <petr.jelinek@2ndquadrant.com> wrote in <e2939d26-f5cb-6581-0ca3-a1b0556ed729@2ndquadrant.com> > Hi, > > thanks for checking. > > On 02/11/17 10:00, Robert Haas wrote: > > On Wed, Nov 1, 2017 at 8:19 PM, Petr Jelinek > > <petr.jelinek@2ndquadrant.com> wrote: > >> sorry for the delay but I didn't have much time in past weeks to follow > >> this thread. > > > > + TimestampTz now = GetCurrentTimestamp(); > > + > > /* output previously gathered data in a CopyData packet */ > > pq_putmessage_noblock('d', ctx->out->data, ctx->out->len); > > > > /* > > * Fill the send timestamp last, so that it is taken as late as possible. > > * This is somewhat ugly, but the protocol is set as it's already used for > > * several releases by streaming physical replication. > > */ > > resetStringInfo(&tmpbuf); > > - pq_sendint64(&tmpbuf, GetCurrentTimestamp()); > > + pq_sendint64(&tmpbuf, now); > > memcpy(&ctx->out->data[1 + sizeof(int64) + sizeof(int64)], > > tmpbuf.data, sizeof(int64)); > > > > This change falsifies the comments. Maybe initialize now just after > > resetSetringInfo() is done. > > Eh, right, I can do that. > > > > > - /* fast path */ > > - /* Try to flush pending output to the client */ > > - if (pq_flush_if_writable() != 0) > > - WalSndShutdown(); > > + /* Try taking fast path unless we get too close to walsender timeout. */ > > + if (now < TimestampTzPlusMilliseconds(last_reply_timestamp, > > + wal_sender_timeout / 2)) > > + { > > + CHECK_FOR_INTERRUPTS(); > > > > - if (!pq_is_send_pending()) > > - return; > > + /* Try to flush pending output to the client */ > > + if (pq_flush_if_writable() != 0) > > + WalSndShutdown(); > > + > > + if (!pq_is_send_pending()) > > + return; > > + } > > > > I think it's only the if (!pq_is_send_pending()) return; that needs to > > be conditional here, isn't it? The pq_flush_if_writable() can be done > > unconditionally. > > > > Well, even the CHECK_FOR_INTERRUPTS() can be called unconditionally yes. > It just seems like it's needless call as we'll call both in for loop > anyway if we take the "slow" path. I admit it's not exactly big win > though. If you think it would improve readability I can move it. I think this is the last message in this thread so I changed the status of the CF entry to "Waiting for Author". regards, -- Kyotaro Horiguchi NTT Open Source Software Center
Hello, At Fri, 17 Nov 2017 13:24:08 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> wrote in <20171117.132408.85564852.horiguchi.kyotaro@lab.ntt.co.jp> > > Well, even the CHECK_FOR_INTERRUPTS() can be called unconditionally yes. > > It just seems like it's needless call as we'll call both in for loop > > anyway if we take the "slow" path. I admit it's not exactly big win > > though. If you think it would improve readability I can move it. > > I think this is the last message in this thread so I changed the > status of the CF entry to "Waiting for Author". Hmm. Somehow the last patch and Robert's comment * which is the base of the patch * has been reached me in reverse order. I found that the patch is the latest one and will look this soon. Sorry for the ignorance. regards, -- Kyotaro Horiguchi NTT Open Source Software Center
Ouch.. I'd doubly mistaked. > I found that the patch is the latest one and will look this > soon. Sorry for the ignorance. Thats...wrong. Sorry. There's no new patch since the Reboer's comment. I think this is just a bug fix and needs no more argument on its functionality. (and might ought to be backpatched?) At Fri, 3 Nov 2017 15:54:09 +0100, Petr Jelinek <petr.jelinek@2ndquadrant.com> wrote in <e2939d26-f5cb-6581-0ca3-a1b0556ed729@2ndquadrant.com> > > This change falsifies the comments. Maybe initialize now just after > > resetSetringInfo() is done. > > Eh, right, I can do that. It is reasonable. (Or rewrite the comment?) At Fri, 3 Nov 2017 15:54:09 +0100, Petr Jelinek <petr.jelinek@2ndquadrant.com> wrote in <e2939d26-f5cb-6581-0ca3-a1b0556ed729@2ndquadrant.com> > > > > - /* fast path */ > > - /* Try to flush pending output to the client */ > > - if (pq_flush_if_writable() != 0) > > - WalSndShutdown(); > > + /* Try taking fast path unless we get too close to walsender timeout. */ > > + if (now < TimestampTzPlusMilliseconds(last_reply_timestamp, > > + wal_sender_timeout / 2)) > > + { > > + CHECK_FOR_INTERRUPTS(); > > > > - if (!pq_is_send_pending()) > > - return; > > + /* Try to flush pending output to the client */ > > + if (pq_flush_if_writable() != 0) > > + WalSndShutdown(); > > + > > + if (!pq_is_send_pending()) > > + return; > > + } > > > > I think it's only the if (!pq_is_send_pending()) return; that needs to > > be conditional here, isn't it? The pq_flush_if_writable() can be done > > unconditionally. > > > > Well, even the CHECK_FOR_INTERRUPTS() can be called unconditionally yes. > It just seems like it's needless call as we'll call both in for loop > anyway if we take the "slow" path. I admit it's not exactly big win > though. If you think it would improve readability I can move it. Moving around the code allow us to place ps_is_send_pending() in the while condition, which seems to be more proper place to do that. I haven't added test for this particular case. I tested this that - cleanly applies on the current master HEAD and passes make check and subscription test. - walsender properly chooses the slow-path even if pq_is_send_pending() is always false. (happens on a fast enough network) - walsender waits properly waits on socket and process-reply time in WaitLatchOrSocket. - walsender exits by timeout on network stall. So, I think the patch is functionally perfect. I'm a reviewer of this patch but I think I'm not allowed to mark this "Ready for Commiter" since the last change is made by me. regards, -- Kyotaro Horiguchi NTT Open Source Software Center From 508c43ef43eebee7de89094658b74f9cef8e4342 Mon Sep 17 00:00:00 2001 From: Petr Jelinek <pjmodos@pjmodos.net> Date: Tue, 12 Sep 2017 17:31:28 +0900 Subject: [PATCH] Fix walsender timeouts when decoding large transaction The logical slots have fast code path for sending data in order to not impose too high per message overhead. The fast path skips checks for interrupts and timeouts. However, the existing coding failed to consider the fact that transaction with large number of changes may take very long to be processed and sent to the client. This causes walsender to ignore interrupts for potentially long time and more importantly it will cause walsender being killed due to timeout at the end of such transaction. This commit changes the fast path to also check for interrupts and only allows calling the fast path when last keeplaive check happened less than half of walsender timeout ago, otherwise the slower code path will be taken. ---src/backend/replication/walsender.c | 58 ++++++++++++++++++++-----------------1 file changed, 32 insertions(+), 26 deletions(-) diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c index fa1db74..5a4c31f 100644 --- a/src/backend/replication/walsender.c +++ b/src/backend/replication/walsender.c @@ -1151,6 +1151,8 @@ static voidWalSndWriteData(LogicalDecodingContext *ctx, XLogRecPtr lsn, TransactionId xid, bool last_write){ + TimestampTz now; + /* output previously gathered data in a CopyData packet */ pq_putmessage_noblock('d', ctx->out->data, ctx->out->len); @@ -1160,23 +1162,46 @@ WalSndWriteData(LogicalDecodingContext *ctx, XLogRecPtr lsn, TransactionId xid, * several releasesby streaming physical replication. */ resetStringInfo(&tmpbuf); - pq_sendint64(&tmpbuf, GetCurrentTimestamp()); + now = GetCurrentTimestamp(); + pq_sendint64(&tmpbuf, now); memcpy(&ctx->out->data[1 + sizeof(int64) + sizeof(int64)], tmpbuf.data, sizeof(int64)); - /* fast path */ + CHECK_FOR_INTERRUPTS(); + /* Try to flush pending output to the client */ if (pq_flush_if_writable() != 0) WalSndShutdown(); - if (!pq_is_send_pending()) - return; + /* Try taking fast path unless we get too close to walsender timeout. */ + if (now < TimestampTzPlusMilliseconds(last_reply_timestamp, + wal_sender_timeout / 2)) + { + if (!pq_is_send_pending()) + return; + } - for (;;) + /* If we have pending write here, go to slow path */ + while (pq_is_send_pending()) { int wakeEvents; long sleeptime; - TimestampTz now; + + /* die if timeout was reached */ + WalSndCheckTimeOut(now); + + /* Send keepalive if the time has come */ + WalSndKeepaliveIfNecessary(now); + + sleeptime = WalSndComputeSleeptime(now); + + wakeEvents = WL_LATCH_SET | WL_POSTMASTER_DEATH | + WL_SOCKET_WRITEABLE | WL_SOCKET_READABLE | WL_TIMEOUT; + + /* Sleep until something happens or we time out */ + WaitLatchOrSocket(MyLatch, wakeEvents, + MyProcPort->sock, sleeptime, + WAIT_EVENT_WAL_SENDER_WRITE_DATA); /* * Emergency bailout if postmaster has died. This is to avoid the @@ -1205,27 +1230,8 @@ WalSndWriteData(LogicalDecodingContext *ctx, XLogRecPtr lsn, TransactionId xid, if (pq_flush_if_writable()!= 0) WalSndShutdown(); - /* If we finished clearing the buffered data, we're done here. */ - if (!pq_is_send_pending()) - break; - + /* Update the timestamp for the next trial */ now = GetCurrentTimestamp(); - - /* die if timeout was reached */ - WalSndCheckTimeOut(now); - - /* Send keepalive if the time has come */ - WalSndKeepaliveIfNecessary(now); - - sleeptime = WalSndComputeSleeptime(now); - - wakeEvents = WL_LATCH_SET | WL_POSTMASTER_DEATH | - WL_SOCKET_WRITEABLE | WL_SOCKET_READABLE | WL_TIMEOUT; - - /* Sleep until something happens or we time out */ - WaitLatchOrSocket(MyLatch, wakeEvents, - MyProcPort->sock, sleeptime, - WAIT_EVENT_WAL_SENDER_WRITE_DATA); } /* reactivate latch so WalSndLoop knows to continue*/ -- 2.9.2
On Fri, Nov 17, 2017 at 4:35 PM, Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> wrote: > I'm a reviewer of this patch but I think I'm not allowed to mark > this "Ready for Commiter" since the last change is made by me. Yes, it is a better idea to wait for reviews here. -- Michael
Hi, On 17/11/17 08:35, Kyotaro HORIGUCHI wrote: > > Moving around the code allow us to place ps_is_send_pending() in > the while condition, which seems to be more proper place to do > that. I haven't added test for this particular case. > > I tested this that > > - cleanly applies on the current master HEAD and passes make > check and subscription test. > > - walsender properly chooses the slow-path even if > pq_is_send_pending() is always false. (happens on a fast enough > network) > > - walsender waits properly waits on socket and process-reply time > in WaitLatchOrSocket. > > - walsender exits by timeout on network stall. > > So, I think the patch is functionally perfect. > > I'm a reviewer of this patch but I think I'm not allowed to mark > this "Ready for Commiter" since the last change is made by me. > Thanks for working on this, but there are couple of problems with your modifications which mean that it does not actually fix the original issue anymore (transaction taking long time to decode while sending changes over network works fine will result in walsender timout). The firs one is that you put pq_is_send_pending() in the while so the while is again never executed if there is no network send pending which makes the if above meaningless. Also you missed ProcessRepliesIfAny() when moving code around. That's needed for timeout calculations to work correctly. So one more revision attached with those things fixed. This version fixes the original issue as well. -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Attachment
On 29 November 2017 at 23:59, Petr Jelinek
wrote:
> Hi,
>
> On 17/11/17 08:35, Kyotaro HORIGUCHI wrote:
> >
> > Moving around the code allow us to place ps_is_send_pending() in
> > the while condition, which seems to be more proper place to do
> > that. I haven't added test for this particular case.
> >
> > I tested this that
> >
> > - cleanly applies on the current master HEAD and passes make
> > check and subscription test.
> >
> > - walsender properly chooses the slow-path even if
> > pq_is_send_pending() is always false. (happens on a fast enough
> > network)
> >
> > - walsender waits properly waits on socket and process-reply time
> > in WaitLatchOrSocket.
> >
> > - walsender exits by timeout on network stall.
> >
> > So, I think the patch is functionally perfect.
> >
> > I'm a reviewer of this patch but I think I'm not allowed to mark
> > this "Ready for Commiter" since the last change is made by me.
> >
>
> Thanks for working on this, but there are couple of problems with your
> modifications which mean that it does not actually fix the original
> issue anymore (transaction taking long time to decode while sending
> changes over network works fine will result in walsender timout).
>
> The firs one is that you put pq_is_send_pending() in the while so the
> while is again never executed if there is no network send pending which
> makes the if above meaningless. Also you missed ProcessRepliesIfAny()
> when moving code around. That's needed for timeout calculations to work
> correctly.
>
> So one more revision attached with those things fixed. This version
> fixes the original issue as well.
>
>
I'm happy with what I see here. Commit message needs tweaking, but any
committer would do that anyway.
To me it looks like it's time to get this committed, marking as such.
--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Mon, Dec 4, 2017 at 10:59 PM, Craig Ringer <craig@2ndquadrant.com> wrote: > To me it looks like it's time to get this committed, marking as such. This version has noticeably more code rearrangement than before, and I'm not sure that is actually buying us anything. Why not keep the changes minimal? Also, TBH, this doesn't seem to have been carefully reviewed for style: - if (!pq_is_send_pending()) - return; + /* Try taking fast path unless we get too close to walsender timeout. */ + if (now < TimestampTzPlusMilliseconds(last_reply_timestamp, + wal_sender_timeout / 2)) + { + if (!pq_is_send_pending()) + return; + } Generally we write if (a && b) { ... } not if (a) { if (b) .. } - } + }; It's hard to understand how this got through review. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 6 December 2017 at 04:07, Robert Haas <robertmhaas@gmail.com> wrote:
On Mon, Dec 4, 2017 at 10:59 PM, Craig Ringer <craig@2ndquadrant.com> wrote:
> To me it looks like it's time to get this committed, marking as such.
This version has noticeably more code rearrangement than before, and
I'm not sure that is actually buying us anything. Why not keep the
changes minimal?
Also, TBH, this doesn't seem to have been carefully reviewed for style:
- if (!pq_is_send_pending())
- return;
+ /* Try taking fast path unless we get too close to walsender timeout. */
+ if (now < TimestampTzPlusMilliseconds(last_reply_timestamp,
+ wal_sender_timeout / 2))
+ {
+ if (!pq_is_send_pending())
+ return;
+ }
Generally we write if (a && b) { ... } not if (a) { if (b) .. }
- }
+ };
It's hard to understand how this got through review.
Entirely my fault - I tend to forget to look at code style when I'm focused on functionality. My apologies, and thanks for the reminder.
On 05/12/17 21:07, Robert Haas wrote: > On Mon, Dec 4, 2017 at 10:59 PM, Craig Ringer <craig@2ndquadrant.com> wrote: >> To me it looks like it's time to get this committed, marking as such. > > This version has noticeably more code rearrangement than before, and > I'm not sure that is actually buying us anything. Why not keep the > changes minimal? > Yeah we moved things around in the end, the main reason would be that it actually works closer to how it was originally designed to work. Meaning that the slow path is not so slow when !pq_is_send_pending() which seems to have been the reasoning for original coding. It's not completely necessary to do it for fixing the bug, but why make things slower than they need to be. > Also, TBH, this doesn't seem to have been carefully reviewed for style: > > - if (!pq_is_send_pending()) > - return; > + /* Try taking fast path unless we get too close to walsender timeout. */ > + if (now < TimestampTzPlusMilliseconds(last_reply_timestamp, > + wal_sender_timeout / 2)) > + { > + if (!pq_is_send_pending()) > + return; > + } > > Generally we write if (a && b) { ... } not if (a) { if (b) .. } > It's rather ugly with && because one of the conditions is two line, but okay here you go. I am keeping the brackets even if normally don't for one-liners because it's completely unreadable without them IMHO. > - } > + }; > Oops. -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Attachment
On 7 December 2017 at 01:22, Petr Jelinek <petr.jelinek@2ndquadrant.com> wrote:
On 05/12/17 21:07, Robert Haas wrote:
>
> Generally we write if (a && b) { ... } not if (a) { if (b) .. }
>
It's rather ugly with && because one of the conditions is two line, but
okay here you go. I am keeping the brackets even if normally don't for
one-liners because it's completely unreadable without them IMHO.
Yeah, that's why I passed on that FWIW. Sometimes breaking up a condition is nice. Personally I intensely dislike the convention of
if (big_condition
&& big_condition)
&& big_condition)
one_linerdo_something;
as awfully unreadable, but I guess code convention means you live with things you don't like.
On 12/14/2017 01:46 AM, Craig Ringer wrote: > On 7 December 2017 at 01:22, Petr Jelinek > <petr.jelinek@2ndquadrant.com <mailto:petr.jelinek@2ndquadrant.com>> > wrote: > > On 05/12/17 21:07, Robert Haas wrote: > > > > Generally we write if (a && b) { ... } not if (a) { if (b) .. } > > > > It's rather ugly with && because one of the conditions is two > line, but > okay here you go. I am keeping the brackets even if normally don't for > one-liners because it's completely unreadable without them IMHO. > > > > Yeah, that's why I passed on that FWIW. Sometimes breaking up a > condition is nice. Personally I intensely dislike the convention of > > > if (big_condition > && big_condition) > one_linerdo_something; > > > as awfully unreadable, but I guess code convention means you live with > things you don't like. > > > Anyway, I've just hit this bug in the wild for the umpteenth time this > year, and I'd like to know what I can do to help progress it to > commit+backport. > > Ask and ye shall receive. I've just committed it. cheers andrew -- Andrew Dunstan https://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 15 December 2017 at 00:36, Andrew Dunstan <andrew.dunstan@2ndquadrant.com> wrote:
On 12/14/2017 01:46 AM, Craig Ringer wrote:
> On 7 December 2017 at 01:22, Petr Jelinek
> <petr.jelinek@2ndquadrant.com <mailto:petr.jelinek@2ndquadrant.com>>
> wrote:
>
> On 05/12/17 21:07, Robert Haas wrote:
> >
> > Generally we write if (a && b) { ... } not if (a) { if (b) .. }
> >
>
> It's rather ugly with && because one of the conditions is two
> line, but
> okay here you go. I am keeping the brackets even if normally don't for
> one-liners because it's completely unreadable without them IMHO.
>
>
>
> Yeah, that's why I passed on that FWIW. Sometimes breaking up a
> condition is nice. Personally I intensely dislike the convention of
>
>
> if (big_condition
> && big_condition)
> one_linerdo_something;
>
>
> as awfully unreadable, but I guess code convention means you live with
> things you don't like.
>
>
> Anyway, I've just hit this bug in the wild for the umpteenth time this
> year, and I'd like to know what I can do to help progress it to
> commit+backport.
>
>
Ask and ye shall receive. I've just committed it.
Brilliant, thanks. Backpatched too, great.
Now I'm going to shamelessly point you at the other nasty recurring logical decoding bug while you're fresh from thinking about replication. I can hope, right ;)
https://commitfest.postgresql.org/16/1397/ causes errors or bad data to be decoded, so it's a serious bug.
I'll see if I can rustle up some review attention first though.