Thread: Coding in WalSndWaitForWal
in src/backend/replication/walsender.c, there is the section quoted below. It looks like nothing interesting happens between the GetFlushRecPtr just before the loop starts, and the one inside the loop the first time through the loop. If we want to avoid doing CHECK_FOR_INTERRUPTS(); etc. needlessly, then we should check the result of GetFlushRecPtr and return early if it is sufficiently advanced--before entering the loop. If we don't care, then what is the point of updating it twice with no meaningful action in between? We could just get rid of the section just before the loop starts. The current coding seems confusing, and increases traffic on a potentially busy spin lock.
if (!RecoveryInProgress())
RecentFlushPtr = GetFlushRecPtr();
else
RecentFlushPtr = GetXLogReplayRecPtr(NULL);
for (;;)
{
long sleeptime;
/* Clear any already-pending wakeups */
ResetLatch(MyLatch);
CHECK_FOR_INTERRUPTS();
/* Process any requests or signals received recently */
if (ConfigReloadPending)
{
ConfigReloadPending = false;
ProcessConfigFile(PGC_SIGHUP);
SyncRepInitConfig();
}
/* Check for input from the client */
ProcessRepliesIfAny();
/*
* If we're shutting down, trigger pending WAL to be written out,
* otherwise we'd possibly end up waiting for WAL that never gets
* written, because walwriter has shut down already.
*/
if (got_STOPPING)
XLogBackgroundFlush();
/* Update our idea of the currently flushed position. */
if (!RecoveryInProgress())
RecentFlushPtr = GetFlushRecPtr();
else
RecentFlushPtr = GetXLogReplayRecPtr(NULL);
Cheers,
Jeff
On Sun, Nov 10, 2019 at 5:51 AM Jeff Janes <jeff.janes@gmail.com> wrote: > > in src/backend/replication/walsender.c, there is the section quoted below. It looks like nothing interesting happens betweenthe GetFlushRecPtr just before the loop starts, and the one inside the loop the first time through the loop. If wewant to avoid doing CHECK_FOR_INTERRUPTS(); etc. needlessly, then we should check the result of GetFlushRecPtr and returnearly if it is sufficiently advanced--before entering the loop. If we don't care, then what is the point of updatingit twice with no meaningful action >in between? We could just get rid of the section just before the loop starts. > +1. I also think we should do one of the two things suggested by you. I would prefer earlier as it can save us some processing in some cases when the WAL is flushed in the meantime by WALWriter. BTW, I have noticed that this part of code is same as it was first introduced in below commit: commit 5a991ef8692ed0d170b44958a81a6bd70e90585c Author: Robert Haas <rhaas@postgresql.org> Date: Mon Mar 10 13:50:28 2014 -0400 Allow logical decoding via the walsender interface. .. .. Andres Freund, with contributions from Álvaro Herrera, and further review by me. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
On Sun, Nov 10, 2019 at 10:43:33AM +0530, Amit Kapila wrote: > On Sun, Nov 10, 2019 at 5:51 AM Jeff Janes <jeff.janes@gmail.com> wrote: >> in src/backend/replication/walsender.c, there is the section >> quoted below. It looks like nothing interesting happens between >> the GetFlushRecPtr just before the loop starts, and the one inside >> the loop the first time through the loop. If we want to avoid >> doing CHECK_FOR_INTERRUPTS(); etc. needlessly, then we should >> check the result of GetFlushRecPtr and return early if it is >> sufficiently advanced--before entering the loop. If we don't >> care, then what is the point of updating it twice with no >> meaningful action >in between? We could just get rid of the >> section just before the loop starts. > > +1. I also think we should do one of the two things suggested by you. > I would prefer earlier as it can save us some processing in some cases > when the WAL is flushed in the meantime by WALWriter. So your suggestion would be to call GetFlushRecPtr() before the first check on RecentFlushPtr and before entering the loop? It seems to me that we don't want to do that to avoid any unnecessary spin lock contention if the flush position is sufficiently advanced. Or are you just suggesting to move the first check on RecentFlushPtr within the loop just after resetting the latch but before checking for interrupts? If we were to do some cleanup here, I would just remove the first update of RecentFlushPtr before the loop as per the attached, which is the second suggestion from Jeff. Any thoughts? -- Michael
Attachment
On Mon, Nov 11, 2019 at 7:53 AM Michael Paquier <michael@paquier.xyz> wrote: > > On Sun, Nov 10, 2019 at 10:43:33AM +0530, Amit Kapila wrote: > > On Sun, Nov 10, 2019 at 5:51 AM Jeff Janes <jeff.janes@gmail.com> wrote: > >> in src/backend/replication/walsender.c, there is the section > >> quoted below. It looks like nothing interesting happens between > >> the GetFlushRecPtr just before the loop starts, and the one inside > >> the loop the first time through the loop. If we want to avoid > >> doing CHECK_FOR_INTERRUPTS(); etc. needlessly, then we should > >> check the result of GetFlushRecPtr and return early if it is > >> sufficiently advanced--before entering the loop. If we don't > >> care, then what is the point of updating it twice with no > >> meaningful action >in between? We could just get rid of the > >> section just before the loop starts. > > > > +1. I also think we should do one of the two things suggested by you. > > I would prefer earlier as it can save us some processing in some cases > > when the WAL is flushed in the meantime by WALWriter. > > So your suggestion would be to call GetFlushRecPtr() before the first > check on RecentFlushPtr and before entering the loop? > No. What I meant was to keep the current code as-is and have an additional check on RecentFlushPtr before entering the loop. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
On 2019-Nov-11, Amit Kapila wrote: > On Mon, Nov 11, 2019 at 7:53 AM Michael Paquier <michael@paquier.xyz> wrote: > > So your suggestion would be to call GetFlushRecPtr() before the first > > check on RecentFlushPtr and before entering the loop? > > No. What I meant was to keep the current code as-is and have an > additional check on RecentFlushPtr before entering the loop. I noticed that the "return" at the bottom of the function does a SetLatch(), but the other returns do not. Isn't that a bug? Also, what's up with those useless returns? -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment
On Mon, Nov 11, 2019 at 01:53:40PM -0300, Alvaro Herrera wrote: > On 2019-Nov-11, Amit Kapila wrote: > >> On Mon, Nov 11, 2019 at 7:53 AM Michael Paquier <michael@paquier.xyz> wrote: >>> So your suggestion would be to call GetFlushRecPtr() before the first >>> check on RecentFlushPtr and before entering the loop? >> >> No. What I meant was to keep the current code as-is and have an >> additional check on RecentFlushPtr before entering the loop. Okay, but is that really useful? > I noticed that the "return" at the bottom of the function does a > SetLatch(), but the other returns do not. Isn't that a bug? I don't think that it is necessary to set the latch in the first check as in this case WalSndWaitForWal() would have gone through its loop to set RecentFlushPtr to the last position available already, which would have already set the latch. If you add an extra check based on (loc <= RecentFlushPtr) as your patch does, then you need to set the latch appropriately before returning. Anyway, I don't think that there is any reason to do this extra work at the beginning of the routine before entering the loop. But there is an extra reason not to do that: your patch would prevent more pings to be sent, which means less flush LSN updates. If you think that the extra check makes sense, then I think that the patch should at least clearly document why it is done this way, and why it makes sense to do so. Personally, my take would be to remove the extra call to GetFlushRecPtr() before entering the loop. > Also, what's up with those useless returns? Yes, let's rip them out. -- Michael
Attachment
At Tue, 12 Nov 2019 11:17:26 +0900, Michael Paquier <michael@paquier.xyz> wrote in > On Mon, Nov 11, 2019 at 01:53:40PM -0300, Alvaro Herrera wrote: > > On 2019-Nov-11, Amit Kapila wrote: > > > >> On Mon, Nov 11, 2019 at 7:53 AM Michael Paquier <michael@paquier.xyz> wrote: > >>> So your suggestion would be to call GetFlushRecPtr() before the first > >>> check on RecentFlushPtr and before entering the loop? > >> > >> No. What I meant was to keep the current code as-is and have an > >> additional check on RecentFlushPtr before entering the loop. > > Okay, but is that really useful? > > > I noticed that the "return" at the bottom of the function does a > > SetLatch(), but the other returns do not. Isn't that a bug? > > I don't think that it is necessary to set the latch in the first check > as in this case WalSndWaitForWal() would have gone through its loop to > set RecentFlushPtr to the last position available already, which would > have already set the latch. If you add an extra check based on (loc > <= RecentFlushPtr) as your patch does, then you need to set the > latch appropriately before returning. > > Anyway, I don't think that there is any reason to do this extra work > at the beginning of the routine before entering the loop. But there It seems to me as if it is a fast-path when RecentFlushPtr reached the target location before enterig the loop. It is frequently called in (AFAICS) interruptible loops. On that standpoint I vote +1 for Amit. Or we could shift the stuff of the for loop so that the duplicate code is placed at the beginning. > is an extra reason not to do that: your patch would prevent more pings > to be sent, which means less flush LSN updates. If you think that > the extra check makes sense, then I think that the patch should at > least clearly document why it is done this way, and why it makes > sense to do so. > > Personally, my take would be to remove the extra call to > GetFlushRecPtr() before entering the loop. > > > Also, what's up with those useless returns? > > Yes, let's rip them out. It seems to me that the fast-path seems intentional. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
On Tue, Nov 12, 2019 at 7:47 AM Michael Paquier <michael@paquier.xyz> wrote: > > On Mon, Nov 11, 2019 at 01:53:40PM -0300, Alvaro Herrera wrote: > > On 2019-Nov-11, Amit Kapila wrote: > > > >> On Mon, Nov 11, 2019 at 7:53 AM Michael Paquier <michael@paquier.xyz> wrote: > >>> So your suggestion would be to call GetFlushRecPtr() before the first > >>> check on RecentFlushPtr and before entering the loop? > >> > >> No. What I meant was to keep the current code as-is and have an > >> additional check on RecentFlushPtr before entering the loop. > > Okay, but is that really useful? > I think so. It will be useful in cases where the WAL is already flushed by the WALWriter in the meantime. > > I noticed that the "return" at the bottom of the function does a > > SetLatch(), but the other returns do not. Isn't that a bug? > > I don't think that it is necessary to set the latch in the first check > as in this case WalSndWaitForWal() would have gone through its loop to > set RecentFlushPtr to the last position available already, which would > have already set the latch. If you add an extra check based on (loc > <= RecentFlushPtr) as your patch does, then you need to set the > latch appropriately before returning. > This point makes sense to me. > Anyway, I don't think that there is any reason to do this extra work > at the beginning of the routine before entering the loop. But there > is an extra reason not to do that: your patch would prevent more pings > to be sent, which means less flush LSN updates. If you think that > the extra check makes sense, then I think that the patch should at > least clearly document why it is done this way, and why it makes > sense to do so. > I also think adding a comment there would be good. > > > Also, what's up with those useless returns? > > Yes, let's rip them out. > +1. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Hi, On 2019-11-11 13:53:40 -0300, Alvaro Herrera wrote: > On 2019-Nov-11, Amit Kapila wrote: > > > On Mon, Nov 11, 2019 at 7:53 AM Michael Paquier <michael@paquier.xyz> wrote: > > > > So your suggestion would be to call GetFlushRecPtr() before the first > > > check on RecentFlushPtr and before entering the loop? > > > > No. What I meant was to keep the current code as-is and have an > > additional check on RecentFlushPtr before entering the loop. > > I noticed that the "return" at the bottom of the function does a > SetLatch(), but the other returns do not. Isn't that a bug? I don't think it is - We never reset the latch in that case. I don't see what we'd gain from setting it explicitly, other than unnecessarily performing more work? > /* > * Fast path to avoid acquiring the spinlock in case we already know we > * have enough WAL available. This is particularly interesting if we're > * far behind. > */ > if (RecentFlushPtr != InvalidXLogRecPtr && > loc <= RecentFlushPtr) > + { > + SetLatch(MyLatch); > return RecentFlushPtr; > + } I.e. let's not do this. > /* Get a more recent flush pointer. */ > if (!RecoveryInProgress()) > RecentFlushPtr = GetFlushRecPtr(); > else > RecentFlushPtr = GetXLogReplayRecPtr(NULL); > > + if (loc <= RecentFlushPtr) > + { > + SetLatch(MyLatch); > + return RecentFlushPtr; > + } Hm. I'm doubtful this is a good idea - it essentially means we'd not check for interrupts, protocol replies, etc. for an unbounded amount of time. Whereas the existing fast-path does so for a bounded - although not necessarily short - amount of time. It seems to me it'd be better to just remove the "get a more recent flush pointer" block - it doesn't seem to currently surve a meaningful purpose. > for (;;) > { > long sleeptime; > > /* Clear any already-pending wakeups */ > ResetLatch(MyLatch); > > @@ -2267,15 +2276,14 @@ WalSndLoop(WalSndSendDataCallback send_data) > > /* Sleep until something happens or we time out */ > (void) WaitLatchOrSocket(MyLatch, wakeEvents, > MyProcPort->sock, sleeptime, > WAIT_EVENT_WAL_SENDER_MAIN); > } > } > - return; > } Having dug into history, the reason this exists is that there used to be the following below the return: - -send_failure: - - /* - * Get here on send failure. Clean up and exit. - * - * Reset whereToSendOutput to prevent ereport from attempting to send any - * more messages to the standby. - */ - if (whereToSendOutput == DestRemote) - whereToSendOutput = DestNone; - - proc_exit(0); - abort(); /* keep the compiler quiet */ but when 5a991ef8692ed (Allow logical decoding via the walsender interface) moved the shutdown code into its own function, WalSndShutdown(), we left the returns in place. We still have the curious proc_exit(0); abort(); /* keep the compiler quiet */ pattern in WalSndShutdown() - wouldn't the right approach to instead proc_exit() with pg_attribute_noreturn()? Greetings, Andres Freund
On Tue, Nov 12, 2019 at 11:27:16AM -0800, Andres Freund wrote: > It seems to me it'd be better to just remove the "get a more recent > flush pointer" block - it doesn't seem to currently surve a meaningful > purpose. +1. That was actually my suggestion upthread :) -- Michael
Attachment
Ah, my stupid. At Wed, 13 Nov 2019 16:34:49 +0900, Michael Paquier <michael@paquier.xyz> wrote in > On Tue, Nov 12, 2019 at 11:27:16AM -0800, Andres Freund wrote: > > It seems to me it'd be better to just remove the "get a more recent > > flush pointer" block - it doesn't seem to currently surve a meaningful > > purpose. > > +1. That was actually my suggestion upthread :) Actually it is useless as it is. But the code still seems to me an incomplete fast path (that lacks immediate return after it) for the case where just one call to GetFlushRecPtr advances RecentFlushPtr is enough. However, I'm not confident taht removing the (intended) fast path impacts perforance significantly. So I don't object to remove it. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
On Wed, Nov 13, 2019 at 12:57 AM Andres Freund <andres@anarazel.de> wrote: > > On 2019-11-11 13:53:40 -0300, Alvaro Herrera wrote: > > > /* Get a more recent flush pointer. */ > > if (!RecoveryInProgress()) > > RecentFlushPtr = GetFlushRecPtr(); > > else > > RecentFlushPtr = GetXLogReplayRecPtr(NULL); > > > > + if (loc <= RecentFlushPtr) > > + { > > + SetLatch(MyLatch); > > + return RecentFlushPtr; > > + } > > Hm. I'm doubtful this is a good idea - it essentially means we'd not > check for interrupts, protocol replies, etc. for an unbounded amount of > time. > I think this function (WalSndWaitForWal) will be called from WalSndLoop which checks for interrupts and protocol replies, so it might not miss checking those things in that context. In which case it will miss to check those things for an unbounded amount of time? -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
At Wed, 13 Nov 2019 14:21:13 +0530, Amit Kapila <amit.kapila16@gmail.com> wrote in > On Wed, Nov 13, 2019 at 12:57 AM Andres Freund <andres@anarazel.de> wrote: > > > > On 2019-11-11 13:53:40 -0300, Alvaro Herrera wrote: > > > > > /* Get a more recent flush pointer. */ > > > if (!RecoveryInProgress()) > > > RecentFlushPtr = GetFlushRecPtr(); > > > else > > > RecentFlushPtr = GetXLogReplayRecPtr(NULL); > > > > > > + if (loc <= RecentFlushPtr) > > > + { > > > + SetLatch(MyLatch); > > > + return RecentFlushPtr; > > > + } > > > > Hm. I'm doubtful this is a good idea - it essentially means we'd not > > check for interrupts, protocol replies, etc. for an unbounded amount of > > time. > > > > I think this function (WalSndWaitForWal) will be called from > WalSndLoop which checks for interrupts and protocol replies, so it > might not miss checking those things in that context. In which case > it will miss to check those things for an unbounded amount of time? I think so for the first part, but I'm not sure for the second. But it should be avoided if it can be happen. # the walreader's callback structure makes such things less clear :p I remember that there was a fixed bug that logical replication code fails to send a reply for a longer time than timeout on a very fast connection, running through a fast path without checking the need for reply. I couldn't find where it is, though.. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
On 2019-Nov-12, Andres Freund wrote: > We still have the curious > proc_exit(0); > abort(); /* keep the compiler quiet */ > > pattern in WalSndShutdown() - wouldn't the right approach to instead > proc_exit() with pg_attribute_noreturn()? proc_exit() is already marked noreturn ... and has been marked as such since commit eeece9e60984 (2012), which is the same that added abort() after some proc_exit calls as well as other routines that were already marked noreturn, such as WalSenderMain(). However, back then we were using the GCC-specific notation of __attribute__((noreturn)), so perhaps the reason we kept the abort() (and a few breaks, etc) after proc_exit() was to satisfy compilers other than GCC. In modern times, we define pg_attribute_noreturn() like this: /* GCC, Sunpro and XLC support aligned, packed and noreturn */ #if defined(__GNUC__) || defined(__SUNPRO_C) || defined(__IBMC__) #define pg_attribute_noreturn() __attribute__((noreturn)) #define HAVE_PG_ATTRIBUTE_NORETURN 1 #else #define pg_attribute_noreturn() #endif I suppose this will cause warnings in compilers other than those, but I'm not sure if we care. What about MSVC for example? With the attached patch, everything compiles cleanly in my setup, no warnings, but then it's GCC. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment
Alvaro Herrera <alvherre@2ndquadrant.com> writes: > In modern times, we define pg_attribute_noreturn() like this: > /* GCC, Sunpro and XLC support aligned, packed and noreturn */ > #if defined(__GNUC__) || defined(__SUNPRO_C) || defined(__IBMC__) > #define pg_attribute_noreturn() __attribute__((noreturn)) > #define HAVE_PG_ATTRIBUTE_NORETURN 1 > #else > #define pg_attribute_noreturn() > #endif > I suppose this will cause warnings in compilers other than those, but > I'm not sure if we care. What about MSVC for example? Yeah, the lack of coverage for MSVC seems like the main reason not to assume this works "everywhere of interest". > With the attached patch, everything compiles cleanly in my setup, no > warnings, but then it's GCC. Meh ... I'm not really convinced that any of those changes are improvements. Particularly not the removals of switch-case breaks. regards, tom lane
On 2020-Jan-09, Tom Lane wrote: > Alvaro Herrera <alvherre@2ndquadrant.com> writes: > > In modern times, we define pg_attribute_noreturn() like this: > > > /* GCC, Sunpro and XLC support aligned, packed and noreturn */ > > #if defined(__GNUC__) || defined(__SUNPRO_C) || defined(__IBMC__) > > #define pg_attribute_noreturn() __attribute__((noreturn)) > > #define HAVE_PG_ATTRIBUTE_NORETURN 1 > > #else > > #define pg_attribute_noreturn() > > #endif > > > I suppose this will cause warnings in compilers other than those, but > > I'm not sure if we care. What about MSVC for example? > > Yeah, the lack of coverage for MSVC seems like the main reason not > to assume this works "everywhere of interest". That would easy to add as __declspec(noreturn) ... except that in MSVC the decoration goes *before* the prototype rather after it, so this seems difficult to achieve without invasive surgery. https://docs.microsoft.com/en-us/cpp/cpp/noreturn?view=vs-2015 > > With the attached patch, everything compiles cleanly in my setup, no > > warnings, but then it's GCC. > > Meh ... I'm not really convinced that any of those changes are > improvements. Particularly not the removals of switch-case breaks. However, we already have a large number of proc_exit() calls in switch blocks that are not followed by breaks. In fact, the majority are already like that. I can easily leave this well enough alone, though. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Alvaro Herrera <alvherre@2ndquadrant.com> writes: > However, we already have a large number of proc_exit() calls in switch > blocks that are not followed by breaks. In fact, the majority are > already like that. Oh, hmm ... consistency is good. regards, tom lane