Re: Reducing WaitEventSet syscall churn - Mailing list pgsql-hackers

From: Thomas Munro
Subject: Re: Reducing WaitEventSet syscall churn
Date:
Msg-id: CA+hUKG+=sEsxM847hkDM37CtXZbAqi+Q6un2uUpPPkhuKmAHzQ@mail.gmail.com
In response to: Re: Reducing WaitEventSet syscall churn (Tom Lane <tgl@sss.pgh.pa.us>)
List: pgsql-hackers
On Sun, Jul 12, 2020 at 7:22 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> [...complaints about 0005 and 0006...] We already have
> PGEVT_CONNRESET and PGEVT_CONNDESTROY as application-visible connection
> state change events, and I don't see why those aren't sufficient.

I think Horiguchi-san's general idea of using event callbacks for this
sounds much more promising than my earlier idea of exposing a change
counter (that really was terrible).  If it can be done with existing
events then that's even better.  Perhaps he and/or I can look into
that for the next CF.

In the meantime, here's a rebase of the more straightforward patches
in the stack.  These are the ones that deal only with fixed sets of
file descriptors, and they survive check-world on Linux,
Linux+EXEC_BACKEND (with ASLR disabled) and FreeBSD, and at least
check on macOS and Windows (my CI recipes need more work to get
check-world working on those two).  There's one user-visible change
that I'd appreciate feedback on: I propose to drop the FATAL error
when the postmaster goes away, to make things more consistent.  See
below for more on that.

Responding to earlier review from Horiguchi-san:

On Tue, Mar 10, 2020 at 12:20 PM Kyotaro Horiguchi
<horikyota.ntt@gmail.com> wrote:
> > 0002: "Use a long lived WaitEventSet for WaitLatch()."
> >
> > In the last version, I had a new function WaitMyLatch(), but that
> > didn't help with recoveryWakeupLatch.  Switching between latches
> > doesn't require a syscall, so I didn't want to have a separate WES and
> > function just for that.  So I went back to using plain old
> > WaitLatch(), and made it "modify" the latch every time before waiting
> > on CommonWaitSet.  An alternative would be to get rid of the concept
> > of latches other than MyLatch, and change the function to
> > WaitMyLatch().  A similar thing happens for exit_on_postmaster_death,
> > for which I didn't want to have a separate WES, so I just set that
> > flag every time.  Thoughts?
>
> It is surely an improvement over creating a full-fledged WES every
> time.  The name CommonWaitSet gives the impression that it is used
> for a variety of wait event sets, but it is used only by WaitLatch,
> so I would name it LatchWaitSet.  With that name I won't be surprised
> that the function refers to WL_LATCH_SET by index 0 without any
> explanation when calling ModifyWaitSet.

Ok, I changed it to LatchWaitSet.  I also replaced the index 0 with a
symbolic name LatchWaitSetLatchPos, to make that clearer.

> @@ -700,7 +739,11 @@ FreeWaitEventSet(WaitEventSet *set)
>         ReleaseExternalFD();
>  #elif defined(WAIT_USE_KQUEUE)
>         close(set->kqueue_fd);
> -       ReleaseExternalFD();
> +       if (set->kqueue_fd >= 0)
> +       {
> +               close(set->kqueue_fd);
> +               ReleaseExternalFD();
> +       }
>
> Did you forget to remove the close() outside the if block?
> Don't we need the same amendment for epoll_fd with kqueue_fd?

Hmm, maybe I screwed that up when resolving a conflict with the
ReleaseExternalFD() stuff.  Fixed.

> WaitLatch is defined as "If the latch is already set (and WL_LATCH_SET
> is given), the function returns immediately.".  But now the function
> reacts to the latch even if WL_LATCH_SET is not given.  I think it is
> actually always set, so I think we need to modify the Assert and the
> function comment to follow the change.

It seems a bit silly to call WaitLatch() if you don't want to wait for
a latch, but I think we can keep that comment and logic by assigning
set->latch = NULL when you wait without WL_LATCH_SET.  I tried that in
the attached.

> > 0004: "Introduce RemoveWaitEvent()."
> >
> > We'll need that to be able to handle connections that are closed and
> > reopened under the covers by libpq (replication, postgres_fdw).  We
> > also wanted this on a couple of other threads for multiplexing FDWs,
> > to be able to adjust the wait set dynamically for a proposed async
> > Append feature.
> >
> > The implementation is a little naive, and leaves holes in the
> > "pollfds" and "handles" arrays (for poll() and win32 implementations).
> > That could be improved with a bit more footwork in a later patch.
> >
> > XXX The Windows version is based on reading documentation.  I'd be
> > very interested to know if check-world passes (especially
> > contrib/postgres_fdw and src/test/recovery).  Unfortunately my
> > appveyor.yml fu is not yet strong enough.
>
> I didn't find the documentation about INVALID_HANDLE_VALUE in
> lpHandles. Could you give me the URL for that?

I was looking for how you do the equivalent of Unix file descriptor -1
in a call to poll(), and somewhere I read that INVALID_HANDLE_VALUE
has the right effect.  I can't find that reference now.  Apparently it
works because INVALID_HANDLE_VALUE is the same value (-1) as the
pseudo-handle returned by GetCurrentProcess(), and waiting on your own
process handle blocks forever, so it's a suitable filler for holes in
an array of event handles.  We
should probably call GetCurrentProcess() instead, but really that is
just stand-in code: we should rewrite it so that we don't need holes!
That might require a second array for use by the poll and win32
implementations.  Let's return to that in a later CF?

On Mon, Mar 30, 2020 at 6:15 PM Kyotaro Horiguchi
<horikyota.ntt@gmail.com> wrote:
> > 0008: "Use WL_EXIT_ON_PM_DEATH in FeBeWaitSet."
> >
> > The FATAL message you get if you happen to be waiting for IO rather
> > than waiting somewhere else seems arbitrarily different.  By switching
> > to a standard automatic exit, it opens the possibility of using
> > FeBeWaitSet in a couple more places that would otherwise need to
> > create their own WES (see also [1]).  Thoughts?
>
> Do we really not need any message on backend disconnection due to
> postmaster death?  As for the authentication code, it seems to me the
> rationale for not writing the log message is that the connection has
> not been established at that time.  (That depends on what we think
> the "connection" is.)

To me, it looks like this variation is just from a time when
postmaster death handling was less standardised.  The comment (removed
by this patch) even introduces the topic of postmaster exit as if
you've never heard of it, in this arbitrary location in the tree, one
wait point among many (admittedly one that is often reached).  I don't
think there is any good reason for random timing to determine whether
or not you get a hand-crafted FATAL message.

However, if others don't like this change, we could drop this patch
and still use FeBeWaitSet for walsender.c.  It'd just need to handle
postmaster death explicitly.  It felt weird to be adding new code that
has to handle postmaster death explicitly, which is what led me to
notice that the existing FeBeWaitSet coding (and resulting user
experience) is different from other code.

> > 0009: "Use FeBeWaitSet for walsender.c."
> >
> > Enabled by 0008.
>
> It works and doesn't change behavior.  But I found it a bit difficult
> to find out which events the WaitEventSetWait call waits for.  Maybe
> a comment at the call sites would be sufficient.  I think a comment
> about the bare number "0" used as the event position for
> ModifyWaitEvent is also needed.

Agreed.  I changed it to use symbolic names
FeBeWaitSet{Socket,Latch}Pos in the new code and also in the
pre-existing code like this.

> By the way I found that pqcomm.c uses -1 instead of PGINVALID_SOCKET
> for AddWaitEventToSet.

Oh yeah, that's a pre-existing problem.  Fixed, since that code was
changed by the patch anyway.

