Re: Reducing WaitEventSet syscall churn - Mailing list pgsql-hackers

From Kyotaro Horiguchi
Subject Re: Reducing WaitEventSet syscall churn
Date
Msg-id 20200310.081924.1757484278318516911.horikyota.ntt@gmail.com
Whole thread Raw
In response to Re: Reducing WaitEventSet syscall churn  (Thomas Munro <thomas.munro@gmail.com>)
List pgsql-hackers
Hello.

I looked this.

At Thu, 27 Feb 2020 12:17:45 +1300, Thomas Munro <thomas.munro@gmail.com> wrote in 
> On Sat, Feb 8, 2020 at 10:15 AM Thomas Munro <thomas.munro@gmail.com> wrote:
> > > > Here are some patches to get rid of frequent system calls.
> 
> Here's a new version of this patch set.  It gets rid of all temporary
> WaitEventSets except a couple I mentioned in another thread[1].
> WaitLatch() uses CommonWaitSet, and calls to WaitLatchOrSocket() are
> replaced by either the existing FeBeWaitSet (walsender, gssapi/openssl
> auth are also candidates) or a special purpose long lived WaitEventSet
> (replication, postgres_fdw, pgstats).  It passes make check-world with
> WAIT_USE_POLL, WAIT_USE_KQUEUE, WAIT_USE_EPOLL, all with and without
> -DEXEC_BACKEND, and make check with WAIT_USE_WIN32 (Appveyor).
> 
> 0001: "Don't use EV_CLEAR for kqueue events."
> 
> This fixes a problem in the kqueue implementation that only shows up
> once you switch to long lived WaitEventSets.  It needs to be
> level-triggered like the other implementations, for example because
> there's a place in the recovery tests where we wait twice in a row
> without trying to do I/O in between.  (This is a bit like commit
> 3b790d256f8 that fixed a similar problem on Windows.)

It looks fine in the light of document of kqueue.

> 0002: "Use a long lived WaitEventSet for WaitLatch()."
> 
> In the last version, I had a new function WaitMyLatch(), but that
> didn't help with recoveryWakeupLatch.  Switching between latches
> doesn't require a syscall, so I didn't want to have a separate WES and
> function just for that.  So I went back to using plain old
> WaitLatch(), and made it "modify" the latch every time before waiting
> on CommonWaitSet.  An alternative would be to get rid of the concept
> of latches other than MyLatch, and change the function to
> WaitMyLatch().  A similar thing happens for exit_on_postmaster_death,
> for which I didn't want to have a separate WES, so I just set that
> flag every time.  Thoughts?

It is surely an improvement from that we create a full-fledged WES
every time. The name CommonWaitSet gives an impression as if it is
used for a variety of waitevent sets, but it is used only by
WaitLatch. So I would name it LatchWaitSet. With that name I won't be
surprised by that the function is pointing WL_LATCH_SET by index 0
without any explanation when calling ModifyWaitSet.

@@ -700,7 +739,11 @@ FreeWaitEventSet(WaitEventSet *set)
     ReleaseExternalFD();
 #elif defined(WAIT_USE_KQUEUE)
     close(set->kqueue_fd);
-    ReleaseExternalFD();
+    if (set->kqueue_fd >= 0)
+    {
+        close(set->kqueue_fd);
+        ReleaseExternalFD();
+    }

Did you forget to remove the close() outside the if block?
Don't we need the same amendment for epoll_fd with kqueue_fd?

WaitLatch is defined as "If the latch is already set (and WL_LATCH_SET
is given), the function returns immediately.". But now the function
reacts to latch even if WL_LATCH_SET is not set. I think actually it
is alwys set so I think we need to modify Assert and function comment
following the change.

> 0003: "Use regular WaitLatch() for condition variables."
> 
> That mechanism doesn't need its own WES anymore.

Looks fine.

> 0004: "Introduce RemoveWaitEvent()."
> 
> We'll need that to be able to handle connections that are closed and
> reopened under the covers by libpq (replication, postgres_fdw).  We
> also wanted this on a couple of other threads for multiplexing FDWs,
> to be able to adjust the wait set dynamically for a proposed async
> Append feature.
> 
> The implementation is a little naive, and leaves holes in the
> "pollfds" and "handles" arrays (for poll() and win32 implementations).
> That could be improved with a bit more footwork in a later patch.
> 
> XXX The Windows version is based on reading documentation.  I'd be
> very interested to know if check-world passes (especially
> contrib/postgres_fdw and src/test/recovery).  Unfortunately my
> appveyor.yml fu is not yet strong enough.

I didn't find the documentation about INVALID_HANDLE_VALUE in
lpHandles. Could you give me the URL for that?

I didn't run recoverycheck because because I couldn't install IPC::Run
for ActivePerl.. But contribcheck succeeded.

+    for (int i = 0; i < nevents; ++i)
+        set->handles[i + 1] = INVALID_HANDLE_VALUE;

It accesses set->handles[nevents], which is overrun.

+    /* Set up the free list. */
+    for (int i = 0; i < nevents; ++i)
+        set->events[i].next_free = i + 1;
+    set->events[nevents - 1].next_free = -1;

It sets the last element twice. (harmless but useless).

     set->handles = (HANDLE) data;
     data += MAXALIGN(sizeof(HANDLE) * nevents);

It is not an issue of thie patch, but does hadles need nevents + 1
elements?

WaitEventSetSize is not checking its parameter range.

I'l continue reviewing in later mail.

> 0005: "libpq: Add PQsocketChangeCount to advertise socket changes."
.... 

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



pgsql-hackers by date:

Previous
From: James Coleman
Date:
Subject: Re: Nicer error when connecting to standby with hot_standby=off
Next
From: Thomas Munro
Date:
Subject: Re: effective_io_concurrency's steampunk spindle maths