From: Thomas Munro <thomas.munro@gmail.com>
Subject: Re: Reducing WaitEventSet syscall churn
Msg-id: CA+hUKG+28mbtzihc9zhVp_BOB9v0p1OFos3COxhtFzvxRvN3pg@mail.gmail.com
In response to: Re: Reducing WaitEventSet syscall churn (Thomas Munro <thomas.munro@gmail.com>)
List: pgsql-hackers
On Sat, Feb 8, 2020 at 10:15 AM Thomas Munro <thomas.munro@gmail.com> wrote:
> > > Here are some patches to get rid of frequent system calls.

Here's a new version of this patch set.  It gets rid of all temporary
WaitEventSets except a couple I mentioned in another thread[1].
WaitLatch() uses CommonWaitSet, and calls to WaitLatchOrSocket() are
replaced by either the existing FeBeWaitSet (walsender, gssapi/openssl
auth are also candidates) or a special purpose long lived WaitEventSet
(replication, postgres_fdw, pgstats).  It passes make check-world with
WAIT_USE_POLL, WAIT_USE_KQUEUE, WAIT_USE_EPOLL, all with and without
-DEXEC_BACKEND, and make check with WAIT_USE_WIN32 (Appveyor).

0001: "Don't use EV_CLEAR for kqueue events."

This fixes a problem in the kqueue implementation that only shows up
once you switch to long lived WaitEventSets.  It needs to be
level-triggered like the other implementations; for example, there's
a place in the recovery tests where we wait twice in a row without
trying to do I/O in between.  (This is a bit like commit
3b790d256f8 that fixed a similar problem on Windows.)
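
To make the difference concrete, here's roughly the flag at issue,
written against the raw kevent() API rather than quoting the patch
(sketch only):

#include <sys/types.h>
#include <sys/event.h>
#include <sys/time.h>

int fd = 0;                     /* some socket we want to wait on */
struct kevent ev;

/* Edge-triggered: with EV_CLEAR the event state is reset after each
 * delivery, so waiting twice in a row without doing any I/O in
 * between blocks forever -- wrong for a long lived WaitEventSet. */
EV_SET(&ev, fd, EVFILT_READ, EV_ADD | EV_CLEAR, 0, 0, NULL);

/* Level-triggered: without EV_CLEAR, kevent() keeps reporting the
 * descriptor as ready until the pending data is consumed, matching
 * the epoll, poll and win32 implementations. */
EV_SET(&ev, fd, EVFILT_READ, EV_ADD, 0, 0, NULL);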

0002: "Use a long lived WaitEventSet for WaitLatch()."

In the last version, I had a new function WaitMyLatch(), but that
didn't help with recoveryWakeupLatch.  Switching between latches
doesn't require a syscall, so I didn't want to have a separate WES and
function just for that.  So I went back to using plain old
WaitLatch(), and made it "modify" the latch every time before waiting
on CommonWaitSet.  An alternative would be to get rid of the concept
of latches other than MyLatch, and change the function to
WaitMyLatch().  A similar thing happens for exit_on_postmaster_death,
for which I didn't want to have a separate WES, so I just set that
flag every time.  Thoughts?
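
To illustrate the shape of that (sketch only; the position constant
and the details are mine, not the patch text):

/*
 * Sketch of the approach described above.  LatchWaitSetLatchPos is
 * the (assumed) position of the WL_LATCH_SET event in CommonWaitSet.
 */
int
WaitLatch(Latch *latch, int wakeEvents, long timeout,
          uint32 wait_event_info)
{
    WaitEvent   event;

    /* Pointing the set at a different latch is pure bookkeeping. */
    ModifyWaitEvent(CommonWaitSet, LatchWaitSetLatchPos,
                    WL_LATCH_SET, latch);

    /* Likewise exit_on_postmaster_death is just a flag, set every time. */
    CommonWaitSet->exit_on_postmaster_death =
        ((wakeEvents & WL_EXIT_ON_PM_DEATH) != 0);

    if (WaitEventSetWait(CommonWaitSet,
                         (wakeEvents & WL_TIMEOUT) ? timeout : -1,
                         &event, 1, wait_event_info) == 0)
        return WL_TIMEOUT;

    return event.events;
}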

0003: "Use regular WaitLatch() for condition variables."

That mechanism doesn't need its own WES anymore.

0004: "Introduce RemoveWaitEvent()."

We'll need that to be able to handle connections that are closed and
reopened under the covers by libpq (replication, postgres_fdw).  This
has also come up in a couple of other threads about multiplexing
FDWs, where we'd want to adjust the wait set dynamically for a
proposed async Append feature.

The implementation is a little naive, and leaves holes in the
"pollfds" and "handles" arrays (for poll() and win32 implementations).
That could be improved with a bit more footwork in a later patch.
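
For illustration, the interface and the naive hole-leaving strategy
might look something like this (the signature follows the description
above; the internals are assumptions, not the patch):

void
RemoveWaitEvent(WaitEventSet *set, int pos, bool fd_closed)
{
#if defined(WAIT_USE_EPOLL)
    /* A closed fd has already vanished from the kernel's epoll set,
     * so only deregister explicitly if it is still open. */
    if (!fd_closed)
        epoll_ctl(set->epoll_fd, EPOLL_CTL_DEL, set->events[pos].fd, NULL);
#elif defined(WAIT_USE_POLL)
    /* Naive variant: leave a hole; poll() ignores negative fds. */
    set->pollfds[pos].fd = -1;
#endif
    set->events[pos].events = 0;        /* mark the slot unused */
}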

XXX The Windows version is based on reading documentation.  I'd be
very interested to know if check-world passes (especially
contrib/postgres_fdw and src/test/recovery).  Unfortunately my
appveyor.yml fu is not yet strong enough.

0005: "libpq: Add PQsocketChangeCount to advertise socket changes."

To support a long lived WES, libpq needs a way to tell us when the
socket changes underneath our feet.  This is the simplest thing I
could think of; better ideas welcome.
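
Caller-side, the intended usage would be something like this (sketch
only; apart from libpq's PQsocket() and the proposed
PQsocketChangeCount(), the names here are made up):

static int  sock_pos = -1;          /* socket's position in our WES */
static int  last_change_count = 0;

static void
maybe_refresh_socket(WaitEventSet *wes, PGconn *conn)
{
    if (PQsocketChangeCount(conn) != last_change_count)
    {
        /* libpq already closed the old socket, hence fd_closed = true. */
        if (sock_pos >= 0)
            RemoveWaitEvent(wes, sock_pos, true);
        sock_pos = AddWaitEventToSet(wes, WL_SOCKET_READABLE,
                                     PQsocket(conn), NULL, NULL);
        last_change_count = PQsocketChangeCount(conn);
    }
}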

0006: "Reuse a WaitEventSet in libpqwalreceiver.c."

Rather than having all users of libpqwalreceiver.c deal with the
complicated details of wait set management, have libpqwalreceiver
expose a waiting interface that understands socket changes.

Unfortunately, I couldn't figure out how to use CommonWaitSet for this
(ie adding and removing sockets to that as required), due to
complications with the bookkeeping required to provide the fd_closed
flag to RemoveWaitEvent().  So it creates its own internal long lived
WaitEventSet.
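
The exposed interface might look roughly like this (the function
name, the wes member, and the reuse of the helper sketched under 0005
are all illustrative assumptions):

static int
libpqrcv_wait(WalReceiverConn *conn, long timeout, uint32 wait_event_info)
{
    WaitEvent   event;

    /* Re-register the socket if libpq reconnected under the covers. */
    maybe_refresh_socket(conn->wes, conn->streamConn);

    if (WaitEventSetWait(conn->wes, timeout, &event, 1,
                         wait_event_info) == 0)
        return WL_TIMEOUT;
    return event.events;
}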

0007: "Use a WaitEventSet for postgres_fdw."

Create a single WaitEventSet and use it for all FDW connections.  By
having our own dedicated WES, we can do the bookkeeping required to
know when sockets have been closed or need to be removed from kernel
wait sets explicitly (which would be much harder to achieve if
CommonWaitSet were used like that; you need to know when sockets are
closed by other code, so you can provide fd_closed to
RemoveWaitEvent()).

Concretely, if you use just one postgres_fdw connection, you'll see
just epoll_wait()/kevent() calls for waits, but whenever you switch
between different connections, you'll see eg EPOLL_CTL_DEL/EV_DELETE
followed by EPOLL_CTL_ADD/EV_ADD when the set is adjusted (in the
kqueue implementation these could be collapsed into the following
wait, but I haven't done the work for that).  An alternative would be
to have one WES per FDW connection, but that seemed wasteful of file
descriptors.
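
That switching pattern boils down to bookkeeping along these lines
(sketch only; all names here are illustrative):

/* At most one postgres_fdw socket is registered in the shared set at
 * a time, so switching connections costs one delete plus one add
 * against the kernel wait set, and plain waits cost nothing extra. */
static WaitEventSet *FdwWaitSet;        /* created once, then reused */
static PGconn       *registered_conn;   /* whose socket is in the set */
static int           registered_pos = -1;

static void
fdw_wait_for_input(PGconn *conn)
{
    WaitEvent   event;

    if (conn != registered_conn)
    {
        if (registered_pos >= 0)        /* EPOLL_CTL_DEL / EV_DELETE */
            RemoveWaitEvent(FdwWaitSet, registered_pos, false);
                                        /* EPOLL_CTL_ADD / EV_ADD */
        registered_pos = AddWaitEventToSet(FdwWaitSet, WL_SOCKET_READABLE,
                                           PQsocket(conn), NULL, NULL);
        registered_conn = conn;
    }
    (void) WaitEventSetWait(FdwWaitSet, -1, &event, 1, PG_WAIT_EXTENSION);
}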

0008: "Use WL_EXIT_ON_PM_DEATH in FeBeWaitSet."

The FATAL message you get if you happen to be waiting for IO rather
than waiting somewhere else seems arbitrarily different.  Switching
to a standard automatic exit opens up the possibility of using
FeBeWaitSet in a couple more places that would otherwise need to
create their own WES (see also [1]).  Thoughts?
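
For reference, the change amounts to swapping the event flag used
when FeBeWaitSet is built (illustration, not the actual patch hunk):

/* Before: each caller must notice WL_POSTMASTER_DEATH in the
 * returned events and raise its own (inconsistent) FATAL error. */
AddWaitEventToSet(FeBeWaitSet, WL_POSTMASTER_DEATH, PGINVALID_SOCKET,
                  NULL, NULL);

/* After: WaitEventSetWait() itself calls proc_exit(1) on postmaster
 * death, so callers need no special handling. */
AddWaitEventToSet(FeBeWaitSet, WL_EXIT_ON_PM_DEATH, PGINVALID_SOCKET,
                  NULL, NULL);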

0009: "Use FeBeWaitSet for walsender.c."

Enabled by 0008.

0010: "Introduce a WaitEventSet for the stats collector."

[1] https://www.postgresql.org/message-id/flat/CA%2BhUKGK%3Dm9dLrq42oWQ4XfK9iDjGiZVwpQ1HkHrAPfG7Kh681g%40mail.gmail.com
