Thread: modifying WaitEventSets (was: Performance degradation in commit ac1d794)

modifying WaitEventSets (was: Performance degradation in commit ac1d794)

From
Robert Haas
Date:
On Sun, Mar 20, 2016 at 8:31 AM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:
> One thing that I want to do but can't with this interface is remove an
> fd from the set.  I can AddWaitEventToSet returning a position, and I
> can ModifyWait to provide new event mask by position including zero
> mask, I can't actually remove the fd (for example to avoid future
> error events that can't be masked, or to allow that fd to be closed
> and perhaps allow that fd number to coincidentally be readded later,
> and just generally to free the slot).  There is an underlying way to
> remove an fd from a set with poll (sort of), epoll, kqueue.  (Not sure
> about Windows.  But surely...).  I wonder if there should be
> RemoveWaitEventFromSet(set, position) which recycles event slots,
> sticking them on a freelist (and setting corresponding pollfd structs'
> fd to -1).

In practice, you're not likely to want to completely forget about a
file descriptor and wait for some totally new file descriptor, or at
least not very often.  You're much more likely to have a set of fds
that you care about, but the events that you care about for each one
vary over time, and it may often by the empty set for some of those
fds.  For example, in the case of Append -> {Foreign Scan x many}, you
will have an fd for each foreign scan.  When you dispatch a query to a
particular server, you're now interested in whether that fd is
ready-ready (or, if SSL is involved, maybe sometimes you care about
write-ready instead) until you get back all the query results from
that server.  At that point, you no longer care about events on that
fd until you dispatch the next query - and then you again do.

So what's the best API for that?  One idea is to change
ModifyWaitEvent to accept events = 0 instead of failing an assertion
inside WaitEventAdjustEpoll.  We don't want to wait for EPOLLERR |
EPOLLHUP in that case since we'd have to wait event to return if one
of those things occurred.  It would be easy enough to rejigger that
code so that we pass epoll_ev.events as 0 in that case, but I think it
doesn't help: epoll_ctl(2) says epoll_wait(2) always waits for those
events.  Instead I think we'd have to use EPOLL_CTL_DEL in that case,
which is a problem: when the user next calls ModifyWaitEvent, we would
need to use EPOLL_CTL_ADD rather than EPOLL_CTL_MOD, but how would we
know that?  We'd have to add state for that somewhere.

Alternatively, we could implement RemoveWaitEventFromSet(set,
position), as you propose.  That, too, needs some kind of additional
state, because we've got to track which positions are unused.  I'd be
inclined to guess that a bitmap would be better than a linked list;
for the storage space of one pointer, we could handle all
WaitEventSets with nevents <= 64, which is hard to beat.

Yet another idea is to have a new event WL_SOCKET_ERROR which waits
for only EPOLLERR | EPOLLHUP.  So, if you don't care about the socket
being readable or writeable at the moment, but you still vaguely care
about the file descriptor, you can do ModifyWaitEvent(set, pos,
WL_SOCKET_ERROR, NULL).  That's actually kind of nice, because now you
can still detect error/eof conditions on sockets that you are not
otherwise waiting for, and report the errors promptly.  At the moment,
I'm inclined to think this might be the best option...

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Hi,

On 2016-05-03 23:09:28 -0400, Robert Haas wrote:
> So what's the best API for that?  One idea is to change
> ModifyWaitEvent to accept events = 0 instead of failing an assertion
> inside WaitEventAdjustEpoll.  We don't want to wait for EPOLLERR |
> EPOLLHUP in that case since we'd have to wait event to return if one
> of those things occurred.  It would be easy enough to rejigger that
> code so that we pass epoll_ev.events as 0 in that case, but I think it
> doesn't help: epoll_ctl(2) says epoll_wait(2) always waits for those
> events.  Instead I think we'd have to use EPOLL_CTL_DEL in that case,
> which is a problem: when the user next calls ModifyWaitEvent, we would
> need to use EPOLL_CTL_ADD rather than EPOLL_CTL_MOD, but how would we
> know that?  We'd have to add state for that somewhere.

Right.


> Yet another idea is to have a new event WL_SOCKET_ERROR which waits
> for only EPOLLERR | EPOLLHUP.  So, if you don't care about the socket
> being readable or writeable at the moment, but you still vaguely care
> about the file descriptor, you can do ModifyWaitEvent(set, pos,
> WL_SOCKET_ERROR, NULL).  That's actually kind of nice, because now you
> can still detect error/eof conditions on sockets that you are not
> otherwise waiting for, and report the errors promptly.  At the moment,
> I'm inclined to think this might be the best option...

I generally agree that this is the best option, but there's a noticeable
catch: I'm afraid select() doesn't really generally support this kind of
operation; all the other providers do.

But after trawling the buildfarm logs, that isn't necessarily a problem:
It appears that we haven't had a single !windows animal reporting a
successfull configure run without HAVE_POLL support this year going by
the buildfarm status logs. Running for a longer period now, but it'll
take a long while.

It appears that OSX with 10.3 was one of the latest to introduce poll()
support.

Given that poll() has been introduced in SRV3 - which IIRC was below our
usual baseline - and windows is not an issue for latch, I think it'd
be ok to rely on it.

Greetings,

Andres Freund



Andres Freund <andres@anarazel.de> writes:
> Given that poll() has been introduced in SRV3 - which IIRC was below our
> usual baseline - and windows is not an issue for latch, I think it'd
> be ok to rely on it.

I think it's entirely reasonable to say that "if you want high performance
you should have poll(3)".  Failing to build without it would be a harder
sell, probably.
        regards, tom lane



Re: modifying WaitEventSets (was: Performance degradation in commit ac1d794)

From
Merlin Moncure
Date:
On Wed, May 4, 2016 at 2:31 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Andres Freund <andres@anarazel.de> writes:
>> Given that poll() has been introduced in SRV3 - which IIRC was below our
>> usual baseline - and windows is not an issue for latch, I think it'd
>> be ok to rely on it.
>
> I think it's entirely reasonable to say that "if you want high performance
> you should have poll(3)".  Failing to build without it would be a harder
> sell, probably.

There are some decent cross platform libraries that expose high
performance polling.  In particular, libev.
http://software.schmorp.de/pkg/libev.html.  I would definitely look
there before contemplating direct epoll calls.  (note, aside from
seeing epoll, I haven't been following this thread in detail).

merlin



I wrote:
> Andres Freund <andres@anarazel.de> writes:
>> Given that poll() has been introduced in SRV3 - which IIRC was below our
>> usual baseline - and windows is not an issue for latch, I think it'd
>> be ok to rely on it.

> I think it's entirely reasonable to say that "if you want high performance
> you should have poll(3)".  Failing to build without it would be a harder
> sell, probably.

Hmm ... wait, I take that back.  poll() is required by SUS v2, which has
been our minimum baseline spec for a long time (even my pet dinosaur HPUX
has it).  As long as we have an answer for Windows, it's hard to argue
we can't require poll() elsewhere.
        regards, tom lane



On Wed, May 4, 2016 at 3:35 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I wrote:
>> Andres Freund <andres@anarazel.de> writes:
>>> Given that poll() has been introduced in SRV3 - which IIRC was below our
>>> usual baseline - and windows is not an issue for latch, I think it'd
>>> be ok to rely on it.
>
>> I think it's entirely reasonable to say that "if you want high performance
>> you should have poll(3)".  Failing to build without it would be a harder
>> sell, probably.
>
> Hmm ... wait, I take that back.  poll() is required by SUS v2, which has
> been our minimum baseline spec for a long time (even my pet dinosaur HPUX
> has it).  As long as we have an answer for Windows, it's hard to argue
> we can't require poll() elsewhere.

I don't think we'd necessarily need to completely de-support people
who still depend on select().  We'd just need to say, well,
WL_SOCKET_ERROR *may* report exceptional events on the socket, or it
may not, depending on how modern your platform is.  In the use cases I
foresee, that would occasionally result in less-timely detection of
FDW connection loss, but nothing worse.  I'm not prepared to get very
excited about that.

But if we are confident that everything supports poll() and it's
always better than select(), another, possibly superior option is to
remove support for select() and see if anything breaks.  If not, then
we only need to support three platform-specific implementations
instead of four, which I would find it difficult to complain about.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Robert Haas <robertmhaas@gmail.com> writes:
> On Wed, May 4, 2016 at 3:35 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Hmm ... wait, I take that back.  poll() is required by SUS v2, which has
>> been our minimum baseline spec for a long time (even my pet dinosaur HPUX
>> has it).  As long as we have an answer for Windows, it's hard to argue
>> we can't require poll() elsewhere.

> I don't think we'd necessarily need to completely de-support people
> who still depend on select().  We'd just need to say, well,
> WL_SOCKET_ERROR *may* report exceptional events on the socket, or it
> may not, depending on how modern your platform is.  In the use cases I
> foresee, that would occasionally result in less-timely detection of
> FDW connection loss, but nothing worse.  I'm not prepared to get very
> excited about that.

I'm not either, but ...

> But if we are confident that everything supports poll() and it's
> always better than select(), another, possibly superior option is to
> remove support for select() and see if anything breaks.  If not, then
> we only need to support three platform-specific implementations
> instead of four, which I would find it difficult to complain about.

... the evidence available suggests that the select() code path has
probably received zero buildfarm testing.  Do we really want to ship
a fourth implementation that we can't even vouch for?
        regards, tom lane



On Wed, May 4, 2016 at 3:54 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> On Wed, May 4, 2016 at 3:35 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> Hmm ... wait, I take that back.  poll() is required by SUS v2, which has
>>> been our minimum baseline spec for a long time (even my pet dinosaur HPUX
>>> has it).  As long as we have an answer for Windows, it's hard to argue
>>> we can't require poll() elsewhere.
>
>> I don't think we'd necessarily need to completely de-support people
>> who still depend on select().  We'd just need to say, well,
>> WL_SOCKET_ERROR *may* report exceptional events on the socket, or it
>> may not, depending on how modern your platform is.  In the use cases I
>> foresee, that would occasionally result in less-timely detection of
>> FDW connection loss, but nothing worse.  I'm not prepared to get very
>> excited about that.
>
> I'm not either, but ...
>
>> But if we are confident that everything supports poll() and it's
>> always better than select(), another, possibly superior option is to
>> remove support for select() and see if anything breaks.  If not, then
>> we only need to support three platform-specific implementations
>> instead of four, which I would find it difficult to complain about.
>
> ... the evidence available suggests that the select() code path has
> probably received zero buildfarm testing.  Do we really want to ship
> a fourth implementation that we can't even vouch for?

I'm more than happy to rip it out, either now or after the tree opens
for 9.7 development.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Hi,

On 2016-05-04 15:54:32 -0400, Tom Lane wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
> > On Wed, May 4, 2016 at 3:35 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >> Hmm ... wait, I take that back.  poll() is required by SUS v2, which has
> >> been our minimum baseline spec for a long time (even my pet dinosaur HPUX
> >> has it).  As long as we have an answer for Windows, it's hard to argue
> >> we can't require poll() elsewhere.

Yea, I think that's the case. I just ran the query for a longer period,
and there's not been any report since 2014 of an !windows animal without
poll support.


> > I don't think we'd necessarily need to completely de-support people
> > who still depend on select().  We'd just need to say, well,
> > WL_SOCKET_ERROR *may* report exceptional events on the socket, or it
> > may not, depending on how modern your platform is.  In the use cases I
> > foresee, that would occasionally result in less-timely detection of
> > FDW connection loss, but nothing worse.  I'm not prepared to get very
> > excited about that.
> 
> I'm not either, but ...

I'm not worried about slightly degraded capabilities there either, more
about the (lack of) ability to test that reasonably.


> > But if we are confident that everything supports poll() and it's
> > always better than select(), another, possibly superior option is to
> > remove support for select() and see if anything breaks.  If not, then
> > we only need to support three platform-specific implementations
> > instead of four, which I would find it difficult to complain about.
> 
> ... the evidence available suggests that the select() code path has
> probably received zero buildfarm testing.  Do we really want to ship
> a fourth implementation that we can't even vouch for?

I mean I added code to make it easier to test the select path in 9.6,
but yea, outside of my machines running the latest linux kernel there
is, that code path appears to have had barely any testing over the last
few years.


Andres



On 2016-05-04 16:05:04 -0400, Robert Haas wrote:
> I'm more than happy to rip it out, either now or after the tree opens
> for 9.7 development.

Let's rip the select support out in 9.7 then; given the relevant code
was already written and tested there's no hurry.  But if you'd rather do
so earlier (as in *now*) I'm not going to protest either.