Thread: kqueue

kqueue

From
Thomas Munro
Date:
Hi,

On the WaitEventSet thread I posted a small patch to add kqueue
support[1].  Since then I peeked at how some other software[2]
interacts with kqueue and discovered that there are platforms
including NetBSD where kevent.udata is an intptr_t instead of a void
*.  Here's a version which should compile there.  Would any NetBSD
user be interested in testing this?  (An alternative would be to make
configure test for this with some kind of AC_COMPILE_IFELSE
incantation, but the steamroller cast is simpler.)

[1] http://www.postgresql.org/message-id/CAEepm=1dZ_mC+V3YtB79zf27280nign8MKOLxy2FKhvc1RzN=g@mail.gmail.com
[2] https://github.com/libevent/libevent/commit/5602e451ce872d7d60c640590113c5a81c3fc389
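
For anyone curious, here is a minimal sketch of the kind of cast involved
(not the patch itself; WaitEvent and the macro name are illustrative), so
that a pointer can be stored in kevent.udata whether the platform declares
that member as void * or as intptr_t:

    #include <sys/types.h>
    #include <sys/event.h>

    typedef struct WaitEvent WaitEvent;     /* stand-in for the real struct */

    /*
     * Treat udata as a pointer regardless of whether the platform declares it
     * as "void *" (FreeBSD, OpenBSD, macOS) or "intptr_t" (NetBSD).  The
     * lvalue cast assumes both have the same size and representation.
     */
    #define AccessWaitEvent(k_ev)  (*((WaitEvent **) &(k_ev)->udata))

    static void
    add_read_event(struct kevent *k_ev, int fd, WaitEvent *event)
    {
        EV_SET(k_ev, fd, EVFILT_READ, EV_ADD, 0, 0, 0);
        AccessWaitEvent(k_ev) = event;      /* store our pointer portably */
    }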

--
Thomas Munro
http://www.enterprisedb.com

Attachment

Re: kqueue

From
Robert Haas
Date:
On Tue, Mar 29, 2016 at 7:53 PM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:
> On the WaitEventSet thread I posted a small patch to add kqueue
> support[1].  Since then I peeked at how some other software[2]
> interacts with kqueue and discovered that there are platforms
> including NetBSD where kevent.udata is an intptr_t instead of a void
> *.  Here's a version which should compile there.  Would any NetBSD
> user be interested in testing this?  (An alternative would be to make
> configure to test for this with some kind of AC_COMPILE_IFELSE
> incantation but the steamroller cast is simpler.)

Did you code this up blind or do you have a NetBSD machine yourself?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: kqueue

From
Andres Freund
Date:
On 2016-04-21 14:15:53 -0400, Robert Haas wrote:
> On Tue, Mar 29, 2016 at 7:53 PM, Thomas Munro
> <thomas.munro@enterprisedb.com> wrote:
> > On the WaitEventSet thread I posted a small patch to add kqueue
> > support[1].  Since then I peeked at how some other software[2]
> > interacts with kqueue and discovered that there are platforms
> > including NetBSD where kevent.udata is an intptr_t instead of a void
> > *.  Here's a version which should compile there.  Would any NetBSD
> > user be interested in testing this?  (An alternative would be to make
> > configure to test for this with some kind of AC_COMPILE_IFELSE
> > incantation but the steamroller cast is simpler.)
> 
> Did you code this up blind or do you have a NetBSD machine yourself?

RMT, what do you think, should we try to get this into 9.6?  It's
plausible that the performance problem 98a64d0bd713c addressed is also
present on FreeBSD/NetBSD.

- Andres



Re: kqueue

From
Robert Haas
Date:
On Thu, Apr 21, 2016 at 2:22 PM, Andres Freund <andres@anarazel.de> wrote:
> On 2016-04-21 14:15:53 -0400, Robert Haas wrote:
>> On Tue, Mar 29, 2016 at 7:53 PM, Thomas Munro
>> <thomas.munro@enterprisedb.com> wrote:
>> > On the WaitEventSet thread I posted a small patch to add kqueue
>> > support[1].  Since then I peeked at how some other software[2]
>> > interacts with kqueue and discovered that there are platforms
>> > including NetBSD where kevent.udata is an intptr_t instead of a void
>> > *.  Here's a version which should compile there.  Would any NetBSD
>> > user be interested in testing this?  (An alternative would be to make
>> > configure to test for this with some kind of AC_COMPILE_IFELSE
>> > incantation but the steamroller cast is simpler.)
>>
>> Did you code this up blind or do you have a NetBSD machine yourself?
>
> RMT, what do you think, should we try to get this into 9.6? It's
> feasible that the performance problem 98a64d0bd713c addressed is also
> present on free/netbsd.

My personal opinion is that it would be a reasonable thing to do if
somebody can demonstrate that it actually solves a real problem.
Absent that, I don't think we should rush it in.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: kqueue

From
Alvaro Herrera
Date:
Robert Haas wrote:
> On Thu, Apr 21, 2016 at 2:22 PM, Andres Freund <andres@anarazel.de> wrote:
> > On 2016-04-21 14:15:53 -0400, Robert Haas wrote:
> >> On Tue, Mar 29, 2016 at 7:53 PM, Thomas Munro
> >> <thomas.munro@enterprisedb.com> wrote:
> >> > On the WaitEventSet thread I posted a small patch to add kqueue
> >> > support[1].  Since then I peeked at how some other software[2]
> >> > interacts with kqueue and discovered that there are platforms
> >> > including NetBSD where kevent.udata is an intptr_t instead of a void
> >> > *.  Here's a version which should compile there.  Would any NetBSD
> >> > user be interested in testing this?  (An alternative would be to make
> >> > configure to test for this with some kind of AC_COMPILE_IFELSE
> >> > incantation but the steamroller cast is simpler.)
> >>
> >> Did you code this up blind or do you have a NetBSD machine yourself?
> >
> > RMT, what do you think, should we try to get this into 9.6? It's
> > feasible that the performance problem 98a64d0bd713c addressed is also
> > present on free/netbsd.
> 
> My personal opinion is that it would be a reasonable thing to do if
> somebody can demonstrate that it actually solves a real problem.
> Absent that, I don't think we should rush it in.

My first question is whether there are platforms that use kqueue on
which the WaitEventSet stuff proves to be a bottleneck.  I vaguely
recall that MacOS X in particular doesn't scale terribly well for other
reasons, and I don't know if anybody runs *BSD in large machines.

On the other hand, there's plenty of hackers running their laptops on
MacOS X these days, so presumably any platform dependent problem would
be discovered quickly enough.  As for NetBSD, it seems mostly a fringe
platform, doesn't it?  We would discover serious dependency problems
quickly enough on the buildfarm ... except that the only netbsd
buildfarm member hasn't reported in over two weeks.

Am I mistaken in any of these points?

(Our coverage of the BSD platforms leaves much to be desired FWIW.)

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: kqueue

From
Robert Haas
Date:
On Thu, Apr 21, 2016 at 3:31 PM, Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:
> Robert Haas wrote:
>> On Thu, Apr 21, 2016 at 2:22 PM, Andres Freund <andres@anarazel.de> wrote:
>> > On 2016-04-21 14:15:53 -0400, Robert Haas wrote:
>> >> On Tue, Mar 29, 2016 at 7:53 PM, Thomas Munro
>> >> <thomas.munro@enterprisedb.com> wrote:
>> >> > On the WaitEventSet thread I posted a small patch to add kqueue
>> >> > support[1].  Since then I peeked at how some other software[2]
>> >> > interacts with kqueue and discovered that there are platforms
>> >> > including NetBSD where kevent.udata is an intptr_t instead of a void
>> >> > *.  Here's a version which should compile there.  Would any NetBSD
>> >> > user be interested in testing this?  (An alternative would be to make
>> >> > configure to test for this with some kind of AC_COMPILE_IFELSE
>> >> > incantation but the steamroller cast is simpler.)
>> >>
>> >> Did you code this up blind or do you have a NetBSD machine yourself?
>> >
>> > RMT, what do you think, should we try to get this into 9.6? It's
>> > feasible that the performance problem 98a64d0bd713c addressed is also
>> > present on free/netbsd.
>>
>> My personal opinion is that it would be a reasonable thing to do if
>> somebody can demonstrate that it actually solves a real problem.
>> Absent that, I don't think we should rush it in.
>
> My first question is whether there are platforms that use kqueue on
> which the WaitEventSet stuff proves to be a bottleneck.  I vaguely
> recall that MacOS X in particular doesn't scale terribly well for other
> reasons, and I don't know if anybody runs *BSD in large machines.
>
> On the other hand, there's plenty of hackers running their laptops on
> MacOS X these days, so presumably any platform dependent problem would
> be discovered quickly enough.  As for NetBSD, it seems mostly a fringe
> platform, doesn't it?  We would discover serious dependency problems
> quickly enough on the buildfarm ... except that the only netbsd
> buildfarm member hasn't reported in over two weeks.
>
> Am I mistaken in any of these points?
>
> (Our coverage of the BSD platforms leaves much to be desired FWIW.)

My impression is that the Linux problem only manifested itself on
large machines.  I might be wrong about that.  But if that's true,
then we might not see regressions on other platforms just because
people aren't running those operating systems on big enough hardware.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: kqueue

From
Andres Freund
Date:
On 2016-04-21 14:25:06 -0400, Robert Haas wrote:
> On Thu, Apr 21, 2016 at 2:22 PM, Andres Freund <andres@anarazel.de> wrote:
> > On 2016-04-21 14:15:53 -0400, Robert Haas wrote:
> >> On Tue, Mar 29, 2016 at 7:53 PM, Thomas Munro
> >> <thomas.munro@enterprisedb.com> wrote:
> >> > On the WaitEventSet thread I posted a small patch to add kqueue
> >> > support[1].  Since then I peeked at how some other software[2]
> >> > interacts with kqueue and discovered that there are platforms
> >> > including NetBSD where kevent.udata is an intptr_t instead of a void
> >> > *.  Here's a version which should compile there.  Would any NetBSD
> >> > user be interested in testing this?  (An alternative would be to make
> >> > configure to test for this with some kind of AC_COMPILE_IFELSE
> >> > incantation but the steamroller cast is simpler.)
> >>
> >> Did you code this up blind or do you have a NetBSD machine yourself?
> >
> > RMT, what do you think, should we try to get this into 9.6? It's
> > feasible that the performance problem 98a64d0bd713c addressed is also
> > present on free/netbsd.
> 
> My personal opinion is that it would be a reasonable thing to do if
> somebody can demonstrate that it actually solves a real problem.
> Absent that, I don't think we should rush it in.

On linux you needed a 2 socket machine to demonstrate the problem, but
both old ones (my 2009 workstation) and new ones were sufficient. I'd be
surprised if the situation on freebsd is any better, except that you
might hit another scalability bottleneck earlier.

I doubt there are many real Postgres instances operating on bigger
hardware on FreeBSD with sufficient throughput to show the problem.  So
I think the argument for including it is more along the lines of trying
to be "nice" to more niche OSs.

I really don't have any opinion either way.

- Andres



Re: kqueue

From
Thomas Munro
Date:
On Fri, Apr 22, 2016 at 12:21 PM, Andres Freund <andres@anarazel.de> wrote:
> On 2016-04-21 14:25:06 -0400, Robert Haas wrote:
>> On Thu, Apr 21, 2016 at 2:22 PM, Andres Freund <andres@anarazel.de> wrote:
>> > On 2016-04-21 14:15:53 -0400, Robert Haas wrote:
>> >> On Tue, Mar 29, 2016 at 7:53 PM, Thomas Munro
>> >> <thomas.munro@enterprisedb.com> wrote:
>> >> > On the WaitEventSet thread I posted a small patch to add kqueue
>> >> > support[1].  Since then I peeked at how some other software[2]
>> >> > interacts with kqueue and discovered that there are platforms
>> >> > including NetBSD where kevent.udata is an intptr_t instead of a void
>> >> > *.  Here's a version which should compile there.  Would any NetBSD
>> >> > user be interested in testing this?  (An alternative would be to make
>> >> > configure to test for this with some kind of AC_COMPILE_IFELSE
>> >> > incantation but the steamroller cast is simpler.)
>> >>
>> >> Did you code this up blind or do you have a NetBSD machine yourself?
>> >
>> > RMT, what do you think, should we try to get this into 9.6? It's
>> > feasible that the performance problem 98a64d0bd713c addressed is also
>> > present on free/netbsd.
>>
>> My personal opinion is that it would be a reasonable thing to do if
>> somebody can demonstrate that it actually solves a real problem.
>> Absent that, I don't think we should rush it in.
>
> On linux you needed a 2 socket machine to demonstrate the problem, but
> both old ones (my 2009 workstation) and new ones were sufficient. I'd be
> surprised if the situation on freebsd is any better, except that you
> might hit another scalability bottleneck earlier.
>
> I doubt there's many real postgres instances operating on bigger
> hardware on freebsd, with sufficient throughput to show the problem. So
> I think the argument for including is more along trying to be "nice" to
> more niche-y OSs.

What has BSD ever done for us?!  (Joke...)

I vote to leave this patch in the next commitfest where it is, and
reconsider if someone shows up with a relevant problem report on large
systems.  I can't see any measurable performance difference on a 4
core laptop running FreeBSD 10.3.  Maybe kqueue will make more
difference even on smaller systems in future releases if we start
using big wait sets for distributed/asynchronous work, in-core
pooling/admission control etc.

Here's a new version of the patch that fixes some stupid bugs.  I have
run regression tests and some basic sanity checks on OSX 10.11.4,
FreeBSD 10.3, NetBSD 7.0 and OpenBSD 5.8.  There is still room to make
an improvement that would drop the syscall from AddWaitEventToSet and
ModifyWaitEvent, compressing wait set modifications and waiting into a
single syscall (kqueue's claimed advantage over the competition).
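
For illustration, a minimal sketch of that idea (names and structure are
illustrative, not the patch's code): queue the pending changes locally and
hand them to the kernel together with the next wait, since kevent() accepts
a changelist and an eventlist in one call.

    #include <sys/types.h>
    #include <sys/event.h>
    #include <sys/time.h>

    #define MAX_CHANGES 64

    struct pending_changes
    {
        struct kevent changes[MAX_CHANGES];
        int           nchanges;
    };

    /* submit all queued modifications and wait, in a single syscall */
    static int
    wait_with_pending(int kqfd, struct pending_changes *pc,
                      struct kevent *events, int nevents,
                      const struct timespec *timeout)
    {
        int rc = kevent(kqfd, pc->changes, pc->nchanges,
                        events, nevents, timeout);

        if (rc >= 0)
            pc->nchanges = 0;       /* the changelist was consumed */
        return rc;
    }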

While doing that I discovered that unpatched master doesn't actually
build on recent NetBSD systems because our static function strtoi
clashes with a non-standard libc function of the same name[1] declared
in inttypes.h.  Maybe we should rename it, like in the attached?

[1] http://netbsd.gw.com/cgi-bin/man-cgi?strtoi++NetBSD-current
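
For reference, a self-contained sketch of the collision and the sort of
rename that avoids it (the helper body here is illustrative, not the exact
PostgreSQL code):

    #include <inttypes.h>   /* on NetBSD >= 7 this also declares strtoi(3) */
    #include <stdlib.h>
    #include <errno.h>

    /*
     * Defining "static int strtoi(...)" in this translation unit would now
     * clash with the libc declaration above, so give the helper another name.
     */
    static int
    strtoint(const char *nptr, char **endptr, int base)
    {
        long        val = strtol(nptr, endptr, base);

        if (val != (long) (int) val)
            errno = ERANGE;         /* flag values that don't fit in an int */
        return (int) val;
    }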

--
Thomas Munro
http://www.enterprisedb.com

Attachment

Re: kqueue

From
Andres Freund
Date:
On 2016-04-22 20:39:27 +1200, Thomas Munro wrote:
> I vote to leave this patch in the next commitfest where it is, and
> reconsider if someone shows up with a relevant problem report on large
> systems.

Sounds good!


> Here's a new version of the patch that fixes some stupid bugs.  I have
> run regression tests and some basic sanity checks on OSX 10.11.4,
> FreeBSD 10.3, NetBSD 7.0 and OpenBSD 5.8.  There is still room to make
> an improvement that would drop the syscall from AddWaitEventToSet and
> ModifyWaitEvent, compressing wait set modifications and waiting into a
> single syscall (kqueue's claimed advantage over the competition).

I find that not to be particularly interesting, and would rather want to
avoid adding complexity for it.


> While doing that I discovered that unpatched master doesn't actually
> build on recent NetBSD systems because our static function strtoi
> clashes with a non-standard libc function of the same name[1] declared
> in inttypes.h.  Maybe we should rename it, like in the attached?

Yuck. That's a new function they introduced? That code hasn't changed in
a while....

Andres



Re: kqueue

From
Thomas Munro
Date:
On Sat, Apr 23, 2016 at 4:36 AM, Andres Freund <andres@anarazel.de> wrote:
> On 2016-04-22 20:39:27 +1200, Thomas Munro wrote:
>> While doing that I discovered that unpatched master doesn't actually
>> build on recent NetBSD systems because our static function strtoi
>> clashes with a non-standard libc function of the same name[1] declared
>> in inttypes.h.  Maybe we should rename it, like in the attached?
>
> Yuck. That's a new function they introduced? That code hasn't changed in
> a while....

Yes, according to the man page it appeared in NetBSD 7.0.  That was
released in September 2015, and our buildfarm has only NetBSD 5.x
systems.  I see that the maintainers of the NetBSD pg package deal
with this with a preprocessor kludge:


http://cvsweb.netbsd.org/bsdweb.cgi/pkgsrc/databases/postgresql95/patches/patch-src_backend_utils_adt_datetime.c?rev=1.1

What is the policy for that kind of thing -- do nothing until someone
cares enough about the platform to supply a buildfarm animal?

-- 
Thomas Munro
http://www.enterprisedb.com



Re: kqueue

From
Alvaro Herrera
Date:
Thomas Munro wrote:
> On Sat, Apr 23, 2016 at 4:36 AM, Andres Freund <andres@anarazel.de> wrote:
> > On 2016-04-22 20:39:27 +1200, Thomas Munro wrote:
> >> While doing that I discovered that unpatched master doesn't actually
> >> build on recent NetBSD systems because our static function strtoi
> >> clashes with a non-standard libc function of the same name[1] declared
> >> in inttypes.h.  Maybe we should rename it, like in the attached?
> >
> > Yuck. That's a new function they introduced? That code hasn't changed in
> > a while....
> 
> Yes, according to the man page it appeared in NetBSD 7.0.  That was
> released in September 2015, and our buildfarm has only NetBSD 5.x
> systems.  I see that the maintainers of the NetBSD pg package deal
> with this with a preprocessor kludge:
> 
> http://cvsweb.netbsd.org/bsdweb.cgi/pkgsrc/databases/postgresql95/patches/patch-src_backend_utils_adt_datetime.c?rev=1.1
> 
> What is the policy for that kind of thing -- do nothing until someone
> cares enough about the platform to supply a buildfarm animal?

Well, if the platform is truly alive, we would have gotten complaints
already.  Since we haven't, maybe nobody cares, so why should we?  I
would rename our function nonetheless FWIW; the name seems far too
generic to me.  pg_strtoi?

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: kqueue

From
Andres Freund
Date:
On 2016-04-23 10:12:12 +1200, Thomas Munro wrote:
> What is the policy for that kind of thing -- do nothing until someone
> cares enough about the platform to supply a buildfarm animal?

I think we should fix it; I just want to make sure we understand why the
error is appearing now.  Since we now do...

- Andres



Re: kqueue

From
Andres Freund
Date:
On 2016-04-22 19:25:06 -0300, Alvaro Herrera wrote:
> Since we haven't, maybe nobody cares, so why should we?

I guess it's to a good degree because netbsd has pg packages, and it's
fixed there?

> would rename our function nonetheless FWIW; the name seems far too
> generic to me.

Yea.

> pg_strtoi?

I think that's what Thomas did upthread. Are you taking this one then?


Greetings,

Andres Freund



Re: kqueue

From
Tom Lane
Date:
Thomas Munro <thomas.munro@enterprisedb.com> writes:
> On Sat, Apr 23, 2016 at 4:36 AM, Andres Freund <andres@anarazel.de> wrote:
>> On 2016-04-22 20:39:27 +1200, Thomas Munro wrote:
>>> While doing that I discovered that unpatched master doesn't actually
>>> build on recent NetBSD systems because our static function strtoi
>>> clashes with a non-standard libc function of the same name[1] declared
>>> in inttypes.h.  Maybe we should rename it, like in the attached?

>> Yuck. That's a new function they introduced? That code hasn't changed in
>> a while....

> Yes, according to the man page it appeared in NetBSD 7.0.  That was
> released in September 2015, and our buildfarm has only NetBSD 5.x
> systems.  I see that the maintainers of the NetBSD pg package deal
> with this with a preprocessor kludge:

> http://cvsweb.netbsd.org/bsdweb.cgi/pkgsrc/databases/postgresql95/patches/patch-src_backend_utils_adt_datetime.c?rev=1.1

> What is the policy for that kind of thing -- do nothing until someone
> cares enough about the platform to supply a buildfarm animal?

There's no set policy, but certainly a promise to put up a buildfarm
animal would establish that somebody actually cares about keeping
Postgres running on the platform.  Without one, we might fix a specific
problem when reported, but we'd have no way to know about new problems.

Rooting through that patches directory reveals quite a number of
random-looking patches, most of which we certainly wouldn't take
without a lot more than zero explanation.  It's hard to tell which
are actually needed, but at least some don't seem to have anything
to do with building for NetBSD.
        regards, tom lane



Re: kqueue

From
Tom Lane
Date:
Andres Freund <andres@anarazel.de> writes:
>> pg_strtoi?

> I think that's what Thomas did upthread. Are you taking this one then?

I'd go with just "strtoint".  We have "strtoint64" elsewhere.
        regards, tom lane



Re: kqueue

From
Alvaro Herrera
Date:
Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> >> pg_strtoi?
> 
> > I think that's what Thomas did upthread. Are you taking this one then?
> 
> I'd go with just "strtoint".  We have "strtoint64" elsewhere.

For closure of this subthread: this rename was committed by Tom as
0ab3595e5bb5.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: kqueue

From
Thomas Munro
Date:
On Fri, Jun 3, 2016 at 4:02 AM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
> Tom Lane wrote:
>> Andres Freund <andres@anarazel.de> writes:
>> >> pg_strtoi?
>>
>> > I think that's what Thomas did upthread. Are you taking this one then?
>>
>> I'd go with just "strtoint".  We have "strtoint64" elsewhere.
>
> For closure of this subthread: this rename was committed by Tom as
> 0ab3595e5bb5.

Thanks.  And here is a new version of the kqueue patch.  The previous
version doesn't apply on top of recent commit
a3b30763cc8686f5b4cd121ef0bf510c1533ac22, which sprinkled some
MAXALIGN macros nearby.  I've now done the same thing with the kevent
struct because it's cheap, uniform with the other cases and could
matter on some platforms for the same reason.
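
To illustrate the alignment point with a simplified, self-contained sketch
(MAXALIGN here is a stand-in for the PostgreSQL macro, and the struct is not
the real WaitEventSet): when several arrays are carved out of one allocation,
each sub-block size is rounded up so the kevent array starts on a maximally
aligned boundary.

    #include <stdlib.h>
    #include <sys/types.h>
    #include <sys/event.h>

    #define MAXIMUM_ALIGNOF 8
    #define MAXALIGN(LEN) \
        (((size_t) (LEN) + (MAXIMUM_ALIGNOF - 1)) & ~((size_t) (MAXIMUM_ALIGNOF - 1)))

    struct wait_set
    {
        int            nevents;
        struct kevent *kevents;     /* points into the same allocation */
    };

    static struct wait_set *
    make_wait_set(int nevents)
    {
        size_t  sz = 0;
        char   *data;
        struct wait_set *set;

        sz += MAXALIGN(sizeof(struct wait_set));
        sz += MAXALIGN(sizeof(struct kevent) * nevents);

        data = calloc(1, sz);
        if (data == NULL)
            return NULL;

        set = (struct wait_set *) data;
        set->nevents = nevents;
        set->kevents = (struct kevent *) (data + MAXALIGN(sizeof(struct wait_set)));
        return set;
    }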

It's in the September commitfest here: https://commitfest.postgresql.org/10/597/

--
Thomas Munro
http://www.enterprisedb.com

Attachment

Re: kqueue

From
Marko Tiikkaja
Date:
On 2016-06-03 01:45, Thomas Munro wrote:
> On Fri, Jun 3, 2016 at 4:02 AM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
>> Tom Lane wrote:
>>> Andres Freund <andres@anarazel.de> writes:
>>>>> pg_strtoi?
>>>
>>>> I think that's what Thomas did upthread. Are you taking this one then?
>>>
>>> I'd go with just "strtoint".  We have "strtoint64" elsewhere.
>>
>> For closure of this subthread: this rename was committed by Tom as
>> 0ab3595e5bb5.
>
> Thanks.  And here is a new version of the kqueue patch.  The previous
> version doesn't apply on top of recent commit
> a3b30763cc8686f5b4cd121ef0bf510c1533ac22, which sprinkled some
> MAXALIGN macros nearby.  I've now done the same thing with the kevent
> struct because it's cheap, uniform with the other cases and could
> matter on some platforms for the same reason.

I've tested and reviewed this, and it looks good to me, other than this 
part:

+   /*
+    * kevent guarantees that the change list has been processed in the EINTR
+    * case.  Here we are only applying a change list so EINTR counts as
+    * success.
+    */

this doesn't seem to be guaranteed on old versions of FreeBSD or any 
other BSD flavors, so I don't think it's a good idea to bake the 
assumption into this code.  Or what do you think?


.m



Re: kqueue

From
Thomas Munro
Date:
On Wed, Sep 7, 2016 at 12:32 AM, Marko Tiikkaja <marko@joh.to> wrote:
> I've tested and reviewed this, and it looks good to me, other than this
> part:
>
> +   /*
> +    * kevent guarantees that the change list has been processed in the EINTR
> +    * case.  Here we are only applying a change list so EINTR counts as
> +    * success.
> +    */
>
> this doesn't seem to be guaranteed on old versions of FreeBSD or any other
> BSD flavors, so I don't think it's a good idea to bake the assumption into
> this code.  Or what do you think?

Thanks for the testing and review!

Hmm.  Well spotted.  I wrote that because the man page from FreeBSD 10.3 says:

  When kevent() call fails with EINTR error, all changes in the changelist
  have been applied.

This sentence is indeed missing from the OpenBSD, NetBSD and OSX man
pages.  It was introduced by FreeBSD commit r280818[1] which made
kevent a Pthread cancellation point.  I investigated whether it is
also true in older FreeBSD and the rest of the BSD family.  I believe
the answer is yes.

1.  That commit doesn't do anything that would change the situation:
it just adds thread cancellation wrapper code to libc and libthr which
exits under certain conditions but otherwise lets EINTR through to the
caller.  So I think the new sentence is documentation of the existing
behaviour of the syscall.

2.  I looked at the code in FreeBSD 4.1[2] (the original kqueue
implementation from which all others derive) and the four modern
OSes[3][4][5][6].  They vary a bit but in all cases, the first place
that can produce EINTR appears to be in kqueue_scan when the
(variously named) kernel sleep routine is invoked, which can return
EINTR or ERESTART  (later translated to EINTR because kevent doesn't
support restarting).  That comes after all changes have been applied.
In fact it's unreachable if nevents is 0: OSX doesn't call kqueue_scan
in that case, and the others return early from kqueue_scan in that
case.

3.  An old email[7] from Jonathan Lemon (creator of kqueue) seems to
support that at least in respect of ancient FreeBSD.  He wrote:
"Technically, an EINTR is returned when a signal interrupts the
process after it goes to sleep (that is, after it calls tsleep).  So
if (as an example) you call kevent() with a zero valued timespec,
you'll never get EINTR, since there's no possibility of it sleeping."

So if I've understood correctly, what I wrote in the v4 patch is
universally true, but it's also moot in this case: kevent cannot fail
with errno == EINTR because nevents == 0.  On that basis, here is a
new version with the comment and special case for EINTR removed.

[1] https://svnweb.freebsd.org/base?view=revision&revision=280818
[2] https://github.com/freebsd/freebsd/blob/release/4.1.0/sys/kern/kern_event.c
[3] https://github.com/freebsd/freebsd/blob/master/sys/kern/kern_event.c
[4] https://github.com/IIJ-NetBSD/netbsd-src/blob/master/sys/kern/kern_event.c
[5] https://github.com/openbsd/src/blob/master/sys/kern/kern_event.c
[6] https://github.com/opensource-apple/xnu/blob/master/bsd/kern/kern_event.c
[7] http://marc.info/?l=freebsd-arch&m=98147346707952&w=2
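
To make the conclusion concrete, a hedged sketch (illustrative code, not the
patch): when kevent() is used purely to apply a changelist, with nevents ==
0, the kernel never sleeps, so EINTR needs no special treatment.

    #include <sys/types.h>
    #include <sys/event.h>
    #include <sys/time.h>
    #include <stdio.h>
    #include <stdlib.h>

    static void
    apply_change(int kqfd, int fd, short filter, unsigned short flags)
    {
        struct kevent change;

        EV_SET(&change, fd, filter, flags, 0, 0, 0);

        /* nevents == 0: only the changelist is processed, no waiting occurs */
        if (kevent(kqfd, &change, 1, NULL, 0, NULL) < 0)
        {
            /* no EINTR special case needed here, per the analysis above */
            perror("kevent");
            exit(EXIT_FAILURE);
        }
    }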

--
Thomas Munro
http://www.enterprisedb.com

Attachment

Re: kqueue

From
Heikki Linnakangas
Date:
So, if I've understood correctly, the purpose of this patch is to 
improve performance on a multi-CPU system, which has the kqueue() 
function. Most notably, FreeBSD?

I launched a FreeBSD 10.3 instance on Amazon EC2 (ami-e0682b80), on a 
m4.10xlarge instance. That's a 40 core system, biggest available, I 
believe. I built PostgreSQL master on it, and ran pgbench to benchmark:

pgbench -i -s 200 postgres
pgbench -M prepared  -j 36 -c 36 -S postgres -T20 -P1

I set shared_buffers to 10 GB, so that the whole database fits in cache. 
I tested that with and without kqueue-v5.patch.

Result: I don't see any difference in performance. pgbench reports 
between 80,000 and 97,000 TPS, with or without the patch:

[ec2-user@ip-172-31-17-174 ~/postgresql]$ ~/pgsql-install/bin/pgbench -M 
prepared  -j 36 -c 36 -S postgres -T20 -P1
starting vacuum...end.
progress: 1.0 s, 94537.1 tps, lat 0.368 ms stddev 0.145
progress: 2.0 s, 96745.9 tps, lat 0.368 ms stddev 0.143
progress: 3.0 s, 93870.1 tps, lat 0.380 ms stddev 0.146
progress: 4.0 s, 89482.9 tps, lat 0.399 ms stddev 0.146
progress: 5.0 s, 87815.0 tps, lat 0.406 ms stddev 0.148
progress: 6.0 s, 86415.5 tps, lat 0.413 ms stddev 0.145
progress: 7.0 s, 86011.0 tps, lat 0.415 ms stddev 0.147
progress: 8.0 s, 84923.0 tps, lat 0.420 ms stddev 0.147
progress: 9.0 s, 84596.6 tps, lat 0.422 ms stddev 0.146
progress: 10.0 s, 84537.7 tps, lat 0.422 ms stddev 0.146
progress: 11.0 s, 83910.5 tps, lat 0.425 ms stddev 0.150
progress: 12.0 s, 83738.2 tps, lat 0.426 ms stddev 0.150
progress: 13.0 s, 83837.5 tps, lat 0.426 ms stddev 0.147
progress: 14.0 s, 83578.4 tps, lat 0.427 ms stddev 0.147
progress: 15.0 s, 83609.5 tps, lat 0.427 ms stddev 0.148
progress: 16.0 s, 83423.5 tps, lat 0.428 ms stddev 0.151
progress: 17.0 s, 83318.2 tps, lat 0.428 ms stddev 0.149
progress: 18.0 s, 82992.7 tps, lat 0.430 ms stddev 0.149
progress: 19.0 s, 83155.9 tps, lat 0.429 ms stddev 0.151
progress: 20.0 s, 83209.0 tps, lat 0.429 ms stddev 0.152
transaction type: <builtin: select only>
scaling factor: 200
query mode: prepared
number of clients: 36
number of threads: 36
duration: 20 s
number of transactions actually processed: 1723759
latency average = 0.413 ms
latency stddev = 0.149 ms
tps = 86124.484867 (including connections establishing)
tps = 86208.458034 (excluding connections establishing)


Is this test setup reasonable? I know very little about FreeBSD, I'm 
afraid, so I don't know how to profile or test that further than that.

If there's no measurable difference in performance between kqueue() and
poll(), I think we should forget about this. If there's a FreeBSD hacker
out there who can demonstrate better results, I'm all for committing
this, but I'm reluctant to add code if no one can show the benefit.

- Heikki




Re: kqueue

From
Tom Lane
Date:
Heikki Linnakangas <hlinnaka@iki.fi> writes:
> So, if I've understood correctly, the purpose of this patch is to 
> improve performance on a multi-CPU system, which has the kqueue() 
> function. Most notably, FreeBSD?

OS X also has this, so it might be worth trying on a multi-CPU Mac.

> If there's no measurable difference in performance, between kqueue() and 
> poll(), I think we should forget about this.

I agree that we shouldn't add this unless it's demonstrably a win.
No opinion on whether your test is adequate.
        regards, tom lane



Re: kqueue

From
Heikki Linnakangas
Date:
On 09/13/2016 04:33 PM, Tom Lane wrote:
> Heikki Linnakangas <hlinnaka@iki.fi> writes:
>> So, if I've understood correctly, the purpose of this patch is to
>> improve performance on a multi-CPU system, which has the kqueue()
>> function. Most notably, FreeBSD?
>
> OS X also has this, so it might be worth trying on a multi-CPU Mac.
>
>> If there's no measurable difference in performance, between kqueue() and
>> poll(), I think we should forget about this.
>
> I agree that we shouldn't add this unless it's demonstrably a win.
> No opinion on whether your test is adequate.

I'm marking this as "Returned with Feedback", waiting for someone to 
post test results that show a positive performance benefit from this.

- Heikki





Re: kqueue

From
Andres Freund
Date:
Hi,


On 2016-09-13 16:08:39 +0300, Heikki Linnakangas wrote:
> So, if I've understood correctly, the purpose of this patch is to improve
> performance on a multi-CPU system, which has the kqueue() function. Most
> notably, FreeBSD?

I think it's not necessarily about the current system, but more about
future uses of the WaitEventSet stuff. Some of that is going to use a
lot more sockets. E.g. doing a parallel append over FDWs.


> I launched a FreeBSD 10.3 instance on Amazon EC2 (ami-e0682b80), on a
> m4.10xlarge instance. That's a 40 core system, biggest available, I believe.
> I built PostgreSQL master on it, and ran pgbench to benchmark:
> 
> pgbench -i -s 200 postgres
> pgbench -M prepared  -j 36 -c 36 -S postgres -T20 -P1

This seems unlikely to exercise the relevant code path very often.  We
only do the poll()/epoll_wait()/... when a read() doesn't return
anything, and that seems likely to be rare here.  Using a lower thread
count and a much higher client count might change that.

Note that the case where poll vs. epoll made a large difference on Linux
(after the regression due to ac1d7945f86) was only on fairly large
machines with high client counts.
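
As a rough illustration of that point (a sketch under the assumption of a
non-blocking socket; not PostgreSQL's actual code), the wait primitive only
comes into play when a read finds no data:

    #include <errno.h>
    #include <sys/types.h>
    #include <unistd.h>

    static ssize_t
    read_or_wait(int fd, void *buf, size_t len, int (*wait_for_readable)(int))
    {
        for (;;)
        {
            ssize_t n = read(fd, buf, len);     /* fd assumed O_NONBLOCK */

            if (n >= 0)
                return n;                       /* fast path: data was ready */
            if (errno != EAGAIN && errno != EWOULDBLOCK)
                return -1;                      /* real error */

            /* slow path: nothing pending, block in poll/epoll/kqueue */
            if (wait_for_readable(fd) < 0)
                return -1;
        }
    }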

Greetings,

Andres Freund



Re: kqueue

From
Simon Riggs
Date:
On 13 September 2016 at 08:08, Heikki Linnakangas <hlinnaka@iki.fi> wrote:

> So, if I've understood correctly, the purpose of this patch is to improve
> performance on a multi-CPU system, which has the kqueue() function. Most
> notably, FreeBSD?

I'm getting a little fried from "self-documenting" patches, from
multiple sources.

I think we should make it a firm requirement to explain what a patch
is actually about, with extra points for including with it a test that
allows us to validate that. We don't have enough committer time to
waste on such things.

-- 
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: kqueue

From
Robert Haas
Date:
On Tue, Sep 13, 2016 at 11:36 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
> On 13 September 2016 at 08:08, Heikki Linnakangas <hlinnaka@iki.fi> wrote:
>> So, if I've understood correctly, the purpose of this patch is to improve
>> performance on a multi-CPU system, which has the kqueue() function. Most
>> notably, FreeBSD?
>
> I'm getting a little fried from "self-documenting" patches, from
> multiple sources.
>
> I think we should make it a firm requirement to explain what a patch
> is actually about, with extra points for including with it a test that
> allows us to validate that. We don't have enough committer time to
> waste on such things.

You've complained about this a whole bunch of times recently, but in
most of those cases I didn't think there was any real unclarity.  I
agree that it's a good idea for a patch to be submitted with suitable
submission notes, but it also isn't reasonable to expect those
submission notes to be reposted with every single version of every
patch.  Indeed, I'd find that pretty annoying.  Thomas linked back to
the previous thread where this was discussed, which seems more or less
sufficient.  If committers are too busy to click on links in the patch
submission emails, they have no business committing anything.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: kqueue

From
Tom Lane
Date:
Andres Freund <andres@anarazel.de> writes:
> On 2016-09-13 16:08:39 +0300, Heikki Linnakangas wrote:
>> So, if I've understood correctly, the purpose of this patch is to improve
>> performance on a multi-CPU system, which has the kqueue() function. Most
>> notably, FreeBSD?

> I think it's not necessarily about the current system, but more about
> future uses of the WaitEventSet stuff. Some of that is going to use a
> lot more sockets. E.g. doing a parallel append over FDWs.

All fine, but the burden of proof has to be on the patch to show that
it does something significant.  We don't want to be carrying around
platform-specific code, which necessarily has higher maintenance cost
than other code, without a darn good reason.

Also, if it's only a win on machines with dozens of CPUs, how many
people are running *BSD on that kind of iron?  I think Linux is by
far the dominant kernel for such hardware.  For sure Apple isn't
selling any machines like that.
        regards, tom lane



Re: kqueue

From
Andres Freund
Date:
On 2016-09-13 12:43:36 -0400, Tom Lane wrote:
> > I think it's not necessarily about the current system, but more about
> > future uses of the WaitEventSet stuff. Some of that is going to use a
> > lot more sockets. E.g. doing a parallel append over FDWs.

(note that I'm talking about network sockets not cpu sockets here)


> All fine, but the burden of proof has to be on the patch to show that
> it does something significant.  We don't want to be carrying around
> platform-specific code, which necessarily has higher maintenance cost
> than other code, without a darn good reason.

No argument there.


> Also, if it's only a win on machines with dozens of CPUs, how many
> people are running *BSD on that kind of iron?  I think Linux is by
> far the dominant kernel for such hardware.  For sure Apple isn't
> selling any machines like that.

I'm not sure you need quite that big a machine, if you test a workload
that currently reaches the poll().

Regards,

Andres



Re: kqueue

From
Tom Lane
Date:
Andres Freund <andres@anarazel.de> writes:
> On 2016-09-13 12:43:36 -0400, Tom Lane wrote:
>> Also, if it's only a win on machines with dozens of CPUs, how many
>> people are running *BSD on that kind of iron?  I think Linux is by
>> far the dominant kernel for such hardware.  For sure Apple isn't
>> selling any machines like that.

> I'm not sure you need quite that big a machine, if you test a workload
> that currently reaches the poll().

Well, Thomas stated in
https://www.postgresql.org/message-id/CAEepm%3D1CwuAq35FtVBTZO-mnGFH1xEFtDpKQOf_b6WoEmdZZHA%40mail.gmail.com
that he hadn't been able to measure any performance difference, and
I assume he was trying test cases from the WaitEventSet thread.

Also I notice that the WaitEventSet thread started with a simple
pgbench test, so I don't really buy the claim that that's not a
way that will reach the problem.

I'd be happy to see this go in if it can be shown to provide a measurable
performance improvement, but so far we have only guesses that someday
it *might* make a difference.  That's not good enough to add to our
maintenance burden IMO.

Anyway, the patch is in the archives now, so it won't be hard to resurrect
if the situation changes.
        regards, tom lane



Re: kqueue

From
Andres Freund
Date:
On 2016-09-13 14:47:08 -0400, Tom Lane wrote:
> Also I notice that the WaitEventSet thread started with a simple
> pgbench test, so I don't really buy the claim that that's not a
> way that will reach the problem.

You can reach it, but not with a 1 core : 1 pgbench thread : 1 client
connection ratio; there need to be more connections than that.  At least
that was my observation on x86/Linux.

Andres



Re: kqueue

From
Tom Lane
Date:
Andres Freund <andres@anarazel.de> writes:
> On 2016-09-13 14:47:08 -0400, Tom Lane wrote:
>> Also I notice that the WaitEventSet thread started with a simple
>> pgbench test, so I don't really buy the claim that that's not a
>> way that will reach the problem.

> You can reach it, but not when using 1 core:one pgbench thread:one
> client connection, there need to be more connections than that. At least
> that was my observation on x86 / linux.

Well, that original test was 

>> I tried to run pgbench -s 1000 -j 48 -c 48 -S -M prepared on 70 CPU-core
>> machine:

so no, not 1 client ;-)

Anyway, I decided to put my money where my mouth was and run my own
benchmark.  On my couple-year-old Macbook Pro running OS X 10.11.6,
using a straight build of today's HEAD, asserts disabled, fsync off
but no other parameters changed, I did "pgbench -i -s 100" and then
did this a few times:
    pgbench -T 60 -j 4 -c 4 -M prepared -S bench
(It's a 4-core CPU so I saw little point in pressing harder than
that.)  Median of 3 runs was 56028 TPS.  Repeating the runs with
kqueue-v5.patch applied, I got a median of 58975 TPS, or 5% better.
Run-to-run variation was only around 1% in each case.

So that's not a huge improvement, but it's clearly above the noise
floor, and this laptop is not what anyone would use for production
work eh?  Presumably you could show even better results on something
closer to server-grade hardware with more active clients.

So at this point I'm wondering why Thomas and Heikki could not measure
any win.  Based on my results it should be easy.  Is it possible that
OS X is better tuned for multi-CPU hardware than FreeBSD?
        regards, tom lane



Re: kqueue

From
Andres Freund
Date:
On 2016-09-13 15:37:22 -0400, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > On 2016-09-13 14:47:08 -0400, Tom Lane wrote:
> >> Also I notice that the WaitEventSet thread started with a simple
> >> pgbench test, so I don't really buy the claim that that's not a
> >> way that will reach the problem.
> 
> > You can reach it, but not when using 1 core:one pgbench thread:one
> > client connection, there need to be more connections than that. At least
> > that was my observation on x86 / linux.
> 
> Well, that original test was 
> 
> >> I tried to run pgbench -s 1000 -j 48 -c 48 -S -M prepared on 70 CPU-core
> >> machine:
> 
> so no, not 1 client ;-)

What I meant wasn't one client, but less than one client per cpu, and
using a pgbench thread per backend. That way usually, at least on linux,
there'll be a relatively small amount of poll/epoll/whatever, because
the recvmsg()s will always have data available.


> Anyway, I decided to put my money where my mouth was and run my own
> benchmark.

Cool.


> (It's a 4-core CPU so I saw little point in pressing harder than
> that.)

I think in reality most busy machines, where performance and scalability
matter, are overcommitted in the number of connections vs. cores.  And
if you look at throughput graphs that makes sense; they tend to increase
considerably after reaching #hardware-threads, even if all connections
are full-throttle busy.  It might not make sense if you just run large
analytics queries, or if you want the lowest latency possible, but for
everything else, the reality is that machines are often overcommitted
for good reason.


> So at this point I'm wondering why Thomas and Heikki could not measure
> any win.  Based on my results it should be easy.  Is it possible that
> OS X is better tuned for multi-CPU hardware than FreeBSD?

Hah!


Greetings,

Andres Freund



Re: kqueue

From
Tom Lane
Date:
Andres Freund <andres@anarazel.de> writes:
> On 2016-09-13 15:37:22 -0400, Tom Lane wrote:
>> (It's a 4-core CPU so I saw little point in pressing harder than
>> that.)

> I think in reality most busy machines, were performance and scalability
> matter, are overcommitted in the number of connections vs. cores.  And
> if you look at throughput graphs that makes sense; they tend to increase
> considerably after reaching #hardware-threads, even if all connections
> are full throttle busy.

At -j 10 -c 10, all else the same, I get 84928 TPS on HEAD and 90357
with the patch, so about 6% better.

>> So at this point I'm wondering why Thomas and Heikki could not measure
>> any win.  Based on my results it should be easy.  Is it possible that
>> OS X is better tuned for multi-CPU hardware than FreeBSD?

> Hah!

Well, there must be some reason why this patch improves matters on OS X
and not FreeBSD ...
        regards, tom lane



Re: kqueue

From
Tom Lane
Date:
I wrote:
> At -j 10 -c 10, all else the same, I get 84928 TPS on HEAD and 90357
> with the patch, so about 6% better.

And at -j 1 -c 1, I get 22390 and 24040 TPS, or about 7% better with
the patch.  So what I am seeing on OS X isn't contention of any sort,
but just a straight speedup that's independent of the number of clients
(at least up to 10).  Probably this represents less setup/teardown cost
for kqueue() waits than poll() waits.
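
To make the setup/teardown point concrete (a sketch of the general calling
pattern, not a claim about either kernel's internals): poll() hands the
whole descriptor array to the kernel on every wait, while kqueue filters
registered once with EV_ADD persist, so each wait only collects events.

    #include <poll.h>
    #include <stddef.h>
    #include <sys/types.h>
    #include <sys/event.h>
    #include <sys/time.h>

    /* poll(): the full descriptor array is handed over on every call */
    static int
    wait_poll(struct pollfd *fds, int nfds)
    {
        return poll(fds, nfds, -1);
    }

    /* kqueue(): descriptors were registered earlier; each wait only collects events */
    static int
    wait_kqueue(int kqfd, struct kevent *out, int nevents)
    {
        return kevent(kqfd, NULL, 0, out, nevents, NULL);
    }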

So you could spin this as "FreeBSD's poll() implementation is better than
OS X's", or as "FreeBSD's kqueue() implementation is worse than OS X's",
but either way I do not think we're seeing the same issue that was
originally reported against Linux, where there was no visible problem at
all till you got to a couple dozen clients, cf

https://www.postgresql.org/message-id/CAB-SwXbPmfpgL6N4Ro4BbGyqXEqqzx56intHHBCfvpbFUx1DNA%40mail.gmail.com

I'm inclined to think the kqueue patch is worth applying just on the
grounds that it makes things better on OS X and doesn't seem to hurt
on FreeBSD.  Whether anyone would ever get to the point of seeing
intra-kernel contention on these platforms is hard to predict, but
we'd be ahead of the curve if so.

It would be good for someone else to reproduce my results though.
For one thing, 5%-ish is not that far above the noise level; maybe
what I'm measuring here is just good luck from relocation of critical
loops into more cache-line-friendly locations.
        regards, tom lane



Re: kqueue

From
Thomas Munro
Date:
On Wed, Sep 14, 2016 at 12:06 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I wrote:
>> At -j 10 -c 10, all else the same, I get 84928 TPS on HEAD and 90357
>> with the patch, so about 6% better.
>
> And at -j 1 -c 1, I get 22390 and 24040 TPS, or about 7% better with
> the patch.  So what I am seeing on OS X isn't contention of any sort,
> but just a straight speedup that's independent of the number of clients
> (at least up to 10).  Probably this represents less setup/teardown cost
> for kqueue() waits than poll() waits.

Thanks for running all these tests.  I hadn't considered OS X performance.

> So you could spin this as "FreeBSD's poll() implementation is better than
> OS X's", or as "FreeBSD's kqueue() implementation is worse than OS X's",
> but either way I do not think we're seeing the same issue that was
> originally reported against Linux, where there was no visible problem at
> all till you got to a couple dozen clients, cf
>
> https://www.postgresql.org/message-id/CAB-SwXbPmfpgL6N4Ro4BbGyqXEqqzx56intHHBCfvpbFUx1DNA%40mail.gmail.com
>
> I'm inclined to think the kqueue patch is worth applying just on the
> grounds that it makes things better on OS X and doesn't seem to hurt
> on FreeBSD.  Whether anyone would ever get to the point of seeing
> intra-kernel contention on these platforms is hard to predict, but
> we'd be ahead of the curve if so.

I was originally thinking of this as simply the obvious missing
implementation of Andres's WaitEventSet API which would surely pay off
later as we do more with that API (asynchronous execution with many
remote nodes for sharding, built-in connection pooling/admission
control for large numbers of sockets?, ...).  I wasn't really
expecting it to show performance increases in simple one or two
pipe/socket cases on small core count machines, and it's interesting
that it clearly does on OS X.

> It would be good for someone else to reproduce my results though.
> For one thing, 5%-ish is not that far above the noise level; maybe
> what I'm measuring here is just good luck from relocation of critical
> loops into more cache-line-friendly locations.

Similar results here on a 4 core 2.2GHz Core i7 MacBook Pro running OS
X 10.11.5.  With default settings except fsync = off, I ran pgbench -i
-s 100, then took the median result of three runs of pgbench -T 60 -j
4 -c 4 -M prepared -S.  I used two different compilers in case it
helps to see results with different random instruction cache effects,
and got the following numbers:

Apple clang 703.0.31: 51654 TPS -> 55739 TPS = 7.9% improvement
GCC 6.1.0 from MacPorts: 52552 TPS -> 55143 TPS = 4.9% improvement

I reran the tests under FreeBSD 10.3 on a 4 core laptop and again saw
absolutely no measurable difference at 1, 4 or 24 clients.  Maybe a
big enough server could be made to contend on the postmaster pipe's
selinfo->si_mtx, in selrecord(), in pipe_poll() -- maybe that'd be
directly equivalent to what happened on multi-socket Linux with
poll(), but I don't know.

-- 
Thomas Munro
http://www.enterprisedb.com



Re: kqueue

From
Michael Paquier
Date:
On Wed, Sep 14, 2016 at 7:06 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> It would be good for someone else to reproduce my results though.
> For one thing, 5%-ish is not that far above the noise level; maybe
> what I'm measuring here is just good luck from relocation of critical
> loops into more cache-line-friendly locations.

From an OSX laptop with -S, -c 1 and -M prepared (9 runs, removed the
three best and three worst):
- HEAD: 9356/9343/9369
- HEAD + patch: 9433/9413/9461.071168
This laptop has a lot of I/O overhead... Still, there is a slight
improvement here as well.  Looking at the progress report, per-second
TPS climbs into the 9500~9600 range more frequently with the patch.  So
at least I am seeing something.
-- 
Michael



Re: kqueue

From
Tom Lane
Date:
Michael Paquier <michael.paquier@gmail.com> writes:
> From an OSX laptop with -S, -c 1 and -M prepared (9 runs, removed the
> three best and three worst):
> - HEAD: 9356/9343/9369
> - HEAD + patch: 9433/9413/9461.071168
> This laptop has a lot of I/O overhead... Still there is a slight
> improvement here as well. Looking at the progress report, per-second
> TPS gets easier more frequently into 9500~9600 TPS with the patch. So
> at least I am seeing something.

Which OSX version exactly?
        regards, tom lane



Re: kqueue

From
Michael Paquier
Date:
On Wed, Sep 14, 2016 at 3:32 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Michael Paquier <michael.paquier@gmail.com> writes:
>> From an OSX laptop with -S, -c 1 and -M prepared (9 runs, removed the
>> three best and three worst):
>> - HEAD: 9356/9343/9369
>> - HEAD + patch: 9433/9413/9461.071168
>> This laptop has a lot of I/O overhead... Still there is a slight
>> improvement here as well. Looking at the progress report, per-second
>> TPS gets easier more frequently into 9500~9600 TPS with the patch. So
>> at least I am seeing something.
>
> Which OSX version exactly?

El Capitan 10.11.6.  With -s 20 (300MB) and 1GB of shared_buffers, so
that everything is in memory.  Actually, re-running the tests now with
no VMs around and no other apps, I am getting close to 9650~9700 TPS
with the patch and 9300~9400 TPS on HEAD, so that's unlikely to be only
noise.
-- 
Michael



Re: kqueue

From
Matteo Beccati
Date:
Hi,

On 14/09/2016 00:06, Tom Lane wrote:
> I'm inclined to think the kqueue patch is worth applying just on the
> grounds that it makes things better on OS X and doesn't seem to hurt
> on FreeBSD.  Whether anyone would ever get to the point of seeing
> intra-kernel contention on these platforms is hard to predict, but
> we'd be ahead of the curve if so.
>
> It would be good for someone else to reproduce my results though.
> For one thing, 5%-ish is not that far above the noise level; maybe
> what I'm measuring here is just good luck from relocation of critical
> loops into more cache-line-friendly locations.

FWIW, I've tested HEAD vs patch on a 2-cpu low end NetBSD 7.0 i386 machine.

HEAD: 1890/1935/1889 tps
kqueue: 1905/1957/1932 tps

no weird surprises, and basically no differences either.


Cheers
-- 
Matteo Beccati

Development & Consulting - http://www.beccati.com/



Re: kqueue

From
Keith Fiske
Date:

On Wed, Sep 14, 2016 at 9:09 AM, Matteo Beccati <php@beccati.com> wrote:
> Hi,
>
> On 14/09/2016 00:06, Tom Lane wrote:
>> I'm inclined to think the kqueue patch is worth applying just on the
>> grounds that it makes things better on OS X and doesn't seem to hurt
>> on FreeBSD.  Whether anyone would ever get to the point of seeing
>> intra-kernel contention on these platforms is hard to predict, but
>> we'd be ahead of the curve if so.
>>
>> It would be good for someone else to reproduce my results though.
>> For one thing, 5%-ish is not that far above the noise level; maybe
>> what I'm measuring here is just good luck from relocation of critical
>> loops into more cache-line-friendly locations.
>
> FWIW, I've tested HEAD vs patch on a 2-cpu low end NetBSD 7.0 i386 machine.
>
> HEAD: 1890/1935/1889 tps
> kqueue: 1905/1957/1932 tps
>
> no weird surprises, and basically no differences either.
>
>
> Cheers
> --
> Matteo Beccati
>
> Development & Consulting - http://www.beccati.com/

Thomas Munro mentioned in #postgresql on freenode that he needed someone to test a patch on a larger FreeBSD server. I've got a pretty decent machine (3.1GHz Quad Core Xeon E3-1220V3, 16GB ECC RAM, ZFS mirror on WD Red HDD), so I offered to give it a try.

Bench setup was:
pgbench -i -s 100 -d postgres

I ran this against 9.6rc1 instead of HEAD, which most of the others in this thread seem to have used. Not sure if that makes a difference; I can re-run if needed.
With higher concurrency, the patch seems to cause decreased performance. You can tell which of the runs used the kqueue patch by looking at the path to pgbench.

SINGLE PROCESS
[keith@corpus /tank/pgdata]$ /home/keith/pgsql96rc1_kqueue/bin/pgbench -T 60 -j 1 -c 1 -M prepared -S postgres -p 5496                                                
starting vacuum...end.
transaction type: <builtin: select only>
scaling factor: 100
query mode: prepared
number of clients: 1
number of threads: 1
duration: 60 s
number of transactions actually processed: 1547387
latency average: 0.039 ms
tps = 25789.750236 (including connections establishing)
tps = 25791.018293 (excluding connections establishing)
[keith@corpus /tank/pgdata]$ /home/keith/pgsql96rc1_kqueue/bin/pgbench -T 60 -j 1 -c 1 -M prepared -S postgres -p 5496
starting vacuum...end.
transaction type: <builtin: select only>
scaling factor: 100
query mode: prepared
number of clients: 1
number of threads: 1
duration: 60 s
number of transactions actually processed: 1549442
latency average: 0.039 ms
tps = 25823.981255 (including connections establishing)
tps = 25825.189871 (excluding connections establishing)
[keith@corpus /tank/pgdata]$ /home/keith/pgsql96rc1_kqueue/bin/pgbench -T 60 -j 1 -c 1 -M prepared -S postgres -p 5496
starting vacuum...end.
transaction type: <builtin: select only>
scaling factor: 100
query mode: prepared
number of clients: 1
number of threads: 1
duration: 60 s
number of transactions actually processed: 1547936
latency average: 0.039 ms
tps = 25798.572583 (including connections establishing)
tps = 25799.917170 (excluding connections establishing)


[keith@corpus /tank/pgdata]$ /home/keith/pgsql96rc1/bin/pgbench -T 60 -j 1 -c 1 -M prepared -S postgres -p 5496                                                       
starting vacuum...end.
transaction type: <builtin: select only>
scaling factor: 100
query mode: prepared
number of clients: 1
number of threads: 1
duration: 60 s
number of transactions actually processed: 1520722
latency average: 0.039 ms
tps = 25343.122533 (including connections establishing)
tps = 25344.357116 (excluding connections establishing)
[keith@corpus /tank/pgdata]$ /home/keith/pgsql96rc1/bin/pgbench -T 60 -j 1 -c 1 -M prepared -S postgres -p 5496~
starting vacuum...end.
transaction type: <builtin: select only>
scaling factor: 100
query mode: prepared
number of clients: 1
number of threads: 1
duration: 60 s
number of transactions actually processed: 1549282
latency average: 0.039 ms
tps = 25821.107595 (including connections establishing)
tps = 25822.407310 (excluding connections establishing)
[keith@corpus /tank/pgdata]$ /home/keith/pgsql96rc1/bin/pgbench -T 60 -j 1 -c 1 -M prepared -S postgres -p 5496~
starting vacuum...end.
transaction type: <builtin: select only>
scaling factor: 100
query mode: prepared
number of clients: 1
number of threads: 1
duration: 60 s
number of transactions actually processed: 1541907
latency average: 0.039 ms
tps = 25698.025983 (including connections establishing)
tps = 25699.270663 (excluding connections establishing)


FOUR
/home/keith/pgsql96rc1_kqueue/bin/pgbench -T 60 -j 4 -c 4 -M prepared -S postgres -p 5496
starting vacuum...end.
transaction type: <builtin: select only>
scaling factor: 100
query mode: prepared
number of clients: 4
number of threads: 4
duration: 60 s
number of transactions actually processed: 4282185
latency average: 0.056 ms
tps = 71369.146931 (including connections establishing)
tps = 71372.646243 (excluding connections establishing)
[keith@corpus ~/postgresql-9.6rc1_kqueue]$ /home/keith/pgsql96rc1_kqueue/bin/pgbench -T 60 -j 4 -c 4 -M prepared -S postgres -p 5496
starting vacuum...end.
transaction type: <builtin: select only>
scaling factor: 100
query mode: prepared
number of clients: 4
number of threads: 4
duration: 60 s
number of transactions actually processed: 4777596
latency average: 0.050 ms
tps = 79625.214521 (including connections establishing)
tps = 79629.800123 (excluding connections establishing)
[keith@corpus ~/postgresql-9.6rc1_kqueue]$ /home/keith/pgsql96rc1_kqueue/bin/pgbench -T 60 -j 4 -c 4 -M prepared -S postgres -p 5496
starting vacuum...end.
transaction type: <builtin: select only>
scaling factor: 100
query mode: prepared
number of clients: 4
number of threads: 4
duration: 60 s
number of transactions actually processed: 4809132
latency average: 0.050 ms
tps = 80151.803249 (including connections establishing)
tps = 80155.903203 (excluding connections establishing)


/home/keith/pgsql96rc1/bin/pgbench -T 60 -j 4 -c 4 -M prepared -S postgres -p 5496
starting vacuum...end.
transaction type: <builtin: select only>
scaling factor: 100
query mode: prepared
number of clients: 4
number of threads: 4
duration: 60 s
number of transactions actually processed: 5114286
latency average: 0.047 ms
tps = 85236.858383 (including connections establishing)
tps = 85241.847800 (excluding connections establishing)
/home/keith/pgsql96rc1/bin/pgbench -T 60 -j 4 -c 4 -M prepared -S postgres -p 5496
starting vacuum...end.
transaction type: <builtin: select only>
scaling factor: 100
query mode: prepared
number of clients: 4
number of threads: 4
duration: 60 s
number of transactions actually processed: 5600194
latency average: 0.043 ms
tps = 93335.508864 (including connections establishing)
tps = 93340.970416 (excluding connections establishing)
/home/keith/pgsql96rc1/bin/pgbench -T 60 -j 4 -c 4 -M prepared -S postgres -p 5496
starting vacuum...end.
transaction type: <builtin: select only>
scaling factor: 100
query mode: prepared
number of clients: 4
number of threads: 4
duration: 60 s
number of transactions actually processed: 5606962
latency average: 0.043 ms
tps = 93447.905764 (including connections establishing)
tps = 93454.077142 (excluding connections establishing)


SIXTY-FOUR
[keith@corpus /tank/pgdata]$ /home/keith/pgsql96rc1_kqueue/bin/pgbench -T 60 -j 64 -c 64 -M prepared -S postgres -p 5496                                                          
starting vacuum...end.
transaction type: <builtin: select only>
scaling factor: 100
query mode: prepared
number of clients: 64
number of threads: 64
duration: 60 s
number of transactions actually processed: 4084213
latency average: 0.940 ms
tps = 67633.476871 (including connections establishing)
tps = 67751.865998 (excluding connections establishing)
[keith@corpus /tank/pgdata]$ /home/keith/pgsql96rc1_kqueue/bin/pgbench -T 60 -j 64 -c 64 -M prepared -S postgres -p 5496
starting vacuum...end.
transaction type: <builtin: select only>
scaling factor: 100
query mode: prepared
number of clients: 64
number of threads: 64
duration: 60 s
number of transactions actually processed: 4119994
latency average: 0.932 ms
tps = 68474.847365 (including connections establishing)
tps = 68540.221835 (excluding connections establishing)
[keith@corpus /tank/pgdata]$ /home/keith/pgsql96rc1_kqueue/bin/pgbench -T 60 -j 64 -c 64 -M prepared -S postgres -p 5496
starting vacuum...end.
transaction type: <builtin: select only>
scaling factor: 100
query mode: prepared
number of clients: 64
number of threads: 64
duration: 60 s
number of transactions actually processed: 4068071
latency average: 0.944 ms
tps = 67192.603129 (including connections establishing)
tps = 67254.760177 (excluding connections establishing)


[keith@corpus /tank/pgdata]$ /home/keith/pgsql96rc1/bin/pgbench -T 60 -j 64 -c 64 -M prepared -S postgres -p 5496                                                                 
starting vacuum...end.
transaction type: <builtin: select only>
scaling factor: 100
query mode: prepared
number of clients: 64
number of threads: 64
duration: 60 s
number of transactions actually processed: 4281302
latency average: 0.897 ms
tps = 70147.847337 (including connections establishing)
tps = 70389.283564 (excluding connections establishing)
[keith@corpus /tank/pgdata]$ /home/keith/pgsql96rc1/bin/pgbench -T 60 -j 64 -c 64 -M prepared -S postgres -p 5496
starting vacuum...end.
transaction type: <builtin: select only>
scaling factor: 100
query mode: prepared
number of clients: 64
number of threads: 64
duration: 60 s
number of transactions actually processed: 4573114
latency average: 0.840 ms
tps = 74848.884475 (including connections establishing)
tps = 75102.862539 (excluding connections establishing)
[keith@corpus /tank/pgdata]$ /home/keith/pgsql96rc1/bin/pgbench -T 60 -j 64 -c 64 -M prepared -S postgres -p 5496
starting vacuum...end.
transaction type: <builtin: select only>
scaling factor: 100
query mode: prepared
number of clients: 64
number of threads: 64
duration: 60 s
number of transactions actually processed: 4341447
latency average: 0.884 ms
tps = 72350.152281 (including connections establishing)
tps = 72421.831179 (excluding connections establishing)

--
Keith Fiske
Database Administrator
OmniTI Computer Consulting, Inc.
http://www.keithf4.com

Re: kqueue

From
Thomas Munro
Date:
On Thu, Sep 15, 2016 at 10:48 AM, Keith Fiske <keith@omniti.com> wrote:
> Thomas Munro brought up in #postgresql on freenode needing someone to test a
> patch on a larger FreeBSD server. I've got a pretty decent machine (3.1Ghz
> Quad Core Xeon E3-1220V3, 16GB ECC RAM, ZFS mirror on WD Red HDD) so offered
> to give it a try.
>
> Bench setup was:
> pgbench -i -s 100 -d postgres
>
> I ran this against 96rc1 instead of HEAD like most of the others in this
> thread seem to have done. Not sure if that makes a difference and can re-run
> if needed.
> With higher concurrency, this seems to cause decreased performance. You can
> tell which of the runs is the kqueue patch by looking at the path to
> pgbench.

Thanks Keith.  So to summarise, you saw no change with 1 client, but
with 4 clients you saw a significant drop in performance (~93K TPS ->
~80K TPS), and a smaller drop for 64 clients (~72K TPS -> ~68K TPS).
These results seem to be a nail in the coffin for this patch for now.

Thanks to everyone who tested.  I might be back in a later commitfest
if I can figure out why and how to fix it.

-- 
Thomas Munro
http://www.enterprisedb.com



Re: kqueue

From
Thomas Munro
Date:
On Thu, Sep 15, 2016 at 11:04 AM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:
> On Thu, Sep 15, 2016 at 10:48 AM, Keith Fiske <keith@omniti.com> wrote:
>> Thomas Munro brought up in #postgresql on freenode needing someone to test a
>> patch on a larger FreeBSD server. I've got a pretty decent machine (3.1Ghz
>> Quad Core Xeon E3-1220V3, 16GB ECC RAM, ZFS mirror on WD Red HDD) so offered
>> to give it a try.
>>
>> Bench setup was:
>> pgbench -i -s 100 -d postgres
>>
>> I ran this against 96rc1 instead of HEAD like most of the others in this
>> thread seem to have done. Not sure if that makes a difference and can re-run
>> if needed.
>> With higher concurrency, this seems to cause decreased performance. You can
>> tell which of the runs is the kqueue patch by looking at the path to
>> pgbench.
>
> Thanks Keith.  So to summarise, you saw no change with 1 client, but
> with 4 clients you saw a significant drop in performance (~93K TPS ->
> ~80K TPS), and a smaller drop for 64 clients (~72K TPS -> ~68K TPS).
> These results seem to be a nail in the coffin for this patch for now.
>
> Thanks to everyone who tested.  I might be back in a later commitfest
> if I can figure out why and how to fix it.

Ok, here's a version tweaked to use EVFILT_PROC for postmaster death
detection instead of the pipe, as Tom Lane suggested in another
thread[1].

The pipe still exists and is used for PostmasterIsAlive(), and also
for the race case where kevent() discovers that the PID doesn't exist
when you try to add it (presumably it died already, but we want to
defer reporting that until you call WaitEventSetWait; in that case we
stick the traditional pipe into the kqueue set as before so that it'll
fire a readable-because-EOF event then).
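
For reference, the registration looks roughly like this; just a sketch
of the idea rather than the patch itself, with the helper name and the
death_pipe_fd argument invented for illustration:

#include <sys/types.h>
#include <sys/event.h>
#include <sys/time.h>
#include <errno.h>

/* Sketch: watch the postmaster via EVFILT_PROC, falling back to the pipe. */
static int
watch_postmaster(int kqueue_fd, pid_t postmaster_pid, int death_pipe_fd)
{
    struct kevent k_ev;

    EV_SET(&k_ev, postmaster_pid, EVFILT_PROC, EV_ADD, NOTE_EXIT, 0, 0);
    if (kevent(kqueue_fd, &k_ev, 1, NULL, 0, NULL) < 0)
    {
        if (errno != ESRCH)
            return -1;          /* unexpected failure */

        /*
         * ESRCH: the postmaster is already gone, so its end of the pipe is
         * closed.  Watch the pipe for EOF instead, so that death is
         * reported at wait time like the other implementations do.
         */
        EV_SET(&k_ev, death_pipe_fd, EVFILT_READ, EV_ADD, 0, 0, 0);
        if (kevent(kqueue_fd, &k_ev, 1, NULL, 0, NULL) < 0)
            return -1;
    }
    return 0;
}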

Still no change measurable on my laptop.  Keith, would you be able to
test this on your rig and see if it sucks any less than the last one?

[1] https://www.postgresql.org/message-id/13774.1473972000%40sss.pgh.pa.us

--
Thomas Munro
http://www.enterprisedb.com

Attachment

Re: kqueue

From
Matteo Beccati
Date:
Hi,

On 16/09/2016 05:11, Thomas Munro wrote:
> Still no change measurable on my laptop.  Keith, would you be able to
> test this on your rig and see if it sucks any less than the last one?

I've tested kqueue-v6.patch on the Celeron NetBSD machine and numbers 
were consistently lower by about 5-10% vs fairly recent HEAD (same as my 
last pgbench runs).


Cheers
-- 
Matteo Beccati

Development & Consulting - http://www.beccati.com/



Re: kqueue

From
Keith Fiske
Date:


On Thu, Sep 15, 2016 at 11:11 PM, Thomas Munro <thomas.munro@enterprisedb.com> wrote:
On Thu, Sep 15, 2016 at 11:04 AM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:
> On Thu, Sep 15, 2016 at 10:48 AM, Keith Fiske <keith@omniti.com> wrote:
>> Thomas Munro brought up in #postgresql on freenode needing someone to test a
>> patch on a larger FreeBSD server. I've got a pretty decent machine (3.1Ghz
>> Quad Core Xeon E3-1220V3, 16GB ECC RAM, ZFS mirror on WD Red HDD) so offered
>> to give it a try.
>>
>> Bench setup was:
>> pgbench -i -s 100 -d postgres
>>
>> I ran this against 96rc1 instead of HEAD like most of the others in this
>> thread seem to have done. Not sure if that makes a difference and can re-run
>> if needed.
>> With higher concurrency, this seems to cause decreased performance. You can
>> tell which of the runs is the kqueue patch by looking at the path to
>> pgbench.
>
> Thanks Keith.  So to summarise, you saw no change with 1 client, but
> with 4 clients you saw a significant drop in performance (~93K TPS ->
> ~80K TPS), and a smaller drop for 64 clients (~72K TPS -> ~68K TPS).
> These results seem to be a nail in the coffin for this patch for now.
>
> Thanks to everyone who tested.  I might be back in a later commitfest
> if I can figure out why and how to fix it.

Ok, here's a version tweaked to use EVFILT_PROC for postmaster death
detection instead of the pipe, as Tom Lane suggested in another
thread[1].

The pipe still exists and is used for PostmasterIsAlive(), and also
for the race case where kevent() discovers that the PID doesn't exist
when you try to add it (presumably it died already, but we want to
defer reporting that until you call WaitEventSetWait; in that case we
stick the traditional pipe into the kqueue set as before so that it'll
fire a readable-because-EOF event then).

Still no change measurable on my laptop.  Keith, would you be able to
test this on your rig and see if it sucks any less than the last one?

[1] https://www.postgresql.org/message-id/13774.1473972000%40sss.pgh.pa.us


Ran benchmarks on unaltered 96rc1 again just to be safe. Those are first. Decided to throw a 32 process test in there as well to see if there's anything going on between 4 and 64

~/pgsql96rc1/bin/pgbench -i -s 100 -d pgbench -p 5496

[keith@corpus ~]$ /home/keith/pgsql96rc1/bin/pgbench -T 60 -j 1 -c 1 -M prepared -S -p 5496 pgbench
starting vacuum...end.
transaction type: <builtin: select only>
scaling factor: 100
query mode: prepared
number of clients: 1
number of threads: 1
duration: 60 s
number of transactions actually processed: 1543809
latency average: 0.039 ms
tps = 25729.749474 (including connections establishing)
tps = 25731.006414 (excluding connections establishing)
[keith@corpus ~]$ /home/keith/pgsql96rc1/bin/pgbench -T 60 -j 1 -c 1 -M prepared -S -p 5496 pgbench
starting vacuum...end.
transaction type: <builtin: select only>
scaling factor: 100
query mode: prepared
number of clients: 1
number of threads: 1
duration: 60 s
number of transactions actually processed: 1548340
latency average: 0.039 ms
tps = 25796.928387 (including connections establishing)
tps = 25798.275891 (excluding connections establishing)
[keith@corpus ~]$ /home/keith/pgsql96rc1/bin/pgbench -T 60 -j 1 -c 1 -M prepared -S -p 5496 pgbench
starting vacuum...end.
transaction type: <builtin: select only>
scaling factor: 100
query mode: prepared
number of clients: 1
number of threads: 1
duration: 60 s
number of transactions actually processed: 1535072
latency average: 0.039 ms
tps = 25584.182830 (including connections establishing)
tps = 25585.487246 (excluding connections establishing)

[keith@corpus ~]$ /home/keith/pgsql96rc1/bin/pgbench -T 60 -j 4 -c 4 -M prepared -S -p 5496 pgbench
starting vacuum...end.
transaction type: <builtin: select only>
scaling factor: 100
query mode: prepared
number of clients: 4
number of threads: 4
duration: 60 s
number of transactions actually processed: 5621013
latency average: 0.043 ms
tps = 93668.594248 (including connections establishing)
tps = 93674.730914 (excluding connections establishing)
[keith@corpus ~]$ /home/keith/pgsql96rc1/bin/pgbench -T 60 -j 4 -c 4 -M prepared -S -p 5496 pgbench
starting vacuum...end.
transaction type: <builtin: select only>
scaling factor: 100
query mode: prepared
number of clients: 4
number of threads: 4
duration: 60 s
number of transactions actually processed: 5659929
latency average: 0.042 ms
tps = 94293.572928 (including connections establishing)
tps = 94300.500395 (excluding connections establishing)
[keith@corpus ~]$ /home/keith/pgsql96rc1/bin/pgbench -T 60 -j 4 -c 4 -M prepared -S -p 5496 pgbench
starting vacuum...end.
transaction type: <builtin: select only>
scaling factor: 100
query mode: prepared
number of clients: 4
number of threads: 4
duration: 60 s
number of transactions actually processed: 5649572
latency average: 0.042 ms
tps = 94115.854165 (including connections establishing)
tps = 94123.436211 (excluding connections establishing)

[keith@corpus ~]$ /home/keith/pgsql96rc1/bin/pgbench -T 60 -j 32 -c 32 -M prepared -S -p 5496 pgbench
starting vacuum...end.
transaction type: <builtin: select only>
scaling factor: 100
query mode: prepared
number of clients: 32
number of threads: 32
duration: 60 s
number of transactions actually processed: 5196336
latency average: 0.369 ms
tps = 86570.696138 (including connections establishing)
tps = 86608.648579 (excluding connections establishing)
[keith@corpus ~]$ /home/keith/pgsql96rc1/bin/pgbench -T 60 -j 32 -c 32 -M prepared -S -p 5496 pgbench
starting vacuum...end.
transaction type: <builtin: select only>
scaling factor: 100
query mode: prepared
number of clients: 32
number of threads: 32
duration: 60 s
number of transactions actually processed: 5202443
latency average: 0.369 ms
tps = 86624.724577 (including connections establishing)
tps = 86664.848857 (excluding connections establishing)
[keith@corpus ~]$ /home/keith/pgsql96rc1/bin/pgbench -T 60 -j 32 -c 32 -M prepared -S -p 5496 pgbench
starting vacuum...end.
transaction type: <builtin: select only>
scaling factor: 100
query mode: prepared
number of clients: 32
number of threads: 32
duration: 60 s
number of transactions actually processed: 5198412
latency average: 0.369 ms
tps = 86637.730825 (including connections establishing)
tps = 86668.706105 (excluding connections establishing)

[keith@corpus ~]$ /home/keith/pgsql96rc1/bin/pgbench -T 60 -j 64 -c 64 -M prepared -S -p 5496 pgbench
starting vacuum...end.
transaction type: <builtin: select only>
scaling factor: 100
query mode: prepared
number of clients: 64
number of threads: 64
duration: 60 s
number of transactions actually processed: 4790285
latency average: 0.802 ms
tps = 79800.369679 (including connections establishing)
tps = 79941.243428 (excluding connections establishing)
[keith@corpus ~]$ /home/keith/pgsql96rc1/bin/pgbench -T 60 -j 64 -c 64 -M prepared -S -p 5496 pgbench
starting vacuum...end.
transaction type: <builtin: select only>
scaling factor: 100
query mode: prepared
number of clients: 64
number of threads: 64
duration: 60 s
number of transactions actually processed: 4852921
latency average: 0.791 ms
tps = 79924.873678 (including connections establishing)
tps = 80179.182200 (excluding connections establishing)
[keith@corpus ~]$ /home/keith/pgsql96rc1/bin/pgbench -T 60 -j 64 -c 64 -M prepared -S -p 5496 pgbench
starting vacuum...end.
transaction type: <builtin: select only>
scaling factor: 100
query mode: prepared
number of clients: 64
number of threads: 64
duration: 60 s
number of transactions actually processed: 4672965
latency average: 0.822 ms
tps = 77871.911528 (including connections establishing)
tps = 77961.614345 (excluding connections establishing)



~/pgsql96rc1_kqueue_v6/bin/pgbench -i -s 100 -d pgbench -p 5496

Ran some tests more than 3 times since results occasionally varied by larger-than-expected amounts. Probably just something else running on the server at the time.

Again, no real noticeable difference for a single process.
For 4 processes, things are mostly the same and only very slightly lower, which is better than before.
For 32 processes, I saw a slight increase in performance for v6.
But, again, for 64 the results were slightly worse. Although the last run almost matched, most runs were lower. They're better than they were last time, but still not as good as unpatched 9.6rc1.

I can try running against HEAD if you'd like.

SINGLE
[keith@corpus ~]$ /home/keith/pgsql96rc1_kqueue_v6/bin/pgbench -T 60 -j 1 -c 1 -M prepared -S -p 5496 pgbench
starting vacuum...end.
transaction type: <builtin: select only>
scaling factor: 100
query mode: prepared
number of clients: 1
number of threads: 1
duration: 60 s
number of transactions actually processed: 1508745
latency average: 0.040 ms
tps = 25145.524948 (including connections establishing)
tps = 25146.433564 (excluding connections establishing)
[keith@corpus ~]$ /home/keith/pgsql96rc1_kqueue_v6/bin/pgbench -T 60 -j 1 -c 1 -M prepared -S -p 5496 pgbench
starting vacuum...end.
transaction type: <builtin: select only>
scaling factor: 100
query mode: prepared
number of clients: 1
number of threads: 1
duration: 60 s
number of transactions actually processed: 1346454
latency average: 0.045 ms
tps = 22440.692798 (including connections establishing)
tps = 22441.527989 (excluding connections establishing)
[keith@corpus ~]$ /home/keith/pgsql96rc1_kqueue_v6/bin/pgbench -T 60 -j 1 -c 1 -M prepared -S -p 5496 pgbench
starting vacuum...end.
transaction type: <builtin: select only>
scaling factor: 100
query mode: prepared
number of clients: 1
number of threads: 1
duration: 60 s
number of transactions actually processed: 1426906
latency average: 0.042 ms
tps = 23781.710780 (including connections establishing)
tps = 23782.523744 (excluding connections establishing)
[keith@corpus ~]$ /home/keith/pgsql96rc1_kqueue_v6/bin/pgbench -T 60 -j 1 -c 1 -M prepared -S -p 5496 pgbench
starting vacuum...end.
transaction type: <builtin: select only>
scaling factor: 100
query mode: prepared
number of clients: 1
number of threads: 1
duration: 60 s
number of transactions actually processed: 1546252
latency average: 0.039 ms
tps = 25770.468513 (including connections establishing)
tps = 25771.352027 (excluding connections establishing)
[keith@corpus ~]$ /home/keith/pgsql96rc1_kqueue_v6/bin/pgbench -T 60 -j 1 -c 1 -M prepared -S -p 5496 pgbench
starting vacuum...end.
transaction type: <builtin: select only>
scaling factor: 100
query mode: prepared
number of clients: 1
number of threads: 1
duration: 60 s
number of transactions actually processed: 1542366
latency average: 0.039 ms
tps = 25705.706274 (including connections establishing)
tps = 25706.577285 (excluding connections establishing)

FOUR
[keith@corpus ~]$ /home/keith/pgsql96rc1_kqueue_v6/bin/pgbench -T 60 -j 4 -c 4 -M prepared -S -p 5496 pgbench
starting vacuum...end.
transaction type: <builtin: select only>
scaling factor: 100
query mode: prepared
number of clients: 4
number of threads: 4
duration: 60 s
number of transactions actually processed: 5606159
latency average: 0.043 ms
tps = 93435.464767 (including connections establishing)
tps = 93442.716270 (excluding connections establishing)
[keith@corpus ~]$ /home/keith/pgsql96rc1_kqueue_v6/bin/pgbench -T 60 -j 4 -c 4 -M prepared -S -p 5496 pgbench
starting vacuum...end.
transaction type: <builtin: select only>
scaling factor: 100
query mode: prepared
number of clients: 4
number of threads: 4
duration: 60 s
number of transactions actually processed: 5602564
latency average: 0.043 ms
tps = 93375.528201 (including connections establishing)
tps = 93381.999147 (excluding connections establishing)
[keith@corpus ~]$ /home/keith/pgsql96rc1_kqueue_v6/bin/pgbench -T 60 -j 4 -c 4 -M prepared -S -p 5496 pgbench
starting vacuum...end.
transaction type: <builtin: select only>
scaling factor: 100
query mode: prepared
number of clients: 4
number of threads: 4
duration: 60 s
number of transactions actually processed: 5608675
latency average: 0.043 ms
tps = 93474.081114 (including connections establishing)
tps = 93481.634509 (excluding connections establishing)

THIRTY-TWO
[keith@corpus ~]$ /home/keith/pgsql96rc1_kqueue_v6/bin/pgbench -T 60 -j 32 -c 32 -M prepared -S -p 5496 pgbench
starting vacuum...end.
transaction type: <builtin: select only>
scaling factor: 100
query mode: prepared
number of clients: 32
number of threads: 32
duration: 60 s
number of transactions actually processed: 5273952
latency average: 0.364 ms
tps = 87855.483112 (including connections establishing)
tps = 87880.762662 (excluding connections establishing)
[keith@corpus ~]$ /home/keith/pgsql96rc1_kqueue_v6/bin/pgbench -T 60 -j 32 -c 32 -M prepared -S -p 5496 pgbench
starting vacuum...end.
transaction type: <builtin: select only>
scaling factor: 100
query mode: prepared
number of clients: 32
number of threads: 32
duration: 60 s
number of transactions actually processed: 5294039
latency average: 0.363 ms
tps = 88126.254862 (including connections establishing)
tps = 88151.282371 (excluding connections establishing)
[keith@corpus ~]$ /home/keith/pgsql96rc1_kqueue_v6/bin/pgbench -T 60 -j 32 -c 32 -M prepared -S -p 5496 pgbench
starting vacuum...end.
transaction type: <builtin: select only>
scaling factor: 100
query mode: prepared
number of clients: 32
number of threads: 32
duration: 60 s
number of transactions actually processed: 5279444
latency average: 0.364 ms
tps = 87867.500628 (including connections establishing)
tps = 87891.856414 (excluding connections establishing)
[keith@corpus ~]$ /home/keith/pgsql96rc1_kqueue_v6/bin/pgbench -T 60 -j 32 -c 32 -M prepared -S -p 5496 pgbench
starting vacuum...end.
transaction type: <builtin: select only>
scaling factor: 100
query mode: prepared
number of clients: 32
number of threads: 32
duration: 60 s
number of transactions actually processed: 5286405
latency average: 0.363 ms
tps = 88049.742194 (including connections establishing)
tps = 88077.409809 (excluding connections establishing)

SIXTY-FOUR
[keith@corpus ~]$ /home/keith/pgsql96rc1_kqueue_v6/bin/pgbench -T 60 -j 64 -c 64 -M prepared -S -p 5496 pgbench
starting vacuum...end.
transaction type: <builtin: select only>
scaling factor: 100
query mode: prepared
number of clients: 64
number of threads: 64
duration: 60 s
number of transactions actually processed: 4426565
latency average: 0.867 ms
tps = 72142.306576 (including connections establishing)
tps = 72305.201516 (excluding connections establishing)
[keith@corpus ~]$ /home/keith/pgsql96rc1_kqueue_v6/bin/pgbench -T 60 -j 64 -c 64 -M prepared -S -p 5496 pgbench
starting vacuum...end.
transaction type: <builtin: select only>
scaling factor: 100
query mode: prepared
number of clients: 64
number of threads: 64
duration: 60 s
number of transactions actually processed: 4070048
latency average: 0.943 ms
tps = 66587.264608 (including connections establishing)
tps = 66711.820878 (excluding connections establishing)
[keith@corpus ~]$ /home/keith/pgsql96rc1_kqueue_v6/bin/pgbench -T 60 -j 64 -c 64 -M prepared -S -p 5496 pgbench
starting vacuum...end.
transaction type: <builtin: select only>
scaling factor: 100
query mode: prepared
number of clients: 64
number of threads: 64
duration: 60 s
number of transactions actually processed: 4478535
latency average: 0.857 ms
tps = 72768.961061 (including connections establishing)
tps = 72930.488922 (excluding connections establishing)
[keith@corpus ~]$ /home/keith/pgsql96rc1_kqueue_v6/bin/pgbench -T 60 -j 64 -c 64 -M prepared -S -p 5496 pgbench
starting vacuum...end.
transaction type: <builtin: select only>
scaling factor: 100
query mode: prepared
number of clients: 64
number of threads: 64
duration: 60 s
number of transactions actually processed: 4051086
latency average: 0.948 ms
tps = 66540.741821 (including connections establishing)
tps = 66601.943062 (excluding connections establishing)
[keith@corpus ~]$ /home/keith/pgsql96rc1_kqueue_v6/bin/pgbench -T 60 -j 64 -c 64 -M prepared -S -p 5496 pgbench
starting vacuum...end.
transaction type: <builtin: select only>
scaling factor: 100
query mode: prepared
number of clients: 64
number of threads: 64
duration: 60 s
number of transactions actually processed: 4374049
latency average: 0.878 ms
tps = 72093.025134 (including connections establishing)
tps = 72271.145559 (excluding connections establishing)
[keith@corpus ~]$ /home/keith/pgsql96rc1_kqueue_v6/bin/pgbench -T 60 -j 64 -c 64 -M prepared -S -p 5496 pgbench
starting vacuum...end.
transaction type: <builtin: select only>
scaling factor: 100
query mode: prepared
number of clients: 64
number of threads: 64
duration: 60 s
number of transactions actually processed: 4762663
latency average: 0.806 ms
tps = 79372.610362 (including connections establishing)
tps = 79535.601194 (excluding connections establishing)


As a sanity check I went back and ran the pgbench from the v5 patch to see if it was still lower. It is. So v6 seems to have a slight improvement in some cases.

[keith@corpus ~]$ /home/keith/pgsql96rc1_kqueue_v5/bin/pgbench -T 60 -j 32 -c 32 -M prepared -S -p 5496 pgbench
starting vacuum...end.
transaction type: <builtin: select only>
scaling factor: 100
query mode: prepared
number of clients: 32
number of threads: 32
duration: 60 s
number of transactions actually processed: 4618814
latency average: 0.416 ms
tps = 76960.608378 (including connections establishing)
tps = 76981.609781 (excluding connections establishing)
[keith@corpus ~]$ /home/keith/pgsql96rc1_kqueue_v5/bin/pgbench -T 60 -j 32 -c 32 -M prepared -S -p 5496 pgbench
starting vacuum...end.
transaction type: <builtin: select only>
scaling factor: 100
query mode: prepared
number of clients: 32
number of threads: 32
duration: 60 s
number of transactions actually processed: 4649745
latency average: 0.413 ms
tps = 77491.094077 (including connections establishing)
tps = 77525.443941 (excluding connections establishing)
 

Re: kqueue

From
Thomas Munro
Date:
On Thu, Sep 29, 2016 at 9:09 AM, Keith Fiske <keith@omniti.com> wrote:
> On Thu, Sep 15, 2016 at 11:11 PM, Thomas Munro
> <thomas.munro@enterprisedb.com> wrote:
>> Ok, here's a version tweaked to use EVFILT_PROC for postmaster death
>> detection instead of the pipe, as Tom Lane suggested in another
>> thread[1].
>>
>> [...]
>
> Ran benchmarks on unaltered 96rc1 again just to be safe. Those are first.
> Decided to throw a 32 process test in there as well to see if there's
> anything going on between 4 and 64

Thanks!  A summary:

┌──────────────────┬─────────┬───────────┬────────────────────┬───────────┐
│       code       │ clients │  average  │ standard_deviation │  median   │
├──────────────────┼─────────┼───────────┼────────────────────┼───────────┤
│ 9.6rc1           │       1 │ 25704.923 │            108.766 │ 25731.006 │
│ 9.6rc1           │       4 │ 94032.889 │            322.562 │ 94123.436 │
│ 9.6rc1           │      32 │ 86647.401 │             33.616 │ 86664.849 │
│ 9.6rc1           │      64 │ 79360.680 │           1217.453 │ 79941.243 │
│ 9.6rc1/kqueue-v6 │       1 │ 24569.683 │           1433.339 │ 25146.434 │
│ 9.6rc1/kqueue-v6 │       4 │ 93435.450 │             50.214 │ 93442.716 │
│ 9.6rc1/kqueue-v6 │      32 │ 88000.328 │            135.143 │ 87891.856 │
│ 9.6rc1/kqueue-v6 │      64 │ 71726.034 │           4784.794 │ 72271.146 │
└──────────────────┴─────────┴───────────┴────────────────────┴───────────┘

┌─────────┬───────────┬───────────┬──────────────────────────┐
│ clients │ unpatched │  patched  │      percent_change      │
├─────────┼───────────┼───────────┼──────────────────────────┤
│       1 │ 25731.006 │ 25146.434 │ -2.271858317548874692000 │
│       4 │ 94123.436 │ 93442.716 │ -0.723220516514080510000 │
│      32 │ 86664.849 │ 87891.856 │  1.415807001521458833000 │
│      64 │ 79941.243 │ 72271.146 │ -9.594668173973727179000 │
└─────────┴───────────┴───────────┴──────────────────────────┘

The variation in the patched 64-client numbers is quite large, ranging
from ~66.5k to ~79.5k.  The highest number matched the unpatched
numbers, which ranged from ~77.9k to ~80k.  I wonder if that is noise
and we need to run longer (in which case the best outcome might be
'this patch is neutral on FreeBSD'), or if something the patch does is
causing that (for example, maybe the EVFILT_PROC filter causes
contention on the process table lock).

Matteo's results with the v6 patch on a low end NetBSD machine were
not good.  But the report at [1] implies that larger NetBSD and
OpenBSD systems have terrible problems with the
poll-postmaster-alive-pipe approach, which this EVFILT_PROC approach
would seem to address pretty well.

It's difficult to draw any conclusions at this point.

[1] https://www.postgresql.org/message-id/flat/20160915135755.GC19008%40genua.de

-- 
Thomas Munro
http://www.enterprisedb.com

Re: kqueue

From
Torsten Zuehlsdorff
Date:
On 28.09.2016 23:39, Thomas Munro wrote:
> On Thu, Sep 29, 2016 at 9:09 AM, Keith Fiske <keith@omniti.com> wrote:
>> On Thu, Sep 15, 2016 at 11:11 PM, Thomas Munro
>> <thomas.munro@enterprisedb.com> wrote:
>>> Ok, here's a version tweaked to use EVFILT_PROC for postmaster death
>>> detection instead of the pipe, as Tom Lane suggested in another
>>> thread[1].
>>>
>>> [...]
>>
>> Ran benchmarks on unaltered 96rc1 again just to be safe. Those are first.
>> Decided to throw a 32 process test in there as well to see if there's
>> anything going on between 4 and 64
>
> Thanks!  A summary:
>
> [summary]
>
> The variation in the patched 64-client numbers is quite large, ranging
> from ~66.5k to ~79.5k.  The highest number matched the unpatched
> numbers, which ranged from ~77.9k to ~80k.  I wonder if that is noise
> and we need to run longer (in which case the best outcome might be
> 'this patch is neutral on FreeBSD'), or if something the patch does is
> causing that (for example, maybe the EVFILT_PROC filter causes
> contention on the process table lock).
>
> [..]
>
> It's difficult to draw any conclusions at this point.

I'm currently setting up a new FreeBSD machine. It's a FreeBSD 11 with 
ZFS, 64 GB RAM and a quad-core CPU. If you're interested, I can give you 
access for more tests this week. Maybe this will help to draw a 
conclusion.

Greetings,
Torsten



Re: [HACKERS] kqueue

From
Thomas Munro
Date:
On Tue, Oct 11, 2016 at 8:08 PM, Torsten Zuehlsdorff
<mailinglists@toco-domains.de> wrote:
> On 28.09.2016 23:39, Thomas Munro wrote:
>> It's difficult to draw any conclusions at this point.
>
> I'm currently setting up a new FreeBSD machine. It's a FreeBSD 11 with ZFS,
> 64 GB RAM and a quad-core CPU. If you're interested, I can give you access
> for more tests this week. Maybe this will help to draw a conclusion.

I don't plan to resubmit this patch myself, but I was doing some
spring cleaning and rebasing today and I figured it might be worth
quietly leaving a working patch here just in case anyone from the
various BSD communities is interested in taking the idea further.

Some thoughts:  We could decide to make it the default on FooBSD but
not BarBSD according to experimental results... for example several
people reported that macOS developer machines run pgbench a bit
faster.  Also, we didn't ever get to the bottom of the complaint that
NetBSD and OpenBSD systems wake up every waiting backend when anyone
calls PostmasterIsAlive[1], which this patch should in theory fix (by
using EVFILT_PROC instead of waiting on that pipe).  On the other
hand, the fix for that may be to stop calling PostmasterIsAlive in
loops[2]!

[1] https://www.postgresql.org/message-id/CAEepm%3D27K-2AP1th97kiVvKpTuria9ocbjT0cXCJqnt4if5rJQ%40mail.gmail.com
[2] https://www.postgresql.org/message-id/CAEepm%3D3FW33PeRxt0jE4N0truJqOepp72R6W-zyM5mu1bxnZRw%40mail.gmail.com

-- 
Thomas Munro
http://www.enterprisedb.com


Attachment

Re: [HACKERS] kqueue

From
Thomas Munro
Date:
On Thu, Jun 22, 2017 at 7:19 PM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:
> I don't plan to resubmit this patch myself, but I was doing some
> spring cleaning and rebasing today and I figured it might be worth
> quietly leaving a working patch here just in case anyone from the
> various BSD communities is interested in taking the idea further.

Since there was a mention of kqueue on -hackers today, here's another
rebase.  I got curious just now and ran a very quick test on an AWS 64
vCPU m4.16xlarge instance running image "FreeBSD
11.1-STABLE-amd64-2017-08-08 - ami-00608178".  I set shared_buffers =
10GB and ran pgbench approximately the same way Heikki and Keith did
upthread:

pgbench -i -s 200 postgres
pgbench -M prepared  -j 6 -c 6 -S postgres -T60 -P1
pgbench -M prepared  -j 12 -c 12 -S postgres -T60 -P1
pgbench -M prepared  -j 24 -c 24 -S postgres -T60 -P1
pgbench -M prepared  -j 36 -c 36 -S postgres -T60 -P1
pgbench -M prepared  -j 48 -c 48 -S postgres -T60 -P1

The TPS numbers I got (including connections establishing) were:

clients    master    patched
      6   146,215    147,535 (+0.9%)
     12   273,056    280,505 (+2.7%)
     24   360,751    369,965 (+2.5%)
     36   413,147    420,769 (+1.8%)
     48   416,189    444,537 (+6.8%)

The patch appears to be doing something positive on this particular
system and that effect was stable over a few runs.

-- 
Thomas Munro
http://www.enterprisedb.com

Attachment

Re: [HACKERS] kqueue

From
Thomas Munro
Date:
On Wed, Dec 6, 2017 at 12:53 AM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:
> On Thu, Jun 22, 2017 at 7:19 PM, Thomas Munro
> <thomas.munro@enterprisedb.com> wrote:
>> I don't plan to resubmit this patch myself, but I was doing some
>> spring cleaning and rebasing today and I figured it might be worth
>> quietly leaving a working patch here just in case anyone from the
>> various BSD communities is interested in taking the idea further.

I heard through the grapevine of some people currently investigating
performance problems on busy FreeBSD systems, possibly related to the
postmaster pipe.  I suspect this patch might be a part of the solution
(other patches probably needed to get maximum value out of this patch:
reuse WaitEventSet objects in some key places, and get rid of high
frequency PostmasterIsAlive() read() calls).  The autoconf-fu in the
last version bit-rotted so it seemed like a good time to post a
rebased patch.
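
To make the WaitEventSet reuse part concrete: the idea is to build the
set once and wait on it repeatedly, rather than setting one up from
scratch for every wait.  A rough sketch only; "sock" and the loop body
are placeholders, and argument details may differ between releases:

    WaitEventSet *set = CreateWaitEventSet(CurrentMemoryContext, 3);
    WaitEvent     event;

    AddWaitEventToSet(set, WL_LATCH_SET, PGINVALID_SOCKET, MyLatch, NULL);
    AddWaitEventToSet(set, WL_POSTMASTER_DEATH, PGINVALID_SOCKET, NULL, NULL);
    AddWaitEventToSet(set, WL_SOCKET_READABLE, sock, NULL, NULL);

    for (;;)
    {
        /* Reuse the same set for every wait; -1 means wait forever. */
        WaitEventSetWait(set, -1, &event, 1, WAIT_EVENT_CLIENT_READ);

        if (event.events & WL_POSTMASTER_DEATH)
            proc_exit(1);
        if (event.events & WL_LATCH_SET)
            ResetLatch(MyLatch);
        /* ... handle WL_SOCKET_READABLE ... */
    }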

-- 
Thomas Munro
http://www.enterprisedb.com

Attachment

Re: [HACKERS] kqueue

From
Thomas Munro
Date:
On Wed, Apr 11, 2018 at 1:05 PM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:
> I heard through the grapevine of some people currently investigating
> performance problems on busy FreeBSD systems, possibly related to the
> postmaster pipe.  I suspect this patch might be a part of the solution
> (other patches probably needed to get maximum value out of this patch:
> reuse WaitEventSet objects in some key places, and get rid of high
> frequency PostmasterIsAlive() read() calls).  The autoconf-fu in the
> last version bit-rotted so it seemed like a good time to post a
> rebased patch.

I once knew how to get a message resent to someone who wasn't
subscribed to our mailing list at the time it was sent[1], so they
could join an existing thread.  I don't know how to do that with the
new mailing list software, so I'm CC'ing Mateusz so he can share his
results on-thread.  Sorry for the noise.

[1] https://www.postgresql.org/message-id/CAEepm=0-KsV4Sj-0Qd4rMCg7UYdOQA=TUjLkEZOX7h_qiQQaCA@mail.gmail.com

-- 
Thomas Munro
http://www.enterprisedb.com


Re: [HACKERS] kqueue

From
Mateusz Guzik
Date:
On Mon, May 21, 2018 at 9:03 AM, Thomas Munro <thomas.munro@enterprisedb.com> wrote:
On Wed, Apr 11, 2018 at 1:05 PM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:
> I heard through the grapevine of some people currently investigating
> performance problems on busy FreeBSD systems, possibly related to the
> postmaster pipe.  I suspect this patch might be a part of the solution
> (other patches probably needed to get maximum value out of this patch:
> reuse WaitEventSet objects in some key places, and get rid of high
> frequency PostmasterIsAlive() read() calls).  The autoconf-fu in the
> last version bit-rotted so it seemed like a good time to post a
> rebased patch.


Hi everyone,

I have benchmarked the change on a FreeBSD box and found a big
performance win once the number of clients goes beyond the number of
hardware threads on the target machine. For smaller numbers of clients
the win was very modest.

The test was performed a few weeks ago.

For convenience PostgreSQL 10.3 as found in the ports tree was used.

3 variants were tested:
- stock 10.3
- stock 10.3 + pdeathsig
- stock 10.3 + pdeathsig + kqueue

Appropriate patches were provided by Thomas.

In order to keep this message PG-13 I'm not going to show the actual
script, but a mere outline:

for i in $(seq 1 10); do
        for t in vanilla pdeathsig pdeathsig_kqueue; do
                start up the relevant version
                for c in 32 64 96; do
                        pgbench -j 96 -c $c -T 120 -M prepared -S -U bench -h 172.16.0.2 -P1 bench > ${t}-${c}-out-warmup 2>&1
                        pgbench -j 96 -c $c -T 120 -M prepared -S -U bench -h 172.16.0.2 -P1 bench > ${t}-${c}-out 2>&1
                done
                shutdown the relevant version
        done
done

Data from the warmup is not used. All the data was pre-read prior to the
test.

PostgreSQL was configured with 32GB of shared buffers and 200 max
connections, otherwise it was the default.

The server is:
Intel(R) Xeon(R) Gold 6134 CPU @ 3.20GHz
2 package(s) x 8 core(s) x 2 hardware threads

i.e. 32 threads in total.

running FreeBSD -head with 'options NUMA' in kernel config and
sysctl net.inet.tcp.per_cpu_timers=1 on top of zfs.

The load was generated from a different box over a 100Gbit ethernet link.

x cumulative-tps-vanilla-32
+ cumulative-tps-pdeathsig-32
* cumulative-tps-pdeathsig_kqueue-32
+------------------------------------------------------------------------+
|+   + x+*     x+  *  x       *        + * *       * * **  *  **        *|
|   |_____|__M_A___M_A_____|____|             |________MA________|       |
+------------------------------------------------------------------------+
    N           Min           Max        Median           Avg        Stddev
x  10     442898.77     448476.81     444805.17     445062.08     1679.7169
+  10      442057.2     447835.46     443840.28     444235.01     1771.2254
No difference proven at 95.0% confidence
*  10     448138.07     452786.41     450274.56     450311.51     1387.2927
Difference at 95.0% confidence
        5249.43 +/- 1447.41
        1.17948% +/- 0.327501%
        (Student's t, pooled s = 1540.46)
x cumulative-tps-vanilla-64
+ cumulative-tps-pdeathsig-64
* cumulative-tps-pdeathsig_kqueue-64
+------------------------------------------------------------------------+
|                                                                     ** |
|                                                                     ** |
|  xx  x +                                                            ***|
|++**x *+*++                                                          ***|
|  ||_A|M_|                                                           |A |
+------------------------------------------------------------------------+
    N           Min           Max        Median           Avg        Stddev
x  10     411849.26      422145.5     416043.77      416061.9     3763.2545
+  10     407123.74     425727.84     419908.73      417480.7     6817.5549
No difference proven at 95.0% confidence
*  10     542032.71     546106.93     543948.05     543874.06     1234.1788
Difference at 95.0% confidence
        127812 +/- 2631.31
        30.7195% +/- 0.809892%
        (Student's t, pooled s = 2800.47)
x cumulative-tps-vanilla-96
+ cumulative-tps-pdeathsig-96
* cumulative-tps-pdeathsig_kqueue-96
+------------------------------------------------------------------------+
|                                                                      * |
|                                                                      * |
|                                                                      * |
|                                                                      * |
|  + x                                                                 * |
|  *xxx+                                                               **|
|+ *****+                                                            * **|
|  |MA||                                                              |A||
+------------------------------------------------------------------------+
    N           Min           Max        Median           Avg        Stddev
x  10      325263.7        336338     332399.16     331321.82     3571.2478
+  10     321213.33     338669.66     329553.78     330903.58      5652.008
No difference proven at 95.0% confidence
*  10     503877.22     511449.96     508708.41     508808.51     2016.9483
Difference at 95.0% confidence
        177487 +/- 2724.98
        53.5693% +/- 1.17178%
        (Student's t, pooled s = 2900.16)


--
Mateusz Guzik <mjguzik gmail.com>

Re: [HACKERS] kqueue

From
Thomas Munro
Date:
On Mon, May 21, 2018 at 7:27 PM, Mateusz Guzik <mjguzik@gmail.com> wrote:
> I have benchmarked the change on a FreeBSD box and found a big
> performance win once the number of clients goes beyond the number of
> hardware threads on the target machine. For smaller numbers of clients
> the win was very modest.

Thanks for the report!  This is good news for the patch, if we can
explain a few mysteries.

> 3 variants were tested:
> - stock 10.3
> - stock 10.3 + pdeathsig
> - stock 10.3 + pdeathsig + kqueue

For the record, "pdeathsig" refers to another patch of mine[1] that is
not relevant to this test (it's a small change in the recovery loop,
important for replication but not even reached here).

> [a bunch of neat output from ministat]

So to summarise your results:

32 connections: ~445k -> ~450k = +1.2%
64 connections: ~416k -> ~544k = +30.7%
96 connections: ~331k -> ~508k = +53.6%

As you added more connections above your thread count, stock 10.3's
TPS number went down, but with the patch it went up.  So now we have
to explain why you see a huge performance boost but others reported a
modest gain or in some cases loss.  The main things that jump out:

1.  You used TCP sockets and ran pgbench on another machine, while
others used Unix domain sockets.
2.  You're running a newer/bleeding edge kernel.
3.  You used more CPUs than most reporters.

For the record, Mateusz and others discovered some fixable global lock
contention in the Unix domain socket layer that is now being hacked
on[2], though it's not clear if that'd affect the results reported
earlier or not.

[1] https://www.postgresql.org/message-id/CAEepm%3D0w9AAHAH73-tkZ8VS2Lg6JzY4ii3TG7t-R%2B_MWyUAk9g%40mail.gmail.com
[2] https://reviews.freebsd.org/D15430

-- 
Thomas Munro
http://www.enterprisedb.com


Re: [HACKERS] kqueue

From
Thomas Munro
Date:
On Tue, May 22, 2018 at 12:07 PM Thomas Munro
<thomas.munro@enterprisedb.com> wrote:
> On Mon, May 21, 2018 at 7:27 PM, Mateusz Guzik <mjguzik@gmail.com> wrote:
> > I have benchmarked the change on a FreeBSD box and found a big
> > performance win once the number of clients goes beyond the number of
> > hardware threads on the target machine. For smaller numbers of clients
> > the win was very modest.
>
> So to summarise your results:
>
> 32 connections: ~445k -> ~450k = +1.2%
> 64 connections: ~416k -> ~544k = +30.7%
> 96 connections: ~331k -> ~508k = +53.6%

I would like to commit this patch for PostgreSQL 12, based on this
report.  We know it helps performance on macOS developer machines and
big FreeBSD servers, and it is the right kernel interface for the job
on principle.  Matteo Beccati reported a 5-10% performance drop on a
low-end Celeron NetBSD box which we have no explanation for, and we
have no reports from server-class machines on that OS -- so perhaps we
(or the NetBSD port?) should consider building with WAIT_USE_POLL on
NetBSD until someone can figure out what needs to be fixed there
(possibly on the NetBSD side)?

Here's a rebased patch, which I'm adding to the to November CF to give
people time to retest, object, etc if they want to.

-- 
Thomas Munro
http://www.enterprisedb.com

Attachment

Re: [HACKERS] kqueue

From
Andres Freund
Date:
Hi,

On 2018-09-28 10:55:13 +1200, Thomas Munro wrote:
> On Tue, May 22, 2018 at 12:07 PM Thomas Munro
> <thomas.munro@enterprisedb.com> wrote:
> > On Mon, May 21, 2018 at 7:27 PM, Mateusz Guzik <mjguzik@gmail.com> wrote:
> > > I have benchmarked the change on a FreeBSD box and found a big
> > > performance win once the number of clients goes beyond the number of
> > > hardware threads on the target machine. For smaller numbers of clients
> > > the win was very modest.
> >
> > So to summarise your results:
> >
> > 32 connections: ~445k -> ~450k = +1.2%
> > 64 connections: ~416k -> ~544k = +30.7%
> > 96 connections: ~331k -> ~508k = +53.6%
> 
> I would like to commit this patch for PostgreSQL 12, based on this
> report.  We know it helps performance on macOS developer machines and
> big FreeBSD servers, and it is the right kernel interface for the job
> on principle.

Seems reasonable.


> Matteo Beccati reported a 5-10% performance drop on a
> low-end Celeron NetBSD box which we have no explanation for, and we
> have no reports from server-class machines on that OS -- so perhaps we
> (or the NetBSD port?) should consider building with WAIT_USE_POLL on
> NetBSD until someone can figure out what needs to be fixed there
> (possibly on the NetBSD side)?

Yea, I'm not too worried about that. It'd be great to test that, but
otherwise I'm also ok to just plonk that into the template.

> @@ -576,6 +592,10 @@ CreateWaitEventSet(MemoryContext context, int nevents)
>      if (fcntl(set->epoll_fd, F_SETFD, FD_CLOEXEC) == -1)
>          elog(ERROR, "fcntl(F_SETFD) failed on epoll descriptor: %m");
>  #endif                            /* EPOLL_CLOEXEC */
> +#elif defined(WAIT_USE_KQUEUE)
> +    set->kqueue_fd = kqueue();
> +    if (set->kqueue_fd < 0)
> +        elog(ERROR, "kqueue failed: %m");
>  #elif defined(WAIT_USE_WIN32)

Is this automatically opened with some FD_CLOEXEC equivalent?


> +static inline void
> +WaitEventAdjustKqueueAdd(struct kevent *k_ev, int filter, int action,
> +                         WaitEvent *event)
> +{
> +    k_ev->ident = event->fd;
> +    k_ev->filter = filter;
> +    k_ev->flags = action | EV_CLEAR;
> +    k_ev->fflags = 0;
> +    k_ev->data = 0;
> +
> +    /*
> +     * On most BSD family systems, udata is of type void * so we could simply
> +     * assign event to it without casting, or use the EV_SET macro instead of
> +     * filling in the struct manually.  Unfortunately, NetBSD and possibly
> +     * others have it as intptr_t, so here we wallpaper over that difference
> +     * with an unsightly lvalue cast.
> +     */
> +    *((WaitEvent **)(&k_ev->udata)) = event;

I'm mildly inclined to hide that behind a macro, so the other places
have a reference, via the macro definition, to this too.

> +    if (rc < 0 && event->events == WL_POSTMASTER_DEATH && errno == ESRCH)
> +    {
> +        /*
> +         * The postmaster is already dead.  Defer reporting this to the caller
> +         * until wait time, for compatibility with the other implementations.
> +         * To do that we will now add the regular alive pipe.
> +         */
> +        WaitEventAdjustKqueueAdd(&k_ev[0], EVFILT_READ, EV_ADD, event);
> +        rc = kevent(set->kqueue_fd, &k_ev[0], count, NULL, 0, NULL);
> +    }

That's, ... not particularly pretty. Kinda wonder if we shouldn't instead
just add a 'pending_events' field that we can check at wait time.

> diff --git a/src/include/pg_config.h.in b/src/include/pg_config.h.in
> index 90dda8ea050..4bcabc3b381 100644
> --- a/src/include/pg_config.h.in
> +++ b/src/include/pg_config.h.in
> @@ -330,6 +330,9 @@
>  /* Define to 1 if you have isinf(). */
>  #undef HAVE_ISINF
>  
> +/* Define to 1 if you have the `kqueue' function. */
> +#undef HAVE_KQUEUE
> +
>  /* Define to 1 if you have the <langinfo.h> header file. */
>  #undef HAVE_LANGINFO_H
>  
> @@ -598,6 +601,9 @@
>  /* Define to 1 if you have the <sys/epoll.h> header file. */
>  #undef HAVE_SYS_EPOLL_H
>  
> +/* Define to 1 if you have the <sys/event.h> header file. */
> +#undef HAVE_SYS_EVENT_H
> +
>  /* Define to 1 if you have the <sys/ipc.h> header file. */
>  #undef HAVE_SYS_IPC_H

Should adjust pg_config.win32.h too.

Greetings,

Andres Freund


Re: [HACKERS] kqueue

From
Matteo Beccati
Date:
Hi Thomas,

On 28/09/2018 00:55, Thomas Munro wrote:
> I would like to commit this patch for PostgreSQL 12, based on this
> report.  We know it helps performance on macOS developer machines and
> big FreeBSD servers, and it is the right kernel interface for the job
> on principle.  Matteo Beccati reported a 5-10% performance drop on a
> low-end Celeron NetBSD box which we have no explanation for, and we
> have no reports from server-class machines on that OS -- so perhaps we
> (or the NetBSD port?) should consider building with WAIT_USE_POLL on
> NetBSD until someone can figure out what needs to be fixed there
> (possibly on the NetBSD side)?

Thanks for keeping me in the loop.

Out of curiosity (and time permitting) I'll try to spin up a NetBSD 8 VM
and run some benchmarks, but I guess we should leave it up to the pkgsrc
people to eventually change the build flags.


Cheers
-- 
Matteo Beccati

Development & Consulting - http://www.beccati.com/


Re: [HACKERS] kqueue

From
Thomas Munro
Date:
On Fri, Sep 28, 2018 at 11:09 AM Andres Freund <andres@anarazel.de> wrote:
> On 2018-09-28 10:55:13 +1200, Thomas Munro wrote:
> > Matteo Beccati reported a 5-10% performance drop on a
> > low-end Celeron NetBSD box which we have no explanation for, and we
> > have no reports from server-class machines on that OS -- so perhaps we
> > (or the NetBSD port?) should consider building with WAIT_USE_POLL on
> > NetBSD until someone can figure out what needs to be fixed there
> > (possibly on the NetBSD side)?
>
> Yea, I'm not too worried about that. It'd be great to test that, but
> otherwise I'm also ok to just plonk that into the template.

Thanks for the review!  Ok, if we don't get a better idea I'll put
this in src/template/netbsd:

CPPFLAGS="$CPPFLAGS -DWAIT_USE_POLL"

> > @@ -576,6 +592,10 @@ CreateWaitEventSet(MemoryContext context, int nevents)
> >       if (fcntl(set->epoll_fd, F_SETFD, FD_CLOEXEC) == -1)
> >               elog(ERROR, "fcntl(F_SETFD) failed on epoll descriptor: %m");
> >  #endif                                                       /* EPOLL_CLOEXEC */
> > +#elif defined(WAIT_USE_KQUEUE)
> > +     set->kqueue_fd = kqueue();
> > +     if (set->kqueue_fd < 0)
> > +             elog(ERROR, "kqueue failed: %m");
> >  #elif defined(WAIT_USE_WIN32)
>
> Is this automatically opened with some FD_CLOEXEC equivalent?

No.  Hmm, I thought it wasn't necessary because kqueue descriptors are
not inherited and backends don't execve() directly without forking,
but I guess it can't hurt to add a fcntl() call.  Done.
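
Roughly speaking, the kqueue branch now mirrors the epoll branch quoted
above; a sketch of the shape, not necessarily the exact committed
wording:

    set->kqueue_fd = kqueue();
    if (set->kqueue_fd < 0)
        elog(ERROR, "kqueue failed: %m");
    if (fcntl(set->kqueue_fd, F_SETFD, FD_CLOEXEC) == -1)
        elog(ERROR, "fcntl(F_SETFD) failed on kqueue descriptor: %m");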

> > +     *((WaitEvent **)(&k_ev->udata)) = event;
>
> I'm mildly inclined to hide that behind a macro, so the other places
> have a reference, via the macro definition, to this too.

Done.
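
Concretely, something along these lines, keeping the cast and its
explanatory comment in one place (the macro name is a guess for
illustration, not a quote from the patch):

    /* NetBSD declares kevent.udata as intptr_t rather than void *. */
    #define AccessWaitEvent(k_ev) (*((WaitEvent **) &((k_ev)->udata)))

    /* when registering an event: */
    AccessWaitEvent(k_ev) = event;

    /* when a wait returns events (array name is a placeholder): */
    WaitEvent  *cur_event = AccessWaitEvent(&returned_events[i]);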

> > +     if (rc < 0 && event->events == WL_POSTMASTER_DEATH && errno == ESRCH)
> > +     {
> > +             /*
> > +              * The postmaster is already dead.  Defer reporting this to the caller
> > +              * until wait time, for compatibility with the other implementations.
> > +              * To do that we will now add the regular alive pipe.
> > +              */
> > +             WaitEventAdjustKqueueAdd(&k_ev[0], EVFILT_READ, EV_ADD, event);
> > +             rc = kevent(set->kqueue_fd, &k_ev[0], count, NULL, 0, NULL);
> > +     }
>
> That's, ... not particularly pretty. Kinda wonder if we shouldn't instead
> just add a 'pending_events' field that we can check at wait time.

Done.
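
i.e. roughly this shape; the field name and surrounding details here
are illustrative assumptions rather than quotes from the patch:

    /* in struct WaitEventSet (assumed field name): */
    bool        report_postmaster_not_running;

    /* in WaitEventAdjustKqueue(), if EVFILT_PROC registration fails with ESRCH: */
    if (rc < 0 && event->events == WL_POSTMASTER_DEATH && errno == ESRCH)
        set->report_postmaster_not_running = true;

    /* at the top of the kqueue wait routine, before blocking in kevent(): */
    if (set->report_postmaster_not_running)
    {
        occurred_events->fd = PGINVALID_SOCKET;
        occurred_events->events = WL_POSTMASTER_DEATH;
        return 1;
    }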

> > +/* Define to 1 if you have the `kqueue' function. */
> > +#undef HAVE_KQUEUE
> > +

> Should adjust pg_config.win32.h too.

Done.

-- 
Thomas Munro
http://www.enterprisedb.com

Attachment

Re: [HACKERS] kqueue

From
Matteo Beccati
Date:
On 28/09/2018 14:19, Thomas Munro wrote:
> On Fri, Sep 28, 2018 at 11:09 AM Andres Freund <andres@anarazel.de> wrote:
>> On 2018-09-28 10:55:13 +1200, Thomas Munro wrote:
>>> Matteo Beccati reported a 5-10% performance drop on a
>>> low-end Celeron NetBSD box which we have no explanation for, and we
>>> have no reports from server-class machines on that OS -- so perhaps we
>>> (or the NetBSD port?) should consider building with WAIT_USE_POLL on
>>> NetBSD until someone can figure out what needs to be fixed there
>>> (possibly on the NetBSD side)?
>>
>> Yea, I'm not too worried about that. It'd be great to test that, but
>> otherwise I'm also ok to just plonk that into the template.
> 
> Thanks for the review!  Ok, if we don't get a better idea I'll put
> this in src/template/netbsd:
> 
> CPPFLAGS="$CPPFLAGS -DWAIT_USE_POLL"

A quick test on an 8 vCPU / 4GB RAM virtual machine running a fresh
install of NetBSD 8.0 again shows that kqueue is consistently slower
than unpatched master on TPC-B-like pgbench workloads:

~1200tps vs ~1400tps w/ 96 clients and threads, scale factor 10

while on select-only benchmarks the difference is below the noise floor,
with both doing roughly the same ~30k tps.

Out of curiosity, I've installed FreeBSD on an identically specced VM,
and the select-only benchmark was ~75k tps for kqueue vs ~90k tps on
unpatched master, so maybe there's something I'm doing wrong when
benchmarking. Could you please provide proper instructions?


Cheers
-- 
Matteo Beccati

Development & Consulting - http://www.beccati.com/


Re: [HACKERS] kqueue

From
Thomas Munro
Date:
On Sat, Sep 29, 2018 at 7:51 PM Matteo Beccati <php@beccati.com> wrote:
> On 28/09/2018 14:19, Thomas Munro wrote:
> > On Fri, Sep 28, 2018 at 11:09 AM Andres Freund <andres@anarazel.de> wrote:
> >> On 2018-09-28 10:55:13 +1200, Thomas Munro wrote:
> >>> Matteo Beccati reported a 5-10% performance drop on a
> >>> low-end Celeron NetBSD box which we have no explanation for, and we
> >>> have no reports from server-class machines on that OS -- so perhaps we
> >>> (or the NetBSD port?) should consider building with WAIT_USE_POLL on
> >>> NetBSD until someone can figure out what needs to be fixed there
> >>> (possibly on the NetBSD side)?
> >>
> >> Yea, I'm not too worried about that. It'd be great to test that, but
> >> otherwise I'm also ok to just plonk that into the template.
> >
> > Thanks for the review!  Ok, if we don't get a better idea I'll put
> > this in src/template/netbsd:
> >
> > CPPFLAGS="$CPPFLAGS -DWAIT_USE_POLL"
>
> A quick test on an 8 vCPU / 4GB RAM virtual machine running a fresh
> install of NetBSD 8.0 again shows that kqueue is consistently slower
> than unpatched master on TPC-B-like pgbench workloads:
>
> ~1200tps vs ~1400tps w/ 96 clients and threads, scale factor 10
>
> while on select-only benchmarks the difference is below the noise floor,
> with both doing roughly the same ~30k tps.
>
> Out of curiosity, I've installed FreeBSD on an identically specced VM,
> and the select-only benchmark was ~75k tps for kqueue vs ~90k tps on
> unpatched master, so maybe there's something I'm doing wrong when
> benchmarking. Could you please provide proper instructions?

Ouch.  What kind of virtualisation is this?  Which version of FreeBSD?
 Not sure if it's relevant, but do you happen to see gettimeofday()
showing up as a syscall, if you truss a backend running pgbench?

-- 
Thomas Munro
http://www.enterprisedb.com


Re: [HACKERS] kqueue

From
Matteo Beccati
Date:
Hi Thomas,

On 30/09/2018 04:36, Thomas Munro wrote:
> On Sat, Sep 29, 2018 at 7:51 PM Matteo Beccati <php@beccati.com> wrote:
>> Out of curiosity, I've installed FreeBSD on an identically specced VM,
>> and the select-only benchmark was ~75k tps for kqueue vs ~90k tps on
>> unpatched master, so maybe there's something I'm doing wrong when
>> benchmarking. Could you please provide proper instructions?
> 
> Ouch.  What kind of virtualisation is this?  Which version of FreeBSD?
>  Not sure if it's relevant, but do you happen to see gettimeofday()
> showing up as a syscall, if you truss a backend running pgbench?

I downloaded 11.2 as VHD file in order to run on MS Hyper-V / Win10 Pro.

Yes, I saw plenty of gettimeofday calls when running truss:

> gettimeofday({ 1538297117.071344 },0x0)          = 0 (0x0)
> gettimeofday({ 1538297117.071743 },0x0)          = 0 (0x0)
> gettimeofday({ 1538297117.072021 },0x0)          = 0 (0x0)
> getpid()                                         = 766 (0x2fe)
> __sysctl(0x7fffffffce90,0x4,0x0,0x0,0x801891000,0x2b) = 0 (0x0)
> gettimeofday({ 1538297117.072944 },0x0)          = 0 (0x0)
> getpid()                                         = 766 (0x2fe)
> __sysctl(0x7fffffffce90,0x4,0x0,0x0,0x801891000,0x29) = 0 (0x0)
> gettimeofday({ 1538297117.073682 },0x0)          = 0 (0x0)
> sendto(9,"2\0\0\0\^DT\0\0\0!\0\^Aabalance"...,71,0,NULL,0) = 71 (0x47)
> recvfrom(9,"B\0\0\0\^\\0P0_1\0\0\0\0\^A\0\0"...,8192,0,NULL,0x0) = 51 (0x33)
> gettimeofday({ 1538297117.074955 },0x0)          = 0 (0x0)
> gettimeofday({ 1538297117.075308 },0x0)          = 0 (0x0)
> getpid()                                         = 766 (0x2fe)
> __sysctl(0x7fffffffce90,0x4,0x0,0x0,0x801891000,0x29) = 0 (0x0)
> gettimeofday({ 1538297117.076252 },0x0)          = 0 (0x0)
> gettimeofday({ 1538297117.076431 },0x0)          = 0 (0x0)
> gettimeofday({ 1538297117.076678 },0x0^C)                = 0 (0x0)



Cheers
-- 
Matteo Beccati

Development & Consulting - http://www.beccati.com/


Re: [HACKERS] kqueue

From
Thomas Munro
Date:
On Sun, Sep 30, 2018 at 9:49 PM Matteo Beccati <php@beccati.com> wrote:
> On 30/09/2018 04:36, Thomas Munro wrote:
> > On Sat, Sep 29, 2018 at 7:51 PM Matteo Beccati <php@beccati.com> wrote:
> >> Out of curiosity, I've installed FreeBSD on an identically specced VM,
> >> and the select benchmark was ~75k tps for kqueue vs ~90k tps on
> >> unpatched master, so maybe there's something wrong I'm doing when
> >> benchmarking. Could you please provide proper instructions?
> >
> > Ouch.  What kind of virtualisation is this?  Which version of FreeBSD?
> >  Not sure if it's relevant, but do you happen to see gettimeofday()
> > showing up as a syscall, if you truss a backend running pgbench?
>
> I downloaded 11.2 as VHD file in order to run on MS Hyper-V / Win10 Pro.
>
> Yes, I saw plenty of gettimeofday calls when running truss:
>
> > gettimeofday({ 1538297117.071344 },0x0)          = 0 (0x0)
> > gettimeofday({ 1538297117.071743 },0x0)          = 0 (0x0)
> > gettimeofday({ 1538297117.072021 },0x0)          = 0 (0x0)

Ok.  Those syscalls show up depending on your
kern.timecounter.hardware setting and virtualised hardware: just like
on Linux, gettimeofday() can be a cheap userspace operation (vDSO)
that avoids the syscall path, or not.  I'm not seeing any reason to
think that's relevant here.
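
(If you want to check which path you are on: sysctl
kern.timecounter.hardware shows the active timecounter, and a crude
standalone test like the sketch below, which is just an illustration and
not part of the patch, will typically show an order of magnitude or more
difference in calls per second between the vDSO path and the syscall
path.

    #include <stdio.h>
    #include <sys/time.h>

    int
    main(void)
    {
        struct timeval start, now;
        long        calls = 0;
        double      elapsed;

        gettimeofday(&start, NULL);
        do
        {
            gettimeofday(&now, NULL);
            calls++;
            elapsed = (now.tv_sec - start.tv_sec) +
                (now.tv_usec - start.tv_usec) / 1000000.0;
        } while (elapsed < 1.0);

        printf("%ld gettimeofday() calls in %.2f s\n", calls, elapsed);
        return 0;
    }

Running it under truss also shows directly whether each call traps into
the kernel.)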

> > getpid()                                         = 766 (0x2fe)
> > __sysctl(0x7fffffffce90,0x4,0x0,0x0,0x801891000,0x2b) = 0 (0x0)
> > gettimeofday({ 1538297117.072944 },0x0)          = 0 (0x0)
> > getpid()                                         = 766 (0x2fe)
> > __sysctl(0x7fffffffce90,0x4,0x0,0x0,0x801891000,0x29) = 0 (0x0)

That's setproctitle().  Those syscalls go away if you use FreeBSD 12
(which has setproctitle_fast()).  If you fix both of those problems,
you are left with just:

> > sendto(9,"2\0\0\0\^DT\0\0\0!\0\^Aabalance"...,71,0,NULL,0) = 71 (0x47)
> > recvfrom(9,"B\0\0\0\^\\0P0_1\0\0\0\0\^A\0\0"...,8192,0,NULL,0x0) = 51 (0x33)

These are the only syscalls I see for each pgbench -S transaction on
my bare metal machine: just the network round trip.  The funny thing
is ... there are almost no kevent() calls.

I managed to reproduce the regression (~70k -> ~50k) using a prewarmed
scale 10 select-only pgbench with 2GB of shared_buffers (so it all
fits), with -j 96 -c 96 on an 8 vCPU AWS t2.2xlarge running FreeBSD 12
ALPHA8.  Here is what truss -c says, capturing data from one backend
for about 10 seconds:

syscall                     seconds   calls  errors
sendto                  0.396840146    3452       0
recvfrom                0.415802029    3443       6
kevent                  0.000626393       6       0
gettimeofday            2.723923249   24053       0
                      ------------- ------- -------
                        3.537191817   30954       6

(There's no regression with -j 8 -c 8; the problem appears only when
significantly overloaded, the same circumstances under which Matheusz
reported a great improvement).  So... it's very rarely accessing the
kqueue directly... but its existence somehow slows things down.
Curiously, when using poll() it's actually calling poll() ~90/sec for
me:

syscall                     seconds   calls  errors
sendto                  0.352784808    3226       0
recvfrom                0.614855254    4125     916
poll                    0.319396480     916       0
gettimeofday            2.659035352   22456       0
                      ------------- ------- -------
                        3.946071894   30723     916

I don't know what's going on here.  Based on the reports so far, we
know that kqueue gives a speedup when using bare metal with pgbench
running on a different machine, but a slowdown when using
virtualisation and pgbench running on the same machine (and I just
checked that that's observable with both Unix sockets and TCP
sockets).  That gave me the idea of looking at pgbench itself:

Unpatched:

syscall                     seconds   calls  errors
ppoll                   0.004869268       1       0
sendto                 16.489416911    7033       0
recvfrom               21.137606238    7049       0
                      ------------- ------- -------
                       37.631892417   14083       0

Patched:

syscall                     seconds   calls  errors
ppoll                   0.002773195       1       0
sendto                 16.597880468    7217       0
recvfrom               25.646406008    7238       0
                      ------------- ------- -------
                       42.247059671   14456       0

I don't know why the existence of the kqueue should make recvfrom()
slower on the pgbench side.  That's probably something to look into
off-line with some FreeBSD guru help.  Degraded performance for
clients on the same machine does seem to be a show stopper for this
patch for now.  Thanks for testing!

-- 
Thomas Munro
http://www.enterprisedb.com


Re: [HACKERS] kqueue

From
Matteo Beccati
Date:
Hi Thomas,

On 01/10/2018 01:09, Thomas Munro wrote:
> I don't know why the existence of the kqueue should make recvfrom()
> slower on the pgbench side.  That's probably something to look into
> off-line with some FreeBSD guru help.  Degraded performance for
> clients on the same machine does seem to be a show stopper for this
> patch for now.  Thanks for testing!

Glad to be helpful!

I've tried running pgbench from a separate VM and in fact kqueue 
consistently takes the lead with 5-10% more tps on select/prepared 
pgbench on NetBSD too.

What I have observed is that sys cpu usage is ~65% (35% idle) with 
kqueue, while unpatched master averages at 55% (45% idle): relatively 
speaking that's almost 25% less idle cpu available for a local pgbench 
to do its own stuff.

Running pgbench locally shows an average 47% usr / 53% sys cpu 
distribution w/ kqueue vs more like 50-50 w/ vanilla, so I'm inclined to 
think that's the reason why we see a performance drop instead. Thoughts?


Cheers
-- 
Matteo Beccati

Development & Consulting - http://www.beccati.com/


Re: [HACKERS] kqueue

From
Andres Freund
Date:
On 2018-10-01 19:25:45 +0200, Matteo Beccati wrote:
> On 01/10/2018 01:09, Thomas Munro wrote:
> > I don't know why the existence of the kqueue should make recvfrom()
> > slower on the pgbench side.  That's probably something to look into
> > off-line with some FreeBSD guru help.  Degraded performance for
> > clients on the same machine does seem to be a show stopper for this
> > patch for now.  Thanks for testing!
> 
> Glad to be helpful!
> 
> I've tried running pgbench from a separate VM and in fact kqueue
> consistently takes the lead with 5-10% more tps on select/prepared pgbench
> on NetBSD too.
> 
> What I have observed is that sys cpu usage is ~65% (35% idle) with kqueue,
> while unpatched master averages at 55% (45% idle): relatively speaking
> that's almost 25% less idle cpu available for a local pgbench to do its own
> stuff.

This suggests that either the wakeup logic between kqueue and poll,
or the internal locking, could be at issue.  Is it possible that poll
triggers a directed wakeup path, but kqueue doesn't?

Greetings,

Andres Freund


Re: [HACKERS] kqueue

From
Thomas Munro
Date:
On Tue, Oct 2, 2018 at 6:28 AM Andres Freund <andres@anarazel.de> wrote:
> On 2018-10-01 19:25:45 +0200, Matteo Beccati wrote:
> > On 01/10/2018 01:09, Thomas Munro wrote:
> > > I don't know why the existence of the kqueue should make recvfrom()
> > > slower on the pgbench side.  That's probably something to look into
> > > off-line with some FreeBSD guru help.  Degraded performance for
> > > clients on the same machine does seem to be a show stopper for this
> > > patch for now.  Thanks for testing!
> >
> > Glad to be helpful!
> >
> > I've tried running pgbench from a separate VM and in fact kqueue
> > consistently takes the lead with 5-10% more tps on select/prepared pgbench
> > on NetBSD too.
> >
> > What I have observed is that sys cpu usage is ~65% (35% idle) with kqueue,
> > while unpatched master averages at 55% (45% idle): relatively speaking
> > that's almost 25% less idle cpu available for a local pgbench to do its own
> > stuff.
>
> This suggests that either the wakeup logic between kqueue and poll,
> or the internal locking, could be at issue.  Is it possible that poll
> triggers a directed wakeup path, but kqueue doesn't?

I am following up with some kernel hackers.  In the meantime, here is
a rebase for the new split-line configure.in, to turn cfbot green.

-- 
Thomas Munro
http://www.enterprisedb.com

Attachment

Re: [HACKERS] kqueue

From
Rui DeSousa
Date:
> On Apr 10, 2018, at 9:05 PM, Thomas Munro <thomas.munro@enterprisedb.com> wrote:
>
> On Wed, Dec 6, 2017 at 12:53 AM, Thomas Munro
> <thomas.munro@enterprisedb.com> wrote:
>> On Thu, Jun 22, 2017 at 7:19 PM, Thomas Munro
>> <thomas.munro@enterprisedb.com> wrote:
>>> I don't plan to resubmit this patch myself, but I was doing some
>>> spring cleaning and rebasing today and I figured it might be worth
>>> quietly leaving a working patch here just in case anyone from the
>>> various BSD communities is interested in taking the idea further.
>
> I heard through the grapevine of some people currently investigating
> performance problems on busy FreeBSD systems, possibly related to the
> postmaster pipe.  I suspect this patch might be a part of the solution
> (other patches probably needed to get maximum value out of this patch:
> reuse WaitEventSet objects in some key places, and get rid of high
> frequency PostmasterIsAlive() read() calls).  The autoconf-fu in the
> last version bit-rotted so it seemed like a good time to post a
> rebased patch.
>
> --
> Thomas Munro
> http://www.enterprisedb.com
> <kqueue-v9.patch>

Hi,

I’m interested in the kqueue patch and would like to know its current state and possible timeline for inclusion in the
base code.  I have several large FreeBSD systems running PostgreSQL 11 that I believe currently display this issue.
The system has 88 vCPUs, 512GB RAM, and a very active application with over 1000 connections to the database.  The system
exhibits high kernel CPU usage servicing poll() for connections that are idle.

I’ve been testing pg_bouncer to reduce the number of connections and thus system CPU usage; however, not all
connections can go through pg_bouncer.

Thanks,
Rui.


Re: [HACKERS] kqueue

From
Thomas Munro
Date:
On Fri, Dec 20, 2019 at 12:41 PM Rui DeSousa <rui@crazybean.net> wrote:
> I’m interested in the kqueue patch and would like to know its current state and possible timeline for inclusion in
> the base code.  I have several large FreeBSD systems running PostgreSQL 11 that I believe currently display this issue.
> The system has 88 vCPUs, 512GB RAM, and a very active application with over 1000 connections to the database.  The system
> exhibits high kernel CPU usage servicing poll() for connections that are idle.

Hi Rui,

It's still my intention to get this committed eventually, but I got a
bit frazzled by conflicting reports on several operating systems.  For
FreeBSD, performance was improved in many cases, but there were also
some regressions that seemed to be related to ongoing work in the
kernel that seemed worth waiting for.  I don't have the details
swapped into my brain right now, but there was something about a big
kernel lock for Unix domain sockets which possibly explained some
local pgbench problems, and there was also a problem relating to
wakeup priority with some test parameters, which I'd need to go and
dig up.  If you want to test this and let us know how you get on,
that'd be great!  Here's a rebase against PostgreSQL's master branch,
and since you mentioned PostgreSQL 11, here's a rebased version for
REL_11_STABLE in case that's easier for you to test/build via ports or
whatever and test with your production workload (eg on a throwaway
copy of your production system).  You can see it's working by looking
in top: instead of state "select" (which is how poll() is reported)
you see "kqread", which on its own isn't exciting enough to get this
committed :-)

PS Here's a list of slow burner PostgreSQL/FreeBSD projects:
https://wiki.postgresql.org/wiki/FreeBSD

Attachment

Re: [HACKERS] kqueue

From
Thomas Munro
Date:
On Fri, Dec 20, 2019 at 1:26 PM Thomas Munro <thomas.munro@gmail.com> wrote:
> On Fri, Dec 20, 2019 at 12:41 PM Rui DeSousa <rui@crazybean.net> wrote:
> > PostgreSQL 11

BTW, PostgreSQL 12 has an improvement that may be relevant for your
case: it suppresses a bunch of high frequency reads on the "postmaster
death" pipe in some scenarios, mainly the streaming replica replay
loop (if you build on a system new enough to have PROC_PDEATHSIG_CTL,
namely FreeBSD 11.2+, it doesn't bother reading the pipe unless it's
received a signal).  That pipe is inherited by every process and
included in every poll() set.  The kqueue patch doesn't even bother to
add it to the wait event set, preferring to use an EVFILT_PROC event,
so in theory we could get rid of the death pipe completely on FreeBSD
and rely on EVFILT_PROC (sleeping) and PDEATHSIG (while awake), but I
wouldn't want to make the code diverge from the Linux code too much,
so I figured we should leave the pipe in place but just avoid
accessing it when possible, if that makes sense.
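
In case it helps to visualise the EVFILT_PROC part, here is a minimal
standalone sketch (for illustration only; the patch's WaitEventSet
integration is more involved) that watches an arbitrary pid for exit the
way the patch watches the postmaster, with no pipe descriptor involved:

    #include <sys/types.h>
    #include <sys/event.h>
    #include <sys/time.h>

    #include <err.h>
    #include <stdio.h>
    #include <stdlib.h>

    int
    main(int argc, char **argv)
    {
        struct kevent kev;
        pid_t       pid;
        int         kq;

        if (argc != 2)
            errx(1, "usage: %s pid", argv[0]);
        pid = (pid_t) atol(argv[1]);

        kq = kqueue();
        if (kq < 0)
            err(1, "kqueue");

        /* Ask for a NOTE_EXIT event when the watched process terminates. */
        EV_SET(&kev, pid, EVFILT_PROC, EV_ADD, NOTE_EXIT, 0, 0);
        if (kevent(kq, &kev, 1, NULL, 0, NULL) < 0)
            err(1, "kevent register");

        /* Block until it exits (a real wait loop would use a timeout). */
        if (kevent(kq, NULL, 0, &kev, 1, NULL) < 0)
            err(1, "kevent wait");
        printf("pid %ld exited\n", (long) pid);
        return 0;
    }

Note that the EV_ADD registration fails with ESRCH if the pid is already
gone, which a real implementation would need to treat as postmaster
death.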



Re: [HACKERS] kqueue

From
Rui DeSousa
Date:
Thanks Thomas,

Just a quick update.

I just deployed this patch into a lower environment yesterday running FreeBSD 12.1 and PostgreSQL 11.6.  I see a
significant reduction in CPU/system load, from load highs of 500+ down to the low 20s.  System CPU time has been reduced
to practically nothing.

I’m working with our support vendor in testing the patch and will continue to let it burn in.  Hopefully, we can get
the patch committed.  Thanks.

> On Dec 19, 2019, at 7:26 PM, Thomas Munro <thomas.munro@gmail.com> wrote:
>
> It's still my intention to get this committed eventually, but I got a
> bit frazzled by conflicting reports on several operating systems.  For
> FreeBSD, performance was improved in many cases, but there were also
> some regressions that seemed to be related to ongoing work in the
> kernel that seemed worth waiting for.  I don't have the details
> swapped into my brain right now, but there was something about a big
> kernel lock for Unix domain sockets which possibly explained some
> local pgbench problems, and there was also a problem relating to
> wakeup priority with some test parameters, which I'd need to go and
> dig up.  If you want to test this and let us know how you get on,
> that'd be great!  Here's a rebase against PostgreSQL's master branch,
> and since you mentioned PostgreSQL 11, here's a rebased version for
> REL_11_STABLE in case that's easier for you to test/build via ports or
> whatever and test with your production workload (eg on a throwaway
> copy of your production system).  You can see it's working by looking
> in top: instead of state "select" (which is how poll() is reported)
> you see "kqread", which on its own isn't exciting enough to get this
> committed :-)
>




Re: [HACKERS] kqueue

From
Peter Eisentraut
Date:
On 2019-12-20 01:26, Thomas Munro wrote:
> It's still my intention to get this committed eventually, but I got a
> bit frazzled by conflicting reports on several operating systems.  For
> FreeBSD, performance was improved in many cases, but there were also
> some regressions that seemed to be related to ongoing work in the
> kernel that seemed worth waiting for.  I don't have the details
> swapped into my brain right now, but there was something about a big
> kernel lock for Unix domain sockets which possibly explained some
> local pgbench problems, and there was also a problem relating to
> wakeup priority with some test parameters, which I'd need to go and
> dig up.  If you want to test this and let us know how you get on,
> that'd be great!  Here's a rebase against PostgreSQL's master branch,

I took this patch for a quick spin on macOS.  The result was that the 
test suite hangs in the test src/test/recovery/t/017_shm.pl.  I didn't 
see any mentions of this anywhere in the thread, but that test is newer 
than the beginning of this thread.  Can anyone confirm or deny this 
issue?  Is it specific to macOS perhaps?

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] kqueue

From
Tom Lane
Date:
Peter Eisentraut <peter.eisentraut@2ndquadrant.com> writes:
> I took this patch for a quick spin on macOS.  The result was that the
> test suite hangs in the test src/test/recovery/t/017_shm.pl.  I didn't
> see any mentions of this anywhere in the thread, but that test is newer
> than the beginning of this thread.  Can anyone confirm or deny this
> issue?  Is it specific to macOS perhaps?

Yeah, I duplicated the problem in macOS Catalina (10.15.2), using today's
HEAD.  The core regression tests pass, as do the earlier recovery tests
(I didn't try a full check-world though).  Somewhere early in 017_shm.pl,
things freeze up with four postmaster-child processes stuck in 100%-
CPU-consuming loops.  I captured stack traces:

(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGSTOP
  * frame #0: 0x00007fff6554dbb6 libsystem_kernel.dylib`kqueue + 10
    frame #1: 0x0000000105511533 postgres`CreateWaitEventSet(context=<unavailable>, nevents=<unavailable>) at latch.c:622:19 [opt]
    frame #2: 0x0000000105511305 postgres`WaitLatchOrSocket(latch=0x0000000112e02da4, wakeEvents=41, sock=-1, timeout=237000, wait_event_info=83886084) at latch.c:389:22 [opt]
    frame #3: 0x00000001054a7073 postgres`CheckpointerMain at checkpointer.c:514:10 [opt]
    frame #4: 0x00000001052da390 postgres`AuxiliaryProcessMain(argc=2, argv=0x00007ffeea9dded0) at bootstrap.c:461:4 [opt]

(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGSTOP
  * frame #0: 0x00007fff6554dbce libsystem_kernel.dylib`kevent + 10
    frame #1: 0x0000000105511ddc postgres`WaitEventAdjustKqueue(set=0x00007fc8e8805920, event=0x00007fc8e8805958, old_events=<unavailable>) at latch.c:1034:7 [opt]
    frame #2: 0x0000000105511638 postgres`AddWaitEventToSet(set=<unavailable>, events=<unavailable>, fd=<unavailable>, latch=<unavailable>, user_data=<unavailable>) at latch.c:778:2 [opt]
    frame #3: 0x0000000105511342 postgres`WaitLatchOrSocket(latch=0x0000000112e030f4, wakeEvents=41, sock=-1, timeout=200, wait_event_info=83886083) at latch.c:397:3 [opt]
    frame #4: 0x00000001054a6d69 postgres`BackgroundWriterMain at bgwriter.c:304:8 [opt]
    frame #5: 0x00000001052da38b postgres`AuxiliaryProcessMain(argc=2, argv=0x00007ffeea9dded0) at bootstrap.c:456:4 [opt]

(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGSTOP
  * frame #0: 0x00007fff65549c66 libsystem_kernel.dylib`close + 10
    frame #1: 0x0000000105511466 postgres`WaitLatchOrSocket [inlined] FreeWaitEventSet(set=<unavailable>) at latch.c:660:2 [opt]
    frame #2: 0x000000010551145d postgres`WaitLatchOrSocket(latch=0x0000000112e03444, wakeEvents=<unavailable>, sock=-1, timeout=5000, wait_event_info=83886093) at latch.c:432 [opt]
    frame #3: 0x00000001054b8685 postgres`WalWriterMain at walwriter.c:256:10 [opt]
    frame #4: 0x00000001052da39a postgres`AuxiliaryProcessMain(argc=2, argv=0x00007ffeea9dded0) at bootstrap.c:467:4 [opt]

(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGSTOP
  * frame #0: 0x00007fff655515be libsystem_kernel.dylib`__select + 10
    frame #1: 0x00000001056a6191 postgres`pg_usleep(microsec=<unavailable>) at pgsleep.c:56:10 [opt]
    frame #2: 0x00000001054abe12 postgres`backend_read_statsfile at pgstat.c:5720:3 [opt]
    frame #3: 0x00000001054adcc0 postgres`pgstat_fetch_stat_dbentry(dbid=<unavailable>) at pgstat.c:2431:2 [opt]
    frame #4: 0x00000001054a320c postgres`do_start_worker at autovacuum.c:1248:20 [opt]
    frame #5: 0x00000001054a2639 postgres`AutoVacLauncherMain [inlined] launch_worker(now=632853327674576) at autovacuum.c:1357:9 [opt]
    frame #6: 0x00000001054a2634 postgres`AutoVacLauncherMain(argc=<unavailable>, argv=<unavailable>) at autovacuum.c:769 [opt]
    frame #7: 0x00000001054a1ea7 postgres`StartAutoVacLauncher at autovacuum.c:415:4 [opt]

I'm not sure how much faith to put in the last couple of those, as
stopping the earlier processes could perhaps have had side-effects.
But evidently 017_shm.pl is doing something that interferes with
our ability to create kqueue-based WaitEventSets.

            regards, tom lane



Re: [HACKERS] kqueue

From
Tom Lane
Date:
Thomas Munro <thomas.munro@gmail.com> writes:
> [ 0001-Add-kqueue-2-support-for-WaitEventSet-v13.patch ]

I haven't read this patch in any detail, but a couple quick notes:

* It needs to be rebased over the removal of pg_config.h.win32
--- it should be touching Solution.pm instead, I believe.

* I'm disturbed by the addition of a hunk to the supposedly
system-API-independent WaitEventSetWait() function.  Is that
a generic bug fix?  If not, can we either get rid of it, or
at least wrap it in "#ifdef WAIT_USE_KQUEUE" so that this
patch isn't inflicting a performance penalty on everyone else?

            regards, tom lane



Re: [HACKERS] kqueue

From
Thomas Munro
Date:
On Tue, Jan 21, 2020 at 2:34 AM Peter Eisentraut
<peter.eisentraut@2ndquadrant.com> wrote:
> I took this patch for a quick spin on macOS.  The result was that the
> test suite hangs in the test src/test/recovery/t/017_shm.pl.  I didn't
> see any mentions of this anywhere in the thread, but that test is newer
> than the beginning of this thread.  Can anyone confirm or deny this
> issue?  Is it specific to macOS perhaps?

Thanks for testing, and sorry I didn't run a full check-world after
that rebase.  What happened here is that after commit cfdf4dc4 landed
on master, every implementation now needs to check for
exit_on_postmaster_death, and this patch didn't get the message.
Those processes are stuck in their main loops having detected
postmaster death, but not having any handling for it.  Will fix.
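
For anyone following along, the missing handling is roughly the check
sketched below.  This is paraphrased from the existing epoll path in
latch.c rather than copied from the fix, so treat the details as
approximate; the point is that WL_EXIT_ON_PM_DEATH callers expect the
wait code itself to call proc_exit() instead of reporting the event.

    /* Inside the kqueue implementation's event-processing loop (sketch). */
    if (cur_event->events == WL_POSTMASTER_DEATH)
    {
        /* Re-check, in case the event was spurious. */
        if (!PostmasterIsAliveInternal())
        {
            if (set->exit_on_postmaster_death)
                proc_exit(1);
            occurred_events->fd = PGINVALID_SOCKET;
            occurred_events->events = WL_POSTMASTER_DEATH;
            occurred_events++;
            returned_events++;
        }
    }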



Re: [HACKERS] kqueue

From
Tom Lane
Date:
I wrote:
> Peter Eisentraut <peter.eisentraut@2ndquadrant.com> writes:
>> I took this patch for a quick spin on macOS.  The result was that the
>> test suite hangs in the test src/test/recovery/t/017_shm.pl.  I didn't
>> see any mentions of this anywhere in the thread, but that test is newer
>> than the beginning of this thread.  Can anyone confirm or deny this
>> issue?  Is it specific to macOS perhaps?

> Yeah, I duplicated the problem in macOS Catalina (10.15.2), using today's
> HEAD.  The core regression tests pass, as do the earlier recovery tests
> (I didn't try a full check-world though).  Somewhere early in 017_shm.pl,
> things freeze up with four postmaster-child processes stuck in 100%-
> CPU-consuming loops.

I observe very similar behavior on FreeBSD/amd64 12.0-RELEASE-p12,
so it's not just macOS.

I now think that the autovac launcher isn't actually stuck in the way
that the other processes are.  The ones that are actually consuming
CPU are the checkpointer, bgwriter, and walwriter.  On the FreeBSD
box their stack traces are

(gdb) bt
#0  _close () at _close.S:3
#1  0x00000000007b4dd1 in FreeWaitEventSet (set=<optimized out>) at latch.c:660
#2  WaitLatchOrSocket (latch=0x80a1477a8, wakeEvents=<optimized out>, sock=-1,
    timeout=<optimized out>, wait_event_info=83886084) at latch.c:432
#3  0x000000000074a1b0 in CheckpointerMain () at checkpointer.c:514
#4  0x00000000005691e2 in AuxiliaryProcessMain (argc=2, argv=0x7fffffffce90)
    at bootstrap.c:461

(gdb) bt
#0  _fcntl () at _fcntl.S:3
#1  0x0000000800a6cd84 in fcntl (fd=4, cmd=2)
    at /usr/src/lib/libc/sys/fcntl.c:56
#2  0x00000000007b4eb5 in CreateWaitEventSet (context=<optimized out>,
    nevents=<optimized out>) at latch.c:625
#3  0x00000000007b4c82 in WaitLatchOrSocket (latch=0x80a147b00, wakeEvents=41,
    sock=-1, timeout=200, wait_event_info=83886083) at latch.c:389
#4  0x0000000000749ecd in BackgroundWriterMain () at bgwriter.c:304
#5  0x00000000005691dd in AuxiliaryProcessMain (argc=2, argv=0x7fffffffce90)
    at bootstrap.c:456

(gdb) bt
#0  _kevent () at _kevent.S:3
#1  0x00000000007b58a1 in WaitEventAdjustKqueue (set=0x800e6a120,
    event=0x800e6a170, old_events=<optimized out>) at latch.c:1034
#2  0x00000000007b4d87 in AddWaitEventToSet (set=<optimized out>,
    events=<error reading variable: Cannot access memory at address 0x10>,
    fd=-1, latch=<optimized out>, user_data=<optimized out>) at latch.c:778
#3  WaitLatchOrSocket (latch=0x80a147e58, wakeEvents=41, sock=-1,
    timeout=5000, wait_event_info=83886093) at latch.c:410
#4  0x000000000075b349 in WalWriterMain () at walwriter.c:256
#5  0x00000000005691ec in AuxiliaryProcessMain (argc=2, argv=0x7fffffffce90)
    at bootstrap.c:467

Note that these are just snapshots --- it looks like these processes
are repeatedly creating and destroying WaitEventSets; they're not
stuck inside the kernel.

            regards, tom lane



Re: [HACKERS] kqueue

From
Thomas Munro
Date:
On Tue, Jan 21, 2020 at 8:03 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I observe very similar behavior on FreeBSD/amd64 12.0-RELEASE-p12,
> so it's not just macOS.

Thanks for testing.  Fixed by handling the new
exit_on_postmaster_death flag from commit cfdf4dc4.

On Tue, Jan 21, 2020 at 5:55 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Thomas Munro <thomas.munro@gmail.com> writes:
> > [ 0001-Add-kqueue-2-support-for-WaitEventSet-v13.patch ]
>
> I haven't read this patch in any detail, but a couple quick notes:
>
> * It needs to be rebased over the removal of pg_config.h.win32
> --- it should be touching Solution.pm instead, I believe.

Done.

> * I'm disturbed by the addition of a hunk to the supposedly
> system-API-independent WaitEventSetWait() function.  Is that
> a generic bug fix?  If not, can we either get rid of it, or
> at least wrap it in "#ifdef WAIT_USE_KQUEUE" so that this
> patch isn't inflicting a performance penalty on everyone else?

Here's a version that adds no new code to non-WAIT_USE_KQUEUE paths.
That code deals with the fact that we sometimes discover the
postmaster is gone before we're in a position to report an event, so
we need an inter-function memory of some kind.  The new coding also
handles a race case where someone reuses the postmaster's pid before
we notice it went away.  In theory, the need for that could be
entirely removed by collapsing the 'adjust' call into the 'wait' call
(a single kevent() invocation can do both things), but I'm not sure if
it's worth the complexity.  As for generally reducing syscalls noise,
for both kqueue and epoll, I think that should be addressed separately
by better reuse of WaitEventSet objects[1].

[1] https://www.postgresql.org/message-id/CA%2BhUKGJAC4Oqao%3DqforhNey20J8CiG2R%3DoBPqvfR0vOJrFysGw%40mail.gmail.com
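
To illustrate the "collapse" idea: a single kevent() call accepts both a
changelist and an output event list, so registration and waiting can
share one syscall.  A minimal sketch follows; adjust_and_wait is a
made-up helper, not a function in the patch, and kq/fd are assumed to be
an existing kqueue descriptor and a socket.

    #include <sys/types.h>
    #include <sys/event.h>
    #include <sys/time.h>

    /* Register interest in fd and wait for events in one kevent() call. */
    int
    adjust_and_wait(int kq, int fd, struct kevent *out, int nout, int timeout_ms)
    {
        struct kevent change;
        struct timespec ts;

        ts.tv_sec = timeout_ms / 1000;
        ts.tv_nsec = (long) (timeout_ms % 1000) * 1000000;

        EV_SET(&change, fd, EVFILT_READ, EV_ADD, 0, 0, 0);

        /* One syscall: apply the changelist, then sleep until events arrive. */
        return kevent(kq, &change, 1, out, nout, &ts);
    }

The complexity mentioned above would presumably be in deferring pending
adjustments until the next wait so they can be batched into that call.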

Attachment

Re: [HACKERS] kqueue

From
Matteo Beccati
Date:
Hi,

On 21/01/2020 02:06, Thomas Munro wrote:
> [1] https://www.postgresql.org/message-id/CA%2BhUKGJAC4Oqao%3DqforhNey20J8CiG2R%3DoBPqvfR0vOJrFysGw%40mail.gmail.com

I had a NetBSD 8.0 VM lying around and I gave the patch a spin on latest
master.

With the kqueue patch, a pgbench -c basically hangs the whole postgres
instance. Not sure if it's a kernel issue, HyperVM issue or what, but
when it hangs, I can't even kill -9 the postgres processes or get the VM
to properly shutdown. The same doesn't happen, of course, with vanilla
postgres.

If the patch gets merged, I'd say it's safer not to enable it on NetBSD
and eventually leave it up to the pkgsrc team.


Cheers
-- 
Matteo Beccati

Development & Consulting - http://www.beccati.com/



Re: [HACKERS] kqueue

From
Tom Lane
Date:
Matteo Beccati <php@beccati.com> writes:
> On 21/01/2020 02:06, Thomas Munro wrote:
>> [1] https://www.postgresql.org/message-id/CA%2BhUKGJAC4Oqao%3DqforhNey20J8CiG2R%3DoBPqvfR0vOJrFysGw%40mail.gmail.com

> I had a NetBSD 8.0 VM lying around and I gave the patch a spin on latest
> master.
> With the kqueue patch, a pgbench -c basically hangs the whole postgres
> instance. Not sure if it's a kernel issue, HyperVM issue or what, but
> when it hangs, I can't even kill -9 the postgres processes or get the VM
> to properly shutdown. The same doesn't happen, of course, with vanilla
> postgres.

I'm a bit confused about what you are testing --- the kqueue patch
as per this thread, or that plus the WaitLatch refactorizations in
the other thread you point to above?

I've gotten through check-world successfully with the v14 kqueue patch
atop yesterday's HEAD on:

* macOS Catalina 10.15.2 (current release)
* FreeBSD/amd64 12.0-RELEASE-p12
* NetBSD/amd64 8.1
* NetBSD/arm 8.99.41
* OpenBSD/amd64 6.5

(These OSes are all on bare metal, no VMs involved)

This just says it doesn't lock up, of course.  I've not attempted
any performance-oriented tests.

            regards, tom lane



Re: [HACKERS] kqueue

From
Matteo Beccati
Date:
On 22/01/2020 17:06, Tom Lane wrote:
> Matteo Beccati <php@beccati.com> writes:
>> On 21/01/2020 02:06, Thomas Munro wrote:
>>> [1]
https://www.postgresql.org/message-id/CA%2BhUKGJAC4Oqao%3DqforhNey20J8CiG2R%3DoBPqvfR0vOJrFysGw%40mail.gmail.com
> 
>> I had a NetBSD 8.0 VM lying around and I gave the patch a spin on latest
>> master.
>> With the kqueue patch, a pgbench -c basically hangs the whole postgres
>> instance. Not sure if it's a kernel issue, HyperVM issue o what, but
>> when it hangs, I can't even kill -9 the postgres processes or get the VM
>> to properly shutdown. The same doesn't happen, of course, with vanilla
>> postgres.
> 
> I'm a bit confused about what you are testing --- the kqueue patch
> as per this thread, or that plus the WaitLatch refactorizations in
> the other thread you point to above?

my bad, I tested the v14 patch attached to the email.

The quoted url was just above the patch name in the email client and
somehow my brain thought I was quoting the v14 patch name.


Cheers
-- 
Matteo Beccati

Development & Consulting - http://www.beccati.com/



Re: [HACKERS] kqueue

From
Tom Lane
Date:
Matteo Beccati <php@beccati.com> writes:
> On 22/01/2020 17:06, Tom Lane wrote:
>> Matteo Beccati <php@beccati.com> writes:
>>> I had a NetBSD 8.0 VM lying around and I gave the patch a spin on latest
>>> master.
>>> With the kqueue patch, a pgbench -c basically hangs the whole postgres
>>> instance. Not sure if it's a kernel issue, HyperVM issue or what, but
>>> when it hangs, I can't even kill -9 the postgres processes or get the VM
>>> to properly shutdown. The same doesn't happen, of course, with vanilla
>>> postgres.

>> I'm a bit confused about what you are testing --- the kqueue patch
>> as per this thread, or that plus the WaitLatch refactorizations in
>> the other thread you point to above?

> my bad, I tested the v14 patch attached to the email.

Thanks for clarifying.

FWIW, I can't replicate the problem here using NetBSD 8.1 amd64
on bare metal.  I tried various pgbench parameters up to "-c 20 -j 20"
(on a 4-cores-plus-hyperthreading CPU), and it seems fine.

One theory is that NetBSD fixed something since 8.0, but I trawled
their 8.1 release notes [1], and the only items mentioning kqueue
or kevent are for fixes in the pty and tun drivers, neither of which
seem relevant.  (But wait ... could your VM setup be dependent on
a tunnel network interface for outside-the-VM connectivity?  Still
hard to see the connection though.)

My guess is that what you're seeing is a VM bug.

            regards, tom lane

[1] https://cdn.netbsd.org/pub/NetBSD/NetBSD-8.1/CHANGES-8.1



Re: [HACKERS] kqueue

From
Tom Lane
Date:
I wrote:
> This just says it doesn't lock up, of course.  I've not attempted
> any performance-oriented tests.

I've now done some light performance testing -- just stuff like
pgbench -S -M prepared -c 20 -j 20 -T 60 bench

I cannot see any improvement on either FreeBSD 12 or NetBSD 8.1,
either as to net TPS or as to CPU load.  If anything, the TPS
rate is a bit lower with the patch, though I'm not sure that
that effect is above the noise level.

It's certainly possible that to see any benefit you need stress
levels above what I can manage on the small box I've got these
OSes on.  Still, it'd be nice if a performance patch could show
some improved performance, before we take any portability risks
for it.

            regards, tom lane



Re: [HACKERS] kqueue

From
Rui DeSousa
Date:


On Jan 22, 2020, at 2:19 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

I cannot see any improvement on either FreeBSD 12 or NetBSD 8.1,
either as to net TPS or as to CPU load.  If anything, the TPS
rate is a bit lower with the patch, though I'm not sure that
that effect is above the noise level.

It's certainly possible that to see any benefit you need stress
levels above what I can manage on the small box I've got these
OSes on.  Still, it'd be nice if a performance patch could show
some improved performance, before we take any portability risks
for it.


Tom,

Here are two charts comparing a patched and an unpatched system.  These systems are very large and have just shy of a thousand connections each, with averages of 20 to 30 active queries running concurrently at times, including hundreds if not thousands of queries hitting the database in rapid succession.  The effect is that the unpatched system generates a lot of system load just handling idle connections, whereas the patched version is not impacted by idle sessions or sessions that have already received data.





Attachment

Re: [HACKERS] kqueue

From
Thomas Munro
Date:
On Thu, Jan 23, 2020 at 9:38 AM Rui DeSousa <rui@crazybean.net> wrote:
> On Jan 22, 2020, at 2:19 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> It's certainly possible that to see any benefit you need stress
>> levels above what I can manage on the small box I've got these
>> OSes on.  Still, it'd be nice if a performance patch could show
>> some improved performance, before we take any portability risks
>> for it.

You might need more than one CPU socket, or at least lots more cores
so that you can create enough contention.  That was needed to see the
regression caused by commit ac1d794 on Linux[1].

> Here are two charts comparing a patched and unpatched system.
> These systems are very large and have just shy of a thousand
> connections each with averages of 20 to 30 active queries concurrently
> running at times including hundreds if not thousands of queries hitting
> the database in rapid succession.  The effect is the unpatched system
> generates a lot of system load just handling idle connections whereas
> the patched version is not impacted by idle sessions or sessions that
> have already received data.

Thanks.  I can reproduce something like this on an Azure 72-vCPU
system, using pgbench -S -c800 -j32.  The point of those settings is
to have many backends, but they're all alternating between work and
sleep.  That creates a stream of poll() syscalls, and system time goes
through the roof (all CPUs pegged, but it's ~half system).  Profiling
the kernel with dtrace, I see the most common stack (by a long way) is
in a poll-related lock, similar to a profile Rui sent me off-list from
his production system.  Patched, there is very little system time and
the TPS number goes from 539k to 781k.

[1]
https://www.postgresql.org/message-id/flat/CAB-SwXZh44_2ybvS5Z67p_CDz%3DXFn4hNAD%3DCnMEF%2BQqkXwFrGg%40mail.gmail.com



Re: [HACKERS] kqueue

From
Thomas Munro
Date:
On Sat, Jan 25, 2020 at 11:29 AM Thomas Munro <thomas.munro@gmail.com> wrote:
> On Thu, Jan 23, 2020 at 9:38 AM Rui DeSousa <rui@crazybean.net> wrote:
> > Here are two charts comparing a patched and unpatched system.
> > These systems are very large and have just shy of a thousand
> > connections each with averages of 20 to 30 active queries concurrently
> > running at times including hundreds if not thousands of queries hitting
> > the database in rapid succession.  The effect is the unpatched system
> > generates a lot of system load just handling idle connections whereas
> > the patched version is not impacted by idle sessions or sessions that
> > have already received data.
>
> Thanks.  I can reproduce something like this on an Azure 72-vCPU
> system, using pgbench -S -c800 -j32.  The point of those settings is
> to have many backends, but they're all alternating between work and
> sleep.  That creates a stream of poll() syscalls, and system time goes
> through the roof (all CPUs pegged, but it's ~half system).  Profiling
> the kernel with dtrace, I see the most common stack (by a long way) is
> in a poll-related lock, similar to a profile Rui sent me off-list from
> his production system.  Patched, there is very little system time and
> the TPS number goes from 539k to 781k.

If there are no further objections, I'm planning to commit this sooner
rather than later, so that it gets plenty of air time on developer and
build farm machines.  If problems are discovered on a particular
platform, there's a pretty good escape hatch: you can define
WAIT_USE_POLL, and if it turns out to be necessary, we could always do
something in src/template similar to what we do for semaphores.
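
To make the escape hatch concrete: latch.c chooses an implementation with
preprocessor tests roughly like the sketch below (paraphrased from
memory, so the exact conditions and order may differ), which is why
defining WAIT_USE_POLL in CPPFLAGS overrides the automatic choice:

    #if defined(WAIT_USE_EPOLL) || defined(WAIT_USE_KQUEUE) || \
        defined(WAIT_USE_POLL) || defined(WAIT_USE_WIN32)
    /* an explicit choice from CPPFLAGS wins */
    #elif defined(HAVE_SYS_EPOLL_H)
    #define WAIT_USE_EPOLL
    #elif defined(HAVE_KQUEUE)
    #define WAIT_USE_KQUEUE
    #elif defined(HAVE_POLL)
    #define WAIT_USE_POLL
    #elif defined(WIN32)
    #define WAIT_USE_WIN32
    #else
    #error "no wait set implementation available"
    #endif

A src/template/<os> entry would simply bake that define in for a whole
platform, the way the semaphore choice is handled.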



Re: [HACKERS] kqueue

From
Mark Wong
Date:
On Sat, Jan 25, 2020 at 11:29:11AM +1300, Thomas Munro wrote:
> On Thu, Jan 23, 2020 at 9:38 AM Rui DeSousa <rui@crazybean.net> wrote:
> > On Jan 22, 2020, at 2:19 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >> It's certainly possible that to see any benefit you need stress
> >> levels above what I can manage on the small box I've got these
> >> OSes on.  Still, it'd be nice if a performance patch could show
> >> some improved performance, before we take any portability risks
> >> for it.
> 
> You might need more than one CPU socket, or at least lots more cores
> so that you can create enough contention.  That was needed to see the
> regression caused by commit ac1d794 on Linux[1].
> 
> > Here are two charts comparing a patched and unpatched system.
> > These systems are very large and have just shy of a thousand
> > connections each with averages of 20 to 30 active queries concurrently
> > running at times including hundreds if not thousands of queries hitting
> > the database in rapid succession.  The effect is the unpatched system
> > generates a lot of system load just handling idle connections whereas
> > the patched version is not impacted by idle sessions or sessions that
> > have already received data.
> 
> Thanks.  I can reproduce something like this on an Azure 72-vCPU
> system, using pgbench -S -c800 -j32.  The point of those settings is
> to have many backends, but they're all alternating between work and
> sleep.  That creates a stream of poll() syscalls, and system time goes
> through the roof (all CPUs pegged, but it's ~half system).  Profiling
> the kernel with dtrace, I see the most common stack (by a long way) is
> in a poll-related lock, similar to a profile Rui sent me off-list from
> his production system.  Patched, there is very little system time and
> the TPS number goes from 539k to 781k.
> 
> [1]
https://www.postgresql.org/message-id/flat/CAB-SwXZh44_2ybvS5Z67p_CDz%3DXFn4hNAD%3DCnMEF%2BQqkXwFrGg%40mail.gmail.com

Just to add some data...

I tried the kqueue v14 patch on a AWS EC2 m5a.24xlarge (96 vCPU) with
FreeBSD 12.1, driving from a m5.8xlarge (32 vCPU) CentOS 7 system.

I also use pgbench with a scale factor of 1000, with -S -c800 -j32.

Comparing pg 12.1 vs 13-devel (30012a04):

* TPS increased from ~93,000 to ~140,000, a ~50% increase
* system time dropped from ~78% to ~70%, a ~8 percentage point decrease
* user time increased from ~16% to ~23%, a ~7 percentage point increase

I don't have any profile data, but I've attached a couple chart showing
the processor utilization over a 15 minute interval from the database
system.

Regards,
Mark
-- 
Mark Wong
2ndQuadrant - PostgreSQL Solutions for the Enterprise
https://www.2ndQuadrant.com/

Attachment

Re: [HACKERS] kqueue

From
Thomas Munro
Date:
On Wed, Jan 29, 2020 at 11:54 AM Thomas Munro <thomas.munro@gmail.com> wrote:
> If there are no further objections, I'm planning to commit this sooner
> rather than later, so that it gets plenty of air time on developer and
> build farm machines.  If problems are discovered on a particular
> platform, there's a pretty good escape hatch: you can define
> WAIT_USE_POLL, and if it turns out to be necessary, we could always do
> something in src/template similar to what we do for semaphores.

I updated the error messages to match the new "unified" style, adjusted
a couple of comments, and pushed.  Thanks to all the people who
tested.  I'll keep an eye on the build farm.