Thread: kqueue
Hi, On the WaitEventSet thread I posted a small patch to add kqueue support[1]. Since then I peeked at how some other software[2] interacts with kqueue and discovered that there are platforms including NetBSD where kevent.udata is an intptr_t instead of a void *. Here's a version which should compile there. Would any NetBSD user be interested in testing this? (An alternative would be to make configure to test for this with some kind of AC_COMPILE_IFELSE incantation but the steamroller cast is simpler.) [1] http://www.postgresql.org/message-id/CAEepm=1dZ_mC+V3YtB79zf27280nign8MKOLxy2FKhvc1RzN=g@mail.gmail.com [2] https://github.com/libevent/libevent/commit/5602e451ce872d7d60c640590113c5a81c3fc389 -- Thomas Munro http://www.enterprisedb.com
Attachment
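For readers following along, here is a minimal sketch of the kind of "steamroller" approach being referred to; it is an illustration with a made-up macro name (UDATA_AS_POINTER), not an excerpt from the attached patch. The idea is to treat kevent.udata as pointer-sized raw storage so that one expression compiles whether the field is a void * (FreeBSD, macOS, OpenBSD) or an intptr_t (NetBSD), assuming the two types have the same size and representation on these platforms:

#include <sys/types.h>
#include <sys/event.h>
#include <sys/time.h>
#include <stdio.h>

/*
 * Illustration only: access udata through a pointer lvalue so the same
 * expression compiles whether the field is declared void * or intptr_t.
 */
#define UDATA_AS_POINTER(kev) (*(void **) &(kev)->udata)

int
main(void)
{
    int kq = kqueue();
    struct kevent kev;
    static int my_state;    /* stand-in for whatever the pointer refers to */

    if (kq < 0)
    {
        perror("kqueue");
        return 1;
    }

    /* Watch stdin (fd 0) for readability, attaching our pointer to it. */
    EV_SET(&kev, 0, EVFILT_READ, EV_ADD, 0, 0, 0);
    UDATA_AS_POINTER(&kev) = &my_state;

    if (kevent(kq, &kev, 1, NULL, 0, NULL) < 0)
        perror("kevent");
    return 0;
}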
On Tue, Mar 29, 2016 at 7:53 PM, Thomas Munro <thomas.munro@enterprisedb.com> wrote: > On the WaitEventSet thread I posted a small patch to add kqueue > support[1]. Since then I peeked at how some other software[2] > interacts with kqueue and discovered that there are platforms > including NetBSD where kevent.udata is an intptr_t instead of a void > *. Here's a version which should compile there. Would any NetBSD > user be interested in testing this? (An alternative would be to make > configure to test for this with some kind of AC_COMPILE_IFELSE > incantation but the steamroller cast is simpler.) Did you code this up blind or do you have a NetBSD machine yourself? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 2016-04-21 14:15:53 -0400, Robert Haas wrote: > On Tue, Mar 29, 2016 at 7:53 PM, Thomas Munro > <thomas.munro@enterprisedb.com> wrote: > > On the WaitEventSet thread I posted a small patch to add kqueue > > support[1]. Since then I peeked at how some other software[2] > > interacts with kqueue and discovered that there are platforms > > including NetBSD where kevent.udata is an intptr_t instead of a void > > *. Here's a version which should compile there. Would any NetBSD > > user be interested in testing this? (An alternative would be to make > > configure to test for this with some kind of AC_COMPILE_IFELSE > > incantation but the steamroller cast is simpler.) > > Did you code this up blind or do you have a NetBSD machine yourself? RMT, what do you think, should we try to get this into 9.6? It's feasible that the performance problem 98a64d0bd713c addressed is also present on free/netbsd. - Andres
On Thu, Apr 21, 2016 at 2:22 PM, Andres Freund <andres@anarazel.de> wrote: > On 2016-04-21 14:15:53 -0400, Robert Haas wrote: >> On Tue, Mar 29, 2016 at 7:53 PM, Thomas Munro >> <thomas.munro@enterprisedb.com> wrote: >> > On the WaitEventSet thread I posted a small patch to add kqueue >> > support[1]. Since then I peeked at how some other software[2] >> > interacts with kqueue and discovered that there are platforms >> > including NetBSD where kevent.udata is an intptr_t instead of a void >> > *. Here's a version which should compile there. Would any NetBSD >> > user be interested in testing this? (An alternative would be to make >> > configure to test for this with some kind of AC_COMPILE_IFELSE >> > incantation but the steamroller cast is simpler.) >> >> Did you code this up blind or do you have a NetBSD machine yourself? > > RMT, what do you think, should we try to get this into 9.6? It's > feasible that the performance problem 98a64d0bd713c addressed is also > present on free/netbsd. My personal opinion is that it would be a reasonable thing to do if somebody can demonstrate that it actually solves a real problem. Absent that, I don't think we should rush it in. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas wrote: > On Thu, Apr 21, 2016 at 2:22 PM, Andres Freund <andres@anarazel.de> wrote: > > On 2016-04-21 14:15:53 -0400, Robert Haas wrote: > >> On Tue, Mar 29, 2016 at 7:53 PM, Thomas Munro > >> <thomas.munro@enterprisedb.com> wrote: > >> > On the WaitEventSet thread I posted a small patch to add kqueue > >> > support[1]. Since then I peeked at how some other software[2] > >> > interacts with kqueue and discovered that there are platforms > >> > including NetBSD where kevent.udata is an intptr_t instead of a void > >> > *. Here's a version which should compile there. Would any NetBSD > >> > user be interested in testing this? (An alternative would be to make > >> > configure to test for this with some kind of AC_COMPILE_IFELSE > >> > incantation but the steamroller cast is simpler.) > >> > >> Did you code this up blind or do you have a NetBSD machine yourself? > > > > RMT, what do you think, should we try to get this into 9.6? It's > > feasible that the performance problem 98a64d0bd713c addressed is also > > present on free/netbsd. > > My personal opinion is that it would be a reasonable thing to do if > somebody can demonstrate that it actually solves a real problem. > Absent that, I don't think we should rush it in. My first question is whether there are platforms that use kqueue on which the WaitEventSet stuff proves to be a bottleneck. I vaguely recall that MacOS X in particular doesn't scale terribly well for other reasons, and I don't know if anybody runs *BSD in large machines. On the other hand, there's plenty of hackers running their laptops on MacOS X these days, so presumably any platform dependent problem would be discovered quickly enough. As for NetBSD, it seems mostly a fringe platform, doesn't it? We would discover serious dependency problems quickly enough on the buildfarm ... except that the only netbsd buildfarm member hasn't reported in over two weeks. Am I mistaken in any of these points? (Our coverage of the BSD platforms leaves much to be desired FWIW.) -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Thu, Apr 21, 2016 at 3:31 PM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote: > Robert Haas wrote: >> On Thu, Apr 21, 2016 at 2:22 PM, Andres Freund <andres@anarazel.de> wrote: >> > On 2016-04-21 14:15:53 -0400, Robert Haas wrote: >> >> On Tue, Mar 29, 2016 at 7:53 PM, Thomas Munro >> >> <thomas.munro@enterprisedb.com> wrote: >> >> > On the WaitEventSet thread I posted a small patch to add kqueue >> >> > support[1]. Since then I peeked at how some other software[2] >> >> > interacts with kqueue and discovered that there are platforms >> >> > including NetBSD where kevent.udata is an intptr_t instead of a void >> >> > *. Here's a version which should compile there. Would any NetBSD >> >> > user be interested in testing this? (An alternative would be to make >> >> > configure to test for this with some kind of AC_COMPILE_IFELSE >> >> > incantation but the steamroller cast is simpler.) >> >> >> >> Did you code this up blind or do you have a NetBSD machine yourself? >> > >> > RMT, what do you think, should we try to get this into 9.6? It's >> > feasible that the performance problem 98a64d0bd713c addressed is also >> > present on free/netbsd. >> >> My personal opinion is that it would be a reasonable thing to do if >> somebody can demonstrate that it actually solves a real problem. >> Absent that, I don't think we should rush it in. > > My first question is whether there are platforms that use kqueue on > which the WaitEventSet stuff proves to be a bottleneck. I vaguely > recall that MacOS X in particular doesn't scale terribly well for other > reasons, and I don't know if anybody runs *BSD in large machines. > > On the other hand, there's plenty of hackers running their laptops on > MacOS X these days, so presumably any platform dependent problem would > be discovered quickly enough. As for NetBSD, it seems mostly a fringe > platform, doesn't it? We would discover serious dependency problems > quickly enough on the buildfarm ... except that the only netbsd > buildfarm member hasn't reported in over two weeks. > > Am I mistaken in any of these points? > > (Our coverage of the BSD platforms leaves much to be desired FWIW.) My impression is that the Linux problem only manifested itself on large machines. I might be wrong about that. But if that's true, then we might not see regressions on other platforms just because people aren't running those operating systems on big enough hardware. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 2016-04-21 14:25:06 -0400, Robert Haas wrote: > On Thu, Apr 21, 2016 at 2:22 PM, Andres Freund <andres@anarazel.de> wrote: > > On 2016-04-21 14:15:53 -0400, Robert Haas wrote: > >> On Tue, Mar 29, 2016 at 7:53 PM, Thomas Munro > >> <thomas.munro@enterprisedb.com> wrote: > >> > On the WaitEventSet thread I posted a small patch to add kqueue > >> > support[1]. Since then I peeked at how some other software[2] > >> > interacts with kqueue and discovered that there are platforms > >> > including NetBSD where kevent.udata is an intptr_t instead of a void > >> > *. Here's a version which should compile there. Would any NetBSD > >> > user be interested in testing this? (An alternative would be to make > >> > configure to test for this with some kind of AC_COMPILE_IFELSE > >> > incantation but the steamroller cast is simpler.) > >> > >> Did you code this up blind or do you have a NetBSD machine yourself? > > > > RMT, what do you think, should we try to get this into 9.6? It's > > feasible that the performance problem 98a64d0bd713c addressed is also > > present on free/netbsd. > > My personal opinion is that it would be a reasonable thing to do if > somebody can demonstrate that it actually solves a real problem. > Absent that, I don't think we should rush it in. On linux you needed a 2 socket machine to demonstrate the problem, but both old ones (my 2009 workstation) and new ones were sufficient. I'd be surprised if the situation on freebsd is any better, except that you might hit another scalability bottleneck earlier. I doubt there's many real postgres instances operating on bigger hardware on freebsd, with sufficient throughput to show the problem. So I think the argument for including is more along trying to be "nice" to more niche-y OSs. I really don't have any opinion either way. - Andres
On Fri, Apr 22, 2016 at 12:21 PM, Andres Freund <andres@anarazel.de> wrote: > On 2016-04-21 14:25:06 -0400, Robert Haas wrote: >> On Thu, Apr 21, 2016 at 2:22 PM, Andres Freund <andres@anarazel.de> wrote: >> > On 2016-04-21 14:15:53 -0400, Robert Haas wrote: >> >> On Tue, Mar 29, 2016 at 7:53 PM, Thomas Munro >> >> <thomas.munro@enterprisedb.com> wrote: >> >> > On the WaitEventSet thread I posted a small patch to add kqueue >> >> > support[1]. Since then I peeked at how some other software[2] >> >> > interacts with kqueue and discovered that there are platforms >> >> > including NetBSD where kevent.udata is an intptr_t instead of a void >> >> > *. Here's a version which should compile there. Would any NetBSD >> >> > user be interested in testing this? (An alternative would be to make >> >> > configure to test for this with some kind of AC_COMPILE_IFELSE >> >> > incantation but the steamroller cast is simpler.) >> >> >> >> Did you code this up blind or do you have a NetBSD machine yourself? >> > >> > RMT, what do you think, should we try to get this into 9.6? It's >> > feasible that the performance problem 98a64d0bd713c addressed is also >> > present on free/netbsd. >> >> My personal opinion is that it would be a reasonable thing to do if >> somebody can demonstrate that it actually solves a real problem. >> Absent that, I don't think we should rush it in. > > On linux you needed a 2 socket machine to demonstrate the problem, but > both old ones (my 2009 workstation) and new ones were sufficient. I'd be > surprised if the situation on freebsd is any better, except that you > might hit another scalability bottleneck earlier. > > I doubt there's many real postgres instances operating on bigger > hardware on freebsd, with sufficient throughput to show the problem. So > I think the argument for including is more along trying to be "nice" to > more niche-y OSs. What has BSD ever done for us?! (Joke...) I vote to leave this patch in the next commitfest where it is, and reconsider if someone shows up with a relevant problem report on large systems. I can't see any measurable performance difference on a 4 core laptop running FreeBSD 10.3. Maybe kqueue will make more difference even on smaller systems in future releases if we start using big wait sets for distributed/asynchronous work, in-core pooling/admission control etc. Here's a new version of the patch that fixes some stupid bugs. I have run regression tests and some basic sanity checks on OSX 10.11.4, FreeBSD 10.3, NetBSD 7.0 and OpenBSD 5.8. There is still room to make an improvement that would drop the syscall from AddWaitEventToSet and ModifyWaitEvent, compressing wait set modifications and waiting into a single syscall (kqueue's claimed advantage over the competition). While doing that I discovered that unpatched master doesn't actually build on recent NetBSD systems because our static function strtoi clashes with a non-standard libc function of the same name[1] declared in inttypes.h. Maybe we should rename it, like in the attached? [1] http://netbsd.gw.com/cgi-bin/man-cgi?strtoi++NetBSD-current -- Thomas Munro http://www.enterprisedb.com
Attachment
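As an aside for readers, the "single syscall" idea mentioned in the message above can be sketched roughly as follows. This is only an illustration of the general technique with invented names (PendingSet, pend_change, wait_for_events), not code from the patch, and it glosses over overflow handling, error handling and the EINTR semantics discussed later in the thread:

#include <sys/types.h>
#include <sys/event.h>
#include <sys/time.h>

#define MAX_PENDING 64

/* Queue kevent changes in user space instead of applying each immediately. */
typedef struct PendingSet
{
    int kq;                 /* kqueue file descriptor */
    int npending;           /* number of queued changes */
    struct kevent pending[MAX_PENDING];
} PendingSet;

static void
pend_change(PendingSet *set, int fd, short filter, unsigned short flags)
{
    /* Just record the change; no syscall here. */
    if (set->npending < MAX_PENDING)
        EV_SET(&set->pending[set->npending++], fd, filter, flags, 0, 0, 0);
}

/*
 * Hand the whole queued changelist to the same kevent() call that waits for
 * events, so modifying the wait set and waiting cost one syscall instead of
 * one syscall per modification plus one to wait.
 */
static int
wait_for_events(PendingSet *set, struct kevent *out, int nout,
                const struct timespec *timeout)
{
    int rc = kevent(set->kq, set->pending, set->npending, out, nout, timeout);

    set->npending = 0;
    return rc;
}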
On 2016-04-22 20:39:27 +1200, Thomas Munro wrote: > I vote to leave this patch in the next commitfest where it is, and > reconsider if someone shows up with a relevant problem report on large > systems. Sounds good! > Here's a new version of the patch that fixes some stupid bugs. I have > run regression tests and some basic sanity checks on OSX 10.11.4, > FreeBSD 10.3, NetBSD 7.0 and OpenBSD 5.8. There is still room to make > an improvement that would drop the syscall from AddWaitEventToSet and > ModifyWaitEvent, compressing wait set modifications and waiting into a > single syscall (kqueue's claimed advantage over the competition). I find that not to be particularly interesting, and would rather want to avoid adding complexity for it. > While doing that I discovered that unpatched master doesn't actually > build on recent NetBSD systems because our static function strtoi > clashes with a non-standard libc function of the same name[1] declared > in inttypes.h. Maybe we should rename it, like in the attached? Yuck. That's a new function they introduced? That code hasn't changed in a while.... Andres
On Sat, Apr 23, 2016 at 4:36 AM, Andres Freund <andres@anarazel.de> wrote: > On 2016-04-22 20:39:27 +1200, Thomas Munro wrote: >> While doing that I discovered that unpatched master doesn't actually >> build on recent NetBSD systems because our static function strtoi >> clashes with a non-standard libc function of the same name[1] declared >> in inttypes.h. Maybe we should rename it, like in the attached? > > Yuck. That's a new function they introduced? That code hasn't changed in > a while.... Yes, according to the man page it appeared in NetBSD 7.0. That was released in September 2015, and our buildfarm has only NetBSD 5.x systems. I see that the maintainers of the NetBSD pg package deal with this with a preprocessor kludge: http://cvsweb.netbsd.org/bsdweb.cgi/pkgsrc/databases/postgresql95/patches/patch-src_backend_utils_adt_datetime.c?rev=1.1 What is the policy for that kind of thing -- do nothing until someone cares enough about the platform to supply a buildfarm animal? -- Thomas Munro http://www.enterprisedb.com
Thomas Munro wrote: > On Sat, Apr 23, 2016 at 4:36 AM, Andres Freund <andres@anarazel.de> wrote: > > On 2016-04-22 20:39:27 +1200, Thomas Munro wrote: > >> While doing that I discovered that unpatched master doesn't actually > >> build on recent NetBSD systems because our static function strtoi > >> clashes with a non-standard libc function of the same name[1] declared > >> in inttypes.h. Maybe we should rename it, like in the attached? > > > > Yuck. That's a new function they introduced? That code hasn't changed in > > a while.... > > Yes, according to the man page it appeared in NetBSD 7.0. That was > released in September 2015, and our buildfarm has only NetBSD 5.x > systems. I see that the maintainers of the NetBSD pg package deal > with this with a preprocessor kludge: > > http://cvsweb.netbsd.org/bsdweb.cgi/pkgsrc/databases/postgresql95/patches/patch-src_backend_utils_adt_datetime.c?rev=1.1 > > What is the policy for that kind of thing -- do nothing until someone > cares enough about the platform to supply a buildfarm animal? Well, if the platform is truly alive, we would have gotten complaints already. Since we haven't, maybe nobody cares, so why should we? I would rename our function nonetheless FWIW; the name seems far too generic to me. pg_strtoi? -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 2016-04-23 10:12:12 +1200, Thomas Munro wrote: > What is the policy for that kind of thing -- do nothing until someone > cares enough about the platform to supply a buildfarm animal? I think we should fix it, I just want to make sure we understand why the error is appearing now. Since we now do... - Andres
On 2016-04-22 19:25:06 -0300, Alvaro Herrera wrote: > Since we haven't, maybe nobody cares, so why should we? I guess it's to a good degree because netbsd has pg packages, and it's fixed there? > would rename our function nonetheless FWIW; the name seems far too > generic to me. Yea. > pg_strtoi? I think that's what Thomas did upthread. Are you taking this one then? Greetings, Andres Freund
Thomas Munro <thomas.munro@enterprisedb.com> writes: > On Sat, Apr 23, 2016 at 4:36 AM, Andres Freund <andres@anarazel.de> wrote: >> On 2016-04-22 20:39:27 +1200, Thomas Munro wrote: >>> While doing that I discovered that unpatched master doesn't actually >>> build on recent NetBSD systems because our static function strtoi >>> clashes with a non-standard libc function of the same name[1] declared >>> in inttypes.h. Maybe we should rename it, like in the attached? >> Yuck. That's a new function they introduced? That code hasn't changed in >> a while.... > Yes, according to the man page it appeared in NetBSD 7.0. That was > released in September 2015, and our buildfarm has only NetBSD 5.x > systems. I see that the maintainers of the NetBSD pg package deal > with this with a preprocessor kludge: > http://cvsweb.netbsd.org/bsdweb.cgi/pkgsrc/databases/postgresql95/patches/patch-src_backend_utils_adt_datetime.c?rev=1.1 > What is the policy for that kind of thing -- do nothing until someone > cares enough about the platform to supply a buildfarm animal? There's no set policy, but certainly a promise to put up a buildfarm animal would establish that somebody actually cares about keeping Postgres running on the platform. Without one, we might fix a specific problem when reported, but we'd have no way to know about new problems. Rooting through that patches directory reveals quite a number of random-looking patches, most of which we certainly wouldn't take without a lot more than zero explanation. It's hard to tell which are actually needed, but at least some don't seem to have anything to do with building for NetBSD. regards, tom lane
Andres Freund <andres@anarazel.de> writes: >> pg_strtoi? > I think that's what Thomas did upthread. Are you taking this one then? I'd go with just "strtoint". We have "strtoint64" elsewhere. regards, tom lane
Tom Lane wrote: > Andres Freund <andres@anarazel.de> writes: > >> pg_strtoi? > > > I think that's what Thomas did upthread. Are you taking this one then? > > I'd go with just "strtoint". We have "strtoint64" elsewhere. For closure of this subthread: this rename was committed by Tom as 0ab3595e5bb5. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Fri, Jun 3, 2016 at 4:02 AM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote: > Tom Lane wrote: >> Andres Freund <andres@anarazel.de> writes: >> >> pg_strtoi? >> >> > I think that's what Thomas did upthread. Are you taking this one then? >> >> I'd go with just "strtoint". We have "strtoint64" elsewhere. > > For closure of this subthread: this rename was committed by Tom as > 0ab3595e5bb5. Thanks. And here is a new version of the kqueue patch. The previous version doesn't apply on top of recent commit a3b30763cc8686f5b4cd121ef0bf510c1533ac22, which sprinkled some MAXALIGN macros nearby. I've now done the same thing with the kevent struct because it's cheap, uniform with the other cases and could matter on some platforms for the same reason. It's in the September commitfest here: https://commitfest.postgresql.org/10/597/ -- Thomas Munro http://www.enterprisedb.com
Attachment
On 2016-06-03 01:45, Thomas Munro wrote: > On Fri, Jun 3, 2016 at 4:02 AM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote: >> Tom Lane wrote: >>> Andres Freund <andres@anarazel.de> writes: >>>>> pg_strtoi? >>> >>>> I think that's what Thomas did upthread. Are you taking this one then? >>> >>> I'd go with just "strtoint". We have "strtoint64" elsewhere. >> >> For closure of this subthread: this rename was committed by Tom as >> 0ab3595e5bb5. > > Thanks. And here is a new version of the kqueue patch. The previous > version doesn't apply on top of recent commit > a3b30763cc8686f5b4cd121ef0bf510c1533ac22, which sprinkled some > MAXALIGN macros nearby. I've now done the same thing with the kevent > struct because it's cheap, uniform with the other cases and could > matter on some platforms for the same reason. I've tested and reviewed this, and it looks good to me, other than this part: + /* + * kevent guarantees that the change list has been processed in the EINTR + * case. Here we are only applying a change list so EINTR counts as + * success. + */ this doesn't seem to be guaranteed on old versions of FreeBSD or any other BSD flavors, so I don't think it's a good idea to bake the assumption into this code. Or what do you think? .m
On Wed, Sep 7, 2016 at 12:32 AM, Marko Tiikkaja <marko@joh.to> wrote: > I've tested and reviewed this, and it looks good to me, other than this > part: > > + /* > + * kevent guarantees that the change list has been processed in the > EINTR > + * case. Here we are only applying a change list so EINTR counts as > + * success. > + */ > > this doesn't seem to be guaranteed on old versions of FreeBSD or any other > BSD flavors, so I don't think it's a good idea to bake the assumption into > this code. Or what do you think? Thanks for the testing and review! Hmm. Well spotted. I wrote that because the man page from FreeBSD 10.3 says: When kevent() call fails with EINTR error, all changes in the changelist have been applied. This sentence is indeed missing from the OpenBSD, NetBSD and OSX man pages. It was introduced by FreeBSD commit r280818[1] which made kevent a Pthread cancellation point. I investigated whether it is also true in older FreeBSD and the rest of the BSD family. I believe the answer is yes. 1. That commit doesn't do anything that would change the situation: it just adds thread cancellation wrapper code to libc and libthr which exits under certain conditions but otherwise lets EINTR through to the caller. So I think the new sentence is documentation of the existing behaviour of the syscall. 2. I looked at the code in FreeBSD 4.1[2] (the original kqueue implementation from which all others derive) and the four modern OSes[3][4][5][6]. They vary a bit but in all cases, the first place that can produce EINTR appears to be in kqueue_scan when the (variously named) kernel sleep routine is invoked, which can return EINTR or ERESTART (later translated to EINTR because kevent doesn't support restarting). That comes after all changes have been applied. In fact it's unreachable if nevents is 0: OSX doesn't call kqueue_scan in that case, and the others return early from kqueue_scan in that case. 3. An old email[7] from Jonathan Lemon (creator of kqueue) seems to support that at least in respect of ancient FreeBSD. He wrote: "Technically, an EINTR is returned when a signal interrupts the process after it goes to sleep (that is, after it calls tsleep). So if (as an example) you call kevent() with a zero valued timespec, you'll never get EINTR, since there's no possibility of it sleeping." So if I've understood correctly, what I wrote in the v4 patch is universally true, but it's also moot in this case: kevent cannot fail with errno == EINTR because nevents == 0. On that basis, here is a new version with the comment and special case for EINTR removed. [1] https://svnweb.freebsd.org/base?view=revision&revision=280818 [2] https://github.com/freebsd/freebsd/blob/release/4.1.0/sys/kern/kern_event.c [3] https://github.com/freebsd/freebsd/blob/master/sys/kern/kern_event.c [4] https://github.com/IIJ-NetBSD/netbsd-src/blob/master/sys/kern/kern_event.c [5] https://github.com/openbsd/src/blob/master/sys/kern/kern_event.c [6] https://github.com/opensource-apple/xnu/blob/master/bsd/kern/kern_event.c [7] http://marc.info/?l=freebsd-arch&m=98147346707952&w=2 -- Thomas Munro http://www.enterprisedb.com
Attachment
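To make the nevents == 0 point concrete for readers, the shape of a pure "apply the changelist" call looks like this (an illustrative sketch with an invented function name, not an excerpt from the patch):

#include <sys/types.h>
#include <sys/event.h>
#include <sys/time.h>

/*
 * Apply a batch of changes without waiting.  Because nevents is 0, the
 * kernel only processes the changelist and never reaches the sleep path,
 * which per the analysis above is the only place EINTR can originate.
 */
static int
apply_changes(int kq, const struct kevent *changes, int nchanges)
{
    return kevent(kq, changes, nchanges, NULL, 0, NULL);
}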
So, if I've understood correctly, the purpose of this patch is to improve performance on a multi-CPU system, which has the kqueue() function. Most notably, FreeBSD?

I launched a FreeBSD 10.3 instance on Amazon EC2 (ami-e0682b80), on a m4.10xlarge instance. That's a 40 core system, biggest available, I believe. I built PostgreSQL master on it, and ran pgbench to benchmark:

pgbench -i -s 200 postgres
pgbench -M prepared -j 36 -c 36 -S postgres -T20 -P1

I set shared_buffers to 10 GB, so that the whole database fits in cache. I tested that with and without kqueue-v5.patch.

Result: I don't see any difference in performance. pgbench reports between 80,000 and 97,000 TPS, with or without the patch:

[ec2-user@ip-172-31-17-174 ~/postgresql]$ ~/pgsql-install/bin/pgbench -M prepared -j 36 -c 36 -S postgres -T20 -P1
starting vacuum...end.
progress: 1.0 s, 94537.1 tps, lat 0.368 ms stddev 0.145
progress: 2.0 s, 96745.9 tps, lat 0.368 ms stddev 0.143
progress: 3.0 s, 93870.1 tps, lat 0.380 ms stddev 0.146
progress: 4.0 s, 89482.9 tps, lat 0.399 ms stddev 0.146
progress: 5.0 s, 87815.0 tps, lat 0.406 ms stddev 0.148
progress: 6.0 s, 86415.5 tps, lat 0.413 ms stddev 0.145
progress: 7.0 s, 86011.0 tps, lat 0.415 ms stddev 0.147
progress: 8.0 s, 84923.0 tps, lat 0.420 ms stddev 0.147
progress: 9.0 s, 84596.6 tps, lat 0.422 ms stddev 0.146
progress: 10.0 s, 84537.7 tps, lat 0.422 ms stddev 0.146
progress: 11.0 s, 83910.5 tps, lat 0.425 ms stddev 0.150
progress: 12.0 s, 83738.2 tps, lat 0.426 ms stddev 0.150
progress: 13.0 s, 83837.5 tps, lat 0.426 ms stddev 0.147
progress: 14.0 s, 83578.4 tps, lat 0.427 ms stddev 0.147
progress: 15.0 s, 83609.5 tps, lat 0.427 ms stddev 0.148
progress: 16.0 s, 83423.5 tps, lat 0.428 ms stddev 0.151
progress: 17.0 s, 83318.2 tps, lat 0.428 ms stddev 0.149
progress: 18.0 s, 82992.7 tps, lat 0.430 ms stddev 0.149
progress: 19.0 s, 83155.9 tps, lat 0.429 ms stddev 0.151
progress: 20.0 s, 83209.0 tps, lat 0.429 ms stddev 0.152
transaction type: <builtin: select only>
scaling factor: 200
query mode: prepared
number of clients: 36
number of threads: 36
duration: 20 s
number of transactions actually processed: 1723759
latency average = 0.413 ms
latency stddev = 0.149 ms
tps = 86124.484867 (including connections establishing)
tps = 86208.458034 (excluding connections establishing)

Is this test setup reasonable? I know very little about FreeBSD, I'm afraid, so I don't know how to profile or test that further than that.

If there's no measurable difference in performance, between kqueue() and poll(), I think we should forget about this. If there's a FreeBSD hacker out there that can demonstrate better results, I'm all for committing this, but I'm reluctant to add code if no-one can show the benefit.

- Heikki
Heikki Linnakangas <hlinnaka@iki.fi> writes: > So, if I've understood correctly, the purpose of this patch is to > improve performance on a multi-CPU system, which has the kqueue() > function. Most notably, FreeBSD? OS X also has this, so it might be worth trying on a multi-CPU Mac. > If there's no measurable difference in performance, between kqueue() and > poll(), I think we should forget about this. I agree that we shouldn't add this unless it's demonstrably a win. No opinion on whether your test is adequate. regards, tom lane
On 09/13/2016 04:33 PM, Tom Lane wrote: > Heikki Linnakangas <hlinnaka@iki.fi> writes: >> So, if I've understood correctly, the purpose of this patch is to >> improve performance on a multi-CPU system, which has the kqueue() >> function. Most notably, FreeBSD? > > OS X also has this, so it might be worth trying on a multi-CPU Mac. > >> If there's no measurable difference in performance, between kqueue() and >> poll(), I think we should forget about this. > > I agree that we shouldn't add this unless it's demonstrably a win. > No opinion on whether your test is adequate. I'm marking this as "Returned with Feedback", waiting for someone to post test results that show a positive performance benefit from this. - Heikki
Hi, On 2016-09-13 16:08:39 +0300, Heikki Linnakangas wrote: > So, if I've understood correctly, the purpose of this patch is to improve > performance on a multi-CPU system, which has the kqueue() function. Most > notably, FreeBSD? I think it's not necessarily about the current system, but more about future uses of the WaitEventSet stuff. Some of that is going to use a lot more sockets. E.g. doing a parallel append over FDWs. > I launched a FreeBSD 10.3 instance on Amazon EC2 (ami-e0682b80), on a > m4.10xlarge instance. That's a 40 core system, biggest available, I believe. > I built PostgreSQL master on it, and ran pgbench to benchmark: > > pgbench -i -s 200 postgres > pgbench -M prepared -j 36 -c 36 -S postgres -T20 -P1 This seems likely to only seldom exercise the relevant code path. We only do the poll()/epoll_wait()/... when a read() doesn't return anything, but that seems likely to seldom occur here. Using a lower thread count and a lot higher client count might change that. Note that the case where poll vs. epoll made a large difference (after the regression due to ac1d7945f86) on linux was only on fairly large machines, with high client counts. Greetings, Andres Freund
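For readers unfamiliar with the code path being described above, its shape is roughly the following; this is an illustrative sketch only (wait_for_readable is a stand-in for the WaitEventSet machinery, shown here as a plain poll()), not the backend's actual code. The wait primitive runs only when a non-blocking read finds no data, which is why a benchmark whose sockets almost always have data ready rarely reaches it:

#include <sys/types.h>
#include <errno.h>
#include <poll.h>
#include <unistd.h>

/* Stand-in for the WaitEventSet-based wait; here just a blocking poll(). */
static void
wait_for_readable(int fd)
{
    struct pollfd pfd;

    pfd.fd = fd;
    pfd.events = POLLIN;
    (void) poll(&pfd, 1, -1);
}

/*
 * Read from a non-blocking socket, waiting only if no data is available.
 * With few clients per CPU the first read() usually succeeds, so the
 * poll()/epoll_wait()/kevent() path is rarely taken at all.
 */
static ssize_t
read_or_wait(int fd, void *buf, size_t len)
{
    for (;;)
    {
        ssize_t n = read(fd, buf, len);

        if (n >= 0 || (errno != EWOULDBLOCK && errno != EAGAIN))
            return n;           /* data, EOF, or a real error */

        wait_for_readable(fd);
    }
}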
On 13 September 2016 at 08:08, Heikki Linnakangas <hlinnaka@iki.fi> wrote: > So, if I've understood correctly, the purpose of this patch is to improve > performance on a multi-CPU system, which has the kqueue() function. Most > notably, FreeBSD? I'm getting a little fried from "self-documenting" patches, from multiple sources. I think we should make it a firm requirement to explain what a patch is actually about, with extra points for including with it a test that allows us to validate that. We don't have enough committer time to waste on such things. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Tue, Sep 13, 2016 at 11:36 AM, Simon Riggs <simon@2ndquadrant.com> wrote: > On 13 September 2016 at 08:08, Heikki Linnakangas <hlinnaka@iki.fi> wrote: >> So, if I've understood correctly, the purpose of this patch is to improve >> performance on a multi-CPU system, which has the kqueue() function. Most >> notably, FreeBSD? > > I'm getting a little fried from "self-documenting" patches, from > multiple sources. > > I think we should make it a firm requirement to explain what a patch > is actually about, with extra points for including with it a test that > allows us to validate that. We don't have enough committer time to > waste on such things. You've complained about this a whole bunch of times recently, but in most of those cases I didn't think there was any real unclarity. I agree that it's a good idea for a patch to be submitted with suitable submission notes, but it also isn't reasonable to expect those submission notes to be reposted with every single version of every patch. Indeed, I'd find that pretty annoying. Thomas linked back to the previous thread where this was discussed, which seems more or less sufficient. If committers are too busy to click on links in the patch submission emails, they have no business committing anything. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Andres Freund <andres@anarazel.de> writes: > On 2016-09-13 16:08:39 +0300, Heikki Linnakangas wrote: >> So, if I've understood correctly, the purpose of this patch is to improve >> performance on a multi-CPU system, which has the kqueue() function. Most >> notably, FreeBSD? > I think it's not necessarily about the current system, but more about > future uses of the WaitEventSet stuff. Some of that is going to use a > lot more sockets. E.g. doing a parallel append over FDWs. All fine, but the burden of proof has to be on the patch to show that it does something significant. We don't want to be carrying around platform-specific code, which necessarily has higher maintenance cost than other code, without a darn good reason. Also, if it's only a win on machines with dozens of CPUs, how many people are running *BSD on that kind of iron? I think Linux is by far the dominant kernel for such hardware. For sure Apple isn't selling any machines like that. regards, tom lane
On 2016-09-13 12:43:36 -0400, Tom Lane wrote: > > I think it's not necessarily about the current system, but more about > > future uses of the WaitEventSet stuff. Some of that is going to use a > > lot more sockets. E.g. doing a parallel append over FDWs. (note that I'm talking about network sockets not cpu sockets here) > All fine, but the burden of proof has to be on the patch to show that > it does something significant. We don't want to be carrying around > platform-specific code, which necessarily has higher maintenance cost > than other code, without a darn good reason. No argument there. > Also, if it's only a win on machines with dozens of CPUs, how many > people are running *BSD on that kind of iron? I think Linux is by > far the dominant kernel for such hardware. For sure Apple isn't > selling any machines like that. I'm not sure you need quite that big a machine, if you test a workload that currently reaches the poll(). Regards, Andres
Andres Freund <andres@anarazel.de> writes: > On 2016-09-13 12:43:36 -0400, Tom Lane wrote: >> Also, if it's only a win on machines with dozens of CPUs, how many >> people are running *BSD on that kind of iron? I think Linux is by >> far the dominant kernel for such hardware. For sure Apple isn't >> selling any machines like that. > I'm not sure you need quite that big a machine, if you test a workload > that currently reaches the poll(). Well, Thomas stated in https://www.postgresql.org/message-id/CAEepm%3D1CwuAq35FtVBTZO-mnGFH1xEFtDpKQOf_b6WoEmdZZHA%40mail.gmail.com that he hadn't been able to measure any performance difference, and I assume he was trying test cases from the WaitEventSet thread. Also I notice that the WaitEventSet thread started with a simple pgbench test, so I don't really buy the claim that that's not a way that will reach the problem. I'd be happy to see this go in if it can be shown to provide a measurable performance improvement, but so far we have only guesses that someday it *might* make a difference. That's not good enough to add to our maintenance burden IMO. Anyway, the patch is in the archives now, so it won't be hard to resurrect if the situation changes. regards, tom lane
On 2016-09-13 14:47:08 -0400, Tom Lane wrote: > Also I notice that the WaitEventSet thread started with a simple > pgbench test, so I don't really buy the claim that that's not a > way that will reach the problem. You can reach it, but not when using 1 core:one pgbench thread:one client connection, there need to be more connections than that. At least that was my observation on x86 / linux. Andres
Andres Freund <andres@anarazel.de> writes: > On 2016-09-13 14:47:08 -0400, Tom Lane wrote: >> Also I notice that the WaitEventSet thread started with a simple >> pgbench test, so I don't really buy the claim that that's not a >> way that will reach the problem. > You can reach it, but not when using 1 core:one pgbench thread:one > client connection, there need to be more connections than that. At least > that was my observation on x86 / linux. Well, that original test was >> I tried to run pgbench -s 1000 -j 48 -c 48 -S -M prepared on 70 CPU-core >> machine: so no, not 1 client ;-) Anyway, I decided to put my money where my mouth was and run my own benchmark. On my couple-year-old Macbook Pro running OS X 10.11.6, using a straight build of today's HEAD, asserts disabled, fsync off but no other parameters changed, I did "pgbench -i -s 100" and then did this a few times: pgbench -T 60 -j 4 -c 4 -M prepared -S bench (It's a 4-core CPU so I saw little point in pressing harder than that.) Median of 3 runs was 56028 TPS. Repeating the runs with kqueue-v5.patch applied, I got a median of 58975 TPS, or 5% better. Run-to-run variation was only around 1% in each case. So that's not a huge improvement, but it's clearly above the noise floor, and this laptop is not what anyone would use for production work eh? Presumably you could show even better results on something closer to server-grade hardware with more active clients. So at this point I'm wondering why Thomas and Heikki could not measure any win. Based on my results it should be easy. Is it possible that OS X is better tuned for multi-CPU hardware than FreeBSD? regards, tom lane
On 2016-09-13 15:37:22 -0400, Tom Lane wrote: > Andres Freund <andres@anarazel.de> writes: > > On 2016-09-13 14:47:08 -0400, Tom Lane wrote: > >> Also I notice that the WaitEventSet thread started with a simple > >> pgbench test, so I don't really buy the claim that that's not a > >> way that will reach the problem. > > > You can reach it, but not when using 1 core:one pgbench thread:one > > client connection, there need to be more connections than that. At least > > that was my observation on x86 / linux. > > Well, that original test was > > >> I tried to run pgbench -s 1000 -j 48 -c 48 -S -M prepared on 70 CPU-core > >> machine: > > so no, not 1 client ;-) What I meant wasn't one client, but less than one client per cpu, and using a pgbench thread per backend. That way usually, at least on linux, there'll be a relatively small amount of poll/epoll/whatever, because the recvmsg()s will always have data available. > Anyway, I decided to put my money where my mouth was and run my own > benchmark. Cool. > (It's a 4-core CPU so I saw little point in pressing harder than > that.) I think in reality most busy machines, were performance and scalability matter, are overcommitted in the number of connections vs. cores. And if you look at throughput graphs that makes sense; they tend to increase considerably after reaching #hardware-threads, even if all connections are full throttle busy. It might not make sense if you just run large analytics queries, or if you want the lowest latency possible, but in everything else, the reality is that machines are often overcommitted for good reason. > So at this point I'm wondering why Thomas and Heikki could not measure > any win. Based on my results it should be easy. Is it possible that > OS X is better tuned for multi-CPU hardware than FreeBSD? Hah! Greetings, Andres Freund
Andres Freund <andres@anarazel.de> writes: > On 2016-09-13 15:37:22 -0400, Tom Lane wrote: >> (It's a 4-core CPU so I saw little point in pressing harder than >> that.) > I think in reality most busy machines, were performance and scalability > matter, are overcommitted in the number of connections vs. cores. And > if you look at throughput graphs that makes sense; they tend to increase > considerably after reaching #hardware-threads, even if all connections > are full throttle busy. At -j 10 -c 10, all else the same, I get 84928 TPS on HEAD and 90357 with the patch, so about 6% better. >> So at this point I'm wondering why Thomas and Heikki could not measure >> any win. Based on my results it should be easy. Is it possible that >> OS X is better tuned for multi-CPU hardware than FreeBSD? > Hah! Well, there must be some reason why this patch improves matters on OS X and not FreeBSD ... regards, tom lane
I wrote: > At -j 10 -c 10, all else the same, I get 84928 TPS on HEAD and 90357 > with the patch, so about 6% better. And at -j 1 -c 1, I get 22390 and 24040 TPS, or about 7% better with the patch. So what I am seeing on OS X isn't contention of any sort, but just a straight speedup that's independent of the number of clients (at least up to 10). Probably this represents less setup/teardown cost for kqueue() waits than poll() waits. So you could spin this as "FreeBSD's poll() implementation is better than OS X's", or as "FreeBSD's kqueue() implementation is worse than OS X's", but either way I do not think we're seeing the same issue that was originally reported against Linux, where there was no visible problem at all till you got to a couple dozen clients, cf https://www.postgresql.org/message-id/CAB-SwXbPmfpgL6N4Ro4BbGyqXEqqzx56intHHBCfvpbFUx1DNA%40mail.gmail.com I'm inclined to think the kqueue patch is worth applying just on the grounds that it makes things better on OS X and doesn't seem to hurt on FreeBSD. Whether anyone would ever get to the point of seeing intra-kernel contention on these platforms is hard to predict, but we'd be ahead of the curve if so. It would be good for someone else to reproduce my results though. For one thing, 5%-ish is not that far above the noise level; maybe what I'm measuring here is just good luck from relocation of critical loops into more cache-line-friendly locations. regards, tom lane
On Wed, Sep 14, 2016 at 12:06 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > I wrote: >> At -j 10 -c 10, all else the same, I get 84928 TPS on HEAD and 90357 >> with the patch, so about 6% better. > > And at -j 1 -c 1, I get 22390 and 24040 TPS, or about 7% better with > the patch. So what I am seeing on OS X isn't contention of any sort, > but just a straight speedup that's independent of the number of clients > (at least up to 10). Probably this represents less setup/teardown cost > for kqueue() waits than poll() waits. Thanks for running all these tests. I hadn't considered OS X performance. > So you could spin this as "FreeBSD's poll() implementation is better than > OS X's", or as "FreeBSD's kqueue() implementation is worse than OS X's", > but either way I do not think we're seeing the same issue that was > originally reported against Linux, where there was no visible problem at > all till you got to a couple dozen clients, cf > > https://www.postgresql.org/message-id/CAB-SwXbPmfpgL6N4Ro4BbGyqXEqqzx56intHHBCfvpbFUx1DNA%40mail.gmail.com > > I'm inclined to think the kqueue patch is worth applying just on the > grounds that it makes things better on OS X and doesn't seem to hurt > on FreeBSD. Whether anyone would ever get to the point of seeing > intra-kernel contention on these platforms is hard to predict, but > we'd be ahead of the curve if so. I was originally thinking of this as simply the obvious missing implementation of Andres's WaitEventSet API which would surely pay off later as we do more with that API (asynchronous execution with many remote nodes for sharding, built-in connection pooling/admission control for large numbers of sockets?, ...). I wasn't really expecting it to show performance increases in simple one or two pipe/socket cases on small core count machines, and it's interesting that it clearly does on OS X. > It would be good for someone else to reproduce my results though. > For one thing, 5%-ish is not that far above the noise level; maybe > what I'm measuring here is just good luck from relocation of critical > loops into more cache-line-friendly locations. Similar results here on a 4 core 2.2GHz Core i7 MacBook Pro running OS X 10.11.5. With default settings except fsync = off, I ran pgbench -i -s 100, then took the median result of three runs of pgbench -T 60 -j 4 -c 4 -M prepared -S. I used two different compilers in case it helps to see results with different random instruction cache effects, and got the following numbers: Apple clang 703.0.31: 51654 TPS -> 55739 TPS = 7.9% improvement GCC 6.1.0 from MacPorts: 52552 TPS -> 55143 TPS = 4.9% improvement I reran the tests under FreeBSD 10.3 on a 4 core laptop and again saw absolutely no measurable difference at 1, 4 or 24 clients. Maybe a big enough server could be made to contend on the postmaster pipe's selinfo->si_mtx, in selrecord(), in pipe_poll() -- maybe that'd be directly equivalent to what happened on multi-socket Linux with poll(), but I don't know. -- Thomas Munro http://www.enterprisedb.com
On Wed, Sep 14, 2016 at 7:06 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > It would be good for someone else to reproduce my results though. > For one thing, 5%-ish is not that far above the noise level; maybe > what I'm measuring here is just good luck from relocation of critical > loops into more cache-line-friendly locations. From an OSX laptop with -S, -c 1 and -M prepared (9 runs, removed the three best and three worst): - HEAD: 9356/9343/9369 - HEAD + patch: 9433/9413/9461.071168 This laptop has a lot of I/O overhead... Still there is a slight improvement here as well. Looking at the progress report, per-second TPS gets easier more frequently into 9500~9600 TPS with the patch. So at least I am seeing something. -- Michael
Michael Paquier <michael.paquier@gmail.com> writes: > From an OSX laptop with -S, -c 1 and -M prepared (9 runs, removed the > three best and three worst): > - HEAD: 9356/9343/9369 > - HEAD + patch: 9433/9413/9461.071168 > This laptop has a lot of I/O overhead... Still there is a slight > improvement here as well. Looking at the progress report, per-second > TPS gets easier more frequently into 9500~9600 TPS with the patch. So > at least I am seeing something. Which OSX version exactly? regards, tom lane
On Wed, Sep 14, 2016 at 3:32 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Michael Paquier <michael.paquier@gmail.com> writes: >> From an OSX laptop with -S, -c 1 and -M prepared (9 runs, removed the >> three best and three worst): >> - HEAD: 9356/9343/9369 >> - HEAD + patch: 9433/9413/9461.071168 >> This laptop has a lot of I/O overhead... Still there is a slight >> improvement here as well. Looking at the progress report, per-second >> TPS gets easier more frequently into 9500~9600 TPS with the patch. So >> at least I am seeing something. > > Which OSX version exactly? El Capitan 10.11.6. With -s 20 (300MB) and 1GB of shared_buffers, so that everything is in memory. Actually, re-running the tests now with no VMs around and no apps, I am getting close to 9650~9700 TPS with the patch, and 9300~9400 TPS on HEAD, so that's unlikely to be just noise. -- Michael
Hi, On 14/09/2016 00:06, Tom Lane wrote: > I'm inclined to think the kqueue patch is worth applying just on the > grounds that it makes things better on OS X and doesn't seem to hurt > on FreeBSD. Whether anyone would ever get to the point of seeing > intra-kernel contention on these platforms is hard to predict, but > we'd be ahead of the curve if so. > > It would be good for someone else to reproduce my results though. > For one thing, 5%-ish is not that far above the noise level; maybe > what I'm measuring here is just good luck from relocation of critical > loops into more cache-line-friendly locations. FWIW, I've tested HEAD vs patch on a 2-cpu low end NetBSD 7.0 i386 machine. HEAD: 1890/1935/1889 tps kqueue: 1905/1957/1932 tps no weird surprises, and basically no differences either. Cheers -- Matteo Beccati Development & Consulting - http://www.beccati.com/
On Wed, Sep 14, 2016 at 9:09 AM, Matteo Beccati <php@beccati.com> wrote:
> Hi,
>
> On 14/09/2016 00:06, Tom Lane wrote:
>> I'm inclined to think the kqueue patch is worth applying just on the
>> grounds that it makes things better on OS X and doesn't seem to hurt
>> on FreeBSD. Whether anyone would ever get to the point of seeing
>> intra-kernel contention on these platforms is hard to predict, but
>> we'd be ahead of the curve if so.
>>
>> It would be good for someone else to reproduce my results though.
>> For one thing, 5%-ish is not that far above the noise level; maybe
>> what I'm measuring here is just good luck from relocation of critical
>> loops into more cache-line-friendly locations.
>
> FWIW, I've tested HEAD vs patch on a 2-cpu low end NetBSD 7.0 i386 machine.
>
> HEAD: 1890/1935/1889 tps
> kqueue: 1905/1957/1932 tps
>
> no weird surprises, and basically no differences either.
>
> Cheers
>
> --
> Matteo Beccati
> Development & Consulting - http://www.beccati.com/
Thomas Munro mentioned in #postgresql on freenode that he needed someone to test a patch on a larger FreeBSD server. I've got a pretty decent machine (3.1GHz quad-core Xeon E3-1220V3, 16GB ECC RAM, ZFS mirror on WD Red HDDs), so I offered to give it a try.
Bench setup was:
pgbench -i -s 100 -d postgres
I ran this against 9.6rc1 instead of HEAD, which most of the others in this thread seem to have used. Not sure if that makes a difference; I can re-run if needed.
With higher concurrency, the kqueue patch seems to cause decreased performance. You can tell which runs used the patch by looking at the path to pgbench.
SINGLE PROCESS
[keith@corpus /tank/pgdata]$ /home/keith/pgsql96rc1_kqueue/bin/pgbench -T 60 -j 1 -c 1 -M prepared -S postgres -p 5496
starting vacuum...end.
transaction type: <builtin: select only>
scaling factor: 100
query mode: prepared
number of clients: 1
number of threads: 1
duration: 60 s
number of transactions actually processed: 1547387
latency average: 0.039 ms
tps = 25789.750236 (including connections establishing)
tps = 25791.018293 (excluding connections establishing)
[keith@corpus /tank/pgdata]$ /home/keith/pgsql96rc1_kqueue/bin/pgbench -T 60 -j 1 -c 1 -M prepared -S postgres -p 5496
starting vacuum...end.
transaction type: <builtin: select only>
scaling factor: 100
query mode: prepared
number of clients: 1
number of threads: 1
duration: 60 s
number of transactions actually processed: 1549442
latency average: 0.039 ms
tps = 25823.981255 (including connections establishing)
tps = 25825.189871 (excluding connections establishing)
[keith@corpus /tank/pgdata]$ /home/keith/pgsql96rc1_kqueue/bin/pgbench -T 60 -j 1 -c 1 -M prepared -S postgres -p 5496
starting vacuum...end.
transaction type: <builtin: select only>
scaling factor: 100
query mode: prepared
number of clients: 1
number of threads: 1
duration: 60 s
number of transactions actually processed: 1547936
latency average: 0.039 ms
tps = 25798.572583 (including connections establishing)
tps = 25799.917170 (excluding connections establishing)
[keith@corpus /tank/pgdata]$ /home/keith/pgsql96rc1/bin/pgbench -T 60 -j 1 -c 1 -M prepared -S postgres -p 5496
starting vacuum...end.
transaction type: <builtin: select only>
scaling factor: 100
query mode: prepared
number of clients: 1
number of threads: 1
duration: 60 s
number of transactions actually processed: 1520722
latency average: 0.039 ms
tps = 25343.122533 (including connections establishing)
tps = 25344.357116 (excluding connections establishing)
[keith@corpus /tank/pgdata]$ /home/keith/pgsql96rc1/bin/pgbench -T 60 -j 1 -c 1 -M prepared -S postgres -p 5496~
starting vacuum...end.
transaction type: <builtin: select only>
scaling factor: 100
query mode: prepared
number of clients: 1
number of threads: 1
duration: 60 s
number of transactions actually processed: 1549282
latency average: 0.039 ms
tps = 25821.107595 (including connections establishing)
tps = 25822.407310 (excluding connections establishing)
[keith@corpus /tank/pgdata]$ /home/keith/pgsql96rc1/bin/pgbench -T 60 -j 1 -c 1 -M prepared -S postgres -p 5496~
starting vacuum...end.
transaction type: <builtin: select only>
scaling factor: 100
query mode: prepared
number of clients: 1
number of threads: 1
duration: 60 s
number of transactions actually processed: 1541907
latency average: 0.039 ms
tps = 25698.025983 (including connections establishing)
tps = 25699.270663 (excluding connections establishing)
FOUR PROCESSES
/home/keith/pgsql96rc1_kqueue/bin/pgbench -T 60 -j 4 -c 4 -M prepared -S postgres -p 5496
starting vacuum...end.
transaction type: <builtin: select only>
scaling factor: 100
query mode: prepared
number of clients: 4
number of threads: 4
duration: 60 s
number of transactions actually processed: 4282185
latency average: 0.056 ms
tps = 71369.146931 (including connections establishing)
tps = 71372.646243 (excluding connections establishing)
[keith@corpus ~/postgresql-9.6rc1_kqueue]$ /home/keith/pgsql96rc1_kqueue/bin/pgbench -T 60 -j 4 -c 4 -M prepared -S postgres -p 5496
starting vacuum...end.
transaction type: <builtin: select only>
scaling factor: 100
query mode: prepared
number of clients: 4
number of threads: 4
duration: 60 s
number of transactions actually processed: 4777596
latency average: 0.050 ms
tps = 79625.214521 (including connections establishing)
tps = 79629.800123 (excluding connections establishing)
[keith@corpus ~/postgresql-9.6rc1_kqueue]$ /home/keith/pgsql96rc1_kqueue/bin/pgbench -T 60 -j 4 -c 4 -M prepared -S postgres -p 5496
starting vacuum...end.
transaction type: <builtin: select only>
scaling factor: 100
query mode: prepared
number of clients: 4
number of threads: 4
duration: 60 s
number of transactions actually processed: 4809132
latency average: 0.050 ms
tps = 80151.803249 (including connections establishing)
tps = 80155.903203 (excluding connections establishing)
/home/keith/pgsql96rc1/bin/pgbench -T 60 -j 4 -c 4 -M prepared -S postgres -p 5496
starting vacuum...end.
transaction type: <builtin: select only>
scaling factor: 100
query mode: prepared
number of clients: 4
number of threads: 4
duration: 60 s
number of transactions actually processed: 5114286
latency average: 0.047 ms
tps = 85236.858383 (including connections establishing)
tps = 85241.847800 (excluding connections establishing)
/home/keith/pgsql96rc1/bin/pgbench -T 60 -j 4 -c 4 -M prepared -S postgres -p 5496
starting vacuum...end.
transaction type: <builtin: select only>
scaling factor: 100
query mode: prepared
number of clients: 4
number of threads: 4
duration: 60 s
number of transactions actually processed: 5600194
latency average: 0.043 ms
tps = 93335.508864 (including connections establishing)
tps = 93340.970416 (excluding connections establishing)
/home/keith/pgsql96rc1/bin/pgbench -T 60 -j 4 -c 4 -M prepared -S postgres -p 5496
starting vacuum...end.
transaction type: <builtin: select only>
scaling factor: 100
query mode: prepared
number of clients: 4
number of threads: 4
duration: 60 s
number of transactions actually processed: 5606962
latency average: 0.043 ms
tps = 93447.905764 (including connections establishing)
tps = 93454.077142 (excluding connections establishing)
SIXTY-FOUR
[keith@corpus /tank/pgdata]$ /home/keith/pgsql96rc1_kqueue/bin/pgbench -T 60 -j 64 -c 64 -M prepared -S postgres -p 5496
starting vacuum...end.
transaction type: <builtin: select only>
scaling factor: 100
query mode: prepared
number of clients: 64
number of threads: 64
duration: 60 s
number of transactions actually processed: 4084213
latency average: 0.940 ms
tps = 67633.476871 (including connections establishing)
tps = 67751.865998 (excluding connections establishing)
[keith@corpus /tank/pgdata]$ /home/keith/pgsql96rc1_kqueue/bin/pgbench -T 60 -j 64 -c 64 -M prepared -S postgres -p 5496
starting vacuum...end.
transaction type: <builtin: select only>
scaling factor: 100
query mode: prepared
number of clients: 64
number of threads: 64
duration: 60 s
number of transactions actually processed: 4119994
latency average: 0.932 ms
tps = 68474.847365 (including connections establishing)
tps = 68540.221835 (excluding connections establishing)
[keith@corpus /tank/pgdata]$ /home/keith/pgsql96rc1_kqueue/bin/pgbench -T 60 -j 64 -c 64 -M prepared -S postgres -p 5496
starting vacuum...end.
transaction type: <builtin: select only>
scaling factor: 100
query mode: prepared
number of clients: 64
number of threads: 64
duration: 60 s
number of transactions actually processed: 4068071
latency average: 0.944 ms
tps = 67192.603129 (including connections establishing)
tps = 67254.760177 (excluding connections establishing)
[keith@corpus /tank/pgdata]$ /home/keith/pgsql96rc1/bin/pgbench -T 60 -j 64 -c 64 -M prepared -S postgres -p 5496
starting vacuum...end.
transaction type: <builtin: select only>
scaling factor: 100
query mode: prepared
number of clients: 64
number of threads: 64
duration: 60 s
number of transactions actually processed: 4281302
latency average: 0.897 ms
tps = 70147.847337 (including connections establishing)
tps = 70389.283564 (excluding connections establishing)
[keith@corpus /tank/pgdata]$ /home/keith/pgsql96rc1/bin/pgbench -T 60 -j 64 -c 64 -M prepared -S postgres -p 5496
starting vacuum...end.
transaction type: <builtin: select only>
scaling factor: 100
query mode: prepared
number of clients: 64
number of threads: 64
duration: 60 s
number of transactions actually processed: 4573114
latency average: 0.840 ms
tps = 74848.884475 (including connections establishing)
tps = 75102.862539 (excluding connections establishing)
[keith@corpus /tank/pgdata]$ /home/keith/pgsql96rc1/bin/pgbench -T 60 -j 64 -c 64 -M prepared -S postgres -p 5496
starting vacuum...end.
transaction type: <builtin: select only>
scaling factor: 100
query mode: prepared
number of clients: 64
number of threads: 64
duration: 60 s
number of transactions actually processed: 4341447
latency average: 0.884 ms
tps = 72350.152281 (including connections establishing)
tps = 72421.831179 (excluding connections establishing)
On Thu, Sep 15, 2016 at 10:48 AM, Keith Fiske <keith@omniti.com> wrote: > Thomas Munro brought up in #postgresql on freenode needing someone to test a > patch on a larger FreeBSD server. I've got a pretty decent machine (3.1Ghz > Quad Core Xeon E3-1220V3, 16GB ECC RAM, ZFS mirror on WD Red HDD) so offered > to give it a try. > > Bench setup was: > pgbench -i -s 100 -d postgres > > I ran this against 96rc1 instead of HEAD like most of the others in this > thread seem to have done. Not sure if that makes a difference and can re-run > if needed. > With higher concurrency, this seems to cause decreased performance. You can > tell which of the runs is the kqueue patch by looking at the path to > pgbench. Thanks Keith. So to summarise, you saw no change with 1 client, but with 4 clients you saw a significant drop in performance (~93K TPS -> ~80K TPS), and a smaller drop for 64 clients (~72 TPS -> ~68K TPS). These results seem to be a nail in the coffin for this patch for now. Thanks to everyone who tested. I might be back in a later commitfest if I can figure out why and how to fix it. -- Thomas Munro http://www.enterprisedb.com
On Thu, Sep 15, 2016 at 11:04 AM, Thomas Munro <thomas.munro@enterprisedb.com> wrote: > On Thu, Sep 15, 2016 at 10:48 AM, Keith Fiske <keith@omniti.com> wrote: >> Thomas Munro brought up in #postgresql on freenode needing someone to test a >> patch on a larger FreeBSD server. I've got a pretty decent machine (3.1Ghz >> Quad Core Xeon E3-1220V3, 16GB ECC RAM, ZFS mirror on WD Red HDD) so offered >> to give it a try. >> >> Bench setup was: >> pgbench -i -s 100 -d postgres >> >> I ran this against 96rc1 instead of HEAD like most of the others in this >> thread seem to have done. Not sure if that makes a difference and can re-run >> if needed. >> With higher concurrency, this seems to cause decreased performance. You can >> tell which of the runs is the kqueue patch by looking at the path to >> pgbench. > > Thanks Keith. So to summarise, you saw no change with 1 client, but > with 4 clients you saw a significant drop in performance (~93K TPS -> > ~80K TPS), and a smaller drop for 64 clients (~72 TPS -> ~68K TPS). > These results seem to be a nail in the coffin for this patch for now. > > Thanks to everyone who tested. I might be back in a later commitfest > if I can figure out why and how to fix it. Ok, here's a version tweaked to use EVFILT_PROC for postmaster death detection instead of the pipe, as Tom Lane suggested in another thread[1]. The pipe still exists and is used for PostmasterIsAlive(), and also for the race case where kevent discovers that the PID doesn't exist when you try to add it (presumably it died already, but we want to defer the report of that until you call EventSetWait, so in that case we stick the traditional pipe into the kqueue set as before so that it'll fire a readable-because-EOF event then). Still no change measurable on my laptop. Keith, would you be able to test this on your rig and see if it sucks any less than the last one? [1] https://www.postgresql.org/message-id/13774.1473972000%40sss.pgh.pa.us -- Thomas Munro http://www.enterprisedb.com
Attachment
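Roughly, the mechanism described in the message above looks like the following sketch. This is an illustration only, not code taken from the attached patch; the helper name and its kq/postmaster_pid/alive_pipe_fd parameters are invented for the example.

#include <sys/types.h>
#include <sys/event.h>
#include <sys/time.h>
#include <errno.h>

/*
 * Illustrative only: ask kqueue "kq" to report the exit of
 * "postmaster_pid".  If the postmaster is already gone by the time we
 * add the filter (ESRCH), watch the read end of the life-check pipe
 * instead; it will turn up readable (EOF) at the next wait, so the
 * death still gets reported there.  Returns 0 on success, -1 on error.
 */
static int
watch_postmaster_death(int kq, pid_t postmaster_pid, int alive_pipe_fd)
{
    struct kevent kev;

    EV_SET(&kev, postmaster_pid, EVFILT_PROC, EV_ADD, NOTE_EXIT, 0, 0);
    if (kevent(kq, &kev, 1, NULL, 0, NULL) == 0)
        return 0;
    if (errno != ESRCH)
        return -1;

    EV_SET(&kev, alive_pipe_fd, EVFILT_READ, EV_ADD, 0, 0, 0);
    return kevent(kq, &kev, 1, NULL, 0, NULL) == 0 ? 0 : -1;
}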
Hi, On 16/09/2016 05:11, Thomas Munro wrote: > Still no change measurable on my laptop. Keith, would you be able to > test this on your rig and see if it sucks any less than the last one? I've tested kqueue-v6.patch on the Celeron NetBSD machine and numbers were constantly lower by about 5-10% vs fairly recent HEAD (same as my last pgbench runs). Cheers -- Matteo Beccati Development & Consulting - http://www.beccati.com/
On Thu, Sep 15, 2016 at 11:11 PM, Thomas Munro <thomas.munro@enterprisedb.com> wrote:
Ran benchmarks on unaltered 96rc1 again just to be safe. Those are first. Decided to throw a 32 process test in there as well to see if there's anything going on between 4 and 64.
~/pgsql96rc1/bin/pgbench -i -s 100 -d pgbench -p 5496
[keith@corpus ~]$ /home/keith/pgsql96rc1/bin/pgbench -T 60 -j 1 -c 1 -M prepared -S -p 5496 pgbench
starting vacuum...end.
transaction type: <builtin: select only>
scaling factor: 100
query mode: prepared
number of clients: 1
number of threads: 1
duration: 60 s
number of transactions actually processed: 1543809
latency average: 0.039 ms
tps = 25729.749474 (including connections establishing)
tps = 25731.006414 (excluding connections establishing)
[keith@corpus ~]$ /home/keith/pgsql96rc1/bin/pgbench -T 60 -j 1 -c 1 -M prepared -S -p 5496 pgbench
starting vacuum...end.
transaction type: <builtin: select only>
scaling factor: 100
query mode: prepared
number of clients: 1
number of threads: 1
duration: 60 s
number of transactions actually processed: 1548340
latency average: 0.039 ms
tps = 25796.928387 (including connections establishing)
tps = 25798.275891 (excluding connections establishing)
[keith@corpus ~]$ /home/keith/pgsql96rc1/bin/pgbench -T 60 -j 1 -c 1 -M prepared -S -p 5496 pgbench
starting vacuum...end.
transaction type: <builtin: select only>
scaling factor: 100
query mode: prepared
number of clients: 1
number of threads: 1
duration: 60 s
number of transactions actually processed: 1535072
latency average: 0.039 ms
tps = 25584.182830 (including connections establishing)
tps = 25585.487246 (excluding connections establishing)
[keith@corpus ~]$ /home/keith/pgsql96rc1/bin/pgbench -T 60 -j 4 -c 4 -M prepared -S -p 5496 pgbench
starting vacuum...end.
transaction type: <builtin: select only>
scaling factor: 100
query mode: prepared
number of clients: 4
number of threads: 4
duration: 60 s
number of transactions actually processed: 5621013
latency average: 0.043 ms
tps = 93668.594248 (including connections establishing)
tps = 93674.730914 (excluding connections establishing)
[keith@corpus ~]$ /home/keith/pgsql96rc1/bin/pgbench -T 60 -j 4 -c 4 -M prepared -S -p 5496 pgbench
starting vacuum...end.
transaction type: <builtin: select only>
scaling factor: 100
query mode: prepared
number of clients: 4
number of threads: 4
duration: 60 s
number of transactions actually processed: 5659929
latency average: 0.042 ms
tps = 94293.572928 (including connections establishing)
tps = 94300.500395 (excluding connections establishing)
[keith@corpus ~]$ /home/keith/pgsql96rc1/bin/pgbench -T 60 -j 4 -c 4 -M prepared -S -p 5496 pgbench
starting vacuum...end.
transaction type: <builtin: select only>
scaling factor: 100
query mode: prepared
number of clients: 4
number of threads: 4
duration: 60 s
number of transactions actually processed: 5649572
latency average: 0.042 ms
tps = 94115.854165 (including connections establishing)
tps = 94123.436211 (excluding connections establishing)
[keith@corpus ~]$ /home/keith/pgsql96rc1/bin/pgbench -T 60 -j 32 -c 32 -M prepared -S -p 5496 pgbench
starting vacuum...end.
transaction type: <builtin: select only>
scaling factor: 100
query mode: prepared
number of clients: 32
number of threads: 32
duration: 60 s
number of transactions actually processed: 5196336
latency average: 0.369 ms
tps = 86570.696138 (including connections establishing)
tps = 86608.648579 (excluding connections establishing)
[keith@corpus ~]$ /home/keith/pgsql96rc1/bin/pgbench -T 60 -j 32 -c 32 -M prepared -S -p 5496 pgbench
starting vacuum...end.
transaction type: <builtin: select only>
scaling factor: 100
query mode: prepared
number of clients: 32
number of threads: 32
duration: 60 s
number of transactions actually processed: 5202443
latency average: 0.369 ms
tps = 86624.724577 (including connections establishing)
tps = 86664.848857 (excluding connections establishing)
[keith@corpus ~]$ /home/keith/pgsql96rc1/bin/pgbench -T 60 -j 32 -c 32 -M prepared -S -p 5496 pgbench
starting vacuum...end.
transaction type: <builtin: select only>
scaling factor: 100
query mode: prepared
number of clients: 32
number of threads: 32
duration: 60 s
number of transactions actually processed: 5198412
latency average: 0.369 ms
tps = 86637.730825 (including connections establishing)
tps = 86668.706105 (excluding connections establishing)
[keith@corpus ~]$ /home/keith/pgsql96rc1/bin/pgbench -T 60 -j 64 -c 64 -M prepared -S -p 5496 pgbench
starting vacuum...end.
transaction type: <builtin: select only>
scaling factor: 100
query mode: prepared
number of clients: 64
number of threads: 64
duration: 60 s
number of transactions actually processed: 4790285
latency average: 0.802 ms
tps = 79800.369679 (including connections establishing)
tps = 79941.243428 (excluding connections establishing)
[keith@corpus ~]$ /home/keith/pgsql96rc1/bin/pgbench -T 60 -j 64 -c 64 -M prepared -S -p 5496 pgbench
starting vacuum...end.
transaction type: <builtin: select only>
scaling factor: 100
query mode: prepared
number of clients: 64
number of threads: 64
duration: 60 s
number of transactions actually processed: 4852921
latency average: 0.791 ms
tps = 79924.873678 (including connections establishing)
tps = 80179.182200 (excluding connections establishing)
[keith@corpus ~]$ /home/keith/pgsql96rc1/bin/pgbench -T 60 -j 64 -c 64 -M prepared -S -p 5496 pgbench
starting vacuum...end.
transaction type: <builtin: select only>
scaling factor: 100
query mode: prepared
number of clients: 64
number of threads: 64
duration: 60 s
number of transactions actually processed: 4672965
latency average: 0.822 ms
tps = 77871.911528 (including connections establishing)
tps = 77961.614345 (excluding connections establishing)
~/pgsql96rc1_kqueue_v6/bin/pgbench -i -s 100 -d pgbench -p 5496
Ran more than 3 times on occasion since results were sometimes coming out further apart than expected. Probably just something else running on the server at the time.
Again, no real noticeable difference for a single process.
For 4 processes, things are mostly the same and only very, very slightly lower, which is better than before.
For thirty-two processes, I saw a slight increase in performance for v6.
But, again, for 64 the results were slightly worse. Although the last run did almost match, most runs were lower. They're better than they were last time, but still not as good as the unchanged 96rc1.
I can try running against HEAD if you'd like.
SINGLE
[keith@corpus ~]$ /home/keith/pgsql96rc1_kqueue_v6/bin/pgbench -T 60 -j 1 -c 1 -M prepared -S -p 5496 pgbench
starting vacuum...end.
transaction type: <builtin: select only>
scaling factor: 100
query mode: prepared
number of clients: 1
number of threads: 1
duration: 60 s
number of transactions actually processed: 1508745
latency average: 0.040 ms
tps = 25145.524948 (including connections establishing)
tps = 25146.433564 (excluding connections establishing)
[keith@corpus ~]$ /home/keith/pgsql96rc1_kqueue_v6/bin/pgbench -T 60 -j 1 -c 1 -M prepared -S -p 5496 pgbench
starting vacuum...end.
transaction type: <builtin: select only>
scaling factor: 100
query mode: prepared
number of clients: 1
number of threads: 1
duration: 60 s
number of transactions actually processed: 1346454
latency average: 0.045 ms
tps = 22440.692798 (including connections establishing)
tps = 22441.527989 (excluding connections establishing)
[keith@corpus ~]$ /home/keith/pgsql96rc1_kqueue_v6/bin/pgbench -T 60 -j 1 -c 1 -M prepared -S -p 5496 pgbench
starting vacuum...end.
transaction type: <builtin: select only>
scaling factor: 100
query mode: prepared
number of clients: 1
number of threads: 1
duration: 60 s
number of transactions actually processed: 1426906
latency average: 0.042 ms
tps = 23781.710780 (including connections establishing)
tps = 23782.523744 (excluding connections establishing)
[keith@corpus ~]$ /home/keith/pgsql96rc1_kqueue_v6/bin/pgbench -T 60 -j 1 -c 1 -M prepared -S -p 5496 pgbench
starting vacuum...end.
transaction type: <builtin: select only>
scaling factor: 100
query mode: prepared
number of clients: 1
number of threads: 1
duration: 60 s
number of transactions actually processed: 1546252
latency average: 0.039 ms
tps = 25770.468513 (including connections establishing)
tps = 25771.352027 (excluding connections establishing)
[keith@corpus ~]$ /home/keith/pgsql96rc1_kqueue_v6/bin/pgbench -T 60 -j 1 -c 1 -M prepared -S -p 5496 pgbench
starting vacuum...end.
transaction type: <builtin: select only>
scaling factor: 100
query mode: prepared
number of clients: 1
number of threads: 1
duration: 60 s
number of transactions actually processed: 1542366
latency average: 0.039 ms
tps = 25705.706274 (including connections establishing)
tps = 25706.577285 (excluding connections establishing)
FOUR
[keith@corpus ~]$ /home/keith/pgsql96rc1_kqueue_v6/bin/pgbench -T 60 -j 4 -c 4 -M prepared -S -p 5496 pgbench
starting vacuum...end.
transaction type: <builtin: select only>
scaling factor: 100
query mode: prepared
number of clients: 4
number of threads: 4
duration: 60 s
number of transactions actually processed: 5606159
latency average: 0.043 ms
tps = 93435.464767 (including connections establishing)
tps = 93442.716270 (excluding connections establishing)
[keith@corpus ~]$ /home/keith/pgsql96rc1_kqueue_v6/bin/pgbench -T 60 -j 4 -c 4 -M prepared -S -p 5496 pgbench
starting vacuum...end.
transaction type: <builtin: select only>
scaling factor: 100
query mode: prepared
number of clients: 4
number of threads: 4
duration: 60 s
number of transactions actually processed: 5602564
latency average: 0.043 ms
tps = 93375.528201 (including connections establishing)
tps = 93381.999147 (excluding connections establishing)
[keith@corpus ~]$ /home/keith/pgsql96rc1_kqueue_v6/bin/pgbench -T 60 -j 4 -c 4 -M prepared -S -p 5496 pgbench
starting vacuum...end.
transaction type: <builtin: select only>
scaling factor: 100
query mode: prepared
number of clients: 4
number of threads: 4
duration: 60 s
number of transactions actually processed: 5608675
latency average: 0.043 ms
tps = 93474.081114 (including connections establishing)
tps = 93481.634509 (excluding connections establishing)
THIRTY-TWO
[keith@corpus ~]$ /home/keith/pgsql96rc1_kqueue_v6/bin/pgbench -T 60 -j 32 -c 32 -M prepared -S -p 5496 pgbench
starting vacuum...end.
transaction type: <builtin: select only>
scaling factor: 100
query mode: prepared
number of clients: 32
number of threads: 32
duration: 60 s
number of transactions actually processed: 5273952
latency average: 0.364 ms
tps = 87855.483112 (including connections establishing)
tps = 87880.762662 (excluding connections establishing)
[keith@corpus ~]$ /home/keith/pgsql96rc1_kqueue_v6/bin/pgbench -T 60 -j 32 -c 32 -M prepared -S -p 5496 pgbench
starting vacuum...end.
transaction type: <builtin: select only>
scaling factor: 100
query mode: prepared
number of clients: 32
number of threads: 32
duration: 60 s
number of transactions actually processed: 5294039
latency average: 0.363 ms
tps = 88126.254862 (including connections establishing)
tps = 88151.282371 (excluding connections establishing)
[keith@corpus ~]$
[keith@corpus ~]$ /home/keith/pgsql96rc1_kqueue_v6/bin/pgbench -T 60 -j 32 -c 32 -M prepared -S -p 5496 pgbench
starting vacuum...end.
transaction type: <builtin: select only>
scaling factor: 100
query mode: prepared
number of clients: 32
number of threads: 32
duration: 60 s
number of transactions actually processed: 5279444
latency average: 0.364 ms
tps = 87867.500628 (including connections establishing)
tps = 87891.856414 (excluding connections establishing)
[keith@corpus ~]$ /home/keith/pgsql96rc1_kqueue_v6/bin/pgbench -T 60 -j 32 -c 32 -M prepared -S -p 5496 pgbench
starting vacuum...end.
transaction type: <builtin: select only>
scaling factor: 100
query mode: prepared
number of clients: 32
number of threads: 32
duration: 60 s
number of transactions actually processed: 5286405
latency average: 0.363 ms
tps = 88049.742194 (including connections establishing)
tps = 88077.409809 (excluding connections establishing)
SIXTY-FOUR
[keith@corpus ~]$ /home/keith/pgsql96rc1_kqueue_v6/bin/pgbench -T 60 -j 64 -c 64 -M prepared -S -p 5496 pgbench
starting vacuum...end.
transaction type: <builtin: select only>
scaling factor: 100
query mode: prepared
number of clients: 64
number of threads: 64
duration: 60 s
number of transactions actually processed: 4426565
latency average: 0.867 ms
tps = 72142.306576 (including connections establishing)
tps = 72305.201516 (excluding connections establishing)
[keith@corpus ~]$ /home/keith/pgsql96rc1_kqueue_v6/bin/pgbench -T 60 -j 64 -c 64 -M prepared -S -p 5496 pgbench
starting vacuum...end.
transaction type: <builtin: select only>
scaling factor: 100
query mode: prepared
number of clients: 64
number of threads: 64
duration: 60 s
number of transactions actually processed: 4070048
latency average: 0.943 ms
tps = 66587.264608 (including connections establishing)
tps = 66711.820878 (excluding connections establishing)
[keith@corpus ~]$ /home/keith/pgsql96rc1_kqueue_v6/bin/pgbench -T 60 -j 64 -c 64 -M prepared -S -p 5496 pgbench
starting vacuum...end.
transaction type: <builtin: select only>
scaling factor: 100
query mode: prepared
number of clients: 64
number of threads: 64
duration: 60 s
number of transactions actually processed: 4478535
latency average: 0.857 ms
tps = 72768.961061 (including connections establishing)
tps = 72930.488922 (excluding connections establishing)
[keith@corpus ~]$ /home/keith/pgsql96rc1_kqueue_v6/bin/pgbench -T 60 -j 64 -c 64 -M prepared -S -p 5496 pgbench
starting vacuum...end.
transaction type: <builtin: select only>
scaling factor: 100
query mode: prepared
number of clients: 64
number of threads: 64
duration: 60 s
number of transactions actually processed: 4051086
latency average: 0.948 ms
tps = 66540.741821 (including connections establishing)
tps = 66601.943062 (excluding connections establishing)
[keith@corpus ~]$ /home/keith/pgsql96rc1_kqueue_v6/bin/pgbench -T 60 -j 64 -c 64 -M prepared -S -p 5496 pgbench
starting vacuum...end.
transaction type: <builtin: select only>
scaling factor: 100
query mode: prepared
number of clients: 64
number of threads: 64
duration: 60 s
number of transactions actually processed: 4374049
latency average: 0.878 ms
tps = 72093.025134 (including connections establishing)
tps = 72271.145559 (excluding connections establishing)
[keith@corpus ~]$ /home/keith/pgsql96rc1_kqueue_v6/bin/pgbench -T 60 -j 64 -c 64 -M prepared -S -p 5496 pgbench
starting vacuum...end.
transaction type: <builtin: select only>
scaling factor: 100
query mode: prepared
number of clients: 64
number of threads: 64
duration: 60 s
number of transactions actually processed: 4762663
latency average: 0.806 ms
tps = 79372.610362 (including connections establishing)
tps = 79535.601194 (excluding connections establishing)
As a sanity check I went back and ran the pgbench from the v5 patch to see if it was still lower. It is. So v6 seems to have a slight improvement in some cases.
[keith@corpus ~]$ /home/keith/pgsql96rc1_kqueue_v5/bin/pgbench -T 60 -j 32 -c 32 -M prepared -S -p 5496 pgbench
starting vacuum...end.
transaction type: <builtin: select only>
scaling factor: 100
query mode: prepared
number of clients: 32
number of threads: 32
duration: 60 s
number of transactions actually processed: 4618814
latency average: 0.416 ms
tps = 76960.608378 (including connections establishing)
tps = 76981.609781 (excluding connections establishing)
[keith@corpus ~]$ /home/keith/pgsql96rc1_kqueue_v5/bin/pgbench -T 60 -j 32 -c 32 -M prepared -S -p 5496 pgbench
starting vacuum...end.
transaction type: <builtin: select only>
scaling factor: 100
query mode: prepared
number of clients: 32
number of threads: 32
duration: 60 s
number of transactions actually processed: 4649745
latency average: 0.413 ms
tps = 77491.094077 (including connections establishing)
tps = 77525.443941 (excluding connections establishing)
On Thu, Sep 29, 2016 at 9:09 AM, Keith Fiske <keith@omniti.com> wrote: > On Thu, Sep 15, 2016 at 11:11 PM, Thomas Munro > <thomas.munro@enterprisedb.com> wrote: >> Ok, here's a version tweaked to use EVFILT_PROC for postmaster death >> detection instead of the pipe, as Tom Lane suggested in another >> thread[1]. >> >> [...] > > Ran benchmarks on unaltered 96rc1 again just to be safe. Those are first. > Decided to throw a 32 process test in there as well to see if there's > anything going on between 4 and 64 Thanks! A summary: ┌──────────────────┬─────────┬───────────┬────────────────────┬───────────┐ │ code │ clients │ average │ standard_deviation │ median │ ├──────────────────┼─────────┼───────────┼────────────────────┼───────────┤ │ 9.6rc1 │ 1 │ 25704.923 │ 108.766 │ 25731.006 │ │ 9.6rc1 │ 4 │ 94032.889 │ 322.562 │ 94123.436 │ │ 9.6rc1 │ 32 │ 86647.401 │ 33.616 │ 86664.849 │ │ 9.6rc1 │ 64 │ 79360.680 │ 1217.453 │ 79941.243 │ │ 9.6rc1/kqueue-v6 │ 1 │ 24569.683 │ 1433.339 │ 25146.434 │ │ 9.6rc1/kqueue-v6 │ 4 │ 93435.450 │ 50.214 │ 93442.716 │ │ 9.6rc1/kqueue-v6 │ 32 │ 88000.328 │ 135.143 │ 87891.856 │ │ 9.6rc1/kqueue-v6 │ 64 │ 71726.034 │ 4784.794 │ 72271.146 │ └──────────────────┴─────────┴───────────┴────────────────────┴───────────┘ ┌─────────┬───────────┬───────────┬──────────────────────────┐ │ clients │ unpatched │ patched │ percent_change │ ├─────────┼───────────┼───────────┼──────────────────────────┤ │ 1 │ 25731.006 │ 25146.434 │ -2.271858317548874692000 │ │ 4 │ 94123.436 │ 93442.716 │ -0.723220516514080510000 │ │ 32 │ 86664.849 │ 87891.856 │ 1.415807001521458833000 │ │ 64 │ 79941.243 │ 72271.146 │ -9.594668173973727179000 │ └─────────┴───────────┴───────────┴──────────────────────────┘ The variation in the patched 64 client numbers is quite large, ranging from ~66.5k to ~79.5k. The highest number matched the unpatched numbers which ranged 77.9k to 80k. I wonder if that is noise and we need to run longer (in which case the best outcome might be 'this patch is neutral on FreeBSD'), or if something the patch does is doing is causing that (for example maybe EVFILT_PROC proc filters causes contention on the process table lock). Matteo's results with the v6 patch on a low end NetBSD machine were not good. But the report at [1] implies that larger NetBSD and OpenBSD systems have terrible problems with the poll-postmaster-alive-pipe approach, which this EVFILT_PROC approach would seem to address pretty well. It's difficult to draw any conclusions at this point. [1] https://www.postgresql.org/message-id/flat/20160915135755.GC19008%40genua.de -- Thomas Munro http://www.enterprisedb.com
On 28.09.2016 23:39, Thomas Munro wrote: > On Thu, Sep 29, 2016 at 9:09 AM, Keith Fiske <keith@omniti.com> wrote: >> On Thu, Sep 15, 2016 at 11:11 PM, Thomas Munro >> <thomas.munro@enterprisedb.com> wrote: >>> Ok, here's a version tweaked to use EVFILT_PROC for postmaster death >>> detection instead of the pipe, as Tom Lane suggested in another >>> thread[1]. >>> >>> [...] >> >> Ran benchmarks on unaltered 96rc1 again just to be safe. Those are first. >> Decided to throw a 32 process test in there as well to see if there's >> anything going on between 4 and 64 > > Thanks! A summary: > > [summary] > > The variation in the patched 64 client numbers is quite large, ranging > from ~66.5k to ~79.5k. The highest number matched the unpatched > numbers which ranged 77.9k to 80k. I wonder if that is noise and we > need to run longer (in which case the best outcome might be 'this > patch is neutral on FreeBSD'), or if something the patch does is doing > is causing that (for example maybe EVFILT_PROC proc filters causes > contention on the process table lock). > > [..] > > It's difficult to draw any conclusions at this point. I'm currently setting up a new FreeBSD machine. Its a FreeBSD 11 with ZFS, 64 GB RAM and Quad Core. If you're interested in i can give you access for more tests this week. Maybe this will help to draw any conclusion. Greetings, Torsten
On Tue, Oct 11, 2016 at 8:08 PM, Torsten Zuehlsdorff <mailinglists@toco-domains.de> wrote: > On 28.09.2016 23:39, Thomas Munro wrote: >> It's difficult to draw any conclusions at this point. > > I'm currently setting up a new FreeBSD machine. Its a FreeBSD 11 with ZFS, > 64 GB RAM and Quad Core. If you're interested in i can give you access for > more tests this week. Maybe this will help to draw any conclusion. I don't plan to resubmit this patch myself, but I was doing some spring cleaning and rebasing today and I figured it might be worth quietly leaving a working patch here just in case anyone from the various BSD communities is interested in taking the idea further. Some thoughts: We could decide to make it the default on FooBSD but not BarBSD according to experimental results... for example several people reported that macOS developer machines run pgbench a bit faster. Also, we didn't ever get to the bottom of the complaint that NetBSD and OpenBSD systems wake up every waiting backend when anyone calls PostmasterIsAlive[1], which this patch should in theory fix (by using EVFILT_PROC instead of waiting on that pipe). On the other hand, the fix for that may be to stop calling PostmasterIsAlive in loops[2]! [1] https://www.postgresql.org/message-id/CAEepm%3D27K-2AP1th97kiVvKpTuria9ocbjT0cXCJqnt4if5rJQ%40mail.gmail.com [2] https://www.postgresql.org/message-id/CAEepm%3D3FW33PeRxt0jE4N0truJqOepp72R6W-zyM5mu1bxnZRw%40mail.gmail.com -- Thomas Munro http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Attachment
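For context, the pipe-based PostmasterIsAlive() check mentioned above boils down to something like this simplified sketch (illustrative only; the names are invented and the real code has more error handling). Every waiter does a non-blocking read() on the read end of a pipe whose write end only the postmaster keeps open, so EOF means the postmaster has exited; an EVFILT_PROC filter lets the kernel deliver that exit as an event instead of each backend re-reading the pipe.

#include <errno.h>
#include <stdbool.h>
#include <unistd.h>

/*
 * Illustrative only: "alive_fd" is the non-blocking read end of a pipe
 * whose write end is kept open only by the postmaster, so hitting EOF
 * (read() returning 0) means the postmaster has exited.
 */
static bool
pipe_postmaster_is_alive(int alive_fd)
{
    char        c;
    ssize_t     rc;

    rc = read(alive_fd, &c, 1);
    if (rc < 0 && (errno == EAGAIN || errno == EWOULDBLOCK))
        return true;            /* nothing to read yet: still alive */
    return false;               /* EOF (or anything unexpected): gone */
}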
On Thu, Jun 22, 2017 at 7:19 PM, Thomas Munro <thomas.munro@enterprisedb.com> wrote: > I don't plan to resubmit this patch myself, but I was doing some > spring cleaning and rebasing today and I figured it might be worth > quietly leaving a working patch here just in case anyone from the > various BSD communities is interested in taking the idea further. Since there was a mention of kqueue on -hackers today, here's another rebase. I got curious just now and ran a very quick test on an AWS 64 vCPU m4.16xlarge instance running image "FreeBSD 11.1-STABLE-amd64-2017-08-08 - ami-00608178". I set shared_buffers = 10GB and ran pgbench approximately the same way Heikki and Keith did upthread: pgbench -i -s 200 postgres pgbench -M prepared -j 6 -c 6 -S postgres -T60 -P1 pgbench -M prepared -j 12 -c 12 -S postgres -T60 -P1 pgbench -M prepared -j 24 -c 24 -S postgres -T60 -P1 pgbench -M prepared -j 36 -c 36 -S postgres -T60 -P1 pgbench -M prepared -j 48 -c 48 -S postgres -T60 -P1 The TPS numbers I got (including connections establishing) were: clients master patched 6 146,215 147,535 (+0.9%) 12 273,056 280,505 (+2.7%) 24 360,751 369,965 (+2.5%) 36 413,147 420,769 (+1.8%) 48 416,189 444,537 (+6.8%) The patch appears to be doing something positive on this particular system and that effect was stable over a few runs. -- Thomas Munro http://www.enterprisedb.com
Attachment
On Wed, Dec 6, 2017 at 12:53 AM, Thomas Munro <thomas.munro@enterprisedb.com> wrote: > On Thu, Jun 22, 2017 at 7:19 PM, Thomas Munro > <thomas.munro@enterprisedb.com> wrote: >> I don't plan to resubmit this patch myself, but I was doing some >> spring cleaning and rebasing today and I figured it might be worth >> quietly leaving a working patch here just in case anyone from the >> various BSD communities is interested in taking the idea further. I heard through the grapevine of some people currently investigating performance problems on busy FreeBSD systems, possibly related to the postmaster pipe. I suspect this patch might be a part of the solution (other patches probably needed to get maximum value out of this patch: reuse WaitEventSet objects in some key places, and get rid of high frequency PostmasterIsAlive() read() calls). The autoconf-fu in the last version bit-rotted so it seemed like a good time to post a rebased patch. -- Thomas Munro http://www.enterprisedb.com
Attachment
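As an illustration of the "reuse WaitEventSet objects" point, a call site could be structured roughly as follows, using the WaitEventSet API approximately as it looks in PostgreSQL 10/11 (hypothetical function and arguments, not taken from any of the attached patches). The set, and with it the kernel object underneath (kqueue or epoll) and its registrations, is created once and then waited on repeatedly instead of being rebuilt for every short wait.

#include "postgres.h"

#include "storage/ipc.h"
#include "storage/latch.h"

/* Hypothetical call site, for illustration only. */
static void
wait_loop(Latch *latch, pgsocket sock)
{
    WaitEventSet *set = CreateWaitEventSet(CurrentMemoryContext, 3);
    WaitEvent   event;

    AddWaitEventToSet(set, WL_LATCH_SET, PGINVALID_SOCKET, latch, NULL);
    AddWaitEventToSet(set, WL_POSTMASTER_DEATH, PGINVALID_SOCKET, NULL, NULL);
    AddWaitEventToSet(set, WL_SOCKET_READABLE, sock, NULL, NULL);

    for (;;)
    {
        /* -1 = no timeout; 0 = no wait_event_info reported (PG 10+ API) */
        WaitEventSetWait(set, -1, &event, 1, 0);

        if (event.events & WL_POSTMASTER_DEATH)
            proc_exit(1);
        if (event.events & WL_LATCH_SET)
            ResetLatch(latch);
        if (event.events & WL_SOCKET_READABLE)
        {
            /* ... do the actual socket work here ... */
        }
    }
}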
On Wed, Apr 11, 2018 at 1:05 PM, Thomas Munro <thomas.munro@enterprisedb.com> wrote: > I heard through the grapevine of some people currently investigating > performance problems on busy FreeBSD systems, possibly related to the > postmaster pipe. I suspect this patch might be a part of the solution > (other patches probably needed to get maximum value out of this patch: > reuse WaitEventSet objects in some key places, and get rid of high > frequency PostmasterIsAlive() read() calls). The autoconf-fu in the > last version bit-rotted so it seemed like a good time to post a > rebased patch. Once I knew how to get a message resent to someone who wasn't subscribed to our mailing list at the time it was sent[1] so they could join an existing thread. I don't know how to do that with the new mailing list software, so I'm CC'ing Mateusz so he can share his results on-thread. Sorry for the noise. [1] https://www.postgresql.org/message-id/CAEepm=0-KsV4Sj-0Qd4rMCg7UYdOQA=TUjLkEZOX7h_qiQQaCA@mail.gmail.com -- Thomas Munro http://www.enterprisedb.com
On Mon, May 21, 2018 at 9:03 AM, Thomas Munro <thomas.munro@enterprisedb.com> wrote:
The test was performed a few weeks ago.
For convenience PostgreSQL 10.3 as found in the ports tree was used.
3 variants were tested:
- stock 10.3
- stock 10.3 + pdeathsig
- stock 10.3 + pdeathsig + kqueue
Appropriate patches were provided by Thomas.
In order to keep this message PG-13 I'm not going to show the actual
script, but a mere outline:
for i in $(seq 1 10); do
  for t in vanilla pdeathsig pdeathsig_kqueue; do
    start up the relevant version
    for c in 32 64 96; do
      pgbench -j 96 -c $c -T 120 -M prepared -S -U bench -h 172.16.0.2 -P1 bench > ${t}-${c}-out-warmup 2>&1
      pgbench -j 96 -c $c -T 120 -M prepared -S -U bench -h 172.16.0.2 -P1 bench > ${t}-${c}-out 2>&1
    done
    shutdown the relevant version
  done
done
Data from the warmup is not used. All the data was pre-read prior to the
test.
PostgreSQL was configured with 32GB of shared buffers and 200 max
connections, otherwise it was the default.
The server is:
Intel(R) Xeon(R) Gold 6134 CPU @ 3.20GHz
2 package(s) x 8 core(s) x 2 hardware threads
i.e. 32 threads in total.
running FreeBSD -head with 'options NUMA' in kernel config and
sysctl net.inet.tcp.per_cpu_timers=1 on top of zfs.
The load was generated from a different box over a 100Gbit ethernet link.
x cumulative-tps-vanilla-32
+ cumulative-tps-pdeathsig-32
* cumulative-tps-pdeathsig_kqueue-32
+------------------------------------------------------------------------+
|+ + x+* x+ * x * + * * * * ** * ** *|
| |_____|__M_A___M_A_____|____| |________MA________| |
+------------------------------------------------------------------------+
N Min Max Median Avg Stddev
x 10 442898.77 448476.81 444805.17 445062.08 1679.7169
+ 10 442057.2 447835.46 443840.28 444235.01 1771.2254
No difference proven at 95.0% confidence
* 10 448138.07 452786.41 450274.56 450311.51 1387.2927
Difference at 95.0% confidence
5249.43 +/- 1447.41
1.17948% +/- 0.327501%
(Student's t, pooled s = 1540.46)
x cumulative-tps-vanilla-64
+ cumulative-tps-pdeathsig-64
* cumulative-tps-pdeathsig_kqueue-64
+------------------------------------------------------------------------+
| ** |
| ** |
| xx x + ***|
|++**x *+*++ ***|
| ||_A|M_| |A |
+------------------------------------------------------------------------+
N Min Max Median Avg Stddev
x 10 411849.26 422145.5 416043.77 416061.9 3763.2545
+ 10 407123.74 425727.84 419908.73 417480.7 6817.5549
No difference proven at 95.0% confidence
* 10 542032.71 546106.93 543948.05 543874.06 1234.1788
Difference at 95.0% confidence
127812 +/- 2631.31
30.7195% +/- 0.809892%
(Student's t, pooled s = 2800.47)
x cumulative-tps-vanilla-96
+ cumulative-tps-pdeathsig-96
* cumulative-tps-pdeathsig_kqueue-96
+------------------------------------------------------------------------+
| * |
| * |
| * |
| * |
| + x * |
| *xxx+ **|
|+ *****+ * **|
| |MA|| |A||
+------------------------------------------------------------------------+
N Min Max Median Avg Stddev
x 10 325263.7 336338 332399.16 331321.82 3571.2478
+ 10 321213.33 338669.66 329553.78 330903.58 5652.008
No difference proven at 95.0% confidence
* 10 503877.22 511449.96 508708.41 508808.51 2016.9483
Difference at 95.0% confidence
177487 +/- 2724.98
53.5693% +/- 1.17178%
(Student's t, pooled s = 2900.16)
--
On Wed, Apr 11, 2018 at 1:05 PM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:
> I heard through the grapevine of some people currently investigating
> performance problems on busy FreeBSD systems, possibly related to the
> postmaster pipe. I suspect this patch might be a part of the solution
> (other patches probably needed to get maximum value out of this patch:
> reuse WaitEventSet objects in some key places, and get rid of high
> frequency PostmasterIsAlive() read() calls). The autoconf-fu in the
> last version bit-rotted so it seemed like a good time to post a
> rebased patch.
Hi everyone,
I have benchmarked the change on a FreeBSD box and found a big
performance win once the number of clients goes beyond the number of
hardware threads on the target machine. For smaller numbers of clients
the win was very modest.
The test was performed a few weeks ago.
For convenience PostgreSQL 10.3 as found in the ports tree was used.
3 variants were tested:
- stock 10.3
- stock 10.3 + pdeathsig
- stock 10.3 + pdeathsig + kqueue
Appropriate patches were provided by Thomas.
In order to keep this message PG-13 I'm not going to show the actual
script, but a mere outline:
for i in $(seq 1 10); do
  for t in vanilla pdeathsig pdeathsig_kqueue; do
    start up the relevant version
    for c in 32 64 96; do
      pgbench -j 96 -c $c -T 120 -M prepared -S -U bench -h 172.16.0.2 -P1 bench > ${t}-${c}-out-warmup 2>&1
      pgbench -j 96 -c $c -T 120 -M prepared -S -U bench -h 172.16.0.2 -P1 bench > ${t}-${c}-out 2>&1
    done
    shutdown the relevant version
  done
done
Data from the warmup is not used. All the data was pre-read prior to the
test.
PostgreSQL was configured with 32GB of shared buffers and 200 max
connections, otherwise it was the default.
The server is:
Intel(R) Xeon(R) Gold 6134 CPU @ 3.20GHz
2 package(s) x 8 core(s) x 2 hardware threads
i.e. 32 threads in total.
running FreeBSD -head with 'options NUMA' in kernel config and
sysctl net.inet.tcp.per_cpu_timers=1 on top of zfs.
The load was generated from a different box over a 100Gbit ethernet link.
x cumulative-tps-vanilla-32
+ cumulative-tps-pdeathsig-32
* cumulative-tps-pdeathsig_kqueue-32
+------------------------------------------------------------------------+
|+ + x+* x+ * x * + * * * * ** * ** *|
| |_____|__M_A___M_A_____|____| |________MA________| |
+------------------------------------------------------------------------+
N Min Max Median Avg Stddev
x 10 442898.77 448476.81 444805.17 445062.08 1679.7169
+ 10 442057.2 447835.46 443840.28 444235.01 1771.2254
No difference proven at 95.0% confidence
* 10 448138.07 452786.41 450274.56 450311.51 1387.2927
Difference at 95.0% confidence
5249.43 +/- 1447.41
1.17948% +/- 0.327501%
(Student's t, pooled s = 1540.46)
x cumulative-tps-vanilla-64
+ cumulative-tps-pdeathsig-64
* cumulative-tps-pdeathsig_kqueue-64
+------------------------------------------------------------------------+
| ** |
| ** |
| xx x + ***|
|++**x *+*++ ***|
| ||_A|M_| |A |
+------------------------------------------------------------------------+
N Min Max Median Avg Stddev
x 10 411849.26 422145.5 416043.77 416061.9 3763.2545
+ 10 407123.74 425727.84 419908.73 417480.7 6817.5549
No difference proven at 95.0% confidence
* 10 542032.71 546106.93 543948.05 543874.06 1234.1788
Difference at 95.0% confidence
127812 +/- 2631.31
30.7195% +/- 0.809892%
(Student's t, pooled s = 2800.47)
x cumulative-tps-vanilla-96
+ cumulative-tps-pdeathsig-96
* cumulative-tps-pdeathsig_kqueue-96
+------------------------------------------------------------------------+
| * |
| * |
| * |
| * |
| + x * |
| *xxx+ **|
|+ *****+ * **|
| |MA|| |A||
+------------------------------------------------------------------------+
N Min Max Median Avg Stddev
x 10 325263.7 336338 332399.16 331321.82 3571.2478
+ 10 321213.33 338669.66 329553.78 330903.58 5652.008
No difference proven at 95.0% confidence
* 10 503877.22 511449.96 508708.41 508808.51 2016.9483
Difference at 95.0% confidence
177487 +/- 2724.98
53.5693% +/- 1.17178%
(Student's t, pooled s = 2900.16)
--
Mateusz Guzik <mjguzik gmail.com>
On Mon, May 21, 2018 at 7:27 PM, Mateusz Guzik <mjguzik@gmail.com> wrote: > I have benchmarked the change on a FreeBSD box and found an big > performance win once the number of clients goes beyond the number of > hardware threads on the target machine. For smaller number of clients > the win was very modest. Thanks for the report! This is good news for the patch, if we can explain a few mysteries. > 3 variants were tested: > - stock 10.3 > - stock 10.3 + pdeathsig > - stock 10.3 + pdeathsig + kqueue For the record, "pdeathsig" refers to another patch of mine[1] that is not relevant to this test (it's a small change in the recovery loop, important for replication but not even reached here). > [a bunch of neat output from ministat] So to summarise your results: 32 connections: ~445k -> ~450k = +1.2% 64 connections: ~416k -> ~544k = +30.7% 96 connections: ~331k -> ~508k = +53.6% As you added more connections above your thread count, stock 10.3's TPS number went down, but with the patch it went up. So now we have to explain why you see a huge performance boost but others reported a modest gain or in some cases loss. The main things that jump out: 1. You used TCP sockets and ran pgbench on another machine, while others used Unix domain sockets. 2. You're running a newer/bleeding edge kernel. 3. You used more CPUs than most reporters. For the record, Mateusz and others discovered some fixable global lock contention in the Unix domain socket layer that is now being hacked on[2], though it's not clear if that'd affect the results reported earlier or not. [1] https://www.postgresql.org/message-id/CAEepm%3D0w9AAHAH73-tkZ8VS2Lg6JzY4ii3TG7t-R%2B_MWyUAk9g%40mail.gmail.com [2] https://reviews.freebsd.org/D15430 -- Thomas Munro http://www.enterprisedb.com
On Tue, May 22, 2018 at 12:07 PM Thomas Munro <thomas.munro@enterprisedb.com> wrote: > On Mon, May 21, 2018 at 7:27 PM, Mateusz Guzik <mjguzik@gmail.com> wrote: > > I have benchmarked the change on a FreeBSD box and found an big > > performance win once the number of clients goes beyond the number of > > hardware threads on the target machine. For smaller number of clients > > the win was very modest. > > So to summarise your results: > > 32 connections: ~445k -> ~450k = +1.2% > 64 connections: ~416k -> ~544k = +30.7% > 96 connections: ~331k -> ~508k = +53.6% I would like to commit this patch for PostgreSQL 12, based on this report. We know it helps performance on macOS developer machines and big FreeBSD servers, and it is the right kernel interface for the job on principle. Matteo Beccati reported a 5-10% performance drop on a low-end Celeron NetBSD box which we have no explanation for, and we have no reports from server-class machines on that OS -- so perhaps we (or the NetBSD port?) should consider building with WAIT_USE_POLL on NetBSD until someone can figure out what needs to be fixed there (possibly on the NetBSD side)? Here's a rebased patch, which I'm adding to the to November CF to give people time to retest, object, etc if they want to. -- Thomas Munro http://www.enterprisedb.com
Attachment
Hi, On 2018-09-28 10:55:13 +1200, Thomas Munro wrote: > On Tue, May 22, 2018 at 12:07 PM Thomas Munro > <thomas.munro@enterprisedb.com> wrote: > > On Mon, May 21, 2018 at 7:27 PM, Mateusz Guzik <mjguzik@gmail.com> wrote: > > > I have benchmarked the change on a FreeBSD box and found an big > > > performance win once the number of clients goes beyond the number of > > > hardware threads on the target machine. For smaller number of clients > > > the win was very modest. > > > > So to summarise your results: > > > > 32 connections: ~445k -> ~450k = +1.2% > > 64 connections: ~416k -> ~544k = +30.7% > > 96 connections: ~331k -> ~508k = +53.6% > > I would like to commit this patch for PostgreSQL 12, based on this > report. We know it helps performance on macOS developer machines and > big FreeBSD servers, and it is the right kernel interface for the job > on principle. Seems reasonable. > Matteo Beccati reported a 5-10% performance drop on a > low-end Celeron NetBSD box which we have no explanation for, and we > have no reports from server-class machines on that OS -- so perhaps we > (or the NetBSD port?) should consider building with WAIT_USE_POLL on > NetBSD until someone can figure out what needs to be fixed there > (possibly on the NetBSD side)? Yea, I'm not too worried about that. It'd be great to test that, but otherwise I'm also ok to just plonk that into the template. > @@ -576,6 +592,10 @@ CreateWaitEventSet(MemoryContext context, int nevents) > if (fcntl(set->epoll_fd, F_SETFD, FD_CLOEXEC) == -1) > elog(ERROR, "fcntl(F_SETFD) failed on epoll descriptor: %m"); > #endif /* EPOLL_CLOEXEC */ > +#elif defined(WAIT_USE_KQUEUE) > + set->kqueue_fd = kqueue(); > + if (set->kqueue_fd < 0) > + elog(ERROR, "kqueue failed: %m"); > #elif defined(WAIT_USE_WIN32) Is this automatically opened with some FD_CLOEXEC equivalent? > +static inline void > +WaitEventAdjustKqueueAdd(struct kevent *k_ev, int filter, int action, > + WaitEvent *event) > +{ > + k_ev->ident = event->fd; > + k_ev->filter = filter; > + k_ev->flags = action | EV_CLEAR; > + k_ev->fflags = 0; > + k_ev->data = 0; > + > + /* > + * On most BSD family systems, udata is of type void * so we could simply > + * assign event to it without casting, or use the EV_SET macro instead of > + * filling in the struct manually. Unfortunately, NetBSD and possibly > + * others have it as intptr_t, so here we wallpaper over that difference > + * with an unsightly lvalue cast. > + */ > + *((WaitEvent **)(&k_ev->udata)) = event; I'm mildly inclined to hide that behind a macro, so the other places have a reference, via the macro definition, to this too. > + if (rc < 0 && event->events == WL_POSTMASTER_DEATH && errno == ESRCH) > + { > + /* > + * The postmaster is already dead. Defer reporting this to the caller > + * until wait time, for compatibility with the other implementations. > + * To do that we will now add the regular alive pipe. > + */ > + WaitEventAdjustKqueueAdd(&k_ev[0], EVFILT_READ, EV_ADD, event); > + rc = kevent(set->kqueue_fd, &k_ev[0], count, NULL, 0, NULL); > + } That's, ... not particulary pretty. Kinda wonder if we shouldn't instead just add a 'pending_events' field, that we can check at wait time. > diff --git a/src/include/pg_config.h.in b/src/include/pg_config.h.in > index 90dda8ea050..4bcabc3b381 100644 > --- a/src/include/pg_config.h.in > +++ b/src/include/pg_config.h.in > @@ -330,6 +330,9 @@ > /* Define to 1 if you have isinf(). */ > #undef HAVE_ISINF > > +/* Define to 1 if you have the `kqueue' function. 
*/ > +#undef HAVE_KQUEUE > + > /* Define to 1 if you have the <langinfo.h> header file. */ > #undef HAVE_LANGINFO_H > > @@ -598,6 +601,9 @@ > /* Define to 1 if you have the <sys/epoll.h> header file. */ > #undef HAVE_SYS_EPOLL_H > > +/* Define to 1 if you have the <sys/event.h> header file. */ > +#undef HAVE_SYS_EVENT_H > + > /* Define to 1 if you have the <sys/ipc.h> header file. */ > #undef HAVE_SYS_IPC_H Should adjust pg_config.win32.h too. Greetings, Andres Freund
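A sketch of the macro approach suggested in the review above, so that other call sites can refer back to a single definition of the udata cast. The macro name and exact placement are illustrative assumptions; the function body follows the hunk quoted in the review and assumes the latch.c context (WaitEvent, struct kevent).

/*
 * struct kevent's udata member is a "void *" on most BSDs but an
 * "intptr_t" on NetBSD, so hide the lvalue cast behind one macro
 * instead of repeating it at every call site.
 */
#define AccessWaitEvent(k_ev)   (*((WaitEvent **) &(k_ev)->udata))

static inline void
WaitEventAdjustKqueueAdd(struct kevent *k_ev, int filter, int action,
                         WaitEvent *event)
{
    k_ev->ident = event->fd;
    k_ev->filter = filter;
    k_ev->flags = action | EV_CLEAR;
    k_ev->fflags = 0;
    k_ev->data = 0;
    AccessWaitEvent(k_ev) = event;
}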
Hi Thomas, On 28/09/2018 00:55, Thomas Munro wrote: > I would like to commit this patch for PostgreSQL 12, based on this > report. We know it helps performance on macOS developer machines and > big FreeBSD servers, and it is the right kernel interface for the job > on principle. Matteo Beccati reported a 5-10% performance drop on a > low-end Celeron NetBSD box which we have no explanation for, and we > have no reports from server-class machines on that OS -- so perhaps we > (or the NetBSD port?) should consider building with WAIT_USE_POLL on > NetBSD until someone can figure out what needs to be fixed there > (possibly on the NetBSD side)? Thanks for keeping me in the loop. Out of curiosity (and time permitting) I'll try to spin up a NetBSD 8 VM and run some benchmarks, but I guess we should leave it up to the pkgsrc people to eventually change the build flags. Cheers -- Matteo Beccati Development & Consulting - http://www.beccati.com/
On Fri, Sep 28, 2018 at 11:09 AM Andres Freund <andres@anarazel.de> wrote: > On 2018-09-28 10:55:13 +1200, Thomas Munro wrote: > > Matteo Beccati reported a 5-10% performance drop on a > > low-end Celeron NetBSD box which we have no explanation for, and we > > have no reports from server-class machines on that OS -- so perhaps we > > (or the NetBSD port?) should consider building with WAIT_USE_POLL on > > NetBSD until someone can figure out what needs to be fixed there > > (possibly on the NetBSD side)? > > Yea, I'm not too worried about that. It'd be great to test that, but > otherwise I'm also ok to just plonk that into the template. Thanks for the review! Ok, if we don't get a better idea I'll put this in src/template/netbsd: CPPFLAGS="$CPPFLAGS -DWAIT_USE_POLL" > > @@ -576,6 +592,10 @@ CreateWaitEventSet(MemoryContext context, int nevents) > > if (fcntl(set->epoll_fd, F_SETFD, FD_CLOEXEC) == -1) > > elog(ERROR, "fcntl(F_SETFD) failed on epoll descriptor: %m"); > > #endif /* EPOLL_CLOEXEC */ > > +#elif defined(WAIT_USE_KQUEUE) > > + set->kqueue_fd = kqueue(); > > + if (set->kqueue_fd < 0) > > + elog(ERROR, "kqueue failed: %m"); > > #elif defined(WAIT_USE_WIN32) > > Is this automatically opened with some FD_CLOEXEC equivalent? No. Hmm, I thought it wasn't necessary because kqueue descriptors are not inherited and backends don't execve() directly without forking, but I guess it can't hurt to add a fcntl() call. Done. > > + *((WaitEvent **)(&k_ev->udata)) = event; > > I'm mildly inclined to hide that behind a macro, so the other places > have a reference, via the macro definition, to this too. Done. > > + if (rc < 0 && event->events == WL_POSTMASTER_DEATH && errno == ESRCH) > > + { > > + /* > > + * The postmaster is already dead. Defer reporting this to the caller > > + * until wait time, for compatibility with the other implementations. > > + * To do that we will now add the regular alive pipe. > > + */ > > + WaitEventAdjustKqueueAdd(&k_ev[0], EVFILT_READ, EV_ADD, event); > > + rc = kevent(set->kqueue_fd, &k_ev[0], count, NULL, 0, NULL); > > + } > > That's, ... not particulary pretty. Kinda wonder if we shouldn't instead > just add a 'pending_events' field, that we can check at wait time. Done. > > +/* Define to 1 if you have the `kqueue' function. */ > > +#undef HAVE_KQUEUE > > + > Should adjust pg_config.win32.h too. Done. -- Thomas Munro http://www.enterprisedb.com
Attachment
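The close-on-exec change discussed above ("it can't hurt to add a fcntl() call") presumably mirrors the epoll branch quoted in the review. A minimal sketch, assuming CreateWaitEventSet()'s usual error-handling conventions; the exact error message wording is an assumption.

    set->kqueue_fd = kqueue();
    if (set->kqueue_fd < 0)
        elog(ERROR, "kqueue failed: %m");
    /* Keep the descriptor from leaking into programs we exec(). */
    if (fcntl(set->kqueue_fd, F_SETFD, FD_CLOEXEC) == -1)
        elog(ERROR, "fcntl(F_SETFD) failed on kqueue descriptor: %m");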
On 28/09/2018 14:19, Thomas Munro wrote: > On Fri, Sep 28, 2018 at 11:09 AM Andres Freund <andres@anarazel.de> wrote: >> On 2018-09-28 10:55:13 +1200, Thomas Munro wrote: >>> Matteo Beccati reported a 5-10% performance drop on a >>> low-end Celeron NetBSD box which we have no explanation for, and we >>> have no reports from server-class machines on that OS -- so perhaps we >>> (or the NetBSD port?) should consider building with WAIT_USE_POLL on >>> NetBSD until someone can figure out what needs to be fixed there >>> (possibly on the NetBSD side)? >> >> Yea, I'm not too worried about that. It'd be great to test that, but >> otherwise I'm also ok to just plonk that into the template. > > Thanks for the review! Ok, if we don't get a better idea I'll put > this in src/template/netbsd: > > CPPFLAGS="$CPPFLAGS -DWAIT_USE_POLL" A quick test on a 8 vCPU / 4GB RAM virtual machine running a fresh install of NetBSD 8.0 again shows that kqueue is consistently slower running pgbench vs unpatched master on tcp-b like pgbench workloads: ~1200tps vs ~1400tps w/ 96 clients and threads, scale factor 10 while on select only benchmarks the difference is below the noise floor, with both doing roughly the same ~30k tps. Out of curiosity, I've installed FreBSD on an identically specced VM, and the select benchmark was ~75k tps for kqueue vs ~90k tps on unpatched master, so maybe there's something wrong I'm doing when benchmarking. Could you please provide proper instructions? Cheers -- Matteo Beccati Development & Consulting - http://www.beccati.com/
On Sat, Sep 29, 2018 at 7:51 PM Matteo Beccati <php@beccati.com> wrote: > On 28/09/2018 14:19, Thomas Munro wrote: > > On Fri, Sep 28, 2018 at 11:09 AM Andres Freund <andres@anarazel.de> wrote: > >> On 2018-09-28 10:55:13 +1200, Thomas Munro wrote: > >>> Matteo Beccati reported a 5-10% performance drop on a > >>> low-end Celeron NetBSD box which we have no explanation for, and we > >>> have no reports from server-class machines on that OS -- so perhaps we > >>> (or the NetBSD port?) should consider building with WAIT_USE_POLL on > >>> NetBSD until someone can figure out what needs to be fixed there > >>> (possibly on the NetBSD side)? > >> > >> Yea, I'm not too worried about that. It'd be great to test that, but > >> otherwise I'm also ok to just plonk that into the template. > > > > Thanks for the review! Ok, if we don't get a better idea I'll put > > this in src/template/netbsd: > > > > CPPFLAGS="$CPPFLAGS -DWAIT_USE_POLL" > > A quick test on a 8 vCPU / 4GB RAM virtual machine running a fresh > install of NetBSD 8.0 again shows that kqueue is consistently slower > running pgbench vs unpatched master on tcp-b like pgbench workloads: > > ~1200tps vs ~1400tps w/ 96 clients and threads, scale factor 10 > > while on select only benchmarks the difference is below the noise floor, > with both doing roughly the same ~30k tps. > > Out of curiosity, I've installed FreBSD on an identically specced VM, > and the select benchmark was ~75k tps for kqueue vs ~90k tps on > unpatched master, so maybe there's something wrong I'm doing when > benchmarking. Could you please provide proper instructions? Ouch. What kind of virtualisation is this? Which version of FreeBSD? Not sure if it's relevant, but do you happen to see gettimeofday() showing up as a syscall, if you truss a backend running pgbench? -- Thomas Munro http://www.enterprisedb.com
Hi Thomas, On 30/09/2018 04:36, Thomas Munro wrote: > On Sat, Sep 29, 2018 at 7:51 PM Matteo Beccati <php@beccati.com> wrote: >> Out of curiosity, I've installed FreBSD on an identically specced VM, >> and the select benchmark was ~75k tps for kqueue vs ~90k tps on >> unpatched master, so maybe there's something wrong I'm doing when >> benchmarking. Could you please provide proper instructions? > > Ouch. What kind of virtualisation is this? Which version of FreeBSD? > Not sure if it's relevant, but do you happen to see gettimeofday() > showing up as a syscall, if you truss a backend running pgbench? I downloaded 11.2 as VHD file in order to run on MS Hyper-V / Win10 Pro. Yes, I saw plenty of gettimeofday calls when running truss: > gettimeofday({ 1538297117.071344 },0x0) = 0 (0x0) > gettimeofday({ 1538297117.071743 },0x0) = 0 (0x0) > gettimeofday({ 1538297117.072021 },0x0) = 0 (0x0) > getpid() = 766 (0x2fe) > __sysctl(0x7fffffffce90,0x4,0x0,0x0,0x801891000,0x2b) = 0 (0x0) > gettimeofday({ 1538297117.072944 },0x0) = 0 (0x0) > getpid() = 766 (0x2fe) > __sysctl(0x7fffffffce90,0x4,0x0,0x0,0x801891000,0x29) = 0 (0x0) > gettimeofday({ 1538297117.073682 },0x0) = 0 (0x0) > sendto(9,"2\0\0\0\^DT\0\0\0!\0\^Aabalance"...,71,0,NULL,0) = 71 (0x47) > recvfrom(9,"B\0\0\0\^\\0P0_1\0\0\0\0\^A\0\0"...,8192,0,NULL,0x0) = 51 (0x33) > gettimeofday({ 1538297117.074955 },0x0) = 0 (0x0) > gettimeofday({ 1538297117.075308 },0x0) = 0 (0x0) > getpid() = 766 (0x2fe) > __sysctl(0x7fffffffce90,0x4,0x0,0x0,0x801891000,0x29) = 0 (0x0) > gettimeofday({ 1538297117.076252 },0x0) = 0 (0x0) > gettimeofday({ 1538297117.076431 },0x0) = 0 (0x0) > gettimeofday({ 1538297117.076678 },0x0^C) = 0 (0x0) Cheers -- Matteo Beccati Development & Consulting - http://www.beccati.com/
On Sun, Sep 30, 2018 at 9:49 PM Matteo Beccati <php@beccati.com> wrote: > On 30/09/2018 04:36, Thomas Munro wrote: > > On Sat, Sep 29, 2018 at 7:51 PM Matteo Beccati <php@beccati.com> wrote: > >> Out of curiosity, I've installed FreBSD on an identically specced VM, > >> and the select benchmark was ~75k tps for kqueue vs ~90k tps on > >> unpatched master, so maybe there's something wrong I'm doing when > >> benchmarking. Could you please provide proper instructions? > > > > Ouch. What kind of virtualisation is this? Which version of FreeBSD? > > Not sure if it's relevant, but do you happen to see gettimeofday() > > showing up as a syscall, if you truss a backend running pgbench? > > I downloaded 11.2 as VHD file in order to run on MS Hyper-V / Win10 Pro. > > Yes, I saw plenty of gettimeofday calls when running truss: > > > gettimeofday({ 1538297117.071344 },0x0) = 0 (0x0) > > gettimeofday({ 1538297117.071743 },0x0) = 0 (0x0) > > gettimeofday({ 1538297117.072021 },0x0) = 0 (0x0) Ok. Those syscalls show up depending on your kern.timecounter.hardware setting and virtualised hardware: just like on Linux, gettimeofday() can be a cheap userspace operation (vDSO) that avoids the syscall path, or not. I'm not seeing any reason to think that's relevant here. > > getpid() = 766 (0x2fe) > > __sysctl(0x7fffffffce90,0x4,0x0,0x0,0x801891000,0x2b) = 0 (0x0) > > gettimeofday({ 1538297117.072944 },0x0) = 0 (0x0) > > getpid() = 766 (0x2fe) > > __sysctl(0x7fffffffce90,0x4,0x0,0x0,0x801891000,0x29) = 0 (0x0) That's setproctitle(). Those syscalls go away if you use FreeBSD 12 (which has setproctitle_fast()). If you fix both of those problems, you are left with just: > > sendto(9,"2\0\0\0\^DT\0\0\0!\0\^Aabalance"...,71,0,NULL,0) = 71 (0x47) > > recvfrom(9,"B\0\0\0\^\\0P0_1\0\0\0\0\^A\0\0"...,8192,0,NULL,0x0) = 51 (0x33) These are the only syscalls I see for each pgbench -S transaction on my bare metal machine: just the network round trip. The funny thing is ... there are almost no kevent() calls. I managed to reproduce the regression (~70k -> ~50k) using a prewarmed scale 10 select-only pgbench with 2GB of shared_buffers (so it all fits), with -j 96 -c 96 on an 8 vCPU AWS t2.2xlarge running FreeBSD 12 ALPHA8. Here is what truss -c says, capturing data from one backend for about 10 seconds: syscall seconds calls errors sendto 0.396840146 3452 0 recvfrom 0.415802029 3443 6 kevent 0.000626393 6 0 gettimeofday 2.723923249 24053 0 ------------- ------- ------- 3.537191817 30954 6 (There's no regression with -j 8 -c 8, the problem is when significantly overloaded, the same circumstances under which Matheusz reported a great improvement). So... it's very rarely accessing the kqueue directly... but its existence somehow slows things down. Curiously, when using poll() it's actually calling poll() ~90/sec for me: syscall seconds calls errors sendto 0.352784808 3226 0 recvfrom 0.614855254 4125 916 poll 0.319396480 916 0 gettimeofday 2.659035352 22456 0 ------------- ------- ------- 3.946071894 30723 916 I don't know what's going on here. Based on the reports so far, we know that kqueue gives a speedup when using bare metal with pgbench running on a different machine, but a slowdown when using virtualisation and pgbench running on the same machine (and I just checked that that's observable with both Unix sockets and TCP sockets). 
That gave me the idea of looking at pgbench itself: Unpatched: syscall seconds calls errors ppoll 0.004869268 1 0 sendto 16.489416911 7033 0 recvfrom 21.137606238 7049 0 ------------- ------- ------- 37.631892417 14083 0 Patched: syscall seconds calls errors ppoll 0.002773195 1 0 sendto 16.597880468 7217 0 recvfrom 25.646406008 7238 0 ------------- ------- ------- 42.247059671 14456 0 I don't know why the existence of the kqueue should make recvfrom() slower on the pgbench side. That's probably something to look into off-line with some FreeBSD guru help. Degraded performance for clients on the same machine does seem to be a show stopper for this patch for now. Thanks for testing! -- Thomas Munro http://www.enterprisedb.com
Hi Thomas, On 01/10/2018 01:09, Thomas Munro wrote: > I don't know why the existence of the kqueue should make recvfrom() > slower on the pgbench side. That's probably something to look into > off-line with some FreeBSD guru help. Degraded performance for > clients on the same machine does seem to be a show stopper for this > patch for now. Thanks for testing! Glad to be helpful! I've tried running pgbench from a separate VM and in fact kqueue consistently takes the lead with 5-10% more tps on select/prepared pgbench on NetBSD too. What I have observed is that sys cpu usage is ~65% (35% idle) with kqueue, while unpatched master averages at 55% (45% idle): relatively speaking that's almost 25% less idle cpu available for a local pgbench to do its own stuff. Running pgbench locally shows an average 47% usr / 53% sys cpu distribution w/ kqueue vs more like 50-50 w/ vanilla, so I'm inclined to think that's the reason why we see a performance drop instead. Thoguhts? Cheers -- Matteo Beccati Development & Consulting - http://www.beccati.com/
On 2018-10-01 19:25:45 +0200, Matteo Beccati wrote: > On 01/10/2018 01:09, Thomas Munro wrote: > > I don't know why the existence of the kqueue should make recvfrom() > > slower on the pgbench side. That's probably something to look into > > off-line with some FreeBSD guru help. Degraded performance for > > clients on the same machine does seem to be a show stopper for this > > patch for now. Thanks for testing! > > Glad to be helpful! > > I've tried running pgbench from a separate VM and in fact kqueue > consistently takes the lead with 5-10% more tps on select/prepared pgbench > on NetBSD too. > > What I have observed is that sys cpu usage is ~65% (35% idle) with kqueue, > while unpatched master averages at 55% (45% idle): relatively speaking > that's almost 25% less idle cpu available for a local pgbench to do its own > stuff. This suggest that either the the wakeup logic between kqueue and poll, or the internal locking could be at issue. Is it possible that poll triggers a directed wakeup path, but kqueue doesn't? Greetings, Andres Freund
On Tue, Oct 2, 2018 at 6:28 AM Andres Freund <andres@anarazel.de> wrote: > On 2018-10-01 19:25:45 +0200, Matteo Beccati wrote: > > On 01/10/2018 01:09, Thomas Munro wrote: > > > I don't know why the existence of the kqueue should make recvfrom() > > > slower on the pgbench side. That's probably something to look into > > > off-line with some FreeBSD guru help. Degraded performance for > > > clients on the same machine does seem to be a show stopper for this > > > patch for now. Thanks for testing! > > > > Glad to be helpful! > > > > I've tried running pgbench from a separate VM and in fact kqueue > > consistently takes the lead with 5-10% more tps on select/prepared pgbench > > on NetBSD too. > > > > What I have observed is that sys cpu usage is ~65% (35% idle) with kqueue, > > while unpatched master averages at 55% (45% idle): relatively speaking > > that's almost 25% less idle cpu available for a local pgbench to do its own > > stuff. > > This suggest that either the the wakeup logic between kqueue and poll, > or the internal locking could be at issue. Is it possible that poll > triggers a directed wakeup path, but kqueue doesn't? I am following up with some kernel hackers. In the meantime, here is a rebase for the new split-line configure.in, to turn cfbot green. -- Thomas Munro http://www.enterprisedb.com
Attachment
> On Apr 10, 2018, at 9:05 PM, Thomas Munro <thomas.munro@enterprisedb.com> wrote: > > On Wed, Dec 6, 2017 at 12:53 AM, Thomas Munro > <thomas.munro@enterprisedb.com> wrote: >> On Thu, Jun 22, 2017 at 7:19 PM, Thomas Munro >> <thomas.munro@enterprisedb.com> wrote: >>> I don't plan to resubmit this patch myself, but I was doing some >>> spring cleaning and rebasing today and I figured it might be worth >>> quietly leaving a working patch here just in case anyone from the >>> various BSD communities is interested in taking the idea further. > > I heard through the grapevine of some people currently investigating > performance problems on busy FreeBSD systems, possibly related to the > postmaster pipe. I suspect this patch might be a part of the solution > (other patches probably needed to get maximum value out of this patch: > reuse WaitEventSet objects in some key places, and get rid of high > frequency PostmasterIsAlive() read() calls). The autoconf-fu in the > last version bit-rotted so it seemed like a good time to post a > rebased patch. > > -- > Thomas Munro > http://www.enterprisedb.com > <kqueue-v9.patch> Hi, I’m interested in the kqueue patch and would like to know its current state and possible timeline for inclusion in the base code. I have several large FreeBSD systems running PostgreSQL 11 that I believe currently display this issue. The system has 88 vCPUs, 512GB RAM, and a very active application with over 1000 connections to the database. The system exhibits high kernel CPU usage servicing poll() for connections that are idle. I’ve been testing pg_bouncer to reduce the number of connections and thus system CPU usage; however, not all connections can go through pg_bouncer. Thanks, Rui.
On Fri, Dec 20, 2019 at 12:41 PM Rui DeSousa <rui@crazybean.net> wrote: > I’m instrested in the kqueue patch and would like to know its current state and possible timeline for inclusion in thebase code. I have several large FreeBSD systems running PostgreSQL 11 that I believe currently displays this issue. The system has 88 vCPUs, 512GB Ram, and very active application with over 1000 connections to the database. The system exhibitshigh kernel CPU usage servicing poll() for connections that are idle. Hi Rui, It's still my intention to get this committed eventually, but I got a bit frazzled by conflicting reports on several operating systems. For FreeBSD, performance was improved in many cases, but there were also some regressions that seemed to be related to ongoing work in the kernel that seemed worth waiting for. I don't have the details swapped into my brain right now, but there was something about a big kernel lock for Unix domain sockets which possibly explained some local pgbench problems, and there was also a problem relating to wakeup priority with some test parameters, which I'd need to go and dig up. If you want to test this and let us know how you get on, that'd be great! Here's a rebase against PostgreSQL's master branch, and since you mentioned PostgreSQL 11, here's a rebased version for REL_11_STABLE in case that's easier for you to test/build via ports or whatever and test with your production workload (eg on a throwaway copy of your production system). You can see it's working by looking in top: instead of state "select" (which is how poll() is reported) you see "kqread", which on its own isn't exciting enough to get this committed :-) PS Here's a list of slow burner PostgreSQL/FreeBSD projects: https://wiki.postgresql.org/wiki/FreeBSD
Attachment
On Fri, Dec 20, 2019 at 1:26 PM Thomas Munro <thomas.munro@gmail.com> wrote: > On Fri, Dec 20, 2019 at 12:41 PM Rui DeSousa <rui@crazybean.net> wrote: > > PostgreSQL 11 BTW, PostgreSQL 12 has an improvement that may be relevant for your case: it suppresses a bunch of high frequency reads on the "postmaster death" pipe in some scenarios, mainly the streaming replica replay loop (if you build on a system new enough to have PROC_PDEATHSIG_CTL, namely FreeBSD 11.2+, it doesn't bother reading the pipe unless it's received a signal). That pipe is inherited by every process and included in every poll() set. The kqueue patch doesn't even bother to add it to the wait event set, preferring to use an EVFILT_PROC event, so in theory we could get rid of the death pipe completely on FreeBSD and rely on EVFILT_PROC (sleeping) and PDEATHSIG (while awake), but I wouldn't want to make the code diverge from the Linux code too much, so I figured we should leave the pipe in place but just avoid accessing it when possible, if that makes sense.
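For anyone unfamiliar with the kernel facility referred to above, here is a small self-contained sketch of watching another process for exit with EVFILT_PROC/NOTE_EXIT. It is illustrative only and uses nothing beyond the standard kqueue API; it is not the patch's code, which keeps the death pipe as described.

#include <sys/types.h>
#include <sys/event.h>
#include <sys/time.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int
main(int argc, char **argv)
{
    pid_t       pid;
    int         kq;
    struct kevent kev;

    if (argc < 2)
    {
        fprintf(stderr, "usage: %s <pid>\n", argv[0]);
        return 1;
    }
    pid = (pid_t) atoi(argv[1]);

    kq = kqueue();
    if (kq < 0)
    {
        perror("kqueue");
        return 1;
    }

    /* Ask the kernel to tell us when the given pid exits. */
    EV_SET(&kev, pid, EVFILT_PROC, EV_ADD, NOTE_EXIT, 0, 0);
    if (kevent(kq, &kev, 1, NULL, 0, NULL) < 0)
    {
        perror("kevent(register)");     /* e.g. ESRCH if it's already gone */
        return 1;
    }

    /* Block until the watched process exits. */
    if (kevent(kq, NULL, 0, &kev, 1, NULL) == 1 && (kev.fflags & NOTE_EXIT))
        printf("pid %ld exited\n", (long) pid);

    close(kq);
    return 0;
}

On FreeBSD or macOS this compiles with plain cc and blocks until the pid you pass it exits, which is essentially what the patch arranges for the postmaster while a backend sleeps.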
Thanks Thomas, Just a quick update. I just deployed this patch into a lower environment yesterday running FreeBSD 12.1 and PostgreSQL 11.6. I see a significant reduction in CPU/system load from load highs of 500+ down to the low 20’s. System CPU time has been reduced to practically nothing. I’m working with our support vendor in testing the patch and will continue to let it burn in. Hopefully, we can get the patch committed. Thanks. > On Dec 19, 2019, at 7:26 PM, Thomas Munro <thomas.munro@gmail.com> wrote: > > It's still my intention to get this committed eventually, but I got a > bit frazzled by conflicting reports on several operating systems. For > FreeBSD, performance was improved in many cases, but there were also > some regressions that seemed to be related to ongoing work in the > kernel that seemed worth waiting for. I don't have the details > swapped into my brain right now, but there was something about a big > kernel lock for Unix domain sockets which possibly explained some > local pgbench problems, and there was also a problem relating to > wakeup priority with some test parameters, which I'd need to go and > dig up. If you want to test this and let us know how you get on, > that'd be great! Here's a rebase against PostgreSQL's master branch, > and since you mentioned PostgreSQL 11, here's a rebased version for > REL_11_STABLE in case that's easier for you to test/build via ports or > whatever and test with your production workload (eg on a throwaway > copy of your production system). You can see it's working by looking > in top: instead of state "select" (which is how poll() is reported) > you see "kqread", which on its own isn't exciting enough to get this > committed :-) >
On 2019-12-20 01:26, Thomas Munro wrote: > It's still my intention to get this committed eventually, but I got a > bit frazzled by conflicting reports on several operating systems. For > FreeBSD, performance was improved in many cases, but there were also > some regressions that seemed to be related to ongoing work in the > kernel that seemed worth waiting for. I don't have the details > swapped into my brain right now, but there was something about a big > kernel lock for Unix domain sockets which possibly explained some > local pgbench problems, and there was also a problem relating to > wakeup priority with some test parameters, which I'd need to go and > dig up. If you want to test this and let us know how you get on, > that'd be great! Here's a rebase against PostgreSQL's master branch, I took this patch for a quick spin on macOS. The result was that the test suite hangs in the test src/test/recovery/t/017_shm.pl. I didn't see any mentions of this anywhere in the thread, but that test is newer than the beginning of this thread. Can anyone confirm or deny this issue? Is it specific to macOS perhaps? -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Peter Eisentraut <peter.eisentraut@2ndquadrant.com> writes: > I took this patch for a quick spin on macOS. The result was that the > test suite hangs in the test src/test/recovery/t/017_shm.pl. I didn't > see any mentions of this anywhere in the thread, but that test is newer > than the beginning of this thread. Can anyone confirm or deny this > issue? Is it specific to macOS perhaps? Yeah, I duplicated the problem in macOS Catalina (10.15.2), using today's HEAD. The core regression tests pass, as do the earlier recovery tests (I didn't try a full check-world though). Somewhere early in 017_shm.pl, things freeze up with four postmaster-child processes stuck in 100%- CPU-consuming loops. I captured stack traces: (lldb) bt * thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGSTOP * frame #0: 0x00007fff6554dbb6 libsystem_kernel.dylib`kqueue + 10 frame #1: 0x0000000105511533 postgres`CreateWaitEventSet(context=<unavailable>, nevents=<unavailable>) at latch.c:622:19[opt] frame #2: 0x0000000105511305 postgres`WaitLatchOrSocket(latch=0x0000000112e02da4, wakeEvents=41, sock=-1, timeout=237000,wait_event_info=83886084) at latch.c:389:22 [opt] frame #3: 0x00000001054a7073 postgres`CheckpointerMain at checkpointer.c:514:10 [opt] frame #4: 0x00000001052da390 postgres`AuxiliaryProcessMain(argc=2, argv=0x00007ffeea9dded0) at bootstrap.c:461:4 [opt] (lldb) bt * thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGSTOP * frame #0: 0x00007fff6554dbce libsystem_kernel.dylib`kevent + 10 frame #1: 0x0000000105511ddc postgres`WaitEventAdjustKqueue(set=0x00007fc8e8805920, event=0x00007fc8e8805958, old_events=<unavailable>)at latch.c:1034:7 [opt] frame #2: 0x0000000105511638 postgres`AddWaitEventToSet(set=<unavailable>, events=<unavailable>, fd=<unavailable>, latch=<unavailable>,user_data=<unavailable>) at latch.c:778:2 [opt] frame #3: 0x0000000105511342 postgres`WaitLatchOrSocket(latch=0x0000000112e030f4, wakeEvents=41, sock=-1, timeout=200,wait_event_info=83886083) at latch.c:397:3 [opt] frame #4: 0x00000001054a6d69 postgres`BackgroundWriterMain at bgwriter.c:304:8 [opt] frame #5: 0x00000001052da38b postgres`AuxiliaryProcessMain(argc=2, argv=0x00007ffeea9dded0) at bootstrap.c:456:4 [opt] (lldb) bt * thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGSTOP * frame #0: 0x00007fff65549c66 libsystem_kernel.dylib`close + 10 frame #1: 0x0000000105511466 postgres`WaitLatchOrSocket [inlined] FreeWaitEventSet(set=<unavailable>) at latch.c:660:2[opt] frame #2: 0x000000010551145d postgres`WaitLatchOrSocket(latch=0x0000000112e03444, wakeEvents=<unavailable>, sock=-1,timeout=5000, wait_event_info=83886093) at latch.c:432 [opt] frame #3: 0x00000001054b8685 postgres`WalWriterMain at walwriter.c:256:10 [opt] frame #4: 0x00000001052da39a postgres`AuxiliaryProcessMain(argc=2, argv=0x00007ffeea9dded0) at bootstrap.c:467:4 [opt] (lldb) bt * thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGSTOP * frame #0: 0x00007fff655515be libsystem_kernel.dylib`__select + 10 frame #1: 0x00000001056a6191 postgres`pg_usleep(microsec=<unavailable>) at pgsleep.c:56:10 [opt] frame #2: 0x00000001054abe12 postgres`backend_read_statsfile at pgstat.c:5720:3 [opt] frame #3: 0x00000001054adcc0 postgres`pgstat_fetch_stat_dbentry(dbid=<unavailable>) at pgstat.c:2431:2 [opt] frame #4: 0x00000001054a320c postgres`do_start_worker at autovacuum.c:1248:20 [opt] frame #5: 0x00000001054a2639 postgres`AutoVacLauncherMain [inlined] launch_worker(now=632853327674576) at autovacuum.c:1357:9[opt] frame 
#6: 0x00000001054a2634 postgres`AutoVacLauncherMain(argc=<unavailable>, argv=<unavailable>) at autovacuum.c:769[opt] frame #7: 0x00000001054a1ea7 postgres`StartAutoVacLauncher at autovacuum.c:415:4 [opt] I'm not sure how much faith to put in the last couple of those, as stopping the earlier processes could perhaps have had side-effects. But evidently 017_shm.pl is doing something that interferes with our ability to create kqueue-based WaitEventSets. regards, tom lane
Thomas Munro <thomas.munro@gmail.com> writes: > [ 0001-Add-kqueue-2-support-for-WaitEventSet-v13.patch ] I haven't read this patch in any detail, but a couple quick notes: * It needs to be rebased over the removal of pg_config.h.win32 --- it should be touching Solution.pm instead, I believe. * I'm disturbed by the addition of a hunk to the supposedly system-API-independent WaitEventSetWait() function. Is that a generic bug fix? If not, can we either get rid of it, or at least wrap it in "#ifdef WAIT_USE_KQUEUE" so that this patch isn't inflicting a performance penalty on everyone else? regards, tom lane
On Tue, Jan 21, 2020 at 2:34 AM Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote: > I took this patch for a quick spin on macOS. The result was that the > test suite hangs in the test src/test/recovery/t/017_shm.pl. I didn't > see any mentions of this anywhere in the thread, but that test is newer > than the beginning of this thread. Can anyone confirm or deny this > issue? Is it specific to macOS perhaps? Thanks for testing, and sorry I didn't run a full check-world after that rebase. What happened here is that after commit cfdf4dc4 landed on master, every implementation now needs to check for exit_on_postmaster_death, and this patch didn't get the message. Those processes are stuck in their main loops having detected postmaster death, but not having any handling for it. Will fix.
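Based on the description above, the missing handling is roughly of this shape in the kqueue wait path. This is a sketch only; the struct field and flag names (exit_on_postmaster_death, occurred_events, returned_events, PGINVALID_SOCKET) are assumed to follow the existing epoll/poll branches.

    if (cur_event->events == WL_POSTMASTER_DEATH &&
        cur_kqueue_event->filter == EVFILT_PROC &&
        (cur_kqueue_event->fflags & NOTE_EXIT) != 0)
    {
        /*
         * The postmaster is gone.  Callers that asked for automatic exit
         * (WL_EXIT_ON_PM_DEATH) never see an event and simply exit here;
         * everyone else gets WL_POSTMASTER_DEATH reported as usual.
         */
        if (set->exit_on_postmaster_death)
            proc_exit(1);
        occurred_events->fd = PGINVALID_SOCKET;
        occurred_events->events = WL_POSTMASTER_DEATH;
        occurred_events++;
        returned_events++;
    }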
I wrote: > Peter Eisentraut <peter.eisentraut@2ndquadrant.com> writes: >> I took this patch for a quick spin on macOS. The result was that the >> test suite hangs in the test src/test/recovery/t/017_shm.pl. I didn't >> see any mentions of this anywhere in the thread, but that test is newer >> than the beginning of this thread. Can anyone confirm or deny this >> issue? Is it specific to macOS perhaps? > Yeah, I duplicated the problem in macOS Catalina (10.15.2), using today's > HEAD. The core regression tests pass, as do the earlier recovery tests > (I didn't try a full check-world though). Somewhere early in 017_shm.pl, > things freeze up with four postmaster-child processes stuck in 100%- > CPU-consuming loops. I observe very similar behavior on FreeBSD/amd64 12.0-RELEASE-p12, so it's not just macOS. I now think that the autovac launcher isn't actually stuck in the way that the other processes are. The ones that are actually consuming CPU are the checkpointer, bgwriter, and walwriter. On the FreeBSD box their stack traces are (gdb) bt #0 _close () at _close.S:3 #1 0x00000000007b4dd1 in FreeWaitEventSet (set=<optimized out>) at latch.c:660 #2 WaitLatchOrSocket (latch=0x80a1477a8, wakeEvents=<optimized out>, sock=-1, timeout=<optimized out>, wait_event_info=83886084) at latch.c:432 #3 0x000000000074a1b0 in CheckpointerMain () at checkpointer.c:514 #4 0x00000000005691e2 in AuxiliaryProcessMain (argc=2, argv=0x7fffffffce90) at bootstrap.c:461 (gdb) bt #0 _fcntl () at _fcntl.S:3 #1 0x0000000800a6cd84 in fcntl (fd=4, cmd=2) at /usr/src/lib/libc/sys/fcntl.c:56 #2 0x00000000007b4eb5 in CreateWaitEventSet (context=<optimized out>, nevents=<optimized out>) at latch.c:625 #3 0x00000000007b4c82 in WaitLatchOrSocket (latch=0x80a147b00, wakeEvents=41, sock=-1, timeout=200, wait_event_info=83886083) at latch.c:389 #4 0x0000000000749ecd in BackgroundWriterMain () at bgwriter.c:304 #5 0x00000000005691dd in AuxiliaryProcessMain (argc=2, argv=0x7fffffffce90) at bootstrap.c:456 (gdb) bt #0 _kevent () at _kevent.S:3 #1 0x00000000007b58a1 in WaitEventAdjustKqueue (set=0x800e6a120, event=0x800e6a170, old_events=<optimized out>) at latch.c:1034 #2 0x00000000007b4d87 in AddWaitEventToSet (set=<optimized out>, events=<error reading variable: Cannot access memory at address 0x10>, fd=-1, latch=<optimized out>, user_data=<optimized out>) at latch.c:778 #3 WaitLatchOrSocket (latch=0x80a147e58, wakeEvents=41, sock=-1, timeout=5000, wait_event_info=83886093) at latch.c:410 #4 0x000000000075b349 in WalWriterMain () at walwriter.c:256 #5 0x00000000005691ec in AuxiliaryProcessMain (argc=2, argv=0x7fffffffce90) at bootstrap.c:467 Note that these are just snapshots --- it looks like these processes are repeatedly creating and destroying WaitEventSets, they're not stuck inside the kernel. regards, tom lane
On Tue, Jan 21, 2020 at 8:03 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > I observe very similar behavior on FreeBSD/amd64 12.0-RELEASE-p12, > so it's not just macOS. Thanks for testing. Fixed by handling the new exit_on_postmaster_death flag from commit cfdf4dc4. On Tue, Jan 21, 2020 at 5:55 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > Thomas Munro <thomas.munro@gmail.com> writes: > > [ 0001-Add-kqueue-2-support-for-WaitEventSet-v13.patch ] > > I haven't read this patch in any detail, but a couple quick notes: > > * It needs to be rebased over the removal of pg_config.h.win32 > --- it should be touching Solution.pm instead, I believe. Done. > * I'm disturbed by the addition of a hunk to the supposedly > system-API-independent WaitEventSetWait() function. Is that > a generic bug fix? If not, can we either get rid of it, or > at least wrap it in "#ifdef WAIT_USE_KQUEUE" so that this > patch isn't inflicting a performance penalty on everyone else? Here's a version that adds no new code to non-WAIT_USE_KQUEUE paths. That code deals with the fact that we sometimes discover the postmaster is gone before we're in a position to report an event, so we need an inter-function memory of some kind. The new coding also handles a race case where someone reuses the postmaster's pid before we notice it went away. In theory, the need for that could be entirely removed by collapsing the 'adjust' call into the 'wait' call (a single kevent() invocation can do both things), but I'm not sure if it's worth the complexity. As for generally reducing syscalls noise, for both kqueue and epoll, I think that should be addressed separately by better reuse of WaitEventSet objects[1]. [1] https://www.postgresql.org/message-id/CA%2BhUKGJAC4Oqao%3DqforhNey20J8CiG2R%3DoBPqvfR0vOJrFysGw%40mail.gmail.com
Attachment
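The remark above about a single kevent() invocation doing both things refers to the fact that one call can apply a changelist and collect events at the same time. A minimal standalone sketch of that property follows (illustrative only, not the patch itself, which keeps the adjust and wait steps separate).

#include <sys/types.h>
#include <sys/event.h>
#include <sys/time.h>
#include <stdio.h>
#include <unistd.h>

int
main(void)
{
    int         kq = kqueue();
    struct kevent change;
    struct kevent result;
    struct timespec timeout = {5, 0};   /* give up after five seconds */
    int         n;

    if (kq < 0)
    {
        perror("kqueue");
        return 1;
    }

    /* Register interest in stdin becoming readable ... */
    EV_SET(&change, STDIN_FILENO, EVFILT_READ, EV_ADD, 0, 0, 0);

    /* ... and wait for events, all in one system call. */
    n = kevent(kq, &change, 1, &result, 1, &timeout);
    if (n < 0)
        perror("kevent");
    else if (n == 0)
        printf("timed out\n");
    else
        printf("fd %ld is readable\n", (long) result.ident);

    close(kq);
    return 0;
}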
Hi, On 21/01/2020 02:06, Thomas Munro wrote: > [1] https://www.postgresql.org/message-id/CA%2BhUKGJAC4Oqao%3DqforhNey20J8CiG2R%3DoBPqvfR0vOJrFysGw%40mail.gmail.com I had a NetBSD 8.0 VM lying around and I gave the patch a spin on latest master. With the kqueue patch, a pgbench -c basically hangs the whole postgres instance. Not sure if it's a kernel issue, HyperVM issue o what, but when it hangs, I can't even kill -9 the postgres processes or get the VM to properly shutdown. The same doesn't happen, of course, with vanilla postgres. If the patch gets merged, I'd say it's safer not to enable it on NetBSD and eventually leave it up to the pkgsrc team. Cheers -- Matteo Beccati Development & Consulting - http://www.beccati.com/
Matteo Beccati <php@beccati.com> writes: > On 21/01/2020 02:06, Thomas Munro wrote: >> [1] https://www.postgresql.org/message-id/CA%2BhUKGJAC4Oqao%3DqforhNey20J8CiG2R%3DoBPqvfR0vOJrFysGw%40mail.gmail.com > I had a NetBSD 8.0 VM lying around and I gave the patch a spin on latest > master. > With the kqueue patch, a pgbench -c basically hangs the whole postgres > instance. Not sure if it's a kernel issue, HyperVM issue o what, but > when it hangs, I can't even kill -9 the postgres processes or get the VM > to properly shutdown. The same doesn't happen, of course, with vanilla > postgres. I'm a bit confused about what you are testing --- the kqueue patch as per this thread, or that plus the WaitLatch refactorizations in the other thread you point to above? I've gotten through check-world successfully with the v14 kqueue patch atop yesterday's HEAD on: * macOS Catalina 10.15.2 (current release) * FreeBSD/amd64 12.0-RELEASE-p12 * NetBSD/amd64 8.1 * NetBSD/arm 8.99.41 * OpenBSD/amd64 6.5 (These OSes are all on bare metal, no VMs involved) This just says it doesn't lock up, of course. I've not attempted any performance-oriented tests. regards, tom lane
On 22/01/2020 17:06, Tom Lane wrote: > Matteo Beccati <php@beccati.com> writes: >> On 21/01/2020 02:06, Thomas Munro wrote: >>> [1] https://www.postgresql.org/message-id/CA%2BhUKGJAC4Oqao%3DqforhNey20J8CiG2R%3DoBPqvfR0vOJrFysGw%40mail.gmail.com > >> I had a NetBSD 8.0 VM lying around and I gave the patch a spin on latest >> master. >> With the kqueue patch, a pgbench -c basically hangs the whole postgres >> instance. Not sure if it's a kernel issue, HyperVM issue o what, but >> when it hangs, I can't even kill -9 the postgres processes or get the VM >> to properly shutdown. The same doesn't happen, of course, with vanilla >> postgres. > > I'm a bit confused about what you are testing --- the kqueue patch > as per this thread, or that plus the WaitLatch refactorizations in > the other thread you point to above? my bad, I tested the v14 patch attached to the email. The quoted url was just above the patch name in the email client and somehow my brain thought I was quoting the v14 patch name. Cheers -- Matteo Beccati Development & Consulting - http://www.beccati.com/
Matteo Beccati <php@beccati.com> writes: > On 22/01/2020 17:06, Tom Lane wrote: >> Matteo Beccati <php@beccati.com> writes: >>> I had a NetBSD 8.0 VM lying around and I gave the patch a spin on latest >>> master. >>> With the kqueue patch, a pgbench -c basically hangs the whole postgres >>> instance. Not sure if it's a kernel issue, HyperVM issue o what, but >>> when it hangs, I can't even kill -9 the postgres processes or get the VM >>> to properly shutdown. The same doesn't happen, of course, with vanilla >>> postgres. >> I'm a bit confused about what you are testing --- the kqueue patch >> as per this thread, or that plus the WaitLatch refactorizations in >> the other thread you point to above? > my bad, I tested the v14 patch attached to the email. Thanks for clarifying. FWIW, I can't replicate the problem here using NetBSD 8.1 amd64 on bare metal. I tried various pgbench parameters up to "-c 20 -j 20" (on a 4-cores-plus-hyperthreading CPU), and it seems fine. One theory is that NetBSD fixed something since 8.0, but I trawled their 8.1 release notes [1], and the only items mentioning kqueue or kevent are for fixes in the pty and tun drivers, neither of which seem relevant. (But wait ... could your VM setup be dependent on a tunnel network interface for outside-the-VM connectivity? Still hard to see the connection though.) My guess is that what you're seeing is a VM bug. regards, tom lane [1] https://cdn.netbsd.org/pub/NetBSD/NetBSD-8.1/CHANGES-8.1
I wrote: > This just says it doesn't lock up, of course. I've not attempted > any performance-oriented tests. I've now done some light performance testing -- just stuff like pgbench -S -M prepared -c 20 -j 20 -T 60 bench I cannot see any improvement on either FreeBSD 12 or NetBSD 8.1, either as to net TPS or as to CPU load. If anything, the TPS rate is a bit lower with the patch, though I'm not sure that that effect is above the noise level. It's certainly possible that to see any benefit you need stress levels above what I can manage on the small box I've got these OSes on. Still, it'd be nice if a performance patch could show some improved performance, before we take any portability risks for it. regards, tom lane
> On Jan 22, 2020, at 2:19 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I cannot see any improvement on either FreeBSD 12 or NetBSD 8.1,
> either as to net TPS or as to CPU load. If anything, the TPS
> rate is a bit lower with the patch, though I'm not sure that
> that effect is above the noise level.
> It's certainly possible that to see any benefit you need stress
> levels above what I can manage on the small box I've got these
> OSes on. Still, it'd be nice if a performance patch could show
> some improved performance, before we take any portability risks
> for it.
Tom,
Here are two charts comparing a patched and an unpatched system. These systems are very large and have just shy of a thousand connections each, with averages of 20 to 30 active queries running concurrently and at times hundreds if not thousands of queries hitting the database in rapid succession. The effect is that the unpatched system generates a lot of system load just handling idle connections, whereas the patched version is not impacted by idle sessions or sessions that have already received data.
Attachment
On Thu, Jan 23, 2020 at 9:38 AM Rui DeSousa <rui@crazybean.net> wrote: > On Jan 22, 2020, at 2:19 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> It's certainly possible that to see any benefit you need stress >> levels above what I can manage on the small box I've got these >> OSes on. Still, it'd be nice if a performance patch could show >> some improved performance, before we take any portability risks >> for it. You might need more than one CPU socket, or at least lots more cores so that you can create enough contention. That was needed to see the regression caused by commit ac1d794 on Linux[1]. > Here is two charts comparing a patched and unpatched system. > These systems are very large and have just shy of thousand > connections each with averages of 20 to 30 active queries concurrently > running at times including hundreds if not thousand of queries hitting > the database in rapid succession. The effect is the unpatched system > generates a lot of system load just handling idle connections where as > the patched version is not impacted by idle sessions or sessions that > have already received data. Thanks. I can reproduce something like this on an Azure 72-vCPU system, using pgbench -S -c800 -j32. The point of those settings is to have many backends, but they're all alternating between work and sleep. That creates a stream of poll() syscalls, and system time goes through the roof (all CPUs pegged, but it's ~half system). Profiling the kernel with dtrace, I see the most common stack (by a long way) is in a poll-related lock, similar to a profile Rui sent me off-list from his production system. Patched, there is very little system time and the TPS number goes from 539k to 781k. [1] https://www.postgresql.org/message-id/flat/CAB-SwXZh44_2ybvS5Z67p_CDz%3DXFn4hNAD%3DCnMEF%2BQqkXwFrGg%40mail.gmail.com
On Sat, Jan 25, 2020 at 11:29 AM Thomas Munro <thomas.munro@gmail.com> wrote: > On Thu, Jan 23, 2020 at 9:38 AM Rui DeSousa <rui@crazybean.net> wrote: > > Here is two charts comparing a patched and unpatched system. > > These systems are very large and have just shy of thousand > > connections each with averages of 20 to 30 active queries concurrently > > running at times including hundreds if not thousand of queries hitting > > the database in rapid succession. The effect is the unpatched system > > generates a lot of system load just handling idle connections where as > > the patched version is not impacted by idle sessions or sessions that > > have already received data. > > Thanks. I can reproduce something like this on an Azure 72-vCPU > system, using pgbench -S -c800 -j32. The point of those settings is > to have many backends, but they're all alternating between work and > sleep. That creates a stream of poll() syscalls, and system time goes > through the roof (all CPUs pegged, but it's ~half system). Profiling > the kernel with dtrace, I see the most common stack (by a long way) is > in a poll-related lock, similar to a profile Rui sent me off-list from > his production system. Patched, there is very little system time and > the TPS number goes from 539k to 781k. If there are no further objections, I'm planning to commit this sooner rather than later, so that it gets plenty of air time on developer and build farm machines. If problems are discovered on a particular platform, there's a pretty good escape hatch: you can define WAIT_USE_POLL, and if it turns out to be necessary, we could always do something in src/template similar to what we do for semaphores.
On Sat, Jan 25, 2020 at 11:29:11AM +1300, Thomas Munro wrote: > On Thu, Jan 23, 2020 at 9:38 AM Rui DeSousa <rui@crazybean.net> wrote: > > On Jan 22, 2020, at 2:19 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > >> It's certainly possible that to see any benefit you need stress > >> levels above what I can manage on the small box I've got these > >> OSes on. Still, it'd be nice if a performance patch could show > >> some improved performance, before we take any portability risks > >> for it. > > You might need more than one CPU socket, or at least lots more cores > so that you can create enough contention. That was needed to see the > regression caused by commit ac1d794 on Linux[1]. > > > Here is two charts comparing a patched and unpatched system. > > These systems are very large and have just shy of thousand > > connections each with averages of 20 to 30 active queries concurrently > > running at times including hundreds if not thousand of queries hitting > > the database in rapid succession. The effect is the unpatched system > > generates a lot of system load just handling idle connections where as > > the patched version is not impacted by idle sessions or sessions that > > have already received data. > > Thanks. I can reproduce something like this on an Azure 72-vCPU > system, using pgbench -S -c800 -j32. The point of those settings is > to have many backends, but they're all alternating between work and > sleep. That creates a stream of poll() syscalls, and system time goes > through the roof (all CPUs pegged, but it's ~half system). Profiling > the kernel with dtrace, I see the most common stack (by a long way) is > in a poll-related lock, similar to a profile Rui sent me off-list from > his production system. Patched, there is very little system time and > the TPS number goes from 539k to 781k. > > [1] https://www.postgresql.org/message-id/flat/CAB-SwXZh44_2ybvS5Z67p_CDz%3DXFn4hNAD%3DCnMEF%2BQqkXwFrGg%40mail.gmail.com Just to add some data... I tried the kqueue v14 patch on a AWS EC2 m5a.24xlarge (96 vCPU) with FreeBSD 12.1, driving from a m5.8xlarge (32 vCPU) CentOS 7 system. I also use pgbench with a scale factor of 1000, with -S -c800 -j32. Comparing pg 12.1 vs 13-devel (30012a04): * TPS increased from ~93,000 to ~140,000, ~ 32% increase * system time dropped from ~ 78% to ~ 70%, ~ 8% decrease * user time increased from ~16% to ~ 23%, ~7% increase I don't have any profile data, but I've attached a couple chart showing the processor utilization over a 15 minute interval from the database system. Regards, Mark -- Mark Wong 2ndQuadrant - PostgreSQL Solutions for the Enterprise https://www.2ndQuadrant.com/
Attachment
On Wed, Jan 29, 2020 at 11:54 AM Thomas Munro <thomas.munro@gmail.com> wrote: > If there are no further objections, I'm planning to commit this sooner > rather than later, so that it gets plenty of air time on developer and > build farm machines. If problems are discovered on a particular > platform, there's a pretty good escape hatch: you can define > WAIT_USE_POLL, and if it turns out to be necessary, we could always do > something in src/template similar to what we do for semaphores. I updated the error messages to match the new "unified" style, adjust a couple of comments, and pushed. Thanks to all the people who tested. I'll keep an eye on the build farm.