Re: [HACKERS] kqueue - Mailing list pgsql-hackers

From Thomas Munro
Subject Re: [HACKERS] kqueue
Msg-id CAEepm=1YhBEH9FV_76k5GqzZcK4G+PF7_EqGc4eiMKswFtOYRg@mail.gmail.com
In response to Re: [HACKERS] kqueue  (Matteo Beccati <php@beccati.com>)
Responses Re: [HACKERS] kqueue
List pgsql-hackers
On Sun, Sep 30, 2018 at 9:49 PM Matteo Beccati <php@beccati.com> wrote:
> On 30/09/2018 04:36, Thomas Munro wrote:
> > On Sat, Sep 29, 2018 at 7:51 PM Matteo Beccati <php@beccati.com> wrote:
> >> Out of curiosity, I've installed FreeBSD on an identically specced VM,
> >> and the select benchmark was ~75k tps for kqueue vs ~90k tps on
> >> unpatched master, so maybe I'm doing something wrong when
> >> benchmarking. Could you please provide proper instructions?
> >
> > Ouch.  What kind of virtualisation is this?  Which version of FreeBSD?
> >  Not sure if it's relevant, but do you happen to see gettimeofday()
> > showing up as a syscall, if you truss a backend running pgbench?
>
> I downloaded 11.2 as a VHD file in order to run it on MS Hyper-V / Win10 Pro.
>
> Yes, I saw plenty of gettimeofday calls when running truss:
>
> > gettimeofday({ 1538297117.071344 },0x0)          = 0 (0x0)
> > gettimeofday({ 1538297117.071743 },0x0)          = 0 (0x0)
> > gettimeofday({ 1538297117.072021 },0x0)          = 0 (0x0)

Ok.  Those syscalls show up depending on your
kern.timecounter.hardware setting and virtualised hardware: just like
on Linux, gettimeofday() can be a cheap userspace operation (vDSO)
that avoids the syscall path, or not.  I'm not seeing any reason to
think that's relevant here.
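To check which time counter a given (virtual) machine is actually using, here is a small Python sketch. On FreeBSD it reads the kern.timecounter.hardware sysctl mentioned above via `sysctl -n`; on Linux it reads the clocksource sysfs file, which I'm assuming is the closest analogue. The helper name current_time_counter() is hypothetical. Which counters let gettimeofday() stay in userspace depends on the platform and hypervisor, so this only reports the setting rather than judging it.

```python
import platform
import subprocess
from pathlib import Path
from typing import Optional

# Linux exposes the active clocksource through sysfs (assumed to be the
# closest analogue of FreeBSD's kern.timecounter.hardware).
LINUX_CLOCKSOURCE = Path("/sys/devices/system/clocksource/clocksource0/current_clocksource")


def current_time_counter() -> Optional[str]:
    """Return the name of the kernel time counter in use, or None if unknown."""
    system = platform.system()
    if system == "FreeBSD":
        # `sysctl -n` prints only the value, i.e. the selected timecounter name.
        result = subprocess.run(
            ["sysctl", "-n", "kern.timecounter.hardware"],
            capture_output=True, text=True, check=True,
        )
        return result.stdout.strip() or None
    if system == "Linux" and LINUX_CLOCKSOURCE.exists():
        return LINUX_CLOCKSOURCE.read_text().strip() or None
    return None


if __name__ == "__main__":
    print(current_time_counter())
```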

> > getpid()                                         = 766 (0x2fe)
> > __sysctl(0x7fffffffce90,0x4,0x0,0x0,0x801891000,0x2b) = 0 (0x0)
> > gettimeofday({ 1538297117.072944 },0x0)          = 0 (0x0)
> > getpid()                                         = 766 (0x2fe)
> > __sysctl(0x7fffffffce90,0x4,0x0,0x0,0x801891000,0x29) = 0 (0x0)

That's setproctitle().  Those syscalls go away if you use FreeBSD 12
(which has setproctitle_fast()).  If you fix both of those problems,
you are left with just:

> > sendto(9,"2\0\0\0\^DT\0\0\0!\0\^Aabalance"...,71,0,NULL,0) = 71 (0x47)
> > recvfrom(9,"B\0\0\0\^\\0P0_1\0\0\0\0\^A\0\0"...,8192,0,NULL,0x0) = 51 (0x33)

These are the only syscalls I see for each pgbench -S transaction on
my bare metal machine: just the network round trip.  The funny thing
is ... there are almost no kevent() calls.
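"Almost no kevent() calls" fits a long-lived wait set: as I understand the patch, the backend's socket wait set (FeBeWaitSet) is created once, the socket is registered with it once, and kevent() is entered again only when the backend actually has to sleep. Below is a rough Python analogue of that shape. The class SocketWaitSet is hypothetical and is not PostgreSQL's WaitEventSet API; it uses select.kqueue where available and falls back to poll() elsewhere.

```python
import select
import socket
from typing import Optional


class SocketWaitSet:
    """Hypothetical, simplified long-lived wait set for a single socket.

    The kernel object (a kqueue where available, otherwise a poll set) is
    created and the socket registered once, up front; wait() only enters
    the kernel when the caller actually needs to sleep.
    """

    def __init__(self, sock: socket.socket) -> None:
        self.fd = sock.fileno()
        self.kq = None
        self.poller = None
        if hasattr(select, "kqueue"):
            self.kq = select.kqueue()
            ev = select.kevent(self.fd, filter=select.KQ_FILTER_READ,
                               flags=select.KQ_EV_ADD)
            self.kq.control([ev], 0, 0)  # register only; returns immediately
        else:
            self.poller = select.poll()
            self.poller.register(self.fd, select.POLLIN)

    def wait(self, timeout: Optional[float] = None) -> bool:
        """Block until the socket is readable; return False on timeout."""
        if self.kq is not None:
            # No changes to submit, collect at most one event.
            return len(self.kq.control(None, 1, timeout)) > 0
        ms = None if timeout is None else int(timeout * 1000)
        return len(self.poller.poll(ms)) > 0

    def close(self) -> None:
        if self.kq is not None:
            self.kq.close()
```

The point of the design is that the per-transaction cost is zero kernel calls on the wait set whenever data is already waiting; only the rare sleep touches it.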

I managed to reproduce the regression (~70k -> ~50k tps) using a prewarmed
scale 10 select-only pgbench with 2GB of shared_buffers (so it all
fits), with -j 96 -c 96 on an 8 vCPU AWS t2.2xlarge running FreeBSD 12
ALPHA8.  Here is what truss -c says, capturing data from one backend
for about 10 seconds:

syscall                     seconds   calls  errors
sendto                  0.396840146    3452       0
recvfrom                0.415802029    3443       6
kevent                  0.000626393       6       0
gettimeofday            2.723923249   24053       0
                      ------------- ------- -------
                        3.537191817   30954       6
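Normalising the table above per transaction makes the mix easier to read. This Python sketch copies the figures from that table and assumes one sendto() per pgbench -S transaction (one reply per round trip); the helper names are hypothetical. Keep in mind that truss's own tracing overhead inflates the per-call times.

```python
# Figures copied from the kqueue-build truss -c table above
# (one backend, roughly 10 seconds of a -j 96 -c 96 select-only run).
# Each entry: (seconds, calls, errors).
KQUEUE_BACKEND = {
    "sendto": (0.396840146, 3452, 0),
    "recvfrom": (0.415802029, 3443, 6),
    "kevent": (0.000626393, 6, 0),
    "gettimeofday": (2.723923249, 24053, 0),
}


def calls_per_transaction(stats, txn_marker="sendto"):
    """Divide each call count by the number of transactions.

    Assumes exactly one `txn_marker` call per transaction.
    """
    txns = stats[txn_marker][1]
    return {name: calls / txns for name, (_secs, calls, _errs) in stats.items()}


def mean_call_us(stats, name):
    """Average time truss attributes to one call, in microseconds."""
    secs, calls, _errs = stats[name]
    return secs / calls * 1e6


if __name__ == "__main__":
    for name, ratio in calls_per_transaction(KQUEUE_BACKEND).items():
        print(f"{name:>12}: {ratio:.4f} calls/txn")
    print(f"gettimeofday: {mean_call_us(KQUEUE_BACKEND, 'gettimeofday'):.1f} us/call under truss")
```

With these numbers it works out to roughly 7 gettimeofday() calls per transaction but only about 0.0017 kevent() calls, i.e. the backend almost never sleeps on the kqueue.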

(There's no regression with -j 8 -c 8; the problem appears only when the
system is significantly overloaded, the same circumstances under which
Matheusz reported a great improvement.)  So... the backend very rarely
accesses the kqueue directly, but its existence somehow slows things down.
Curiously, when using poll() it's actually calling poll() ~90/sec for
me:

syscall                     seconds   calls  errors
sendto                  0.352784808    3226       0
recvfrom                0.614855254    4125     916
poll                    0.319396480     916       0
gettimeofday            2.659035352   22456       0
                      ------------- ------- -------
                        3.946071894   30723     916
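In both backend tables, the number of failed recvfrom() calls matches the number of waits exactly: 6 errors and 6 kevent() calls with kqueue, 916 errors and 916 poll() calls without. That is consistent with a read path that tries the non-blocking read first and enters the wait primitive only when the read fails with EAGAIN, which I believe is what the backend's secure_read() does. A minimal Python sketch of that pattern, using poll() and a hypothetical helper read_then_wait(); for brevity it builds a fresh poll set on every call, unlike a long-lived wait set:

```python
import select
import socket
from typing import Tuple


def read_then_wait(sock: socket.socket, bufsize: int = 8192) -> Tuple[bytes, int]:
    """Read from a non-blocking socket, sleeping in poll() only if needed.

    The caller must have called sock.setblocking(False).  Returns
    (data, waits), where waits counts how many times recv() failed with
    EAGAIN/EWOULDBLOCK and we had to block in poll() before retrying.
    This is the pattern that yields equal counts of failed reads and waits.
    """
    poller = select.poll()  # created per call here for brevity only
    poller.register(sock.fileno(), select.POLLIN)
    waits = 0
    while True:
        try:
            return sock.recv(bufsize), waits
        except BlockingIOError:
            # Nothing buffered yet: block until the socket becomes readable.
            waits += 1
            poller.poll()  # no timeout
```

When the client's next request is already buffered (the common case under heavy load), the first recv() succeeds and no wait call is made at all.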

I don't know what's going on here.  Based on the reports so far, we
know that kqueue gives a speedup when using bare metal with pgbench
running on a different machine, but a slowdown when using
virtualisation and pgbench running on the same machine (and I just
checked that that's observable with both Unix sockets and TCP
sockets).  That gave me the idea of looking at pgbench itself:

Unpatched:

syscall                     seconds   calls  errors
ppoll                   0.004869268       1       0
sendto                 16.489416911    7033       0
recvfrom               21.137606238    7049       0
                      ------------- ------- -------
                       37.631892417   14083       0

Patched:

syscall                     seconds   calls  errors
ppoll                   0.002773195       1       0
sendto                 16.597880468    7217       0
recvfrom               25.646406008    7238       0
                      ------------- ------- -------
                       42.247059671   14456       0
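Dividing the pgbench-side totals by call counts shows where the extra time goes. This Python sketch just copies the two tables above (the names UNPATCHED, PATCHED and mean_ms() are mine) and compares the average time truss attributes to each call:

```python
# pgbench-side truss -c figures copied from the two tables above.
# Each entry: (seconds, calls).
UNPATCHED = {"sendto": (16.489416911, 7033), "recvfrom": (21.137606238, 7049)}
PATCHED = {"sendto": (16.597880468, 7217), "recvfrom": (25.646406008, 7238)}


def mean_ms(stats, name):
    """Average time per call, in milliseconds."""
    secs, calls = stats[name]
    return secs / calls * 1e3


if __name__ == "__main__":
    for name in ("sendto", "recvfrom"):
        before, after = mean_ms(UNPATCHED, name), mean_ms(PATCHED, name)
        print(f"{name:>8}: {before:.3f} ms -> {after:.3f} ms per call "
              f"({after / before - 1:+.1%})")
```

Per call, sendto() is about the same (slightly faster with the patch), while recvfrom() goes from about 3.00 ms to about 3.54 ms, roughly 18% longer.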

I don't know why the existence of the kqueue should make recvfrom()
slower on the pgbench side.  That's probably something to look into
offline with help from a FreeBSD guru.  Degraded performance for
clients on the same machine does seem to be a showstopper for this
patch for now.  Thanks for testing!

-- 
Thomas Munro
http://www.enterprisedb.com

