Thread: something is wonky with pgbench pipelining

something is wonky with pgbench pipelining

From
Andres Freund
Date:
Hi,

I think something is slightly off with pgbench (or libpq) pipelining. Consider
e.g. the following pgbench workload:

\startpipeline
SELECT 1;
SELECT 1;
SELECT 1;
SELECT 1;
SELECT 1;
SELECT 1;
SELECT 1;
\endpipeline

A pgbench run using that results in in endless repetitions of the below:
pgbench -Mprepared -c 1 -T1000 -f ~/tmp/select1_batch.sql

sendto(3, "B\0\0\0\22\0P0_1\0\0\0\0\0\0\1\0\0D\0\0\0\6P\0E\0\0\0\t\0"..., 257, MSG_NOSIGNAL, NULL, 0) = 257
recvfrom(3, 0x5614032370f0, 16384, 0, NULL, NULL) = -1 EAGAIN (Resource temporarily unavailable)
ppoll([{fd=3, events=POLLIN}], 1, NULL, NULL, 8) = 1 ([{fd=3, revents=POLLIN}])
recvfrom(3, "2\0\0\0\4T\0\0\0!\0\1?column?\0\0\0\0\0\0\0\0\0\0\27\0"..., 16384, 0, NULL, NULL) = 461
recvfrom(3, 0x56140323727c, 15988, 0, NULL, NULL) = -1 EAGAIN (Resource temporarily unavailable)
recvfrom(3, 0x56140323723b, 16053, 0, NULL, NULL) = -1 EAGAIN (Resource temporarily unavailable)
recvfrom(3, 0x5614032371fa, 16118, 0, NULL, NULL) = -1 EAGAIN (Resource temporarily unavailable)
recvfrom(3, 0x5614032371b9, 16183, 0, NULL, NULL) = -1 EAGAIN (Resource temporarily unavailable)
recvfrom(3, 0x561403237178, 16248, 0, NULL, NULL) = -1 EAGAIN (Resource temporarily unavailable)
recvfrom(3, 0x561403237137, 16313, 0, NULL, NULL) = -1 EAGAIN (Resource temporarily unavailable)
recvfrom(3, 0x5614032370f6, 16378, 0, NULL, NULL) = -1 EAGAIN (Resource temporarily unavailable)
sendto(3, "B\0\0\0\22\0P0_1\0\0\0\0\0\0\1\0\0D\0\0\0\6P\0E\0\0\0\t\0"..., 257, MSG_NOSIGNAL, NULL, 0) = 257
recvfrom(3, 0x5614032370f0, 16384, 0, NULL, NULL) = -1 EAGAIN (Resource temporarily unavailable)
ppoll([{fd=3, events=POLLIN}], 1, NULL, NULL, 8) = 1 ([{fd=3, revents=POLLIN}])
recvfrom(3, "2\0\0\0\4T\0\0\0!\0\1?column?\0\0\0\0\0\0\0\0\0\0\27\0"..., 16384, 0, NULL, NULL) = 461
recvfrom(3, 0x56140323727c, 15988, 0, NULL, NULL) = -1 EAGAIN (Resource temporarily unavailable)
recvfrom(3, 0x56140323723b, 16053, 0, NULL, NULL) = -1 EAGAIN (Resource temporarily unavailable)
recvfrom(3, 0x5614032371fa, 16118, 0, NULL, NULL) = -1 EAGAIN (Resource temporarily unavailable)
recvfrom(3, 0x5614032371b9, 16183, 0, NULL, NULL) = -1 EAGAIN (Resource temporarily unavailable)
recvfrom(3, 0x561403237178, 16248, 0, NULL, NULL) = -1 EAGAIN (Resource temporarily unavailable)
recvfrom(3, 0x561403237137, 16313, 0, NULL, NULL) = -1 EAGAIN (Resource temporarily unavailable)
recvfrom(3, 0x5614032370f6, 16378, 0, NULL, NULL) = -1 EAGAIN (Resource temporarily unavailable)
sendto(3, "B\0\0\0\22\0P0_1\0\0\0\0\0\0\1\0\0D\0\0\0\6P\0E\0\0\0\t\0"..., 257, MSG_NOSIGNAL, NULL, 0) = 257

Note how recvfrom() returning EAGAIN is called 7 times in a row? There's also
7 SQL statements in the workload...

I think what's happening is that the first recvfrom() actually gets all 7
connection results. The server doesn't have any queries to process at that
point. But we ask the kernel whether there is new network input over and over
again, despite having results to process!

With a short pipeline this doesn't matter much. But if it's longer, adding a
syscall for each statement in the pipeline does increase pgbench overhead
measurably. An easy way to avoid that is to put a PQisBusy() && before the
PQconsumeInput().

Comparing pgbench of 100 pipelined SELECT 1;'s, under perf stat yields:

perf stat -e task-clock,raw_syscalls:sys_enter,context-switches,cycles:u,cycles:k,instructions:u,instructions:k \
  schedtool -a 38 -e \
  /home/andres/build/postgres/dev-optimize/vpath/src/bin/pgbench/pgbench -n -Mprepared -c 1 -j1 -T5 -f
~/tmp/select1_batch.sql

default:
...
tps = 3617.823383 (without initial connection time)
...
          1,339.25 msec task-clock                #    0.267 CPUs utilized
         1,880,855      raw_syscalls:sys_enter    #    1.404 M/sec
            18,084      context-switches          #   13.503 K/sec
     3,128,615,558      cycles:u                  #    2.336 GHz
     1,211,509,367      cycles:k                  #    0.905 GHz
     8,000,238,738      instructions:u            #    2.56  insn per cycle
     1,720,276,642      instructions:k            #    1.42  insn per cycle

       5.007540307 seconds time elapsed

       1.004346000 seconds user
       0.376209000 seconds sys

with-isbusy:
...
tps = 3990.424742 (without initial connection time)
...
          1,013.71 msec task-clock                #    0.202 CPUs utilized
            80,203      raw_syscalls:sys_enter    #   79.119 K/sec
            19,947      context-switches          #   19.677 K/sec
     2,943,676,361      cycles:u                  #    2.904 GHz
       346,607,769      cycles:k                  #    0.342 GHz
     8,464,188,379      instructions:u            #    2.88  insn per cycle
       226,665,530      instructions:k            #    0.65  insn per cycle

       5.007539846 seconds time elapsed

       0.906090000 seconds user
       0.151015000 seconds sys


1.8 million fewer syscalls, reduced overall "on cpu" time, and particularly
0.27x of the system time... The user/kernel cycles/instruction split is also
illuminating.

Greetings,

Andres Freund



Re: something is wonky with pgbench pipelining

From
Alvaro Herrera
Date:
On 2021-Jul-20, Andres Freund wrote:

> I think what's happening is that the first recvfrom() actually gets all 7
> connection results. The server doesn't have any queries to process at that
> point. But we ask the kernel whether there is new network input over and over
> again, despite having results to process!

Hmm, yeah, that seems a missed opportunity.

> with-isbusy:
> ...
> tps = 3990.424742 (without initial connection time)
> ...
>           1,013.71 msec task-clock                #    0.202 CPUs utilized
>             80,203      raw_syscalls:sys_enter    #   79.119 K/sec
>             19,947      context-switches          #   19.677 K/sec
>      2,943,676,361      cycles:u                  #    2.904 GHz
>        346,607,769      cycles:k                  #    0.342 GHz
>      8,464,188,379      instructions:u            #    2.88  insn per cycle
>        226,665,530      instructions:k            #    0.65  insn per cycle

This is quite compelling.

If you don't mind I can get this pushed soon in the next couple of days
-- or do you want to do it yourself?

-- 
Álvaro Herrera         PostgreSQL Developer  —  https://www.EnterpriseDB.com/
"La espina, desde que nace, ya pincha" (Proverbio africano)



Re: something is wonky with pgbench pipelining

From
Andres Freund
Date:
Hi,

On 2021-07-20 14:57:15 -0400, Alvaro Herrera wrote:
> On 2021-Jul-20, Andres Freund wrote:
> 
> > I think what's happening is that the first recvfrom() actually gets all 7
> > connection results. The server doesn't have any queries to process at that
> > point. But we ask the kernel whether there is new network input over and over
> > again, despite having results to process!
> 
> Hmm, yeah, that seems a missed opportunity.

> > with-isbusy:
> > ...
> > tps = 3990.424742 (without initial connection time)
> > ...
> >           1,013.71 msec task-clock                #    0.202 CPUs utilized
> >             80,203      raw_syscalls:sys_enter    #   79.119 K/sec
> >             19,947      context-switches          #   19.677 K/sec
> >      2,943,676,361      cycles:u                  #    2.904 GHz
> >        346,607,769      cycles:k                  #    0.342 GHz
> >      8,464,188,379      instructions:u            #    2.88  insn per cycle
> >        226,665,530      instructions:k            #    0.65  insn per cycle
> 
> This is quite compelling.
> 
> If you don't mind I can get this pushed soon in the next couple of days
> -- or do you want to do it yourself?

I was thinking of pushing the attached, to both 14 and master, thinking
that was what you meant, but then I wasn't quite sure: It's a relatively
minor performance improvement, after all? OTOH, it arguably also just is
a bit of an API misuse...

I'm inclined to push it to 14 and master, but ...

Greetings,

Andres Freund

Attachment

Re: something is wonky with pgbench pipelining

From
Andres Freund
Date:
Hi,

Adding RMT.

On 2021-07-21 16:55:08 -0700, Andres Freund wrote:
> On 2021-07-20 14:57:15 -0400, Alvaro Herrera wrote:
> > On 2021-Jul-20, Andres Freund wrote:
> > 
> > > I think what's happening is that the first recvfrom() actually gets all 7
> > > connection results. The server doesn't have any queries to process at that
> > > point. But we ask the kernel whether there is new network input over and over
> > > again, despite having results to process!
> > 
> > Hmm, yeah, that seems a missed opportunity.
> 
> > > with-isbusy:
> > > ...
> > > tps = 3990.424742 (without initial connection time)
> > > ...
> > >           1,013.71 msec task-clock                #    0.202 CPUs utilized
> > >             80,203      raw_syscalls:sys_enter    #   79.119 K/sec
> > >             19,947      context-switches          #   19.677 K/sec
> > >      2,943,676,361      cycles:u                  #    2.904 GHz
> > >        346,607,769      cycles:k                  #    0.342 GHz
> > >      8,464,188,379      instructions:u            #    2.88  insn per cycle
> > >        226,665,530      instructions:k            #    0.65  insn per cycle
> > 
> > This is quite compelling.
> > 
> > If you don't mind I can get this pushed soon in the next couple of days
> > -- or do you want to do it yourself?
> 
> I was thinking of pushing the attached, to both 14 and master, thinking
> that was what you meant, but then I wasn't quite sure: It's a relatively
> minor performance improvement, after all? OTOH, it arguably also just is
> a bit of an API misuse...
> 
> I'm inclined to push it to 14 and master, but ...

RMT: ^

Greetings,

Andres Freund