Re: pgbench - implement strict TPC-B benchmark - Mailing list pgsql-hackers

From Andres Freund
Subject Re: pgbench - implement strict TPC-B benchmark
Date
Msg-id 20190802183106.aj4rmyvdlv6y5py4@alap3.anarazel.de
Whole thread Raw
In response to Re: pgbench - implement strict TPC-B benchmark  (Fabien COELHO <coelho@cri.ensmp.fr>)
Responses Re: pgbench - implement strict TPC-B benchmark
List pgsql-hackers
Hi,

On 2019-08-02 10:34:24 +0200, Fabien COELHO wrote:
> 
> Hello Andres,
> 
> Thanks a lot for these feedbacks and comments.
> 
> > Using pgbench -Mprepared -n -c 8 -j 8 -S pgbench_100 -T 10 -r -P1
> > e.g. shows pgbench to use 189% CPU in my 4/8 core/thread laptop. That's
> > a pretty significant share.
> 
> Fine, but what is the corresponding server load? 211%? 611%? And what actual
> time is spent in pgbench itself, vs libpq and syscalls?

System wide pgbench, including libpq, is about 22% of the whole system.

As far as I can tell there's a number of things that are wrong:
- prepared statement names are recomputed for every query execution
- variable name lookup is done for every command, rather than once, when
  parsing commands
- a lot of string->int->string type back and forths


> Conclusion: pgbench-specific overheads are typically (much) below 10% of the
> total client-side cpu cost of a transaction, while over 90% of the cpu cost
> is spent in libpq and system, for the worst case do-nothing query.

I don't buy that that's the actual worst case, or even remotely close to
it. I e.g. see higher pgbench overhead for the *modify* case than for
the pgbench's readonly case. And that's because some of the meta
commands are slow, in particular everything related to variables. And
the modify case just has more variables.



> 
> > +   12.35%  pgbench  pgbench                [.] threadRun
> > +    3.54%  pgbench  pgbench                [.] dopr.constprop.0
> > +    3.30%  pgbench  pgbench                [.] fmtint
> > +    1.93%  pgbench  pgbench                [.] getVariable
> 
> ~ 21%, probably some inlining has been performed, because I would have
> expected to see significant time in "advanceConnectionState".

Yea, there's plenty inlining.  Note dopr() is string processing.


> > +    2.95%  pgbench  libpq.so.5.13          [.] PQsendQueryPrepared
> > +    2.15%  pgbench  libpq.so.5.13          [.] pqPutInt
> > +    4.47%  pgbench  libpq.so.5.13          [.] pqParseInput3
> > +    1.66%  pgbench  libpq.so.5.13          [.] pqPutMsgStart
> > +    1.63%  pgbench  libpq.so.5.13          [.] pqGetInt
> 
> ~ 13%

A lot of that is really stupid. We need to improve
libpq. PqsendQueryGuts (attributed to PQsendQueryPrepared here), builds
the command in many separate pqPut* commands, which reside in another
translation unit, is pretty sad.


> > +    3.16%  pgbench  libc-2.28.so           [.] __strcmp_avx2
> > +    2.95%  pgbench  libc-2.28.so           [.] malloc
> > +    1.85%  pgbench  libc-2.28.so           [.] ppoll
> > +    1.85%  pgbench  libc-2.28.so           [.] __strlen_avx2
> > +    1.85%  pgbench  libpthread-2.28.so     [.] __libc_recv
> 
> ~ 11%, str is a pain… Not sure who is calling though, pgbench or
> libpq.

Both. Most of the strcmp is from getQueryParams()/getVariable(). The
dopr() is from pg_*printf, which is mostly attributable to
preparedStatementName() and getVariable().


> This is basically 47% pgbench, 53% lib*, on the sample provided. I'm unclear
> about where system time is measured.

It was excluded in this profile, both to reduce profiling costs, and to
focus on pgbench.


Greetings,

Andres Freund



pgsql-hackers by date:

Previous
From: Andres Freund
Date:
Subject: Re: pglz performance
Next
From: Tomas Vondra
Date:
Subject: Re: pglz performance