Re: pgbench - implement strict TPC-B benchmark - Mailing list pgsql-hackers
From | Andres Freund |
---|---|
Subject | Re: pgbench - implement strict TPC-B benchmark |
Date | |
Msg-id | 20190802183106.aj4rmyvdlv6y5py4@alap3.anarazel.de Whole thread Raw |
In response to | Re: pgbench - implement strict TPC-B benchmark (Fabien COELHO <coelho@cri.ensmp.fr>) |
Responses |
Re: pgbench - implement strict TPC-B benchmark
|
List | pgsql-hackers |
Hi, On 2019-08-02 10:34:24 +0200, Fabien COELHO wrote: > > Hello Andres, > > Thanks a lot for these feedbacks and comments. > > > Using pgbench -Mprepared -n -c 8 -j 8 -S pgbench_100 -T 10 -r -P1 > > e.g. shows pgbench to use 189% CPU in my 4/8 core/thread laptop. That's > > a pretty significant share. > > Fine, but what is the corresponding server load? 211%? 611%? And what actual > time is spent in pgbench itself, vs libpq and syscalls? System wide pgbench, including libpq, is about 22% of the whole system. As far as I can tell there's a number of things that are wrong: - prepared statement names are recomputed for every query execution - variable name lookup is done for every command, rather than once, when parsing commands - a lot of string->int->string type back and forths > Conclusion: pgbench-specific overheads are typically (much) below 10% of the > total client-side cpu cost of a transaction, while over 90% of the cpu cost > is spent in libpq and system, for the worst case do-nothing query. I don't buy that that's the actual worst case, or even remotely close to it. I e.g. see higher pgbench overhead for the *modify* case than for the pgbench's readonly case. And that's because some of the meta commands are slow, in particular everything related to variables. And the modify case just has more variables. > > > + 12.35% pgbench pgbench [.] threadRun > > + 3.54% pgbench pgbench [.] dopr.constprop.0 > > + 3.30% pgbench pgbench [.] fmtint > > + 1.93% pgbench pgbench [.] getVariable > > ~ 21%, probably some inlining has been performed, because I would have > expected to see significant time in "advanceConnectionState". Yea, there's plenty inlining. Note dopr() is string processing. > > + 2.95% pgbench libpq.so.5.13 [.] PQsendQueryPrepared > > + 2.15% pgbench libpq.so.5.13 [.] pqPutInt > > + 4.47% pgbench libpq.so.5.13 [.] pqParseInput3 > > + 1.66% pgbench libpq.so.5.13 [.] pqPutMsgStart > > + 1.63% pgbench libpq.so.5.13 [.] pqGetInt > > ~ 13% A lot of that is really stupid. We need to improve libpq. PqsendQueryGuts (attributed to PQsendQueryPrepared here), builds the command in many separate pqPut* commands, which reside in another translation unit, is pretty sad. > > + 3.16% pgbench libc-2.28.so [.] __strcmp_avx2 > > + 2.95% pgbench libc-2.28.so [.] malloc > > + 1.85% pgbench libc-2.28.so [.] ppoll > > + 1.85% pgbench libc-2.28.so [.] __strlen_avx2 > > + 1.85% pgbench libpthread-2.28.so [.] __libc_recv > > ~ 11%, str is a pain… Not sure who is calling though, pgbench or > libpq. Both. Most of the strcmp is from getQueryParams()/getVariable(). The dopr() is from pg_*printf, which is mostly attributable to preparedStatementName() and getVariable(). > This is basically 47% pgbench, 53% lib*, on the sample provided. I'm unclear > about where system time is measured. It was excluded in this profile, both to reduce profiling costs, and to focus on pgbench. Greetings, Andres Freund
pgsql-hackers by date: