Re: pgbench - implement strict TPC-B benchmark - Mailing list pgsql-hackers
From: Fabien COELHO
Subject: Re: pgbench - implement strict TPC-B benchmark
Msg-id: alpine.DEB.2.21.1908012320430.32558@lancre
In response to: Re: pgbench - implement strict TPC-B benchmark (Andres Freund <andres@anarazel.de>)
List: pgsql-hackers
Hello Andres,

Thanks a lot for the feedback and comments.

> Using pgbench -Mprepared -n -c 8 -j 8 -S pgbench_100 -T 10 -r -P1
> e.g. shows pgbench to use 189% CPU in my 4/8 core/thread laptop. That's
> a pretty significant share.

Fine, but what is the corresponding server load? 211%? 611%? And what actual time is spent in pgbench itself, vs libpq and syscalls? Figures and discussion below.

> And before you argue that that's just about a read-only workload:

I'm fine with worst-case scenarios :-) Let's do the worst on my 2-core laptop running at 2.2 GHz:

(0) We can run a script that does nearly nothing:

  sh> cat nope.sql
  \sleep 0 # do not sleep, so stay awake…

  sh> time pgbench -f nope.sql -T 10 -r
  latency average = 0.000 ms
  tps = 12569499.226367 (excluding connections establishing) # 12.6M
  statement latencies in milliseconds:
    0.000 \sleep 0
  real 0m10.072s, user 0m10.027s, sys 0m0.012s

Unsurprisingly pgbench runs at about 100% CPU load, and the transaction cost (transaction loop and stat collection) is 0.080 µs (1/12.6M) per script execution (one client on one thread).

(1) A pgbench complex-commands-only script:

  sh> cat set.sql
  \set x random_exponential(1, :scale * 10, 2.5) + 2.1
  \set y random(1, 9) + 17.1 * :x
  \set z case when :x > 7 then 1.0 / ln(:y) else 2.0 / sqrt(:y) end

  sh> time pgbench -f set.sql -T 10 -r
  latency average = 0.001 ms
  tps = 1304989.729560 (excluding connections establishing) # 1.3M
  statement latencies in milliseconds:
    0.000 \set x random_exponential(1, :scale * 10, 2.5) + 2.1
    0.000 \set y random(1, 9) + 17.1 * :x
    0.000 \set z case when :x > 7 then 1.0 / ln(:y) else 2.0 / sqrt(:y) end
  real 0m10.038s, user 0m10.003s, sys 0m0.000s

Again pgbench load is near 100%, with the pgbench-only work (thread loop, expression evaluation, variables, stat collection) costing about 0.766 µs of CPU per script execution. This is about 10 times the previous case; 90% of the pgbench CPU cost is in expressions and variables, which is no surprise.
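As a side note, the per-script CPU cost arithmetic used in cases (0) and (1) above (apparent load divided by tps) can be sketched as follows; the helper name and the use of the measured real/user/sys figures are mine, not part of the original post:

```python
# Hypothetical helper reproducing the per-script CPU cost arithmetic:
# apparent CPU load of pgbench, spread over the transactions per second.
def cpu_cost_per_script_us(user_s, sys_s, real_s, tps):
    load = (user_s + sys_s) / real_s   # apparent CPU load of pgbench
    return load / tps * 1e6            # microseconds of CPU per script

# Case (0): do-nothing \sleep 0 script, ~100% load at 12.6M tps.
case0 = cpu_cost_per_script_us(10.027, 0.012, 10.072, 12569499.2)
# Case (1): \set-only script, ~100% load at 1.3M tps.
case1 = cpu_cost_per_script_us(10.003, 0.000, 10.038, 1304989.7)

print(f"case (0): {case0:.3f} us, case (1): {case1:.3f} us")
```

This prints roughly 0.079 µs and 0.764 µs; the small difference from the 0.080/0.766 figures above comes from folding in the measured load rather than assuming exactly 100%.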
Probably this under-a-µs cost could be reduced… but what overall improvement would that provide? The last test gives an answer:

(2) A ridiculously small SQL query, sent over a local unix socket:

  sh> cat empty.sql
  ; # yep, an empty query!

  sh> time pgbench -f empty.sql -T 10 -r
  latency average = 0.016 ms
  tps = 62206.501709 (excluding connections establishing) # 62.2K
  statement latencies in milliseconds:
    0.016 ;
  real 0m10.038s, user 0m1.754s, sys 0m3.867s

Here we add minimal libpq and underlying system-call work to pgbench's internal CPU costs, with the most favorable (or worst :-) SQL query over the most favorable postgres connection. Apparent load is about (1.754+3.867)/10.038 = 56%, so the CPU cost per script is 0.56 / 62206.5 = 9 µs, over 100 times the cost of the do-nothing script (0), and over 10 times the cost of the complex expression command script (1).

Conclusion: pgbench-specific overheads are typically (much) below 10% of the total client-side CPU cost of a transaction, while over 90% of that cost is spent in libpq and the system, even for the worst-case do-nothing query. A perfect benchmark driver with zero overhead would reduce the client CPU cost by at most 10%, because you still have to talk to the database through the system. If pgbench's own cost were divided by two, which would be a reasonable achievement, the benchmark client cost would be reduced by 5%. Wow?

I have already given some thought in the past to optimizing pgbench, especially to avoiding long switches (eg in expression evaluation) and maybe to improving variable management, but as shown above I would not expect a gain worth the effort, and assume that such a patch would probably be justly rejected, because for a realistic benchmark script these costs are already much less than the inevitable libpq/syscall costs. That does not mean that nothing could be done, but the situation is currently quite good.
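To double-check the case (2) arithmetic and the "below 10%" claim, here is a small sketch; the variable names are mine, and the 0.766 µs figure is the pgbench-only cost measured in case (1):

```python
# Recomputing the case (2) figures quoted above from the time(1) output.
real_s, user_s, sys_s = 10.038, 1.754, 3.867
tps = 62206.5

load = (user_s + sys_s) / real_s   # apparent CPU load, about 56%
cost_us = load / tps * 1e6         # total client CPU per script, about 9 us

# Share of that total attributable to pgbench itself, using the
# 0.766 us pgbench-only cost from the \set-only script of case (1).
pgbench_share = 0.766 / cost_us
print(f"load {load:.0%}, cost {cost_us:.1f} us, pgbench share {pgbench_share:.0%}")
```

The pgbench share comes out under 10% of the per-transaction client CPU, consistent with the conclusion above.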
In conclusion, ISTM that current pgbench makes it possible to saturate a postgres server from a client significantly smaller than the server, which seems like a reasonable benchmarking situation. Any other driver in any other language would necessarily incur the same kinds of costs.

> [...] And the largest part of the overhead is in pgbench's interpreter
> loop:

Indeed, the figures below are very interesting! Thanks for collecting them.

> +   12.35%  pgbench  pgbench             [.] threadRun
> +    3.54%  pgbench  pgbench             [.] dopr.constprop.0
> +    3.30%  pgbench  pgbench             [.] fmtint
> +    1.93%  pgbench  pgbench             [.] getVariable

~ 21%; probably some inlining has been performed, because I would have expected to see significant time in "advanceConnectionState".

> +    2.95%  pgbench  libpq.so.5.13       [.] PQsendQueryPrepared
> +    2.15%  pgbench  libpq.so.5.13       [.] pqPutInt
> +    4.47%  pgbench  libpq.so.5.13       [.] pqParseInput3
> +    1.66%  pgbench  libpq.so.5.13       [.] pqPutMsgStart
> +    1.63%  pgbench  libpq.so.5.13       [.] pqGetInt

~ 13%

> +    3.16%  pgbench  libc-2.28.so        [.] __strcmp_avx2
> +    2.95%  pgbench  libc-2.28.so        [.] malloc
> +    1.85%  pgbench  libc-2.28.so        [.] ppoll
> +    1.85%  pgbench  libc-2.28.so        [.] __strlen_avx2
> +    1.85%  pgbench  libpthread-2.28.so  [.] __libc_recv

~ 11%; str* is a pain… Not sure who is calling, though, pgbench or libpq. This is basically 47% pgbench, 53% lib*, on the sample provided. I'm unclear about where system time is measured.

> And that's the just the standard pgbench read/write case, without
> additional script commands or anything.

> Well, duh, that's because you're completely IO bound. You're doing
> 400tps. That's *nothing*. All you're measuring is how fast the WAL can
> be fdatasync()ed to disk. Of *course* pgbench isn't a relevant overhead
> in that case. I really don't understand how this can be an argument.

Sure. My interest in running it was to show that the \set stuff was ridiculous compared to processing an actual SQL query, but it does not allow analyzing all the overheads.
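The pgbench vs lib* split quoted for the perf sample can be tallied like this; the grouping of symbols is my reading of the listing, and only the listed entries are normalised (the full profile has more):

```python
# Summing the perf shares listed above, normalised over the symbols shown.
pgbench_syms = [12.35, 3.54, 3.30, 1.93]       # threadRun, dopr, fmtint, getVariable
libpq_syms   = [2.95, 2.15, 4.47, 1.66, 1.63]  # PQsendQueryPrepared, pqPutInt, …
libc_syms    = [3.16, 2.95, 1.85, 1.85, 1.85]  # strcmp, malloc, ppoll, strlen, recv

total = sum(pgbench_syms) + sum(libpq_syms) + sum(libc_syms)
pg = sum(pgbench_syms) / total
print(f"pgbench {pg:.0%}, lib* {1 - pg:.0%}")
```

This normalisation of only the listed symbols gives roughly 46%/54%, close to the 47%/53% split quoted above.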
I hope that the three examples above make my point more understandable.

>> Also, pgbench overheads must be compared to an actual client application,
>> which deals with a database through some language (PHP, Python, JS, Java…)
>> the interpreter of which would be written in C/C++ just like pgbench, and
>> some library (ORM, DBI, JDBC…), possibly written in the initial language and
>> relying on libpq under the hood. Ok, there could be some JIT involved, but
>> it will not change that there are costs there too, and it would have to do
>> pretty much the same things that pgbench is doing, plus what the application
>> has to do with the data.
>
> Uh, but those clients aren't all running on a single machine.

Sure. The cumulated power of the clients is probably much larger than that of the postgres server itself, and ISTM that pgbench makes it possible to simulate such things with much smaller client-side requirements, and that any other tool could not do much better.

-- 
Fabien.