Re: pgbench - implement strict TPC-B benchmark - Mailing list pgsql-hackers
From | Fabien COELHO |
---|---|
Subject | Re: pgbench - implement strict TPC-B benchmark |
Date | |
Msg-id | alpine.DEB.2.21.1908010654590.2683@lancre |
In response to | Re: pgbench - implement strict TPC-B benchmark (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses |
Re: pgbench - implement strict TPC-B benchmark
Re: pgbench - implement strict TPC-B benchmark |
List | pgsql-hackers |
Hello Tom,

> [ shrug... ] TBH, the proposed patch does not look to me like actual
> benchmark kit; it looks like a toy. Nobody who was intent on making their
> benchmark numbers look good would do a significant amount of work in a
> slow, ad-hoc interpreted language. I also wonder to what extent the
> numbers would reflect pgbench itself being the bottleneck.

> Which is really the fundamental problem I've got with all the stuff
> that's been crammed into pgbench of late --- the more computation you're
> doing there, the less your results measure the server's capabilities
> rather than pgbench's implementation details.

That is a very good question. It is easy to measure the overhead, for instance:

  sh> time pgbench -r -T 30 -M prepared
  ...
  latency average = 2.425 ms
  tps = 412.394420 (including connections establishing)
  statement latencies in milliseconds:
    0.001  \set aid random(1, 100000 * :scale)
    0.000  \set bid random(1, 1 * :scale)
    0.000  \set tid random(1, 10 * :scale)
    0.000  \set delta random(-5000, 5000)
    0.022  BEGIN;
    0.061  UPDATE pgbench_accounts SET abalance = abalance + :delta WHERE aid = :aid;
    0.038  SELECT abalance FROM pgbench_accounts WHERE aid = :aid;
    0.046  UPDATE pgbench_tellers SET tbalance = tbalance + :delta WHERE tid = :tid;
    0.042  UPDATE pgbench_branches SET bbalance = bbalance + :delta WHERE bid = :bid;
    0.036  INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES (:tid, :bid, :aid, :delta, CURRENT_TIMESTAMP);
    2.178  END;

  real 0m30.080s, user 0m0.406s, sys 0m0.689s

The cost of the interpreted part of pgbench (\set) is under 1/1000 of the transaction latency. The user CPU time of the pgbench process itself accounts for about 1.4% of the elapsed time, below the inevitable system time, which is 2.3%. pgbench overheads are pretty small compared to postgres connection and command execution, and to system time. The above used a local socket; with an actual remote network connection, the gap would be larger. A profile run could collect more data, but that does not seem necessary.

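For anyone who wants to recheck the arithmetic, here is a small sketch that recomputes the ratios from the figures reported above. It assumes that the "time of the process itself" means the user CPU time reported by time(1), and that both it and the system time are taken relative to the elapsed (real) time; the variable names are only illustrative.

```python
# Figures copied from the pgbench run quoted above.
latency_avg_ms = 2.425
set_ms = [0.001, 0.000, 0.000, 0.000]  # the four \set meta-commands
sql_ms = [0.022, 0.061, 0.038, 0.046, 0.042, 0.036, 2.178]  # BEGIN .. END
real_s, user_s, sys_s = 30.080, 0.406, 0.689  # from time(1)

# Share of the average transaction latency spent in \set evaluation.
set_share = sum(set_ms) / latency_avg_ms

# Assumption: overheads are expressed relative to elapsed wall-clock time.
user_share = user_s / real_s
sys_share = sys_s / real_s

print(f"\\set share of latency: {set_share:.3%}")
print(f"pgbench user time:     {user_share:.2%} of elapsed")
print(f"system time:           {sys_share:.2%} of elapsed")
```

The per-statement latencies sum to the reported average latency within rounding, so the \set share can be read directly against the 2.425 ms figure.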
Some parts of pgbench could be optimized: e.g. for expressions, the large switch could be avoided with precomputed function calls, some static analysis could infer some types and avoid calls to generic functions which have to test types, and so on. But frankly I do not think that this is currently needed, so I would not bother unless an actual issue is proven.

Also, pgbench overheads must be compared to an actual client application, which deals with a database through some language (PHP, Python, JS, Java…), the interpreter of which would be written in C/C++ just like pgbench, and some library (ORM, DBI, JDBC…), possibly written in the initial language and relying on libpq under the hood. OK, there could be some JIT involved, but it will not change that there are costs there too, and it would have to do pretty much the same things that pgbench is doing, plus what the application has to do with the data.

All in all, pgbench overheads are small compared to postgres processing times and representative of a reasonably optimized client application.

> In any case, even if I were in love with the script itself,

Love is probably not required for a feature demonstration :-)

> we cannot commit something that claims to be "standard TPC-B".

Yep, I clearly underestimated this legal aspect.

> It needs weasel wording that makes it clear that it isn't TPC-B, and
> then you have a problem of user confusion about why we have both
> not-quite-TPC-B-1 and not-quite-TPC-B-2, and which one to use, or which
> one was used in somebody else's tests.

I agree that confusion is no good either.

> I think if you want to show off what these pgbench features are good
> for, it'd be better to find some other example that's not in the
> midst of a legal minefield.

Yep, I got that.

To try to salvage my illustration idea: I could change the name to "demo", i.e. quite far from "TPC-B", do some extensions to make it differ, e.g. use a non-uniform random generator, and then explicitly say that it is vaguely inspired by "TPC-B" and intended as a demo script likely to be updated to illustrate new features (e.g. if using a non-uniform generator I'd really like to add a permutation layer if available some day). This way, the real intention of the "demo" would be very clear.

--
Fabien.