Re: CPU costs of random_zipfian in pgbench - Mailing list pgsql-hackers
From: Fabien COELHO
Subject: Re: CPU costs of random_zipfian in pgbench
Msg-id: alpine.DEB.2.21.1902191137030.7308@lancre
In response to: Re: CPU costs of random_zipfian in pgbench (Peter Geoghegan <pg@bowt.ie>)
List: pgsql-hackers
Hello Peter,

My 0.02€: I'm not really interested in maintaining a tool for *one* benchmark, whatever the benchmark, its standardness or its quality.

What I like in "pgbench" is that it is both versatile and simple, so that people can benchmark their own data with their own load and their own queries by writing a few lines of trivial SQL and psql-like backslash commands, adjusting a few options, and extracting meaningful statistics out of it. I have been improving it (and not only me) so that it keeps its simplicity of use while providing the key features that let anyone write a simple but realistic benchmark. Those key features, which happen to be nearly all there now, are:

- expressions (thanks Robert for the initial push);
- non-uniform randoms (ok, some are more expensive, too bad); however, using non-uniform randoms creates a correlation issue, hence the permutation function submission, which took time because it is a non-trivial problem;
- conditionals (\if, taken from psql's implementation);
- getting a result out and being able to do something with it (\gset, and the associated \cset that Tom does not like);
- improved reporting (including around latency, per script/command/...);
- realistic loads (--rate instead of only pedal-to-the-metal runs, --latency-limit).

A small script combining several of these is sketched at the end of this message.

I have not encountered other tools with this versatility and simplicity. The TPC-C implementation you point out, and others I have seen, are structurally targeted at TPC-C and nothing else. I do not care about TPC-C per se; I care about people being able to run relevant benchmarks with minimal effort.

I'm not planning to submit many things in the future (currently: a strict-tpcb implementation which is really a show case of the existing features, faster server-side initialization, simple refactoring to simplify/clarify the code structure here and there, and maybe some stuff that may migrate to fe_utils if useful to psql), and to review what other people find useful, because I know the code base quite well.

I do think that the maintainability of the code has globally improved recently, because (1) the process-based implementation has been dropped and (2) the FSA implementation makes the code easier to understand and check, compared to the lengthy, plenty-of-ifs, many-variables function used beforehand. Bugs have been identified and fixed.

> I agree that pgbench is too complex, given its mandate and design.
> While I found Zipfian useful once or twice, I probably would have done
> just as well with an exponential distribution.

Yep, I agree that exponential is mostly okay for most practical benchmarking uses, but some benchmarks/people seem to really want zipf, so zipf and its intrinsic underlying complexity was submitted and finally included.

> I have been using BenchmarkSQL as a fair-use TPC-C implementation for
> my indexing project, with great results. pgbench just isn't very
> useful when validating the changes to B-Tree page splits that I
> propose, because the insertion pattern cannot be modeled
> probabilistically.

I do not understand the use case, nor why pgbench could not be used for this purpose.

> Besides, I really think that things like latency graphs are table stakes
> for this kind of work, which BenchmarkSQL offers out of the box. It
> isn't practical to make pgbench into a framework, which is what I'd
> really like to see. There just isn't that much more than can be done
> there.

Yep, pgbench only does "simple stats". I script around the per-second progress output for graphical display and additional stats (e.g. a five-number summary…).
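To make this concrete, here is a minimal sketch of the kind of custom script the features above allow, run against the standard pgbench_accounts table; the file name, value ranges and zipfian parameter are made up for the example:

  -- zipf_account.sql: hypothetical example script, for illustration only
  \set aid random_zipfian(1, 100000 * :scale, 1.07)
  \set delta random(-5000, 5000)
  BEGIN;
  SELECT abalance FROM pgbench_accounts WHERE aid = :aid \gset
  \if :abalance + :delta < 0
  -- do not let the balance go negative in this toy scenario
  UPDATE pgbench_accounts SET abalance = 0 WHERE aid = :aid;
  \else
  UPDATE pgbench_accounts SET abalance = abalance + :delta WHERE aid = :aid;
  \endif
  END;

It could then be run as a throttled, latency-bounded load with per-second progress reporting, something like:

  pgbench -n -f zipf_account.sql -c 8 -j 4 -T 300 -R 1000 -L 50 -P 1

The -P 1 lines are what I post-process for graphs and the additional stats mentioned above.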
-- Fabien.