Re: CPU costs of random_zipfian in pgbench - Mailing list pgsql-hackers

From Fabien COELHO
Subject Re: CPU costs of random_zipfian in pgbench
Msg-id alpine.DEB.2.21.1902191137030.7308@lancre
In response to Re: CPU costs of random_zipfian in pgbench  (Peter Geoghegan <pg@bowt.ie>)
Responses Re: CPU costs of random_zipfian in pgbench
List pgsql-hackers
Hello Peter,

My 0.02€: I'm not really interested in maintaining a tool for *one*
benchmark, whatever the benchmark and however standard or good it is.

What I like in "pgbench" is that it is both versatile and simple, so that
people can benchmark their own data with their own load and their own
queries by writing a few lines of trivial SQL and psql-like slash commands
and adjusting a few options, and then extract meaningful statistics out of it.

I, along with others, have been improving it so that it keeps its
simplicity of use but provides the key features that let anyone write a
simple but realistic benchmark.

The key features needed for that, which happen to be nearly all there
now, are:
  - some expressions (thanks Roberts for the initial push)
  - non-uniform random (ok, some distributions are more expensive, too bad);
    however, using non-uniform random creates a correlation issue,
    hence the permutation function submission, which took time because
    this is a non-trivial problem.
  - conditionals (\if, taken from psql's implementation)
  - getting a result out and being able to do something with it
    (\gset, and the associated \cset that Tom does not like).
  - improved reporting (including around latency, per script/command/...)
  - realistic loads (--rate vs only pedal-to-the-metal runs, --latency-limit)
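
The correlation issue mentioned in the list above can be illustrated with a
small sketch. With a skewed draw, the most frequently chosen keys are also
neighbours (for instance the smallest ones), so the hot rows end up next to
each other. Passing the drawn value through a bijection on the key range
scatters them. The sketch below uses an affine map modulo n, one of the
simplest bijections; it is only an illustration and not the permutation
function that was submitted. The function name and the parameters a and b are
hypothetical, and an affine map still leaves regular structure that a better
permutation would avoid.

```python
from math import gcd

def make_affine_permutation(n, a, b):
    """Return a bijection on range(n): i -> (a * i + b) % n.

    Requires gcd(a, n) == 1, otherwise two inputs would map to the same output.
    """
    if n <= 0 or gcd(a, n) != 1:
        raise ValueError("need n > 0 and a coprime with n")
    return lambda i: (a * i + b) % n

n = 10
permute = make_affine_permutation(n, a=7, b=3)  # hypothetical parameters

# Keys 0, 1, 2 would be the hottest under a skewed draw; after the
# permutation they are no longer adjacent.
hot = [permute(i) for i in range(3)]

# Every key in range(n) is still reachable exactly once.
image = sorted(permute(i) for i in range(n))
```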

I have not encountered other tools with this versatility and simplicity. 
The TPC-C implementation you point out and others I have seen are
structurally targeted at TPC-C and nothing else. I do not care about
TPC-C per se, I care about people being able to run relevant benchmarks 
with minimal effort.

I'm not planning to submit many things in the future (current: a
strict-tpcb implementation, which is really a showcase of the existing
features; faster server-side initialization; simple refactoring to
simplify/clarify the code structure here and there; maybe some stuff may
migrate to fe_utils if useful to psql). Beyond that, I intend to review what
other people find useful, because I know the code base quite well.

I do think that the maintainability of the code has globally been improved
recently because (1) the process-based implementation has been dropped and
(2) the FSA implementation makes the code easier to understand and check,
compared to the lengthy, plenty-of-ifs, many-variables function used
beforehand. Bugs have been identified and fixed.

> I agree that pgbench is too complex, given its mandate and design.
> While I found Zipfian useful once or twice, I probably would have done
> just as well with an exponential distribution.

Yep, I agree that exponential is okay for most practical
benchmarking uses, but some benchmarks/people seem to really want zipf, so
zipf and its intrinsic underlying complexity were submitted and finally
included.
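
To make the difference in cost concrete, here is a rough sketch. It is not how
pgbench implements either distribution; the function names are made up for the
illustration. A truncated exponential has a closed-form inverse, so each draw
costs a constant amount of work. A naive Zipf sampler first needs the
normalizing sum over all n ranks, which is where the extra complexity comes
from.

```python
import math
import random

def exponential_int(rng, lo, hi, parameter):
    """Truncated exponential draw in [lo, hi] by inverse transform: O(1) per draw."""
    cut = math.exp(-parameter)
    u = 1.0 - rng.random()                             # uniform in (0, 1]
    x = -math.log(cut + (1.0 - cut) * u) / parameter   # mathematically in [0, 1)
    # The clamp guards against floating-point rounding at the upper edge.
    return min(hi, lo + int((hi - lo + 1) * x))

def zipf_pmf(n, s):
    """Probabilities of ranks 1..n with weight 1/k**s: O(n) just to normalize."""
    weights = [1.0 / k ** s for k in range(1, n + 1)]
    total = sum(weights)                               # generalized harmonic number
    return [w / total for w in weights]

def zipf_int(rng, lo, pmf):
    """Naive Zipf draw starting at lo, using a precomputed pmf."""
    return lo + rng.choices(range(len(pmf)), weights=pmf)[0]

rng = random.Random(0)
pmf = zipf_pmf(3, 1.0)
exp_draws = [exponential_int(rng, 1, 100, 2.0) for _ in range(1000)]
zipf_draws = [zipf_int(rng, 1, pmf) for _ in range(1000)]
```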

> I have been using BenchmarkSQL as a fair-use TPC-C implementation for
> my indexing project, with great results. pgbench just isn't very
> useful when validating the changes to B-Tree page splits that I
> propose, because the insertion pattern cannot be modeled
> probabilistically.

I do not understand the use case, or why pgbench could not be used for
this purpose.

> Besides, I really think that things like latency graphs are table stakes 
> for this kind of work, which BenchmarkSQL offers out of the box. It 
> isn't practical to make pgbench into a framework, which is what I'd 
> really like to see. There just isn't that much more than can be done 
> there.

Yep. pgbench only does "simple stats". I script around the per-second
progress output for graphical display and additional stats (e.g. five-number
summary…).
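
As an illustration of that kind of scripting, here is a minimal sketch that
computes a five-number summary of the tps values. It assumes progress lines of
the form "progress: 1.0 s, 100.0 tps, lat 9.9 ms stddev 1.0"; the exact layout
varies between pgbench versions and options, so the regular expression may need
adjusting. The function names are mine.

```python
import re
import statistics

# Assumed line layout; any line that does not match is skipped.
PROGRESS_RE = re.compile(r"progress: ([\d.]+) s, ([\d.]+) tps, lat ([\d.]+) ms")

def parse_tps(lines):
    """Extract the tps value from each progress line."""
    values = []
    for line in lines:
        m = PROGRESS_RE.search(line)
        if m:
            values.append(float(m.group(2)))
    return values

def five_number_summary(values):
    """Return (min, first quartile, median, third quartile, max)."""
    q1, med, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return (min(values), q1, med, q3, max(values))

sample = [
    "starting vacuum...end.",
    "progress: 1.0 s, 100.0 tps, lat 9.9 ms stddev 1.0",
    "progress: 2.0 s, 300.0 tps, lat 3.3 ms stddev 0.5",
    "progress: 3.0 s, 200.0 tps, lat 5.0 ms stddev 0.7",
    "progress: 4.0 s, 500.0 tps, lat 2.0 ms stddev 0.2",
    "progress: 5.0 s, 400.0 tps, lat 2.5 ms stddev 0.3",
]
summary = five_number_summary(parse_tps(sample))
```

The same parsing step can feed a plotting tool for the graphical display.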

-- 
Fabien.
