Re: gaussian distribution pgbench - Mailing list pgsql-hackers

From Gregory Smith
Subject Re: gaussian distribution pgbench
Date
Msg-id 52B39C0D.2020606@gmail.com
In response to Re: gaussian distribution pgbench  (Gavin Flower <GavinFlower@archidevsys.co.nz>)
Responses Re: gaussian distribution pgbench
List pgsql-hackers
On 12/19/13 5:52 PM, Gavin Flower wrote:
> Curious, wouldn't the common usage pattern tend to favour a skewed 
> distribution, such as the  Poisson Distribution (it has been over 40 
> years since I studied this area, so there may be better candidates).
>

Some people like database load testing with a "Pareto principle" 
distribution, where 80% of the activity hammers 20% of the rows, so 
that locking becomes important.  (That's one specific form of the 
Pareto distribution.)  The standard pgbench load indirectly gets you 
quite a bit of that, due to all the contention on the branches table.  
Targeting all of that at a single table can be more realistic.
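
To make the 80/20 idea concrete, here is a minimal sketch of a row 
picker with that skew.  It is not pgbench code and not a true Pareto 
distribution: it is a simple two-tier "hot set" approximation, and the 
names pick_row and hot_share, along with their parameters, are made up 
for illustration.

```python
import random


def pick_row(n_rows, rng, hot_fraction=0.2, hot_weight=0.8):
    """Return a 1-based row id, skewed toward a "hot" prefix of the table.

    Hypothetical helper: hot_weight of the picks land uniformly in the
    first hot_fraction of the rows, the rest land uniformly elsewhere.
    """
    hot_rows = max(1, int(n_rows * hot_fraction))
    if hot_rows >= n_rows or rng.random() < hot_weight:
        return rng.randint(1, hot_rows)
    return rng.randint(hot_rows + 1, n_rows)


def hot_share(n_rows, samples, seed=0, hot_fraction=0.2):
    """Measure the fraction of sampled picks that fall in the hot rows."""
    rng = random.Random(seed)
    hot_rows = max(1, int(n_rows * hot_fraction))
    hits = sum(
        1
        for _ in range(samples)
        if pick_row(n_rows, rng, hot_fraction=hot_fraction) <= hot_rows
    )
    return hits / samples


if __name__ == "__main__":
    # Roughly 0.8 of the accesses should hit the first 20% of rows.
    print(hot_share(n_rows=100_000, samples=50_000))
```

With every client hammering the same small set of rows in one table, 
row-level lock contention shows up the way it would in a real skewed 
workload.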

My last round of reviewing a pgbench change left me with little 
appetite for extending that code much further.  Adding some new 
probability distributions would be fine though; that's a narrow change.  
We shouldn't get too excited about pgbench remaining a great tool for 
too much longer though.  pgbench is fast approaching a wall nowadays, 
where it's hard for any single client machine to fully overload today's 
larger servers.  You basically need a second large server to generate 
load, whereas what people really want is a bunch of coordinated small 
clients.  (That sort of wall was in early versions too; it just got 
pushed upward a lot by the multi-worker changes in 9.0, which came 
around the same time desktop core counts really skyrocketed.)

pgbench started as a clone of a now-abandoned Java project called 
JDBCBench.  I've been seriously considering a move back toward that 
direction lately.  Nowadays spinning up ten machines to run load 
generation is trivial.  The idea of extending pgbench's C code to 
support multiple clients running at the same time and collating all of 
their results is not a project I'd be excited about.  pgbench should 
remain a perfectly fine tool for PostgreSQL developers to find code 
hotspots, but that's only so useful.
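
For a sense of what "collating" involves, here is a rough sketch of 
merging summaries from several load-generating machines.  Everything 
in it is an assumption for illustration: the ClientResult record and 
the collate function are hypothetical, not anything pgbench produces, 
and it assumes all clients start at the same moment, so the combined 
wall time is the longest individual run.

```python
from dataclasses import dataclass


@dataclass
class ClientResult:
    """Hypothetical per-machine summary reported by one load generator."""
    transactions: int
    duration_s: float
    mean_latency_ms: float


def collate(results):
    """Merge per-machine summaries into one overall summary.

    Assumes the clients ran concurrently from a common start time.
    Mean latency is weighted by transaction count, because a plain
    average of per-machine means would over-weight lightly loaded clients.
    """
    total_tx = sum(r.transactions for r in results)
    wall_s = max(r.duration_s for r in results)
    weighted_ms = sum(r.mean_latency_ms * r.transactions for r in results)
    return {
        "transactions": total_tx,
        "tps": total_tx / wall_s,
        "mean_latency_ms": weighted_ms / total_tx,
    }


if __name__ == "__main__":
    runs = [ClientResult(1000, 10.0, 2.0), ClientResult(3000, 10.0, 4.0)]
    print(collate(runs))
```

The arithmetic is the easy part.  The hard parts are the ones this 
sketch assumes away: starting the clients together, and getting the 
results back from every machine.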

(At this point someone normally points out Tsung solved all of those 
problems years ago if you'd only give it a chance.  I think it's kind of 
telling that work on sysbench is rewriting the whole thing so you can 
use Lua for your test scripts.)


