Re: CPU costs of random_zipfian in pgbench - Mailing list pgsql-hackers

From Fabien COELHO
Subject Re: CPU costs of random_zipfian in pgbench
Msg-id alpine.DEB.2.21.1903241823070.9939@lancre
In response to Re: CPU costs of random_zipfian in pgbench  (Fabien COELHO <coelho@cri.ensmp.fr>)
Responses Re: CPU costs of random_zipfian in pgbench
List pgsql-hackers
Hello Tom & Tomas,

>> If the choice is between reporting the failure to the user, and
>> addressing the failure, surely the latter would be the default option?
>> Particularly if the user can't really address the issue easily
>> (recompiling psql is not very practical solution).
>> 
>>>> I remain of the opinion that we ought to simply rip out support for
>>>> zipfian with s < 1.
>> 
>> +1 to that
>
> If this is done, some people with zipfian distribution that currently 
> work might be unhappy.

After giving it some thought, I think that this cannot be fully fixed for 
12.

The attached patch removes the code for param in (0, 1) and slightly 
improves the documentation about performance, if you want to proceed.

For s > 1, there is no such constraint and it works fine, so there is no 
reason to remove it.

Given the constraint of Jim Gray's approximation method for s in (0, 1), 
which really does zipfian for the first two integers and then uses an 
exponential approximation, the only approach is to compute the parameters 
in a partial-evaluation preparation phase before the bench code is run. 
This means that only (mostly) constant parameters would be allowed when s 
is in (0, 1), but I think that this is acceptable because the method 
fundamentally requires it anyway. I think that it can be implemented 
reasonably well (meaning not too much code), but it would require a few 
rounds of review if someone implements it (as a reminder, I was only the 
reviewer on this one). An added benefit would be that the parameter cache 
could be shared between threads, which would be a good thing.

The other attached patch illustrates what I call poor performance for 
stupid parameters (there is no point in doing zipfian on 2 integers…):

   ./pgbench -T 3 -D n=2 -D s=1.01 -f zipf_perf.sql   # 46981 tps
   ./pgbench -T 3 -D n=2 -D s=1.001 -f zipf_perf.sql   # 6187 tps
   ./pgbench -T 3 -D n=2 -D s=1.0001 -f zipf_perf.sql   # 710 tps

   ./pgbench -T 3 -D n=100 -D s=1.01 -f zipf_perf.sql  # 142910 tps
   ./pgbench -T 3 -D n=100 -D s=1.001 -f zipf_perf.sql  # 21214 tps
   ./pgbench -T 3 -D n=100 -D s=1.0001 -f zipf_perf.sql  # 2466 tps

   ./pgbench -T 3 -D n=1000000 -D s=1.01 -f zipf_perf.sql # 376453 tps
   ./pgbench -T 3 -D n=1000000 -D s=1.001 -f zipf_perf.sql # 57441 tps
   ./pgbench -T 3 -D n=1000000 -D s=1.0001 -f zipf_perf.sql # 6780 tps
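The slowdown as s approaches 1 is roughly what a simple bound predicts. This derivation is a sketch that assumes the s > 1 path is a rejection loop which proposes x from a uniform u and rejects any x > n; it is not taken from the patch. The second acceptance test can only add rounds, hence the inequality:

```latex
x = \left\lfloor u^{-1/(s-1)} \right\rfloor \le n
  \iff u^{-1/(s-1)} < n+1
  \iff u > (n+1)^{-(s-1)}, \qquad u \sim \mathcal{U}(0,1)

P(x \le n) = 1 - (n+1)^{-(s-1)} \approx (s-1)\,\ln(n+1) \qquad (s \to 1^{+})

\mathbb{E}[\text{rounds per draw}] \ge \frac{1}{1 - (n+1)^{-(s-1)}}
```

For n = 2 this gives about 91, 910 and 9100 rounds per draw for s = 1.01, 1.001 and 1.0001. For n = 1000000 it gives about 8, 73 and 724. Each tenfold reduction of s - 1 costs roughly ten times more rounds, which is broadly consistent with the 7x to 9x throughput drops measured above.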

Maybe the implementation could impose that s is at least 1.001 to avoid
the lower performance?

-- 
Fabien.