Re: pgbench-ycsb - Mailing list pgsql-hackers

From Fabien COELHO
Subject Re: pgbench-ycsb
Msg-id alpine.DEB.2.21.1807221615000.13768@lancre
In response to Re: pgbench-ycsb  (a.bykov@postgrespro.ru)
Responses Re: pgbench-ycsb
List pgsql-hackers
>>> Just to clarify - if I understand Anthony correctly, this proposal is 
>>> not about implementing exactly YCSB as it is, but more about using 
>>> zipfian distribution for an id in the regular pgbench table structure 
>>> in conjunction with read/write balance to simulate something similar 
>>> to it.
>> 
>> Ok, I misunderstood. My 0.02€: If it does not implement YCSB, and the
>> point is not to implement YCSB, then do not call it YCSB:-)
>> 
>> Maybe there could be other simpler builtins to use non uniform
>> distributions: {zipf,exp,...}-{simple,select} and default values
>> (exp_param, zipf_param?) for the random distribution parameters.
>>
>>   \set id random_zipfian(1, 100000*:scale, :zipf_param)
>>   \set val random(-5000, 5000)
>>   UPDATE pgbench_whatever ...;
>> 
>> Then
>>
>>   pgbench -b zipf-se@1 -b zipf-si@1 [ -D zipf_param=1.1 ... ] -T 10000 ...
>> 
>>> And probably instead of implementing the exact YCSB workload inside 
>>> pgbench, it makes more sense to add PostgreSQL Jsonb as one of the 
>>> options into the framework itself (I was in the middle of it a few years 
>>> ago, but then was distracted by some interesting benchmarking 
>>> results).
>> 
>> Sure.
>
> Hello,
> thank you for your interest. I'm still improving this idea and the patch,
> and I'm very happy about the discussion we are having. It really helps.
>
> The idea was to implement the workloads as close to YCSB as possible
> using pgbench.

Basically I'm against having something called YCSB if it is not YCSB;-)

> So, the schema it should be applied to is the default schema generated by
> pgbench -i (pgbench_accounts).

This is a contradiction, because the pgbench_accounts table is in no way, 
even remotely, conformant to the YCSB benchmark test table.

So for me there are three possibilities:

(1) do nothing, always an option as committers may be against extending 
pgbench in this direction anyway. Personally I'm fine with having it.

(2) implement YCSB cleanly, i.e. both initialization and operations, at 
least if this is "reasonable" (i.e. it does not result in 2000 lines of 
new code). ISTM that it can be done, given that the YCSB schema is very 
simple, hence I suggested "pgbench -i --schema ycsb" to trigger a 
non-default initialization.

(3) if you are interested in demonstrating a non-uniform distribution on 
pgbench_accounts, I'm also fine with it, just do so, but do *NOT* call it 
YCSB.
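
To make the schema mismatch behind these options concrete, here is a small 
sketch that prints the two table layouts side by side. It is written from 
memory and is an assumption, not checked against either source tree: the 
usertable layout follows the YCSB JDBC binding as I recall it (a string key 
plus ten text fields by default), the pgbench_accounts layout is what 
"pgbench -i" creates, and the helper name ycsb_usertable_ddl is made up for 
illustration.

```python
# Sketch only: column names and types are recalled, not verified.

def ycsb_usertable_ddl(field_count: int = 10) -> str:
    """Build YCSB-style DDL: one string key plus `field_count` text fields."""
    fields = ",\n".join(f"    field{i} TEXT" for i in range(field_count))
    return (
        "CREATE TABLE usertable (\n"
        "    ycsb_key VARCHAR(255) PRIMARY KEY,\n"
        f"{fields}\n"
        ");"
    )

# Approximate layout of the table created by "pgbench -i"; the primary key
# on aid is added in a later initialization step.
PGBENCH_ACCOUNTS_DDL = """CREATE TABLE pgbench_accounts (
    aid INT NOT NULL,
    bid INT,
    abalance INT,
    filler CHAR(84)
);"""

if __name__ == "__main__":
    print(ycsb_usertable_ddl())
    print()
    print(PGBENCH_ACCOUNTS_DDL)
```

One table is a key/value-like store of text fields, the other is an integer 
account row, which is why running a zipfian workload on pgbench_accounts is 
a different benchmark from YCSB.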

Also it seems that the YCSB bench uses some hashing to mix keys and avoid 
having 1 as the most frequent, 2 as the second, and so on. pgbench has a 
hash function which can be used for this. The solution is not perfect, as 
some values cannot be reached, but this is what YCSB does. Otherwise I'm 
planning to submit a pseudo-random permutation function to ease this some 
day, provided that the size of the table stays constant.
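
The "some values cannot be reached" caveat can be illustrated with a short 
sketch. A hash taken modulo the table size is not a bijection, so when ranks 
1..n are scattered through it, several ranks collide on the same key and 
other keys are never produced. The code below is an assumption-laden 
stand-in: it uses blake2b from Python's hashlib rather than pgbench's own 
hash function, and the names scatter and unreachable_keys are made up for 
illustration.

```python
import hashlib

def scatter(rank: int, n: int) -> int:
    """Map a rank in 1..n to a key in 1..n through a hash (not a bijection)."""
    # blake2b is only a stand-in for pgbench's hash function.
    digest = hashlib.blake2b(rank.to_bytes(8, "little"), digest_size=8).digest()
    return 1 + int.from_bytes(digest, "little") % n

def unreachable_keys(n: int) -> int:
    """Count the keys in 1..n that no rank in 1..n is mapped to."""
    reached = {scatter(rank, n) for rank in range(1, n + 1)}
    return n - len(reached)

if __name__ == "__main__":
    n = 100000
    missing = unreachable_keys(n)
    print(f"{missing} of {n} keys are never drawn ({missing / n:.1%})")
```

For a well-mixed hash one would expect roughly a fraction 1/e (about 37%) of 
the keys to be unreachable, which is what a true permutation of 1..n would 
avoid.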

-- 
Fabien.
