
Re: pgbench-ycsb - Mailing list pgsql-hackers

From	Fabien COELHO
Subject	Re: pgbench-ycsb
Date	July 22, 2018 23:42:14
Msg-id	alpine.DEB.2.21.1807221615000.13768@lancre
In response to	Re: pgbench-ycsb (a.bykov@postgrespro.ru)
Responses	Re: pgbench-ycsb
List	pgsql-hackers


>>> Just to clarify - if I understand Anthony correctly, this proposal is 
>>> not about implementing exactly YCSB as it is, but more about using 
>>> zipfian distribution for an id in the regular pgbench table structure 
>>> in conjunction with read/write balance to simulate something similar 
>>> to it.
>> 
>> Ok, I misunderstood. My 0.02€: If it does not implement YCSB, and the
>> point is not to implement YCSB, then do not call it YCSB:-)
>> 
>> Maybe there could be other simpler builtins to use non-uniform
>> distributions: {zipf,exp,...}-{simple,select} and default values
>> (exp_param, zipf_param?) for the random distribution parameters.
>>
>>   \set id random_zipfian(1, 100000*:scale, :zipf_param)
>>   \set val random(-5000, 5000)
>>   UPDATE pgbench_whatever ...;
>> 
>> Then
>>
>>   pgbench -b zipf-se@1 -b zipf-si@1 [ -D zipf_param=1.1 ... ] -T 10000 ...
>> 
>>> And probably instead of implementing the exact YCSB workload inside 
>>> pgbench, it makes more sense to add PostgreSQL Jsonb as one of the 
>>> options into the framework itself (I was in the middle of it few years 
>>> ago, but then was distracted by some interesting benchmarking 
>>> results).
>> 
>> Sure.
>
> Hello,
> Thank you for your interest. I'm still improving this idea and the patch,
> and I'm very happy about the discussion we are having. It really helps.
>
> The idea was to implement the workloads as close to YCSB as possible
> using pgbench.

Basically I'm against having something called YCSB if it is not YCSB;-)

> So the schema it should be applied to is the default schema generated by
> pgbench -i (pgbench_accounts).

This is a contradiction, because the pgbench_accounts table does not even
remotely conform to the YCSB benchmark test table.

So for me there are three possibilities:

(1) do nothing, which is always an option, as committers may be against
extending pgbench in this direction anyway. Personally, I'm fine with having it.

(2) implement YCSB cleanly, i.e. both initialization and operations, at
least if this is "reasonable" (i.e. it does not result in 2000 lines of
new code). ISTM that it can be done, given that the YCSB schema is very
simple; hence my suggestion of "pgbench -i --schema ycsb" to trigger a
non-default initialization.

(3) if you are interested in demonstrating a non-uniform distribution on
pgbench_accounts, I'm also fine with it; just do so, but do *NOT* call it
YCSB.
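To illustrate option (3), here is a minimal Python sketch of a bounded Zipf distribution over account ids 1..n, with P(k) proportional to 1/k^s. It is not pgbench's random_zipfian implementation: a plain inverse-CDF sampler is swapped in for clarity. The function names zipf_cdf and sample_zipf, and the values n = 100000 (one scale unit of pgbench_accounts) and s = 1.1, are illustrative assumptions. Without any key mixing, id 1 is drawn most often, id 2 next, and so on.

```python
import bisect
import itertools
import random
from collections import Counter


def zipf_cdf(n: int, s: float) -> list[float]:
    """Cumulative probabilities of a bounded Zipf law on keys 1..n, P(k) ~ 1 / k**s."""
    weights = [1.0 / k ** s for k in range(1, n + 1)]
    total = sum(weights)
    return [c / total for c in itertools.accumulate(weights)]


def sample_zipf(cdf: list[float], rng: random.Random) -> int:
    """Inverse-CDF sampling: smallest key k whose cumulative probability is >= u."""
    u = rng.random()
    # Clamp in case floating-point rounding leaves cdf[-1] slightly below u.
    return min(bisect.bisect_left(cdf, u), len(cdf) - 1) + 1


if __name__ == "__main__":
    n, s = 100_000, 1.1  # hypothetical: scale 1, zipf_param = 1.1
    cdf = zipf_cdf(n, s)
    rng = random.Random(42)
    counts = Counter(sample_zipf(cdf, rng) for _ in range(100_000))
    # The most frequent ids are the smallest ones: 1, then 2, then 3, ...
    print(counts.most_common(5))
```

Precomputing the CDF costs O(n) memory, which is fine for a sketch but is one reason a benchmark tool would prefer a sampler that does not materialize the whole table of probabilities.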

Also, it seems that the YCSB benchmark uses some hashing to mix keys and
avoid having 1 as the most frequent key, 2 as the second most frequent, and
so on. There is a hash function in pgbench which can be used for this:
the solution is not perfect, as some values cannot be reached, but it is
what YCSB uses. Otherwise, I'm planning to submit a pseudo-random
permutation function to make this easier some day, provided that the size
of the table stays constant.
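A minimal Python sketch contrasting the two approaches, assuming keys 1..n: scrambling by hashing modulo n, with 64-bit FNV-1a as an illustrative hash (pgbench's own hash functions may behave differently), versus a bijective permutation. The permutation here is a plain affine map, a deliberately simple stand-in for a real pseudo-random permutation; the function names and the constants a and b are hypothetical.

```python
import math

# Illustrative choice of hash: 64-bit FNV-1a over the key's 8 little-endian bytes.
FNV_OFFSET = 0xCBF29CE484222325
FNV_PRIME = 0x100000001B3
MASK64 = (1 << 64) - 1


def fnv1a_64(value: int) -> int:
    """64-bit FNV-1a hash of the 8-byte little-endian encoding of value."""
    h = FNV_OFFSET
    for byte in value.to_bytes(8, "little", signed=True):
        h ^= byte
        h = (h * FNV_PRIME) & MASK64
    return h


def hashed_key(k: int, n: int) -> int:
    """Scramble key k in 1..n by hashing modulo n; collisions leave some keys unreachable."""
    return fnv1a_64(k) % n + 1


def permuted_key(k: int, n: int, a: int = 7919, b: int = 12345) -> int:
    """Affine map of 1..n onto itself; a bijection whenever gcd(a, n) == 1."""
    if math.gcd(a, n) != 1:
        raise ValueError("multiplier a must be coprime with n")
    return (a * (k - 1) + b) % n + 1


if __name__ == "__main__":
    n = 1000
    reachable = {hashed_key(k, n) for k in range(1, n + 1)}
    print(f"hash: {len(reachable)} of {n} keys reachable")
    permuted = {permuted_key(k, n) for k in range(1, n + 1)}
    print(f"permutation: {len(permuted)} of {n} keys reachable")
```

With the hash, some target keys receive several source keys while others receive none, so part of the table is never accessed. The permutation reaches every key, but its output depends on n, which is why the table size has to stay constant. An affine map mixes poorly (consecutive keys stay evenly spaced), so a real implementation would want a stronger construction.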

-- 
Fabien.
