Re: General purpose hashing func in pgbench - Mailing list pgsql-hackers

From Fabien COELHO
Subject Re: General purpose hashing func in pgbench
Date
Msg-id alpine.DEB.2.20.1801101903150.15856@lancre
Whole thread Raw
In response to Re: General purpose hashing func in pgbench  (Ildar Musin <i.musin@postgrespro.ru>)
Responses Re: General purpose hashing func in pgbench
Re: General purpose hashing func in pgbench
Re: General purpose hashing func in pgbench
List pgsql-hackers
Hello Ildar,

>> Patch needs a rebase after Teodor push for a set of pgbench functions.
> Done. Congratulations on your patch finally being committed : )

Over 21 months... I hope that pgbench will have hash functions sooner:-)

>>> Should we probably add some infrastructure for optional arguments?
>>
>> You can look at the handling of "CASE" which may or may not have an
>> "ELSE" clause.
>>
>> I'd suggest you use a new negative argument with the special meaning
>> for hash, and create the seed value when missing when building the
>> function, so as to simplify the executor code.

> Added a new nargs option -3 for hash functions and moved arguments check
> to parser. It's starting to look a bit odd and I'm thinking about
> replacing bare numbers (-1, -2, -3) with #defined macros. E.g.:
>
> #define PGBENCH_NARGS_VARIABLE (-1)
> #define PGBENCH_NARGS_CASE (-2)
> #define PGBENCH_NARGS_HASH (-3)

Yes, I'm more than fine with improving readability.

>> Instead of 0, I'd consider providing a random default so that the
>> hashing behavior is not the same from one run to the next. What do you
>> think?
>
> Makes sence since each thread is also initializes its random_state with
> random numbers before start. So I added global variable 'hash_seed' and
> initialize it with random() before threads spawned.

Hmm. I do not think that we should want a shared seed value. The seed 
should be different for each call so as to avoid undesired correlations. 
If wanted, correlation could be obtained by using an explicit identical 
seed.

ISTM that the best way to add the seed is to call random() when the second 
arg is missing in make_func. Also, this means that the executor would 
always get its two arguments, so it would simplify the code there.

-- 
Fabien.


pgsql-hackers by date:

Previous
From: Alvaro Herrera
Date:
Subject: Re: PATCH: Configurable file mode mask
Next
From: Robert Haas
Date:
Subject: Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)