Re: [HACKERS] [WIP] Zipfian distribution in pgbench - Mailing list pgsql-hackers

From Fabien COELHO
Subject Re: [HACKERS] [WIP] Zipfian distribution in pgbench
Msg-id alpine.DEB.2.20.1708050930520.16395@lancre
In response to Re: [HACKERS] [WIP] Zipfian distribution in pgbench  (Peter Geoghegan <pg@bowt.ie>)
List pgsql-hackers

Hello Peter,

> I think that it would also be nice if there was an option to make 
> functions like random_zipfian() actually return a value that has 
> undergone perfect hashing. When this option is used, any given value 
> that the function returns would actually be taken from a random mapping 
> to some other value in the same range. So, you can potentially get a 
> Zipfian distribution without the locality.

I definitely agree. This is a standard problem with all non-uniform random 
generators in pgbench, namely random_{gaussian,exponential}.

However, hashing is not a good solution on a finite domain because of the 
significant collision rate: typically about 1/3 of the values in the range 
are never produced, and collisions cause spikes on the values that are. 
Also, collisions would break PKs.
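
To make the collision argument concrete, here is a minimal sketch (not part of the original message, and not pgbench code) that hashes every value of a range back into the same range and measures how many target values are never hit. The helper names `hashed_slot` and `empty_fraction` and the choice of BLAKE2b as the hash are assumptions made for illustration. For a well-mixing hash the empty fraction is expected to be close to 1/e, which is the "about 1/3" figure above.

```python
import hashlib


def hashed_slot(value: int, size: int) -> int:
    """Map value to a slot in [0, size) with a general-purpose hash.

    Hypothetical helper: BLAKE2b stands in for "some good hash function".
    """
    digest = hashlib.blake2b(value.to_bytes(8, "little"), digest_size=8).digest()
    return int.from_bytes(digest, "little") % size


def empty_fraction(size: int) -> float:
    """Fraction of slots in [0, size) that no input in [0, size) maps to."""
    hit = {hashed_slot(v, size) for v in range(size)}
    return 1.0 - len(hit) / size


if __name__ == "__main__":
    # Expected to be in the neighbourhood of 1/e (roughly 0.37).
    print(f"empty fraction: {empty_fraction(100_000):.3f}")
```

Every empty slot implies that some other slot received two or more inputs, which is where the spikes and the duplicate keys come from.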

The solution is to provide a (good) pseudo-random parametric permutation, 
which is non-trivial, especially for sizes that are not powers of two, so 
ISTM that it should be a patch on its own.
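
As an illustration of what such a parametric permutation could look like, here is a minimal sketch using one well-known construction: a balanced Feistel network over the next even-width power-of-two domain, combined with cycle walking to handle sizes that are not powers of two. This is an assumption chosen for illustration, not the design alluded to in this message and not pgbench code; the names `permute` and `_round`, the BLAKE2b round function, and the 64-bit `key` parameter are all hypothetical.

```python
import hashlib


def _round(value: int, key: int, rnd: int, bits: int) -> int:
    """Keyed round function returning `bits` pseudo-random bits (hypothetical)."""
    data = value.to_bytes(8, "little") + key.to_bytes(8, "little") + bytes([rnd])
    digest = hashlib.blake2b(data, digest_size=8).digest()
    return int.from_bytes(digest, "little") & ((1 << bits) - 1)


def permute(value: int, size: int, key: int, rounds: int = 4) -> int:
    """Bijectively map value in [0, size) to [0, size), parameterized by key.

    Assumes 0 <= key < 2**64 and size <= 2**64.
    """
    if not 0 <= value < size:
        raise ValueError("value must be in [0, size)")
    # Split into two halves of equal width; the Feistel domain is 2**(2 * half_bits),
    # which is at least `size` and less than four times `size` (for size > 2).
    half_bits = max(1, ((size - 1).bit_length() + 1) // 2)
    mask = (1 << half_bits) - 1
    while True:
        left, right = value >> half_bits, value & mask
        for rnd in range(rounds):
            # Each Feistel round is invertible, so the whole network is a bijection.
            left, right = right, left ^ _round(right, key, rnd, half_bits)
        value = (left << half_bits) | right
        # Cycle walking: re-apply the permutation until we land back inside [0, size).
        if value < size:
            return value


if __name__ == "__main__":
    size = 10
    print([permute(v, size, key=42) for v in range(size)])
```

Because the result is a true permutation, every value in the range is produced exactly once, so there are no empty values, no spikes, and no duplicate keys. Changing `key` gives a different mapping. Whether four rounds scatter values well enough for benchmarking purposes is not something this sketch establishes.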

The good news is that it is on my todo list and I have some ideas on how 
to do it.

The bad news is that given the rate at which I succeed in getting things 
committed in pgbench, it might take some years :-( For instance, simplistic 
functions and operators to extend the current set have been in the pipeline 
for 15 months and missed pg10.

-- 
Fabien.


