Re: [HACKERS] [WIP] Zipfian distribution in pgbench - Mailing list pgsql-hackers

From Alik Khilazhev
Subject Re: [HACKERS] [WIP] Zipfian distribution in pgbench
Date
Msg-id 9F00A6CA-EA92-4DF2-9007-652AFE3636D2@postgrespro.ru
Whole thread Raw
In response to Re: [HACKERS] [WIP] Zipfian distribution in pgbench  (Fabien COELHO <coelho@cri.ensmp.fr>)
Responses Re: [HACKERS] [WIP] Zipfian distribution in pgbench
List pgsql-hackers
Hello Fabien,

I am attaching patch v4. 

On 19 Jul 2017, at 17:21, Fabien COELHO <coelho@cri.ensmp.fr> wrote:

About the maths: As already said, I'm not at ease with a random_zipfian function which does not display a (good) zipfian distribution. At the minimum the documentation should be clear about the approximations implied depending on the parameter value.

I add one more sentence to documentation to emphasize that degree of proximity depends on parameter . And also I made restriction on parameter, now it can be only in range (0; 1)

In the litterature the theta parameter seems to be often called alpha
or s (eg see https://en.wikipedia.org/wiki/Zipf%27s_law). I would suggest to
stick to "s" instead of "theta”?

I have renamed it to “s”.

Functions zipfZeta(n, theta) does not really computes the zeta(n) function,
so I think that a better name should be chosen. It seems to compute
H_{n,theta}, the generalized harmonic number. Idem "thetan" field in struct.

Renamed zipfZeta to zipfGeneralizedHarmonic, zetan to harmonicn.

The handling of cache overflow by randomly removing one entry looks like
a strange idea. Rather remove the oldest entry?

Replaced with Least Recently Used replacement algorithm.

ISTM that it should print a warning once if the cache array overflows as performance would drop heavily.

Now it prints warning message if array overflowed. To print message only one time, it uses global flag, which is available for all threads. 
And theoretically message can be printed more than one time. 
It could be solved easily using pg_atomic_test_set_flag() from src/include/port/atomics.h but it can not be used in pgbench because of following lines of code there:

#ifdef FRONTEND
#error "atomics.h may not be included from frontend code"
#endif

Or it can be fixed by using mutexes from pthread, but I think code become less readable and more complex in this case.
So, should I spend time on solving this issue? 


If the zipf cache is constant size, there is no point in using dynamic allocation, just declare an array…

Fixed. Does ZIPF_CACHE_SIZE = 15 is ok? 

There should be non regression tests somehow. If the "improve pgbench
tap test infrastructure" get through, things should be added there.

I will send tests later, as separate patch.


Thanks and Regards,
Alik Khilazhev
Postgres Professional:
http://www.postgrespro.com
The Russian Postgres Company


Attachment

pgsql-hackers by date:

Previous
From: Alvaro Herrera
Date:
Subject: Re: [HACKERS] Cache lookup errors with functions manipulation objectaddresses
Next
From: Jeff Janes
Date:
Subject: Re: [HACKERS] Increase Vacuum ring buffer.