Re: gaussian distribution pgbench - Mailing list pgsql-hackers
From | Andres Freund |
---|---|
Subject | Re: gaussian distribution pgbench |
Date | |
Msg-id | 20140704100556.GO25909@awork2.anarazel.de |
In response to | Re: gaussian distribution pgbench (Fabien COELHO <coelho@cri.ensmp.fr>) |
Responses | Re: gaussian distribution pgbench (Mitsumasa KONDO <kondo.mitsumasa@gmail.com>); Re: gaussian distribution pgbench -- part 1/2 (Fabien COELHO <coelho@cri.ensmp.fr>) |
List | pgsql-hackers |
On 2014-07-04 11:59:23 +0200, Fabien COELHO wrote:
>
> >Yea. I certainly disagree with the patch in it's current state because it
> >copies the same 15 lines several times with a two word difference.
> >Independent of whether we want those options, I don't think that's going
> >to fly.
>
> I liked a simple static string for the different variants, which means
> replication. Factorizing out the (large) common part will mean malloc &
> sprintf. Well, why not.

It sucks from a maintenance POV. And I don't see the overhead of malloc
being relevant here...

> >>OTOH, we've almost reached the consensus that supporting gaussian
> >>and exponential options in \setrandom. So I think that you should
> >>separate those two features into two patches, and we should apply
> >>the \setrandom one first. Then we can discuss whether the other patch
> >>should be applied or not.
>
> >Sounds like a good plan.
>
> Sigh. I'll do that as it seems to be a blocker...

I think we also need documentation about the actual mathematical
behaviour of the randomness generators.

> +     <para>
> +      With the gaussian option, the larger the <replaceable>threshold</>,
> +      the more frequently values close to the middle of the interval are drawn,
> +      and the less frequently values close to the <replaceable>min</> and
> +      <replaceable>max</> bounds.
> +      In other worlds, the larger the <replaceable>threshold</>,
> +      the narrower the access range around the middle.
> +      the smaller the threshold, the smoother the access pattern
> +      distribution. The minimum threshold is 2.0 for performance.
> +     </para>

The only way to actually understand the distribution here is to create a
table, insert random values, and then look at the result. That's not a
good thing.
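[Editorial illustration, not part of the original message.] To make the
"create a table and look at the result" experiment concrete, here is a
minimal sketch of how one might tabulate such a distribution outside the
database. It is not the patch's code. It assumes the gaussian option
behaves like a standard normal draw (Box-Muller), rejected outside
+/- threshold standard deviations and then mapped linearly onto
[min, max]; the names gaussian_in_range and histogram are hypothetical.

```python
import math
import random


def gaussian_in_range(lo, hi, threshold, rng):
    """Draw an integer in [lo, hi] from a truncated gaussian.

    Assumption (not taken from the patch): a Box-Muller standard normal,
    redrawn until it falls within +/- threshold standard deviations.
    """
    while True:
        u1 = 1.0 - rng.random()  # in (0, 1], avoids log(0)
        u2 = rng.random()
        z = math.sqrt(-2.0 * math.log(u1)) * math.cos(2.0 * math.pi * u2)
        if -threshold <= z < threshold:
            break
    # Map z from [-threshold, threshold) to [0, 1), then onto the integers.
    frac = (z + threshold) / (2.0 * threshold)
    return min(hi, lo + int((hi - lo + 1) * frac))


def histogram(lo, hi, threshold, draws=100_000, buckets=10, seed=0):
    """Return the share of draws landing in each equal-width bucket."""
    rng = random.Random(seed)
    counts = [0] * buckets
    width = (hi - lo + 1) / buckets
    for _ in range(draws):
        v = gaussian_in_range(lo, hi, threshold, rng)
        counts[min(int((v - lo) / width), buckets - 1)] += 1
    return [c / draws for c in counts]


if __name__ == "__main__":
    # One row per threshold: ten bucket shares from min to max.
    for threshold in (2.0, 5.0, 10.0):
        shares = histogram(1, 1000, threshold)
        print(threshold, " ".join(f"{s:.3f}" for s in shares))
```

Under these assumptions the middle buckets take a growing share of the
draws as the threshold increases, which is the behaviour the quoted
paragraph describes only in words.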
> The caveat that I have is that without these options there is:
>
>  (1) no return about the actual distributions in the final summary, which
> depend on the threshold value, and
>
>  (2) no included mean to test the feature, so the first patch is less
> meaningful if the feature cannot be used simply and require a custom script.

I personally agree that we likely want that as an additional
feature. Even if just because it makes the results easier to compare.

Greetings,

Andres Freund

--
 Andres Freund                     http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services