Re: gaussian distribution pgbench -- part 1/2 - Mailing list pgsql-hackers

From Robert Haas
Subject Re: gaussian distribution pgbench -- part 1/2
Date
Msg-id CA+TgmoYnutnnEajEuAe_9tzZ_DnFzcRaa7nkyn6tURnV89UCdQ@mail.gmail.com
Whole thread Raw
In response to Re: gaussian distribution pgbench -- part 1/2  (Fabien COELHO <coelho@cri.ensmp.fr>)
Responses Re: gaussian distribution pgbench -- splits v4  (Fabien COELHO <coelho@cri.ensmp.fr>)
List pgsql-hackers
On Thu, Jul 17, 2014 at 12:09 AM, Fabien COELHO <coelho@cri.ensmp.fr> wrote:
> pgbench with gaussian & exponential, part 1 of 2.
>
> This patch is a subset of the previous patch which only adds the two
> new \setrandom gaussian and exponantial variants, but not the
> adapted pgbench test cases, as suggested by Fujii Masao.
> There is no new code nor code changes.
>
> The corresponding documentation has been yet again extended wrt
> to the initial patch, so that what is achieved is hopefully unambiguous
> (there are two mathematical formula, tasty!), in answer to Andres Freund
> comments, and partly to Robert Haas comments as well.
>
> This patch also provides several sql/pgbench scripts and a README, so
> that the feature can be tested. I do not know whether these scripts
> should make it to postgresql. I would say yes, otherwise there is no way
> to test...
>
> part 2 which provide adapted pgbench test cases will come later.

Some review comments:

1. I suggest that getExponentialrand and getGaussianrand be renamed to
getExponentialRand and getGaussianRand.

2. I suggest that the code be changed so that the branch currently
labeled as /* uniform with extra argument */ become a hard error
instead of a warning.

3. Similarly, I suggest that the use of gaussian or uniform be an
error when argc < 6 OR argc > 6.  I also suggest that the
parenthesized distribution type be dropped from the error message in
all cases.

4. This question mark seems like it should be a period:

+                * value fails the test? To be on the safe side, let
us try over.

5. With regards to the following paragraph:
     <para>
+      The default random distribution is uniform, that is all values in the
+      range are drawn with equal probability. The gaussian and exponential
+      options allow to change this default. The mandatory
+      <replaceable>threshold</> double value controls the actual distribution
+      with gaussian or exponential.
+     </para>

This paragraph needs a bit of copy-editing.  Here's an attempt: "By
default, all values in the range are drawn with equal probability.
The <literal>gaussian</> and <literal>exponential</> options modify
this behavior; each requires a mandatory threshold which determines
the precise shape of the distribution."  The following paragraph
should be changed to begin with "For a Gaussian distribution" and the
one after "For an exponential distribution".

6. Overall, I think the documentation here looks much better now, but
I suggest adding one or two example to the Gaussian section.  Like
this: for example, if threshold is 2.0, 68% of the values will fall in
the middle third of the interval; with a threshold of 3.0, 99.7% of
the values will fall in the middle third of the interval.  These
numbers are fabricated, and the middle third of the interval might not
be the best part to talk about, but you get the idea (I hope).

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



pgsql-hackers by date:

Previous
From: Simon Riggs
Date:
Subject: Re: Production block comparison facility
Next
From: Robert Haas
Date:
Subject: Re: Doing better at HINTing an appropriate column within errorMissingColumn()