Re: gaussian distribution pgbench - Mailing list pgsql-hackers

From Heikki Linnakangas
Subject Re: gaussian distribution pgbench
Date
Msg-id 53D610DF.5030106@vmware.com
In response to Re: gaussian distribution pgbench  (Fabien COELHO <coelho@cri.ensmp.fr>)
List pgsql-hackers
On 07/17/2014 11:13 PM, Fabien COELHO wrote:
>
>>> However, ISTM that it is not the purpose of pgbench documentation to be a
>>> primer about what is an exponential or gaussian distribution, so the idea
>>> would yet be to have a relatively compact explanation, and that the
>>> interested but clueless reader would document him- or herself from Wikipedia or a
>>> text book or a friend or a math teacher (who could be a friend as well:-).
>>
>> Well, I think it's a balance.  I agree that the pgbench documentation
>> shouldn't try to substitute for a text book or a math teacher, but I
>> also think that you shouldn't necessarily need to refer to a text book
>> or a math teacher in order to figure out how to use pgbench.  Saying
>> "it's complicated, so we don't have to explain it" would be a cop out;
>> we need to *make* it simple.  And if there's no way to do that, then
>> IMHO we should reject the patch in favor of some future patch that
>> implements something that will be easy for users to understand.
>>
>>>>>   [nttcom@localhost postgresql]$ contrib/pgbench/pgbench --exponential=10
>>>>> starting vacuum...end.
>>>>> transaction type: Exponential distribution TPC-B (sort of)
>>>>> scaling factor: 1
>>>>> exponential threshold: 10.00000
>>>>>
>>>>> decile percents: 63.2% 23.3% 8.6% 3.1% 1.2% 0.4% 0.2% 0.1% 0.0% 0.0%
>>>>> highest/lowest percent of the range: 9.5% 0.0%
>>>>
>>>> I don't have a clue what that means.  None.
>>>
>>> Maybe we could add in front of the decile/percent
>>>
>>> "distribution of increasing account key values selected by pgbench:"
>>
>> I still wouldn't know what that meant.  And it misses the point
>> anyway: if the documentation is good, this will be unnecessary.  If
>> the documentation is bad, a printout that tries to illustrate it by
>> example is not an acceptable substitute.
>
> The decile description is quite classic when discussing statistics.

IMHO we should include a diagram for each distribution. A diagram would 
be much easier to understand than a decile or verbal explanation.
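
For readers puzzling over the quoted printout, here is a rough sketch of where the decile numbers could come from. It assumes the exponential mode draws keys with a density proportional to exp(-threshold * x) over the key range normalized to [0, 1), truncated at 1. That assumption is inferred from the quoted output, not taken from the patch source, and the function names are hypothetical.

```python
import math


def range_share(threshold, lo, hi):
    """Share of draws landing in the slice [lo, hi) of the normalized key range.

    Assumes a truncated exponential density proportional to
    exp(-threshold * x) on [0, 1); this is an inference from the quoted
    printout, not the patch's actual code.
    """
    norm = 1.0 - math.exp(-threshold)  # total mass before normalization
    return (math.exp(-threshold * lo) - math.exp(-threshold * hi)) / norm


def decile_percents(threshold):
    """Percentage of draws falling in each tenth of the key range."""
    return [100.0 * range_share(threshold, k / 10, (k + 1) / 10)
            for k in range(10)]


if __name__ == "__main__":
    t = 10.0
    print("decile percents:",
          " ".join(f"{p:.1f}%" for p in decile_percents(t)))
    # First and last 1% of the key range.
    print("highest/lowest percent of the range: "
          f"{100 * range_share(t, 0.0, 0.01):.1f}% "
          f"{100 * range_share(t, 0.99, 1.0):.1f}%")
```

Under that assumption, a threshold of 10 gives the same ten percentages as the quoted run, and the first 1% of the key range receives about 9.5% of the draws, which seems to be what the "highest/lowest" line reports.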

The only problem is that the build infrastructure doesn't currently 
support including images in the docs. That's been discussed before, and 
I think we even used to have a couple of images there a long time ago. 
Now would be a good time to bite the bullet and add the support.
We got fairly close to a consensus on how to do it in this thread: 
www.postgresql.org/message-id/flat/20120712181636.GC11063@momjian.us. 
The biggest problem was choosing an editor that has a fairly stable file 
format, so that we don't get huge diffs every time someone moves a line 
in a diagram. One work-around for that is to use graphviz and/or gnuplot 
as the source format, instead of a graphical editor.

- Heikki



