Re: gaussian distribution pgbench -- splits v4 - Mailing list pgsql-hackers

From Robert Haas
Subject Re: gaussian distribution pgbench -- splits v4
Date
Msg-id CA+TgmoZLMxTRsK0Hek=aUkWzLrmjxRLMJQXJQhRpa9nbKRA5vA@mail.gmail.com
In response to Re: gaussian distribution pgbench -- splits v4  (Mitsumasa KONDO <kondo.mitsumasa@gmail.com>)
Responses Re: gaussian distribution pgbench -- splits v4  (Fabien COELHO <coelho@cri.ensmp.fr>)
List pgsql-hackers
On Wed, Jul 30, 2014 at 9:00 PM, Mitsumasa KONDO
<kondo.mitsumasa@gmail.com> wrote:
> Hmm... It doesn't have harm for pgbench source code. And, in general,
> checking script is useful for avoiding bug.

Not if nobody runs it, or if people run it but don't know what the
output should look like.  I think anyone who knows enough to find bugs
by running these scripts probably doesn't need the scripts.

> No, patch B is still needed. Please tell me the reason. I don't like
> deciding by someones feeling,
> and it needs logical reason. Our documentation is better than the past. I
> think it can easy to understand decile probability.
> This part of the discussion is needed to continue...
>
>> Would providing these as additional contrib files be more acceptable?
>> Something like "tpc-b-gauss.sql"... Otherwise there is no example available
>> to show the feature.
>
> I agree the test script and including command line options. It's not harm,
> and it's useful.

As to all of this, I simply don't agree that the stuff has enough
value to justify including it.  Now, of course, that is subjective:
one person may think it has enough value, while another person may
think that it does not have enough value.  So it just comes down to a
question of opinion, and we make those judgements of opinion all the
time.  If we included everything that everyone who works on the code
wants included, we'd end up with a bloated mess of stuff that nobody
cares about; indeed, we have a significant amount of stuff in the
source code that IMHO looks like somebody's debugging leftovers that
should have been removed before commit.  I don't want to add more
unless there is clear and convincing evidence that a significant
number of people want it, and that is not the case here.

Now, if we get a few reports from people saying, hey, I was doing some
benchmarking with pgbench, and I found the new gaussian feature to be
really excellent, but it sucked that there was no command-line option
for it, we can go back and add one.  No problem!  But in the meantime,
we've added the core of the feature without cluttering up the list of
command-line options with things that may or may not prove to be
useful.

One of the concerns that I have about the proposal of simply slapping
a gaussian or exponential modifier onto \setrandom aid 1 :naccounts is
that, while it will allow you to make part of the relation hot and
another part of the relation cold, you really can't get any more
fine-grained than that.  If you use exponential, all the hot accounts
will be near the beginning of the relation, and if you use gaussian,
they'll all be in the middle.  I'm not sure exactly what will happen after
some updating has happened; I'm guessing some of the keys will still
be in their original location and others will have been pushed to the
end of the relation following relation-extension.  But there's no way,
with those command line options, to for example have 5 hot spots
distributed uniformly through the relation; or even to have the end of
the relation rather than the beginning or the middle as the hot spot.
You can do those things with the newly-enhanced \setrandom *and a custom
script* but not with just a command-line option.  So that makes me
think that people who find these new facilities useful might not get
all that much use out of the command-line option anyway; and we can't
have a command-line option for every behavior anyone ever wants.
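
To make the hot-spot point concrete, here is a rough simulation of the
kind of remapping a custom script could do on top of a skewed random
draw.  It is Python, not pgbench script syntax; truncated_gauss,
multi_hotspot and hot_end are hypothetical helper names, and the
sampling formulas are simple stand-ins, not necessarily what pgbench
itself does.

```python
import math
import random


def truncated_gauss(rng, lo, hi, threshold=4.0):
    """Return an integer in [lo, hi], bell-shaped around the midpoint.

    Assumption: plain rejection sampling of a standard normal, cut off at
    +/- threshold. This is a stand-in, not pgbench's own algorithm.
    """
    while True:
        z = rng.gauss(0.0, 1.0)
        if -threshold <= z < threshold:
            break
    u = (z + threshold) / (2.0 * threshold)  # rescale to [0, 1)
    return min(hi, lo + int(u * (hi - lo + 1)))


def multi_hotspot(rng, naccounts, nspots=5):
    """Pick one of nspots equal slices uniformly, then a gaussian key in it.

    If naccounts is not divisible by nspots, the last few keys are never
    chosen; good enough for a sketch.
    """
    width = naccounts // nspots
    spot = rng.randrange(nspots)
    return 1 + spot * width + truncated_gauss(rng, 0, width - 1)


def hot_end(rng, naccounts, rate=5.0):
    """Truncated-exponential skew, mirrored so the hot keys sit at the end."""
    u = rng.random()
    x = -math.log(1.0 - u * (1.0 - math.exp(-rate))) / rate  # in [0, 1)
    return max(1, naccounts - int(x * naccounts))


if __name__ == "__main__":
    rng = random.Random()
    print([multi_hotspot(rng, 100000) for _ in range(5)])
    print([hot_end(rng, 100000) for _ in range(5)])
```

Both helpers need arithmetic on top of the random draw, which is
something a script can express and a single command-line switch cannot.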

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


