Re: gaussian distribution pgbench -- splits v4 - Mailing list pgsql-hackers

From Robert Haas
Subject Re: gaussian distribution pgbench -- splits v4
Msg-id CA+TgmoZjsB-UuzWiaVNSsgwC0bsZnJ1vDfCkqn3vHOeuT45m0A@mail.gmail.com
In response to Re: gaussian distribution pgbench -- splits v4  (Fabien COELHO <coelho@cri.ensmp.fr>)
Responses Re: gaussian distribution pgbench -- splits v4  (Fabien COELHO <coelho@cri.ensmp.fr>)
List pgsql-hackers
On Thu, Jul 31, 2014 at 10:01 AM, Fabien COELHO <coelho@cri.ensmp.fr> wrote:
>> One of the concerns that I have about the proposal of simply slapping a
>> gaussian or exponential modifier onto \setrandom aid 1 :naccounts is that,
>> while it will allow you to make part of the relation hot and another part of
>> the relation cold, you really can't get any more fine-grained than that. If
>> you use exponential, all the hot accounts will be near the beginning of the
>> relation, and if you use gaussian, they'll all be in the middle.
>
> That is a very good remark. Although I had thought of it, I do not have a
> very good solution yet :-)
>
> From a testing perspective, if we assume that keys carry no meaning, it is
> reasonable to assume that the access distribution of actual, realistic
> workloads is probably exponential (or gaussian; in any case hardly
> uniform), but without any direct correlation with key values.
>
> In order to simulate that, we would have to apply a fixed (pseudo-)random
> permutation to the exponentially drawn key values. This is a non-trivial
> problem. Version zero of a solution is to do nothing... which is the
> current status ;-) Version one is "k' = 1 + (a * k + b) modulo n", with "a"
> coprime with "n", "n" being the number of keys. This is almost possible
> already, except that the modulo operator is currently missing; I plan to
> submit one for this very reason, though probably another time.
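A minimal Python sketch of the "version one" idea, for illustration only: it assumes keys are drawn from an exponential distribution truncated to 1..n (so the hot keys sit at the start of the range, as in the concern quoted above) and then passed through k' = 1 + (a * k + b) mod n. The helper names `exponential_key` and `affine_permute`, the threshold value, and the constants `a` and `b` are hypothetical, not pgbench code.

```python
import math
import random


def exponential_key(rng: random.Random, n: int, threshold: float) -> int:
    """Draw a key in 1..n whose probability decays exponentially with its value.

    Hypothetical helper: inverse-CDF sampling of exp(-threshold * x) truncated
    to [0, 1), so the hot keys sit at the start of the key range.
    """
    u = rng.random()  # uniform in [0, 1)
    cut = math.exp(-threshold)
    x = -math.log(1.0 - u * (1.0 - cut)) / threshold  # x in [0, 1)
    return 1 + min(int(x * n), n - 1)  # guard against float rounding up to n


def affine_permute(k: int, n: int, a: int, b: int) -> int:
    """Map key k in 1..n to 1 + (a * k + b) mod n.

    This is a bijection on 1..n only when gcd(a, n) == 1.
    """
    if math.gcd(a, n) != 1:
        raise ValueError("a must be coprime with n")
    return 1 + (a * k + b) % n


if __name__ == "__main__":
    n = 1000
    a, b = 7919, 12345  # hypothetical constants; 7919 is prime and coprime with 1000

    # Every key is hit exactly once, so the mapping is a permutation of 1..n.
    image = {affine_permute(k, n, a, b) for k in range(1, n + 1)}
    assert image == set(range(1, n + 1))

    # Draw skewed keys, then scatter them over the relation.
    rng = random.Random(42)
    draws = [affine_permute(exponential_key(rng, n, 5.0), n, a, b) for _ in range(5)]
    print(draws)
```

The scattering is regular rather than random: since f(k + 1) - f(k) is always a mod n, consecutive hot keys land at a fixed stride from each other, which is one way to see why such a mapping can be called crude.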

That's pretty crude, although I don't object to a modulo operator.  It
would be nice to be able to use a truly random permutation, which is
not hard to generate but probably requires O(n) storage, likely a
problem for large scale factors.  Maybe somebody who knows more math
than I do (like you, probably!) can come up with something more
clever.
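To make the storage point concrete, here is a minimal Python sketch of a truly random permutation built with a Fisher-Yates shuffle, assuming a fixed seed so the mapping is reproducible. The function names `random_permutation_table` and `table_bytes` and the 4-byte entry width are hypothetical. The table must hold one entry per key, and pgbench_accounts has 100000 rows per scale factor.

```python
import random


def random_permutation_table(n: int, seed: int) -> list[int]:
    """Return perm such that perm[k - 1] is the image of key k, for k in 1..n.

    Fisher-Yates shuffle: with an ideal RNG every permutation is equally likely,
    but the table itself needs O(n) memory.
    """
    rng = random.Random(seed)  # fixed seed so the mapping is reproducible across runs
    perm = list(range(1, n + 1))
    for i in range(n - 1, 0, -1):
        j = rng.randint(0, i)  # 0 <= j <= i
        perm[i], perm[j] = perm[j], perm[i]
    return perm


def table_bytes(n: int, entry_bytes: int = 4) -> int:
    """Memory for a flat table of n fixed-width entries (hypothetical 4-byte keys)."""
    return n * entry_bytes


if __name__ == "__main__":
    perm = random_permutation_table(1000, seed=1)
    assert sorted(perm) == list(range(1, 1001))
    # pgbench_accounts has 100000 rows per scale factor.
    for scale in (100, 1000, 10000):
        n = 100000 * scale
        print(scale, table_bytes(n) / 1e9, "GB")
```

With 4-byte entries that is about 0.4 GB at scale 1000 and 4 GB at scale 10000. A common way to avoid the table entirely is a keyed bijection with small state, for example a Feistel network over the next power of two combined with cycle walking to stay within 1..n; that technique is not proposed in this message.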

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
