Re: pgbench randomness initialization - Mailing list pgsql-hackers

From Fabien COELHO
Subject Re: pgbench randomness initialization
Msg-id alpine.DEB.2.10.1604071147420.11001@sto
In response to pgbench randomness initialization  (Andres Freund <andres@anarazel.de>)
List pgsql-hackers
Hello Andres,

> et al I was wondering why it's a good idea for pgbench to do
>     INSTR_TIME_SET_CURRENT(start_time);
>     srandom((unsigned int) INSTR_TIME_GET_MICROSEC(start_time));
> to initialize randomness and then
>     for (i = 0; i < nthreads; i++)
>         thread->random_state[0] = random();
>         thread->random_state[1] = random();
>         thread->random_state[2] = random();
> to initialize the individual thread random state which is then used by
> pg_erand48().
>
> To me it seems better to instead initialize srandom() with a known value
> (say, uh, 0). Or even better don't use random() at all, and fill a
> global pg_erand48() with a known state; and use pg_erand48() to
> initialize the thread states.
>
> Obviously that doesn't make pgbench entirely reproducible, but it seems
> a lot better than now. Individual threads would do work in a
> reproducible order.
>
> I see very little reason to have the current behaviour, or at the very
> least not by default.

I think that it depends on what you want, which may vary:
 (1) "exactly" reproducible runs, but one run may hit a particular
     steady state not representative of what happens in general.

 (2) runs which really vary from one to the next, so as to give an
     idea of how much performance varies, i.e. how stable it is.

Currently pgbench focusses on (2), which may or may not be fine depending
on what you are doing. From a personal point of view I think that (2) is
more meaningful for collecting performance data, even if the results are
less stable: that simply reflects reality and its intrinsic variations, so
I'm fine with that as the default.

Now for those interested in (1) for some reason, I would suggest relying on
a PGBENCH_RANDOM_SEED environment variable or a --random-seed option, which
could be used to get an oxymoronic "deterministic randomness", if desired.
I do not think that it should be the default, though.
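
To make the suggestion concrete, here is a rough sketch of how such a seed
could be picked up. It is not pgbench code: choose_seed() and
init_thread_states() are hypothetical names, the one-second clock fallback
merely stands in for the microsecond timestamp quoted above, and standard
srand()/rand() are used in place of srandom()/random() only to keep the
sketch portable.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>
#include <time.h>

/*
 * Hypothetical helper (not part of pgbench): decide which seed to use.
 * "value" would be the content of PGBENCH_RANDOM_SEED or the argument of
 * --random-seed, or NULL when neither is given.
 * Returns 1 if a fixed seed was parsed, 0 if the clock was used as a
 * fallback, and -1 if the value is not a valid unsigned integer.
 */
int
choose_seed(const char *value, unsigned int *seed)
{
    char          *end;
    unsigned long  parsed;

    if (value == NULL || *value == '\0')
    {
        /* default behaviour: runs differ from one to the next */
        *seed = (unsigned int) time(NULL);
        return 0;
    }

    errno = 0;
    parsed = strtoul(value, &end, 10);
    if (errno != 0 || *end != '\0' || parsed > UINT_MAX)
        return -1;

    *seed = (unsigned int) parsed;
    return 1;
}

/*
 * Hypothetical helper: fill the three 16-bit words of each thread's
 * erand48-style state from one seeded generator, mirroring the loop
 * quoted above.
 */
void
init_thread_states(unsigned int seed, unsigned short states[][3], int nthreads)
{
    int i, j;

    srand(seed);
    for (i = 0; i < nthreads; i++)
        for (j = 0; j < 3; j++)
            states[i][j] = (unsigned short) rand();
}
```

A caller would do something like choose_seed(getenv("PGBENCH_RANDOM_SEED"),
&seed) and then pass the seed to init_thread_states(). With a fixed seed the
per-thread states are identical from one run to the next on a given
platform; without one the current varying behaviour is kept.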

-- 
Fabien.


