Re: Re: [HACKERS] pgbench randomness initialization - Mailing list pgsql-hackers

From Fabien COELHO
Subject Re: Re: [HACKERS] pgbench randomness initialization
Date
Msg-id alpine.DEB.2.20.1801091027030.6718@lancre
Whole thread Raw
In response to Re: Re: [HACKERS] pgbench randomness initialization  (Chapman Flack <chap@anastigmatix.net>)
Responses Re: Re: [HACKERS] pgbench randomness initialization
List pgsql-hackers
Hello Chapman,

Thanks for the review,

>> The tests assume that stdlib random/srandom behavior is standard thus
>> deterministic between platform.
>
> Is the behavior of srandom() and the system generator really so precisely
> specified that seed 5432 will produce the same values hardcoded in the
> tests on all platforms? [...]

Good question.

I'm hoping that in practice it would be the same, or that their would be 
very few cases (eg linux vs windows vs macos...). I was counting on the 
the buildfarm to tell me if I'm wrong, and fix it if needed.

> To have the test run pgbench twice with the same seed and compare the
> results sounds safer.

Interesting idea. The test script would be more complicated to do that, 
though. I would prefer to bet on "random" determinism (:-) and resort to 
such a solution only if it is not the case. Or maybe just put back some 
"\d+" to keep it simple.

This is a debatable strategy.

> I did some wordsmithing of the doc, which I hope was not presumptuous of 
> me, only as a conversation starter. [...]

Thanks for the doc improvements.

> The documentation doesn't say that --random-seed= (or PGBENCH_RANDOM_SEED=)
> will have the same effect as 'time', and indeed, I really think it should
> be unset (defaulting to 'time'), or 'time', or 'rand', or an integer,
> or an error.

Ok, done.

> The error, right now, says only "expecting an unsigned integer"; it 
> should also mention time and rand.

Ok, done.

> Should 'rand' be something that conveys more about its meaning, 'strong' 
> perhaps?

Hmmm. "Random means random":-). I have no opinion about whether it would 
be better. ISTM that "strong" would require some explanations. I let is as 
"rand" for now.

> The documentation doesn't mention the allowed range of the unsigned
> integer (which I get, because 'unsigned int' is exactly the signature
> for srandom, but somebody might read "unsigned integer" in a more
> generic sense).

Ok. I extended so that it works with octal, decimal and hexadecimal, and 
updated the doc accordingly. I did not add range information though.

> I'm not sure what would be a better way to say it.
> The base (only decimal, as now in the code) isn't specified either.

Sure.

> Maybe the documentation should mention that the output now includes the 
> random seed being used, so that (even without planning ahead) [...]

It does so only if the seed is explicitely set, otherwise it keeps the 
previous behavior of being silent. I added a sentence about that.

> Something more may need to be said in the doc about reproducibility. I think
> the real story is that a run can be reproduced if the number of clients is
> equal to the number of threads.

Yes, good point. I tried to hide the issue under the "as far as random 
numbers are concerned". I've tried to improve this point in the doc.

> Although each thread has its own generator state, each client does not 
> (if there is more than one per thread), and as soon as real select() 
> delays start happening in CSTATE_WAIT_RESULT, the clients dealt out to a 
> given thread may not be hitting that thread's generator in a 
> deterministic order.

Yep. This may evolve, for instance the error handling patch needs to 
restart transactions so it adds a state into the client.

-- 
Fabien.
Attachment

pgsql-hackers by date:

Previous
From: Vik Fearing
Date:
Subject: Re: proposal: alternative psql commands quit and exit
Next
From: Fabien COELHO
Date:
Subject: Re: pgbench - add \if support