Re: Re: [HACKERS] pgbench randomness initialization - Mailing list pgsql-hackers
From | Fabien COELHO |
---|---|
Subject | Re: Re: [HACKERS] pgbench randomness initialization |
Date | |
Msg-id | alpine.DEB.2.20.1801091027030.6718@lancre Whole thread Raw |
In response to | Re: Re: [HACKERS] pgbench randomness initialization (Chapman Flack <chap@anastigmatix.net>) |
Responses |
Re: Re: [HACKERS] pgbench randomness initialization
|
List | pgsql-hackers |
Hello Chapman, Thanks for the review, >> The tests assume that stdlib random/srandom behavior is standard thus >> deterministic between platform. > > Is the behavior of srandom() and the system generator really so precisely > specified that seed 5432 will produce the same values hardcoded in the > tests on all platforms? [...] Good question. I'm hoping that in practice it would be the same, or that their would be very few cases (eg linux vs windows vs macos...). I was counting on the the buildfarm to tell me if I'm wrong, and fix it if needed. > To have the test run pgbench twice with the same seed and compare the > results sounds safer. Interesting idea. The test script would be more complicated to do that, though. I would prefer to bet on "random" determinism (:-) and resort to such a solution only if it is not the case. Or maybe just put back some "\d+" to keep it simple. This is a debatable strategy. > I did some wordsmithing of the doc, which I hope was not presumptuous of > me, only as a conversation starter. [...] Thanks for the doc improvements. > The documentation doesn't say that --random-seed= (or PGBENCH_RANDOM_SEED=) > will have the same effect as 'time', and indeed, I really think it should > be unset (defaulting to 'time'), or 'time', or 'rand', or an integer, > or an error. Ok, done. > The error, right now, says only "expecting an unsigned integer"; it > should also mention time and rand. Ok, done. > Should 'rand' be something that conveys more about its meaning, 'strong' > perhaps? Hmmm. "Random means random":-). I have no opinion about whether it would be better. ISTM that "strong" would require some explanations. I let is as "rand" for now. > The documentation doesn't mention the allowed range of the unsigned > integer (which I get, because 'unsigned int' is exactly the signature > for srandom, but somebody might read "unsigned integer" in a more > generic sense). Ok. I extended so that it works with octal, decimal and hexadecimal, and updated the doc accordingly. I did not add range information though. > I'm not sure what would be a better way to say it. > The base (only decimal, as now in the code) isn't specified either. Sure. > Maybe the documentation should mention that the output now includes the > random seed being used, so that (even without planning ahead) [...] It does so only if the seed is explicitely set, otherwise it keeps the previous behavior of being silent. I added a sentence about that. > Something more may need to be said in the doc about reproducibility. I think > the real story is that a run can be reproduced if the number of clients is > equal to the number of threads. Yes, good point. I tried to hide the issue under the "as far as random numbers are concerned". I've tried to improve this point in the doc. > Although each thread has its own generator state, each client does not > (if there is more than one per thread), and as soon as real select() > delays start happening in CSTATE_WAIT_RESULT, the clients dealt out to a > given thread may not be hitting that thread's generator in a > deterministic order. Yep. This may evolve, for instance the error handling patch needs to restart transactions so it adds a state into the client. -- Fabien.
Attachment
pgsql-hackers by date: