Thread: pgbench randomness initialization
Hi,

Pondering http://archives.postgresql.org/message-id/CA%2BTgmoZJdA6K7-17K4A48rVB0UPR98HVuaNcfNNLrGsdb1uChg%40mail.gmail.com et al, I was wondering why it's a good idea for pgbench to do

    INSTR_TIME_SET_CURRENT(start_time);
    srandom((unsigned int) INSTR_TIME_GET_MICROSEC(start_time));

to initialize randomness and then

    for (i = 0; i < nthreads; i++)
        thread->random_state[0] = random();
        thread->random_state[1] = random();
        thread->random_state[2] = random();

to initialize the individual thread random state, which is then used by pg_erand48().

To me it seems better to instead initialize srandom() with a known value (say, uh, 0). Or even better, don't use random() at all, and fill a global pg_erand48() state with a known value; and use pg_erand48() to initialize the thread states.

Obviously that doesn't make pgbench entirely reproducible, but it seems a lot better than now. Individual threads would do work in a reproducible order.

I see very little reason to have the current behaviour, or at the very least not by default.

Greetings,

Andres Freund
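[For concreteness, a minimal sketch of the deterministic seeding proposed above. TState and pg_erand48() are the real pgbench/libpgport pieces (simplified here); the helper name and the fixed constants are purely illustrative, not anyone's actual patch.]

    #include <stdlib.h>

    /* stand-in for pgbench's per-thread state (the real TState has many more fields) */
    typedef struct TState
    {
        unsigned short random_state[3];     /* state for pg_erand48() */
    } TState;

    /* pg_erand48() lives in src/port/erand48.c; declared here to keep the sketch self-contained */
    extern double pg_erand48(unsigned short xseed[3]);

    /* hypothetical fixed global stream; any known constant would do */
    static unsigned short base_random_state[3] = {123, 456, 789};

    /* derive each thread's erand48 state deterministically, no random()/srandom() involved */
    static void
    init_thread_random_state(TState *thread)
    {
        for (int i = 0; i < 3; i++)
            thread->random_state[i] =
                (unsigned short) (pg_erand48(base_random_state) * 65536.0);
    }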
Hello Andres,

> et al, I was wondering why it's a good idea for pgbench to do
>
>     INSTR_TIME_SET_CURRENT(start_time);
>     srandom((unsigned int) INSTR_TIME_GET_MICROSEC(start_time));
>
> to initialize randomness and then
>
>     for (i = 0; i < nthreads; i++)
>         thread->random_state[0] = random();
>         thread->random_state[1] = random();
>         thread->random_state[2] = random();
>
> to initialize the individual thread random state, which is then used by
> pg_erand48().
>
> To me it seems better to instead initialize srandom() with a known value
> (say, uh, 0). Or even better, don't use random() at all, and fill a
> global pg_erand48() state with a known value; and use pg_erand48() to
> initialize the thread states.
>
> Obviously that doesn't make pgbench entirely reproducible, but it seems
> a lot better than now. Individual threads would do work in a
> reproducible order.
>
> I see very little reason to have the current behaviour, or at the very
> least not by default.

I think that it depends on what you want, which may vary:

(1) "exactly" reproducible runs, but one run may hit a particular
    steady state not representative of what happens in general.

(2) runs which really vary from one to the next, so as to have an idea
    of how much it may vary, i.e. what the performance stability is.

Currently pgbench focusses on (2), which may or may not be fine depending on what you are doing. From a personal point of view I think that (2) is more significant for collecting performance data, even if the results are more unstable: that simply reflects reality and its intrinsic variations, so I'm fine with that as the default.

Now for those interested in (1) for some reason, I would suggest relying on a PGBENCH_RANDOM_SEED environment variable or a --random-seed option, which could be used to have an oxymoronic "deterministic randomness", if desired. I do not think that it should be the default, though.

-- 
Fabien.
On 2016-04-07 11:56:12 +0200, Fabien COELHO wrote:
> (2) runs which really vary from one to the next, so as to have an idea
>     of how much it may vary, i.e. what the performance stability is.

I don't think this POV makes all that much sense. If you do something non-comparable, then the results aren't, uh, comparable. Which also means there's a lower chance to reproduce observed problems.

> Currently pgbench focusses on (2), which may or may not be fine depending on
> what you are doing. From a personal point of view I think that (2) is more
> significant for collecting performance data, even if the results are more
> unstable: that simply reflects reality and its intrinsic variations, so I'm
> fine with that as the default.

Uh, and what's the benefit of that variability? pgbench isn't a reality simulation tool, it's a benchmarking tool. And benchmarks with intrinsic variability are bad benchmarks.

Greetings,

Andres Freund
>> (2) runs which really vary from one to the next, so as to have an idea
>>     of how much it may vary, i.e. what the performance stability is.
>
> I don't think this POV makes all that much sense. If you do something
> non-comparable, then the results aren't, uh, comparable. Which also
> means there's a lower chance to reproduce observed problems.

That also means that you are likely not to hit them if you always do the very same run... Moreover, the Monte Carlo method requires randomness for its convergence result.

>> Currently pgbench focusses on (2), which may or may not be fine depending on
>> what you are doing. From a personal point of view I think that (2) is more
>> significant for collecting performance data, even if the results are more
>> unstable: that simply reflects reality and its intrinsic variations, so I'm
>> fine with that as the default.
>
> Uh, and what's the benefit of that variability? pgbench isn't a reality
> simulation tool, it's a benchmarking tool. And benchmarks with intrinsic
> variability are bad benchmarks.

From a statistical perspective, one run does not mean anything. If you do the exact same run over and over again, then all mathematical results about (slow) convergence towards the average are lost. This is like trying to survey a population by asking the questions to the same person over and over: the result will be biased.

Now when you develop, which is the use case you probably have in mind, you want to compare two pg versions and check for the performance impact, so having the exact same run seems like a proxy to quickly check for that. However, from a statistical perspective this is just heresy: you may do a change which improves one given run at the expense of all possible others, and you would not know it. Say for instance that there are two different behaviors depending on something; then you will check against only one of them.

So I have no mathematical doubt that changing the seed is the right default setting, thus I think that the current behavior is fine. However, I'm okay if someone wants to control the randomness for some reason (maybe having "less sure" results, but quickly), so it could be allowed somehow.

-- 
Fabien.
On 2016-04-07 12:25:58 +0200, Fabien COELHO wrote:
>>> (2) runs which really vary from one to the next, so as to have an idea
>>>     of how much it may vary, i.e. what the performance stability is.
>>
>> I don't think this POV makes all that much sense. If you do something
>> non-comparable, then the results aren't, uh, comparable. Which also
>> means there's a lower chance to reproduce observed problems.
>
> That also means that you are likely not to hit them if you always do the
> very same run...

If you run the test for longer... Or explicitly iterate over IVs. At the very least we need to make pgbench output the IV used, to have some chance of repeating tests.

>> Uh, and what's the benefit of that variability? pgbench isn't a reality
>> simulation tool, it's a benchmarking tool. And benchmarks with intrinsic
>> variability are bad benchmarks.
>
> From a statistical perspective, one run does not mean anything. If you do
> the exact same run over and over again, then all mathematical results about
> (slow) convergence towards the average are lost. This is like trying to
> survey a population by asking the questions to the same person over and
> over: the result will be biased.

That comparison pretty much invalidates any point you're making, it's that bad.

> Now when you develop, which is the use case you probably have in mind, you
> want to compare two pg versions and check for the performance impact, so
> having the exact same run seems like a proxy to quickly check for that.

It's not about "quickly" checking for something. If you look at the results in the thread mentioned in the OP, the order of operations drastically and *PERSISTENTLY* changes the observations. Causing *days* of work lost.

> However, from a statistical perspective this is just heresy: you may do a
> change which improves one given run at the expense of all possible others,
> and you would not know it. Say for instance that there are two different
> behaviors depending on something; then you will check against only one of
> them.

Meh. That assumes that we're doing a huge number of pgbench runs; but usually people do maybe a handful. Tops. If you're trying to defend against scenarios like that, you need to design your tests so that you'll encounter such problems by running longer.

> So I have no mathematical doubt that changing the seed is the right default
> setting, thus I think that the current behavior is fine. However, I'm okay if
> someone wants to control the randomness for some reason (maybe having "less
> sure" results, but quickly), so it could be allowed somehow.

There might be some statistics arguments, but I think they're pretty much ignoring reality.

Greetings,

Andres Freund
Hello Andres,

> If you run the test for longer... Or explicitly iterate over IVs. At the
> very least we need to make pgbench output the IV used, to have some
> chance of repeating tests.

Note that I'm not against providing a way to repeat tests "exactly", and I have suggested two means: an environment variable and/or an option.

> [...] That comparison pretty much invalidates any point you're making,
> it's that bad.

At least it is simple, if simplistic. Here is another one: I knew a financial institution which needed to evaluate the VAR of exotic financial products every night. They relied on MC for that. Alas, it was not converging quickly enough and the results were unstable, so they took your advice: they froze the seed. Day after day the results were mostly the same, the VAR was stable from one morning to the next, the management was happy, the risks were under control... That was in the mid 2000s:-)

>> However, from a statistical perspective this is just heresy: you may do a
>> change which improves one given run at the expense of all possible others,
>> and you would not know it. Say for instance that there are two different
>> behaviors depending on something; then you will check against only one of
>> them.
>
> Meh. That assumes that we're doing a huge number of pgbench runs;

A number of, not necessarily "huge". Or averaging a lot of intermediate values and having a hard look at the distribution, not just the final tps number.

> but usually people do maybe a handful. Tops. If you're trying to defend
> against scenarios like that, you need to design your tests so that you'll
> encounter such problems by running longer.

People usually do a lot of things; that does not mean it is "right".

>> So I have no mathematical doubt that changing the seed is the right
>> default setting, thus I think that the current behavior is fine.
>> However, I'm okay if someone wants to control the randomness for some
>> reason (maybe having "less sure" results, but quickly), so it could be
>> allowed somehow.
>
> There might be some statistics arguments,

Yep, there are.

> but I think they're pretty much ignoring reality.

Hmmm. If reality wants to ignore mathematics, usually it loses, so this will not be with my blessing.

Note that as a committer you do not need me to freeze the seed. I'm just providing an opinion backed by mathematical proofs.

-- 
Fabien.
On Thu, Apr 7, 2016 at 5:56 AM, Fabien COELHO <coelho@cri.ensmp.fr> wrote:
> I think that it depends on what you want, which may vary:
>
> (1) "exactly" reproducible runs, but one run may hit a particular
>     steady state not representative of what happens in general.
>
> (2) runs which really vary from one to the next, so as to have an idea
>     of how much it may vary, i.e. what the performance stability is.
>
> Currently pgbench focusses on (2), which may or may not be fine depending on
> what you are doing. From a personal point of view I think that (2) is more
> significant for collecting performance data, even if the results are more
> unstable: that simply reflects reality and its intrinsic variations, so I'm
> fine with that as the default.
>
> Now for those interested in (1) for some reason, I would suggest relying on a
> PGBENCH_RANDOM_SEED environment variable or a --random-seed option, which
> could be used to have an oxymoronic "deterministic randomness", if desired.
> I do not think that it should be the default, though.

I agree entirely. If performance is erratic, that's actually something you want to discover during benchmarking. If different pgbench runs (of non-trivial length) are producing substantially different results, then that's really a problem we need to fix, not just adjust pgbench to cover it up.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 2016-04-07 08:58:16 -0400, Robert Haas wrote:
> On Thu, Apr 7, 2016 at 5:56 AM, Fabien COELHO <coelho@cri.ensmp.fr> wrote:
>> I think that it depends on what you want, which may vary:
>>
>> (1) "exactly" reproducible runs, but one run may hit a particular
>>     steady state not representative of what happens in general.
>>
>> (2) runs which really vary from one to the next, so as to have an idea
>>     of how much it may vary, i.e. what the performance stability is.
>>
>> Currently pgbench focusses on (2), which may or may not be fine depending on
>> what you are doing. From a personal point of view I think that (2) is more
>> significant for collecting performance data, even if the results are more
>> unstable: that simply reflects reality and its intrinsic variations, so I'm
>> fine with that as the default.
>>
>> Now for those interested in (1) for some reason, I would suggest relying on a
>> PGBENCH_RANDOM_SEED environment variable or a --random-seed option, which
>> could be used to have an oxymoronic "deterministic randomness", if desired.
>> I do not think that it should be the default, though.
>
> I agree entirely. If performance is erratic, that's actually
> something you want to discover during benchmarking. If different
> pgbench runs (of non-trivial length) are producing substantially
> different results, then that's really a problem we need to fix, not
> just adjust pgbench to cover it up.

It's not about "covering it up"; it's about actually being able to take action based on benchmark results, and about practically being able to run benchmarks. The argument above means essentially that we need to run a significant number of pgbench runs for *anything*, because running them 3-5 times before/after just isn't meaningful enough.

It means that you can't separate between OS-caused and pgbench-order-caused performance differences.

I agree that it's a horrid problem that on large machines we can get half the throughput depending on the ordering. But without running queries in the same order before/after a patch there's no way to validate whether $patch caused the problem. And no way to reliably trigger problematic scenarios.

I also agree that it's important to be able to vary workloads. But if you do so, you should do so in the same order, both pre/post a patch.

Afaics the prime use of pgbench is validation of the performance effects of patches; therefore it should be usable for that, and it's not.

Greetings,

Andres Freund
On Thu, Apr 7, 2016 at 9:15 AM, Andres Freund <andres@anarazel.de> wrote:
> It's not about "covering it up"; it's about actually being able to take
> action based on benchmark results, and about practically being able to
> run benchmarks. The argument above means essentially that we need to run
> a significant number of pgbench runs for *anything*, because running
> them 3-5 times before/after just isn't meaningful enough.
>
> It means that you can't separate between OS-caused and
> pgbench-order-caused performance differences.

I'm not objecting to adding an option for this; but I think Fabien is right that it shouldn't be the default.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Andres Freund <andres@anarazel.de> writes:
> On 2016-04-07 12:25:58 +0200, Fabien COELHO wrote:
>> So I have no mathematical doubt that changing the seed is the right default
>> setting, thus I think that the current behavior is fine. However, I'm okay if
>> someone wants to control the randomness for some reason (maybe having "less
>> sure" results, but quickly), so it could be allowed somehow.
>
> There might be some statistics arguments, but I think they're pretty
> much ignoring reality.

Sorry, but I think Fabien is right and you are wrong. There is no point in having randomness in there at all if the thing is constrained to generate the same "random" sequence every time.

I don't object to having an option to force the initial seed, but it should not be the default, and it most certainly should not be the only behavior.

			regards, tom lane
On 2016-04-07 09:46:27 -0400, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
>> On 2016-04-07 12:25:58 +0200, Fabien COELHO wrote:
>>> So I have no mathematical doubt that changing the seed is the right default
>>> setting, thus I think that the current behavior is fine. However, I'm okay if
>>> someone wants to control the randomness for some reason (maybe having "less
>>> sure" results, but quickly), so it could be allowed somehow.
>>
>> There might be some statistics arguments, but I think they're pretty
>> much ignoring reality.
>
> Sorry, but I think Fabien is right and you are wrong.

Given that it's 3:1 so far, you might be right...

> There is no point in having randomness in there at all if the thing is
> constrained to generate the same "random" sequence every time.

but that argument seems pretty absurd. It's obviously different to query for different rows over a run, rather than querying the same row again and again in all backends. The reason we use randomness is to avoid easily discernible patterns in querying. Without randomness, how would you do that?

Greetings,

Andres Freund
>> It means that you can't separate between OS-caused and
>> pgbench-order-caused performance differences.
>
> I'm not objecting to adding an option for this; but I think Fabien is
> right that it shouldn't be the default.

Yep.

Andres, attached is a simple POC with an option & environment variable (whereas I should rather have looked at the current checkpointer/vacuum issue, which I have reproduced:-().

While testing it I had a funny pattern, something like:

  pgbench --random-seed=123 -M prepared -T 3 -P 1 -S
    1.0: 600 tps
    2.0: 600 tps
    3.0: 600 tps

First rerun just after:

  pgbench --random-seed=123 -M prepared -T 3 -P 1 -S
    1.0: 1800 tps
    2.0: 600 tps
    3.0: 600 tps

The first rerun hits the same pages, so the first 1800 transactions are run in one second; then new pages have to be loaded, so the performance goes down.

Second rerun just after:

  pgbench --random-seed=123 -M prepared -T 3 -P 1 -S
    1.0: 1800 tps
    2.0: 1400 tps
    3.0: 600 tps

The second rerun hits the same 3000 transactions as the previous one in about 1.7 seconds, then goes back to 600 tps for new pages...

After more iterations the performance is 1800 tps for the whole 3 seconds.

This clearly illustrates that it should be used with caution.

-- 
Fabien.
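[The attached POC itself is not reproduced in the archive. As a rough sketch of the mechanism under discussion — command-line option, falling back to the PGBENCH_RANDOM_SEED environment variable, falling back to the old time-based default — with the helper name and message wording being guesses rather than the actual patch:]

    #include <stdio.h>
    #include <stdlib.h>

    /* hypothetical helper: pick the seed from --random-seed if given,
     * else from PGBENCH_RANDOM_SEED, else keep the time-based behavior */
    static void
    set_random_seed(const char *cli_seed, unsigned long start_time_usec)
    {
        const char *str = cli_seed ? cli_seed : getenv("PGBENCH_RANDOM_SEED");
        unsigned int seed;
        char        garbage;

        /* "%u%c" matches exactly 1 item only when nothing trails the integer */
        if (str != NULL && sscanf(str, "%u%c", &seed, &garbage) == 1)
            fprintf(stderr, "setting random seed to %u\n", seed);
        else
            seed = (unsigned int) start_time_usec;  /* previous default */

        srandom(seed);
    }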
Fabien COELHO wrote:
> While testing it I had a funny pattern, something like:
>
>   pgbench --random-seed=123 -M prepared -T 3 -P 1 -S
>     1.0: 600 tps
>     2.0: 600 tps
>     3.0: 600 tps

The output should include the random seed used, whether it was passed with --random-seed, an environment variable, or randomly determined. That way, the user who later wants to verify why a particular run caused some particular misbehavior knows what seed to use to reproduce that run.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Hello Alvaro,

I revive this patch because controlling the seed is useful for TAP-testing pgbench.

> The output should include the random seed used, whether it was passed
> with --random-seed, an environment variable, or randomly determined. That
> way, the user who later wants to verify why a particular run caused
> some particular misbehavior knows what seed to use to reproduce that
> run.

Yep. Here is a new version which outputs the seed used when a seed is explicitly set with an option or from the environment. However, the default (current) behavior remains silent, so there are no visible changes unless you tinker with it.

The patch also allows using a "strong" random value for seeding the PRNG, thanks to pg_strong_random().

The tests assume that the stdlib random/srandom behavior is standard and thus deterministic across platforms.

-- 
Fabien.
Attachment
> Here is a new version which outputs the seed used when a seed is explicitly
> set with an option or from the environment.

It is even better without XML typos, with simpler coding, and with the doc in the right place... Sorry for the noise.

-- 
Fabien.
Attachment
On 01/02/18 05:57, Fabien COELHO wrote:
>> Here is a new version which outputs the seed used when a seed is
>> explicitly set with an option or from the environment.

This is a simple patch that does what it says on the tin.

I ran into trouble with the pgbench TAP test *even before applying the patch*, but only because I was doing a VPATH build as a user without 'write' on the source tree (001_pgbench_with_server.pl tried to make pgbench create log files there). Bad me. Oddly, that was the only test in the whole tree to have such an issue, so here I add a pre-patch to fix that. Now my review needs a review. :)

With that taken care of, the added tests all pass for me. I wonder, though:

> The tests assume that the stdlib random/srandom behavior is standard and
> thus deterministic across platforms.

Is the behavior of srandom() and the system generator really so precisely specified that seed 5432 will produce the same values hardcoded in the tests on all platforms? It does on mine, but could it produce spurious test failures for others?

I guess the (or a) spec would be here: http://pubs.opengroup.org/onlinepubs/7908799/xsh/initstate.html

It specifies a "non-linear additive feedback random-number generator employing a default state array size of 31 long integers", but it does not pin down the parameters or claim that only one candidate exists. Having the test run pgbench twice with the same seed and compare the results sounds safer.

This revised pgbench-seed-4.patch changes the code not at all, and the TAP test only in whitespace. I did some wordsmithing of the doc, which I hope was not presumptuous of me, only as a conversation starter. I expanded the second sentence because on my first reading I wasn't quite sure of its meaning. Once I had looked at the code, I could see that the sentence was economical and precise already, but I tried to find a version that would also have been clear to the me-before-I-looked.

The documentation doesn't say that --random-seed= (or PGBENCH_RANDOM_SEED=) will have the same effect as 'time', and indeed, I really think the value should be either unset (defaulting to 'time'), 'time', 'rand', an integer, or an error. The error, right now, says only "expecting an unsigned integer"; it should also mention time and rand. Should 'rand' be something that conveys more about its meaning, 'strong' perhaps?

The documentation doesn't mention the allowed range of the unsigned integer (which I get, because 'unsigned int' is exactly the signature for srandom, but somebody might read "unsigned integer" in a more generic sense). I'm not sure what would be a better way to say it. The base (only decimal, as now in the code) isn't specified either.

Maybe the documentation should mention that the output now includes the random seed being used, so that (even without planning ahead) a session can be re-run with the same seed if necessary. It could just say "an unsigned integer in decimal, as it is shown in pgbench's output" or something like that.

Something more may need to be said in the doc about reproducibility. I think the real story is that a run can be reproduced only if the number of clients is equal to the number of threads. Although each thread has its own generator state, each client does not (if there is more than one per thread), and as soon as real select() delays start happening in CSTATE_WAIT_RESULT, the clients dealt out to a given thread may not be hitting that thread's generator in a deterministic order.

-Chap
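[A quick standalone probe for the portability question raised above — compile and run it on two platforms and compare the output; it is not part of the patch, just an illustration:]

    #include <stdio.h>
    #include <stdlib.h>

    /* print the first few random() values for the seed hardcoded in the
     * TAP tests; if the lines differ between platforms, the tests'
     * expected values are not portable */
    int
    main(void)
    {
        srandom(5432);
        for (int i = 0; i < 5; i++)
            printf("%ld\n", random());
        return 0;
    }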
Attachment
The following review has been posted through the commitfest application:

  make installcheck-world:  tested, passed
  Implements feature:       tested, passed
  Spec compliant:           not tested
  Documentation:            not tested

Initial review is the message this is in reply to.

The new status of this patch is: Waiting on Author
Hello Chapman,

Thanks for the review.

>> The tests assume that the stdlib random/srandom behavior is standard and
>> thus deterministic across platforms.
>
> Is the behavior of srandom() and the system generator really so precisely
> specified that seed 5432 will produce the same values hardcoded in the
> tests on all platforms? [...]

Good question. I'm hoping that in practice it would be the same, or that there would be very few cases (e.g. linux vs windows vs macos...). I was counting on the buildfarm to tell me if I'm wrong, and to fix it if needed.

> Having the test run pgbench twice with the same seed and compare the
> results sounds safer.

Interesting idea. The test script would be more complicated to do that, though. I would prefer to bet on "random" determinism (:-) and resort to such a solution only if it is not the case. Or maybe just put back some "\d+" to keep it simple. This is a debatable strategy.

> I did some wordsmithing of the doc, which I hope was not presumptuous of
> me, only as a conversation starter. [...]

Thanks for the doc improvements.

> The documentation doesn't say that --random-seed= (or PGBENCH_RANDOM_SEED=)
> will have the same effect as 'time', and indeed, I really think the value
> should be either unset (defaulting to 'time'), 'time', 'rand', an integer,
> or an error.

Ok, done.

> The error, right now, says only "expecting an unsigned integer"; it
> should also mention time and rand.

Ok, done.

> Should 'rand' be something that conveys more about its meaning, 'strong'
> perhaps?

Hmmm. "Random means random":-). I have no opinion about whether it would be better. ISTM that "strong" would require some explanations. I left it as "rand" for now.

> The documentation doesn't mention the allowed range of the unsigned
> integer (which I get, because 'unsigned int' is exactly the signature
> for srandom, but somebody might read "unsigned integer" in a more
> generic sense).

Ok. I extended the parsing so that it works with octal, decimal and hexadecimal, and updated the doc accordingly. I did not add range information, though.

> I'm not sure what would be a better way to say it.
> The base (only decimal, as now in the code) isn't specified either.

Sure.

> Maybe the documentation should mention that the output now includes the
> random seed being used, so that (even without planning ahead) [...]

It does so only if the seed is explicitly set; otherwise it keeps the previous behavior of being silent. I added a sentence about that.

> Something more may need to be said in the doc about reproducibility. I think
> the real story is that a run can be reproduced only if the number of clients
> is equal to the number of threads.

Yes, good point. I tried to hide the issue under "as far as random numbers are concerned". I've tried to improve this point in the doc.

> Although each thread has its own generator state, each client does not
> (if there is more than one per thread), and as soon as real select()
> delays start happening in CSTATE_WAIT_RESULT, the clients dealt out to a
> given thread may not be hitting that thread's generator in a
> deterministic order.

Yep. This may evolve; for instance the error handling patch needs to restart transactions, so it adds a state into the client.

-- 
Fabien.
Attachment
> This is a simple patch that does what it says on the tin. I ran into
> trouble with the pgbench TAP test *even before applying the patch*, but
> only because I was doing a VPATH build as a user without 'write'
> on the source tree (001_pgbench_with_server.pl tried to make pgbench
> create log files there). Bad me. Oddly, that was the only test in the
> whole tree to have such an issue, so here I add a pre-patch to fix that.
> Now my review needs a review. :)

Yep. I find the multiple chdir solution a little bit too extreme.

ISTM that it should rather add the correct path to --log-prefix by prepending $node->basedir, like the pgbench function does for -f scripts. See attached.

-- 
Fabien.
Attachment
Here is a rebase, plus some more changes:

I have improved the error message to tell from where the value was provided.

I have removed the test against the exact values produced from the expression test run.

I have added a test which runs from the same seed value several times and checks that the output values are the same.

-- 
Fabien.
Attachment
> Here is a rebase, plus some more changes:
>
> I have improved the error message to tell from where the value was provided.
>
> I have removed the test against the exact values produced from the expression
> test run.
>
> I have added a test which runs from the same seed value several times
> and checks that the output values are the same.

This version adds a :random_seed script variable, for information.

-- 
Fabien.
Attachment
>> Here is a rebase, plus some more changes:
>>
>> I have improved the error message to tell from where the value was
>> provided.
>>
>> I have removed the test against the exact values produced from the expression
>> test run.
>>
>> I have added a test which runs from the same seed value several times
>> and checks that the output values are the same.
>
> This version adds a :random_seed script variable, for information.

Rebased.

-- 
Fabien.
Attachment
Fabien COELHO <coelho@cri.ensmp.fr> writes:
>> This is a simple patch that does what it says on the tin. I ran into
>> trouble with the pgbench TAP test *even before applying the patch*, but
>> only because I was doing a VPATH build as a user without 'write'
>> on the source tree (001_pgbench_with_server.pl tried to make pgbench
>> create log files there). Bad me. Oddly, that was the only test in the
>> whole tree to have such an issue, so here I add a pre-patch to fix that.
>> Now my review needs a review. :)

> Yep. I find the multiple chdir solution a little bit too extreme.

> ISTM that it should rather add the correct path to --log-prefix by
> prepending $node->basedir, like the pgbench function does for -f scripts.
> See attached.

Hm ... so I tried to replicate this problem, and failed to: the log files get made under the VPATH build directory, as desired, even without this patch. Am I doing something wrong, or is this platform-dependent somehow?

			regards, tom lane
Hello Tom,

> Fabien COELHO <coelho@cri.ensmp.fr> writes:
>>> This is a simple patch that does what it says on the tin. I ran into
>>> trouble with the pgbench TAP test *even before applying the patch*, but
>>> only because I was doing a VPATH build as a user without 'write'
>>> on the source tree (001_pgbench_with_server.pl tried to make pgbench
>>> create log files there). Bad me. Oddly, that was the only test in the
>>> whole tree to have such an issue, so here I add a pre-patch to fix that.
>>> Now my review needs a review. :)
>
>> Yep. I find the multiple chdir solution a little bit too extreme.
>
>> ISTM that it should rather add the correct path to --log-prefix by
>> prepending $node->basedir, like the pgbench function does for -f scripts.
>> See attached.
>
> Hm ... so I tried to replicate this problem, and failed to: the log files
> get made under the VPATH build directory, as desired, even without this
> patch. Am I doing something wrong, or is this platform-dependent somehow?

As I recall, it indeed works if the source directories are rw, but fails if they are ro, because then the local mkdir fails. So you would have to do a VPATH build with read-only sources to get the issue the patch is fixing. Otherwise, the issue would have been caught earlier by the buildfarm, which I guess is doing VPATH compilation and full validation.

-- 
Fabien.
Fabien COELHO <coelho@cri.ensmp.fr> writes:
>> Hm ... so I tried to replicate this problem, and failed to: the log files
>> get made under the VPATH build directory, as desired, even without this
>> patch. Am I doing something wrong, or is this platform-dependent somehow?

> As I recall, it indeed works if the source directories are rw, but fails
> if they are ro, because then the local mkdir fails. So you would have to do
> a VPATH build with read-only sources to get the issue the patch is
> fixing.

Ah, right you are. Apparently, if the source is rw, the temporary files in question are made there but immediately deleted, so the bug isn't obvious. Fix verified and pushed.

			regards, tom lane
The following review has been posted through the commitfest application:

  make installcheck-world:  tested, passed
  Implements feature:       tested, passed
  Spec compliant:           not tested
  Documentation:            tested, passed

Patch 8 works and addresses the things I noticed earlier. It needs s/explicitely/explicitly/ in the docs.

The parsing of the seed involves matters of taste, I guess: if it were a signed int, then sscanf's built-in %i would do everything those three explicit hex/octal/decimal branches do, but there's no unsigned version of %i. Then there's strtoul(..., base=0), which accepts the same choice of bases, but there's no unsigned-int-width version of that. Maybe it would still look cleaner to use strtoul and just check that the result fits in an unsigned int? As I began, it comes down to taste ... this code does work.

I am not sure about the "idem for :random_seed" part: does this mean that a value could be given with -Drandom_seed on the command line, and become the value of :random_seed, possibly different from the value given to --random-seed? Is that intended? (Perhaps it is; I'm merely asking.)

-Chap

The new status of this patch is: Waiting on Author
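[For concreteness, a minimal sketch of the strtoul-based alternative suggested above; parse_seed is a hypothetical name, not the patch's code. strtoul with base 0 accepts decimal, octal (0...), and hexadecimal (0x...) in a single call, and the explicit range check makes up for strtoul returning an unsigned long:]

    #include <errno.h>
    #include <limits.h>
    #include <stdbool.h>
    #include <stdlib.h>

    static bool
    parse_seed(const char *str, unsigned int *seed)
    {
        char          *end;
        unsigned long  val;

        errno = 0;
        val = strtoul(str, &end, 0);    /* base 0: decimal, octal, or hex */
        if (errno != 0 || end == str || *end != '\0' || val > UINT_MAX)
            return false;               /* reject garbage and out-of-range values */
        *seed = (unsigned int) val;
        return true;
    }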
Hello Chapman,

Here is v9.

> It needs s/explicitely/explicitly/ in the docs.

Done.

> The parsing of the seed involves matters of taste, I guess: if it were a
> signed int, then sscanf's built-in %i would do everything those three
> explicit hex/octal/decimal branches do, but there's no unsigned version
> of %i. Then there's strtoul(..., base=0), which accepts the same choice
> of bases, but there's no unsigned-int-width version of that. Maybe it
> would still look cleaner to use strtoul and just check that the result
> fits in an unsigned int? As I began, it comes down to taste ... this code
> does work.

I must admit that I'm not too happy with the result as well, so I dropped the octal/hexadecimal parsing.

> I am not sure about the "idem for :random_seed" part: does this mean that a
> value could be given with -Drandom_seed on the command line, and become the
> value of :random_seed, possibly different from the value given to
> --random-seed? Is that intended? (Perhaps it is; I'm merely asking.)

The "idem" is about setting the variable but not overwriting it if it already exists. The intention is that :random_seed is the random seed, unless the user sets it to something else, in which case it is the user's value. I've improved the variable description in the doc to point out that the value may be overwritten with -D.

-- 
Fabien.
Attachment
I'm sorry, I must have missed your reply on the 5th somehow.

On 03/05/18 07:01, Fabien COELHO wrote:
> I must admit that I'm not too happy with the result as well, so I dropped
> the octal/hexadecimal parsing.

That seems perfectly reasonable to me; perfectly adequate to accept only one base. But now the documentation is back to its original state of silence on what base or how many bases might be allowed. Could it just say "or an unsigned decimal integer value"? Then no one will wonder.

> The "idem" is about setting the variable but not overwriting it if it
> already exists. The intention is that :random_seed is the random seed,
> unless the user sets it to something else, in which case it is the user's
> value. I've improved the variable description in the doc to point out that
> the value may be overwritten with -D.

Ok.

-Chap
> But now the documentation is back to its original state of silence on
> what base or how many bases might be allowed. Could it just say
> "or an unsigned decimal integer value"? Then no one will wonder.

Done in the attached. Thanks for the reviews.

-- 
Fabien.
Attachment
The following review has been posted through the commitfest application:

  make installcheck-world:  tested, failed
  Implements feature:       tested, failed
  Spec compliant:           not tested
  Documentation:            tested, failed

This is a simple patch which includes documentation, includes and passes tests, and, in my rookie opinion, is ready for committer.

The new status of this patch is: Ready for Committer
Patch doesn't apply cleanly anymore.

Fabien COELHO wrote:
>> But now the documentation is back to its original state of silence on
>> what base or how many bases might be allowed. Could it just say
>> "or an unsigned decimal integer value"? Then no one will wonder.
>
> Done in the attached. Thanks for the reviews.

-- 
Teodor Sigaev                      E-mail: teodor@sigaev.ru
                                   WWW: http://www.sigaev.ru/
> Patch doesn't apply cleanly anymore.

Indeed. Here is a rebase. All pgbench patches conflict on the test cases.

-- 
Fabien.
Attachment
>> Patch doesn't apply cleanly anymore.
>
> Indeed. Here is a rebase. All pgbench patches conflict on the test cases.

Patch v12, yet another rebase.

-- 
Fabien.
Attachment
Thank you, pushed.

Fabien COELHO wrote:
>>> Patch doesn't apply cleanly anymore.
>>
>> Indeed. Here is a rebase. All pgbench patches conflict on the test cases.
>
> Patch v12, yet another rebase.

-- 
Teodor Sigaev                      E-mail: teodor@sigaev.ru
                                   WWW: http://www.sigaev.ru/