Thread: [HACKERS] Re: [COMMITTERS] pgsql: Remove pgbench "progress" test pendingsolution of its timing is (fwd)

<Bounced from commits to have it on hackers>

Hello Tom,

> Well, I think it's mostly about valgrind making everything really slow. Since 
> we have seen some passes from skink recently, perhaps there was also a 
> component of more-load-on-the-machine-than-usual.  But in the end this is 
> just evidence for my point that regression tests have to be very much not 
> timing-sensitive.  We run them under all kinds of out-of-the-ordinary stress.

Attached is an attempt at improving the situation:

(1) there are added comments to explain the whys, which is to provide
coverage for pgbench time-related features... while still not being
time-sensitive, which is a challenge.

(2) the test now only expects "progress: \d" from the output, so it is enough 
that one progress is shown, whenever it is shown.

(3) if the test is detected to have gone AWOL, detailed log checks are
coldly skipped.

This would have passed on "skink" under the special conditions it encountered.

I cannot guaranty that it would pass under any circumstance, though.

If it still encounters a failure, ISTM that it should only be a missing 
"progress:" in the output, which has not been encountered so far.

If it occurs, a few options would remain, none of them very convincing:

  - give the test some more time, eg 3 seconds (hmmm... could still fail
    after any time...)

  - drop the "progress:" expectation (hmmm... but then it does not check
    anything).

  - make the "progress:" output check conditional to the running time
    (hmmm... it would require changing the command_checks_all function,
     and there is still a chance that the bench was stuck for 2 seconds
     then given time to realize that it has to stop right now...)

  - use an even simpler transaction, eg "SELECT;" which is less likely to
    get stuck (but still could get stuck...).

For the random-ness related test (--sample-rate), we could add a feature to 
pgbench which allows to control the random seed, so that the number of samples 
could be known in advance for testing purposes.

-- 
Fabien.
-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Attachment
On Sun, Sep 24, 2017 at 11:30 PM, Fabien COELHO <coelho@cri.ensmp.fr> wrote:
> Attached is an attempt at improving the situation:
>
> (1) there are added comments to explain the whys, which is to provide
> coverage for pgbench time-related features... while still not being
> time-sensitive, which is a challenge.
>
> (2) the test now only expects "progress: \d" from the output, so it is
> enough that one progress is shown, whenever it is shown.
>
> (3) if the test is detected to have gone AWOL, detailed log checks are
> coldly skipped.
>
> This would have passed on "skink" under the special conditions it
> encountered.
>
> I cannot guaranty that it would pass under any circumstance, though.
>
> If it still encounters a failure, ISTM that it should only be a missing
> "progress:" in the output, which has not been encountered so far.
>
> If it occurs, a few options would remain, none of them very convincing:
>
>  - give the test some more time, eg 3 seconds (hmmm... could still fail
>    after any time...)
>
>  - drop the "progress:" expectation (hmmm... but then it does not check
>    anything).
>
>  - make the "progress:" output check conditional to the running time
>    (hmmm... it would require changing the command_checks_all function,
>     and there is still a chance that the bench was stuck for 2 seconds
>     then given time to realize that it has to stop right now...)
>
>  - use an even simpler transaction, eg "SELECT;" which is less likely to
>    get stuck (but still could get stuck...).
>
> For the random-ness related test (--sample-rate), we could add a feature to
> pgbench which allows to control the random seed, so that the number of
> samples could be known in advance for testing purposes.

This didn't get any reviews, so bumped to CF 2018-01.
-- 
Michael


Fabien COELHO <coelho@cri.ensmp.fr> writes:
> Attached is an attempt at improving the situation:

I took a quick look at this and can't really convince myself that it
improves our situation.  The fact that it prints nothing if the runtime
is outside of a tightly constrained range seems likely to hide the sort
of bug you might hope such a test would detect.  A simple example is
that, if there's an off-by-one bug causing the test to run for 3 seconds
not 2, this test script won't complain about it.  Worse, if pgbench dumps
core instantly on startup when given -p, this test script won't complain
about it.  And the test is of no value at all on very slow buildfarm
critters such as valgrind or clobber-cache-always animals.

> (2) the test now only expects "progress: \d" from the output, so it is enough
> that one progress is shown, whenever it is shown.

Hm.  Could we get somewhere by making the test look for that, and
adjusting the loop logic inside pgbench so that (maybe only with the
tested switch values) it's guaranteed to print at least one progress
output regardless of timing, because it won't check for exit until after
it's printed a log message?

> For the random-ness related test (--sample-rate), we could add a feature to
> pgbench which allows to control the random seed, so that the number of samples
> could be known in advance for testing purposes.

Don't see how locking down the seed gets us anywhere.  The behavior of
random() can't be assumed identical across different machines, even
with the same seed.

            regards, tom lane


Hello Tom,

Thanks for having a look at this attempt.

>> Attached is an attempt at improving the situation:
>
> I took a quick look at this and can't really convince myself that it
> improves our situation.  The fact that it prints nothing if the runtime
> is outside of a tightly constrained range seems likely to hide the sort
> of bug you might hope such a test would detect.  A simple example is
> that, if there's an off-by-one bug causing the test to run for 3 seconds
> not 2, this test script won't complain about it.  Worse, if pgbench dumps
> core instantly on startup when given -p, this test script won't complain
> about it.  And the test is of no value at all on very slow buildfarm
> critters such as valgrind or clobber-cache-always animals.

Hmmm.

Note that an instant core does not fall under "too slow" and is caught 
anyway because pgbench return status is checked, and I expect it would not 
be zero when core dumped.

Alas, testing time related features with zero assumption about time is 
quite difficult:-)

The compromise I'm proposing is to skip time-related stuff if too slow. 
The value I see is that it provides coverage for all time-related 
features. I can also add a check that the run is *at least* 2 seconds when 
two seconds are asked for, because otherwise something was certainly 
amiss.

However it cannot do the impossible, i.e. check that something happens 
every second if we cannot assume that some cycles are spared for the 
process within a second. There is no miracle to expect.

>> (2) the test now only expects "progress: \d" from the output, so it is enough
>> that one progress is shown, whenever it is shown.
>
> Hm.  Could we get somewhere by making the test look for that, and
> adjusting the loop logic inside pgbench so that (maybe only with the
> tested switch values) it's guaranteed to print at least one progress
> output regardless of timing, because it won't check for exit until after
> it's printed a log message?

I'll look into ensuring structuraly that at least one progress is shown.

>> For the random-ness related test (--sample-rate), we could add a feature to
>> pgbench which allows to control the random seed, so that the number of samples
>> could be known in advance for testing purposes.
>
> Don't see how locking down the seed gets us anywhere.  The behavior of
> random() can't be assumed identical across different machines, even
> with the same seed.

Indeed, I surmised in between that expecting some portability from 
random() is naïve. Controlling the seed could still be useful for testing 
though, so I have submitted a patch for that.

-- 
Fabien.
Hello Tom,

From my previous answer, to motivate these tests:

> The compromise I'm proposing is to skip time-related stuff if too slow. The 
> value I see is that it provides coverage for all time-related features. I can 
> also add a check that the run is *at least* 2 seconds when two seconds are 
> asked for, because otherwise something was certainly amiss.

Also,

>> Hm.  Could we get somewhere by making the test look for that, and
>> adjusting the loop logic inside pgbench so that (maybe only with the
>> tested switch values) it's guaranteed to print at least one progress
>> output regardless of timing, because it won't check for exit until after
>> it's printed a log message?
>
> I'll look into ensuring structuraly that at least one progress is shown.

How pgbenchs prints a progress if none were printed, or if the last 
progress was over 0.5 seconds ago, so as to have kind of a catchup in the 
end. The progress report generation is moved into a separate function, 
which is an improvement of its own for the readability of threadRun.

Also, I have added a slight behavioral change when under tap testing 
(through an environment variable) to avoid the end of processing shortcut 
when there is nothing to do. This ensures that the -T 2 tap test runs for 
at least 2 seconds, whatever. If the host is overload it might be more, 
but it cannot be less unless something was wrong.

All that is not perfect, but ISTM that having some minimal coverage of 
time-related features is worth the compromise.

-- 
Fabien.
Attachment
Minor rebase after vpath fix (e94f2bc809a0c684185666f19d81f6496e732a3a).

-- 
Fabien.
Attachment
On 18/01/18 12:26, Fabien COELHO wrote:
>>> Hm.  Could we get somewhere by making the test look for that, and
>>> adjusting the loop logic inside pgbench so that (maybe only with the
>>> tested switch values) it's guaranteed to print at least one progress
>>> output regardless of timing, because it won't check for exit until after
>>> it's printed a log message?
>>
>> I'll look into ensuring structuraly that at least one progress is shown.
> 
> How pgbenchs prints a progress if none were printed, or if the last
> progress was over 0.5 seconds ago, so as to have kind of a catchup in the
> end.

I don't understand the 0.5 second rule. For the tests, we only need to 
ensure that at least one progress report is printed, right?

Looking at the code as it exists, I think it already works like that, 
although it's by accident. Not sure though, and if we're going to rely 
on that, it makes sense to make it more explicit.

> The progress report generation is moved into a separate function,
> which is an improvement of its own for the readability of threadRun.

Agreed on that.

> Also, I have added a slight behavioral change when under tap testing
> (through an environment variable) to avoid the end of processing shortcut
> when there is nothing to do. This ensures that the -T 2 tap test runs for
> at least 2 seconds, whatever. If the host is overload it might be more,
> but it cannot be less unless something was wrong.

If you want to write a test that checks that a two-second test takes at 
least two seconds, can't you just not use throttling in that test?

- Heikki


Hello Heikki,

Thanks for having a look at this small patch which aim at improving 
pgbench coverage.

>> How pgbenchs prints a progress if none were printed, or if the last
>> progress was over 0.5 seconds ago, so as to have kind of a catchup in the
>> end.
>
> I don't understand the 0.5 second rule. For the tests, we only need to ensure 
> that at least one progress report is printed, right?

I'm not so sure;-) I do not want to trust the threadRun loop in case of 
heavy load or whatever issue a run may encounter, so this approach ensures 
that structurally there is always one output even of the whole loop went 
very wrong.

It also adds a small feature which is that there is always a final 
progress when the run is completed, which can be useful when computing 
progress statistics, otherwise some transactions could not be reported in 
any progress at all.

> Looking at the code as it exists, I think it already works like that, 
> although it's by accident. Not sure though, and if we're going to rely on 
> that, it makes sense to make it more explicit.

Yep.

Also, by default, when running under throttling for 2 seconds at 20 tps, 
there is a slight chance that no transactions are scheduled and that 
pgbench returns immediately, hence the added variable to ensure that it 
lasts something anyway, and that some minimal reporting is always done 
whatever.

>> when there is nothing to do. This ensures that the -T 2 tap test runs for
>> at least 2 seconds, whatever. If the host is overload it might be more,
>> but it cannot be less unless something was wrong.
>
> If you want to write a test that checks that a two-second test takes at least 
> two seconds, can't you just not use throttling in that test?

Indeed… but then throttling would not be tested:-) The point of the test 
is to exercise all time-related options, including throttling with a 
reasonable small value.

Attached is a rebase.

-- 
Fabien.
Attachment
On 12/07/18 19:00, Fabien COELHO wrote:
>>> How pgbenchs prints a progress if none were printed, or if the last
>>> progress was over 0.5 seconds ago, so as to have kind of a catchup in the
>>> end.
>>
>> I don't understand the 0.5 second rule. For the tests, we only need to ensure
>> that at least one progress report is printed, right?
> 
> I'm not so sure;-) I do not want to trust the threadRun loop in case of
> heavy load or whatever issue a run may encounter, so this approach ensures
> that structurally there is always one output even of the whole loop went
> very wrong.

I still don't understand.

For the testing, we just need to make sure that at least one progress 
report is always printed, if -P is used. Right? So where does the 0.5 
second rule come in? Can't we just do "if (no progress reports were 
printed) { print progress report; }" at the end?

> It also adds a small feature which is that there is always a final
> progress when the run is completed, which can be useful when computing
> progress statistics, otherwise some transactions could not be reported in
> any progress at all.

Any transactions in the last 0.5 seconds might still not get reported in 
any progress reports.

>>> when there is nothing to do. This ensures that the -T 2 tap test runs for
>>> at least 2 seconds, whatever. If the host is overload it might be more,
>>> but it cannot be less unless something was wrong.
>>
>> If you want to write a test that checks that a two-second test takes at least
>> two seconds, can't you just not use throttling in that test?
> 
> Indeed… but then throttling would not be tested:-) The point of the test
> is to exercise all time-related options, including throttling with a
> reasonable small value.

Ok. I don't think that's really worthwhile. If we add some code that 
only runs in testing, then we're not really testing the real thing. I 
wouldn't trust the test to tell much. Let's just leave out that magic 
environment variable thing, and try to get the rest of the patch finished.

- Heikki




>>> I don't understand the 0.5 second rule. For the tests, we only need to 
>>> ensure that at least one progress report is printed, right?
>>
>> [...]
>
> I still don't understand.

Let's look at the code:

    if (progress && thread->tid == 0)
    {
       ...
       if (last_report == thread_start || now - last_report >= 500000)
         doProgressReport(thread, &last, &last_report, now, thread_start);

> For the testing, we just need to make sure that at least one progress report 
> is always printed, if -P is used. Right?

Yep. That is the first condition above the last_report is set to 
thread_start meaning that there has been no report.

> So where does the 0.5 second rule come in? Can't we just do "if (no 
> progress reports were printed) { print progress report; }" at the end?

The second 0.5s condition is to print a closing report if some time 
significant time passed since the last one, but we do not want to print
a report at the same second.

   pgbench -T 5 -P 2

Would then print report at 2, 4 and 5. 0.5 ensures that we are not going 
to do "2 4.0[00] 4.0[01]" on -t whatever -P 2, which would look silly.

If you do not like it then the second condition can be removed, fine with 
me.

>> It also adds a small feature which is that there is always a final
>> progress when the run is completed, which can be useful when computing
>> progress statistics, otherwise some transactions could not be reported in
>> any progress at all.
>
> Any transactions in the last 0.5 seconds might still not get reported in any 
> progress reports.

Yep. I wanted a reasonable threshold, given that both -T and -P are in 
seconds anyway, so it probably could only happen with -t.

>> Indeed… but then throttling would not be tested:-) The point of the test
>> is to exercise all time-related options, including throttling with a
>> reasonable small value.
>
> Ok. I don't think that's really worthwhile. If we add some code that only 
> runs in testing, then we're not really testing the real thing. I wouldn't 
> trust the test to tell much. Let's just leave out that magic environment 
> variable thing, and try to get the rest of the patch finished.

If you remove the environment, then some checks need to be removed, 
because the 2 second run may be randomly shorten when there is nothing to 
do. If not, the test will fail underterminiscally, which is not 
acceptable. Hence the hack. I agree that it is not beautiful.

The more reasonable alternative could be to always last 2 seconds under 
-T 2, even if the execution can be shorten because there is nothing to do 
at all, i.e. remove the environment-based condition but keep the sleep.

-- 
Fabien.
>>> Indeed… but then throttling would not be tested:-) The point of the test
>>> is to exercise all time-related options, including throttling with a
>>> reasonable small value.
>> 
>> Ok. I don't think that's really worthwhile. If we add some code that only 
>> runs in testing, then we're not really testing the real thing. I wouldn't 
>> trust the test to tell much. Let's just leave out that magic environment 
>> variable thing, and try to get the rest of the patch finished.
>
> If you remove the environment, then some checks need to be removed, because 
> the 2 second run may be randomly shorten when there is nothing to do. If not, 
> the test will fail underterminiscally, which is not acceptable. Hence the 
> hack. I agree that it is not beautiful.
>
> The more reasonable alternative could be to always last 2 seconds under -T 2, 
> even if the execution can be shorten because there is nothing to do at all, 
> i.e. remove the environment-based condition but keep the sleep.

Yet another option would be to replace the env variable by an option, eg
"--strict-time", that would be used probaly only by the TAP test, but 
would be an available feature.

-- 
Fabien.
On 12/07/18 21:27, Fabien COELHO wrote:
>> For the testing, we just need to make sure that at least one progress report
>> is always printed, if -P is used. Right?
> 
> Yep. That is the first condition above the last_report is set to
> thread_start meaning that there has been no report.
> 
>> So where does the 0.5 second rule come in? Can't we just do "if (no
>> progress reports were printed) { print progress report; }" at the end?
> 
> The second 0.5s condition is to print a closing report if some time
> significant time passed since the last one, but we do not want to print
> a report at the same second.
> 
>     pgbench -T 5 -P 2
> 
> Would then print report at 2, 4 and 5. 0.5 ensures that we are not going
> to do "2 4.0[00] 4.0[01]" on -t whatever -P 2, which would look silly.
> 
> If you do not like it then the second condition can be removed, fine with
> me.

As the code stands, you would get reports at "2 4.0[00]", right? Let's 
keep it that way. I think the only change we need to make in the logic 
is to check at the end, if *any* progress reports at all have been 
printed, and print one if not. And do that only when the -P option is 
smaller than the -T option, I suppose.

>>> It also adds a small feature which is that there is always a final
>>> progress when the run is completed, which can be useful when computing
>>> progress statistics, otherwise some transactions could not be reported in
>>> any progress at all.
>>
>> Any transactions in the last 0.5 seconds might still not get reported in any
>> progress reports.
> 
> Yep. I wanted a reasonable threshold, given that both -T and -P are in
> seconds anyway, so it probably could only happen with -t.

Oh. I'm a bit surprised we don't support decimals, i.e. -P 0.5. 
Actually, it seems to be acceptd, but it's truncated down to the nearest 
integer. That's not very nice :-(. But it's a separate issue.

>>> Indeed… but then throttling would not be tested:-) The point of the test
>>> is to exercise all time-related options, including throttling with a
>>> reasonable small value.
>>
>> Ok. I don't think that's really worthwhile. If we add some code that only
>> runs in testing, then we're not really testing the real thing. I wouldn't
>> trust the test to tell much. Let's just leave out that magic environment
>> variable thing, and try to get the rest of the patch finished.
> 
> If you remove the environment, then some checks need to be removed,
> because the 2 second run may be randomly shorten when there is nothing to
> do. If not, the test will fail underterminiscally, which is not
> acceptable. Hence the hack. I agree that it is not beautiful.
> 
> The more reasonable alternative could be to always last 2 seconds under
> -T 2, even if the execution can be shorten because there is nothing to do
> at all, i.e. remove the environment-based condition but keep the sleep.

That sounds reasonable. It's a bit silly to wait when there's nothing to 
do, but it's also weird if the test exits before the specified time is 
up. Seems less surprising to always sleep.

- Heikki


Hello Heikki,

> [...] Let's keep it that way. I think the only change we need to make in 
> the logic is to check at the end, if *any* progress reports at all have 
> been printed, and print one if not.

Ok, this simplifies the condition.

> And do that only when the -P option is smaller than the -T option, I 
> suppose.

Yep, why not.

> Oh. I'm a bit surprised we don't support decimals, i.e. -P 0.5. Actually, it 
> seems to be acceptd, but it's truncated down to the nearest integer.

Indeed, "atoi" does not detect errors, and it is true of the many uses in 
pgbench: clients, threads, scale, duration, fillfactor...

> That's not very nice :-(. But it's a separate issue.

Yep. For human consumption, seconds seem okay.

>> The more reasonable alternative could be to always last 2 seconds under
>> -T 2, even if the execution can be shorten because there is nothing to do
>> at all, i.e. remove the environment-based condition but keep the sleep.
>
> That sounds reasonable. It's a bit silly to wait when there's nothing to do, 
> but it's also weird if the test exits before the specified time is up. Seems 
> less surprising to always sleep.

I did that in the attached version: no more environment variable hack, and 
no execution shortcut even if there is nothing to do.

I also had to reproduce the progress logic to keep on printing report of 
(no) progress in this tailing phase.

I'm not very happy because it is a change of behavior. I suggest that I 
could add a special "--strict-time-compliance" option to do this only when 
required... and it would only be required by tap tests.

-- 
Fabien.
Attachment
On 18/07/18 01:43, Fabien COELHO wrote:
>>> The more reasonable alternative could be to always last 2 seconds under
>>> -T 2, even if the execution can be shorten because there is nothing to do
>>> at all, i.e. remove the environment-based condition but keep the sleep.
>>
>> That sounds reasonable. It's a bit silly to wait when there's nothing to do,
>> but it's also weird if the test exits before the specified time is up. Seems
>> less surprising to always sleep.
> 
> I did that in the attached version: no more environment variable hack, and
> no execution shortcut even if there is nothing to do.
> 
> I also had to reproduce the progress logic to keep on printing report of
> (no) progress in this tailing phase.

On second thoughts, there's one problem with this approach of always 
waiting until -T is up. What if all the threads died because of errors? 
For example:

pgbench postgres -T 10 -P 1 -c 2
starting vacuum...end.
client 0 aborted in command 5 (SQL) of script 0; ERROR:  relation 
"pgbench_accounts" does not exist
LINE 1: UPDATE pgbench_accounts SET abalance = abalance + -4963 WHER...
                ^

client 1 aborted in command 5 (SQL) of script 0; ERROR:  relation 
"pgbench_accounts" does not exist
LINE 1: UPDATE pgbench_accounts SET abalance = abalance + -2985 WHER...
                ^

progress: 1.0 s, 0.0 tps, lat 0.000 ms stddev 0.000
progress: 2.0 s, 0.0 tps, lat 0.000 ms stddev 0.000
progress: 3.0 s, 0.0 tps, lat 0.000 ms stddev 0.000
progress: 4.0 s, 0.0 tps, lat 0.000 ms stddev 0.000
progress: 5.0 s, 0.0 tps, lat 0.000 ms stddev 0.000
progress: 6.0 s, 0.0 tps, lat 0.000 ms stddev 0.000
progress: 7.0 s, 0.0 tps, lat 0.000 ms stddev 0.000
progress: 8.0 s, 0.0 tps, lat 0.000 ms stddev 0.000
progress: 9.0 s, 0.0 tps, lat 0.000 ms stddev 0.000
progress: 10.0 s, 0.0 tps, lat 0.000 ms stddev 0.000
transaction type: <builtin: TPC-B (sort of)>
scaling factor: 1
query mode: simple
number of clients: 2
number of threads: 1
duration: 10 s
number of transactions actually processed: 0


I don't think you want to wait in that situation. I think we should wait 
at the end only if there some threads still alive, with nothing to do 
only because of --rate.

- Heikki


Hello Heikki,

>> I did that in the attached version: no more environment variable hack, and
>> no execution shortcut even if there is nothing to do.
>> 
>> I also had to reproduce the progress logic to keep on printing report of
>> (no) progress in this tailing phase.
>
> On second thoughts, there's one problem with this approach of always waiting 
> until -T is up. What if all the threads died because of errors? For example:

Good corner-case catch! This behavior is indeed silly.

> I don't think you want to wait in that situation. I think we should wait at 
> the end only if there some threads still alive, with nothing to do only 
> because of --rate.

Yep. The attached version does only the tailing stuff under -R and not all 
threads were stopped on errors, with comments to tell about the why.

I'm still wondering about a specific option to explicitely require this 
behavioral change.

-- 
Fabien.
Attachment
On 18/07/18 16:01, Fabien COELHO wrote:
>> I don't think you want to wait in that situation. I think we should wait at
>> the end only if there some threads still alive, with nothing to do only
>> because of --rate.
> 
> Yep. The attached version does only the tailing stuff under -R and not all
> threads were stopped on errors, with comments to tell about the why.

Hmm. How about we just remove this special case from doCustom():

>            case CSTATE_START_THROTTLE:
 > ...
> 
>                 /*
>                  * stop client if next transaction is beyond pgbench end of
>                  * execution
>                  */
>                 if (duration > 0 && st->txn_scheduled > end_time)
>                 {
>                     st->state = CSTATE_FINISHED;
>                     break;
>                 }

That way, we let the client go into CSTATE_THROTTLE state, even though 
we know that the timer will run out before we reach txn_scheduled. Then 
it will work the way we want, right? One small difference is that then 
the clients will keep the connections open longer, until the timer 
expires, but I think that's reasonable. Less surprising than the current 
behavior, even.

- Heikki


Hello Heikki,

>> Yep. The attached version does only the tailing stuff under -R and not all
>> threads were stopped on errors, with comments to tell about the why.
>
> Hmm. How about we just remove this special case from doCustom():
>
>>  case CSTATE_START_THROTTLE:
>>    // ...
>>    if (duration > 0 && st->txn_scheduled > end_time)
>>    {
>>     st->state = CSTATE_FINISHED;
>>     break;
>>    }
>
> That way, we let the client go into CSTATE_THROTTLE state, even though we 
> know that the timer will run out before we reach txn_scheduled. Then it will 
> work the way we want, right? One small difference is that then the clients 
> will keep the connections open longer, until the timer expires, but I think 
> that's reasonable. Less surprising than the current behavior, even.

Hmmm... in this instance, and if this test is removed, ISTM that it can 
start the transaction, re-establishing a connection under --connect, and 
the transaction will run to its end even if it is beyond the expected end 
of run. So removing this test does not seem desirable.

-- 
Fabien.


On 18/07/18 22:56, Fabien COELHO wrote:
> 
> Hello Heikki,
> 
>>> Yep. The attached version does only the tailing stuff under -R and not all
>>> threads were stopped on errors, with comments to tell about the why.
>>
>> Hmm. How about we just remove this special case from doCustom():
>>
>>>   case CSTATE_START_THROTTLE:
>>>     // ...
>>>     if (duration > 0 && st->txn_scheduled > end_time)
>>>     {
>>>     st->state = CSTATE_FINISHED;
>>>     break;
>>>     }
>>
>> That way, we let the client go into CSTATE_THROTTLE state, even though we
>> know that the timer will run out before we reach txn_scheduled. Then it will
>> work the way we want, right? One small difference is that then the clients
>> will keep the connections open longer, until the timer expires, but I think
>> that's reasonable. Less surprising than the current behavior, even.
> 
> Hmmm... in this instance, and if this test is removed, ISTM that it can
> start the transaction, re-establishing a connection under --connect, and
> the transaction will run to its end even if it is beyond the expected end
> of run. So removing this test does not seem desirable.

Can you elaborate? I don't think that's how it works. In threadRun(), we 
have this:

>         for (i = 0; i < nstate; i++)
>         {
>             CState       *st = &state[i];
> 
>             if (st->state == CSTATE_THROTTLE && timer_exceeded)
>             {
>                 /* interrupt client that has not started a transaction */
>                 st->state = CSTATE_FINISHED;
>                 finishCon(st);
>                 remains--;
>             }
>             else if (st->state == CSTATE_SLEEP || st->state == CSTATE_THROTTLE)
 >            ...

As soon as the -T timer is exceeded, the above code will close all 
connections that are in CSTATE_THROTTLE state.

- Heikki


>>> Hmm. How about we just remove this special case from doCustom():
>>>
>>>>   case CSTATE_START_THROTTLE:
>>>>     // ...
>>>>     if (duration > 0 && st->txn_scheduled > end_time)
>>>>     {
>>>>     st->state = CSTATE_FINISHED;
>>>>     break;
>>>>     }
>>> 
>>> That way, we let the client go into CSTATE_THROTTLE state, even though 
>>> we know that the timer will run out before we reach txn_scheduled. 
>>> Then it will work the way we want, right? One small difference is that 
>>> then the clients will keep the connections open longer, until the 
>>> timer expires, but I think that's reasonable. Less surprising than the 
>>> current behavior, even.
>> 
>> Hmmm... in this instance, and if this test is removed, ISTM that it can
>> start the transaction, re-establishing a connection under --connect, and
>> the transaction will run to its end even if it is beyond the expected end
>> of run. So removing this test does not seem desirable.
>
> Can you elaborate? I don't think that's how it works. In threadRun(), we have 
> this:

The state path I want to avoid is, without getting out of doCustom, is:

   CHOOSE_SCRIPT:
     -> START_THROTTLE (i.e. under -R)
   START_THROTTLE:
     update txn_schedule, which happen to be after end_time,
     i.e. the transaction is scheduled after the expected end of the run.
     but if the condition is removed, then proceed directly to
     -> THROTTLE
   THROTTLE:
     if now has passed txn_schedule (we are late), no return, proceed
     directly to -> START_TX
   START_TX:
     -> START_COMMAND
   START_COMMAND: executed... & "return" on SQL, but it is too late
     for stopping the tx now, it has started.

Maybe I missed something, but it looks to me that we can get in the 
situation where a transaction scheduled beyond the end of run is started 
anyway if it happens that it was a little late.

> [... threadRun ...]
> As soon as the -T timer is exceeded, the above code will close all 
> connections that are in CSTATE_THROTTLE state.

So threadRun() would not have the opportunity to stop the scheduled 
transaction, even if beyond the end of run, because it would not have got 
out of doCustom, in the case I outlined above.

-- 
Fabien.


On 18/07/18 23:29, Fabien COELHO wrote:
>>>> Hmm. How about we just remove this special case from doCustom():
>>>>
>>>>>    case CSTATE_START_THROTTLE:
>>>>>      // ...
>>>>>      if (duration > 0 && st->txn_scheduled > end_time)
>>>>>      {
>>>>>     st->state = CSTATE_FINISHED;
>>>>>     break;
>>>>>      }
>>>>
>>>> That way, we let the client go into CSTATE_THROTTLE state, even though
>>>> we know that the timer will run out before we reach txn_scheduled.
>>>> Then it will work the way we want, right? One small difference is that
>>>> then the clients will keep the connections open longer, until the
>>>> timer expires, but I think that's reasonable. Less surprising than the
>>>> current behavior, even.
>>>
>>> Hmmm... in this instance, and if this test is removed, ISTM that it can
>>> start the transaction, re-establishing a connection under --connect, and
>>> the transaction will run to its end even if it is beyond the expected end
>>> of run. So removing this test does not seem desirable.
>>
>> Can you elaborate? I don't think that's how it works. In threadRun(), we have
>> this:
> 
> The state path I want to avoid is, without getting out of doCustom, is:
> 
>     CHOOSE_SCRIPT:
>       -> START_THROTTLE (i.e. under -R)
>     START_THROTTLE:
>       update txn_schedule, which happen to be after end_time,
>       i.e. the transaction is scheduled after the expected end of the run.
>       but if the condition is removed, then proceed directly to
>       -> THROTTLE
>     THROTTLE:
>       if now has passed txn_schedule (we are late), no return, proceed
>       directly to -> START_TX
>     START_TX:
>       -> START_COMMAND
>     START_COMMAND: executed... & "return" on SQL, but it is too late
>       for stopping the tx now, it has started.
> 
> Maybe I missed something, but it looks to me that we can get in the
> situation where a transaction scheduled beyond the end of run is started
> anyway if it happens that it was a little late.
 >
>> [... threadRun ...]
>> As soon as the -T timer is exceeded, the above code will close all
>> connections that are in CSTATE_THROTTLE state.
> 
> So threadRun() would not have the opportunity to stop the scheduled
> transaction, even if beyond the end of run, because it would not have got
> out of doCustom, in the case I outlined above.

I see. Instead of moving to FINISHED state, then, we could stay in 
THROTTLE state, and 'return' out of doCustom(), to give the code in 
threadRun() a chance to detect that the timer is up. Something like the 
attached. (I moved the check after the check for latency_limit, because 
that code updates txn_scheduled. Seems more like a more correct place, 
although that's as a separate issue.)

- Heikki

Attachment
Hello Heikki,

>> [...]
>> So threadRun() would not have the opportunity to stop the scheduled
>> transaction, even if beyond the end of run, because it would not have got
>> out of doCustom, in the case I outlined above.
>
> I see. Instead of moving to FINISHED state, then, we could stay in THROTTLE 
> state, and 'return' out of doCustom(), to give the code in threadRun() a 
> chance to detect that the timer is up. Something like the attached. (I moved 
> the check after the check for latency_limit, because that code updates 
> txn_scheduled. Seems more like a more correct place, although that's as a 
> separate issue.)

Although I think it would works, I do not find it better than the previous 
situation: Before the change throttling simply jumps to finished if time 
is up, while with that option the jump cannot be seen from within doCustom 
and relies on threadRun to do so, which is somehow much harder to see from 
the code because things happen in two functions.

I would rather move state changes from threadRun to doCustom only, so that 
they are all in one place where it is easier to check and understand.

As a PoC, see attached which does that and also records all stats instead 
of trying to be clever, and tries to homogeneise comments somehow. As I 
find it strange that a script can be interrupted in sleep and after a 
shell command, but not in other states, rather documents that once it 
started it will run to its end, and only cut it out before or after, but 
not within. Also, there are no state changes outside doCustom, and 
threadRun only looks at the states for decisions.

-- 
Fabien.
Attachment
> On Thu, Jul 19, 2018 at 2:24 AM Fabien COELHO <coelho@cri.ensmp.fr> wrote:
>
>
> Hello Heikki,
>
> >> [...]
> >> So threadRun() would not have the opportunity to stop the scheduled
> >> transaction, even if beyond the end of run, because it would not have got
> >> out of doCustom, in the case I outlined above.
> >
> > I see. Instead of moving to FINISHED state, then, we could stay in THROTTLE
> > state, and 'return' out of doCustom(), to give the code in threadRun() a
> > chance to detect that the timer is up. Something like the attached. (I moved
> > the check after the check for latency_limit, because that code updates
> > txn_scheduled. Seems more like a more correct place, although that's as a
> > separate issue.)
>
> Although I think it would works, I do not find it better than the previous
> situation: Before the change throttling simply jumps to finished if time
> is up, while with that option the jump cannot be seen from within doCustom
> and relies on threadRun to do so, which is somehow much harder to see from
> the code because things happen in two functions.
>
> I would rather move state changes from threadRun to doCustom only, so that
> they are all in one place where it is easier to check and understand.
>
> As a PoC, see attached which does that and also records all stats instead
> of trying to be clever, and tries to homogeneise comments somehow. As I
> find it strange that a script can be interrupted in sleep and after a
> shell command, but not in other states, rather documents that once it
> started it will run to its end, and only cut it out before or after, but
> not within. Also, there are no state changes outside doCustom, and
> threadRun only looks at the states for decisions.

Unfortunately, there was no activity over the last few commitfests and the
proposed patch pgbench-tap-progress-6 can't be applied anymore without
conflicts. Fabien, what are your plans about it, could you please post a
rebased version?


> Unfortunately, there was no activity over the last few commitfests and the
> proposed patch pgbench-tap-progress-6 can't be applied anymore without
> conflicts. Fabien, what are your plans about it, could you please post a
> rebased version?

Here it is.

-- 
Fabien.
Attachment
On Sat, Nov 24, 2018 at 05:58:38PM +0100, Fabien COELHO wrote:
>> Unfortunately, there was no activity over the last few commitfests and the
>> proposed patch pgbench-tap-progress-6 can't be applied anymore without
>> conflicts. Fabien, what are your plans about it, could you please post a
>> rebased version?
>
> Here it is.

I have moved this patch to CF 2019-03, waiting for review.
--
Michael

Attachment
Fabien COELHO <coelho@cri.ensmp.fr> writes:
>> Unfortunately, there was no activity over the last few commitfests and the
>> proposed patch pgbench-tap-progress-6 can't be applied anymore without
>> conflicts. Fabien, what are your plans about it, could you please post a
>> rebased version?

> Here it is.

I'm confused about the intended scope of this patch.  The thread title
refers only to adding a regression test, but the actual patch includes
nontrivial C-code changes, and a skim of the recent discussion suggests
that there are some bug fixes involved.  Please clarify.

As I think I made clear already, I am not in favor of adding more
timing-sensitive regression tests here.  I do not think there is
value commensurate with the risk of intermittent test failures.
However, if we're fixing bugs or poor behavior, that's certainly
worth doing.

            regards, tom lane


Hello Tom,

>>> Unfortunately, there was no activity over the last few commitfests and the
>>> proposed patch pgbench-tap-progress-6 can't be applied anymore without
>>> conflicts. Fabien, what are your plans about it, could you please post a
>>> rebased version?
>
>> Here it is.
>
> I'm confused about the intended scope of this patch.  The thread title
> refers only to adding a regression test, but the actual patch includes
> nontrivial C-code changes, and a skim of the recent discussion suggests
> that there are some bug fixes involved.  Please clarify.

The C changes ensures a minimal deterministic behavior under time 
uncertainties in a test, so that there is always at least one output. 
Since a piece of code is needed twice, a function is created, which also 
improves readability.

> As I think I made clear already, I am not in favor of adding more
> timing-sensitive regression tests here.

Currently there are none in pgbench: they were removed because of very 
slow beasts and intermittent failures that you rightfully dislike.

> I do not think there is value commensurate with the risk of intermittent
> test failures.

The main point of these tests is to provide coverage, which is currently 
pretty abysmal all over postgres. The time aspect is *very* loose: if the 
time target is not met the test is ignored, so there should not be 
intermittent test failures, unless there is a bug, obviously.

> However, if we're fixing bugs or poor behavior, that's certainly worth 
> doing.

Hmmm. The underlying "bug", which can be categorized as a "feature", is 
that when the process cannot run in time then some output values may be 
skipped (eg if you run with a progress report every second for two seconds 
you may not have any progress report, and the process may run for more 
than two seconds because the process does not have enough cpu power to 
notice that it can stop). The code change does not fix these per se, but 
ensure that whatever there is one progress report, so that it can be 
tested in the TAP.

-- 
Fabien.


Attached is a rebase.

-- 
Fabien.
Attachment
On 10/03/2019 13:26, Fabien COELHO wrote:
> Attached is a rebase.

I pushed the refactoring now, to extract the progress reporting to a 
separate function. That seems like an improvement independently from the 
rest of the patch. I'll take another look at the rest, probably tomorrow.

- Heikki


> I pushed the refactoring now, to extract the progress reporting to a separate 
> function. That seems like an improvement independently from the rest of the 
> patch.

Sure.

> I'll take another look at the rest, probably tomorrow.

Ok.

Attached is the remainder of the patch rebased to current head.

-- 
Fabien.
Attachment
Hello Heikki,

Indeed, yet again, I forgot the attachement:-(

>> I stared at the new test case for a while, and I must say it looks very 
>> cryptic. It's not exactly this patch's fault - all the pgbench tests are 
>> cryptic -
>
> Perl is cryptic. Regexprs are cryptic.
>
>> but I think we need to do something about that before adding any more 
>> tests. I'm not sure what exactly, but I'd like them to be more like 
>> pg_regress tests, where you have an expected output and you compare it with 
>> the actual output. I realize that's not easy, because there are a lot of 
>> varying numbers in the output, but we've got to do something.
>> 
>> As a good first step, I wish the pgbench() function used named arguments. 
>> [...]
>> 
>> You would have something like this:
>> 
>> my $elapsed = pgbench(
>>  test_name => 'pgbench progress',
>>  opts => '-T 2 -P 1 -l --aggregate-interval=1'
>
> I do not like them much in perl because it changes the code significantly, 
> but why not. That would be another patch anyway.
>
> A lighter but efficient option would be to add a few comments on the larger 
> calls, see attached.

Please really find the attachement, and do not hesitate to share spare 
a few grey cells so that I will not forget about them in the futur:-)

-- 
Fabien.
Attachment
V11 is just a rebase after the reindentation patches.

> Indeed, yet again, I forgot the attachement:-(
>
>>> I stared at the new test case for a while, and I must say it looks very 
>>> cryptic. It's not exactly this patch's fault - all the pgbench tests are 
>>> cryptic -
>> 
>> Perl is cryptic. Regexprs are cryptic.
>> 
>>> but I think we need to do something about that before adding any more 
>>> tests. I'm not sure what exactly, but I'd like them to be more like 
>>> pg_regress tests, where you have an expected output and you compare it 
>>> with the actual output. I realize that's not easy, because there are a lot 
>>> of varying numbers in the output, but we've got to do something.
>>> 
>>> As a good first step, I wish the pgbench() function used named arguments. 
>>> [...]
>>> 
>>> You would have something like this:
>>> 
>>> my $elapsed = pgbench(
>>>  test_name => 'pgbench progress',
>>>  opts => '-T 2 -P 1 -l --aggregate-interval=1'
>> 
>> I do not like them much in perl because it changes the code significantly, 
>> but why not. That would be another patch anyway.
>> 
>> A lighter but efficient option would be to add a few comments on the larger 
>> calls, see attached.
>
> Please really find the attachement, and do not hesitate to share spare a few 
> grey cells so that I will not forget about them in the futur:-)
>
>

-- 
Fabien.
Attachment
Hi,

This patch is registered in 2020-01, but the last message in the thread
seems to be from 2019/05/23. The patch itself seems to be OK (it applies
fine etc.) What do we need to get it over the line, instead of just
moving it to the next one CF over and over?


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 



Hi Fabien,

On 1/4/20 5:00 PM, Tomas Vondra wrote:
> 
> This patch is registered in 2020-01, but the last message in the thread
> seems to be from 2019/05/23. The patch itself seems to be OK (it applies
> fine etc.) What do we need to get it over the line, instead of just
> moving it to the next one CF over and over?

It does not look like the remainder of this patch is going to be 
committed and I don't think it makes sense to keep moving the patch 
indefinitely.

Unless something changes by the end of this CF I'll mark it Returned 
With Feedback.

Regards,
-- 
-David
david@pgmasters.net



>> This patch is registered in 2020-01, but the last message in the thread
>> seems to be from 2019/05/23. The patch itself seems to be OK (it applies
>> fine etc.) What do we need to get it over the line, instead of just
>> moving it to the next one CF over and over?
>
> It does not look like the remainder of this patch is going to be committed 
> and I don't think it makes sense to keep moving the patch indefinitely.
>
> Unless something changes by the end of this CF I'll mark it Returned With 
> Feedback.

I'd be rather unclear about what the actual feedback is, though. I'd 
interpret it as "pg does not care much about code coverage". Most clients 
are in the red on coverage.postgresql.org. I'd like pgbench at least to be 
in the green, but it does not look that it will ever be the case.

-- 
Fabien.p



Fabien COELHO <coelho@cri.ensmp.fr> writes:
>> It does not look like the remainder of this patch is going to be committed 
>> and I don't think it makes sense to keep moving the patch indefinitely.
>> Unless something changes by the end of this CF I'll mark it Returned With 
>> Feedback.

> I'd be rather unclear about what the actual feedback is, though. I'd 
> interpret it as "pg does not care much about code coverage". Most clients 
> are in the red on coverage.postgresql.org. I'd like pgbench at least to be 
> in the green, but it does not look that it will ever be the case.

The reason why the first iteration failed was that it was insufficiently
insensitive to timing.  So the problem for a replacement patch is
"how do you test fundamentally timing-sensitive behavior in a
timing-insensitive way?".  It's not really clear to me that that's
possible, so I don't have a lot of faith that this patch wouldn't fail
as well.

I'm also a bit disturbed that the patch seems to be changing pgbench's
behavior for no reason other than to try to make the test produce the
same results no matter what the actual machine performance is ... which
seems, at best, rather contrary to pgbench's real purpose.  So I wonder,
are those behavioral changes a net win from a user's standpoint?  If not
we're letting the tail wag the dog.  (If they are a win, they ought to
be submitted and defended independently, and maybe there ought to be
some documentation changes alongside.)

I'm not against having better test coverage numbers, but it's a means
to an end not an end in itself.  It has to be balanced against the
amount of effort to be put into testing (as opposed to actually
improving our software).

            regards, tom lane



Hello Tom,

Thanks for your feedback,

>> I'd be rather unclear about what the actual feedback is, though. I'd
>> interpret it as "pg does not care much about code coverage". Most clients
>> are in the red on coverage.postgresql.org. I'd like pgbench at least to be
>> in the green, but it does not look that it will ever be the case.

> The reason why the first iteration failed was that it was insufficiently
> insensitive to timing.

Yes.

> So the problem for a replacement patch is "how do you test fundamentally 
> timing-sensitive behavior in a timing-insensitive way?".

The answer is that it depends on the test objective, and there can be no 
miracle because the test environment is not controlled, in particular it
cannot expect the execution to be responsive (i.e. the host not to be 
overloaded for some reason).

As written in the added test comments, these tests mostly aim at coverage, 
so the time-related part checks are loose, although real.

> It's not really clear to me that that's possible, so I don't have a lot 
> of faith that this patch wouldn't fail as well.

AFAICR I think that it is pretty unlikely to fail.

Many other pg test can fail but mostly don't: some rely on random stuff, 
and you can get unlucky once in a while; Other events can occur such as 
file system full or whatever.

> I'm also a bit disturbed that the patch seems to be changing pgbench's
> behavior for no reason other than to try to make the test produce the
> same results no matter what the actual machine performance is ... which
> seems, at best, rather contrary to pgbench's real purpose.

No, not exactly.

The behavior change is to remove a corner-case optimization which creates 
an exception to -T/-P compliance: when under very low test rate (with -R) 
and -T and -P, pgbench shortens the run (i.e. end before the prescribed 
time) if there is nothing to do in the future, so that progress is not 
shown.

This optimization makes it impossible to test that pgbench complies to -T 
and -P, because it does not always complies. Using -R is useful to avoid 
too much assumption on the test host load.

> So I wonder, are those behavioral changes a net win from a user's 
> standpoint?

Would a user complain that they asked for -T 10 -P 1 but the test ran for 
10 seconds and the got 10 progress reports along the way?

The net win is that the feature they asked for has been tested a little 
before they actually ran it. It is small, but better than nothing.

> If not we're letting the tail wag the dog.  (If they are a win, they 
> ought to be submitted and defended independently, and maybe there ought 
> to be some documentation changes alongside.)

The shorten execution time is a non documented corner case that nobody 
really expects as a feature, IMHO. It is a side effect of the 
implementation. I do not think it is worth documenting.

> I'm not against having better test coverage numbers, but it's a means
> to an end not an end in itself.

Sure, numbers are not an end in themselves.

For me what is not tested should not be expected to work, so I like to 
have most/all lines run at least once by some tests, as a minimal 
insurance that it is not completely broken… which means at least green.

> It has to be balanced against the amount of effort to be put into 
> testing (as opposed to actually improving our software).

I'm all for balanced and meaningful testing, obviously.

However, the current balance results in the coverage status being abysmal. 
I do not think it is the right one.

-- 
Fabien.
On 3/28/20 5:27 AM, Fabien COELHO wrote:
> 
> Hello Tom,
> 
> Thanks for your feedback,
> 
>>> I'd be rather unclear about what the actual feedback is, though. I'd
>>> interpret it as "pg does not care much about code coverage". Most 
>>> clients
>>> are in the red on coverage.postgresql.org. I'd like pgbench at least 
>>> to be
>>> in the green, but it does not look that it will ever be the case.
> 
>> The reason why the first iteration failed was that it was insufficiently
>> insensitive to timing.

This patch has been marked Returned with Feedback.

If the TAP tests could be made to work without the special exceptions 
added to pgbench.c I think this patch would have a better chance.

Regards,
-- 
-David
david@pgmasters.net



> On 8 Apr 2020, at 14:58, David Steele <david@pgmasters.net> wrote:
>
> On 3/28/20 5:27 AM, Fabien COELHO wrote:
>> Hello Tom,
>> Thanks for your feedback,
>>>> I'd be rather unclear about what the actual feedback is, though. I'd
>>>> interpret it as "pg does not care much about code coverage". Most clients
>>>> are in the red on coverage.postgresql.org. I'd like pgbench at least to be
>>>> in the green, but it does not look that it will ever be the case.
>>> The reason why the first iteration failed was that it was insufficiently
>>> insensitive to timing.
>
> This patch has been marked Returned with Feedback.
>
> If the TAP tests could be made to work without the special exceptions added to pgbench.c I think this patch would
havea better chance. 

Turns out it wasn't, but I'm doing that now since this has been stalled since.

cheers ./daniel