Thread: [HACKERS] pgbench regression test failure

[HACKERS] pgbench regression test failure

From: Tom Lane

francolin just showed a non-reproducing failure in the new pgbench tests:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=francolin&dt=2017-09-12%2014%3A00%3A02

The relevant part of the log is

# Running: pgbench -T 2 -P 1 -l --log-prefix=001_pgbench_log_1 --aggregate-interval=1 -S -b se@2 --rate=20 --latency-limit=1000 -j 2 -c 3 -r
ok 198 - pgbench progress status (got 0 vs expected 0)
ok 199 - pgbench progress stdout /(?^:type: multiple)/
ok 200 - pgbench progress stdout /(?^:clients: 3)/
ok 201 - pgbench progress stdout /(?^:threads: 2)/
ok 202 - pgbench progress stdout /(?^:duration: 2 s)/
ok 203 - pgbench progress stdout /(?^:script 1: .* select only)/
ok 204 - pgbench progress stdout /(?^:script 2: .* select only)/
ok 205 - pgbench progress stdout /(?^:statement latencies in milliseconds)/
ok 206 - pgbench progress stdout /(?^:FROM pgbench_accounts)/
ok 207 - pgbench progress stderr /(?^:vacuum)/
ok 208 - pgbench progress stderr /(?^:progress: 1\b)/
ok 209 - number of log files
ok 210 - file name format
not ok 211 - transaction count for 001_pgbench_log_1.31583 (3)

#   Failed test 'transaction count for 001_pgbench_log_1.31583 (3)'
#   at t/001_pgbench_with_server.pl line 438.
ok 212 - transaction format for 001_pgbench_log_1
ok 213 - transaction count for 001_pgbench_log_1.31583.1 (2)
ok 214 - transaction format for 001_pgbench_log_1
ok 215 - remove log files

Apparently, one of the threads ran 3 transactions where the test script
expects it to run at most 2.  Is this a pgbench bug, or is the test
being overoptimistic about how exact the "-T 2" cutoff is?
        regards, tom lane



Re: [HACKERS] pgbench regression test failure

From: Fabien COELHO

> francolin just showed a non-reproducing failure in the new pgbench tests:
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=francolin&dt=2017-09-12%2014%3A00%3A02

> not ok 211 - transaction count for 001_pgbench_log_1.31583 (3)

> #   Failed test 'transaction count for 001_pgbench_log_1.31583 (3)'
> #   at t/001_pgbench_with_server.pl line 438.
>
> Apparently, one of the threads ran 3 transactions where the test script
> expects it to run at most 2.  Is this a pgbench bug, or is the test
> being overoptimistic about how exact the "-T 2" cutoff is?

Probably both? It seems that cutting off on time is not a precise science,
so I suggest accepting 1, 2 or 3 lines; see attached.

-- 
Fabien.

Attachment

Re: [HACKERS] pgbench regression test failure

From: Tom Lane

Fabien COELHO <coelho@cri.ensmp.fr> writes:
>> Apparently, one of the threads ran 3 transactions where the test script
>> expects it to run at most 2.  Is this a pgbench bug, or is the test
>> being overoptimistic about how exact the "-T 2" cutoff is?

> Probably both? It seems that cutting off on time is not a precise science,
> so I suggest accepting 1, 2 or 3 lines; see attached.

Before I'd deciphered the test output fully, I was actually guessing that
the problem was the opposite, namely too few lines.  Isn't it possible
that some thread is slow enough to start up that it doesn't get to run
any transactions?  IOW, do we need to allow 0 to 3 lines?
        regards, tom lane



Re: [HACKERS] pgbench regression test failure

From: Fabien COELHO

>>> Apparently, one of the threads ran 3 transactions where the test script
>>> expects it to run at most 2.  Is this a pgbench bug, or is the test
>>> being overoptimistic about how exact the "-T 2" cutoff is?
>
>> Probably both? It seems that cutting off on time is not a precise science,
>> so I suggest accepting 1, 2 or 3 lines; see attached.
>
> Before I'd deciphered the test output fully, I was actually guessing that
> the problem was the opposite, namely too few lines.

The test was waiting for between 1 and 2 lines, so I assumed that the 3
should be the number of lines found.

> Isn't it possible that some thread is slow enough to start up that it 
> doesn't get to run any transactions?  IOW, do we need to allow 0 to 3 
> lines?

By definition, parallelism induces non-determinism. When I put 2 seconds,
the intention was that I would get a non-empty trace with an "every second"
aggregation. I would rather have a longer test than allow an empty file:
the point is to check that something is generated, even though avoiding a
longer test is desirable. So I would suggest sticking to between 1 and 3,
and if it fails then maybe adding one second...

-- 
Fabien.



Re: [HACKERS] pgbench regression test failure

From: Tom Lane

Fabien COELHO <coelho@cri.ensmp.fr> writes:
> By definition, parallelism induces non-determinism. When I put 2 seconds,
> the intention was that I would get a non-empty trace with an "every second"
> aggregation. I would rather have a longer test than allow an empty file:
> the point is to check that something is generated, even though avoiding a
> longer test is desirable. So I would suggest sticking to between 1 and 3,
> and if it fails then maybe adding one second...

That's a losing game.  You can't ever guarantee that N seconds is
enough for slow, heavily loaded machines, and cranking up N just
penalizes developers who are testing under normal circumstances.

I have a serious, serious dislike for tests that seem to work until
they're run on a heavily loaded machine.  So unless there is some
reason why pgbench is *guaranteed* to run at least one transaction
per thread, I'd rather the test not assume that.

I would not necessarily object to doing something in the code that
would guarantee that, though.
        regards, tom lane



Re: [HACKERS] pgbench regression test failure

From: Fabien COELHO

> I have a serious, serious dislike for tests that seem to work until
> they're run on a heavily loaded machine.

I'm not that sure the error message was because of that. ISTM that it was 
rather a case of finding 3 one-second aggregation lines in a two-second run 
because it started at just the right time, or maybe because of slowness 
induced by load and the order in which the different checks are performed.

> So unless there is some reason why pgbench is *guaranteed* to run at 
> least one transaction per thread, I'd rather the test not assume that.

Well, pgbench is for testing performance... so if the checks allow zero 
performance that's quite annoying as well:-) The tests are designed to 
require very low performance (e.g. there are a lot of -t 1 cases where only 
one transaction is enough to check a point), but maybe some tests assume a 
minimal requirement, maybe 10 tps with 2 threads...

> I would not necessarily object to doing something in the code that
> would guarantee that, though.

Hmmm. Interesting point.

There could be a client-side synchronization barrier, e.g. something like 
"\sync :nclients/nthreads": easy enough to implement with pthreads, and 
quite error-prone to use, but probably okay for validation purposes. Or 
maybe we could expose something at the SQL level, e.g. 
"SELECT synchro('synchroname', howmanyclientstowait);", which would be 
harder to implement server-side but possibly doable as well.

A simpler option may be to introduce a synchronization barrier at thread 
start, so that all threads start together, which would set the "zero" time. 
I'm not sure that would solve the potential issue you raise, although it 
would help.
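
For illustration only, a rough sketch of what such a thread-start barrier 
could look like with POSIX threads (the function names and the nthreads 
argument are invented for the example; this is not actual pgbench code):

    #include <pthread.h>

    static pthread_barrier_t start_barrier;

    /* called once from the main thread, before the worker threads are created */
    static void
    setup_start_barrier(int nthreads)
    {
        pthread_barrier_init(&start_barrier, NULL, nthreads);
    }

    /* called at the top of each thread's run function */
    static void
    wait_at_start(void)
    {
        /* every thread blocks here until the last one has arrived */
        pthread_barrier_wait(&start_barrier);
        /* the "zero" time for -T would be taken after this point */
    }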

Currently the statistics collection and output are performed by thread 0 in 
addition to the clients it runs, so that pgbench would work even if there 
are no threads, but it also means that under heavy load some things may not 
be done at the target time but a little later, if some thread is stuck 
somewhere, although the async protocol tries to avoid that.

-- 
Fabien.



Re: [HACKERS] pgbench regression test failure

From: Tom Lane

Fabien COELHO <coelho@cri.ensmp.fr> writes:
>> I have a serious, serious dislike for tests that seem to work until
>> they're run on a heavily loaded machine.

> I'm not that sure the error message was because of that.

No, this particular failure (probably) wasn't.  But now that I've realized
that this test case is timing-sensitive, I'm worried about what will
happen when it's run on a sufficiently slow or loaded machine.

>> I would not necessarily object to doing something in the code that
>> would guarantee that, though.

> Hmmm. Interesting point.

It could be as simple as putting the check-for-done at the bottom of the
loop not the top, perhaps.
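
(Sketching the idea only, not the actual pgbench code: a while loop checks 
the deadline before the first transaction, a do/while only after it, so each 
client is guaranteed to run at least one.)

    /* assume a per-thread deadline flag such as pgbench's timer_exceeded,
     * and a placeholder run_one_transaction() for the real state machine */

    /* current shape: a client that starts after the deadline runs nothing */
    while (!timer_exceeded)
        run_one_transaction();

    /* checking at the bottom guarantees at least one transaction per client */
    do
        run_one_transaction();
    while (!timer_exceeded);
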
        regards, tom lane



Re: [HACKERS] pgbench regression test failure

From: Fabien COELHO

>>> I have a serious, serious dislike for tests that seem to work until
>>> they're run on a heavily loaded machine.
>
>> I'm not that sure the error message was because of that.
>
> No, this particular failure (probably) wasn't.  But now that I've realized
> that this test case is timing-sensitive, I'm worried about what will
> happen when it's run on a sufficiently slow or loaded machine.
>
>>> I would not necessarily object to doing something in the code that
>>> would guarantee that, though.
>
>> Hmmm. Interesting point.
>
> It could be as simple as putting the check-for-done at the bottom of the
> loop not the top, perhaps.

I agree that it is best if tests work in all reasonable conditions, 
including a somewhat overloaded host...

I'm going to think about it, but I'm not sure of the best approach. In the 
meantime, ISTM that the issue has not been encountered (yet), so this is 
not a pressing matter. Maybe when -T is larger than --aggregate-interval, 
pgbench could keep going past the time limit if the log file has not been 
written to at all, but that would be some kind of kludge for this specific 
test...

Note that it is a hard constraint to want test coverage for -T while also 
assuming that a loaded host might not manage to generate even a one-line 
log every second during that time...

Maybe some tests could be "warnings", i.e. it could be acceptable for them 
to fail once in a while under specific conditions, if this is rare enough 
and documented. ISTM that there is such a test for the random output.

-- 
Fabien.



Re: [HACKERS] pgbench regression test failure

From: Tom Lane

Fabien COELHO <coelho@cri.ensmp.fr> writes:
>> It could be as simple as putting the check-for-done at the bottom of the
>> loop not the top, perhaps.

> I agree that it is best if tests should work in all reasonable conditions, 
> including a somehow overloaded host...

> I'm going to think about it, but I'm not sure of the best approach. In the 
> mean time, ISTM that the issue has not been encountered (yet), so this is 
> not a pressing issue. Maybe under -T > --aggregate-interval pgbench could 
> go on over the limit if the log file has not been written at all, but that 
> would be some kind of kludge for this specific test...

After another week of buildfarm runs, we have a few more cases of 3 rows
of output, and none of more than 3 or less than 1.  So I went ahead and
pushed your patch.  I'm still suspicious of these results, but we might
as well try to make the buildfarm green pending investigation of how
this is happening.
        regards, tom lane



Re: [HACKERS] pgbench regression test failure

From: Fabien COELHO

> [...] After another week of buildfarm runs, we have a few more cases of 
> 3 rows of output, and none of more than 3 or less than 1.  So I went 
> ahead and pushed your patch.  I'm still suspicious of these results, but 
> we might as well try to make the buildfarm green pending investigation 
> of how this is happening.

Yep. I keep the issue of pgbench tap test determinism in my todo list, 
among other things.

I think it should at least be made clearer under which conditions (load? 
luck? bug?) the result may be 1 or 3 when 2 are expected, which needs some 
thinking.

-- 
Fabien.



Re: [HACKERS] pgbench regression test failure

From: Tom Lane

Fabien COELHO <coelho@cri.ensmp.fr> writes:
>> [...] After another week of buildfarm runs, we have a few more cases of
>> 3 rows of output, and none of more than 3 or less than 1.  So I went
>> ahead and pushed your patch.  I'm still suspicious of these results, but
>> we might as well try to make the buildfarm green pending investigation
>> of how this is happening.

> Yep. I keep the issue of pgbench tap test determinism in my todo list,
> among other things.

> I think it should at least be made clearer under which conditions (load?
> luck? bug?) the result may be 1 or 3 when 2 are expected, which needs some
> thinking.

skink blew up real good just now:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=skink&dt=2017-09-23%2010%3A50%3A01

the critical bit being

#   Failed test 'pgbench progress stderr /(?^:progress: 1\b)/'
#   at /home/andres/build/buildfarm/HEAD/pgsql.build/../pgsql/src/test/perl/TestLib.pm line 369.
#                   'starting vacuum...end.
# progress: 2.6 s, 6.9 tps, lat 0.000 ms stddev 0.000, lag 0.000 ms, 18 skipped
# progress: 3.0 s, 0.0 tps, lat -nan ms stddev -nan, lag -nan ms, 0 skipped
# progress: 4.0 s, 1.0 tps, lat 2682.730 ms stddev 0.000, lag 985.509 ms, 0 skipped
# '
#     doesn't match '(?^:progress: 1\b)'

#   Failed test 'transaction count for 001_pgbench_log_1.15981 (5)'
#   at t/001_pgbench_with_server.pl line 438.

#   Failed test 'transaction count for 001_pgbench_log_1.15981.1 (4)'
#   at t/001_pgbench_with_server.pl line 438.
# Looks like you failed 3 tests of 233.

That's exceeded my patience with this test case, so I've removed it
for the moment.  We can put it back as soon as we figure some way
to make it more robust.  (BTW, the "-nan" bits suggest an actual
pgbench bug, independently of anything else.)

Possibly you can duplicate skink's issue by running things under
valgrind.
        regards, tom lane



Re: [HACKERS] pgbench regression test failure

From: Fabien COELHO

Hello Tom,

> # progress: 2.6 s, 6.9 tps, lat 0.000 ms stddev 0.000, lag 0.000 ms, 18 skipped
> # progress: 3.0 s, 0.0 tps, lat -nan ms stddev -nan, lag -nan ms, 0 skipped
> # progress: 4.0 s, 1.0 tps, lat 2682.730 ms stddev 0.000, lag 985.509 ms, 0 skipped

> (BTW, the "-nan" bits suggest an actual pgbench bug, independently of 
> anything else.)

From my point of view, NaN is expected when no transactions were executed in 
the interval: if there was no transaction, it does not make sense to talk 
about its latency, so NaN is the right answer.

However, the above "6.9 tps, lat 0.000 ms stddev 0.000, lag 0.000 ms" is 
inconsistent: as 6.9 = 18 / 2.6, the reported tps is counting the 18 skipped 
transactions, so the progress tps calculation should exclude skipped 
transactions...

The attached patch attempts to report more consistent figures in the 
progress output and in the final report when transactions are skipped.
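
The gist of the change, as a rough sketch only (using the variable names 
from threadRun's progress report, not the actual patch text): count only 
the transactions that actually executed in the interval, and leave the 
latency undefined when that count is zero.

    /* cur/last are StatsData snapshots, run is the interval in microseconds */
    int64   ntx = (cur.cnt - cur.skipped) - (last.cnt - last.skipped);
    double  tps = 1000000.0 * ntx / run;
    double  latency;

    if (ntx > 0)
        latency = 0.001 * (cur.latency.sum - last.latency.sum) / ntx;
    else
        latency = NAN;      /* from <math.h>: no transaction executed, undefined */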

   sh> cat sleep-100.sql
   \sleep 100 ms
   SELECT 1;

   sh> ./pgbench -P 1 -t 100 -f sleep-100.sql -R 20 -L 1
   [...]
   progress: 1.0 s, 7.0 tps, lat 100.145 ms stddev 0.042, lag 0.000 ms, 16 skipped
   progress: 2.0 s, 6.0 tps, lat 100.133 ms stddev 0.040, lag 0.021 ms, 7 skipped
   progress: 3.0 s, 9.0 tps, lat 100.115 ms stddev 0.016, lag 0.000 ms, 11 skipped
   [...]
   number of transactions actually processed: 38/100
   number of transactions skipped: 62 (62.000 %)
   number of transactions above the 1.0 ms latency limit: 38 (38.000 %)
   latency average = 100.142 ms
   tps = 7.091010 (including connections establishing)
   tps = 7.094144 (excluding connections establishing)
   script statistics:
    - number of transactions skipped: 62 (62.000%)

-- 
Fabien.

Attachment

Re: [HACKERS] pgbench regression test failure

From: Steve Singer

The following review has been posted through the commitfest application:
make installcheck-world:  tested, failed
Implements feature:       not tested
Spec compliant:           not tested
Documentation:            not tested

This causes the pgbench tests to fail (consistently) with

not ok 194 - pgbench late throttling stdout /(?^:processed: [01]/10)/


When I run pgbench manually (-t 10 --rate=100000 --latency-limit=1 -n -r) I get:

number of transactions actually processed: 10/10
number of transactions skipped: 10 (100.000 %)

Prior to the patch I was getting:

number of transactions actually processed: 0/10
number of transactions skipped: 10 (100.000 %)



@@ -3539,7 +3542,7 @@ printResults(TState *threads, StatsData *total, instr_time total_time,
     {
         printf("number of transactions per client: %d\n", nxacts);
         printf("number of transactions actually processed: " INT64_FORMAT "/%d\n",
-               total->cnt - total->skipped, nxacts * nclients);
+               total->cnt, nxacts * nclients);

I think you want ntx instead of total->cnt here. 




The new status of this patch is: Waiting on Author


Re: [HACKERS] pgbench regression test failure

From: Fabien COELHO

Hello Steve,

>                printf("number of transactions actually processed: " INT64_FORMAT "/%d\n",
> -                          total->cnt - total->skipped, nxacts * nclients);
> +                          total->cnt, nxacts * nclients);
>
> I think you want ntx instead of total->cnt here.

Indeed... and this is also what my git branch contains... I just sent the 
wrong version, sorry:-( The same fix is also needed in the else branch.

Here is the hopefully right version, which passes tests here.

-- 
Fabien.
Attachment

Re: [HACKERS] pgbench regression test failure

From: Steve Singer

On Mon, 13 Nov 2017, Fabien COELHO wrote:

>
> Hello Steve,
>
>>                printf("number of transactions actually processed: " INT64_FORMAT "/%d\n",
>> -                          total->cnt - total->skipped, nxacts * nclients);
>> +                          total->cnt, nxacts * nclients);
>> 
>> I think you want ntx instead of total->cnt here.
>
> Indeed... and this is also what my git branch contains... I just sent the 
> wrong version, sorry:-( The same fix is also needed in the else branch.
>
> Here is the hopefully right version, which passes tests here.

This version seems fine.

I think it is ready for a committer


>
> -- 
> Fabien.



Re: [HACKERS] pgbench regression test failure

From: Fabien COELHO

Hello Steve,

>> Here is the hopefully right version, which passes tests here.
>
> This version seems fine.
>
> I think it is ready for a committer

Ok, thanks for the debug!

-- 
Fabien.


Re: [HACKERS] pgbench regression test failure

From: Tom Lane

Fabien COELHO <coelho@cri.ensmp.fr> writes:
> Here is the hopefully right version, which passes tests here.

I took a look at this.  What I get for the "sleep-100" test case posted in
https://postgr.es/m/alpine.DEB.2.20.1709240820160.4999@lancre
with current HEAD is

progress: 1.0 s, 16.0 tps, lat 37.746 ms stddev 48.730, lag 0.039 ms, 10 skipped
progress: 2.0 s, 26.0 tps, lat 27.115 ms stddev 44.672, lag 0.068 ms, 19 skipped
progress: 3.0 s, 25.0 tps, lat 32.184 ms stddev 46.915, lag 0.034 ms, 17 skipped
progress: 4.0 s, 22.0 tps, lat 22.868 ms stddev 42.167, lag 0.033 ms, 17 skipped
transaction type: sleep-100.sql
scaling factor: 1
query mode: simple
number of clients: 1
number of threads: 1
number of transactions per client: 100
number of transactions actually processed: 29/100
number of transactions skipped: 71 (71.000 %)
number of transactions above the 1.0 ms latency limit: 29 (29.000 %)
latency average = 100.636 ms
latency stddev = 0.185 ms
rate limit schedule lag: avg 0.044 (max 0.956) ms
tps = 22.467484 (including connections establishing)
tps = 22.488151 (excluding connections establishing)
script statistics:
 - number of transactions skipped: 0 (-nan%)

The patch changes that to

progress: 1.0 s, 6.0 tps, lat 100.658 ms stddev 0.193, lag 0.101 ms, 10 skipped
progress: 2.0 s, 8.0 tps, lat 100.684 ms stddev 0.264, lag 0.208 ms, 14 skipped
progress: 3.0 s, 8.0 tps, lat 100.576 ms stddev 0.026, lag 0.101 ms, 22 skipped
progress: 4.0 s, 6.0 tps, lat 100.542 ms stddev 0.080, lag 0.102 ms, 10 skipped
transaction type: sleep-100.sql
scaling factor: 1
query mode: simple
number of clients: 1
number of threads: 1
number of transactions per client: 100
number of transactions actually processed: 33/100
number of transactions skipped: 67 (67.000 %)
number of transactions above the 1.0 ms latency limit: 33 (33.000 %)
latency average = 100.613 ms
latency stddev = 0.169 ms
rate limit schedule lag: avg 0.043 (max 0.921) ms
tps = 7.138174 (including connections establishing)
tps = 7.144332 (excluding connections establishing)
script statistics:
 - number of transactions skipped: 67 (67.000%)

So we fixed the reported TPS rate, which was nowhere near reality,
and the per-script stats are sane now.  Good so far, but this
still has two problems IMO:

1. The per-script stats shouldn't be printed at all if there's
only one script.  They're redundant with the overall stats.

2. ISTM that we should report that 100% of the transactions were
above the latency limit, not 33%; that is, the appropriate base
for the "number of transactions above the latency limit" percentage
is the number of actual transactions not the number of scheduled
transactions.

I also noticed that if I specify "-f sleep-100.sql" more than once,
the per-script TPS reports are out of line.  This is evidently because
that calculation isn't excluding skipped xacts; but if we're going to
define tps as excluding skipped xacts, surely we should do so there too.

I'm also still exceedingly unhappy about the NaN business.
You have this comment in printSimpleStats:
    /* print NaN if no transactions where executed */
but I find that unduly optimistic.  It should really read more like
"if no transactions were executed, at best we'll get some platform-
dependent spelling of NaN.  At worst we'll get a SIGFPE."  I do not think
that a pedantic argument about NaN being the "correct" answer justifies
taking such portability risks, even if I bought that argument, which
I don't particularly.  It's also inconsistent with this basic decision in
printResults:

    /* Remaining stats are nonsensical if we failed to execute any xacts */
    if (total->cnt <= 0)
        return;

If we thought NaN was the "correct" answer for no xacts then we'd just
bull ahead and print NaNs all over the place.  I think we would be best
advised to deal with this by printing zeroes in the progress-report case,
and by skipping the output altogether in printSimpleStats.  (Or we could
make printSimpleStats print zeroes; I'm indifferent to that choice.)

Attached is an updated patch addressing these points.  Comments?

            regards, tom lane
diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index ceb2fcc..1d4814b 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -2612,7 +2612,7 @@ processXactStats(TState *thread, CState *st, instr_time *now,
         doLog(thread, st, agg, skipped, latency, lag);

     /* XXX could use a mutex here, but we choose not to */
-    if (per_script_stats)
+    if (per_script_stats || latency_limit)
         accumStats(&sql_script[st->use_file].stats, skipped, latency, lag);
 }

@@ -3664,12 +3664,14 @@ addScript(ParsedScript script)
 static void
 printSimpleStats(const char *prefix, SimpleStats *ss)
 {
-    /* print NaN if no transactions where executed */
-    double        latency = ss->sum / ss->count;
-    double        stddev = sqrt(ss->sum2 / ss->count - latency * latency);
+    if (ss->count > 0)
+    {
+        double        latency = ss->sum / ss->count;
+        double        stddev = sqrt(ss->sum2 / ss->count - latency * latency);

-    printf("%s average = %.3f ms\n", prefix, 0.001 * latency);
-    printf("%s stddev = %.3f ms\n", prefix, 0.001 * stddev);
+        printf("%s average = %.3f ms\n", prefix, 0.001 * latency);
+        printf("%s stddev = %.3f ms\n", prefix, 0.001 * stddev);
+    }
 }

 /* print out results */
@@ -3680,11 +3682,14 @@ printResults(TState *threads, StatsData *total, instr_time total_time,
     double        time_include,
                 tps_include,
                 tps_exclude;
+    int64        ntx = total->cnt - total->skipped;

     time_include = INSTR_TIME_GET_DOUBLE(total_time);
-    tps_include = total->cnt / time_include;
-    tps_exclude = total->cnt / (time_include -
-                                (INSTR_TIME_GET_DOUBLE(conn_total_time) / nclients));
+
+    /* tps is about actually executed transactions */
+    tps_include = ntx / time_include;
+    tps_exclude = ntx /
+        (time_include - (INSTR_TIME_GET_DOUBLE(conn_total_time) / nclients));

     /* Report test parameters. */
     printf("transaction type: %s\n",
@@ -3697,13 +3702,13 @@ printResults(TState *threads, StatsData *total, instr_time total_time,
     {
         printf("number of transactions per client: %d\n", nxacts);
         printf("number of transactions actually processed: " INT64_FORMAT "/%d\n",
-               total->cnt - total->skipped, nxacts * nclients);
+               ntx, nxacts * nclients);
     }
     else
     {
         printf("duration: %d s\n", duration);
         printf("number of transactions actually processed: " INT64_FORMAT "\n",
-               total->cnt);
+               ntx);
     }

     /* Remaining stats are nonsensical if we failed to execute any xacts */
@@ -3718,7 +3723,7 @@ printResults(TState *threads, StatsData *total, instr_time total_time,
     if (latency_limit)
         printf("number of transactions above the %.1f ms latency limit: %d (%.3f %%)\n",
                latency_limit / 1000.0, latency_late,
-               100.0 * latency_late / total->cnt);
+               (ntx > 0) ? 100.0 * latency_late / ntx : 0.0);

     if (throttle_delay || progress || latency_limit)
         printSimpleStats("latency", &total->latency);
@@ -3752,40 +3757,48 @@ printResults(TState *threads, StatsData *total, instr_time total_time,
         for (i = 0; i < num_scripts; i++)
         {
             if (num_scripts > 1)
+            {
+                StatsData  *sstats = &sql_script[i].stats;
+
                 printf("SQL script %d: %s\n"
                        " - weight: %d (targets %.1f%% of total)\n"
                        " - " INT64_FORMAT " transactions (%.1f%% of total, tps = %f)\n",
                        i + 1, sql_script[i].desc,
                        sql_script[i].weight,
                        100.0 * sql_script[i].weight / total_weight,
-                       sql_script[i].stats.cnt,
-                       100.0 * sql_script[i].stats.cnt / total->cnt,
-                       sql_script[i].stats.cnt / time_include);
-            else
-                printf("script statistics:\n");
+                       sstats->cnt,
+                       100.0 * sstats->cnt / total->cnt,
+                       (sstats->cnt - sstats->skipped) / time_include);

-            if (latency_limit)
-                printf(" - number of transactions skipped: " INT64_FORMAT " (%.3f%%)\n",
-                       sql_script[i].stats.skipped,
-                       100.0 * sql_script[i].stats.skipped / sql_script[i].stats.cnt);
+                if (latency_limit && sstats->cnt > 0)
+                    printf(" - number of transactions skipped: " INT64_FORMAT " (%.3f%%)\n",
+                           sstats->skipped,
+                           100.0 * sstats->skipped / sstats->cnt);

-            if (num_scripts > 1)
-                printSimpleStats(" - latency", &sql_script[i].stats.latency);
+                printSimpleStats(" - latency", &sstats->latency);
+            }

             /* Report per-command latencies */
             if (is_latencies)
             {
                 Command   **commands;

-                printf(" - statement latencies in milliseconds:\n");
+                if (num_scripts > 1)
+                    printf(" - statement latencies in milliseconds:\n");
+                else
+                    printf("statement latencies in milliseconds:\n");

                 for (commands = sql_script[i].commands;
                      *commands != NULL;
                      commands++)
+                {
+                    SimpleStats *cstats = &(*commands)->stats;
+
                     printf("   %11.3f  %s\n",
-                           1000.0 * (*commands)->stats.sum /
-                           (*commands)->stats.count,
+                           (cstats->count > 0) ?
+                           1000.0 * cstats->sum / cstats->count : 0.0,
                            (*commands)->line);
+                }
             }
         }
     }
@@ -4861,7 +4874,8 @@ threadRun(void *arg)
             {
                 /* generate and show report */
                 StatsData    cur;
-                int64        run = now - last_report;
+                int64        run = now - last_report,
+                            ntx;
                 double        tps,
                             total_run,
                             latency,
@@ -4876,7 +4890,7 @@ threadRun(void *arg)
                  * XXX: No locking. There is no guarantee that we get an
                  * atomic snapshot of the transaction count and latencies, so
                  * these figures can well be off by a small amount. The
-                 * progress is report's purpose is to give a quick overview of
+                 * progress report's purpose is to give a quick overview of
                  * how the test is going, so that shouldn't matter too much.
                  * (If a read from a 64-bit integer is not atomic, you might
                  * get a "torn" read and completely bogus latencies though!)
@@ -4890,15 +4904,21 @@ threadRun(void *arg)
                     cur.skipped += thread[i].stats.skipped;
                 }

+                /* we count only actually executed transactions */
+                ntx = (cur.cnt - cur.skipped) - (last.cnt - last.skipped);
                 total_run = (now - thread_start) / 1000000.0;
-                tps = 1000000.0 * (cur.cnt - last.cnt) / run;
-                latency = 0.001 * (cur.latency.sum - last.latency.sum) /
-                    (cur.cnt - last.cnt);
-                sqlat = 1.0 * (cur.latency.sum2 - last.latency.sum2)
-                    / (cur.cnt - last.cnt);
-                stdev = 0.001 * sqrt(sqlat - 1000000.0 * latency * latency);
-                lag = 0.001 * (cur.lag.sum - last.lag.sum) /
-                    (cur.cnt - last.cnt);
+                tps = 1000000.0 * ntx / run;
+                if (ntx > 0)
+                {
+                    latency = 0.001 * (cur.latency.sum - last.latency.sum) / ntx;
+                    sqlat = 1.0 * (cur.latency.sum2 - last.latency.sum2) / ntx;
+                    stdev = 0.001 * sqrt(sqlat - 1000000.0 * latency * latency);
+                    lag = 0.001 * (cur.lag.sum - last.lag.sum) / ntx;
+                }
+                else
+                {
+                    latency = sqlat = stdev = lag = 0;
+                }

                 if (progress_timestamp)
                 {
@@ -4915,7 +4935,10 @@ threadRun(void *arg)
                              (long) tv.tv_sec, (long) (tv.tv_usec / 1000));
                 }
                 else
+                {
+                    /* round seconds are expected, but the thread may be late */
                     snprintf(tbuf, sizeof(tbuf), "%.1f s", total_run);
+                }

                 fprintf(stderr,
                         "progress: %s, %.1f tps, lat %.3f ms stddev %.3f",

Re: [HACKERS] pgbench regression test failure

From: Fabien COELHO

Hello Tom,

Thanks for having a look at this bug fix.

> So we fixed the reported TPS rate, which was nowhere near reality,
> and the per-script stats are sane now.  Good so far, but this
> still has two problems IMO:
>
> 1. The per-script stats shouldn't be printed at all if there's
> only one script.  They're redundant with the overall stats.

Indeed.

I think the output should tend to be the same for possible automatic 
processing, whether there is one script or more, even at the price of some 
redundancy.

Obviously this is highly debatable.

> 2. ISTM that we should report that 100% of the transactions were
> above the latency limit, not 33%; that is, the appropriate base
> for the "number of transactions above the latency limit" percentage
> is the number of actual transactions not the number of scheduled
> transactions.

Hmmm. Allow me to disagree.

If the user asked for something, then that is the reference to use, and the 
whole report should be relative to it. Also, changing this means the report 
figures no longer add up:

before:

   number of transactions skipped: 417 (90.260 %)
   number of transactions above the 1.0 ms latency limit: 45 (9.740 %)

Both percentages above are computed against the same reference, so they add 
up: 100% of the transactions we requested had issues.

after:

   number of transactions skipped: 457 (90.855 %)
   number of transactions above the 1.0 ms latency limit: 46 (100.000 %)

The percentages no longer add up. I do not think that this is an 
improvement.

> I also noticed that if I specify "-f sleep-100.sql" more than once,
> the per-script TPS reports are out of line.  This is evidently because
> that calculation isn't excluding skipped xacts; but if we're going to
> define tps as excluding skipped xacts, surely we should do so there too.

I do not think that we should exclude skipped xacts.

> I'm also still exceedingly unhappy about the NaN business.
> You have this comment in printSimpleStats:
>     /* print NaN if no transactions where executed */
> but I find that unduly optimistic.  It should really read more like
> "if no transactions were executed, at best we'll get some platform-
> dependent spelling of NaN.  At worst we'll get a SIGFPE."

Hmmm. Alas, you must be right about the spelling. There has been no report 
of a SIGFPE issue, so I would not bother with that.

> I do not think that a pedantic argument about NaN being the "correct" 
> answer justifies taking such portability risks, even if I bought that 
> argument, which I don't particularly.

ISTM that the portability risk, i.e. the SIGFPE issue, would require a 
non-IEEE-754-conforming box, and there has been no complaint about that 
yet.

> It's also inconsistent with this basic decision in printResults:
>
>     /* Remaining stats are nonsensical if we failed to execute any xacts */
>     if (total->cnt <= 0)
>         return;

According to the blame history, this decision was by Tom Lane. It is about 
the whole run not executing anything, which may be seen as a special error 
case, whereas a progress report in which no xact was executed during the 
interval can be perfectly normal.

> If we thought NaN was the "correct" answer for no xacts then we'd just
> bull ahead and print NaNs all over the place.

Why not, but for me the issue wrt the final report is distinct.

> I think we would be best advised to deal with this by printing zeroes in 
> the progress-report case, and by skipping the output altogether in 
> printSimpleStats.  (Or we could make printSimpleStats print zeroes; I'm 
> indifferent to that choice.)

I disagree with printing a zero if the measure is undefined. Zero does not 
mean undefined. "Zero means zero":-)

Maybe we could replace the platform-specific printing of "NaN" with 
something else, either a deterministic "NaN" or some other string, e.g. "-" 
or "?" or whatever. The benefit of sticking to "NaN" (even a 
platform-specific spelling of it) is that anything processing the output 
can still parse it as a float.

So basically I would have stayed with the current version, which seems 
consistent to me, but I agree that it is debatable. I'm not sure it is 
worth being "exceedingly unhappy" about, because these are special corner 
cases anyway, not likely to be seen without somehow looking for them or 
having pretty strange pgbench runs.

> Attached is an updated patch addressing these points.  Comments?

I somehow disagree with your points above, for various reasons.

However, you are a committer, and I think it best that the bug gets fixed 
even if I happen to disagree with unrelated changes. As an academic, I 
probably have a number of silly ideas about a lot of things. This is part 
of the job description:-)

-- 
Fabien.


Re: [HACKERS] pgbench regression test failure

From: Tom Lane

Fabien COELHO <coelho@cri.ensmp.fr> writes:
>> 1. The per-script stats shouldn't be printed at all if there's
>> only one script.  They're redundant with the overall stats.

> Indeed.
> I think the output should tend to be the same for possible automatic 
> processing, whether there is one script or more, even at the price of some 
> redundancy.
> Obviously this is highly debatable.

I think that ship has already sailed.  It's certainly silly that the
code prints *only* per-script latency stats, and not all the per-script
stats, when there's just one script.  To me the answer is to make the
latency stats conform to the rest, not make the printout even more
redundant.  None of this output was designed for machine-friendliness.

(Maybe there is an argument for a "--machine-readable-output" switch
that would dump the data in some more-machine-friendly format.  Though
I'm sure we'd have lots of debates about exactly what that is...)

>> 2. ISTM that we should report that 100% of the transactions were
>> above the latency limit, not 33%; that is, the appropriate base
>> for the "number of transactions above the latency limit" percentage
>> is the number of actual transactions not the number of scheduled
>> transactions.

> Hmmm. Allow me to disagree.

I dunno, it just looks odd to me that when we've set up a test case in
which every one of the transactions is guaranteed to exceed the latency
limit, that it doesn't say that they all did.  I don't particularly buy
your assumption that the percentages should sum.  Anybody else have an
opinion there?

>> I also noticed that if I specify "-f sleep-100.sql" more than once,
>> the per-script TPS reports are out of line.  This is evidently because
>> that calculation isn't excluding skipped xacts; but if we're going to
>> define tps as excluding skipped xacts, surely we should do so there too.

> I do not think that we should exclude skipped xacts.

Uh ... why not?

>> I'm also still exceedingly unhappy about the NaN business.
>> You have this comment in printSimpleStats:
>> /* print NaN if no transactions where executed */
>> but I find that unduly optimistic.  It should really read more like
>> "if no transactions were executed, at best we'll get some platform-
>> dependent spelling of NaN.  At worst we'll get a SIGFPE."

> Hmmm. Alas, you must be right about the spelling. There has been no report
> of a SIGFPE issue, so I would not bother with that.

The core issue here really is that you're assuming IEEE float arithmetic.
We have not gone as far as deciding that Postgres will only run on IEEE
hardware, and I don't want to start in pgbench, especially not in
seldom-exercised corner cases.
        regards, tom lane


Re: [HACKERS] pgbench regression test failure

From: Fabien COELHO

Hello Tom,

>>> 2. ISTM that we should report that 100% of the transactions were
>>> above the latency limit, not 33%; that is, the appropriate base
>>> for the "number of transactions above the latency limit" percentage
>>> is the number of actual transactions not the number of scheduled
>>> transactions.
>
>> Hmmm. Allow me to disagree.
>
> I dunno, it just looks odd to me that when we've set up a test case in
> which every one of the transactions is guaranteed to exceed the latency
> limit, that it doesn't say that they all did.  I don't particularly buy
> your assumption that the percentages should sum.

This is a side effect. The reason for me is that the user asked for some 
transactions, and the results should be given relative to what was asked.

> Anybody else have an opinion there?

Good question.

>>> I also noticed that if I specify "-f sleep-100.sql" more than once,
>>> the per-script TPS reports are out of line.  This is evidently because
>>> that calculation isn't excluding skipped xacts; but if we're going to
>>> define tps as excluding skipped xacts, surely we should do so there too.
>
>> I do not think that we should exclude skipped xacts.
>
> Uh ... why not?

Because I totally misinterpreted your sentence.

Indeed, the skipped transactions need to be subtracted from the count.
This is yet another bug.

>>> but I find that unduly optimistic.  It should really read more like
>>> "if no transactions were executed, at best we'll get some platform-
>>> dependent spelling of NaN.  At worst we'll get a SIGFPE."
>
>> Hmmm. Alas you must be right about spelling. There has been no report of
>> SIGFPE issue, so I would not bother with that.
>
> The core issue here really is that you're assuming IEEE float arithmetic.
> We have not gone as far as deciding that Postgres will only run on IEEE
> hardware, and I don't want to start in pgbench, especially not in
> seldom-exercised corner cases.

Hmmm. It has already been that way for some years without complaint. Do as 
you see fit about the NaN. I can only say that I do not much like having 
zero stand for undefined.

-- 
Fabien.


Re: [HACKERS] pgbench regression test failure

From: Robert Haas

On Mon, Nov 20, 2017 at 1:40 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I dunno, it just looks odd to me that when we've set up a test case in
> which every one of the transactions is guaranteed to exceed the latency
> limit, that it doesn't say that they all did.  I don't particularly buy
> your assumption that the percentages should sum.  Anybody else have an
> opinion there?

I agree with you, but I don't think either approach is free from
possible confusion.  I think it would help to show the numerator and
the denominator explicitly, e.g.:

number of clients: 1
number of threads: 1
number of transactions per client: 100
number of transactions actually processed: 33/100
number of transactions skipped: 67 (67.000 %)
number of transactions above the 1.0 ms latency limit: 33 (33 of 33, 100.000 %)

(My proposed change is in the last line.)

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: [HACKERS] pgbench regression test failure

From: Tom Lane

Robert Haas <robertmhaas@gmail.com> writes:
> On Mon, Nov 20, 2017 at 1:40 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> I dunno, it just looks odd to me that when we've set up a test case in
>> which every one of the transactions is guaranteed to exceed the latency
>> limit, that it doesn't say that they all did.  I don't particularly buy
>> your assumption that the percentages should sum.  Anybody else have an
>> opinion there?

> I agree with you, but I don't think either approach is free from
> possible confusion.  I think it would help to show the numerator and
> the denominator explicitly, e.g.:

> number of clients: 1
> number of threads: 1
> number of transactions per client: 100
> number of transactions actually processed: 33/100
> number of transactions skipped: 67 (67.000 %)
> number of transactions above the 1.0 ms latency limit: 33 (33 of 33, 100.000 %)

> (My proposed change is in the last line.)

Seems like a good idea, but the way you've written it is inconsistent
with the "n/m" notation used just above.  I'd suggest

... latency limit: 33 (33/33, 100.000 %)

or just

... latency limit: 33/33 (100.000 %)
        regards, tom lane


Re: [HACKERS] pgbench regression test failure

From: Robert Haas

On Tue, Nov 21, 2017 at 3:28 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Seems like a good idea, but the way you've written it is inconsistent
> with the "n/m" notation used just above.  I'd suggest
>
> ... latency limit: 33 (33/33, 100.000 %)
>
> or just
>
> ... latency limit: 33/33 (100.000 %)

Oh, yeah.  That last one sounds good; no reason to print the same
value more than once.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: [HACKERS] pgbench regression test failure

From: Tom Lane

Robert Haas <robertmhaas@gmail.com> writes:
> On Tue, Nov 21, 2017 at 3:28 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> or just
>> ... latency limit: 33/33 (100.000 %)

> Oh, yeah.  That last one sounds good; no reason to print the same
> value more than once.

Sold; I'll go make it so.
        regards, tom lane