Thread: Re: [HACKERS] WIP aPatch: Pgbench Serialization and deadlock errors
Alvaro Herrera <alvherre@alvh.no-ip.org> writes: >> After: >> interval_start num_transactions sum_latency sum_latency_2 min_latency max_latency >> { failures | serialization_failures deadlock_failures } [ sum_lag sum_lag_2 min_lag max_lag [ skipped ] ] [ retried retries] > I think that the explanatory paragraph is way too long now, particularly > since it explains --failures-detailed starting in the middle. Also, the > example output doesn't include the failures-detailed mode. I think the problem is not merely one of documentation, but one of bad design. Up to now it was possible to tell what was what from counting the number of columns in the output; but with this design, that is impossible. That should be fixed. The first thing you have got to do is drop the alternation { failures | serialization_failures deadlock_failures }. That doesn't make any sense in the first place: counting serialization and deadlock failures doesn't make it impossible for other errors to occur. It'd probably make the most sense to have three columns always, serialization, deadlock and total. Now maybe that change alone is sufficient, but I'm not convinced, because the multiple options at the end of the line mean we will never again be able to add any more columns without reintroducing ambiguity. I would be happier if the syntax diagram were such that columns could only be dropped from right to left. regards, tom lane
> Alvaro Herrera <alvherre@alvh.no-ip.org> writes: >>> After: >>> interval_start num_transactions sum_latency sum_latency_2 min_latency max_latency >>> { failures | serialization_failures deadlock_failures } [ sum_lag sum_lag_2 min_lag max_lag [ skipped ] ] [ retried retries] > >> I think that the explanatory paragraph is way too long now, particularly >> since it explains --failures-detailed starting in the middle. Also, the >> example output doesn't include the failures-detailed mode. > > I think the problem is not merely one of documentation, but one of > bad design. Up to now it was possible to tell what was what from > counting the number of columns in the output; but with this design, > that is impossible. That should be fixed. The first thing you have > got to do is drop the alternation { failures | serialization_failures > deadlock_failures }. That doesn't make any sense in the first place: > counting serialization and deadlock failures doesn't make it impossible > for other errors to occur. It'd probably make the most sense to have > three columns always, serialization, deadlock and total. +1. > Now maybe > that change alone is sufficient, but I'm not convinced, because the > multiple options at the end of the line mean we will never again be > able to add any more columns without reintroducing ambiguity. I > would be happier if the syntax diagram were such that columns could > only be dropped from right to left. Or those three columns always, sum_lag sum_lag_2, min_lag max_lag, skipped, retried retries? Anyway now that current CF is closing, it will not be possible to change those logging design soon. Or can we change the logging design even after CF is closed? Best reagards, -- Tatsuo Ishii SRA OSS, Inc. Japan English: http://www.sraoss.co.jp/index_en.php Japanese:http://www.sraoss.co.jp
> Or those three columns always, sum_lag sum_lag_2, min_lag max_lag, > skipped, retried retries? > > Anyway now that current CF is closing, it will not be possible to > change those logging design soon. Or can we change the logging design > even after CF is closed? My 0.02€: I'm not sure how the official guidelines are to be interpreted in that case, but if the design is to be changed, ISTM that it is better to do it before a release instead of letting the release out with one format and changing it in the next release? -- Fabien.
>> I think the problem is not merely one of documentation, but one of >> bad design. Up to now it was possible to tell what was what from >> counting the number of columns in the output; but with this design, >> that is impossible. That should be fixed. The first thing you have >> got to do is drop the alternation { failures | serialization_failures >> deadlock_failures }. That doesn't make any sense in the first place: >> counting serialization and deadlock failures doesn't make it impossible >> for other errors to occur. It'd probably make the most sense to have >> three columns always, serialization, deadlock and total. > > +1. > >> Now maybe >> that change alone is sufficient, but I'm not convinced, because the >> multiple options at the end of the line mean we will never again be >> able to add any more columns without reintroducing ambiguity. I >> would be happier if the syntax diagram were such that columns could >> only be dropped from right to left. > > Or those three columns always, sum_lag sum_lag_2, min_lag max_lag, skipped, retried retries? What about this? (a log line is not actually folded) interval_start num_transactions sum_latency sum_latency_2 min_latency max_latency failures serialization_failures deadlock_failures retried retries [ sum_lag sum_lag_2 min_lag max_lag [ skipped ] ] failures: always 0 (if --max-tries is 1, the default) sum of serialization_failures and deadlock_failures (if --max-tries is not 1) serialization_failures and deadlock_failures: always 0 (if --max-tries is 1, the default) 0 or more (if --max-tries is not 1) retried and retries: always 0 (if --max-tries is 1, the default) 0 or more (if --max-tries is not 1) Best reagards, -- Tatsuo Ishii SRA OSS, Inc. Japan English: http://www.sraoss.co.jp/index_en.php Japanese:http://www.sraoss.co.jp
>> Or those three columns always, sum_lag sum_lag_2, min_lag max_lag, >> skipped, retried retries? > > What about this? (a log line is not actually folded) > interval_start num_transactions sum_latency sum_latency_2 min_latency max_latency > failures serialization_failures deadlock_failures retried retries [ sum_lag sum_lag_2 min_lag max_lag [ skipped ] ] My 0.02€: I agree that it would be better to have a more deterministic aggregated log format. ISTM that it should skip failures and lags if no fancy options has been selected, i.e.: [ fails ... retries [ sum_lag ... [ skipped ] ] ? Alterlatively, as the failure stuff is added to the format, maybe it could be at the end: [ sum_lag ... [ skipped [ fails ... retries ] ] ] ? > failures: > always 0 (if --max-tries is 1, the default) > sum of serialization_failures and deadlock_failures (if --max-tries is not 1) > > serialization_failures and deadlock_failures: > always 0 (if --max-tries is 1, the default) > 0 or more (if --max-tries is not 1) > > retried and retries: > always 0 (if --max-tries is 1, the default) > 0 or more (if --max-tries is not 1) > > Best reagards, > -- > Tatsuo Ishii > SRA OSS, Inc. Japan > English: http://www.sraoss.co.jp/index_en.php > Japanese:http://www.sraoss.co.jp > -- Fabien.
>>> Or those three columns always, sum_lag sum_lag_2, min_lag max_lag, >>> skipped, retried retries? >> >> What about this? (a log line is not actually folded) >> interval_start num_transactions sum_latency sum_latency_2 min_latency >> max_latency >> failures serialization_failures deadlock_failures retried retries [ >> sum_lag sum_lag_2 min_lag max_lag [ skipped ] ] > > My 0.02€: > > I agree that it would be better to have a more deterministic > aggregated log format. > > ISTM that it should skip failures and lags if no fancy options has > been selected, i.e.: > > [ fails ... retries [ sum_lag ... [ skipped ] ] ? > > Alterlatively, as the failure stuff is added to the format, maybe it > could be at the end: > > [ sum_lag ... [ skipped [ fails ... retries ] ] ] ? I like this one. interval_start num_transactions sum_latency sum_latency_2 min_latency max_latency [sum_lag sum_lag_2 min_lag max_lag [ skipped [ failures serialization_failures deadlock_failures retried retries ] ] ] Best reagards, -- Tatsuo Ishii SRA OSS, Inc. Japan English: http://www.sraoss.co.jp/index_en.php Japanese:http://www.sraoss.co.jp
On 2022-Apr-03, Fabien COELHO wrote: > > What about this? (a log line is not actually folded) > > interval_start num_transactions sum_latency sum_latency_2 min_latency max_latency > > failures serialization_failures deadlock_failures retried retries [ sum_lag sum_lag_2 min_lag max_lag [ skipped ] ] > > My 0.02€: > > I agree that it would be better to have a more deterministic aggregated log > format. > > ISTM that it should skip failures and lags if no fancy options has been > selected, i.e.: > > [ fails ... retries [ sum_lag ... [ skipped ] ] ? I think it's easier to just say "if feature X is not enabled, then columns XYZ are always zeroes". -- Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/
>> My 0.02€: >> >> I agree that it would be better to have a more deterministic aggregated log >> format. >> >> ISTM that it should skip failures and lags if no fancy options has been >> selected, i.e.: >> >> [ fails ... retries [ sum_lag ... [ skipped ] ] ? > > I think it's easier to just say "if feature X is not enabled, then > columns XYZ are always zeroes". Ok, I will come up with a patch in this direction. Best reagards, -- Tatsuo Ishii SRA OSS, Inc. Japan English: http://www.sraoss.co.jp/index_en.php Japanese:http://www.sraoss.co.jp
Alvaro Herrera <alvherre@alvh.no-ip.org> writes: > I think it's easier to just say "if feature X is not enabled, then > columns XYZ are always zeroes". +1, that's pretty much what I was thinking. regards, tom lane
>> I think it's easier to just say "if feature X is not enabled, then >> columns XYZ are always zeroes". > > Ok, I will come up with a patch in this direction. Please find attached patch for this. With the patch, the log line is as follows (actually no line foldings of course): interval_start num_transactions sum_latency sum_latency_2 min_latency max_latency sum_lag sum_lag_2 min_lag max_lag skipped failures serialization_failures deadlock_failures retried retries I updated the doc as well: - fold the log line using line feed to avoid error in rendering PDF. I did not use &zwsp; because it does not enhance HTML output. - split explanation of the log output into multiple paragraphs to enhance readability. - replace the example output with full options are specified. Best reagards, -- Tatsuo Ishii SRA OSS, Inc. Japan English: http://www.sraoss.co.jp/index_en.php Japanese:http://www.sraoss.co.jp diff --git a/doc/src/sgml/ref/pgbench.sgml b/doc/src/sgml/ref/pgbench.sgml index ebdb4b3f46..e1f98ae228 100644 --- a/doc/src/sgml/ref/pgbench.sgml +++ b/doc/src/sgml/ref/pgbench.sgml @@ -2401,7 +2401,9 @@ END; format is used for the log files: <synopsis> -<replaceable>interval_start</replaceable> <replaceable>num_transactions</replaceable> <replaceable>sum_latency</replaceable><replaceable>sum_latency_2</replaceable> <replaceable>min_latency</replaceable> <replaceable>max_latency</replaceable>{ <replaceable>failures</replaceable> | <replaceable>serialization_failures</replaceable><replaceable>deadlock_failures</replaceable> } <optional> <replaceable>sum_lag</replaceable><replaceable>sum_lag_2</replaceable> <replaceable>min_lag</replaceable> <replaceable>max_lag</replaceable><optional> <replaceable>skipped</replaceable> </optional> </optional> <optional> <replaceable>retried</replaceable><replaceable>retries</replaceable> </optional> +<replaceable>interval_start</replaceable> <replaceable>num_transactions</replaceable> <replaceable>sum_latency</replaceable><replaceable>sum_latency_2</replaceable> <replaceable>min_latency</replaceable> <replaceable>max_latency</replaceable> +<replaceable>sum_lag</replaceable> <replaceable>sum_lag_2</replaceable> <replaceable>min_lag</replaceable> <replaceable>max_lag</replaceable><replaceable>skipped</replaceable> +<replaceable>failures</replaceable> <replaceable>serialization_failures</replaceable> <replaceable>deadlock_failures</replaceable><replaceable>retried</replaceable> <replaceable>retries</replaceable> </synopsis> where @@ -2417,41 +2419,55 @@ END; and <replaceable>max_latency</replaceable> is the maximum latency within the interval, <replaceable>failures</replaceable> is the number of transactions that ended - with a failed SQL command within the interval. If you use option - <option>--failures-detailed</option>, instead of the sum of all failed - transactions you will get more detailed statistics for the failed - transactions grouped by the following types: - <replaceable>serialization_failures</replaceable> is the number of - transactions that got a serialization error and were not retried after this, - <replaceable>deadlock_failures</replaceable> is the number of transactions - that got a deadlock error and were not retried after this. + with a failed SQL command within the interval. + </para> + <para> The next fields, <replaceable>sum_lag</replaceable>, <replaceable>sum_lag_2</replaceable>, <replaceable>min_lag</replaceable>, - and <replaceable>max_lag</replaceable>, are only present if the <option>--rate</option> - option is used. + and <replaceable>max_lag</replaceable>, only meaningful if the <option>--rate</option> + option is used. Otherwise, they are all 0.0. They provide statistics about the time each transaction had to wait for the previous one to finish, i.e., the difference between each transaction's scheduled start time and the time it actually started. The next field, <replaceable>skipped</replaceable>, - is only present if the <option>--latency-limit</option> option is used, too. + is only meaningful if the <option>--latency-limit</option> option is used, too. Otherwise it is 0. It counts the number of transactions skipped because they would have started too late. - The <replaceable>retried</replaceable> and <replaceable>retries</replaceable> - fields are present only if the <option>--max-tries</option> option is not - equal to 1. They report the number of retried transactions and the sum of all - retries after serialization or deadlock errors within the interval. - Each transaction is counted in the interval when it was committed. + </para> + <para> + <replaceable>failures</replaceable> is the sum of all failed transactions. + If <option>--failures-detailed</option> is specified, instead of the sum of + all failed transactions you will get more detailed statistics for the + failed transactions grouped by the following types: + <replaceable>serialization_failures</replaceable> is the number of + transactions that got a serialization error and were not retried after this, + <replaceable>deadlock_failures</replaceable> is the number of transactions + that got a deadlock error and were not retried after this. + If <option>--failures-detailed</option> is not + specified, <replaceable>serialization_failures</replaceable> + and <replaceable>deadlock_failures</replaceable> are always 0. + </para> + <para> + The <replaceable>retried</replaceable> + and <replaceable>retries</replaceable> fields are only meaningful if + the <option>--max-tries</option> option is not equal to 1. Otherwise they + are 0. They report the number of retried transactions and the sum of all + retries after serialization or deadlock errors within the interval. Each + transaction is counted in the interval when it was committed. </para> <para> - Here is some example output: + Here is some example output with following options: <screen> -1345828501 5601 1542744 483552416 61 2573 0 -1345828503 7884 1979812 565806736 60 1479 0 -1345828505 7208 1979422 567277552 59 1391 0 -1345828507 7685 1980268 569784714 60 1398 0 -1345828509 7073 1979779 573489941 236 1411 0 -</screen></para> +pgbench --aggregate-interval=10 --time=20 --client=10 --log --rate=1000 +--latency-limit=10 --failures-detailed --max-tries=10 test +</screen> + +<screen> +1649033235 5398 26186948 170797073304 1051 15347 2471626 6290343560 0 8261 0 3925 3925 0 7524 29534 +1649033245 5651 27345519 210270761637 1011 67780 2480555 6835066067 0 9999 496 3839 3839 0 7533 30118 +</screen> + </para> <para> Notice that while the plain (unaggregated) log file shows which script diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c index acf3e56413..aea27b1383 100644 --- a/src/bin/pgbench/pgbench.c +++ b/src/bin/pgbench/pgbench.c @@ -4494,6 +4494,17 @@ doLog(TState *thread, CState *st, while ((next = agg->start_time + agg_interval * INT64CONST(1000000)) <= now) { + double lag_sum = 0.0; + double lag_sum2 = 0.0; + double lag_min = 0.0; + double lag_max = 0.0; + int64 skipped = 0; + int64 serialization_failures = 0; + int64 deadlock_failures = 0; + int64 serialization_or_deadlock_failures = 0; + int64 retried = 0; + int64 retries = 0; + /* print aggregated report to logfile */ fprintf(logfile, INT64_FORMAT " " INT64_FORMAT " %.0f %.0f %.0f %.0f", agg->start_time / 1000000, /* seconds since Unix epoch */ @@ -4503,27 +4514,41 @@ doLog(TState *thread, CState *st, agg->latency.min, agg->latency.max); - if (failures_detailed) - fprintf(logfile, " " INT64_FORMAT " " INT64_FORMAT, - agg->serialization_failures, - agg->deadlock_failures); - else - fprintf(logfile, " " INT64_FORMAT, getFailures(agg)); - if (throttle_delay) { - fprintf(logfile, " %.0f %.0f %.0f %.0f", - agg->lag.sum, - agg->lag.sum2, - agg->lag.min, - agg->lag.max); - if (latency_limit) - fprintf(logfile, " " INT64_FORMAT, agg->skipped); + lag_sum = agg->lag.sum; + lag_sum2 = agg->lag.sum2; + lag_min = agg->lag.min; + lag_max = agg->lag.max; + } + fprintf(logfile, " %.0f %.0f %.0f %.0f", + lag_sum, + lag_sum2, + lag_min, + lag_max); + + if (latency_limit) + skipped = agg->skipped; + fprintf(logfile, " " INT64_FORMAT, skipped); + + if (failures_detailed) + { + serialization_failures = agg->serialization_failures; + deadlock_failures = agg->deadlock_failures; } + serialization_or_deadlock_failures = serialization_failures + deadlock_failures; + fprintf(logfile, " " INT64_FORMAT " " INT64_FORMAT " " INT64_FORMAT, + serialization_or_deadlock_failures, + serialization_failures, + deadlock_failures); + if (max_tries != 1) - fprintf(logfile, " " INT64_FORMAT " " INT64_FORMAT, - agg->retried, - agg->retries); + { + retried = agg->retried; + retries = agg->retries; + } + fprintf(logfile, " " INT64_FORMAT " " INT64_FORMAT, retried, retries); + fputc('\n', logfile); /* reset data and move to next interval */
Hello Tatsuo-san, > interval_start num_transactions sum_latency sum_latency_2 min_latency max_latency > sum_lag sum_lag_2 min_lag max_lag skipped > failures serialization_failures deadlock_failures retried retries I would suggest to reorder the last chunk to: ... retried retries failures serfail dlfail because I intend to add connection failures handling at some point, and it would make more sense to add the corresponding count at the end with other fails. -- Fabien Coelho - CRI, MINES ParisTech
Hi Fabien, > Hello Tatsuo-san, > >> interval_start num_transactions sum_latency sum_latency_2 min_latency >> max_latency >> sum_lag sum_lag_2 min_lag max_lag skipped >> failures serialization_failures deadlock_failures retried retries > > I would suggest to reorder the last chunk to: > > ... retried retries failures serfail dlfail > > because I intend to add connection failures handling at some point, > and it would make more sense to add the corresponding count at the end > with other fails. Ok, I have adjusted the patch. V2 patch attached. Best reagards, -- Tatsuo Ishii SRA OSS, Inc. Japan English: http://www.sraoss.co.jp/index_en.php Japanese:http://www.sraoss.co.jp diff --git a/doc/src/sgml/ref/pgbench.sgml b/doc/src/sgml/ref/pgbench.sgml index ebdb4b3f46..d1818ff316 100644 --- a/doc/src/sgml/ref/pgbench.sgml +++ b/doc/src/sgml/ref/pgbench.sgml @@ -2401,7 +2401,9 @@ END; format is used for the log files: <synopsis> -<replaceable>interval_start</replaceable> <replaceable>num_transactions</replaceable> <replaceable>sum_latency</replaceable><replaceable>sum_latency_2</replaceable> <replaceable>min_latency</replaceable> <replaceable>max_latency</replaceable>{ <replaceable>failures</replaceable> | <replaceable>serialization_failures</replaceable><replaceable>deadlock_failures</replaceable> } <optional> <replaceable>sum_lag</replaceable><replaceable>sum_lag_2</replaceable> <replaceable>min_lag</replaceable> <replaceable>max_lag</replaceable><optional> <replaceable>skipped</replaceable> </optional> </optional> <optional> <replaceable>retried</replaceable><replaceable>retries</replaceable> </optional> +<replaceable>interval_start</replaceable> <replaceable>num_transactions</replaceable> <replaceable>sum_latency</replaceable><replaceable>sum_latency_2</replaceable> <replaceable>min_latency</replaceable> <replaceable>max_latency</replaceable> +<replaceable>sum_lag</replaceable> <replaceable>sum_lag_2</replaceable> <replaceable>min_lag</replaceable> <replaceable>max_lag</replaceable><replaceable>skipped</replaceable> +<replaceable>retried</replaceable> <replaceable>retries</replaceable> <replaceable>failures</replaceable> <replaceable>serialization_failures</replaceable><replaceable>deadlock_failures</replaceable> </synopsis> where @@ -2417,41 +2419,55 @@ END; and <replaceable>max_latency</replaceable> is the maximum latency within the interval, <replaceable>failures</replaceable> is the number of transactions that ended - with a failed SQL command within the interval. If you use option - <option>--failures-detailed</option>, instead of the sum of all failed - transactions you will get more detailed statistics for the failed - transactions grouped by the following types: - <replaceable>serialization_failures</replaceable> is the number of - transactions that got a serialization error and were not retried after this, - <replaceable>deadlock_failures</replaceable> is the number of transactions - that got a deadlock error and were not retried after this. + with a failed SQL command within the interval. + </para> + <para> The next fields, <replaceable>sum_lag</replaceable>, <replaceable>sum_lag_2</replaceable>, <replaceable>min_lag</replaceable>, - and <replaceable>max_lag</replaceable>, are only present if the <option>--rate</option> - option is used. + and <replaceable>max_lag</replaceable>, only meaningful if the <option>--rate</option> + option is used. Otherwise, they are all 0.0. They provide statistics about the time each transaction had to wait for the previous one to finish, i.e., the difference between each transaction's scheduled start time and the time it actually started. The next field, <replaceable>skipped</replaceable>, - is only present if the <option>--latency-limit</option> option is used, too. + is only meaningful if the <option>--latency-limit</option> option is used, too. Otherwise it is 0. It counts the number of transactions skipped because they would have started too late. - The <replaceable>retried</replaceable> and <replaceable>retries</replaceable> - fields are present only if the <option>--max-tries</option> option is not - equal to 1. They report the number of retried transactions and the sum of all - retries after serialization or deadlock errors within the interval. - Each transaction is counted in the interval when it was committed. + </para> + <para> + The <replaceable>retried</replaceable> + and <replaceable>retries</replaceable> fields are only meaningful if + the <option>--max-tries</option> option is not equal to 1. Otherwise they + are 0. They report the number of retried transactions and the sum of all + retries after serialization or deadlock errors within the interval. Each + transaction is counted in the interval when it was committed. + </para> + <para> + <replaceable>failures</replaceable> is the sum of all failed transactions. + If <option>--failures-detailed</option> is specified, instead of the sum of + all failed transactions you will get more detailed statistics for the + failed transactions grouped by the following types: + <replaceable>serialization_failures</replaceable> is the number of + transactions that got a serialization error and were not retried after this, + <replaceable>deadlock_failures</replaceable> is the number of transactions + that got a deadlock error and were not retried after this. + If <option>--failures-detailed</option> is not + specified, <replaceable>serialization_failures</replaceable> + and <replaceable>deadlock_failures</replaceable> are always 0. </para> <para> - Here is some example output: + Here is some example output with following options: <screen> -1345828501 5601 1542744 483552416 61 2573 0 -1345828503 7884 1979812 565806736 60 1479 0 -1345828505 7208 1979422 567277552 59 1391 0 -1345828507 7685 1980268 569784714 60 1398 0 -1345828509 7073 1979779 573489941 236 1411 0 -</screen></para> +pgbench --aggregate-interval=10 --time=20 --client=10 --log --rate=1000 +--latency-limit=10 --failures-detailed --max-tries=10 test +</screen> + +<screen> +1649114136 5815 27552565 177846919143 1078 21716 2756787 7264696105 0 9661 0 7854 31472 4022 4022 0 +1649114146 5958 28460110 182785513108 1083 20391 2539395 6411761497 0 7268 0 8127 32595 4101 4101 0 +</screen> + </para> <para> Notice that while the plain (unaggregated) log file shows which script diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c index acf3e56413..4d4b979e4f 100644 --- a/src/bin/pgbench/pgbench.c +++ b/src/bin/pgbench/pgbench.c @@ -4494,6 +4494,17 @@ doLog(TState *thread, CState *st, while ((next = agg->start_time + agg_interval * INT64CONST(1000000)) <= now) { + double lag_sum = 0.0; + double lag_sum2 = 0.0; + double lag_min = 0.0; + double lag_max = 0.0; + int64 skipped = 0; + int64 serialization_failures = 0; + int64 deadlock_failures = 0; + int64 serialization_or_deadlock_failures = 0; + int64 retried = 0; + int64 retries = 0; + /* print aggregated report to logfile */ fprintf(logfile, INT64_FORMAT " " INT64_FORMAT " %.0f %.0f %.0f %.0f", agg->start_time / 1000000, /* seconds since Unix epoch */ @@ -4503,27 +4514,41 @@ doLog(TState *thread, CState *st, agg->latency.min, agg->latency.max); - if (failures_detailed) - fprintf(logfile, " " INT64_FORMAT " " INT64_FORMAT, - agg->serialization_failures, - agg->deadlock_failures); - else - fprintf(logfile, " " INT64_FORMAT, getFailures(agg)); - if (throttle_delay) { - fprintf(logfile, " %.0f %.0f %.0f %.0f", - agg->lag.sum, - agg->lag.sum2, - agg->lag.min, - agg->lag.max); - if (latency_limit) - fprintf(logfile, " " INT64_FORMAT, agg->skipped); + lag_sum = agg->lag.sum; + lag_sum2 = agg->lag.sum2; + lag_min = agg->lag.min; + lag_max = agg->lag.max; } + fprintf(logfile, " %.0f %.0f %.0f %.0f", + lag_sum, + lag_sum2, + lag_min, + lag_max); + + if (latency_limit) + skipped = agg->skipped; + fprintf(logfile, " " INT64_FORMAT, skipped); + if (max_tries != 1) - fprintf(logfile, " " INT64_FORMAT " " INT64_FORMAT, - agg->retried, - agg->retries); + { + retried = agg->retried; + retries = agg->retries; + } + fprintf(logfile, " " INT64_FORMAT " " INT64_FORMAT, retried, retries); + + if (failures_detailed) + { + serialization_failures = agg->serialization_failures; + deadlock_failures = agg->deadlock_failures; + } + serialization_or_deadlock_failures = serialization_failures + deadlock_failures; + fprintf(logfile, " " INT64_FORMAT " " INT64_FORMAT " " INT64_FORMAT, + serialization_or_deadlock_failures, + serialization_failures, + deadlock_failures); + fputc('\n', logfile); /* reset data and move to next interval */
>> I would suggest to reorder the last chunk to: >> >> ... retried retries failures serfail dlfail >> >> because I intend to add connection failures handling at some point, >> and it would make more sense to add the corresponding count at the end >> with other fails. > > Ok, I have adjusted the patch. V2 patch attached. Patch pushed. Thanks. -- Tatsuo Ishii SRA OSS, Inc. Japan English: http://www.sraoss.co.jp/index_en.php Japanese:http://www.sraoss.co.jp
Tatsuo Ishii <ishii@sraoss.co.jp> writes: > Patch pushed. Thanks. The buildfarm is still complaining about the synopsis being too wide for PDF format. I think what we ought to do is give up on using a <synopsis> for log lines at all, and instead convert the documentation into a tabular list of fields. Proposal attached, which also fixes a couple of outright errors. One thing that this doesn't fix is that the existing text appears to suggest that the "failures" column is something different from the sum of the serialization_failures and deadlock_failures columns, which it's obvious from the code is not so. If this isn't a code bug then I think we ought to just drop that column entirely, because it's redundant. (BTW, now that I've read this stuff I am quite horrified by how the non-aggregated log format has been mangled for error retries, and will be probably be submitting a proposal to change that. But that's a different issue.) regards, tom lane diff --git a/doc/src/sgml/ref/pgbench.sgml b/doc/src/sgml/ref/pgbench.sgml index 9ba26e5e86..d12cbaa8ab 100644 --- a/doc/src/sgml/ref/pgbench.sgml +++ b/doc/src/sgml/ref/pgbench.sgml @@ -2289,33 +2289,95 @@ END; </para> <para> - The format of the log is: - -<synopsis> -<replaceable>client_id</replaceable> <replaceable>transaction_no</replaceable> <replaceable>time</replaceable> <replaceable>script_no</replaceable><replaceable>time_epoch</replaceable> <replaceable>time_us</replaceable> <optional> <replaceable>schedule_lag</replaceable></optional> <optional> <replaceable>retries</replaceable> </optional> -</synopsis> - - where - <replaceable>client_id</replaceable> indicates which client session ran the transaction, - <replaceable>transaction_no</replaceable> counts how many transactions have been - run by that session, - <replaceable>time</replaceable> is the total elapsed transaction time in microseconds, - <replaceable>script_no</replaceable> identifies which script file was used (useful when - multiple scripts were specified with <option>-f</option> or <option>-b</option>), - and <replaceable>time_epoch</replaceable>/<replaceable>time_us</replaceable> are a - Unix-epoch time stamp and an offset - in microseconds (suitable for creating an ISO 8601 - time stamp with fractional seconds) showing when - the transaction completed. - The <replaceable>schedule_lag</replaceable> field is the difference between the - transaction's scheduled start time, and the time it actually started, in - microseconds. It is only present when the <option>--rate</option> option is used. + Each line in a log file describes one transaction. + It contains the following space-separated fields: + + <variablelist> + <varlistentry> + <term><replaceable>client_id</replaceable></term> + <listitem> + <para> + identifies the client session that ran the transaction + </para> + </listitem> + </varlistentry> + + <varlistentry> + <term><replaceable>transaction_no</replaceable></term> + <listitem> + <para> + counts how many transactions have been run by that session + </para> + </listitem> + </varlistentry> + + <varlistentry> + <term><replaceable>time</replaceable></term> + <listitem> + <para> + transaction's elapsed time, in microseconds + </para> + </listitem> + </varlistentry> + + <varlistentry> + <term><replaceable>script_no</replaceable></term> + <listitem> + <para> + identifies the script file that was used for the transaction + (useful when multiple scripts are specified + with <option>-f</option> or <option>-b</option>) + </para> + </listitem> + </varlistentry> + + <varlistentry> + <term><replaceable>time_epoch</replaceable></term> + <listitem> + <para> + transaction's completion time, as a Unix-epoch time stamp + </para> + </listitem> + </varlistentry> + + <varlistentry> + <term><replaceable>time_us</replaceable></term> + <listitem> + <para> + fractional-second part of transaction's completion time, in + microseconds + </para> + </listitem> + </varlistentry> + + <varlistentry> + <term><replaceable>schedule_lag</replaceable></term> + <listitem> + <para> + difference between the transaction's scheduled start time and the + time it actually started, in microseconds + (present only if <option>--rate</option> is specified) + </para> + </listitem> + </varlistentry> + + <varlistentry> + <term><replaceable>retries</replaceable></term> + <listitem> + <para> + count of retries after serialization or deadlock errors during the + transaction + (present only if <option>--max-tries</option> is not equal to one) + </para> + </listitem> + </varlistentry> + </variablelist> + </para> + + <para> When both <option>--rate</option> and <option>--latency-limit</option> are used, the <replaceable>time</replaceable> for a skipped transaction will be reported as <literal>skipped</literal>. - <replaceable>retries</replaceable> is the sum of all retries after the - serialization or deadlock errors during the current script execution. It is - present only if the <option>--max-tries</option> option is not equal to 1. If the transaction ends with a failure, its <replaceable>time</replaceable> will be reported as <literal>failed</literal>. If you use the <option>--failures-detailed</option> option, the @@ -2398,66 +2460,171 @@ END; <para> With the <option>--aggregate-interval</option> option, a different - format is used for the log files: - -<synopsis> -<replaceable>interval_start</replaceable> <replaceable>num_transactions</replaceable> <replaceable>sum_latency</replaceable><replaceable>sum_latency_2</replaceable> <replaceable>min_latency</replaceable> <replaceable>max_latency</replaceable> -<replaceable>sum_lag</replaceable> <replaceable>sum_lag_2</replaceable> <replaceable>min_lag</replaceable> <replaceable>max_lag</replaceable><replaceable>skipped</replaceable> -<replaceable>retried</replaceable> <replaceable>retries</replaceable> <replaceable>failures</replaceable> <replaceable>serialization_failures</replaceable><replaceable>deadlock_failures</replaceable> -</synopsis> - - where - <replaceable>interval_start</replaceable> is the start of the interval (as a Unix - epoch time stamp), - <replaceable>num_transactions</replaceable> is the number of transactions - within the interval, - <replaceable>sum_latency</replaceable> is the sum of the transaction - latencies within the interval, - <replaceable>sum_latency_2</replaceable> is the sum of squares of the - transaction latencies within the interval, - <replaceable>min_latency</replaceable> is the minimum latency within the interval, - and - <replaceable>max_latency</replaceable> is the maximum latency within the interval, - <replaceable>failures</replaceable> is the number of transactions that ended - with a failed SQL command within the interval. - </para> - <para> - The next fields, - <replaceable>sum_lag</replaceable>, <replaceable>sum_lag_2</replaceable>, <replaceable>min_lag</replaceable>, - and <replaceable>max_lag</replaceable>, only meaningful if the <option>--rate</option> - option is used. Otherwise, they are all 0.0. - They provide statistics about the time each transaction had to wait for the - previous one to finish, i.e., the difference between each transaction's - scheduled start time and the time it actually started. - The next field, <replaceable>skipped</replaceable>, - is only meaningful if the <option>--latency-limit</option> option is used, too. Otherwise it is 0. - It counts the number of transactions skipped because they would have - started too late. - </para> - <para> - The <replaceable>retried</replaceable> - and <replaceable>retries</replaceable> fields are only meaningful if - the <option>--max-tries</option> option is not equal to 1. Otherwise they - are 0. They report the number of retried transactions and the sum of all - retries after serialization or deadlock errors within the interval. Each - transaction is counted in the interval when it was committed. - </para> - <para> - <replaceable>failures</replaceable> is the sum of all failed transactions. - If <option>--failures-detailed</option> is specified, instead of the sum of - all failed transactions you will get more detailed statistics for the - failed transactions grouped by the following types: - <replaceable>serialization_failures</replaceable> is the number of - transactions that got a serialization error and were not retried after this, - <replaceable>deadlock_failures</replaceable> is the number of transactions - that got a deadlock error and were not retried after this. - If <option>--failures-detailed</option> is not - specified, <replaceable>serialization_failures</replaceable> - and <replaceable>deadlock_failures</replaceable> are always 0. + format is used for the log files. Each log line describes one + aggregation interval. It contains the following space-separated + fields: + + <variablelist> + <varlistentry> + <term><replaceable>interval_start</replaceable></term> + <listitem> + <para> + start time of the interval, as a Unix-epoch time stamp + </para> + </listitem> + </varlistentry> + + <varlistentry> + <term><replaceable>num_transactions</replaceable></term> + <listitem> + <para> + number of transactions within the interval + </para> + </listitem> + </varlistentry> + + <varlistentry> + <term><replaceable>sum_latency</replaceable></term> + <listitem> + <para> + sum of transaction latencies + </para> + </listitem> + </varlistentry> + + <varlistentry> + <term><replaceable>sum_latency_2</replaceable></term> + <listitem> + <para> + sum of squares of transaction latencies + </para> + </listitem> + </varlistentry> + + <varlistentry> + <term><replaceable>min_latency</replaceable></term> + <listitem> + <para> + minimum transaction latency + </para> + </listitem> + </varlistentry> + + <varlistentry> + <term><replaceable>max_latency</replaceable></term> + <listitem> + <para> + maximum transaction latency + </para> + </listitem> + </varlistentry> + + <varlistentry> + <term><replaceable>sum_lag</replaceable></term> + <listitem> + <para> + sum of transaction start delays + (zero unless <option>--rate</option> is specified) + </para> + </listitem> + </varlistentry> + + <varlistentry> + <term><replaceable>sum_lag_2</replaceable></term> + <listitem> + <para> + sum of squares of transaction start delays + (zero unless <option>--rate</option> is specified) + </para> + </listitem> + </varlistentry> + + <varlistentry> + <term><replaceable>min_lag</replaceable></term> + <listitem> + <para> + minimum transaction start delay + (zero unless <option>--rate</option> is specified) + </para> + </listitem> + </varlistentry> + + <varlistentry> + <term><replaceable>max_lag</replaceable></term> + <listitem> + <para> + maximum transaction start delay + (zero unless <option>--rate</option> is specified) + </para> + </listitem> + </varlistentry> + + <varlistentry> + <term><replaceable>skipped</replaceable></term> + <listitem> + <para> + number of transactions skipped because they would have started too late + (zero unless <option>--rate</option> + and <option>--latency-limit</option> are specified) + </para> + </listitem> + </varlistentry> + + <varlistentry> + <term><replaceable>retried</replaceable></term> + <listitem> + <para> + number of retried transactions + (zero unless <option>--max-tries</option> is not equal to one) + </para> + </listitem> + </varlistentry> + + <varlistentry> + <term><replaceable>retries</replaceable></term> + <listitem> + <para> + number of retries after serialization or deadlock errors + (zero unless <option>--max-tries</option> is not equal to one) + </para> + </listitem> + </varlistentry> + + <varlistentry> + <term><replaceable>failures</replaceable></term> + <listitem> + <para> + number of transactions that ended with a failed SQL command + </para> + </listitem> + </varlistentry> + + <varlistentry> + <term><replaceable>serialization_failures</replaceable></term> + <listitem> + <para> + number of transactions that got a serialization error and were not + retried afterwards + (zero unless <option>--failures-detailed</option> is specified) + </para> + </listitem> + </varlistentry> + + <varlistentry> + <term><replaceable>deadlock_failures</replaceable></term> + <listitem> + <para> + number of transactions that got a deadlock error and were not + retried afterwards + (zero unless <option>--failures-detailed</option> is specified) + </para> + </listitem> + </varlistentry> + </variablelist> </para> <para> - Here is some example output with following options: + Here is some example output generated with these options: <screen> <userinput>pgbench --aggregate-interval=10 --time=20 --client=10 --log --rate=1000 --latency-limit=10 --failures-detailed --max-tries=10 test</userinput> @@ -2468,8 +2635,8 @@ END; </para> <para> - Notice that while the plain (unaggregated) log file shows which script - was used for each transaction, the aggregated log does not. Therefore if + Notice that while the plain (unaggregated) log format shows which script + was used for each transaction, the aggregated format does not. Therefore if you need per-script data, you need to aggregate the data on your own. </para>
> The buildfarm is still complaining about the synopsis being too > wide for PDF format. I think what we ought to do is give up on > using a <synopsis> for log lines at all, and instead convert the > documentation into a tabular list of fields. Proposal attached, > which also fixes a couple of outright errors. Once I thought about that too. Looks good to me. > One thing that this doesn't fix is that the existing text appears > to suggest that the "failures" column is something different from > the sum of the serialization_failures and deadlock_failures > columns, which it's obvious from the code is not so. If this isn't > a code bug then I think we ought to just drop that column entirely, > because it's redundant. +1. > (BTW, now that I've read this stuff I am quite horrified by how > the non-aggregated log format has been mangled for error retries, > and will be probably be submitting a proposal to change that. > But that's a different issue.) Best reagards, -- Tatsuo Ishii SRA OSS, Inc. Japan English: http://www.sraoss.co.jp/index_en.php Japanese:http://www.sraoss.co.jp
Hello Tom, > The buildfarm is still complaining about the synopsis being too > wide for PDF format. I think what we ought to do is give up on > using a <synopsis> for log lines at all, and instead convert the > documentation into a tabular list of fields. Proposal attached, > which also fixes a couple of outright errors. Looks ok. Html doc generation ok. While looking at the html outpout, the "pgbench" command line just below wraps strangely: pgbench --aggregate-interval=10 --time=20 --client=10 --log --rate=1000 --latency-limit=10 --failures-detailed --max-tries=10 test ISTM that there should be no nl in the <textinput>pgbench …</textinput> section, although maybe it would trigger a complaint in the pdf format. > One thing that this doesn't fix is that the existing text appears > to suggest that the "failures" column is something different from > the sum of the serialization_failures and deadlock_failures > columns, which it's obvious from the code is not so. If this isn't > a code bug then I think we ought to just drop that column entirely, > because it's redundant. Ok. Fine with me. Possibly at some point there was the idea that there could be other failures counted, but there are none. Also, there has been questions about the failures detailed option, or whether the reports should always be detailed, and the result may be some kind of not convincing compromise. > (BTW, now that I've read this stuff I am quite horrified by how > the non-aggregated log format has been mangled for error retries, > and will be probably be submitting a proposal to change that. > But that's a different issue.) Indeed, any improvement is welcome! -- Fabien.
Fabien COELHO <coelho@cri.ensmp.fr> writes: > While looking at the html outpout, the "pgbench" command line just below > wraps strangely: > pgbench --aggregate-interval=10 --time=20 --client=10 --log --rate=1000 > --latency-limit=10 --failures-detailed --max-tries=10 test > ISTM that there should be no nl in the <textinput>pgbench …</textinput> > section, although maybe it would trigger a complaint in the pdf format. PDF wraps that text where it wants to anyway, so I removed the newline. regards, tom lane
>> One thing that this doesn't fix is that the existing text appears >> to suggest that the "failures" column is something different from >> the sum of the serialization_failures and deadlock_failures >> columns, which it's obvious from the code is not so. If this isn't >> a code bug then I think we ought to just drop that column entirely, >> because it's redundant. > > Ok. Fine with me. Possibly at some point there was the idea that there > could be other failures counted, but there are none. Also, there has > been questions about the failures detailed option, or whether the > reports should always be detailed, and the result may be some kind of > not convincing compromise. Attached is the patch to remove the column. Best reagards, -- Tatsuo Ishii SRA OSS, Inc. Japan English: http://www.sraoss.co.jp/index_en.php Japanese:http://www.sraoss.co.jp diff --git a/doc/src/sgml/ref/pgbench.sgml b/doc/src/sgml/ref/pgbench.sgml index 387a836287..dbae4e2321 100644 --- a/doc/src/sgml/ref/pgbench.sgml +++ b/doc/src/sgml/ref/pgbench.sgml @@ -2591,15 +2591,6 @@ END; </listitem> </varlistentry> - <varlistentry> - <term><replaceable>failures</replaceable></term> - <listitem> - <para> - number of transactions that ended with a failed SQL command - </para> - </listitem> - </varlistentry> - <varlistentry> <term><replaceable>serialization_failures</replaceable></term> <listitem> @@ -2629,8 +2620,8 @@ END; <screen> <userinput>pgbench --aggregate-interval=10 --time=20 --client=10 --log --rate=1000 --latency-limit=10 --failures-detailed--max-tries=10 test</userinput> -1649114136 5815 27552565 177846919143 1078 21716 2756787 7264696105 0 9661 0 7854 31472 4022 4022 0 -1649114146 5958 28460110 182785513108 1083 20391 2539395 6411761497 0 7268 0 8127 32595 4101 4101 0 +1650260552 5178 26171317 177284491527 1136 44462 2647617 7321113867 0 9866 64 7564 28340 4148 0 +1650260562 4808 25573984 220121792172 1171 62083 3037380 9666800914 0 9998 598 7392 26621 4527 0 </screen> </para> diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c index e63cea56a1..f8bcb1ab6d 100644 --- a/src/bin/pgbench/pgbench.c +++ b/src/bin/pgbench/pgbench.c @@ -4498,7 +4498,6 @@ doLog(TState *thread, CState *st, int64 skipped = 0; int64 serialization_failures = 0; int64 deadlock_failures = 0; - int64 serialization_or_deadlock_failures = 0; int64 retried = 0; int64 retries = 0; @@ -4540,9 +4539,7 @@ doLog(TState *thread, CState *st, serialization_failures = agg->serialization_failures; deadlock_failures = agg->deadlock_failures; } - serialization_or_deadlock_failures = serialization_failures + deadlock_failures; - fprintf(logfile, " " INT64_FORMAT " " INT64_FORMAT " " INT64_FORMAT, - serialization_or_deadlock_failures, + fprintf(logfile, " " INT64_FORMAT " " INT64_FORMAT, serialization_failures, deadlock_failures);
>> Ok. Fine with me. Possibly at some point there was the idea that there >> could be other failures counted, but there are none. Also, there has >> been questions about the failures detailed option, or whether the >> reports should always be detailed, and the result may be some kind of >> not convincing compromise. > > Attached is the patch to remove the column. Patch applies cleanly. Compilation ok. Global and local "make check" ok. Doc build ok. It find it a little annoying that there is no change in tests, it means that the format is not checked at all:-( -- Fabien.
>>> Ok. Fine with me. Possibly at some point there was the idea that there >>> could be other failures counted, but there are none. Also, there has >>> been questions about the failures detailed option, or whether the >>> reports should always be detailed, and the result may be some kind of >>> not convincing compromise. >> >> Attached is the patch to remove the column. > > Patch applies cleanly. Compilation ok. Global and local "make check" > ok. > Doc build ok. Thank you for reviewing. Patch pushed. > It find it a little annoying that there is no change in tests, it > means that the format is not checked at all:-( Yeah. Perhaps it's a little bit hard to perform this kind of tests in the TAP test? Best reagards, -- Tatsuo Ishii SRA OSS, Inc. Japan English: http://www.sraoss.co.jp/index_en.php Japanese:http://www.sraoss.co.jp
>> It find it a little annoying that there is no change in tests, it >> means that the format is not checked at all:-( > > Yeah. Perhaps it's a little bit hard to perform this kind of tests in > the TAP test? Not really. I'll look into it. -- Fabien.