Re: [HACKERS] WIP aPatch: Pgbench Serialization and deadlock errors - Mailing list pgsql-hackers

From Tatsuo Ishii
Subject Re: [HACKERS] WIP aPatch: Pgbench Serialization and deadlock errors
Date
Msg-id 20220404.101431.333095021655639605.t-ishii@sranhm.sra.co.jp
Whole thread Raw
In response to Re: [HACKERS] WIP aPatch: Pgbench Serialization and deadlock errors  (Tatsuo Ishii <ishii@sraoss.co.jp>)
Responses Re: [HACKERS] WIP aPatch: Pgbench Serialization and deadlock errors  (Fabien COELHO <coelho@cri.ensmp.fr>)
List pgsql-hackers
>> I think it's easier to just say "if feature X is not enabled, then
>> columns XYZ are always zeroes".
> 
> Ok, I will come up with a patch in this direction.

Please find attached patch for this.

With the patch, the log line is as follows (actually no line foldings of course): 

interval_start num_transactions sum_latency sum_latency_2 min_latency max_latency
sum_lag sum_lag_2 min_lag max_lag skipped
failures serialization_failures deadlock_failures retried retries

I updated the doc as well:

- fold the log line using line feed to avoid error in rendering PDF. I
  did not use &zwsp; because it does not enhance HTML output.

- split explanation of the log output into multiple paragraphs to
  enhance readability.

- replace the example output with full options are specified.

Best reagards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp
diff --git a/doc/src/sgml/ref/pgbench.sgml b/doc/src/sgml/ref/pgbench.sgml
index ebdb4b3f46..e1f98ae228 100644
--- a/doc/src/sgml/ref/pgbench.sgml
+++ b/doc/src/sgml/ref/pgbench.sgml
@@ -2401,7 +2401,9 @@ END;
    format is used for the log files:
 
 <synopsis>
-<replaceable>interval_start</replaceable> <replaceable>num_transactions</replaceable>
<replaceable>sum_latency</replaceable><replaceable>sum_latency_2</replaceable> <replaceable>min_latency</replaceable>
<replaceable>max_latency</replaceable>{ <replaceable>failures</replaceable> |
<replaceable>serialization_failures</replaceable><replaceable>deadlock_failures</replaceable> } <optional>
<replaceable>sum_lag</replaceable><replaceable>sum_lag_2</replaceable> <replaceable>min_lag</replaceable>
<replaceable>max_lag</replaceable><optional> <replaceable>skipped</replaceable> </optional> </optional> <optional>
<replaceable>retried</replaceable><replaceable>retries</replaceable> </optional>
 
+<replaceable>interval_start</replaceable> <replaceable>num_transactions</replaceable>
<replaceable>sum_latency</replaceable><replaceable>sum_latency_2</replaceable> <replaceable>min_latency</replaceable>
<replaceable>max_latency</replaceable>
+<replaceable>sum_lag</replaceable> <replaceable>sum_lag_2</replaceable> <replaceable>min_lag</replaceable>
<replaceable>max_lag</replaceable><replaceable>skipped</replaceable>
 
+<replaceable>failures</replaceable> <replaceable>serialization_failures</replaceable>
<replaceable>deadlock_failures</replaceable><replaceable>retried</replaceable> <replaceable>retries</replaceable>
 
 </synopsis>
 
    where
@@ -2417,41 +2419,55 @@ END;
    and
    <replaceable>max_latency</replaceable> is the maximum latency within the interval,
    <replaceable>failures</replaceable> is the number of transactions that ended
-   with a failed SQL command within the interval. If you use option
-   <option>--failures-detailed</option>, instead of the sum of all failed
-   transactions you will get more detailed statistics for the failed
-   transactions grouped by the following types:
-   <replaceable>serialization_failures</replaceable> is the number of
-   transactions that got a serialization error and were not retried after this,
-   <replaceable>deadlock_failures</replaceable> is the number of transactions
-   that got a deadlock error and were not retried after this.
+   with a failed SQL command within the interval.
+  </para>
+  <para>
    The next fields,
    <replaceable>sum_lag</replaceable>, <replaceable>sum_lag_2</replaceable>, <replaceable>min_lag</replaceable>,
-   and <replaceable>max_lag</replaceable>, are only present if the <option>--rate</option>
-   option is used.
+   and <replaceable>max_lag</replaceable>, only meaningful if the <option>--rate</option>
+   option is used. Otherwise, they are all 0.0.
    They provide statistics about the time each transaction had to wait for the
    previous one to finish, i.e., the difference between each transaction's
    scheduled start time and the time it actually started.
    The next field, <replaceable>skipped</replaceable>,
-   is only present if the <option>--latency-limit</option> option is used, too.
+   is only meaningful if the <option>--latency-limit</option> option is used, too. Otherwise it is 0.
    It counts the number of transactions skipped because they would have
    started too late.
-   The <replaceable>retried</replaceable> and <replaceable>retries</replaceable>
-   fields are present only if the <option>--max-tries</option> option is not
-   equal to 1. They report the number of retried transactions and the sum of all
-   retries after serialization or deadlock errors within the interval.
-   Each transaction is counted in the interval when it was committed.
+  </para>
+  <para>
+   <replaceable>failures</replaceable> is the sum of all failed transactions.
+   If <option>--failures-detailed</option> is specified, instead of the sum of
+   all failed transactions you will get more detailed statistics for the
+   failed transactions grouped by the following types:
+   <replaceable>serialization_failures</replaceable> is the number of
+   transactions that got a serialization error and were not retried after this,
+   <replaceable>deadlock_failures</replaceable> is the number of transactions
+   that got a deadlock error and were not retried after this.
+   If <option>--failures-detailed</option> is not
+   specified, <replaceable>serialization_failures</replaceable>
+   and <replaceable>deadlock_failures</replaceable> are always 0.
+  </para>
+  <para>
+   The <replaceable>retried</replaceable>
+   and <replaceable>retries</replaceable> fields are only meaningful if
+   the <option>--max-tries</option> option is not equal to 1. Otherwise they
+   are 0. They report the number of retried transactions and the sum of all
+   retries after serialization or deadlock errors within the interval.  Each
+   transaction is counted in the interval when it was committed.
   </para>
 
   <para>
-   Here is some example output:
+   Here is some example output with following options:
 <screen>
-1345828501 5601 1542744 483552416 61 2573 0
-1345828503 7884 1979812 565806736 60 1479 0
-1345828505 7208 1979422 567277552 59 1391 0
-1345828507 7685 1980268 569784714 60 1398 0
-1345828509 7073 1979779 573489941 236 1411 0
-</screen></para>
+pgbench --aggregate-interval=10 --time=20 --client=10 --log --rate=1000
+--latency-limit=10 --failures-detailed --max-tries=10 test
+</screen>
+
+<screen>
+1649033235 5398 26186948 170797073304 1051 15347 2471626 6290343560 0 8261 0 3925 3925 0 7524 29534
+1649033245 5651 27345519 210270761637 1011 67780 2480555 6835066067 0 9999 496 3839 3839 0 7533 30118
+</screen>
+  </para>
 
   <para>
    Notice that while the plain (unaggregated) log file shows which script
diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index acf3e56413..aea27b1383 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -4494,6 +4494,17 @@ doLog(TState *thread, CState *st,
 
         while ((next = agg->start_time + agg_interval * INT64CONST(1000000)) <= now)
         {
+            double        lag_sum = 0.0;
+            double        lag_sum2 = 0.0;
+            double        lag_min = 0.0;
+            double        lag_max = 0.0;
+            int64        skipped = 0;
+            int64        serialization_failures = 0;
+            int64        deadlock_failures = 0;
+            int64        serialization_or_deadlock_failures = 0;
+            int64        retried = 0;
+            int64        retries = 0;
+
             /* print aggregated report to logfile */
             fprintf(logfile, INT64_FORMAT " " INT64_FORMAT " %.0f %.0f %.0f %.0f",
                     agg->start_time / 1000000,    /* seconds since Unix epoch */
@@ -4503,27 +4514,41 @@ doLog(TState *thread, CState *st,
                     agg->latency.min,
                     agg->latency.max);
 
-            if (failures_detailed)
-                fprintf(logfile, " " INT64_FORMAT " " INT64_FORMAT,
-                        agg->serialization_failures,
-                        agg->deadlock_failures);
-            else
-                fprintf(logfile, " " INT64_FORMAT, getFailures(agg));
-
             if (throttle_delay)
             {
-                fprintf(logfile, " %.0f %.0f %.0f %.0f",
-                        agg->lag.sum,
-                        agg->lag.sum2,
-                        agg->lag.min,
-                        agg->lag.max);
-                if (latency_limit)
-                    fprintf(logfile, " " INT64_FORMAT, agg->skipped);
+                lag_sum = agg->lag.sum;
+                lag_sum2 = agg->lag.sum2;
+                lag_min = agg->lag.min;
+                lag_max = agg->lag.max;
+            }
+            fprintf(logfile, " %.0f %.0f %.0f %.0f",
+                        lag_sum,
+                        lag_sum2,
+                        lag_min,
+                        lag_max);
+
+            if (latency_limit)
+                skipped = agg->skipped;
+            fprintf(logfile, " " INT64_FORMAT, skipped);
+
+            if (failures_detailed)
+            {
+                serialization_failures = agg->serialization_failures;
+                deadlock_failures = agg->deadlock_failures;
             }
+            serialization_or_deadlock_failures = serialization_failures + deadlock_failures;
+            fprintf(logfile, " " INT64_FORMAT " " INT64_FORMAT " " INT64_FORMAT,
+                    serialization_or_deadlock_failures,
+                    serialization_failures,
+                    deadlock_failures);
+
             if (max_tries != 1)
-                fprintf(logfile, " " INT64_FORMAT " " INT64_FORMAT,
-                        agg->retried,
-                        agg->retries);
+            {
+                retried = agg->retried;
+                retries = agg->retries;
+            }
+            fprintf(logfile, " " INT64_FORMAT " " INT64_FORMAT, retried, retries);
+
             fputc('\n', logfile);
 
             /* reset data and move to next interval */

pgsql-hackers by date:

Previous
From: Andres Freund
Date:
Subject: Re: JSON constructors and window functions
Next
From: "David G. Johnston"
Date:
Subject: Re: unlogged sequences