Re: [HACKERS] WIP aPatch: Pgbench Serialization and deadlock errors - Mailing list pgsql-hackers

From Tatsuo Ishii
Subject Re: [HACKERS] WIP aPatch: Pgbench Serialization and deadlock errors
Date
Msg-id 20220405.083805.1138012899895218258.t-ishii@sranhm.sra.co.jp
Whole thread Raw
In response to Re: [HACKERS] WIP aPatch: Pgbench Serialization and deadlock errors  (Fabien COELHO <coelho@cri.ensmp.fr>)
Responses Re: [HACKERS] WIP aPatch: Pgbench Serialization and deadlock errors  (Tatsuo Ishii <ishii@sraoss.co.jp>)
List pgsql-hackers
Hi Fabien,

> Hello Tatsuo-san,
> 
>> interval_start num_transactions sum_latency sum_latency_2 min_latency
>> max_latency
>> sum_lag sum_lag_2 min_lag max_lag skipped
>> failures serialization_failures deadlock_failures retried retries
> 
> I would suggest to reorder the last chunk to:
> 
>    ... retried retries failures serfail dlfail
> 
> because I intend to add connection failures handling at some point,
> and it would make more sense to add the corresponding count at the end
> with other fails.

Ok, I have adjusted the patch. V2 patch attached.

Best reagards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp
diff --git a/doc/src/sgml/ref/pgbench.sgml b/doc/src/sgml/ref/pgbench.sgml
index ebdb4b3f46..d1818ff316 100644
--- a/doc/src/sgml/ref/pgbench.sgml
+++ b/doc/src/sgml/ref/pgbench.sgml
@@ -2401,7 +2401,9 @@ END;
    format is used for the log files:
 
 <synopsis>
-<replaceable>interval_start</replaceable> <replaceable>num_transactions</replaceable>
<replaceable>sum_latency</replaceable><replaceable>sum_latency_2</replaceable> <replaceable>min_latency</replaceable>
<replaceable>max_latency</replaceable>{ <replaceable>failures</replaceable> |
<replaceable>serialization_failures</replaceable><replaceable>deadlock_failures</replaceable> } <optional>
<replaceable>sum_lag</replaceable><replaceable>sum_lag_2</replaceable> <replaceable>min_lag</replaceable>
<replaceable>max_lag</replaceable><optional> <replaceable>skipped</replaceable> </optional> </optional> <optional>
<replaceable>retried</replaceable><replaceable>retries</replaceable> </optional>
 
+<replaceable>interval_start</replaceable> <replaceable>num_transactions</replaceable>
<replaceable>sum_latency</replaceable><replaceable>sum_latency_2</replaceable> <replaceable>min_latency</replaceable>
<replaceable>max_latency</replaceable>
+<replaceable>sum_lag</replaceable> <replaceable>sum_lag_2</replaceable> <replaceable>min_lag</replaceable>
<replaceable>max_lag</replaceable><replaceable>skipped</replaceable>
 
+<replaceable>retried</replaceable> <replaceable>retries</replaceable> <replaceable>failures</replaceable>
<replaceable>serialization_failures</replaceable><replaceable>deadlock_failures</replaceable>
 
 </synopsis>
 
    where
@@ -2417,41 +2419,55 @@ END;
    and
    <replaceable>max_latency</replaceable> is the maximum latency within the interval,
    <replaceable>failures</replaceable> is the number of transactions that ended
-   with a failed SQL command within the interval. If you use option
-   <option>--failures-detailed</option>, instead of the sum of all failed
-   transactions you will get more detailed statistics for the failed
-   transactions grouped by the following types:
-   <replaceable>serialization_failures</replaceable> is the number of
-   transactions that got a serialization error and were not retried after this,
-   <replaceable>deadlock_failures</replaceable> is the number of transactions
-   that got a deadlock error and were not retried after this.
+   with a failed SQL command within the interval.
+  </para>
+  <para>
    The next fields,
    <replaceable>sum_lag</replaceable>, <replaceable>sum_lag_2</replaceable>, <replaceable>min_lag</replaceable>,
-   and <replaceable>max_lag</replaceable>, are only present if the <option>--rate</option>
-   option is used.
+   and <replaceable>max_lag</replaceable>, only meaningful if the <option>--rate</option>
+   option is used. Otherwise, they are all 0.0.
    They provide statistics about the time each transaction had to wait for the
    previous one to finish, i.e., the difference between each transaction's
    scheduled start time and the time it actually started.
    The next field, <replaceable>skipped</replaceable>,
-   is only present if the <option>--latency-limit</option> option is used, too.
+   is only meaningful if the <option>--latency-limit</option> option is used, too. Otherwise it is 0.
    It counts the number of transactions skipped because they would have
    started too late.
-   The <replaceable>retried</replaceable> and <replaceable>retries</replaceable>
-   fields are present only if the <option>--max-tries</option> option is not
-   equal to 1. They report the number of retried transactions and the sum of all
-   retries after serialization or deadlock errors within the interval.
-   Each transaction is counted in the interval when it was committed.
+  </para>
+  <para>
+   The <replaceable>retried</replaceable>
+   and <replaceable>retries</replaceable> fields are only meaningful if
+   the <option>--max-tries</option> option is not equal to 1. Otherwise they
+   are 0. They report the number of retried transactions and the sum of all
+   retries after serialization or deadlock errors within the interval.  Each
+   transaction is counted in the interval when it was committed.
+  </para>
+  <para>
+   <replaceable>failures</replaceable> is the sum of all failed transactions.
+   If <option>--failures-detailed</option> is specified, instead of the sum of
+   all failed transactions you will get more detailed statistics for the
+   failed transactions grouped by the following types:
+   <replaceable>serialization_failures</replaceable> is the number of
+   transactions that got a serialization error and were not retried after this,
+   <replaceable>deadlock_failures</replaceable> is the number of transactions
+   that got a deadlock error and were not retried after this.
+   If <option>--failures-detailed</option> is not
+   specified, <replaceable>serialization_failures</replaceable>
+   and <replaceable>deadlock_failures</replaceable> are always 0.
   </para>
 
   <para>
-   Here is some example output:
+   Here is some example output with following options:
 <screen>
-1345828501 5601 1542744 483552416 61 2573 0
-1345828503 7884 1979812 565806736 60 1479 0
-1345828505 7208 1979422 567277552 59 1391 0
-1345828507 7685 1980268 569784714 60 1398 0
-1345828509 7073 1979779 573489941 236 1411 0
-</screen></para>
+pgbench --aggregate-interval=10 --time=20 --client=10 --log --rate=1000
+--latency-limit=10 --failures-detailed --max-tries=10 test
+</screen>
+
+<screen>
+1649114136 5815 27552565 177846919143 1078 21716 2756787 7264696105 0 9661 0 7854 31472 4022 4022 0
+1649114146 5958 28460110 182785513108 1083 20391 2539395 6411761497 0 7268 0 8127 32595 4101 4101 0
+</screen>
+  </para>
 
   <para>
    Notice that while the plain (unaggregated) log file shows which script
diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index acf3e56413..4d4b979e4f 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -4494,6 +4494,17 @@ doLog(TState *thread, CState *st,
 
         while ((next = agg->start_time + agg_interval * INT64CONST(1000000)) <= now)
         {
+            double        lag_sum = 0.0;
+            double        lag_sum2 = 0.0;
+            double        lag_min = 0.0;
+            double        lag_max = 0.0;
+            int64        skipped = 0;
+            int64        serialization_failures = 0;
+            int64        deadlock_failures = 0;
+            int64        serialization_or_deadlock_failures = 0;
+            int64        retried = 0;
+            int64        retries = 0;
+
             /* print aggregated report to logfile */
             fprintf(logfile, INT64_FORMAT " " INT64_FORMAT " %.0f %.0f %.0f %.0f",
                     agg->start_time / 1000000,    /* seconds since Unix epoch */
@@ -4503,27 +4514,41 @@ doLog(TState *thread, CState *st,
                     agg->latency.min,
                     agg->latency.max);
 
-            if (failures_detailed)
-                fprintf(logfile, " " INT64_FORMAT " " INT64_FORMAT,
-                        agg->serialization_failures,
-                        agg->deadlock_failures);
-            else
-                fprintf(logfile, " " INT64_FORMAT, getFailures(agg));
-
             if (throttle_delay)
             {
-                fprintf(logfile, " %.0f %.0f %.0f %.0f",
-                        agg->lag.sum,
-                        agg->lag.sum2,
-                        agg->lag.min,
-                        agg->lag.max);
-                if (latency_limit)
-                    fprintf(logfile, " " INT64_FORMAT, agg->skipped);
+                lag_sum = agg->lag.sum;
+                lag_sum2 = agg->lag.sum2;
+                lag_min = agg->lag.min;
+                lag_max = agg->lag.max;
             }
+            fprintf(logfile, " %.0f %.0f %.0f %.0f",
+                        lag_sum,
+                        lag_sum2,
+                        lag_min,
+                        lag_max);
+
+            if (latency_limit)
+                skipped = agg->skipped;
+            fprintf(logfile, " " INT64_FORMAT, skipped);
+
             if (max_tries != 1)
-                fprintf(logfile, " " INT64_FORMAT " " INT64_FORMAT,
-                        agg->retried,
-                        agg->retries);
+            {
+                retried = agg->retried;
+                retries = agg->retries;
+            }
+            fprintf(logfile, " " INT64_FORMAT " " INT64_FORMAT, retried, retries);
+
+            if (failures_detailed)
+            {
+                serialization_failures = agg->serialization_failures;
+                deadlock_failures = agg->deadlock_failures;
+            }
+            serialization_or_deadlock_failures = serialization_failures + deadlock_failures;
+            fprintf(logfile, " " INT64_FORMAT " " INT64_FORMAT " " INT64_FORMAT,
+                    serialization_or_deadlock_failures,
+                    serialization_failures,
+                    deadlock_failures);
+
             fputc('\n', logfile);
 
             /* reset data and move to next interval */

pgsql-hackers by date:

Previous
From: Andres Freund
Date:
Subject: Re: shared-memory based stats collector - v67
Next
From: Michael Paquier
Date:
Subject: Re: Run pg_amcheck in 002_pg_upgrade.pl and 027_stream_regress.pl?