Re: pgbench logging broken by time logic changes - Mailing list pgsql-hackers

From Michael Paquier
Subject Re: pgbench logging broken by time logic changes
Date
Msg-id YNKu2RVefp1lPjv2@paquier.xyz
Whole thread Raw
In response to Re: pgbench logging broken by time logic changes  (Fabien COELHO <coelho@cri.ensmp.fr>)
Responses Re: pgbench logging broken by time logic changes
List pgsql-hackers
On Tue, Jun 22, 2021 at 12:06:45PM +0200, Fabien COELHO wrote:
> Attached an updated v8 patch which adds (reinstate) an improved TAP test
> which would have caught the various regressions on logs.

> Given that such tests were removed once before, I'm unsure whether they will
> be acceptable, despite that their usefulness has been clearly demonstrated.
> At least it is for the record. Sigh:-(

Thanks!

This v8 is an addition of the fix for the epoch with the adjustments
for the aggregate reports in the logs.  The maths look rather right
after a read and after some tests.

+# note: this test is time sensitive, and may fail on a very
+#       loaded host.
+# note: --progress-timestamp is not tested
+my $delay = pgbench(
+    '-T 2 -P 1 -l --aggregate-interval=1 -S -b se@2'
+    . ' --rate=20 --latency-limit=1000 -j ' . $nthreads
+    . ' -c 3 -r',
+    0,
+    [   qr{type: multiple},
+        qr{clients: 3},
+        qr{threads: $nthreads},
+        qr{duration: 2 s},
+        qr{script 1: .* select only},
+        qr{script 2: .* select only},
+        qr{statement latencies in milliseconds},
+        qr{FROM pgbench_accounts} ],
+    [ qr{vacuum}, qr{progress: 1\b} ],
+    'pgbench progress', undef,
+    "--log-prefix=$bdir/001_pgbench_log_1");
Could we make that shorter at 1s?  That will shorten the duration of
the test run.  It is easy to miss that this test is for
--aggregate-interval (aka the logAgg() path) without a comment.

+# cool check that we are around 2 seconds
+# The rate may results in an unlucky schedule which triggers
+# an early exit, hence the loose bound.
+ok(0.0 < $delay && $delay < 4.0, "-T 2 run around 2 seconds");

The command itself would not fail, but we would just fail on a machine
where the difference in start/stop time is higher than 4 seconds,
right?  On RPI-level machines, this could fail reliably.  I am not
completely sure what's the additional value we can get from that extra
test, to be honest.

+# $nthreads threads, 2 seconds, but due to timing imprecision we might get
+# only 1 or as many as 3 progress reports per thread.
+check_pgbench_logs($bdir, '001_pgbench_log_1', $nthreads, 1, 3,
+    qr{^\d{10,} \d{1,2} \d+ \d+ \d+ \d+ \d+ \d+ \d+ \d+ \d+$});
+
Now this one is good and actually stable thanks to the fact that we'd
get at least the final logs, and the main complain we got to discuss
about on this thread was the format of the logs.  I would say to give
up on the first test, and keep the second.  But, is this regex
correct?  Shouldn't we check for six integer fields only with the
first one having a minimal number of digits for the epoch?
--
Michael

Attachment

pgsql-hackers by date:

Previous
From: Ajin Cherian
Date:
Subject: Re: [HACKERS] logical decoding of two-phase transactions
Next
From: Tom Lane
Date:
Subject: Re: Assertion failure in HEAD and 13 after calling COMMIT in a stored proc