Re: pgbench logging broken by time logic changes - Mailing list pgsql-hackers

From Thomas Munro
Subject Re: pgbench logging broken by time logic changes
Date
Msg-id CA+hUKG+jhatozihmqy4=Ufd0OHwZadEd_vg-_+26hE3f4C3TgQ@mail.gmail.com
Whole thread Raw
In response to pgbench logging broken by time logic changes  (Gregory Smith <gregsmithpgsql@gmail.com>)
Responses Re: pgbench logging broken by time logic changes
Re: pgbench logging broken by time logic changes
List pgsql-hackers
Hi Greg,

On Thu, Jun 17, 2021 at 2:49 AM Gregory Smith <gregsmithpgsql@gmail.com> wrote:
> Back on March 10 Thomas Munro committed and wrestled multiple reworks of the pgbench code from Fabien and the crew.
Thefeature to synchronize startup I'm looking forward to testing now that I have a packaged beta.  Variations on that
problemhave bit me so many times I added code last year to my pgbench processing pipeline to just throw out the first
andlast 10% of every data set. 

Yeah, commit aeb57af8 is a nice improvement and was the main thing I
wanted to get into the tree for 14 in this area, because it was
measuring the wrong thing.

> Before I could get to startup timing I noticed the pgbench logging output was broken via commit 547f04e7 "Improve
timelogic":  https://www.postgresql.org/message-id/E1lJqpF-00064e-C6%40gemulon.postgresql.org 

It does suck that we broke the logging and that it took 3 months for
anyone to notice and report it to the list.  Seems like it should be
straightforward to fix, though, with fixes already proposed (though I
haven't studied them yet, will do).

> I have a lot of community oriented work backed up behind this right now, so I'm gonna be really honest.  This time
reworkcommit in its current form makes me uncomfortable at this point in the release schedule.  The commit has already
foughtthrough two rounds of platform specific bug fixes.  But since the buildfarm doesn't test the logging feature,
thatwhole process is suspect. 

It's true that this work produced a few rounds of small portability
follow-ups: c427de42 (work around strange hacks elsewhere in the tree
for AIX), 68b34b23 (missing calling convention specifier on Windows),
and de91c3b9 (adjust pthread missing-function code for threadless
builds).  These were problems that didn't show up on developer or CI
systems (including threadless and Windows), and IMHO are typical sorts
of problems you expect to have to work through when stuff hits the
build farm, especially when using new system interfaces.  So I don't
think any of that, on its own, supports reverting anything here.

> My take on the PostgreSQL way to proceed:  this bug exposes that pgbench logging is a feature we finally need to
designtesting for.  We need a new buildfarm test and then a march through a full release phase to see how it goes.
Onlythen should we start messing with the time logic.  Even if we fixed the source today on both my test platforms, I'd
stillbe nervous that beta 2 could ship and more performance testing could fall over from this modification.  And that's
cuttingthings a little close. 
>
> The fastest way to get me back to comfortable would be to unwind 547f04e7 and its associated fixes and take it back
toreview.  I understand the intent and value; I appreciate the work so far.  The big industry architecture shift from
Intelto ARM has me worried about time overhead again, the old code is wonky, and in the PG15 release cycle I already
haveresources planned around this area. 

Let me study the proposed fixes on this and the other thread about
pgbench logging for a bit.

Glad to hear that you're working on this area.  I guess you might be
researching stuff along the same sorts of lines as in the thread
"Reduce timing overhead of EXPLAIN ANALYZE using rdtsc?" (though
that's about the executor).  As I already expressed in that thread, if
the backend's instrumentation code is improved as proposed there,
we'll probably want to rip some of these pgbench changes out anyway
and go back to common instrumentation code.

For that reason, I'm not super attached to that new pg_time_usec_t
stuff at all, and wouldn't be sad if we reverted that piece.  I am
moderately attached to the sync changes, though.  pgbench 13 is
objectively producing incorrect results in that respect.



pgsql-hackers by date:

Previous
From: Tomas Vondra
Date:
Subject: Re: PoC: Using Count-Min Sketch for join cardinality estimation
Next
From: Michael Paquier
Date:
Subject: Re: pgbench logging broken by time logic changes