pgbench logging broken by time logic changes - Mailing list pgsql-hackers

From Gregory Smith
Subject pgbench logging broken by time logic changes
Date
Msg-id CAHLJuCW_8Vpcr0=t6O_gozrg3wXXWXZXDioYJd3NhvKriqgpfQ@mail.gmail.com
Whole thread Raw
Responses Re: pgbench logging broken by time logic changes  (Yugo NAGATA <nagata@sraoss.co.jp>)
Re: pgbench logging broken by time logic changes  (Fabien COELHO <coelho@cri.ensmp.fr>)
Re: pgbench logging broken by time logic changes  (Thomas Munro <thomas.munro@gmail.com>)
List pgsql-hackers
Back on March 10 Thomas Munro committed and wrestled multiple reworks of the pgbench code from Fabien and the crew.  The feature to synchronize startup I'm looking forward to testing now that I have a packaged beta.  Variations on that problem have bit me so many times I added code last year to my pgbench processing pipeline to just throw out the first and last 10% of every data set.

Before I could get to startup timing I noticed the pgbench logging output was broken via commit 547f04e7 "Improve time logic":  https://www.postgresql.org/message-id/E1lJqpF-00064e-C6%40gemulon.postgresql.org

A lot of things are timed in pgbench now so I appreciate the idea.  Y'all started that whole big thread about sync on my birthday though and I didn't follow the details of what that was reviewed against.  For the logging use case I suspect it's just broken everywhere.  The two platforms I tested were PGDG Ubuntu beta1 apt install and Mac git build.  Example:

    $ createdb pgbench
    $ pgbench -i -s 1 pgbench
    $ pgbench -S -T 1 -l pgbench
    $ head pgbench_log.*
    0 1 1730 0 1537380 70911
    0 2 541 0 1537380 71474

The epoch time is the 5th column in the output, and this week it should look like this:

    0 1 1411 0 1623767029 732926
    0 2 711 0 1623767029 733660

If you're not an epoch guru who recognizes what's wrong already, you might grab https://github.com/gregs1104/pgbench-tools/ and party like it's 1970 to see it:

    $ ~/repos/pgbench-tools/log-to-csv 1 local < pgbench_log* | head
    1970-01-18 14:03:00.070911,0,1.73,1,local
    1970-01-18 14:03:00.071474,0,0.541,1,local

I have a lot of community oriented work backed up behind this right now, so I'm gonna be really honest.  This time rework commit in its current form makes me uncomfortable at this point in the release schedule.  The commit has already fought through two rounds of platform specific bug fixes.  But since the buildfarm doesn't test the logging feature, that whole process is suspect.

My take on the PostgreSQL way to proceed:  this bug exposes that pgbench logging is a feature we finally need to design testing for.  We need a new buildfarm test and then a march through a full release phase to see how it goes.  Only then should we start messing with the time logic.  Even if we fixed the source today on both my test platforms, I'd still be nervous that beta 2 could ship and more performance testing could fall over from this modification.  And that's cutting things a little close.

The fastest way to get me back to comfortable would be to unwind 547f04e7 and its associated fixes and take it back to review.  I understand the intent and value; I appreciate the work so far.  The big industry architecture shift from Intel to ARM has me worried about time overhead again, the old code is wonky, and in the PG15 release cycle I already have resources planned around this area.

# PG15 Plans

I didn't intend to roll back in after time away and go right to a revert review.  But I also really don't want to start my public PG14 story documenting the reality that I had to use PG13's pgbench to generate my examples either.  I can't fight much with this logging problem while also doing my planned public performance testing of PG14.  I already had to push back a solid bit of Beta 1 PR from this week, some "community PG is great!" promotional blogging.

Let me offer what I can commit to from Crunchy corporate.  I'm about to submit multiple pgbench feature changes to the open CF starting July, with David Christiansen.  We and the rest of Crunchy will happily help re-review this time change idea, its logging issues, testing, rejoin the study of platform time call overhead, and bash the whole mess into shape for PG15.  I personally am looking forward to it.

The commit made a functional change to the way connection time is displayed; that I can take or leave as committed.  I'm not sure it can be decoupled from the rest of the changes.  It did cause a small breaking pgbench output parsing problem for me, just trivial regex adjustment.  That break would fit in fine with my upcoming round of submissions.

--
Greg Smith greg.smith@crunchydata.com
Director of Open Source Strategy, Crunchy Data

pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: Decoding speculative insert with toast leaks memory
Next
From: Ha Ka
Date:
Subject: Re: Unresolved repliaction hang and stop problem.