pgbench logging broken by time logic changes

From
Gregory Smith
Date:
Back on March 10 Thomas Munro committed and wrestled multiple reworks of the pgbench code from Fabien and the crew.  The feature to synchronize startup I'm looking forward to testing now that I have a packaged beta.  Variations on that problem have bit me so many times I added code last year to my pgbench processing pipeline to just throw out the first and last 10% of every data set.

Before I could get to startup timing I noticed the pgbench logging output was broken via commit 547f04e7 "Improve time logic":  https://www.postgresql.org/message-id/E1lJqpF-00064e-C6%40gemulon.postgresql.org

A lot of things are timed in pgbench now so I appreciate the idea.  Y'all started that whole big thread about sync on my birthday though and I didn't follow the details of what that was reviewed against.  For the logging use case I suspect it's just broken everywhere.  The two platforms I tested were PGDG Ubuntu beta1 apt install and Mac git build.  Example:

    $ createdb pgbench
    $ pgbench -i -s 1 pgbench
    $ pgbench -S -T 1 -l pgbench
    $ head pgbench_log.*
    0 1 1730 0 1537380 70911
    0 2 541 0 1537380 71474

The epoch time is the 5th column in the output, and this week it should look like this:

    0 1 1411 0 1623767029 732926
    0 2 711 0 1623767029 733660

If you're not an epoch guru who recognizes what's wrong already, you might grab https://github.com/gregs1104/pgbench-tools/ and party like it's 1970 to see it:

    $ ~/repos/pgbench-tools/log-to-csv 1 local < pgbench_log* | head
    1970-01-18 14:03:00.070911,0,1.73,1,local
    1970-01-18 14:03:00.071474,0,0.541,1,local

I have a lot of community oriented work backed up behind this right now, so I'm gonna be really honest.  This time rework commit in its current form makes me uncomfortable at this point in the release schedule.  The commit has already fought through two rounds of platform specific bug fixes.  But since the buildfarm doesn't test the logging feature, that whole process is suspect.

My take on the PostgreSQL way to proceed:  this bug exposes that pgbench logging is a feature we finally need to design testing for.  We need a new buildfarm test and then a march through a full release phase to see how it goes.  Only then should we start messing with the time logic.  Even if we fixed the source today on both my test platforms, I'd still be nervous that beta 2 could ship and more performance testing could fall over from this modification.  And that's cutting things a little close.

The fastest way to get me back to comfortable would be to unwind 547f04e7 and its associated fixes and take it back to review.  I understand the intent and value; I appreciate the work so far.  The big industry architecture shift from Intel to ARM has me worried about time overhead again, the old code is wonky, and in the PG15 release cycle I already have resources planned around this area.

# PG15 Plans

I didn't intend to roll back in after time away and go right to a revert review.  But I also really don't want to start my public PG14 story documenting the reality that I had to use PG13's pgbench to generate my examples either.  I can't fight much with this logging problem while also doing my planned public performance testing of PG14.  I already had to push back a solid bit of Beta 1 PR from this week, some "community PG is great!" promotional blogging.

Let me offer what I can commit to from Crunchy corporate.  I'm about to submit multiple pgbench feature changes to the open CF starting July, with David Christiansen.  We and the rest of Crunchy will happily help re-review this time change idea, its logging issues, testing, rejoin the study of platform time call overhead, and bash the whole mess into shape for PG15.  I personally am looking forward to it.

The commit made a functional change to the way connection time is displayed; that I can take or leave as committed.  I'm not sure it can be decoupled from the rest of the changes.  It did cause a small breaking change to my pgbench output parsing, but that's just a trivial regex adjustment.  That break would fit in fine with my upcoming round of submissions.

--
Greg Smith greg.smith@crunchydata.com
Director of Open Source Strategy, Crunchy Data

Re: pgbench logging broken by time logic changes

From
Yugo NAGATA
Date:
On Wed, 16 Jun 2021 10:49:36 -0400
Gregory Smith <gregsmithpgsql@gmail.com> wrote:

> A lot of things are timed in pgbench now so I appreciate the idea.  Y'all
> started that whole big thread about sync on my birthday though and I didn't
> follow the details of what that was reviewed against.  For the logging use
> case I suspect it's just broken everywhere.  The two platforms I tested
> were PGDG Ubuntu beta1 apt install and Mac git build.  Example:
> 
>     $ createdb pgbench
>     $ pgbench -i -s 1 pgbench
>     $ pgbench -S -T 1 -l pgbench
>     $ head pgbench_log.*
>     0 1 1730 0 1537380 70911
>     0 2 541 0 1537380 71474
> 
> The epoch time is the 5th column in the output, and this week it should
> look like this:
> 
>     0 1 1411 0 1623767029 732926
>     0 2 711 0 1623767029 733660
> 
> If you're not an epoch guru who recognizes what's wrong already, you might
> grab https://github.com/gregs1104/pgbench-tools/ and party like it's 1970
> to see it:
> 
>     $ ~/repos/pgbench-tools/log-to-csv 1 local < pgbench_log* | head
>     1970-01-18 14:03:00.070911,0,1.73,1,local
>     1970-01-18 14:03:00.071474,0,0.541,1,local

After the commit, pgbench tries to get the current timestamp by calling
pg_time_now(). This uses INSTR_TIME_SET_CURRENT internally, but that macro
may call clock_gettime(CLOCK_MONOTONIC[_RAW], ) instead of gettimeofday
or clock_gettime(CLOCK_REALTIME, ). When CLOCK_MONOTONIC[_RAW] is used,
clock_gettime doesn't return epoch time. Therefore, INSTR_TIME_SET_CURRENT
is fine for calculating a duration, but it should not have been used to
get the current timestamp.

I think we can fix this issue by using gettimeofday for logging as before
this was changed. I attached the patch.

Regards,
Yugo Nagata

-- 
Yugo NAGATA <nagata@sraoss.co.jp>

Attachment

Re: pgbench logging broken by time logic changes

From
Fabien COELHO
Date:
Hello Greg,

> I have a lot of community oriented work backed up behind this right now, so
> I'm gonna be really honest.  This time rework commit in its current form
> makes me uncomfortable at this point in the release schedule.  The commit
> has already fought through two rounds of platform specific bug fixes.  But
> since the buildfarm doesn't test the logging feature, that whole process is
> suspect.

Logging is going to be fixed.

> My take on the PostgreSQL way to proceed:  this bug exposes that pgbench
> logging is a feature we finally need to design testing for.

Sure.

The key feedback for me is the usual one: what is not tested does 
not work. Wow:-)

I'm unhappy because I already added TAP tests for time-sensitive features 
(-T and others, maybe logging aggregates, I cannot remember), which have 
been removed because they could fail under some circumstances (eg very 
very very very slow hosts), or required some special handling (a few lines 
of code) in pgbench, and the net result of this is that there is not a 
single test in place for some features:-(

There is no problem with proposing tests; the problem is getting them 
accepted, and then, once accepted, not having them removed at the first 
small issue but rather fixed, or their limitations accepted, because 
testing time-sensitive features is not as simple as testing functional 
features.

Note that currently there is not a single test of psql with autocommit off 
or with "on error rollback". Last time I submitted TAP tests to raise psql 
test coverage from 50% to 90%, they were rejected. I'll admit that I'm tired 
of arguing that more testing is required, and I'm very happy if someone else 
is ready to try again. Good luck! :-)

> We need a new buildfarm test and then a march through a full release 
> phase to see how it goes.

> Only then should we start messing with the time logic.  Even if we fixed 
> the source today on both my test platforms, I'd still be nervous that 
> beta 2 could ship and more performance testing could fall over from this 
> modification.  And that's cutting things a little close.

Well, the point beta is to discover bugs not caught by reviews and dev 
tests, fix them, and possibly add tests which would have caught them.

If you revert all features on the first issue in a corner case and put it 
back to the queue, then I do not see why the review and dev tests will be 
much better on the next round, so it does not really help moving things 
forward.

IMHO, the pragmatic approach is to look at fixing first, and maybe revert 
if the problems are deep. I'm not sure this is obviously the case here.

-- 
Fabien.



Re: pgbench logging broken by time logic changes

From
Fabien COELHO
Date:
> pg_time_now(). This uses INSTR_TIME_SET_CURRENT in it, but this macro
> can call clock_gettime(CLOCK_MONOTONIC[_RAW], ) instead of gettimeofday
> or clock_gettime(CLOCK_REALTIME, ). When CLOCK_MONOTONIC[_RAW] is used,
> clock_gettime doesn't return epoch time. Therefore, we can use
> INSTR_TIME_SET_CURRENT aiming to calculate a duration, but we should
> not have used this to get the current timestamp.
>
> I think we can fix this issue by using gettimeofday for logging as before
> this was changed. I attached the patch.

I cannot say that I'm thrilled by having multiple tv stuff back in several 
places. I can be okay with one, though. What about the attached? Does it 
make sense?

-- 
Fabien.
Attachment

Re: pgbench logging broken by time logic changes

From
Andrew Dunstan
Date:
On 6/16/21 2:59 PM, Fabien COELHO wrote:
>
> Hello Greg,
>
>> I have a lot of community oriented work backed up behind this right
>> now, so
>> I'm gonna be really honest.  This time rework commit in its current form
>> makes me uncomfortable at this point in the release schedule.  The
>> commit
>> has already fought through two rounds of platform specific bug
>> fixes.  But
>> since the buildfarm doesn't test the logging feature, that whole
>> process is
>> suspect.
>
> Logging is going to be fixed.
>
>> My take on the PostgreSQL way to proceed:  this bug exposes that pgbench
>> logging is a feature we finally need to design testing for.
>
> Sure.
>
> The key feedback for me is the usual one: what is not tested does not
> work. Wow:-)


Agreed.


>
> I'm unhappy because I already added tap tests for time-sensitive
> features (-T and others, maybe logging aggregates, cannot remember),
> which have been removed because they could fail under some
> circumstances (eg very very very very slow hosts), or required some
> special handling (a few lines of code) in pgbench, and the net result
> of this is there is not a single test in place for some features:-(


I'm not familiar with exactly what happened in this case, but tests need
to be resilient over a wide range of performance characteristics. One
way around this issue might be to have a way of detecting that it's on a
slow platform and if so either skipping tests (Test::More provides
plenty of support for this) or expecting different results.


>
> There is no problem with proposing tests, the problem is that they are
> accepted, or if they are accepted then that they are not removed at
> the first small issue but rather fixed, or their limitations accepted,
> because testing time-sensitive features is not as simple as testing
> functional features.
>
> Note that currently there is not a single test of psql with autocommit
> off or with "on error rollback". Last time I submitted TAP tests to
> raise psql test coverage from 50% to 90%, they were rejected. I'll admit
> that I'm tired of arguing that more testing is required, and I'm very
> happy if someone else is ready to try again. Good luck! :-)


:-(


>
>> We need a new buildfarm test and then a march through a full release
>> phase to see how it goes.
>
>> Only then should we start messing with the time logic.  Even if we
>> fixed the source today on both my test platforms, I'd still be
>> nervous that beta 2 could ship and more performance testing could
>> fall over from this modification.  And that's cutting things a little
>> close.
>
> Well, the point beta is to discover bugs not caught by reviews and dev
> tests, fix them, and possibly add tests which would have caught them.
>
> If you revert all features on the first issue in a corner case and put
> it back to the queue, then I do not see why the review and dev tests
> will be much better on the next round, so it does not really help
> moving things forward.
>
> IMHO, the pragmatic approach is to look at fixing first, and maybe
> revert if the problems are deep. I'm not sure this is obviously the
> case here.



It does look like the submitted fix basically reverts the changes w.r.t.
this timestamp logging.


cheers


andrew


--
Andrew Dunstan
EDB: https://www.enterprisedb.com




Re: pgbench logging broken by time logic changes

From
Thomas Munro
Date:
Hi Greg,

On Thu, Jun 17, 2021 at 2:49 AM Gregory Smith <gregsmithpgsql@gmail.com> wrote:
> Back on March 10 Thomas Munro committed and wrestled multiple reworks of the pgbench code from Fabien and the crew.
Thefeature to synchronize startup I'm looking forward to testing now that I have a packaged beta.  Variations on that
problemhave bit me so many times I added code last year to my pgbench processing pipeline to just throw out the first
andlast 10% of every data set. 

Yeah, commit aeb57af8 is a nice improvement and was the main thing I
wanted to get into the tree for 14 in this area, because it was
measuring the wrong thing.

> Before I could get to startup timing I noticed the pgbench logging output was broken via commit 547f04e7 "Improve
> time logic":  https://www.postgresql.org/message-id/E1lJqpF-00064e-C6%40gemulon.postgresql.org

It does suck that we broke the logging and that it took 3 months for
anyone to notice and report it to the list.  Seems like it should be
straightforward to fix, though, with fixes already proposed (though I
haven't studied them yet, will do).

> I have a lot of community oriented work backed up behind this right now, so I'm gonna be really honest.  This time
> rework commit in its current form makes me uncomfortable at this point in the release schedule.  The commit has
> already fought through two rounds of platform specific bug fixes.  But since the buildfarm doesn't test the logging
> feature, that whole process is suspect.

It's true that this work produced a few rounds of small portability
follow-ups: c427de42 (work around strange hacks elsewhere in the tree
for AIX), 68b34b23 (missing calling convention specifier on Windows),
and de91c3b9 (adjust pthread missing-function code for threadless
builds).  These were problems that didn't show up on developer or CI
systems (including threadless and Windows), and IMHO are typical sorts
of problems you expect to have to work through when stuff hits the
build farm, especially when using new system interfaces.  So I don't
think any of that, on its own, supports reverting anything here.

> My take on the PostgreSQL way to proceed:  this bug exposes that pgbench logging is a feature we finally need to
> design testing for.  We need a new buildfarm test and then a march through a full release phase to see how it goes.
> Only then should we start messing with the time logic.  Even if we fixed the source today on both my test platforms,
> I'd still be nervous that beta 2 could ship and more performance testing could fall over from this modification.  And
> that's cutting things a little close.
>
> The fastest way to get me back to comfortable would be to unwind 547f04e7 and its associated fixes and take it back
> to review.  I understand the intent and value; I appreciate the work so far.  The big industry architecture shift
> from Intel to ARM has me worried about time overhead again, the old code is wonky, and in the PG15 release cycle I
> already have resources planned around this area.

Let me study the proposed fixes on this and the other thread about
pgbench logging for a bit.

Glad to hear that you're working on this area.  I guess you might be
researching stuff along the same sorts of lines as in the thread
"Reduce timing overhead of EXPLAIN ANALYZE using rdtsc?" (though
that's about the executor).  As I already expressed in that thread, if
the backend's instrumentation code is improved as proposed there,
we'll probably want to rip some of these pgbench changes out anyway
and go back to common instrumentation code.

For that reason, I'm not super attached to that new pg_time_usec_t
stuff at all, and wouldn't be sad if we reverted that piece.  I am
moderately attached to the sync changes, though.  pgbench 13 is
objectively producing incorrect results in that respect.



Re: pgbench logging broken by time logic changes

From
Michael Paquier
Date:
On Thu, Jun 17, 2021 at 12:36:10PM +1200, Thomas Munro wrote:
> For that reason, I'm not super attached to that new pg_time_usec_t
> stuff at all, and wouldn't be sad if we reverted that piece.  I am
> moderately attached to the sync changes, though.  pgbench 13 is
> objectively producing incorrect results in that respect.

There is another item in this area where pgbench uses incorrect maths
when aggregating the stats of transactions mid-run and at the end of a
thread, issue caused by 547f04e as this code path forgot to handle the
s <-> us conversion:
https://www.postgresql.org/message-id/CAF7igB1r6wRfSCEAB-iZBKxkowWY6+dFF2jObSdd9+iVK+vHJg@mail.gmail.com

Wouldn't it be better to put all those fixes into the same bag?  If
you drop the business with pg_time_usec_t, it looks like we don't
really need to do anything there.
--
Michael

Attachment

Re: pgbench logging broken by time logic changes

From
Yugo NAGATA
Date:
On Wed, 16 Jun 2021 21:11:41 +0200 (CEST)
Fabien COELHO <coelho@cri.ensmp.fr> wrote:

> 
> > pg_time_now(). This uses INSTR_TIME_SET_CURRENT in it, but this macro
> > can call clock_gettime(CLOCK_MONOTONIC[_RAW], ) instead of gettimeofday
> > or clock_gettime(CLOCK_REALTIME, ). When CLOCK_MONOTONIC[_RAW] is used,
> > clock_gettime doesn't return epoch time. Therefore, we can use
> > INSTR_TIME_SET_CURRENT aiming to calculate a duration, but we should
> > not have used this to get the current timestamp.
> >
> > I think we can fix this issue by using gettimeofday for logging as before
> > this was changed. I attached the patch.
> 
> I cannot say that I'm thrilled by having multiple tv stuff back in several 
> place. I can be okay with one, though. What about the attached? Does it 
> make sense?

At first, I also thought of fixing pg_time_now() to use gettimeofday() instead
of INSTR_TIME_SET_CURRENT, but I noticed that using INSTR_TIME_SET_CURRENT is
proper to measure a time interval.  I mean, this macro uses
clock_gettime(CLOCK_MONOTONIC, ) if available, which gives reliable interval
timing even in the face of changes to the system clock due to NTP.

Commit 547f04e7 changed all of INSTR_TIME_SET_CURRENT, gettimeofday(), and
time() to pg_time_now(), which calls INSTR_TIME_SET_CURRENT internally. So my
patch intended to revert only the changes involving gettimeofday() and time(),
and retain the changes involving INSTR_TIME_SET_CURRENT.

I attached the updated patch because I forgot to revert pg_time_now() to time(NULL).

Another idea to fix this is adding a 'use_epoch' flag to pg_time_now(), like below:

 pg_time_now(bool use_epoch)
 {
     if (use_epoch)
     {
         struct timeval tv;

         gettimeofday(&tv, NULL);
         return tv.tv_sec * 1000000 + tv.tv_usec;
     }
     else
     {
         instr_time      now;

         INSTR_TIME_SET_CURRENT(now);
         return (pg_time_usec_t) INSTR_TIME_GET_MICROSEC(now);
     }
 }

or making an additional function that returns epoch time.


By the way, there is another advantage of using clock_gettime(). That is, this
returns precise results in nanoseconds. However, we cannot benefit from it because
pg_time_now() converts the value to uint64 by using INSTR_TIME_GET_MICROSEC. So,
if we would like more precise statistics in pgbench, it might be better to use
the INSTR_TIME macros directly instead of pg_time_now in other places.

Regards,
Yugo Nagata

-- 
Yugo NAGATA <nagata@sraoss.co.jp>

Attachment

Re: pgbench logging broken by time logic changes

From
Kyotaro Horiguchi
Date:
At Thu, 17 Jun 2021 12:23:42 +0900, Yugo NAGATA <nagata@sraoss.co.jp> wrote in 
> On Wed, 16 Jun 2021 21:11:41 +0200 (CEST)
> Fabien COELHO <coelho@cri.ensmp.fr> wrote:
> > I cannot say that I'm thrilled by having multiple tv stuff back in several 
> > place. I can be okay with one, though. What about the attached? Does it 
> > make sense?

+1 The patch rounds down sd->start_time from ms to s but it seems to
me a degradation.

> At first, I also thought of fixing pg_time_now() to use gettimeofday() instead
> of INSTR_TIME_SET_CURRENT, but I noticed that using INSTR_TIME_SET_CURRENT is
> proper to measure time interval.  I mean, this macro uses
> clock_gettime(CLOCK_MONOTONIC, ) if available, which gives reliable interval
> timing even in the face of changes to the system clock due to NTP.

If I understand the OP correctly, the problem here is that the time values
in the pgbench log file are based on a bogus epoch. If that's the only issue
here and we just want to show the time based on the Unix epoch,
just recording the difference would work, as I sketched in the
attached.  (Precisely, the epoch would move if we set the system clock,
but I don't think that matters:p)

> By the way, there is another advantage of using clock_gettime(). That is, this
> returns precise results in nanoseconds. However, we can not benefit from it because
> pg_time_now() converts the value to uint64 by using INSTR_TIME_GET_MICROSEC. So,
> if we would like more precise statistics in pgbench, it might be better to use
> INSTR_TIME_GET_MICROSEC instead of pg_time_now in other places.

I'm not sure we have transactions lasting such a short time that
nanoseconds matter.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index d7479925cb..0ce9315a2b 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -342,6 +342,8 @@ typedef struct StatsData
     SimpleStats lag;
 } StatsData;
 
+pg_time_usec_t epoch_shift;
+
 /*
  * Struct to keep random state.
  */
@@ -3803,7 +3805,7 @@ doLog(TState *thread, CState *st,
         {
             /* print aggregated report to logfile */
             fprintf(logfile, INT64_FORMAT " " INT64_FORMAT " %.0f %.0f %.0f %.0f",
-                    agg->start_time,
+                    agg->start_time + epoch_shift,
                     agg->cnt,
                     agg->latency.sum,
                     agg->latency.sum2,
@@ -3834,12 +3836,12 @@ doLog(TState *thread, CState *st,
         if (skipped)
             fprintf(logfile, "%d " INT64_FORMAT " skipped %d " INT64_FORMAT " "
                     INT64_FORMAT,
-                    st->id, st->cnt, st->use_file, now / 1000000, now % 1000000);
+                    st->id, st->cnt, st->use_file, (now + epoch_shift) / 1000000, (now + epoch_shift) % 1000000);
         else
             fprintf(logfile, "%d " INT64_FORMAT " %.0f %d " INT64_FORMAT " "
                     INT64_FORMAT,
                     st->id, st->cnt, latency, st->use_file,
-                    now / 1000000, now % 1000000);
+                    (now + epoch_shift) / 1000000, (now + epoch_shift) % 1000000);
         if (throttle_delay)
             fprintf(logfile, " %.0f", lag);
         fputc('\n', logfile);
@@ -5775,7 +5777,12 @@ main(int argc, char **argv)
     char       *env;
 
     int            exit_code = 0;
+    struct timeval tv;
 
+    gettimeofday(&tv, NULL);
+    
+    epoch_shift = tv.tv_sec * 1000000 + tv.tv_usec - pg_time_now();
+        
     pg_logging_init(argv[0]);
     progname = get_progname(argv[0]);


Re: pgbench logging broken by time logic changes

From
Yugo NAGATA
Date:
On Thu, 17 Jun 2021 14:17:56 +0900 (JST)
Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote:

> At Thu, 17 Jun 2021 12:23:42 +0900, Yugo NAGATA <nagata@sraoss.co.jp> wrote in 
> > On Wed, 16 Jun 2021 21:11:41 +0200 (CEST)
> > Fabien COELHO <coelho@cri.ensmp.fr> wrote:
> > > I cannot say that I'm thrilled by having multiple tv stuff back in several 
> > > place. I can be okay with one, though. What about the attached? Does it 
> > > make sense?
> 
> +1 The patch rounds down sd->start_time from ms to s but it seems to
> me a degradation.

I don't think that matters, because sd->start_time is used only for
log aggregation, and aggregate-interval is specified in seconds.

> > At first, I also thought of fixing pg_time_now() to use gettimeofday() instead
> > of INSTR_TIME_SET_CURRENT, but I noticed that using INSTR_TIME_SET_CURRENT is
> > proper to measure time interval.  I mean, this macro uses
> > clock_gettime(CLOCK_MONOTONIC, ) if available, which gives reliable interval
> > timing even in the face of changes to the system clock due to NTP.
> 
> If I understand the OP correctly, the problem here is that the time values
> in the pgbench log file are based on a bogus epoch. If that's the only issue
> here and we just want to show the time based on the Unix epoch,
> just recording the difference would work, as I sketched in the
> attached.  (Precisely, the epoch would move if we set the system clock,
> but I don't think that matters:p)

That makes sense. If the system clock is shifted due to NTP (for example)
it would not affect the measurement although timestamps in logs could be shifted
because gettimeofday is called only once.

If we fix it in this way, we should also fix printProgressReport():

     if (progress_timestamp)
     {
-         snprintf(tbuf, sizeof(tbuf), "%.3f s", PG_TIME_GET_DOUBLE(now));
+         snprintf(tbuf, sizeof(tbuf), "%.3f s", PG_TIME_GET_DOUBLE(now + epoch_shift));
     }

> > By the way, there is another advantage of using clock_gettime(). That is, this
> > returns precise results in nanoseconds. However, we can not benefit from it because
> > pg_time_now() converts the value to uint64 by using INSTR_TIME_GET_MICROSEC. So,
> > if we would like more precise statistics in pgbench, it might be better to use
> > INSTR_TIME_GET_MICROSEC instead of pg_time_now in other places.
> 
> I'm not sure we have transactions lasting such a short time that
> nanoseconds matter.

I thought it might affect the accuracy when statistics are accumulated
over a huge number of transactions, but I am fine with it if no one
cares.

Regards,
Yugo Nagata

-- 
Yugo NAGATA <nagata@sraoss.co.jp>



Re: pgbench logging broken by time logic changes

From
Fabien COELHO
Date:
Hello Yugo-san,

>>> I think we can fix this issue by using gettimeofday for logging as before
>>> this was changed. I attached the patch.
>>
>> I cannot say that I'm thrilled by having multiple tv stuff back in several
>> place. I can be okay with one, though. What about the attached? Does it
>> make sense?
>
> At first, I also thought of fixing pg_time_now() to use gettimeofday() instead
> of INSTR_TIME_SET_CURRENT, but I noticed that using INSTR_TIME_SET_CURRENT is
> proper to measure time interval.  I mean, this macro uses
> clock_gettime(CLOCK_MONOTONIC, ) if available, which gives reliable interval
> timing even in the face of changes to the system clock due to NTP.

Ok, I suspected there might be this kind of implication. Now, does it 
matter that much if a few transactions are skewed by NTP from time to 
time? Having to deal with different time functions in the same file seems 
painful.

> The commit 547f04e7 changed all of INSTR_TIME_SET_CURRENT, gettimeofday(), and
> time() to pg_time_now(), which calls INSTR_TIME_SET_CURRENT internally. So my
> patch intended to revert only the changes involving gettimeofday() and time(),
> and retain the changes involving INSTR_TIME_SET_CURRENT.

Hmmm.

> pg_time_now(bool use_epoch)
> {
>     if (use_epoch)
>     {
>         struct timeval tv;
>
>         gettimeofday(&tv, NULL);
>         return tv.tv_sec * 1000000 + tv.tv_usec;
>     }
>     else
>     {
>         instr_time      now;
>
>         INSTR_TIME_SET_CURRENT(now);
>         return (pg_time_usec_t) INSTR_TIME_GET_MICROSEC(now);
>     }
> }
>
> or making an additional function that returns epoch time.

Yes, but when to call which version? How to avoid confusion? After giving 
it some thought, ISTM that the best short-term decision is just to have 
epoch everywhere, i.e. having now() rely on gettimeofday, because:

  - at least one user is unhappy with not having epoch in the log file,
    and indeed it makes sense to be unhappy about that if they want to
    correlate logs. So I agree to undo that, or provide an option to undo
    it.

  - having different times with different origins at different points in
    the code makes it really hard to understand and maintain, and if we
    trade maintainability for precision it should really be worth it, and
    I'm not sure that working around NTP adjustment is worth it right now.

In the not so short term, I'd say that the best approach would be use 
relative time internally and just to offset these with a global epoch 
start time when displaying a timestamp.

> By the way, there is another advantage of using clock_gettime(). That is, this
> returns precise results in nanoseconds. However, we can not benefit from it because
> pg_time_now() converts the value to uint64 by using INSTR_TIME_GET_MICROSEC. So,
> if we would like more precise statistics in pgbench, it might be better to use
> INSTR_TIME_GET_MICROSEC instead of pg_time_now in other places.

The INSTR_TIME macros are pretty ugly and inefficient, especially when 
time arithmetic is involved, because they re-implement 64-bit operations 
on top of 32-bit ints. I really wanted to get rid of that as much as 
possible. From a database benchmarking perspective, ISTM that µs is the 
right smallest unit, given that a transaction involves significant delays 
such as network communication, parsing, and so on. So I do not think we 
should ever need nanos.

In conclusion, ISTM that it is enough to simply change now to call 
gettimeofday to fix the issue raised by Greg. This is patch v1 on the 
thread.

-- 
Fabien.

Re: pgbench logging broken by time logic changes

From
Fabien COELHO
Date:
Hello,

>>> I cannot say that I'm thrilled by having multiple tv stuff back in several
>>> place. I can be okay with one, though. What about the attached? Does it
>>> make sense?
>
> +1 The patch rounds down sd->start_time from ms to s but it seems to
> me a degradation.

Yes, please, we should not use time().

>> At first, I also thought of fixing pg_time_now() to use gettimeofday() instead
>> of INSTR_TIME_SET_CURRENT, but I noticed that using INSTR_TIME_SET_CURRENT is
>> proper to measure time interval.  I mean, this macro uses
>> clock_gettime(CLOCK_MONOTONIC, ) if available, which gives reliable interval
>> timing even in the face of changes to the system clock due to NTP.
>
> If I understand the OP correctly, the problem here is that the time values
> in the pgbench log file are based on a bogus epoch.

It is not "bogus", but it is not necessarily epoch-based, depending on the 
underlying function called behind the INSTR_TIME macros, and people are 
entitled to expect epoch for log correlation.

> If that's the only issue
> here and we just want to show the time based on the Unix epoch,
> just recording the difference would work, as I sketched in the
> attached.  (Precisely, the epoch would move if we set the system clock,
> but I don't think that matters:p)

I do like the approach.

I'm hesitant to promote it for fixing the beta, but the code impact is 
small enough, so I'd say yes. Maybe there is a similar issue with progress 
which should probably use the same approach. I think that aligning the 
implementations can wait for pg15.

The patch has whitespace issues. Attached is an updated version which 
fixes that, adds comments, and simplifies the code a little bit.

> I'm not sure we have transactions lasting such a short time that 
> nanoseconds matter.

Indeed.

-- 
Fabien.
Attachment

Re: pgbench logging broken by time logic changes

From
Fabien COELHO
Date:
Hello Thomas,

>> Before I could get to startup timing I noticed the pgbench logging 
>> output was broken via commit 547f04e7 "Improve time logic": 
>> https://www.postgresql.org/message-id/E1lJqpF-00064e-C6%40gemulon.postgresql.org
>
> It does suck that we broke the logging and that it took 3 months for
> anyone to notice and report it to the list.

Indeed. Well, it also demonstrates that betas are useful.

> Seems like it should be straightforward to fix, though, with fixes 
> already proposed (though I haven't studied them yet, will do).

I think that fixing logging is simple enough, thus a revert is not 
necessary.

>> I have a lot of community oriented work backed up behind this right 
>> now, so I'm gonna be really honest.  This time rework commit in its 
>> current form makes me uncomfortable at this point in the release 
>> schedule.  The commit has already fought through two rounds of platform 
>> specific bug fixes.  But since the buildfarm doesn't test the logging 
>> feature, that whole process is suspect.
>
> It's true that this work produced a few rounds of small portability
> follow-ups: c427de42 (work around strange hacks elsewhere in the tree
> for AIX), 68b34b23 (missing calling convention specifier on Windows),
> and de91c3b9 (adjust pthread missing-function code for threadless
> builds).  These were problems that didn't show up on developer or CI
> systems (including threadless and Windows), and IMHO are typical sorts
> of problems you expect to have to work through when stuff hits the
> build farm, especially when using new system interfaces.  So I don't
> think any of that, on its own, supports reverting anything here.

Yep, the buildfarm is here to catch portability issues, and it does its 
job:-) There is no doubt that logging has been broken because of the lack 
of tests in this area, shame on us. I think it is easy to fix.

> [...] For that reason, I'm not super attached to that new pg_time_usec_t 
> stuff at all, and wouldn't be sad if we reverted that piece.

Well, I was sooo happy to get rid of INSTR_TIME ugly and inefficient 
macros in pgbench… so anything looks better to me.

Note that Michaël is having a look at fixing pgbench logging issues.

-- 
Fabien.

Re: pgbench logging broken by time logic changes

From
Fabien COELHO
Date:
> Wouldn't it be better to put all those fixes into the same bag?

Attached.

-- 
Fabien.

Re: pgbench logging broken by time logic changes

From
Fabien COELHO
Date:
>> Wouldn't it be better to put all those fixes into the same bag?
>
> Attached.

Even better if the patch is not empty.

-- 
Fabien.
Attachment

Re: pgbench logging broken by time logic changes

From
Yugo NAGATA
Date:
On Thu, 17 Jun 2021 10:18:03 +0200 (CEST)
Fabien COELHO <coelho@cri.ensmp.fr> wrote:

> 
> >> Wouldn't it be better to put all those fixes into the same bag?
> >
> > Attached.
> 
> Even better if the patch is not empty.

I found you forgot to fix printProgressReport().

Also, according to the document, interval_start in Aggregated Logging
seems to be printed in seconds instead of ms.

   <para>
    Here is some example output:
 <screen>
 1345828501 5601 1542744 483552416 61 2573
 1345828503 7884 1979812 565806736 60 1479
 1345828505 7208 1979422 567277552 59 1391
 1345828507 7685 1980268 569784714 60 1398
 1345828509 7073 1979779 573489941 236 1411
 </screen></para>

If we obey the document and keep backward compatibility, we should fix
logAgg().
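
The conversion itself is tiny. Assuming microsecond timestamps internally (a sketch, not the patch itself), the documented first field is just the interval start divided down to whole seconds:

```c
#include <stdint.h>

/* Documented aggregated-log lines start with interval_start in
 * seconds (e.g. 1345828501 in the sample above), so a
 * microsecond-based start time must be divided down before
 * printing.  Illustrative helper; the field name is assumed. */
static int64_t
agg_interval_start_seconds(int64_t agg_start_us)
{
    return agg_start_us / 1000000;
}
```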

The attached patch includes these fixes.

Regards,
Yugo Nagata 

-- 
Yugo NAGATA <nagata@sraoss.co.jp>

Attachment

Re: pgbench logging broken by time logic changes

From
Fabien COELHO
Date:
> I found you forgot to fix printProgressReport().

Indeed.

> Also, according to the document, interval_start in Aggregated Logging
> seems to be printed in seconds instead of ms.

Indeed. I'm unsure about what we should really want there, but for a beta 
bug fix I agree that it must simply comply with the old documented behavior.

> The attached patch includes these fixes.

Thanks. Works for me.

-- 
Fabien.



Re: pgbench logging broken by time logic changes

From
Thomas Munro
Date:
On Thu, Jun 17, 2021 at 7:24 PM Fabien COELHO <coelho@cri.ensmp.fr> wrote:
> > Seems like it should be straightforward to fix, though, with fixes
> > already proposed (though I haven't studied them yet, will do).
>
> I think that fixing logging is simple enough, thus a revert is not
> necessary.

I prepared a draft revert patch for discussion, just in case it comes
in handy.  This reverts "pgbench: Improve time logic.", but "pgbench:
Synchronize client threads." remains (slightly rearranged).

I'm on the fence TBH, I can see that it's really small things and it
seems we have the patches, but it's late, late enough that
benchmarking gurus are showing up to benchmark with it for real, and
it's not great to be getting in the way of that with what is mostly
refactoring work, so I don't think it would be a bad thing if we
agreed to try again in 15.  (A number of arguments for and against
moving pgbench out of the postgresql source tree and release cycle
occur to me, but I guess that's a topic for another thread.)

> > [...] For that reason, I'm not super attached to that new pg_time_usec_t
> > stuff at all, and wouldn't be sad if we reverted that piece.
>
> Well, I was sooo happy to get rid of INSTR_TIME ugly and inefficient
> macros in pgbench… so anything looks better to me.

I don't love it either, in this code or the executor.  (I know you
also don't like the THREAD_CREATE etc macros.  I have something to
propose to improve that for 15....)

> Note that Michaël is having a look at fixing pgbench logging issues.

Yeah I've been catching up with these threads.

Attachment

Re: pgbench logging broken by time logic changes

From
Andrew Dunstan
Date:
On 6/17/21 8:49 AM, Thomas Munro wrote:
> On Thu, Jun 17, 2021 at 7:24 PM Fabien COELHO <coelho@cri.ensmp.fr> wrote:
>>> Seems like it should be straightforward to fix, though, with fixes
>>> already proposed (though I haven't studied them yet, will do).
>> I think that fixing logging is simple enough, thus a revert is not
>> necessary.
> I prepared a draft revert patch for discussion, just in case it comes
> in handy.  This reverts "pgbench: Improve time logic.", but "pgbench:
> Synchronize client threads." remains (slightly rearranged).
>
> I'm on the fence TBH, I can see that it's really small things and it
> seems we have the patches, but it's late, late enough that
> benchmarking gurus are showing up to benchmark with it for real, and
> it's not great to be getting in the way of that with what is mostly
> refactoring work, so I don't think it would be a bad thing if we
> agreed to try again in 15.  


Is there an identified issue beyond the concrete example Greg gave of
the timestamps?


We are still fixing a few things with potentially far more impact than
anything in pgbench, so fixing this wouldn't bother me that much, as
long as we get it done for Beta2.


> (A number of arguments for and against
> moving pgbench out of the postgresql source tree and release cycle
> occur to me, but I guess that's a topic for another thread.)
>

Indeed.


cheers


andrew


--
Andrew Dunstan
EDB: https://www.enterprisedb.com




Re: pgbench logging broken by time logic changes

From
Gregory Smith
Date:
On Wed, Jun 16, 2021 at 2:59 PM Fabien COELHO <coelho@cri.ensmp.fr> wrote:
I'm unhappy because I already added tap tests for time-sensitive features
(-T and others, maybe logging aggregates, cannot remember), which have
been removed because they could fail under some circumstances (eg very
very very very slow hosts), or required some special handling (a few lines
of code) in pgbench, and the net result of this is there is not a single
test in place for some features:-(

I understand your struggle and I hope I was clear about two things:

- I am excited by all the progress made in pgbench, and this problem is an integration loose end rather than a developer failure at any level.
- Doing better in this messy area takes a team that goes from development to release management, and I had no right to complain unless I brought resources to improve the specific areas of the process that I want to be better.

I think the only thing you and I disagree on is that you see a "first issue in a corner case" where I see a process failure that is absolutely vital for me to improve.  Since the reality is that I might be the best positioned person to actually move said process forward in a meaningful long-term way, I have every intention of applying pressure to the area you're frustrated at.  Crunchy has a whole parallel review team to the community one now focused on what our corporate and government customers need for software process control and procedure compliance.  The primary business problem I'm working on now is how to include performance review in that mix.

I already know I need to re-engage with you over how I need min/max numbers in the aggregate logging output to accomplish some valuable goals.  When I get around to that this summer, I'd really enjoy talking with you a bit, video call or something, about really any community topic you're frustrated with.  I have a lot riding now on the productivity of the PostgreSQL hacker community and I want everyone to succeed at the best goals.

There is no problem with proposing tests, the problem is that they are
accepted, or if they are accepted then that they are not removed at the
first small issue but rather fixed, or their limitations accepted, because
testing time-sensitive features is not as simple as testing functional
features.

For 2020 Crunchy gave me a sort of sabbatical year to research community oriented benchmarking topics.  Having a self contained project in my home turned out to be the perfect way to spend *that* wreck of a year. 

I made significant progress toward the idea of having a performance farm for PostgreSQL.  On my laptop today is a 14GB database with 1s resolution latency traces for 663 days of pgbench time running 4 workloads across a small bare metal farm of various operating systems and hardware classes.  I can answer questions like "how long does a typical SSD take to execute an INSERT commit?" across my farm with SQL.  It's at the "works for me!" stage of development, and I thought this was the right time in the development cycle to start sharing improvement ideas from my work; thus the other submissions in progress I alluded to.

The logging feature is in an intermediate spot where validating it requires light custom tooling that compares its output against known variables like the system time.  It doesn't quite have a performance component to it.  Since this time logic detail is a well known portability minefield, I thought demanding that particular test was a pretty easy sell.

That you in particular are frustrated here makes perfect sense to me.  I am fresh and ready to carry this forward some distance, and I hope the outcome makes you happy.

Re: pgbench logging broken by time logic changes

From
Fabien COELHO
Date:
> Is there an identified issue beyond the concrete example Greg gave of
> the timestamps?

AFAICS, there is a patch which fixes all known issues linked to pgbench 
logging. Other issues may well exist, but the "broken" area was quite 
specific. There are also some TAP tests on pgbench which do catch 
issues.

-- 
Fabien.



Re: pgbench logging broken by time logic changes

From
Fabien COELHO
Date:
Hello Greg,

> I think the only thing you and I disagree on is that you see a "first
> issue in a corner case" where I see a process failure that is absolutely
> vital for me to improve.

Hmmm. I agree that improvements are needed, but for me there are simply a
few missing (removed) TAP tests which should/could have caught these
issues, which are AFAICS limited to the untested area.

Given the speed of the process and the energy and patience needed to move
things forward, reverting means that the patch is probably dead for at
least a year, possibly an eon, and that is too bad because IMHO it was an
improvement (my eyes are watering when I see INSTR_TIME macros), so I'd
prefer a fix rather than a revert if it is possible, which in this case I
think it could be.

> Since the reality is that I might be the best positioned person

Good for you:-)

> to actually move said process forward in a meaningful long-term way, I
> have every intention of applying pressure to the area you're frustrated
> at.  Crunchy has a whole parallel review team to the community one now
> focused on what our corporate and government customers need for software
> process control and procedure compliance.  The primary business problem
> I'm working on now is how to include performance review in that mix.

This idea has been around for some time now. It is quite a task, and a
working and possibly extended pgbench is just one part of the overall
software, infrastructure and procedure needed to have that.

> I already know I need to re-engage with you over how I need min/max numbers
> in the aggregate logging output to accomplish some valuable goals.

I do try to review every patch submitted about pgbench. Feel free to fire!

> When I get around to that this summer, I'd really enjoy talking with you
> a bit, video call or something, about really any community topic you're
> frustrated with.

"frustrated" may be a strong word. I'm somehow annoyed, and unlikely to
ever submit many tests improvements in the future.

>> There is no problem with proposing tests, the problem is that they are
>> accepted, or if they are accepted then that they are not removed at the
>> first small issue but rather fixed, or their limitations accepted, because
>> testing time-sensitive features is not as simple as testing functional
>> features.
>
> For 2020 Crunchy gave me a sort of sabbatical year to research community
> oriented benchmarking topics.  Having a self contained project in my home
> turned out to be the perfect way to spend *that* wreck of a year.

Yep.

> I made significant progress toward the idea of having a performance farm
> for PostgreSQL.  On my laptop today is a 14GB database with 1s resolution
> latency traces for 663 days of pgbench time running 4 workloads across a
> small bare metal farm of various operating systems and hardware classes.

Wow.

> I can answer questions like "how long does a typical SSD take to execute
> an INSERT commit?" across my farm with SQL.

So, what is the answer? :-)

> It's at the "works for me!" stage of development, and I thought this was
> the right time in the development cycle to start sharing improvement
> ideas from my work; thus the other submissions in progress I alluded to.
>
> The logging feature is in an intermediate spot where validating it requires
> light custom tooling that compares its output against known variables like
> the system time.

Sure.

> It doesn't quite have a performance component to it.

Hmmm, if you log all transactions it can become the performance
bottleneck quite quickly:-)

> Since this time logic detail is a well known portability minefield, I
> thought demanding that particular test was a pretty easy sell.

The test I recalled was removed at ad51c6f. Ok, it would not have caught
the issue about timestamp (although it could have been improved to do so),
but it would have caught the trivial one about the catchup loop in
aggregate interval generating too many lines because of a forgotten
conversion to µs.
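
That catch-up-loop bug can be shown with a self-contained sketch (an assumed shape, not pgbench's actual loop): count how many aggregate lines get emitted when the interval and the timestamps disagree on units.

```c
#include <stdint.h>

/* Count the aggregate log lines a catch-up loop would emit between
 * start and now.  If interval is accidentally left in seconds while
 * the timestamps are in microseconds, the loop floods the log with
 * lines.  Illustrative only. */
static int64_t
catchup_lines(int64_t start_us, int64_t now_us, int64_t interval)
{
    int64_t n = 0;

    while (start_us + interval <= now_us)
    {
        n++;                    /* one aggregate line flushed */
        start_us += interval;
    }
    return n;
}
```

A TAP test that merely counts lines in the log file catches this class of mistake immediately.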

--
Fabien.

Re: pgbench logging broken by time logic changes

From
Fabien COELHO
Date:
Hello Thomas,

> I prepared a draft revert patch for discussion, just in case it comes
> in handy.  This reverts "pgbench: Improve time logic.", but "pgbench:
> Synchronize client threads." remains (slightly rearranged).

I had a quick look.

I had forgotten that this patch also fixed the long-running brain-damaged 
tps computation that has been bothering me for years, so that one sane 
performance figure is now reported instead of two not-clear-to-interpret 
take-your-pick figures.

It would be a real loss if this user-facing fix is removed in the 
process:-(

-- 
Fabien.



Re: pgbench logging broken by time logic changes

From
Michael Paquier
Date:
On Fri, Jun 18, 2021 at 12:49:42AM +1200, Thomas Munro wrote:
> I'm on the fence TBH, I can see that it's really small things and it
> seems we have the patches, but it's late, late enough that
> benchmarking gurus are showing up to benchmark with it for real, and
> it's not great to be getting in the way of that with what is mostly
> refactoring work, so I don't think it would be a bad thing if we
> agreed to try again in 15.  (A number of arguments for and against
> moving pgbench out of the postgresql source tree and release cycle
> occur to me, but I guess that's a topic for another thread.)

I may be missing something of course, but I don't see any strong
reason why we need to do a revert here if we have patches to discuss
first.

>> Note that Michaël is having a look at fixing pgbench logging issues.
>
> Yeah I've been catching up with these threads.

Thomas, do you want me to look more at this issue?  I don't feel
comfortable with the idea of doing anything if you are planning to
look at this thread and you are the owner here, so that should be your
call.

From what I can see, we have the same area getting patched with
patches across two threads, so it seems better to give up the other
thread and just focus on the discussion here, where v7 has been sent:
https://www.postgresql.org/message-id/20210617175542.ad6b9b82926d8469e8520324@sraoss.co.jp
https://www.postgresql.org/message-id/CAF7igB1r6wRfSCEAB-iZBKxkowWY6%2BdFF2jObSdd9%2BiVK%2BvHJg%40mail.gmail.com
--
Michael

Attachment

Re: pgbench logging broken by time logic changes

From
Noah Misch
Date:
On Wed, Jun 16, 2021 at 03:13:30PM -0400, Andrew Dunstan wrote:
> On 6/16/21 2:59 PM, Fabien COELHO wrote:
> > The key feedback for me is the usual one: what is not tested does not
> > work. Wow:-)
> 
> Agreed.
> 
> > I'm unhappy because I already added tap tests for time-sensitive
> > features (-T and others, maybe logging aggregates, cannot remember),
> > which have been removed because they could fail under some
> > circonstances (eg very very very very slow hosts), or required some
> > special handling (a few lines of code) in pgbench, and the net result
> > of this is there is not a single test in place for some features:-(
> 
> I'm not familiar with exactly what happened in this case, but tests need
> to be resilient over a wide range of performance characteristics. One
> way around this issue might be to have a way of detecting that it's on a
> slow platform and if so either skipping tests (Test::More provides
> plenty of support for this) or expecting different results.

Detection would need the host to be consistently slow, like running under
Valgrind or a 20-year-old CPU.  We also test on systems having highly-variable
performance due to other processes competing for the same hardware.  I'd
perhaps add a "./configure --enable-realtime-tests" option that enables
affected tests.  Testers should use the option whenever the execution
environment has sufficient reserved CPU.



Re: pgbench logging broken by time logic changes

From
Thomas Munro
Date:
On Fri, Jun 18, 2021 at 12:31 PM Michael Paquier <michael@paquier.xyz> wrote:
> On Fri, Jun 18, 2021 at 12:49:42AM +1200, Thomas Munro wrote:
> > Yeah I've been catching up with these threads.
>
> Thomas, do you want me to look more at this issue?  I don't feel
> comfortable with the idea of doing anything if you are planning to
> look at this thread and you are the owner here, so that should be your
> call.
>
> From what I can see, we have the same area getting patched with
> patches across two threads, so it seems better to give up the other
> thread and just focus on the discussion here, where v7 has been sent:
> https://www.postgresql.org/message-id/20210617175542.ad6b9b82926d8469e8520324@sraoss.co.jp
> https://www.postgresql.org/message-id/CAF7igB1r6wRfSCEAB-iZBKxkowWY6%2BdFF2jObSdd9%2BiVK%2BvHJg%40mail.gmail.com

Thanks for looking so far.  It's the weekend here and I need to
unplug, but I'll test these changes and if all looks good push on
Monday.



Re: pgbench logging broken by time logic changes

From
Michael Paquier
Date:
On Sat, Jun 19, 2021 at 11:59:16AM +1200, Thomas Munro wrote:
> Thanks for looking so far.  It's the weekend here and I need to
> unplug, but I'll test these changes and if all looks good push on
> Monday.

Thanks for the update.  Have a good weekend.
--
Michael

Attachment

Re: pgbench logging broken by time logic changes

From
Alvaro Herrera
Date:
On 2021-Jun-19, Thomas Munro wrote:

> Thanks for looking so far.  It's the weekend here and I need to
> unplug, but I'll test these changes and if all looks good push on
> Monday.

Surely not the same day as the beta stamp...

-- 
Álvaro Herrera       Valdivia, Chile
"Always assume the user will do much worse than the stupidest thing
you can imagine."                                (Julien PUYDT)



Re: pgbench logging broken by time logic changes

From
Thomas Munro
Date:
On Sun, Jun 20, 2021 at 3:18 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
> On 2021-Jun-19, Thomas Munro wrote:
> > Thanks for looking so far.  It's the weekend here and I need to
> > unplug, but I'll test these changes and if all looks good push on
> > Monday.
>
> Surely not the same day as the beta stamp...

Because of timezones, that'll be Sunday in the Americas.  Still too close?



Re: pgbench logging broken by time logic changes

From
Thomas Munro
Date:
On Sun, Jun 20, 2021 at 4:52 PM Thomas Munro <thomas.munro@gmail.com> wrote:
> On Sun, Jun 20, 2021 at 3:18 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
> > On 2021-Jun-19, Thomas Munro wrote:
> > > Thanks for looking so far.  It's the weekend here and I need to
> > > unplug, but I'll test these changes and if all looks good push on
> > > Monday.
> >
> > Surely not the same day as the beta stamp...
>
> Because of timezones, that'll be Sunday in the Americas.  Still too close?

Upon reflection, that amounts to the same thing really, so yeah,
scratch that plan.  I'll defer until after that (and then I'll be
leaning more towards the revert option).



Re: pgbench logging broken by time logic changes

From
Fabien COELHO
Date:
> Upon reflection, that amounts to the same thing really, so yeah,
> scratch that plan.  I'll defer until after that (and then I'll be
> leaning more towards the revert option).

Sigh. I do not understand anything about the decision process.

If you do revert, please consider NOT reverting the tps computation 
changes intermixed in the patch.

-- 
Fabien.



Re: pgbench logging broken by time logic changes

From
Andrew Dunstan
Date:
On 6/20/21 5:02 AM, Fabien COELHO wrote:
>
>> Upon reflection, that amounts to the same thing really, so yeah,
>> scratch that plan.  I'll defer until after that (and then I'll be
>> leaning more towards the revert option).
>
> Sigh. I do not understand anything about the decision process.


Yes, sometimes it passeth all understanding.

There will certainly be a BETA3, and in every recent year except last
year there has been a BETA4.

If this were core server code threatening data integrity I would be
inclined to be more strict, but after all pgbench is a utility program,
and I think we can allow a little more latitude.


cheers


andrew


--
Andrew Dunstan
EDB: https://www.enterprisedb.com




Re: pgbench logging broken by time logic changes

From
Michael Paquier
Date:
On Sun, Jun 20, 2021 at 10:15:55AM -0400, Andrew Dunstan wrote:
> If this were core server code threatening data integrity I would be
> inclined to be more strict, but after all pgbench is a utility program,
> and I think we can allow a little more latitude.

+1.  Let's be flexible here.  It looks better to not rush a fix, and
we still have some time ahead.
--
Michael

Attachment

Re: pgbench logging broken by time logic changes

From
Fabien COELHO
Date:
Bonjour Michaël,

>> If this were core server code threatening data integrity I would be
>> inclined to be more strict, but after all pgbench is a utility program,
>> and I think we can allow a little more latitude.
>
> +1.  Let's be flexible here.  It looks better to not rush a fix, and
> we still have some time ahead.

Attached an updated v8 patch which adds (reinstate) an improved TAP test 
which would have caught the various regressions on logs.

Given that such tests were removed once before, I'm unsure whether they 
will be acceptable, despite that their usefulness has been clearly 
demonstrated. At least it is for the record. Sigh:-(

-- 
Fabien.
Attachment

Re: pgbench logging broken by time logic changes

From
Michael Paquier
Date:
On Tue, Jun 22, 2021 at 12:06:45PM +0200, Fabien COELHO wrote:
> Attached an updated v8 patch which adds (reinstate) an improved TAP test
> which would have caught the various regressions on logs.

> Given that such tests were removed once before, I'm unsure whether they will
> be acceptable, despite that their usefulness has been clearly demonstrated.
> At least it is for the record. Sigh:-(

Thanks!

This v8 is an addition of the fix for the epoch with the adjustments
for the aggregate reports in the logs.  The maths look rather right
after a read and after some tests.

+# note: this test is time sensitive, and may fail on a very
+#       loaded host.
+# note: --progress-timestamp is not tested
+my $delay = pgbench(
+    '-T 2 -P 1 -l --aggregate-interval=1 -S -b se@2'
+    . ' --rate=20 --latency-limit=1000 -j ' . $nthreads
+    . ' -c 3 -r',
+    0,
+    [   qr{type: multiple},
+        qr{clients: 3},
+        qr{threads: $nthreads},
+        qr{duration: 2 s},
+        qr{script 1: .* select only},
+        qr{script 2: .* select only},
+        qr{statement latencies in milliseconds},
+        qr{FROM pgbench_accounts} ],
+    [ qr{vacuum}, qr{progress: 1\b} ],
+    'pgbench progress', undef,
+    "--log-prefix=$bdir/001_pgbench_log_1");
Could we make that shorter at 1s?  That will shorten the duration of
the test run.  It is easy to miss that this test is for
--aggregate-interval (aka the logAgg() path) without a comment.

+# cool check that we are around 2 seconds
+# The rate may results in an unlucky schedule which triggers
+# an early exit, hence the loose bound.
+ok(0.0 < $delay && $delay < 4.0, "-T 2 run around 2 seconds");

The command itself would not fail, but we would just fail on a machine
where the difference in start/stop time is higher than 4 seconds,
right?  On RPI-level machines, this could fail reliably.  I am not
completely sure what's the additional value we can get from that extra
test, to be honest.

+# $nthreads threads, 2 seconds, but due to timing imprecision we might get
+# only 1 or as many as 3 progress reports per thread.
+check_pgbench_logs($bdir, '001_pgbench_log_1', $nthreads, 1, 3,
+    qr{^\d{10,} \d{1,2} \d+ \d+ \d+ \d+ \d+ \d+ \d+ \d+ \d+$});
+
Now this one is good and actually stable thanks to the fact that we'd
get at least the final logs, and the main complain we got to discuss
about on this thread was the format of the logs.  I would say to give
up on the first test, and keep the second.  But, is this regex
correct?  Shouldn't we check for six integer fields only with the
first one having a minimal number of digits for the epoch?
--
Michael

Attachment

Re: pgbench logging broken by time logic changes

From
Fabien COELHO
Date:
Hello,

> +# note: this test is time sensitive, and may fail on a very
> +#       loaded host.
> +# note: --progress-timestamp is not tested
> +my $delay = pgbench(
> +    '-T 2 -P 1 -l --aggregate-interval=1 -S -b se@2'
> +    . ' --rate=20 --latency-limit=1000 -j ' . $nthreads
> +    . ' -c 3 -r',
> +    0,
> +    [   qr{type: multiple},
> +        qr{clients: 3},
> +        qr{threads: $nthreads},
> +        qr{duration: 2 s},
> +        qr{script 1: .* select only},
> +        qr{script 2: .* select only},
> +        qr{statement latencies in milliseconds},
> +        qr{FROM pgbench_accounts} ],
> +    [ qr{vacuum}, qr{progress: 1\b} ],
> +    'pgbench progress', undef,
> +    "--log-prefix=$bdir/001_pgbench_log_1");

> Could we make that shorter at 1s?  That will shorten the duration of
> the test run.  It is easy to miss that this test is for
> --aggregate-interval (aka the logAgg() path) without a comment.

It is for -T, -P and --aggregate-interval. The units of all these are 
seconds, and the minimum is 1. I put 2 so that it is pretty much 
guaranteed to get some output. We could try 1, but I'm less confident 
that output is ensured in all cases on a very slow host, which may decide 
to shorten the run before having shown a progress report, for instance.

> +# cool check that we are around 2 seconds
> +# The rate may results in an unlucky schedule which triggers
> +# an early exit, hence the loose bound.
> +ok(0.0 < $delay && $delay < 4.0, "-T 2 run around 2 seconds");
>
> The command itself would not fail, but we would just fail on a machine
> where the difference in start/stop time is higher than 4 seconds,
> right?

Yep. It could detect a stuck pgbench process, which was one of the 
reported issues. Maybe a timeout should be added.

> On RPI-level machines, this could fail reliably.

Dunno, not sure what RPI means. Probably not "Retail Price Index"… maybe 
Raspberry Pi? In that case, the 0-4 second leeway is intended to be loose 
enough to accommodate such hosts, but I cannot test that.

> I am not completely sure what's the additional value we can get from 
> that extra test, to be honest.

This would be to detect a somehow stuck process. It could be looser if 
necessary. Or removed, or preferably commented out, or enabled with some 
options (eg an environment variable? configure option?). Such control 
could also skip all 3 calls, in which case the 2 seconds is not an issue.

> +# $nthreads threads, 2 seconds, but due to timing imprecision we might get
> +# only 1 or as many as 3 progress reports per thread.
> +check_pgbench_logs($bdir, '001_pgbench_log_1', $nthreads, 1, 3,
> +    qr{^\d{10,} \d{1,2} \d+ \d+ \d+ \d+ \d+ \d+ \d+ \d+ \d+$});
> +
> Now this one is good and actually stable thanks to the fact that we'd
> get at least the final logs, and the main complain we got to discuss
> about on this thread was the format of the logs.

Yep, this test would have probably detected the epoch issue reported by 
Greg.

> I would say to give up on the first test, and keep the second.

You mean remove the time check and keep the log check. I'd like to keep a 
time check, or at least have it in comment so that I can enable it simply.

> But, is this regex correct?  Shouldn't we check for six integer fields 
> only with the first one having a minimal number of digits for the epoch?

Epoch (seconds since 1970-01-01?) is currently 10 digits. Not sure how 
well it would work if some host has another zero start date.

Given the options of the bench run, there are that many fields in the 
log output, so I'm not sure why we would want to check for fewer?

-- 
Fabien.

Re: pgbench logging broken by time logic changes

From
Michael Paquier
Date:
On Wed, Jun 23, 2021 at 08:26:32AM +0200, Fabien COELHO wrote:
>> Could we make that shorter at 1s?  That will shorten the duration of
>> the test run.  It is easy to miss that this test is for
>> --aggregate-interval (aka the logAgg() path) without a comment.
>
> It is for -T, -P and --aggregate-interval. The units of all these are
> seconds, and the minimum is 1. I put 2 so that it is pretty much guaranteed
> to get some output. We could try 1, but I'm less confident that output is
> ensured in all cases on a very slow host, which may decide to shorten the run
> before having shown a progress report, for instance.

Could it be possible to document the intention of the test and its
coverage then?  With the current patch, one has to guess what's the
intention behind this case.

>> +# $nthreads threads, 2 seconds, but due to timing imprecision we might get
>> +# only 1 or as many as 3 progress reports per thread.
>> +check_pgbench_logs($bdir, '001_pgbench_log_1', $nthreads, 1, 3,
>> +    qr{^\d{10,} \d{1,2} \d+ \d+ \d+ \d+ \d+ \d+ \d+ \d+ \d+$});
>> +
>> Now this one is good and actually stable thanks to the fact that we'd
>> get at least the final logs, and the main complaint discussed on this
>> thread was the format of the logs.
>
> Yep, this test would have probably detected the epoch issue reported by
> Greg.

(Sorry I missed the use of throttle_delay that would generate 10
fields in the log file)

Hm..  Could it be possible to tighten a bit the regex used here then?
I was playing with it and it is not really picky in terms of patterns
allowed or rejected.  The follow-up checks for check_pgbench_logs()
could be a bit more restrictive as well, but my regex-fu is bad.

>> I would say to give up on the first test, and keep the second.
>
> You mean remove the time check and keep the log check. I'd like to keep a
> time check, or at least have it in comment so that I can enable it simply.

I'd rather avoid tests that tend to fail on slow machines.  There is a
flotilla in the buildfarm.
--
Michael

Attachment

Re: pgbench logging broken by time logic changes

From
Fabien COELHO
Date:
Bonjour Michaël,

> Could it be possible to document the intention of the test and its
> coverage then?  With the current patch, one has to guess what's the
> intention behind this case.

Ok, see attached.

>>> +check_pgbench_logs($bdir, '001_pgbench_log_1', $nthreads, 1, 3,
>>> +    qr{^\d{10,} \d{1,2} \d+ \d+ \d+ \d+ \d+ \d+ \d+ \d+ \d+$});
>
> Hm..  Could it be possible to tighten a bit the regex used here then?

> I was playing with it and it is not really picky in terms of patterns
> allowed or rejected.

> The follow-up checks for check_pgbench_logs() could be a bit more 
> restrictive as well, but my regex-fu is bad.

Given the probabilistic nature of a --rate run and the variability of 
hosts which may run the tests, it is hard to be more specific than \d+ for 
actual performance data. The run may try 0 or 50 transactions within a 
second (both with pretty low probabilities), so the test mostly checks the 
format and some basic sanity of the output and logs.

>>> I would say to give up on the first test, and keep the second.
>>
>> You mean remove the time check and keep the log check. I'd like to keep a
>> time check, or at least have it in comment so that I can enable it simply.
>
> I'd rather avoid tests that tend to fail on slow machines.  There is a
> flotilla in the buildfarm.

Commented out in attached v9.

-- 
Fabien.
Attachment

Re: pgbench logging broken by time logic changes

From
Alvaro Herrera
Date:
On 2021-Jun-23, Fabien COELHO wrote:

> +# cool check that we are around 2 seconds
> +# The rate may result in an unlucky schedule which triggers
> +# an early exit, hence the loose bound.
> +#
> +# THIS TEST IS COMMENTED OUT BUT PLEASE LET IT THERE SO THAT
> +# IT CAN BE ENABLED EASILY.
> +#
> +## ok(1.5 < $delay && $delay < 2.5, "-T 2 run around 2 seconds");

I think you should use Test::More's "skip" for this, perhaps something
like this:

SKIP: {
   skip "This test is unreliable";

   # explain why
   ok(1.5 < $delay && $delay < 2.5, "-T 2 run around 2 seconds");
}

... or, actually, even better would be to use a TODO block, so that the
test is run and reports its status, but if it happens not to succeed it
will not cause the whole test to fail.  That way you'll accumulate some
evidence that may serve to improve the test in the future until it
works fully:

TODO: {
   local $TODO = "This test is unreliable";

   ok(1.5 < $delay && $delay < 2.5, "-T 2 run around 2 seconds");
}

-- 
Álvaro Herrera       Valdivia, Chile
"El Maquinismo fue proscrito so pena de cosquilleo hasta la muerte"
(Ijon Tichy en Viajes, Stanislaw Lem)



Re: pgbench logging broken by time logic changes

From
Fabien COELHO
Date:
Ola Álvaro,

> ... or, actually, even better would be to use a TODO block, so that the
> test is run and reports its status, but if it happens not to succeed it
> will not cause the whole test to fail.  That way you'll accumulate some
> evidence that may serve to improve the test in the future until it
> works fully:
>
> TODO: {
>   local $TODO = "This test is unreliable";
>
>   ok(1.5 < $delay && $delay < 2.5, "-T 2 run around 2 seconds");
> }

Thanks for the hint! Why not: having the ability to collect data is a good 
thing, so attached v10 does that. If something goes wrong, the TODO section 
could be extended around all calls.

-- 
Fabien.
Attachment

Re: pgbench logging broken by time logic changes

From
Michael Paquier
Date:
On Wed, Jun 23, 2021 at 10:01:28PM +0200, Fabien COELHO wrote:
> Thanks for the hint! Why not, having the ability to collect data is a good
> thing, so attached v10 does that. If something go wrong, the TODO section
> could be extended around all calls.

+check_pgbench_logs($bdir, '001_pgbench_log_1', $nthreads, 1, 3,
+   qr{^\d{10,} \d{1,2} \d+ \d+ \d+ \d+ \d+ \d+ \d+ \d+ \d+$});
FWIW, I am still seeing problems with the regex pattern you are using
here, because this fails to detect the number of fields we should have
generated here, for one.  If you are not convinced, just run your new
test and extend or reduce the amount of data generated by logAgg() in
your patch: the tests will still pass.

So I have investigated this stuff in details.  The regular expressions
are correctly shaped, but the use of grep() in check_pgbench_logs()
seems to be incorrect.

For example, let's take an aggregate report generated by your new
test:
"1624498086 13 27632 60597490 1683 2853 3227 883179 120 386 123"
Here are some extra ones, shorter and longer:
"1624498086 13 27632 60597490 1683 2853 3227 8831";
"1624498086 13 27632 60597490 1683 2853 3227 883179 120 386 123 123";

Using grep() with "$re" results in all the fields matching.  Using on
the contrary "/$re/" in grep(), like list_files(), would only match
the first one, which is correct.  Please see attached a small script
to show my point, called perl_grep.pl.
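To make the field-count point concrete, here is a small Python sketch (not the Perl test itself) applying an anchored version of that pattern to the three sample lines above; only the correctly sized aggregate line matches:

```python
import re

# The aggregate-log pattern from the test: a 10+ digit epoch followed
# by ten more integer fields, anchored at both ends.
pattern = re.compile(r"^\d{10,} \d{1,2} \d+ \d+ \d+ \d+ \d+ \d+ \d+ \d+ \d+$")

lines = [
    "1624498086 13 27632 60597490 1683 2853 3227 883179 120 386 123",      # 11 fields
    "1624498086 13 27632 60597490 1683 2853 3227 8831",                    # shorter
    "1624498086 13 27632 60597490 1683 2853 3227 883179 120 386 123 123",  # longer
]

matches = [bool(pattern.match(line)) for line in lines]
print(matches)  # [True, False, False]
```

So the regex itself is fine once it is actually applied as a regex; the bug is in how grep() uses it.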

With this issue fixed, I have bumped into what looks like a different
bug in the tests.  001_pgbench_log_2 uses pgbench with 2 clients, but
expects only patterns in the logs where the first column value uses
only 0.  With two clients, those first values can be either 0 or 1 due
to the client ID set.

It seems to me that we had better fix this issue and back-patch it to where
it was introduced, so that we have exact match checks on the log
formats, no?  Please see the attached.

Thoughts?
--
Michael

Attachment

Re: pgbench logging broken by time logic changes

From
Fabien COELHO
Date:
Bonjour Michaël,

> Using grep() with "$re" results in all the fields matching.  Using on
> the contrary "/$re/" in grep(), like list_files(), would only match
> the first one, which is correct.

Ok, good catch. Perl is kind of a strange language.

> With this issue fixed, I have bumped into what looks like a different 
> bug in the tests.  001_pgbench_log_2 uses pgbench with 2 clients, but
> expects only patterns in the logs where the first column value uses only 
> 0.  With two clients, those first values can be either 0 or 1 due to the 
> client ID set.

Indeed. The test passes because the expected number of lines is quite loose.
> It seems to me that we had better fix this issue and back-patch where
> this has been introduced so as we have exact match checks with the log
> formarts, no?  Please see the attached.

Ok, however the regex should be "^[01] ...".

Attached v11 with your fixes + the above regex fix.

-- 
Fabien.
Attachment

Re: pgbench logging broken by time logic changes

From
Andrew Dunstan
Date:
On 6/24/21 2:46 AM, Fabien COELHO wrote:
>
> Bonjour Michaël,
>
>> Using grep() with "$re" results in all the fields matching.  Using on
>> the contrary "/$re/" in grep(), like list_files(), would only match
>> the first one, which is correct.
>
> Ok, good catch. Perl is kind of a strange language.


Not really, the explanation is fairly simple:

grep returns the values for which the test is true.

grep ("$re", @values) doesn't perform a regex test against the values; it
tests the truth of "$re" for each value, i.e. it's more or less the same
as grep (1, @values), which will always return the whole @values list.

By contrast grep (/$re/, @values) returns those elements of @values that
match the regex.
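The same trap exists outside Perl. A hypothetical Python analogue of the two grep forms, using made-up log lines:

```python
import re

# Hypothetical log lines; "garbage line" should be filtered out.
values = ["0 1 1730 0 1537380 70911", "garbage line", "0 2 541 0 1537380 71474"]
re_str = r"^\d+ \d+ \d+ \d+ \d+ \d+$"

# Analogue of grep("$re", @values): the pattern *string* itself is tested
# for truth, and a non-empty string is always true, so nothing is filtered.
wrong = [v for v in values if re_str]

# Analogue of grep(/$re/, @values): the pattern is matched per element.
right = [v for v in values if re.match(re_str, v)]

print(len(wrong), len(right))  # 3 2
```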


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com




Re: pgbench logging broken by time logic changes

From
Michael Paquier
Date:
On Thu, Jun 24, 2021 at 08:03:27AM -0400, Andrew Dunstan wrote:
> On 6/24/21 2:46 AM, Fabien COELHO wrote:
>>> Using grep() with "$re" results in all the fields matching.  Using on
>>> the contrary "/$re/" in grep(), like list_files(), would only match
>>> the first one, which is correct.
>>
>> Ok, good catch. Perl is kind of a strange language.

Okay, I have extracted this part from your patch, and back-patched
this fix down to 11.  The comments were a good addition, so I have
kept them.  I have also made the second regex of check_pgbench_logs()
pickier with the client ID value expected, as it can only be 0.

> By contrast grep (/$re/, @values) returns those elements of @values that
> match the regex.

Thanks for the details here.
--
Michael

Attachment

Re: pgbench logging broken by time logic changes

From
Fabien COELHO
Date:
Bonjour Michaël,

> Okay, I have extracted this part from your patch, and back-patched
> this fix down to 11.  The comments were a good addition, so I have
> kept them.  I have also made the second regex of check_pgbench_logs()
> pickier with the client ID value expected, as it can only be 0.

Attached the remaining part of the patch to fix known issues on pgbench 
logging.

I've added an entry on the open item on the wiki. I'm unsure about who the 
owner should be.

-- 
Fabien.
Attachment

Re: pgbench logging broken by time logic changes

From
Michael Paquier
Date:
On Wed, Jun 30, 2021 at 09:45:47AM +0200, Fabien COELHO wrote:
> Attached the remaining part of the patch to fix known issues on pgbench
> logging.
>
> I've added an entry on the open item on the wiki. I'm unsure about who the
> owner should be.

There is already an item: "Incorrect time maths in pgbench".
--
Michael

Attachment

Re: pgbench logging broken by time logic changes

From
Thomas Munro
Date:
On Wed, Jun 30, 2021 at 8:05 PM Michael Paquier <michael@paquier.xyz> wrote:
> On Wed, Jun 30, 2021 at 09:45:47AM +0200, Fabien COELHO wrote:
> > Attached the remaining part of the patch to fix known issues on pgbench
> > logging.
> >
> > I've added an entry on the open item on the wiki. I'm unsure about who the
> > owner should be.
>
> There is already an item: "Incorrect time maths in pgbench".

Fabien, thanks for the updated patch, I'm looking at it.  I removed
the duplicate item.  More soon.



Re: pgbench logging broken by time logic changes

From
Fabien COELHO
Date:
Hello Thomas,

>>> I've added an entry on the open item on the wiki. I'm unsure about who the
>>> owner should be.
>>
>> There is already an item: "Incorrect time maths in pgbench".

Argh *shoot*, I went over the list too quickly, looking for "log" as a 
keyword.

> Fabien, thanks for the updated patch, I'm looking at it.  I removed
> the duplicate item. More soon.

Thanks. Sorry for the noise.

-- 
Fabien.



Re: pgbench logging broken by time logic changes

From
Fabien COELHO
Date:
>> Fabien, thanks for the updated patch, I'm looking at it.

After looking at it again, here is an update which ensures 64-bit arithmetic 
in the epoch_shift computation.

-- 
Fabien.
Attachment

Re: pgbench logging broken by time logic changes

From
Thomas Munro
Date:
On Wed, Jun 30, 2021 at 9:55 PM Fabien COELHO <coelho@cri.ensmp.fr> wrote:
> >> Fabien, thanks for the updated patch, I'm looking at it.
>
> After looking at it again, here is an update which ensures 64-bit
> arithmetic in the epoch_shift computation.

Hi Fabien,

The code in pgbench 13 aggregates into buckets that begin on the
boundaries of wall clock seconds, because it is triggered by changes
in time_t.  In the current patch, we aggregate data into buckets that
begin on the boundaries of whole seconds since start_time.  Those
boundaries are not aligned with wall clock seconds, and yet we print
out the times rounded to wall clock seconds.

With the following temporary hack:

 static void
 logAgg(FILE *logfile, StatsData *agg)
 {
-       fprintf(logfile, INT64_FORMAT " " INT64_FORMAT " %.0f %.0f %.0f %.0f",
-                       (agg->start_time + epoch_shift) / 1000000,
+       fprintf(logfile, /*INT64_FORMAT*/ "%f " INT64_FORMAT " %.0f %.0f %.0f %.0f",
+                       (agg->start_time + epoch_shift) / 1000000.0,

... you can see what I mean:

1625115080.840406 325 999256 3197232764 1450 6846

Perhaps we should round the start time of the first aggregate down to
the nearest wall clock second?  That would mean that the first
aggregate misses a part of a second (as it does in pgbench 13), but
all later aggregates begin at the time we write in the log (as it does
in pgbench 13).  That is, if we log 1625115080 we mean "all results >=
1625115080.000000".  It's a small detail, but it could be important
for someone trying to correlate the log with other data.  What do you
think?



Re: pgbench logging broken by time logic changes

From
Fabien COELHO
Date:
Hello Thomas,

>> After looking at it again, here is an update which ensure 64 bits on
>> epoch_shift computation.
>
> The code in pgbench 13 aggregates into buckets that begin on the
> boundaries of wall clock seconds, because it is triggered by changes
> in time_t.  In the current patch, we aggregate data into buckets that
> begin on the boundaries of whole seconds since start_time.  Those
> boundaries are not aligned with wall clock seconds, and yet we print
> out the times rounded to wall clock seconds.

Yes, I noticed this small change, and did not feel it was an issue at the 
time.

I thought of doing something like the format change you are suggesting. 
However, whether people would like it would need to be discussed, hence it 
stayed that way… People have scripts to process log files and do not like 
format changes, basically.

> Perhaps we should round the start time of the first aggregate down to
> the nearest wall clock second?

Yep, but that requires a common start point for all threads. Why not.

> That would mean that the first aggregate misses a part of a second (as 
> it does in pgbench 13), but all later aggregates begin at the time we 
> write in the log (as it does in pgbench 13).  That is, if we log 
> 1625115080 we mean "all results >= 1625115080.000000".  It's a small 
> detail, but it could be important for someone trying to correlate the 
> log with other data.  What do you think?

I think that you are right. The simplest way is to align on whole seconds, 
which is easier than changing the format and have complaints about that, 
or not align and have complaints about the timestamp being rounded.

Attached a v14 in that spirit.

-- 
Fabien.
Attachment

Re: pgbench logging broken by time logic changes

From
Thomas Munro
Date:
On Thu, Jul 1, 2021 at 8:50 PM Fabien COELHO <coelho@cri.ensmp.fr> wrote:
> Attached a v14 in that spirit.

Thanks!  This doesn't seem to address the complaint, though.  Don't
you need to do something like this?  (See also attached.)

+    initStats(&aggs, start - (start + epoch_shift) % 1000000);

That should reproduce what pgbench 13 does implicitly when it uses
time(NULL).  Namely, it rewinds to the start of the current *wall
clock* second, so that all future aggregates also start at round
number wall clock times, at the cost of making the first aggregate
miss out on a fraction of a second.
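In microseconds, that rounding can be sketched like this (a Python sketch with made-up epoch_shift and start values, showing that the first aggregate boundary lands exactly on a wall-clock second):

```python
# All values in microseconds; epoch_shift and start are illustrative only.
epoch_shift = 1_625_115_080_000_000 - 123_456_789  # internal clock -> Unix epoch
start = 123_456_789 + 840_406                      # an instant mid-second

# Rewind the first aggregate to the start of the current wall-clock second.
agg_start = start - (start + epoch_shift) % 1_000_000

# The boundary, expressed as Unix time, is now a whole number of seconds,
# so the logged integer timestamp means "all results >= that second".
assert (agg_start + epoch_shift) % 1_000_000 == 0
print((agg_start + epoch_shift) // 1_000_000)  # 1625115080
```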

I wonder if some of the confusion on the other thread about the final
aggregate[1] was due to this difference.  By rounding down, we get a
"head start" (because the first aggregate is short), so we usually
manage to record the expected number of aggregates before time runs
out.  It's a race though.  Your non-rounding version was more likely
to lose the race and finish before the final expected aggregate was
logged, so you added code to force a final aggregate to be logged.  Do
I have this right?  I'm not entirely sure how useful a partial final
aggregate is (it's probably one you have to throw away, like the first
one, no?  Isn't it better if we only have to throw away the first
one?).  I'm not sure, but if we keep that change, a couple of very
minor nits:  I found the "tx" parameter name a little confusing.  Do
you think it's clearer if we change it to "final" (with inverted
sense)?  For the final aggregate, shouldn't we call doLog() only if
agg->cnt > 0?

I think I'd be inclined to take that change back out though, making
this patch very small and net behaviour like pgbench 13, if you agree
with my explanation for why you had to add it and why it's not
actually necessary with the fixed rounding shown above.  (And perhaps
in v15 we might consider other ideas like using hi-res times in the
log and not rounding, etc, a topic for later.)

I don't really see the value in the test that checks that $delay falls
in the range 1.5s - 2.5s and then ignores the result.  If it hangs
forever, we'll find out about it, and otherwise no human or machine
will ever care about that test.  I removed it from this version.  Were
you really attached to it?

I made some very minor language tweaks in comments (we don't usually
shorten "benchmark" to "bench" in English, "series" keeps the -s in
singular (blame the Romans), etc).

I think we should make it clear when we mean the *Unix* epoch (a
comment "switch to epoch" isn't meaningful on its own, to me at
least), so I changed that in a few places.

[1] https://www.postgresql.org/message-id/alpine.DEB.2.22.394.2106102323310.3698412%40pseudo

Attachment

Re: pgbench logging broken by time logic changes

From
Hannu Krosing
Date:
On Thu, Jun 17, 2021 at 7:18 AM Kyotaro Horiguchi
<horikyota.ntt@gmail.com> wrote:
>
> I'm not sure we have transaction lasts for very short time that
> nanoseconds matters.
>

Nanoseconds may not matter yet, but they could be handy when for
example we want to determine the order of parallel query executions.

We are less than an order of magnitude away from being able to do 1M
inserts/updates/deletes per second, so microseconds already are not
always 100% reliable.

We could possibly move to using LSNs fetched as part of the queries
for this case, but this will surely introduce more problems than it
solves :)

Cheers
-----
Hannu Krosing
Google Cloud - We have a long list of planned contributions and we are hiring.
Contact me if interested.



Re: pgbench logging broken by time logic changes

From
Fabien COELHO
Date:
Hello Thomas,

> Thanks!  This doesn't seem to address the complaint, though.  Don't
> you need to do something like this?  (See also attached.)
>
> +    initStats(&aggs, start - (start + epoch_shift) % 1000000);

ISTM that this is: (start + epoch_shift) / 1000000 * 1000000

> That should reproduce what pgbench 13 does implicitly when it uses
> time(NULL).

I understand that you are shifting the aggregate internal start time to 
the Unix epoch, however ISTM that other points in the program are not 
shifted consistently with this, e.g. the while comparison in doLog? Also, 
if the start time is already shifted, then it should not be shifted again 
when printed (in logAgg). The attached version tries to be consistent.

> Namely, it rewinds to the start of the current *wall clock* second, so 
> that all future aggregates also start at round number wall clock times, 
> at the cost of making the first aggregate miss out on a fraction of a 
> second.

ISTM that it was already wall clock time, but not epoch wall clock.
I'm okay with realigning aggregates on full seconds.

> I wonder if some of the confusion on the other thread about the final
> aggregate[1] was due to this difference.

Dunno. Parallel execution with threads is a pain when handling details.

> By rounding down, we get a "head start" (because the first aggregate is 
> short), so we usually manage to record the expected number of aggregates 
> before time runs out.

Fine with me if everything is consistent.

> It's a race though.  Your non-rounding version was more likely
> to lose the race and finish before the final expected aggregate was
> logged, so you added code to force a final aggregate to be logged.

ISTM that we always want to force it, because some modes can have low tps 
and the aggregates should be "full".

> Do I have this right?  I'm not entirely sure how useful a partial final 
> aggregate is

If you ask for 10 seconds run with 1 aggregate per second, you expect to 
see (at least, about) 10 lines, and I want to ensure that, otherwise 
people will ask questions, tools will have to look for special cases, 
missing rows, whatever, and it will be a pain there. We want to produce 
something simple, consistent, reliable, that tools can depend on.

> (it's probably one you have to throw away, like the first one, no? 
> Isn't it better if we only have to throw away the first one?).

This should be the user decision to drop it or not, not the tool producing 
it, IMO.

> I'm not sure, but if we keep that change, a couple of very minor nits: 
> I found the "tx" parameter name a little confusing.  Do you think it's 
> clearer if we change it to "final" (with inverted sense)?

I agree that tx is not a very good name, but the inversion does not look 
right to me. The "normal" behavior is to accumulate.

> For the final aggregate, shouldn't we call doLog() only if agg->cnt > 0?

No, I think that we should want to have all aggregates, even with zeros, 
so that the user can expect a deterministic number of lines.

> I think I'd be inclined to take that change back out though, making this 
> patch very small and net behaviour like pgbench 13, if you agree with my 
> explanation for why you had to add it and why it's not actually 
> necessary with the fixed rounding shown above.  (And perhaps in v15 we 
> might consider other ideas like using hi-res times in the log and not 
> rounding, etc, a topic for later.)

I think that I'm mostly okay.

> I don't really see the value in the test that checks that $delay falls
> in the range 1.5s - 2.5s and then ignores the result.  If it hangs
> forever, we'll find out about it, and otherwise no human or machine
> will ever care about that test.  I removed it from this version.  Were
> you really attached to it?

YES, REALLY! It would just have caught quite a few of the regressions we 
are trying to address here. I want it there even if ignored because I'll 
look for it to avoid regressions in the future. If the test is actually 
removed, recreating it is a pain. If you really want to deactivate it, 
use if(0) but PLEASE leave it there so that it can be reactivated for tests 
very simply; it is bad to maintain some test outside of the tree.

Also, if farm logs show that it is okay on all animals, it can be switched 
on by removing the ignore trick.

> I made some very minor language tweaks in comments (we don't usually
> shorten "benchmark" to "bench" in English, "series" keeps the -s in
> singular (blame the Romans), etc).

Thanks! My English is kind of fuzzy in the details:-)

> I think we should make it clear when we mean the *Unix* epoch (a
> comment "switch to epoch" isn't meaningful on its own, to me at
> least), so I changed that in a few places.

Ok.

Attached v16:
  - tries to be consistent wrt epoch & aggregates, aligning to Unix epoch
    as you suggested.
  - renames tx as accumulate, but does not invert it.
  - always shows aggregates so that the user can depend on the output,
    even if stats are zero, because ISTM that cleverness must be avoided.
  - put tests back, even if ignored, because I really want them available
    easily.

When/if you get to commit this patch, eventually, do not forget that I'm 
pushing forward fixes contributed by others, including Kyotaro Horiguchi 
and Yugo Nagata.

-- 
Fabien.
Attachment

Re: pgbench logging broken by time logic changes

From
Fabien COELHO
Date:
Hello Hannu,

>> I'm not sure we have transaction lasts for very short time that
>> nanoseconds matters.
>
> Nanoseconds may not matter yet, but they could be handy when for
> example we want to determine the order of parallel query executions.
>
> We are less than an order of magnitude away from being able to do 1M
> inserts/updates/deletes per second, so microseconds already are not
> always 100% reliable.

ISTM that 1M tps would require really a lot of parallel clients, thus the 
latency of each would be quite measurable, so that µs would still make 
sense for measuring their performance? If an actual network is involved, 
the network latency is already 100-200 µs even before executing any code.

-- 
Fabien.

Re: pgbench logging broken by time logic changes

From
Thomas Munro
Date:
On Fri, Jul 9, 2021 at 5:15 AM Fabien COELHO <coelho@cri.ensmp.fr> wrote:
> > Thanks!  This doesn't seem to address the complaint, though.  Don't
> > you need to do something like this?  (See also attached.)
> >
> > +    initStats(&aggs, start - (start + epoch_shift) % 1000000);
>
> ISTM that this is: (start + epoch_shift) / 1000000 * 1000000

Sure, it's equivalent if you also change doLog() to match the change
in epoch, as your v16 did.  Your version seems fine to me.  The key
thing is that we only start new aggregates when the Unix epoch time
crosses over a XXX.000000 boundary, since we're only logging the XXX
part.  That's much like(*) pgbench13, and I'm happy however you want
to code that.  Case closed on that point.  Thanks!
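For the record, the two formulations pick the same wall-clock boundary; they differ only in which epoch the result is expressed in, which is why doLog() must use the matching epoch (a Python sketch with made-up values):

```python
# Microseconds; epoch_shift and start are illustrative values only.
epoch_shift = 1_625_115_080_000_000 - 123_456_789
start = 123_456_789 + 840_406

internal = start - (start + epoch_shift) % 1_000_000   # internal-clock epoch
unix = (start + epoch_shift) // 1_000_000 * 1_000_000  # Unix epoch

# Same wall-clock second boundary, just shifted between the two epochs.
assert unix - internal == epoch_shift
```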

> > Isn't it better if we only have to throw away the first one?).
>
> This should be the user decision to drop it or not, not the tool producing
> it, IMO.

Let me try this complaint again.  It's admittedly a minor detail, but
my goal in this thread is to match pgbench13's treatment of aggregate
boundaries and partial aggregates, so that we can close the open item
for 14 with a minimal fix that doesn't make any unnecessary changes.
Discussions about improvements or pre-existing problems can wait.

First, let me demonstrate that pgbench13 throws away final partial
aggregates.  I hacked REL_13_STABLE like so:

                if (agg_interval > 0)
                {
                        /* log aggregated but not yet reported transactions */
+fprintf(thread->logfile, "XXX log aggregated but not yet reported transactions: aggs.cnt = %ld\n", aggs.cnt);
                        doLog(thread, state, &aggs, false, 0, 0);
                }
                fclose(thread->logfile);

I ran pgbench -T6 --aggregate-interval 1 -l -S postgres, and it
produced a log file containing:

===BOF===
1625782245 7974 428574 24998642 49 683
1625782246 19165 998755 53217669 49 310
1625782247 19657 998868 51577774 47 189
1625782248 19707 998898 51660408 47 189
1625782249 19969 998867 50454999 48 156
1625782250 19845 998877 51071013 47 210
XXX log aggregated but not yet reported transactions: aggs.cnt = 10988
===EOF===

We can see three interesting things:

1.  The first aggregate is partial (only ~7k transactions, because it
started partway through a second).  Users have to throw away that
first aggregate because its statistics are noise.  That is the price
to pay to have later aggregates start at the time they say.

2.  We get 5 more full aggregates (~19k transactions).  That's a total
of 6 aggregates, which makes intuitive sense with -T6.

3.  At the end, the extra call to doLog() did nothing, and yet cnt =
10988.  That's data we're throwing away, because Unix epoch time has
not advanced far enough to reach a new aggregate start time (it's not
impossible, but probability is very low).  Checking the commit log, I
see that the code that claims to log the final aggregate came from
commit b6037664960 (2016); apparently it doesn't do what the comments
seem to think it does (did that ever work?  Should perhaps be cleaned
up, but separately, it's not an open item for 14).

Now, in your recent patches you force that final partial aggregate to
be logged in that case with that new flag mechanism, as we can see:

===BOF===
1625783726 11823 609143 32170321 48 549
1625783727 19530 998995 52383115 47 210
1625783728 19468 999026 52208898 46 181
1625783729 19826 999001 51238427 46 185
1625783730 19195 999110 52841674 49 172
1625783731 18572 998992 56028876 48 318
1625783732 7484 388620 20951100 48 316
===EOF===

1.  We get a partial initial aggregate just like 13.  That sacrificial
aggregate helps us synchronize the rest of the aggregates with the
logged timestamps.  Good.

2.  We get 5 full aggregates (~19k transactions) just like 13.  As in
13, that's quite likely, because the first one was "short" so we
almost always reach the end of the 6th one before -T6 runs out of
sand.  Good.

3.  We get a new partial aggregate at the end.  Users would have to
throw that one away too.  This is not a big deal, but it's a change in
behaviour that should be discussed.

Given that that last one is a waste of pixels and a (so far)
unjustified change in behaviour, I propose, this time with a little
more force and an updated patch, that we abandon that part of the
change.  I submit that you only added that because your earlier
patches didn't have the partial aggregate at the start, so then it
often didn't produce the 6th line of output.  So, you forced it to log
whatever it had left, even though the full time hadn't elapsed yet.
Now we don't need that.

The patch and resulting code are simpler, and the user experience matches 13.

See attached.

> > I don't really see the value in the test that checks that $delay falls
> > in the range 1.5s - 2.5s and then ignores the result.  If it hangs
> > forever, we'll find out about it, and otherwise no human or machine
> > will ever care about that test.  I removed it from this version.  Were
> > you really attached to it?
>
> YES, REALLY! It would just have caught quite a few of the regressions we
> are trying to address here. I want it there even if ignored because I'll
> look for it to avoid regressions in the future. If the test is actually
> removed, recreating it is a pain. If you really want to deactivate it,
> use if(0) but PLEASE leave it there so that it can be reactivated for tests
> very simply; it is bad to maintain some test outside of the tree.

Ok, you win :-)

> When/if you get to commit this patch, eventually, do not forget that I'm
> pushing forward fixes contributed by others, including Kyotaro Horiguchi
> and Yugo Nagata.

Fixed, thanks.

* I say "much like" and not "exactly like"; of course there may be a
subtle difference if ntpd adjusts your clock while a benchmark is
running.  Something must give, and 14's coding prefers to keep the
duration of aggregates stable at exactly X seconds according to the
high precision timer, so that the statistics it reports are
meaningful, but 13 prefers to keep the timestamps it logs in sync with
other software using gettimeofday() and will give you a weird short or
long aggregate to achieve that (producing bad statistics).  I can see
arguments for both but I'm OK with that change and I see that it is in
line with your general goal of switching to modern accurate time
interfaces.

Attachment

Re: pgbench logging broken by time logic changes

From
Fabien COELHO
Date:
Hello Thomas,

>>> Isn't it better if we only have to throw away the first one?).
>>
>> This should be the user decision to drop it or not, not the tool 
>> producing it, IMO.
>
> Let me try this complaint again. [...]

I understand your point.

For me, silently removing the last bucket is not right, because then the
sum of the aggregates does not match the final grand total. This change
is intentional and borders on a bug fix, which is what this patch was
doing, even if it is also a small behavioral change: the detailed and
final reports should be in agreement.

I do agree that users should probably ignore the first and last lines.

> See attached.

Works for me: patch applies, global and local check ok. I'm fine with it.

If it was me, I'd still show the last bucket, but it does not matter much.

-- 
Fabien.



Re: pgbench logging broken by time logic changes

From
Thomas Munro
Date:
On Fri, Jul 9, 2021 at 5:14 PM Fabien COELHO <coelho@cri.ensmp.fr> wrote:
> Works for me: patch applies, global and local check ok. I'm fine with it.

I hoped we were done here but I realised that your check for 1-3 log
lines will not survive the harsh environment of the build farm.
Adding sleep(2) before the final doLog() confirms that.  I had two
ideas:

1.  Give up and expect 1-180 lines.  (180s is the current timeout
tolerance used elsewhere to deal with
swamped/valgrind/swapping/microsd computers, after a few rounds of
inflation, so if you need an arbitrary large number to avoid buildfarm
measles that's my suggestion....)
2.  Change doLog() to give up after end_time.  But then I think you'd
need to make it 0-3 :-(

I think the logging could be tightened up to work the way you expected
in future work, though.  Perhaps we could change all logging to work
with transaction start time instead of log-writing time, which doLog()
should receive.  If you never start a transaction after end_time, then
there can never be an aggregate that begins after that, and the whole
thing becomes more deterministic.  That kind of alignment of aggregate
timing with whole-run timing could also get rid of those partial
aggregates completely.  But that's an idea for 15.

So I think we should do 1 for now.  Objections or better ideas?



Re: pgbench logging broken by time logic changes

From
Fabien COELHO
Date:
>> Works for me: patch applies, global and local check ok. I'm fine with it.
>
> I hoped we were done here but I realised that your check for 1-3 log
> lines will not survive the harsh environment of the build farm.
> Adding sleep(2) before the final doLog() confirms that.  I had two
> ideas:
>
> 1.  Give up and expect 1-180 lines.  (180s is the current timeout
> tolerance used elsewhere to deal with
> swamped/valgrind/swapping/microsd computers, after a few rounds of
> inflation, so if you need an arbitrary large number to avoid buildfarm
> measles that's my suggestion....)
> 2.  Change doLog() to give up after end_time.  But then I think you'd
> need to make it 0-3 :-(
>
> I think the logging could be tightened up to work the way you expected
> in future work, though.  Perhaps we could change all logging to work
> with transaction start time instead of log-writing time, which doLog()
> should receive.  If you never start a transaction after end_time, then
> there can never be an aggregate that begins after that, and the whole
> thing becomes more deterministic.  That kind of alignment of aggregate
> timing with whole-run timing could also get rid of those partial
> aggregates completely.  But that's an idea for 15.
>
> So I think we should do 1 for now.  Objections or better ideas?

At least, we now know that it is too much.

What about moving the test as is in the TODO section with a comment, next 
to the other one, for now?

I hesitated to suggest that before because of the above risks, but I
was naively optimistic that it might pass, because the test is not all
that demanding.

Someone suggested adding a "real-time" configure switch to enable/disable
time-sensitive tests.

-- 
Fabien.



Re: pgbench logging broken by time logic changes

From
Fabien COELHO
Date:
Hello again,

>> I hoped we were done here but I realised that your check for 1-3 log
>> lines will not survive the harsh environment of the build farm.
>> Adding sleep(2) before the final doLog() confirms that.  I had two
>> ideas:

>> So I think we should do 1 for now.  Objections or better ideas?
>
> At least, we now know that it is too much.

I misread your point. You think that it would fail, but it has not been
tried yet. I'm rather optimistic that it would not fail, but I'm okay
with averting the risk anyway.

> What about moving the test as is in the TODO section with a comment, next to 
> the other one, for now?

I stand by this solution, which should allow us to get some data from
the field; see v18 attached. If all is green, then the TODO could be
removed later.

-- 
Fabien.
Attachment

Re: pgbench logging broken by time logic changes

From
Thomas Munro
Date:
Hi Fabien,

I committed the code change without the new TAP tests, because I
didn't want to leave the open item hanging any longer.  As for the
test, ...

On Sat, Jul 10, 2021 at 9:36 PM Fabien COELHO <coelho@cri.ensmp.fr> wrote:
> >> I hoped we were done here but I realised that your check for 1-3 log
> >> lines will not survive the harsh environment of the build farm.
> >> Adding sleep(2) before the final doLog() confirms that.  I had two
> >> ideas:

> I misread your point. You think that it would fail, but it has not been
> tried yet. I'm rather optimistic that it would not fail, but I'm okay
> with averting the risk anyway.

... I know it can fail, and your v18 didn't fix that, because...

+check_pgbench_logs($bdir, '001_pgbench_log_1', $nthreads, 1, 3,
                                                              ^
                                                              |
                                              ... this range can be exceeded.

That's because extra aggregations are created based on doLog() reading
the clock after a transaction is finished, entirely independently of
the -T mechanism deciding when to stop the benchmark, and potentially
many seconds later in adverse conditions.  As I mentioned, you can see
it fail with your own eyes if you hack the code like so:

        if (agg_interval > 0)
        {
+           /*
+            * XXX: simulate an overloaded raspberry pi swapping to a microsd
+            * card or other random delays as we can expect in the build farm
+            */
+           sleep(3);
            /* log aggregated but not yet reported transactions */
            doLog(thread, state, &aggs, false, 0, 0);
        }

> I stand by this solution, which should allow us to get some data from
> the field; see v18 attached. If all is green, then the TODO could be
> removed later.

I suspect the number of aggregates could be made deterministic, as I
described in an earlier message.  What do you think about doing
something like that first for the next release, before trying to add
assertions about the number of aggregates?  I'm with you on the
importance of testing, but it seems better to start by making the
thing more testable.



Re: pgbench logging broken by time logic changes

From
Fabien COELHO
Date:
Hello Thomas,

> I committed the code change without the new TAP tests, because I
> didn't want to leave the open item hanging any longer.

Ok. Good.

>  As for the test, ... [...]

Argh, so there are no tests that would have caught the regressions:-(

> ... I know it can fail, and your v18 didn't fix that, because...
>
> +check_pgbench_logs($bdir, '001_pgbench_log_1', $nthreads, 1, 3,
>                                              ... this range can be exceeded.

Indeed. I meant to move that one into the TODO section as well, not just
the previous call, so that all time-sensitive tests are fully ignored
but still reported, which would be enough for me.

> I suspect the number of aggregates could be made deterministic, as I
> described in an earlier message.  What do you think about doing
> something like that first for the next release, before trying to add
> assertions about the number of aggregates?

I think that the last time I did something to get more deterministic
results in pgbench, which involved a few lines of hocus-pocus, the
patch got rejected :-)

An "ignored" test looked like a good compromise: a way to check how
things are going in the farm, and to catch more regressions when
developing pgbench, without introducing behavioral changes.

> I'm with you on the importance of testing, but it seems better to start 
> by making the thing more testable.

I'm used to my test patches being rejected, including ones that modify
pgbench behavior to make it more testable. Am I mad enough to retry?
Maybe, maybe not.

Attached the fully "ignored" version of the time features test as a patch.

-- 
Fabien.
Attachment

Re: pgbench logging broken by time logic changes

From
Daniel Gustafsson
Date:
> On 11 Jul 2021, at 15:07, Fabien COELHO <coelho@cri.ensmp.fr> wrote:

> Attached the fully "ignored" version of the time features test as a patch.

This version of the patch is failing to apply on top of HEAD, can you please
submit a rebased version?

--
Daniel Gustafsson        https://vmware.com/




Re: pgbench logging broken by time logic changes

From
Daniel Gustafsson
Date:
> On 4 Nov 2021, at 13:38, Daniel Gustafsson <daniel@yesql.se> wrote:
>
>> On 11 Jul 2021, at 15:07, Fabien COELHO <coelho@cri.ensmp.fr> wrote:
>
>> Attached the fully "ignored" version of the time features test as a patch.
>
> This version of the patch is failing to apply on top of HEAD, can you please
> submit a rebased version?

I'm marking this patch Returned with Feedback, please feel free to resubmit
this when there is an updated version of the patch available.

--
Daniel Gustafsson        https://vmware.com/