Thread: [HACKERS] Fixing pgbench's logging of transaction timestamps

[HACKERS] Fixing pgbench's logging of transaction timestamps

From
Tom Lane
Date:
pgbench's -l option is coded using inappropriate familiarity with the
contents of struct instr_time.  I recall complaining about that when
the code went in, but to little avail.  However, it needs to be fixed
if we're to switch over to using clock_gettime() as discussed in
the gettimeofday thread,
https://www.postgresql.org/message-id/17524.1483063359@sss.pgh.pa.us
And really it's unacceptably bad code even without that consideration,
because it doesn't work on Windows.

There are at least three ways we could fix it:

1. Switch over to printing the timestamp in the form of elapsed seconds
since the pgbench run start, as in the attached draft patch (which is
code-complete but lacks necessary documentation changes).  You could make
an argument that this is a better definition than what's there: in most
situations, people are going to want the elapsed time, and right now they
have to do painful manual arithmetic to get it.  About the only reason
I can see for liking the current definition is that it makes it possible
to correlate the pgbench log with external events, and I'm not sure
whether that's especially useful.

2. Have pgbench save both the INSTR_TIME_SET_CURRENT() and gettimeofday()
results for the run start instant.  In doLog(), compute the elapsed time
from run start much as in the attached patch, but then add it to the saved
gettimeofday() result and print that in the existing format.  This would
preserve the existing output format at the cost of a very small amount
of extra arithmetic per log line.  However, it's got a nasty problem if
we use clock_gettime(CLOCK_MONOTONIC) in instr_time, which I think would
be the typical case.  To the extent that CLOCK_MONOTONIC diverges from
CLOCK_REALTIME, which it would in case of external adjustments to the
system clock, the printed timestamps would no longer match the actual
system clock, which destroys the argument that you could correlate the
pgbench log with other events.  (I imagine the same problem would apply
in the Windows implementation.)

3. Forget about using the instr_time result and just have doLog() execute
gettimeofday() to obtain the timestamp to print.  This is kind of
conceptually ugly, but realistically the added overhead is probably
insignificant.  A larger objection might be that on Windows, the result
of gettimeofday() isn't very high precision ... but it'd still be a huge
improvement over the non-answer you get now.

I'm inclined to think that #2 isn't a very good choice; it appears to
preserve the current behavior but really doesn't.  So we should either
change the behavior as in #1 or expend an extra system call as in #3.
Preferences?

            regards, tom lane

diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index fbb0c2b..ca75900 100644
*** a/src/bin/pgbench/pgbench.c
--- b/src/bin/pgbench/pgbench.c
*************** static int64 total_weight = 0;
*** 398,403 ****
--- 398,405 ----

  static int    debug = 0;            /* debug flag */

+ static instr_time start_time;    /* overall run start time */
+
  /* Builtin test scripts */
  typedef struct BuiltinScript
  {
*************** doLog(TState *thread, CState *st, instr_
*** 2483,2509 ****
      else
      {
          /* no, print raw transactions */
! #ifndef WIN32

!         /* This is more than we really ought to know about instr_time */
          if (skipped)
!             fprintf(logfile, "%d " INT64_FORMAT " skipped %d %ld %ld",
                      st->id, st->cnt, st->use_file,
!                     (long) now->tv_sec, (long) now->tv_usec);
          else
!             fprintf(logfile, "%d " INT64_FORMAT " %.0f %d %ld %ld",
                      st->id, st->cnt, latency, st->use_file,
!                     (long) now->tv_sec, (long) now->tv_usec);
! #else
!
!         /* On Windows, instr_time doesn't provide a timestamp anyway */
!         if (skipped)
!             fprintf(logfile, "%d " INT64_FORMAT " skipped %d 0 0",
!                     st->id, st->cnt, st->use_file);
!         else
!             fprintf(logfile, "%d " INT64_FORMAT " %.0f %d 0 0",
!                     st->id, st->cnt, latency, st->use_file);
! #endif
          if (throttle_delay)
              fprintf(logfile, " %.0f", lag);
          fputc('\n', logfile);
--- 2485,2501 ----
      else
      {
          /* no, print raw transactions */
!         instr_time    elapsed = *now;

!         INSTR_TIME_SUBTRACT(elapsed, start_time);
          if (skipped)
!             fprintf(logfile, "%d " INT64_FORMAT " skipped %d %.6f",
                      st->id, st->cnt, st->use_file,
!                     INSTR_TIME_GET_DOUBLE(elapsed));
          else
!             fprintf(logfile, "%d " INT64_FORMAT " %.0f %d %.6f",
                      st->id, st->cnt, latency, st->use_file,
!                     INSTR_TIME_GET_DOUBLE(elapsed));
          if (throttle_delay)
              fprintf(logfile, " %.0f", lag);
          fputc('\n', logfile);
*************** main(int argc, char **argv)
*** 3664,3670 ****
      CState       *state;            /* status of clients */
      TState       *threads;        /* array of thread */

-     instr_time    start_time;        /* start up time */
      instr_time    total_time;
      instr_time    conn_total_time;
      int64        latency_late = 0;
--- 3656,3661 ----

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Fixing pgbench's logging of transaction timestamps

From
Fabien COELHO
Date:
Hello and happy new year,

> -l [...]

My 0.02€:

> There are at least three ways we could fix it:
>
> 1. Switch over to printing the timestamp in the form of elapsed seconds 
> since the pgbench run start, [...] About the only reason I can see for 
> liking the current definition is that it makes it possible to correlate 
> the pgbench log with external events, and I'm not sure whether that's 
> especially useful.

I have found that the ability to correlate different logs, esp. pgbench 
and postgres, is "sometimes" useful.

> 2. Have pgbench save both the INSTR_TIME_SET_CURRENT() and 
> gettimeofday() results for the run start instant. However, it's got a 
> nasty problem [...]

I'm not sure how wide the problem would be. I would not expect the 
correlation to be perfect, as there are various protocol/client overheads 
here and there anyway.

> 3. Forget about using the instr_time result and just have doLog() execute
> gettimeofday() to obtain the timestamp to print.  This is kind of
> conceptually ugly, but realistically the added overhead is probably
> insignificant.  A larger objection might be that on Windows, the result
> of gettimeofday() isn't very high precision ... but it'd still be a huge
> improvement over the non-answer you get now.

Yep.

> I'm inclined to think that #2 isn't a very good choice; it appears to
> preserve the current behavior but really doesn't.  So we should either
> change the behavior as in #1 or expend an extra system call as in #3.
> Preferences?

Marginal preference for #3 for KIS? Otherwise any three options seems 
better than the current status.

-- 
Fabien.

Re: [HACKERS] Fixing pgbench's logging of transaction timestamps

From
Tom Lane
Date:
Fabien COELHO <coelho@cri.ensmp.fr> writes:
>> 3. Forget about using the instr_time result and just have doLog() execute
>> gettimeofday() to obtain the timestamp to print.  This is kind of
>> conceptually ugly, but realistically the added overhead is probably
>> insignificant.  A larger objection might be that on Windows, the result
>> of gettimeofday() isn't very high precision ... but it'd still be a huge
>> improvement over the non-answer you get now.

> Yep.

>> I'm inclined to think that #2 isn't a very good choice; it appears to
>> preserve the current behavior but really doesn't.  So we should either
>> change the behavior as in #1 or expend an extra system call as in #3.
>> Preferences?

> Marginal preference for #3 for KIS? Otherwise any three options seems 
> better than the current status.

OK, done that way.

BTW, why is it that the --aggregate-interval option is unsupported on
Windows?  Is that an artifact of the same disease of assuming too much
about how instr_time is represented?  I don't see any very good reason
for it other than the weird decision to store the result of
INSTR_TIME_GET_DOUBLE in a "long", which seems rather broken in any case.
        regards, tom lane



Re: [HACKERS] Fixing pgbench's logging of transaction timestamps

From
Tom Lane
Date:
I wrote:
> BTW, why is it that the --aggregate-interval option is unsupported on
> Windows?  Is that an artifact of the same disease of assuming too much
> about how instr_time is represented?  I don't see any very good reason
> for it other than the weird decision to store the result of
> INSTR_TIME_GET_DOUBLE in a "long", which seems rather broken in any case.

After looking closer, I see the real issue is that it prints the integer
part of INSTR_TIME_GET_DOUBLE and documents that as being a Unix
timestamp.  So that's not going to do either.  I solved it the same way
as in the other code path, ie just eat the cost of doing our own time
inquiry.
        regards, tom lane