Thread: Patch to show individual statement latencies in pgbench output

Patch to show individual statement latencies in pgbench output

From

Florian Pflug

Date:

12 June 2010, 12:07:39

Hi

To be able to asses the performance characteristics of the different wal-related options, I patched pgbench to show the
averagelatency of each individual statement. The idea is to be able to compare the latency of the COMMIT with the ones
ofthe other statements. 

This patch adds two new finds to the Command struct, elapsed and cnt, which accumulate the total time spent on each
statementand the number of times it was executed. printResults essentially prings elapsed/cnt for each statement,
followedby the statement itself. 

Patch is attached.

best regards,
Florian Pflug

Attachment

pgbench_statementlatency.patch

Re: Patch to show individual statement latencies in pgbench output

From

Greg Smith

Date:

13 June 2010, 21:22:47

Florian Pflug wrote:
> To be able to asses the performance characteristics of the different wal-related options, I patched pgbench to show
theaverage latency of each individual statement. The idea is to be able to compare the latency of the COMMIT with the
onesof the other statements.

>   

That's an interesting idea, particularly given that people don't really 
understand where the time is going in the standard pgbench test.  Your 
first bit of review feedback is that this would have to be something you 
could toggle on and off, there's no way most people want to pay this 
penalty.  If you submit a new patch with a command line option to enable 
this alternate logging format and add the result to 
https://commitfest.postgresql.org/action/commitfest_view?id=6 , you can 
put my name down as a reviewer and I'll take a deeper look at it as part 
of that.

-- 
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
greg@2ndQuadrant.com   www.2ndQuadrant.us

Re: Patch to show individual statement latencies in pgbench output

From

Florian Pflug

Date:

16 June 2010, 15:59:07

On Jun 14, 2010, at 2:22 , Greg Smith wrote:
> Florian Pflug wrote:
>> To be able to asses the performance characteristics of the different wal-related options, I patched pgbench to show
theaverage latency of each individual statement. The idea is to be able to compare the latency of the COMMIT with the
onesof the other statements. 
>
> That's an interesting idea, particularly given that people don't really understand where the time is going in the
standardpgbench test.  Your first bit of review feedback is that this would have to be something you could toggle on
andoff, there's no way most people want to pay this penalty.  If you submit a new patch with a command line option to
enablethis alternate logging format and add the result to https://commitfest.postgresql.org/action/commitfest_view?id=6
,you can put my name down as a reviewer and I'll take a deeper look at it as part of that. 

Thank for offering to review this!

Here is an updated patch that adds a command-line option (-r) to enable/disable per-command latency reporting. It also
usesthe INSTR_TIME infrastructure that 9.0 provides, and should work correctly in multi-threaded mode. Data is
collectedper-thread and summarized only for reporting to avoid locking overhead. It now shows all lines for the SQL
scriptstogether with their latencies (zero for comments), not only those containing SQL commands. 

I'll add this patch to the next commitfest, and put you down as a reviewer, as you suggested.

best regards,
Florian Pflug

Attachment

pgbench_statementlatency.patch

Re: Patch to show individual statement latencies in pgbench output

From

Greg Smith

Date:

28 July 2010, 19:49:19

Finally got around to taking a longer look at your patch, sorry about 
the delay here. Patch itself seems to work on simple tests anyway (more 
on the one suspect bit below). You didn't show what the output looks 
like, so let's start with that because it is both kind of neat and not 
what I expected from your description. Here's the sort of extra stuff 
added to the end of the standard output when you toggle this feature on:

$ pgbench -S pgbench -T 10 -c 8 -j 4 -l -r
...
tps = 28824.943954 (excluding connections establishing)
command latencies
0.001487 \set naccounts 100000 * :scale
0.000677 \setrandom aid 1 :naccounts
0.273983 SELECT abalance FROM pgbench_accounts WHERE aid = :aid;
From the way you described the patch, I had thought that you were just 
putting this information into the log files or something like that. In 
fact, it's not in the log files; it just shows up in this summary at the 
end. I'm not sure if that's best or not. Some might want to see how the 
per-statement variation varies over time. Sort of torn on whether the 
summary alone is enough detail or not. Let me play with this some more 
and get back to you on that.

Here's what a standard test looks like:

tps = 292.468349 (excluding connections establishing)
command latencies
0.002120 \set nbranches 1 * :scale
0.000682 \set ntellers 10 * :scale
0.000615 \set naccounts 100000 * :scale
0.000723 \setrandom aid 1 :naccounts
0.000522 \setrandom bid 1 :nbranches
0.000553 \setrandom tid 1 :ntellers
0.000525 \setrandom delta -5000 5000
0.070307 BEGIN;
1.721631 UPDATE pgbench_accounts SET abalance = abalance + :delta WHERE 
aid = :aid;
0.147854 SELECT abalance FROM pgbench_accounts WHERE aid = :aid;
11.894366 UPDATE pgbench_tellers SET tbalance = tbalance + :delta WHERE 
tid = :tid;
4.761715 UPDATE pgbench_branches SET bbalance = bbalance + :delta WHERE 
bid = :bid;
0.643895 INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) 
VALUES (:tid, :bid, :aid, :delta, CURRENT_TIMESTAMP);
7.452017 END;

I'm happy to see that the END here has a significant portion of time 
assigned to it, which means that it's correctly tracking the commit that 
happens there. It's possible to ask the question "why add this feature 
when pg_stat_statements will get you the same data?". I think this gives 
a different enough view of things that it's worth adding anyway. Getting 
the END statements tracked, not having to use prepared statements to 
make it work, and now having to worry about overflowing the 
pg_stat_statements.max parameter that limits what that module tracks are 
all points in favor of this patch being useful even if you know about 
pg_stat_statements.

Now onto the nitpicking. With the standard Ubuntu compiler warnings on I 
get this:

pgbench.c:1588: warning: ‘i’ may be used uninitialized in this function

If you didn't see that, you may want to double-check how verbose you 
have your compiler setup to be; maybe you just missed it (which is of 
course what reviewers are here for). Regardless, the troublesome bit is 
this:

int i;

commands = process_commands(&buf[i]);

Which is obviously not a good thing. I'm not sure entirely what you're 
doing with the changes you made to process_file, but I'd suggest you 
double check the logic and coding of that section because there's at 
least one funny thing going on here with i being used without 
initialization first here. I didn't try yet to figure out how this error 
might lead to a bug, but there probably is one.

This looks like a good feature to me, just not sure if it's worth 
extending to produce even more output if people want to see it. Can 
always layer that on top later. I'll continue testing and try to get a 
firmer opinion. Please take a look at the problem I pointed out and 
produce a new patch when you get a chance that fixes that part, so at 
least we don't get stuck on that detail.

-- 
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
greg@2ndQuadrant.com   www.2ndQuadrant.us

Re: Patch to show individual statement latencies in pgbench output

From

Florian Pflug

Date:

28 July 2010, 22:24:13

On Jul29, 2010, at 00:48 , Greg Smith wrote:
> Finally got around to taking a longer look at your patch, sorry about the delay here. Patch itself seems to work on
simpletests anyway (more on the one suspect bit below). You didn't show what the output looks like, so let's start with
thatbecause it is both kind of neat and not what I expected from your description. Here's the sort of extra stuff added
tothe end of the standard output when you toggle this feature on: 
>
> <snipped output>
>
> From the way you described the patch, I had thought that you were just putting this information into the log files or
somethinglike that. In fact, it's not in the log files; it just shows up in this summary at the end. I'm not sure if
that'sbest or not. Some might want to see how the per-statement variation varies over time. Sort of torn on whether the
summaryalone is enough detail or not. Let me play with this some more and get back to you on that. 

I think the two features are pretty much orthogonal, even though they'd make use of the same per-statement
instrumentationmachinery. 

I created the patch to tune the wal_writer for the synchronous_commit=off case - the idea being that the COMMIT should
bevirtually instantaneous if the wal_writer writes old wal buffers out fast enough. 

I haven't yet used pgbench's log output feature, so I can't judge how useful the additional of per-statement data to
thatlog is, and what the format should be. However, if you think it's useful and can come up with a sensible format,
I'dbe happy to add that feature to the patch. 

> Now onto the nitpicking. With the standard Ubuntu compiler warnings on I get this:
>
> pgbench.c:1588: warning: ‘i’ may be used uninitialized in this function
>
> If you didn't see that, you may want to double-check how verbose you have your compiler setup to be; maybe you just
missedit (which is of course what reviewers are here for). Regardless, the troublesome bit is this: 
>
> int i;
>
> commands = process_commands(&buf[i]);

Fixed. That was a leftover of the trimming and comment skipping logic, which my patch moves to process_command.

Updated patch is attached.

Thanks for your extensive review &
best regards,
Florian Pflug

Attachment

pgbench_statementlatency.patch

Re: Patch to show individual statement latencies in pgbench output

From

Greg Smith

Date:

03 August 2010, 16:16:40

Florian Pflug wrote:
> I created the patch to tune the wal_writer for the synchronous_commit=off case - the idea being that the COMMIT
shouldbe virtually instantaneous if the wal_writer writes old wal buffers out fast enough.

>   

As I was saying, being able to see the COMMIT times for purposes such as 
that is something I consider valuable about using this instrumentation 
that's not really targeted by pg_stat_statements, the other way built-in 
way people might try to get it.

> I haven't yet used pgbench's log output feature, so I can't judge how useful the additional of per-statement data to
thatlog is, and what the format should be. However, if you think it's useful and can come up with a sensible format,
I'dbe happy to add that feature to the patch.

>   

Let's worry about that in the future.  Maybe it's a good add-on, but 
it's more than I have time to get into during this CF personally.

> That was a leftover of the trimming and comment skipping logic, which my patch moves to process_command.
>   

I think there's still a trimming error here--line 195 of the new patch 
is now removing the declaration of "i" just before it sets it to zero?

On the coding standard side, I noticed all your for loops are missing a 
space between the for and the (; that should get fixed.

Finally, now that the rest of the patch is looking in good shape and is 
something I think is worth considering to commit, it's time to work on 
the documentation SGML.

Also:  when generating multiple versions of a patch like this, standard 
practice is to add something like "-vX" to the naming, so those of us 
trying to review can keep them straight. So next one would be 
"pgbench_statementlatency_v3.patch" or something like that.  It's a good 
habit to get into from first version of a patch you submit.  Presuming 
that's going to be the only version is optimistic for all but the 
smallest of patches, and sometimes not even them...

-- 
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
greg@2ndQuadrant.com   www.2ndQuadrant.us

Re: Patch to show individual statement latencies in pgbench output

From

Florian Pflug

Date:

04 August 2010, 08:58:45

On Aug3, 2010, at 21:16 , Greg Smith wrote:
>> That was a leftover of the trimming and comment skipping logic, which my patch moves to process_command.
>
> I think there's still a trimming error here--line 195 of the new patch is now removing the declaration of "i" just
beforeit sets it to zero? 
Hm, I think it's just the diff thats miss-leading there. It correctly marks the "int i" line as "removed" with a "-",
butfor some reason marks the "i = 0" line (and its successors) with a "!", although they're removed too, and not
modified.

> On the coding standard side, I noticed all your for loops are missing a space between the for and the (; that should
getfixed. 
Fixed

>
> Finally, now that the rest of the patch is looking in good shape and is something I think is worth considering to
commit,it's time to work on the documentation SGML. 
I've added the "-r" option to the list of pgbench options in pgbench.sgml and also added a short section that shows how
theoutput looks like, similar to how things are done for the "-l" option. 

> Also:  when generating multiple versions of a patch like this, standard practice is to add something like "-vX" to
thenaming, so those of us trying to review can keep them straight. 
Will do from now on.

Updated patch is attached. I've also pushed this as branch "pgbench_statementlatency" to
git://github.com/fgp/postgres.git

best regards,
Florian Pflug

Attachment

pgbench_statementlatency_v3.patch

Re: Patch to show individual statement latencies in pgbench output

From

Florian Pflug

Date:

05 August 2010, 19:05:38

On Aug4, 2010, at 13:58 , Florian Pflug wrote:
> On Aug3, 2010, at 21:16 , Greg Smith wrote:
>>> That was a leftover of the trimming and comment skipping logic, which my patch moves to process_command.
>>
>> I think there's still a trimming error here--line 195 of the new patch is now removing the declaration of "i" just
beforeit sets it to zero? 
> Hm, I think it's just the diff thats miss-leading there. It correctly marks the "int i" line as "removed" with a "-",
butfor some reason marks the "i = 0" line (and its successors) with a "!", although they're removed too, and not
modified.
>
>> On the coding standard side, I noticed all your for loops are missing a space between the for and the (; that should
getfixed. 
> Fixed

Crap. I've messed up to the effect that the for-loop formatting fix wasn't actually in the patch.

Attached is an updated version (v4).

Sorry for the noise.

best regards,
Florian Pflug

Attachment

pgbench_statementlatency_v4.patch

Re: Patch to show individual statement latencies in pgbench output

From

Greg Smith

Date:

11 August 2010, 01:34:21

Florian Pflug wrote:
> Attached is an updated version (v4).
>

I've attached a v5.  No real code changes from Florian's version, just
some wording/style fixes and rework on the documentation.  The user side
is now consistent about calling these statement latencies for example,
even though the internals still call them command latencies most places.

Since using this new feature will introduce a whole stack of new calls
to get the system time, I also added a warning about that impacting results:

    Note that collecting the additional timing information needed for
    detailed latency computation does add some overhead.  This will slow
    average execution speed and lower the computed TPS.  The exact amount
    of slowdown varies significantly based on platform and hardware.
    Comparing average TPS values with and without latency reporting enabled
    is a good way to measure if the timing overhead is significant.

I wasn't able to see any significant slowdown on my modern Linux systems
doing such a test:

$ ./pgbench -T 10 -S -c 8 -j 4 pgbench
tps = 6716.039813 (including connections establishing)
tps = 6720.238878 (excluding connections establishing)
$ ./pgbench -T 10 -S -c 8 -j 4 -r pgbench
tps = 6708.544618 (including connections establishing)
tps = 6712.728526 (excluding connections establishing)

But I know gettimeofday is fast here.  Worth including a warning for
though I think.

I'm out of things to check here, marking this one ready for a committer
review.  The patch hasn't had a committer assigned yet, so whoever wants
to claim it should mark the CF app.

--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
greg@2ndQuadrant.com   www.2ndQuadrant.us

diff --git a/contrib/pgbench/pgbench.c b/contrib/pgbench/pgbench.c
index 53cc4b6..90d11f1 100644
*** a/contrib/pgbench/pgbench.c
--- b/contrib/pgbench/pgbench.c
*************** int            fillfactor = 100;
*** 133,138 ****
--- 133,139 ----

  bool        use_log;            /* log transaction latencies to a file */
  bool        is_connect;            /* establish connection for each transaction */
+ bool        is_latencies;        /* report per-command latencies */
  int            main_pid;            /* main process id used in log filename */

  char       *pghost = "";
*************** typedef struct
*** 171,177 ****
      int64        until;            /* napping until (usec) */
      Variable   *variables;        /* array of variable definitions */
      int            nvariables;
!     instr_time    txn_begin;        /* used for measuring latencies */
      int            use_file;        /* index in sql_files for this client */
      bool        prepared[MAX_FILES];
  } CState;
--- 172,179 ----
      int64        until;            /* napping until (usec) */
      Variable   *variables;        /* array of variable definitions */
      int            nvariables;
!     instr_time    txn_begin;        /* used for measuring transaction latencies */
!     instr_time    stmt_begin;        /* used for measuring statement latencies */
      int            use_file;        /* index in sql_files for this client */
      bool        prepared[MAX_FILES];
  } CState;
*************** typedef struct
*** 199,204 ****
--- 201,207 ----
  /*
   * queries read from files
   */
+ #define COMMENT_COMMAND    0
  #define SQL_COMMAND        1
  #define META_COMMAND    2
  #define MAX_ARGS        10
*************** static const char *QUERYMODE[] = {"simpl
*** 216,224 ****
--- 219,230 ----

  typedef struct
  {
+     char       *line;            /* line containing the command */
      int            type;            /* command type (SQL_COMMAND or META_COMMAND) */
      int            argc;            /* number of commands */
      char       *argv[MAX_ARGS]; /* command list */
+     instr_time *exec_elapsed;    /* time spent executing this command (per thread) */
+     int           *exec_count;        /* number of executions (per thread) */
  } Command;

  static Command **sql_files[MAX_FILES];    /* SQL script files */
*************** usage(const char *progname)
*** 287,292 ****
--- 293,299 ----
             "               define variable for use by custom script\n"
             "  -f FILENAME  read transaction script from FILENAME\n"
             "  -j NUM       number of threads (default: 1)\n"
+            "  -r           report average latency per command\n"
             "  -l           write transaction times to log file\n"
             "  -M {simple|extended|prepared}\n"
             "               protocol for submitting queries to server (default: simple)\n"
*************** clientDone(CState *st, bool ok)
*** 740,746 ****

  /* return false iff client should be disconnected */
  static bool
! doCustom(CState *st, instr_time *conn_time, FILE *logfile)
  {
      PGresult   *res;
      Command   **commands;
--- 747,753 ----

  /* return false iff client should be disconnected */
  static bool
! doCustom(TState* thread, CState *st, instr_time *conn_time, FILE *logfile)
  {
      PGresult   *res;
      Command   **commands;
*************** top:
*** 800,805 ****
--- 807,824 ----
  #endif
          }

+         /* Accumulate per-command execution times if per-command latencies
+            are requested */
+         if (is_latencies && (commands[st->state]->type != COMMENT_COMMAND))
+         {
+             instr_time now;
+
+             INSTR_TIME_SET_CURRENT(now);
+             INSTR_TIME_ACCUM_DIFF(commands[st->state]->exec_elapsed[thread->tid],
+                                   now, st->stmt_begin);
+             ++(commands[st->state]->exec_count[thread->tid]);
+         }
+
          if (commands[st->state]->type == SQL_COMMAND)
          {
              res = PQgetResult(st->con);
*************** top:
*** 840,845 ****
--- 859,871 ----
              commands = sql_files[st->use_file];
          }
      }
+
+     /* Skip comments */
+     if (commands[st->state]->type == COMMENT_COMMAND)
+     {
+         st->listen = 1;
+         goto top;
+     }

      if (st->con == NULL)
      {
*************** top:
*** 856,864 ****
--- 882,895 ----
          INSTR_TIME_ACCUM_DIFF(*conn_time, end, start);
      }

+     /* Record transaction start time if logging is enabled */
      if (logfile && st->state == 0)
          INSTR_TIME_SET_CURRENT(st->txn_begin);

+     /* Record statement start time if per-command latencies are requested */
+     if (is_latencies)
+         INSTR_TIME_SET_CURRENT(st->stmt_begin);
+
      if (commands[st->state]->type == SQL_COMMAND)
      {
          const Command *command = commands[st->state];
*************** process_commands(char *buf)
*** 1361,1385 ****
      char       *p,
                 *tok;

      if ((p = strchr(buf, '\n')) != NULL)
          *p = '\0';

!     p = buf;
!     while (isspace((unsigned char) *p))
!         p++;
!
!     if (*p == '\0' || strncmp(p, "--", 2) == 0)
      {
          return NULL;
      }

!     my_commands = (Command *) malloc(sizeof(Command));
!     if (my_commands == NULL)
      {
!         return NULL;
      }

!     my_commands->argc = 0;

      if (*p == '\\')
      {
--- 1392,1428 ----
      char       *p,
                 *tok;

+     /* Make the string buf end at the next newline */
      if ((p = strchr(buf, '\n')) != NULL)
          *p = '\0';

!     /* Allocate and initialize Command structure, initially
!      * assuming the line is a comment
!      */
!     my_commands = (Command *) malloc(sizeof(Command));
!     if (my_commands == NULL)
      {
          return NULL;
      }
+     my_commands->type = COMMENT_COMMAND;
+     my_commands->line = strdup(buf);
+     my_commands->argc = 0;
+     my_commands->exec_elapsed = NULL;
+     my_commands->exec_count = NULL;

!     p = buf;
!
!     /* Skip leading whitespace */
!     while (isspace((unsigned char) *p))
      {
!         p++;
      }

!     /* If the line is empty or actually a comment, we're done */
!     if (*p == '\0' || strncmp(p, "--", 2) == 0)
!     {
!         return my_commands;
!     }

      if (*p == '\\')
      {
*************** process_file(char *filename)
*** 1548,1570 ****
      while (fgets(buf, sizeof(buf), fd) != NULL)
      {
          Command    *commands;
-         int            i;

!         i = 0;
!         while (isspace((unsigned char) buf[i]))
!             i++;
!
!         if (buf[i] != '\0' && strncmp(&buf[i], "--", 2) != 0)
          {
!             commands = process_commands(&buf[i]);
!             if (commands == NULL)
!             {
!                 fclose(fd);
!                 return false;
!             }
          }
-         else
-             continue;

          my_commands[lineno] = commands;
          lineno++;
--- 1591,1603 ----
      while (fgets(buf, sizeof(buf), fd) != NULL)
      {
          Command    *commands;

!         commands = process_commands(buf);
!         if (commands == NULL)
          {
!             fclose(fd);
!             return false;
          }

          my_commands[lineno] = commands;
          lineno++;
*************** printResults(int ttype, int normal_xacts
*** 1660,1665 ****
--- 1693,1699 ----
                  tps_include,
                  tps_exclude;
      char       *s;
+     int            i;

      time_include = INSTR_TIME_GET_DOUBLE(total_time);
      tps_include = normal_xacts / time_include;
*************** printResults(int ttype, int normal_xacts
*** 1694,1699 ****
--- 1728,1771 ----
      }
      printf("tps = %f (including connections establishing)\n", tps_include);
      printf("tps = %f (excluding connections establishing)\n", tps_exclude);
+
+     /* Report per-command latencies */
+     if (is_latencies)
+     {
+         for (i = 0; i < num_files; ++i) {
+             Command** commands;
+
+             if (num_files > 1)
+                 printf("statement latencies in milliseconds, file %d\n:", i+1);
+             else
+                 printf("statement latencies in milliseconds:\n");
+
+             for (commands = sql_files[i]; *commands != NULL; ++commands)
+             {
+                 Command* command = *commands;
+                 instr_time total_exec_elapsed;
+                 int total_exec_count;
+                 int t;
+
+                 /* Accumulate per-thread data if available */
+                 INSTR_TIME_SET_ZERO(total_exec_elapsed);
+                 total_exec_count = 0;
+                 if ((command->exec_elapsed != NULL) && (command->exec_count != NULL))
+                 {
+                     for (t = 0; t < nthreads; ++t)
+                     {
+                         INSTR_TIME_ADD(total_exec_elapsed, command->exec_elapsed[t]);
+                         total_exec_count += command->exec_count[t];
+                     }
+                 }
+
+                 if (total_exec_count > 0)
+                     printf("\t%f\t%s\n", INSTR_TIME_GET_MILLISEC(total_exec_elapsed) / (double)total_exec_count,
command->line);
+                 else
+                     printf("\t%f\t%s\n", 0.0, command->line);
+             }
+         }
+     }
  }


*************** main(int argc, char **argv)
*** 1770,1776 ****

      memset(state, 0, sizeof(*state));

!     while ((c = getopt(argc, argv, "ih:nvp:dSNc:Cs:t:T:U:lf:D:F:M:j:")) != -1)
      {
          switch (c)
          {
--- 1842,1848 ----

      memset(state, 0, sizeof(*state));

!     while ((c = getopt(argc, argv, "ih:nvp:dSNc:Cs:t:T:U:rlf:D:F:M:j:")) != -1)
      {
          switch (c)
          {
*************** main(int argc, char **argv)
*** 1834,1839 ****
--- 1906,1914 ----
              case 'C':
                  is_connect = true;
                  break;
+             case 'r':
+                 is_latencies = true;
+                 break;
              case 's':
                  scale_given = true;
                  scale = atoi(optarg);
*************** main(int argc, char **argv)
*** 2090,2095 ****
--- 2165,2193 ----
          default:
              break;
      }
+
+     /* Reserve memory for the threads to store per-command latencies if requested */
+     if (is_latencies)
+     {
+         for (i = 0; i < num_files; ++i)
+         {
+             Command** commands;
+             for (commands = sql_files[i]; *commands != NULL; ++commands)
+             {
+                 Command* command = *commands;
+                 int t;
+
+                 command->exec_elapsed = (instr_time*) malloc(sizeof(instr_time) * nthreads);
+                 command->exec_count = (int*) malloc(sizeof(int) * nthreads);
+
+                 for (t = 0; t < nthreads; ++t)
+                 {
+                     INSTR_TIME_SET_ZERO(command->exec_elapsed[t]);
+                     command->exec_count[t] = 0;
+                 }
+             }
+         }
+     }

      /* get start up time */
      INSTR_TIME_SET_CURRENT(start_time);
*************** threadRun(void *arg)
*** 2211,2217 ****
          int            prev_ecnt = st->ecnt;

          st->use_file = getrand(0, num_files - 1);
!         if (!doCustom(st, &result->conn_time, logfile))
              remains--;            /* I've aborted */

          if (st->ecnt > prev_ecnt && commands[st->state]->type == META_COMMAND)
--- 2309,2315 ----
          int            prev_ecnt = st->ecnt;

          st->use_file = getrand(0, num_files - 1);
!         if (!doCustom(thread, st, &result->conn_time, logfile))
              remains--;            /* I've aborted */

          if (st->ecnt > prev_ecnt && commands[st->state]->type == META_COMMAND)
*************** threadRun(void *arg)
*** 2260,2266 ****
              {
                  continue;
              }
!             else if (commands[st->state]->type == META_COMMAND)
              {
                  min_usec = 0;    /* the connection is ready to run */
                  break;
--- 2358,2364 ----
              {
                  continue;
              }
!             else if ((commands[st->state]->type == META_COMMAND) || (commands[st->state]->type == COMMENT_COMMAND))
              {
                  min_usec = 0;    /* the connection is ready to run */
                  break;
*************** threadRun(void *arg)
*** 2313,2319 ****
              if (st->con && (FD_ISSET(PQsocket(st->con), &input_mask)
                              || commands[st->state]->type == META_COMMAND))
              {
!                 if (!doCustom(st, &result->conn_time, logfile))
                      remains--;    /* I've aborted */
              }

--- 2411,2417 ----
              if (st->con && (FD_ISSET(PQsocket(st->con), &input_mask)
                              || commands[st->state]->type == META_COMMAND))
              {
!                 if (!doCustom(thread, st, &result->conn_time, logfile))
                      remains--;    /* I've aborted */
              }

diff --git a/doc/src/sgml/pgbench.sgml b/doc/src/sgml/pgbench.sgml
index cdad212..6877bec 100644
*** a/doc/src/sgml/pgbench.sgml
--- b/doc/src/sgml/pgbench.sgml
*************** pgbench <optional> <replaceable>options<
*** 247,252 ****
--- 247,263 ----
       </varlistentry>

       <varlistentry>
+       <term><option>-r</option></term>
+       <listitem>
+        <para>
+         Report the average per-statement latency (execution time from the
+         perspective of the client) of each command after the benchmark
+         finishes.  See below for details.
+        </para>
+       </listitem>
+      </varlistentry>
+
+      <varlistentry>
        <term><option>-M</option> <replaceable>querymode</></term>
        <listitem>
         <para>
*************** END;
*** 632,638 ****
  <replaceable>client_id</> <replaceable>transaction_no</> <replaceable>time</> <replaceable>file_no</>
<replaceable>time_epoch</><replaceable>time_us</> 
  </synopsis>

!    where <replaceable>time</> is the elapsed transaction time in microseconds,
     <replaceable>file_no</> identifies which script file was used
     (useful when multiple scripts were specified with <literal>-f</>),
     and <replaceable>time_epoch</>/<replaceable>time_us</> are a
--- 643,649 ----
  <replaceable>client_id</> <replaceable>transaction_no</> <replaceable>time</> <replaceable>file_no</>
<replaceable>time_epoch</><replaceable>time_us</> 
  </synopsis>

!    where <replaceable>time</> is the total elapsed transaction time in microseconds,
     <replaceable>file_no</> identifies which script file was used
     (useful when multiple scripts were specified with <literal>-f</>),
     and <replaceable>time_epoch</>/<replaceable>time_us</> are a
*************** END;
*** 654,659 ****
--- 665,742 ----
   </sect2>

   <sect2>
+   <title>Per-statement latencies</title>
+   <para>
+    With the <literal>-r</> option, <application>pgbench</> collects
+    the elapsed transaction time of each statement executed by every
+    client.  It then reports an average of those values, referred to
+    as the latency for each statement, after the benchmark has finished.
+   </para>
+
+   <para>
+    For the default script, the output will look similar to this:
+ <screen>
+ starting vacuum...end.
+ transaction type: TPC-B (sort of)
+ scaling factor: 1
+ query mode: simple
+ number of clients: 10
+ number of threads: 1
+ number of transactions per client: 1000
+ number of transactions actually processed: 10000/10000
+ tps = 618.764555 (including connections establishing)
+ tps = 622.977698 (excluding connections establishing)
+ statement latencies in milliseconds:
+     0.004386    \set nbranches 1 * :scale
+     0.001343    \set ntellers 10 * :scale
+     0.001212    \set naccounts 100000 * :scale
+     0.001310    \setrandom aid 1 :naccounts
+     0.001073    \setrandom bid 1 :nbranches
+     0.001005    \setrandom tid 1 :ntellers
+     0.001078    \setrandom delta -5000 5000
+     0.326152    BEGIN;
+     0.603376    UPDATE pgbench_accounts SET abalance = abalance + :delta WHERE aid = :aid;
+     0.454643    SELECT abalance FROM pgbench_accounts WHERE aid = :aid;
+     5.528491    UPDATE pgbench_tellers SET tbalance = tbalance + :delta WHERE tid = :tid;
+     7.335435    UPDATE pgbench_branches SET bbalance = bbalance + :delta WHERE bid = :bid;
+     0.371851    INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES (:tid, :bid, :aid, :delta,
CURRENT_TIMESTAMP);
+     1.212976    END;
+ </screen>
+   </para>
+
+   <para>
+    If multiple script files are specified, the averages are reported
+    separately for each individual script file:
+ <screen>
+ starting vacuum...end.
+ transaction type: Custom query
+ scaling factor: 1
+ query mode: simple
+ number of clients: 10
+ number of threads: 1
+ number of transactions per client: 1000
+ number of transactions actually processed: 10000/10000
+ tps = 162.902251 (including connections establishing)
+ tps = 163.166833 (excluding connections establishing)
+ statement latencies in milliseconds, file 1:
+     2.534675    SELECT * FROM pgbench_branches LIMIT 1;
+     1.168928    SELECT COUNT(*) FROM pgbench_branches;
+ statement latencies in milliseconds, file 2:
+     2.394322    SELECT * FROM pgbench_accounts LIMIT 1;
+     113.624613    SELECT COUNT(*) FROM pgbench_accounts;
+ </screen>
+   </para>
+   <para>
+    Note that collecting the additional timing information needed for
+    detailed latency computation does add some overhead.  This will slow
+    average execution speed and lower the computed TPS.  The exact amount
+    of slowdown varies significantly based on platform and hardware.
+    Comparing average TPS values with and without latency reporting enabled
+    is a good way to measure if the timing overhead is significant.
+   </para>
+  </sect2>
+
+  <sect2>
    <title>Good Practices</title>

    <para>

Re: Patch to show individual statement latencies in pgbench output

From

Tom Lane

Date:

12 August 2010, 14:49:09

Greg Smith <greg@2ndquadrant.com> writes:
> Florian Pflug wrote:
>> Attached is an updated version (v4).

> I've attached a v5.  No real code changes from Florian's version, just 
> some wording/style fixes and rework on the documentation.

I'm looking through this patch now.  It looks mostly good, but I am
wondering just exactly what is the rationale for adding comment
statements to the data structures, rather than ignoring them as before.
It seems like a complete waste of logic, memory space, and cycles;
moreover it renders the documentation's statement that comments
"are ignored" incorrect.  I did not find anything in the patch history
explaining the point of that change.
        regards, tom lane

Re: Patch to show individual statement latencies in pgbench output

From

Tom Lane

Date:

12 August 2010, 16:20:21

Greg Smith <greg@2ndquadrant.com> writes:
> Florian Pflug wrote:
>> Attached is an updated version (v4).

> I've attached a v5.

BTW, I discovered a rather nasty shortcoming of this patch on platforms
without ENABLE_THREAD_SAFETY.  It doesn't work, and what's worse, it
*looks* like it's working, because it gives you plausible-looking
numbers.  But actually the numbers are only averages across the first
worker thread.  The other threads are in sub-processes where they can't
affect the contents of the parent's arrays.

Since platforms without ENABLE_THREAD_SAFETY are only a small minority
these days, this is probably not sufficient reason to reject the patch.
What I plan to do instead is reject the combination of -r with -j larger
than 1 on such platforms:
   if (is_latencies)   {       /*        * is_latencies only works with multiple threads in thread-based        *
implementations,not fork-based ones, because it supposes that the        * parent can see changes made to the command
datastructures by child        * threads.  It seems useful enough to accept despite this limitation,        * but
perhapswe should FIXME someday.        */

#ifndef ENABLE_THREAD_SAFETY       if (nthreads > 1)       {           fprintf(stderr, "-r does not work with -j larger
than1 on this platform.\n");           exit(1);       }

#endif

It could be fixed with enough effort, by having the child threads pass
back their numbers through the child-to-parent pipes.  I'm not
interested in doing that myself though.

If anyone thinks this problem makes it uncommittable, speak up now.
        regards, tom lane

Re: Patch to show individual statement latencies in pgbench output

From

Tom Lane

Date:

12 August 2010, 18:19:24

Greg Smith <greg@2ndquadrant.com> writes:
> I've attached a v5.  No real code changes from Florian's version, just 
> some wording/style fixes and rework on the documentation.

I've committed this with some editorialization.  The main non-cosmetic
change was that I pulled the latency statistics counters out of the
per-Command data structures and put them into per-thread arrays instead.
I did this for two reasons:

1. Having different threads munging adjacent array entries without any
locking makes me itch.  On some platforms that could possibly fail
entirely, and in any case it's likely to be a performance hit on
machines where processors lock whole cache lines (which is most of them
these days, I think).

2. It should make it a lot easier to pass the per-thread results back up
to the parent in a fork-based implementation, should anyone desire to
fix the limitation I mentioned before.
        regards, tom lane

Re: Patch to show individual statement latencies in pgbench output

From

Florian Pflug

Date:

12 August 2010, 20:14:02

On Aug12, 2010, at 19:48 , Tom Lane wrote:
> Greg Smith <greg@2ndquadrant.com> writes:
>> Florian Pflug wrote:
>>> Attached is an updated version (v4).
>
>> I've attached a v5.  No real code changes from Florian's version, just
>> some wording/style fixes and rework on the documentation.
>
> I'm looking through this patch now.  It looks mostly good, but I am
> wondering just exactly what is the rationale for adding comment
> statements to the data structures, rather than ignoring them as before.
> It seems like a complete waste of logic, memory space, and cycles;
> moreover it renders the documentation's statement that comments
> "are ignored" incorrect.  I did not find anything in the patch history
> explaining the point of that change.

To be able to include the comments (with an average latency of zero) in the latency report. This makes the latency
reportas self-explanatory as the original script was (Think latency report copy-and-pasted into an e-mail or wiki). It
alsohas the benefit of making the line numbers of the latency report agree to those of the original script, which
seemedlike a natural thing to do, and might make some sorts of post-processing easier. It does make doCustom() a bit
morecomplex, though. 

Anyway, I guess the chance of adding this back is slim now that the patch is committed. Oh well.

Thanks for committing this, and
best regards,
Florian Pflug

Re: Patch to show individual statement latencies in pgbench output

From

Greg Smith

Date:

12 August 2010, 21:22:32

Tom Lane wrote:
> It could be fixed with enough effort, by having the child threads pass
> back their numbers through the child-to-parent pipes.  I'm not
> interested in doing that myself though.
>   

That does seem an improvement with a very limited user base it's 
applicable to.  Probably the next useful improvement to this feature is 
to get per-statement data into the latency log files if requested.  If 
this issue gets in the way there somehow, maybe it's worth squashing 
then.  I don't think it will though.

-- 
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
greg@2ndQuadrant.com   www.2ndQuadrant.us

Re: Patch to show individual statement latencies in pgbench output

From

Tom Lane

Date:

12 August 2010, 22:51:34

Florian Pflug <fgp@phlo.org> writes:
> On Aug12, 2010, at 19:48 , Tom Lane wrote:
>> I'm looking through this patch now.  It looks mostly good, but I am
>> wondering just exactly what is the rationale for adding comment
>> statements to the data structures, rather than ignoring them as before.

> To be able to include the comments (with an average latency of zero)
> in the latency report. This makes the latency report as
> self-explanatory as the original script was (Think latency report
> copy-and-pasted into an e-mail or wiki). It also has the benefit of
> making the line numbers of the latency report agree to those of the
> original script, which seemed like a natural thing to do, and might
> make some sorts of post-processing easier. It does make doCustom() a
> bit more complex, though.

I had wondered if the original form of the patch printed line numbers
rather than the actual line contents.  If that were true then it'd make
sense to include comment lines.  In the current form of the patch,
though, I think the output is just as readable without comment lines ---
and I'm not thrilled with having this patch materially affect the
behavior for cases where -r wasn't even specified.
        regards, tom lane