Thread: A couple of proposed pgbench changes

A couple of proposed pgbench changes

From
Tom Lane
Date:
Hi Tatsuo,

Attached is a proposed patch that deals with a couple of pgbench issues.

The change at line 490 updates doCustom's local variable "commands"
after selecting a new file (command sequence).  I think that the
existing coding will cause the thing to use the first command of the
old sequence in the remainder of the routine, which would be a bug.
I have not tried to set up a test case to prove it, though.

The other two changes cause doCustom to loop after processing a
meta-command.  This might be a bit controversial, but as the code
is currently written, each meta-command "costs" one cycle of the
outer select() loop.  Thus, for example, with the default TPC-B script,
once a backend returns "COMMIT" it will not receive a new command
until four cycles of issuing commands to other backends have elapsed.
(You can see this very easily by strace'ing pgbench under load.)
ISTM it is bad to have backends sitting idle waiting for pgbench to
give them a new command.  On the other hand you could argue that
finishing out several consecutive metacommands will delay issuance of
new commands to other backends.  In the test case I was running,
making this change made for a small but noticeable improvement in
total CPU usage, but I'm not entirely convinced it'd always be a win.

I get the impression that pgbench itself is a bit of a bottleneck when
running on multi-CPU machines.  I can't get the total CPU usage to reach
90% with ten client threads on a dual HT-enabled Xeon, and the only
explanation I can see is that there are too many backends waiting for
pgbench to give them new commands instead of doing useful work.  strace
shows that each select() iteration usually finds *all ten* sockets
read-ready, which is definitely bad.  (I tried nice'ing pgbench to
negative priority, but it did not improve matters.  It could easily be
that this is a Heisenproblem, though, with strace itself slowing pgbench
enough to cause the behavior.)  I wonder if it would help for pgbench to
fork off multiple sub-processes and have each sub-process tend just one
backend.

Anyway, since I'm not sure of either of these changes, I'm passing them
to you for review.

            regards, tom lane


Index: pgbench.c
===================================================================
RCS file: /cvsroot/pgsql/contrib/pgbench/pgbench.c,v
retrieving revision 1.48
diff -c -r1.48 pgbench.c
*** pgbench.c    23 Nov 2005 12:19:12 -0000    1.48
--- pgbench.c    29 Nov 2005 23:51:46 -0000
***************
*** 411,416 ****
--- 411,417 ----
      CState       *st = &state[n];
      Command   **commands;

+ top:
      commands = sql_files[st->use_file];

      if (st->listen)
***************
*** 489,494 ****
--- 490,496 ----
          {
              st->state = 0;
              st->use_file = getrand(0, num_files - 1);
+             commands = sql_files[st->use_file];
          }
      }

***************
*** 572,577 ****
--- 574,581 ----
              free(val);
              st->listen = 1;
          }
+
+         goto top;
      }
  }


Re: A couple of proposed pgbench changes

From
Tatsuo Ishii
Date:
> Hi Tatsuo,
>
> Attached is a proposed patch that deals with a couple of pgbench issues.
>
> The change at line 490 updates doCustom's local variable "commands"
> after selecting a new file (command sequence).  I think that the
> existing coding will cause the thing to use the first command of the
> old sequence in the remainder of the routine, which would be a bug.
> I have not tried to set up a test case to prove it, though.

Thanks for pointing it out. It's definitely a bug. To reproduce the
problem, you have two SQL files, say,

a.sql:
SELECT 1;
SELECT 2;

b.sql:
SELECT 300;
SELECT 400;

then run,

pgbench -f a.sql -f b.sql

and see the SQL trace by enabling log_statement=all,

I see:
LOG:  statement: SELECT 300;
LOG:  statement: SELECT 400;
LOG:  statement: SELECT 300;
LOG:  statement: SELECT 2;
LOG:  statement: SELECT 1;
LOG:  statement: SELECT 400;
LOG:  statement: SELECT 300;
LOG:  statement: SELECT 400;
LOG:  statement: SELECT 300;
LOG:  statement: SELECT 2;
LOG:  statement: SELECT 1;
:
:

#1 and #2 are ok, but #3(SELECT 300) and #4(SELECT 2) combo is
supposed to be impossible if pgbench works correctly.

> The other two changes cause doCustom to loop after processing a
> meta-command.  This might be a bit controversial, but as the code
> is currently written, each meta-command "costs" one cycle of the
> outer select() loop.  Thus, for example, with the default TPC-B script,
> once a backend returns "COMMIT" it will not receive a new command
> until four cycles of issuing commands to other backends have elapsed.
> (You can see this very easily by strace'ing pgbench under load.)
> ISTM it is bad to have backends sitting idle waiting for pgbench to
> give them a new command.  On the other hand you could argue that
> finishing out several consecutive metacommands will delay issuance of
> new commands to other backends.  In the test case I was running,
> making this change made for a small but noticeable improvement in
> total CPU usage, but I'm not entirely convinced it'd always be a win.

Agreed.

> I get the impression that pgbench itself is a bit of a bottleneck when
> running on multi-CPU machines.  I can't get the total CPU usage to reach
> 90% with ten client threads on a dual HT-enabled Xeon, and the only
> explanation I can see is that there are too many backends waiting for
> pgbench to give them new commands instead of doing useful work.  strace
> shows that each select() iteration usually finds *all ten* sockets
> read-ready, which is definitely bad.  (I tried nice'ing pgbench to
> negative priority, but it did not improve matters.  It could easily be
> that this is a Heisenproblem, though, with strace itself slowing pgbench
> enough to cause the behavior.)  I wonder if it would help for pgbench to
> fork off multiple sub-processes and have each sub-process tend just one
> backend.

I'm not sure multiple sub-processes version of pgbench shows superior
performance than current implementation because of process context
switching overhead. Maybe threading is better? Mr. Yasuo Ohgaki
implemented pthead version of pgbench.

http://wiki.ohgaki.net/index.php?plugin=attach&pcmd=open&file=ppgbench-20050906-3.tar.bz2&refer=PostgreSQL%2Fppgbench

You might want to try.

> Anyway, since I'm not sure of either of these changes, I'm passing them
> to you for review.

Your patches seem fine.
--
Tatsuo Ishii
SRA OSS, Inc. Japan

Re: A couple of proposed pgbench changes

From
Tatsuo Ishii
Date:
I have commited your fixes into PostgreSQL 8.1 stable branches.
--
Tatsuo Ishii
SRA OSS, Inc. Japan

> Hi Tatsuo,
>
> Attached is a proposed patch that deals with a couple of pgbench issues.
>
> The change at line 490 updates doCustom's local variable "commands"
> after selecting a new file (command sequence).  I think that the
> existing coding will cause the thing to use the first command of the
> old sequence in the remainder of the routine, which would be a bug.
> I have not tried to set up a test case to prove it, though.
>
> The other two changes cause doCustom to loop after processing a
> meta-command.  This might be a bit controversial, but as the code
> is currently written, each meta-command "costs" one cycle of the
> outer select() loop.  Thus, for example, with the default TPC-B script,
> once a backend returns "COMMIT" it will not receive a new command
> until four cycles of issuing commands to other backends have elapsed.
> (You can see this very easily by strace'ing pgbench under load.)
> ISTM it is bad to have backends sitting idle waiting for pgbench to
> give them a new command.  On the other hand you could argue that
> finishing out several consecutive metacommands will delay issuance of
> new commands to other backends.  In the test case I was running,
> making this change made for a small but noticeable improvement in
> total CPU usage, but I'm not entirely convinced it'd always be a win.
>
> I get the impression that pgbench itself is a bit of a bottleneck when
> running on multi-CPU machines.  I can't get the total CPU usage to reach
> 90% with ten client threads on a dual HT-enabled Xeon, and the only
> explanation I can see is that there are too many backends waiting for
> pgbench to give them new commands instead of doing useful work.  strace
> shows that each select() iteration usually finds *all ten* sockets
> read-ready, which is definitely bad.  (I tried nice'ing pgbench to
> negative priority, but it did not improve matters.  It could easily be
> that this is a Heisenproblem, though, with strace itself slowing pgbench
> enough to cause the behavior.)  I wonder if it would help for pgbench to
> fork off multiple sub-processes and have each sub-process tend just one
> backend.
>
> Anyway, since I'm not sure of either of these changes, I'm passing them
> to you for review.
>
>             regards, tom lane
>
>
> Index: pgbench.c
> ===================================================================
> RCS file: /cvsroot/pgsql/contrib/pgbench/pgbench.c,v
> retrieving revision 1.48
> diff -c -r1.48 pgbench.c
> *** pgbench.c    23 Nov 2005 12:19:12 -0000    1.48
> --- pgbench.c    29 Nov 2005 23:51:46 -0000
> ***************
> *** 411,416 ****
> --- 411,417 ----
>       CState       *st = &state[n];
>       Command   **commands;
>
> + top:
>       commands = sql_files[st->use_file];
>
>       if (st->listen)
> ***************
> *** 489,494 ****
> --- 490,496 ----
>           {
>               st->state = 0;
>               st->use_file = getrand(0, num_files - 1);
> +             commands = sql_files[st->use_file];
>           }
>       }
>
> ***************
> *** 572,577 ****
> --- 574,581 ----
>               free(val);
>               st->listen = 1;
>           }
> +
> +         goto top;
>       }
>   }
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 9: In versions below 8.0, the planner will ignore your desire to
>        choose an index scan if your joining column's datatypes do not
>        match
>