Thread: Suggested change to pgbench

Suggested change to pgbench

From
Tom Lane
Date:
I think pgbench is not dealing with asynchronous input quite right.
As written, if the backend sends a response that doesn't fit into
a single TCP packet, pgbench will go into a tight loop of
PQisBusy/PQconsumeInput, which will not exit until the rest of the
response arrives.  This would degrade the reported performance, first
because of the wasted CPU cycles, and second because other connections
wouldn't get serviced during that interval.

I am not sure how likely this is to happen, but I can report that
correcting it via the attached patch made a visible difference:

[tgl@rh1 pgbench]$ ./pgbench -c 100 -t 100 bench1
starting vacuum...end.
transaction type: TPC-B (sort of)
scaling factor: 100
number of clients: 100
number of transactions per client: 100
number of transactions actually processed: 10000/10000
tps = 34.748295 (including connections establishing)
tps = 34.803249 (excluding connections establishing)

[tgl@rh1 pgbench]$ ./mybench -c 100 -t 100 bench1    -- patched
starting vacuum...end.
transaction type: TPC-B (sort of)
scaling factor: 100
number of clients: 100
number of transactions per client: 100
number of transactions actually processed: 10000/10000
tps = 36.849122 (including connections establishing)
tps = 36.909993 (excluding connections establishing)

I haven't done enough tests to be sure that that isn't just a chance
variation, but I recommend changing the code as below anyway.

            regards, tom lane


*** pgbench.c~    Sun Oct  6 12:01:44 2002
--- pgbench.c    Sun Oct  6 12:11:02 2002
***************
*** 184,200 ****
      {                            /* are we receiver? */
          if (debug)
              fprintf(stderr, "client %d receiving\n", n);
!         while (PQisBusy(st->con) == TRUE)
!         {
!             if (!PQconsumeInput(st->con))
!             {                    /* there's something wrong */
!                 fprintf(stderr, "Client %d aborted in state %d. Probably the backend died while processing.\n", n,
st->state);
!                 remains--;        /* I've aborted */
!                 PQfinish(st->con);
!                 st->con = NULL;
!                 return;
!             }
          }

          switch (st->state)
          {
--- 184,199 ----
      {                            /* are we receiver? */
          if (debug)
              fprintf(stderr, "client %d receiving\n", n);
!         if (!PQconsumeInput(st->con))
!         {                        /* there's something wrong */
!             fprintf(stderr, "Client %d aborted in state %d. Probably the backend died while processing.\n", n,
st->state);
!             remains--;            /* I've aborted */
!             PQfinish(st->con);
!             st->con = NULL;
!             return;
          }
+         if (PQisBusy(st->con))
+             return;                /* don't have the whole result yet */

          switch (st->state)
          {
***************
*** 367,383 ****
      {                            /* are we receiver? */
          if (debug)
              fprintf(stderr, "client %d receiving\n", n);
!         while (PQisBusy(st->con) == TRUE)
!         {
!             if (!PQconsumeInput(st->con))
!             {                    /* there's something wrong */
!                 fprintf(stderr, "Client %d aborted in state %d. Probably the backend died while processing.\n", n,
st->state);
!                 remains--;        /* I've aborted */
!                 PQfinish(st->con);
!                 st->con = NULL;
!                 return;
!             }
          }

          switch (st->state)
          {
--- 366,381 ----
      {                            /* are we receiver? */
          if (debug)
              fprintf(stderr, "client %d receiving\n", n);
!         if (!PQconsumeInput(st->con))
!         {                        /* there's something wrong */
!             fprintf(stderr, "Client %d aborted in state %d. Probably the backend died while processing.\n", n,
st->state);
!             remains--;            /* I've aborted */
!             PQfinish(st->con);
!             st->con = NULL;
!             return;
          }
+         if (PQisBusy(st->con))
+             return;                /* don't have the whole result yet */

          switch (st->state)
          {

Re: Suggested change to pgbench

From
Tatsuo Ishii
Date:
I have applied your patches plus an fprintf to see if PQisBusy()
actually returns true. Then ran pgbench with -c 128 -t 100. So far I
could not see any evidence that PQisBusy() returns true.

> I think pgbench is not dealing with asynchronous input quite right.
> As written, if the backend sends a response that doesn't fit into
> a single TCP packet, pgbench will go into a tight loop of
> PQisBusy/PQconsumeInput, which will not exit until the rest of the
> response arrives.  This would degrade the reported performance, first
> because of the wasted CPU cycles, and second because other connections
> wouldn't get serviced during that interval.

> I haven't done enough tests to be sure that that isn't just a chance
> variation, but I recommend changing the code as below anyway.

Ok, I will commit your changes. Thanks.
--
Tatsuo Ishii

Re: Suggested change to pgbench

From
Justin Clift
Date:
Hi Tom,

Have applied this change to the pg_autotune code, and am running tests
against a remote database with that.

Seems a *lot* more consistent results now.

None of the wide swings in variation, etc.

This will help in the case of pg_autotune quite a lot!

:-)

Regards and best wishes,

Justin Clift


Tom Lane wrote:
>
> I think pgbench is not dealing with asynchronous input quite right.
> As written, if the backend sends a response that doesn't fit into
> a single TCP packet, pgbench will go into a tight loop of
> PQisBusy/PQconsumeInput, which will not exit until the rest of the
> response arrives.  This would degrade the reported performance, first
> because of the wasted CPU cycles, and second because other connections
> wouldn't get serviced during that interval.
>
> I am not sure how likely this is to happen, but I can report that
> correcting it via the attached patch made a visible difference:
>
> [tgl@rh1 pgbench]$ ./pgbench -c 100 -t 100 bench1
> starting vacuum...end.
> transaction type: TPC-B (sort of)
> scaling factor: 100
> number of clients: 100
> number of transactions per client: 100
> number of transactions actually processed: 10000/10000
> tps = 34.748295 (including connections establishing)
> tps = 34.803249 (excluding connections establishing)
>
> [tgl@rh1 pgbench]$ ./mybench -c 100 -t 100 bench1       -- patched
> starting vacuum...end.
> transaction type: TPC-B (sort of)
> scaling factor: 100
> number of clients: 100
> number of transactions per client: 100
> number of transactions actually processed: 10000/10000
> tps = 36.849122 (including connections establishing)
> tps = 36.909993 (excluding connections establishing)
>
> I haven't done enough tests to be sure that that isn't just a chance
> variation, but I recommend changing the code as below anyway.
>
>                         regards, tom lane
>
> *** pgbench.c~  Sun Oct  6 12:01:44 2002
> --- pgbench.c   Sun Oct  6 12:11:02 2002
> ***************
> *** 184,200 ****
>         {                                                       /* are we receiver? */
>                 if (debug)
>                         fprintf(stderr, "client %d receiving\n", n);
> !               while (PQisBusy(st->con) == TRUE)
> !               {
> !                       if (!PQconsumeInput(st->con))
> !                       {                                       /* there's something wrong */
> !                               fprintf(stderr, "Client %d aborted in state %d. Probably the backend died while
processing.\n",n, st->state); 
> !                               remains--;              /* I've aborted */
> !                               PQfinish(st->con);
> !                               st->con = NULL;
> !                               return;
> !                       }
>                 }
>
>                 switch (st->state)
>                 {
> --- 184,199 ----
>         {                                                       /* are we receiver? */
>                 if (debug)
>                         fprintf(stderr, "client %d receiving\n", n);
> !               if (!PQconsumeInput(st->con))
> !               {                                               /* there's something wrong */
> !                       fprintf(stderr, "Client %d aborted in state %d. Probably the backend died while
processing.\n",n, st->state); 
> !                       remains--;                      /* I've aborted */
> !                       PQfinish(st->con);
> !                       st->con = NULL;
> !                       return;
>                 }
> +               if (PQisBusy(st->con))
> +                       return;                         /* don't have the whole result yet */
>
>                 switch (st->state)
>                 {
> ***************
> *** 367,383 ****
>         {                                                       /* are we receiver? */
>                 if (debug)
>                         fprintf(stderr, "client %d receiving\n", n);
> !               while (PQisBusy(st->con) == TRUE)
> !               {
> !                       if (!PQconsumeInput(st->con))
> !                       {                                       /* there's something wrong */
> !                               fprintf(stderr, "Client %d aborted in state %d. Probably the backend died while
processing.\n",n, st->state); 
> !                               remains--;              /* I've aborted */
> !                               PQfinish(st->con);
> !                               st->con = NULL;
> !                               return;
> !                       }
>                 }
>
>                 switch (st->state)
>                 {
> --- 366,381 ----
>         {                                                       /* are we receiver? */
>                 if (debug)
>                         fprintf(stderr, "client %d receiving\n", n);
> !               if (!PQconsumeInput(st->con))
> !               {                                               /* there's something wrong */
> !                       fprintf(stderr, "Client %d aborted in state %d. Probably the backend died while
processing.\n",n, st->state); 
> !                       remains--;                      /* I've aborted */
> !                       PQfinish(st->con);
> !                       st->con = NULL;
> !                       return;
>                 }
> +               if (PQisBusy(st->con))
> +                       return;                         /* don't have the whole result yet */
>
>                 switch (st->state)
>                 {
>
> ---------------------------(end of broadcast)---------------------------
> TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org

--
"My grandfather once told me that there are two kinds of people: those
who work and those who take the credit. He told me to try to be in the
first group; there was less competition there."
   - Indira Gandhi


Re: Suggested change to pgbench

From
Tom Lane
Date:
Justin Clift <justin@postgresql.org> writes:
> Have applied this change to the pg_autotune code, and am running tests
> against a remote database with that.
> Seems a *lot* more consistent results now.

It would be more likely to make a difference on a remote connection
than a local one --- smaller packet size ...

            regards, tom lane