Thread: Suggested change to pgbench
I think pgbench is not dealing with asynchronous input quite right.
As written, if the backend sends a response that doesn't fit into
a single TCP packet, pgbench will go into a tight loop of
PQisBusy/PQconsumeInput, which will not exit until the rest of the
response arrives. This would degrade the reported performance, first
because of the wasted CPU cycles, and second because other connections
wouldn't get serviced during that interval.

I am not sure how likely this is to happen, but I can report that
correcting it via the attached patch made a visible difference:

[tgl@rh1 pgbench]$ ./pgbench -c 100 -t 100 bench1
starting vacuum...end.
transaction type: TPC-B (sort of)
scaling factor: 100
number of clients: 100
number of transactions per client: 100
number of transactions actually processed: 10000/10000
tps = 34.748295 (including connections establishing)
tps = 34.803249 (excluding connections establishing)

[tgl@rh1 pgbench]$ ./mybench -c 100 -t 100 bench1     -- patched
starting vacuum...end.
transaction type: TPC-B (sort of)
scaling factor: 100
number of clients: 100
number of transactions per client: 100
number of transactions actually processed: 10000/10000
tps = 36.849122 (including connections establishing)
tps = 36.909993 (excluding connections establishing)

I haven't done enough tests to be sure that that isn't just a chance
variation, but I recommend changing the code as below anyway.

			regards, tom lane

*** pgbench.c~	Sun Oct  6 12:01:44 2002
--- pgbench.c	Sun Oct  6 12:11:02 2002
***************
*** 184,200 ****
  	{							/* are we receiver? */
  		if (debug)
  			fprintf(stderr, "client %d receiving\n", n);
! 		while (PQisBusy(st->con) == TRUE)
! 		{
! 			if (!PQconsumeInput(st->con))
! 			{					/* there's something wrong */
! 				fprintf(stderr, "Client %d aborted in state %d. Probably the backend died while processing.\n", n, st->state);
! 				remains--;		/* I've aborted */
! 				PQfinish(st->con);
! 				st->con = NULL;
! 				return;
! 			}
! 		}
  
  		switch (st->state)
  		{
--- 184,199 ----
  	{							/* are we receiver? */
  		if (debug)
  			fprintf(stderr, "client %d receiving\n", n);
! 		if (!PQconsumeInput(st->con))
! 		{						/* there's something wrong */
! 			fprintf(stderr, "Client %d aborted in state %d. Probably the backend died while processing.\n", n, st->state);
! 			remains--;			/* I've aborted */
! 			PQfinish(st->con);
! 			st->con = NULL;
! 			return;
! 		}
+ 		if (PQisBusy(st->con))
+ 			return;				/* don't have the whole result yet */
  
  		switch (st->state)
  		{
***************
*** 367,383 ****
  	{							/* are we receiver? */
  		if (debug)
  			fprintf(stderr, "client %d receiving\n", n);
! 		while (PQisBusy(st->con) == TRUE)
! 		{
! 			if (!PQconsumeInput(st->con))
! 			{					/* there's something wrong */
! 				fprintf(stderr, "Client %d aborted in state %d. Probably the backend died while processing.\n", n, st->state);
! 				remains--;		/* I've aborted */
! 				PQfinish(st->con);
! 				st->con = NULL;
! 				return;
! 			}
! 		}
  
  		switch (st->state)
  		{
--- 366,381 ----
  	{							/* are we receiver? */
  		if (debug)
  			fprintf(stderr, "client %d receiving\n", n);
! 		if (!PQconsumeInput(st->con))
! 		{						/* there's something wrong */
! 			fprintf(stderr, "Client %d aborted in state %d. Probably the backend died while processing.\n", n, st->state);
! 			remains--;			/* I've aborted */
! 			PQfinish(st->con);
! 			st->con = NULL;
! 			return;
! 		}
+ 		if (PQisBusy(st->con))
+ 			return;				/* don't have the whole result yet */
  
  		switch (st->state)
  		{
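The essence of the fix can be modeled without libpq: consume whatever bytes have arrived exactly once, and if the result is still incomplete, return to the caller's select() loop rather than spinning until the rest shows up. The sketch below is a toy stand-in (ToyConn, toy_consume_input, toy_is_busy, and receive_step are all hypothetical names, not libpq or pgbench functions), using a line-oriented protocol where a complete result ends with '\n':

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Toy stand-in for a libpq connection: buffers incoming bytes. */
typedef struct {
    char   buf[256];
    size_t len;
} ToyConn;

/* Stand-in for PQconsumeInput: absorb whatever bytes have arrived.
 * Returns false on error. */
static bool toy_consume_input(ToyConn *conn, const char *data, size_t n)
{
    if (conn->len + n >= sizeof conn->buf)
        return false;           /* would overflow: treat as an error */
    memcpy(conn->buf + conn->len, data, n);
    conn->len += n;
    return true;
}

/* Stand-in for PQisBusy: true while no complete result is buffered. */
static bool toy_is_busy(const ToyConn *conn)
{
    return memchr(conn->buf, '\n', conn->len) == NULL;
}

/* The patched receive step: consume once, then bail out if the result
 * is still incomplete, so other connections can be serviced.  Returns
 * true only when a full result is ready to process. */
static bool receive_step(ToyConn *conn, const char *data, size_t n)
{
    if (!toy_consume_input(conn, data, n))
        return false;           /* the "backend died" path in pgbench */
    return !toy_is_busy(conn);  /* false: go back to the select() loop */
}
```

With the old while-loop shape, the first partial packet would have trapped the caller inside receive_step until the terminator arrived; with the patched shape it returns immediately and the caller is free to service other clients.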
I have applied your patches plus an fprintf to see if PQisBusy()
actually returns true. Then ran pgbench with -c 128 -t 100. So far I
could not see any evidence that PQisBusy() returns true.

> I think pgbench is not dealing with asynchronous input quite right.
> As written, if the backend sends a response that doesn't fit into
> a single TCP packet, pgbench will go into a tight loop of
> PQisBusy/PQconsumeInput, which will not exit until the rest of the
> response arrives. This would degrade the reported performance, first
> because of the wasted CPU cycles, and second because other connections
> wouldn't get serviced during that interval.

> I haven't done enough tests to be sure that that isn't just a chance
> variation, but I recommend changing the code as below anyway.

Ok, I will commit your changes. Thanks.
--
Tatsuo Ishii
Hi Tom,

Have applied this change to the pg_autotune code, and am running tests
against a remote database with that.

The results seem a *lot* more consistent now: none of the wide swings
in variation, etc. This will help pg_autotune quite a lot!  :-)

Regards and best wishes,

Justin Clift

Tom Lane wrote:
> I think pgbench is not dealing with asynchronous input quite right.
> As written, if the backend sends a response that doesn't fit into
> a single TCP packet, pgbench will go into a tight loop of
> PQisBusy/PQconsumeInput, which will not exit until the rest of the
> response arrives. This would degrade the reported performance, first
> because of the wasted CPU cycles, and second because other connections
> wouldn't get serviced during that interval.
>
> [benchmark results and patch snipped; see upthread]

--
"My grandfather once told me that there are two kinds of people: those
who work and those who take the credit. He told me to try to be in the
first group; there was less competition there."
   - Indira Gandhi
Justin Clift <justin@postgresql.org> writes:
> Have applied this change to the pg_autotune code, and am running tests
> against a remote database with that.
> Seems a *lot* more consistent results now.

It would be more likely to make a difference on a remote connection
than a local one --- smaller packet size ...

			regards, tom lane
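Returning early only helps because pgbench's top-level loop select()s across all client sockets and re-enters the receiver when the kernel reports more data. A minimal, self-contained sketch of that readiness check (using a pipe in place of a backend socket; ready_to_read is an illustrative helper, not a pgbench function):

```c
#include <assert.h>
#include <stdbool.h>
#include <sys/select.h>
#include <unistd.h>

/* Returns true if fd has data ready to read, checked without blocking
 * (zero timeout).  This mirrors how a select()-driven loop decides
 * which client connections to service next instead of busy-waiting on
 * any single one. */
static bool ready_to_read(int fd)
{
    fd_set rset;
    struct timeval tv = {0, 0};

    FD_ZERO(&rset);
    FD_SET(fd, &rset);
    return select(fd + 1, &rset, NULL, NULL, &tv) > 0;
}
```

In the pre-patch code the tight PQisBusy/PQconsumeInput loop defeated this: the process never got back to select() until one connection's whole result had arrived, starving the other clients in the meantime.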