Thread: Doubt in pgbench TPS number
Today I got an interesting output from pgbench. The scenario is executing SELECT pg_sleep(0.1). $ cat test.sql select pg_sleep(0.1); $ pgbench -C -n -p 11002 -c 10 -T 30 -f test.sql test transaction type: Custom query scaling factor: 1 query mode: simple number of clients: 10 number of threads: 1 duration: 30 s number of transactions actually processed: 2883 latency average: 104.058 ms tps = 95.799114 (including connections establishing) tps = 124.487149 (excluding connections establishing) Interesting thing is, the number from "excluding connections establishing". 124.487149 tps means 0.008032 seconds per transaction. Since the query executes pg_sleep(0.1), I think the number should be equal to or greater than 0.1. Maybe a tolerance, but 20% of error seems to be too high for me. Note that if "-C" does not present, the TPS number seems to be sane. $ pgbench -n -p 11002 -c 10 -T 30 -f test.sql test transaction type: Custom query scaling factor: 1 query mode: simple number of clients: 10 number of threads: 1 duration: 30 s number of transactions actually processed: 2970 latency average: 101.010 ms tps = 98.692514 (including connections establishing) tps = 98.747053 (excluding connections establishing) -- Tatsuo Ishii SRA OSS, Inc. Japan English: http://www.sraoss.co.jp/index_en.php Japanese:http://www.sraoss.co.jp
Hello Tatsuo-san, > $ pgbench -C -n -p 11002 -c 10 -T 30 -f test.sql test > tps = 95.799114 (including connections establishing) > tps = 124.487149 (excluding connections establishing) > > Interesting thing is, the number from "excluding connections > establishing". 124.487149 tps means 0.008032 seconds per > transaction. Since the query executes pg_sleep(0.1), I think the > number should be equal to or greater than 0.1. Maybe a tolerance, but > 20% of error seems to be too high for me. Indeed. > Note that if "-C" does not present, the TPS number seems to be sane. Hmmm... I never use -C. The formula seems ok: tps_exclude = normal_xacts / (time_include - (INSTR_TIME_GET_DOUBLE(conn_total_time) / nthreads)); conn_total_time is the cumulated time spent by all threads. A quick look at the logic reveals some minor issues (conn_time is passed as an argument to doCustom, although it is already available as a "thread" field, stange). I spotted "threads" used instead of "thread" in an accumulation, but it is not related to this computation. Hmmm. I'll have a slower look... -- Fabien.
Hello Tatsuo, > Hmmm... I never use -C. The formula seems ok: > > tps_exclude = normal_xacts / (time_include - > (INSTR_TIME_GET_DOUBLE(conn_total_time) / nthreads)); Hmmm... it is not:-) I think that the degree of parallelism to consider is nclients, not nthreads: while connection time is accumulated in conn_time, other clients are possibly doing their transactions, in parallel, even if it is in the same thread, so it is not "stopped time" for all clients. It starts to matter with "-j 1 -c 30" and slow transactions, the cumulated conn_time in each thread may be arbitrary close to the whole time if there are many clients. Now, I do not think that this tps computation makes that much sense. If you want to know the tps without reconnect, run without reconnecting... It is clear that I do not get this figure when running without -C, so maybe the tps with/without reconnection should be dropped? Anyway, here is a patch, which: - fixes the "exclude" computation (s/nthreads/nclients/) - fixes the count total for skipped(s/threads/thread/) - removes a useless parameter to doCustom (the conn_time is available through the thread parameter). -- Fabien.
> I think that the degree of parallelism to consider is nclients, not > nthreads: while connection time is accumulated in conn_time, other > clients are possibly doing their transactions, in parallel, I'm not sure about this. I think pgbench will not start transactions until all clients establish connections to PostgreSQL. I found this while playing with pgpool-II. pgpool-II pre-forks child process, whose number is defined by "num_init_children" directive. What I observed was, pgbench starts connecting to pgpool-II until "-c" connections are established. So, if "-c" is larger than "num_init_children", no transaction starts. > even if it > is in the same thread, so it is not "stopped time" for all clients. It > starts to matter with "-j 1 -c 30" and slow transactions, the > cumulated conn_time in each thread may be arbitrary close to the whole > time if there are many clients. > > Now, I do not think that this tps computation makes that much > sense. If you want to know the tps without reconnect, run without > reconnecting... It is clear that I do not get this figure when running > without -C, so maybe > the tps with/without reconnection should be dropped? > > Anyway, here is a patch, which: > - fixes the "exclude" computation (s/nthreads/nclients/) > - fixes the count total for skipped (s/threads/thread/) > - removes a useless parameter to doCustom > (the conn_time is available through the thread parameter). I have tested your patch. It seems the tolerance is much better than before with your patch. With the patch: ./pgbench -C -n -p 11002 -c 10 -T 30 -f test.sql test transaction type: Custom query scaling factor: 1 query mode: simple number of clients: 10 number of threads: 1 duration: 30 s number of transactions actually processed: 2887 latency average: 103.914 ms tps = 95.896850 (including connections establishing) tps = 98.101662 (excluding connections establishing) Without the patch: ./pgbench -C -n -p 11002 -c 10 -T 30 -f test.sql test transaction type: Custom query scaling factor: 1 query mode: simple number of clients: 10 number of threads: 1 duration: 30 s number of transactions actually processed: 2887 latency average: 103.914 ms tps = 95.919415 (including connections establishing) tps = 124.732475 (excluding connections establishing) I'm going to commit your patch if there's no objection. Best regards, -- Tatsuo Ishii SRA OSS, Inc. Japan English: http://www.sraoss.co.jp/index_en.php Japanese:http://www.sraoss.co.jp
> I have tested your patch. It seems the tolerance is much better than > before with your patch. [snip] > I'm going to commit your patch if there's no objection. I think we should commit this to master and 9.5 stable branch only because the fix significantly changes the benchmark result of pgbench. Best regards, -- Tatsuo Ishii SRA OSS, Inc. Japan English: http://www.sraoss.co.jp/index_en.php Japanese:http://www.sraoss.co.jp
Hello Tatsuo-san, >> I think that the degree of parallelism to consider is nclients, not >> nthreads: while connection time is accumulated in conn_time, other >> clients are possibly doing their transactions, in parallel, > > I'm not sure about this. I think pgbench will not start transactions > until all clients establish connections to PostgreSQL. I think that is true if "!is_connect", all client connections are performed at the beginning of threadRun, but under -C each client has its own connect/deconnect integrated within doCustom, so it is done in parallel to other clients having their transactions processed, hence the issue with the formula. > I found this while playing with pgpool-II. pgpool-II pre-forks child > process, whose number is defined by "num_init_children" > directive. What I observed was, pgbench starts connecting to pgpool-II > until "-c" connections are established. So, if "-c" is larger than > "num_init_children", no transaction starts. Yep, unless -C, I think, where some client may start nevertheless? Not sure... Does not matter. > I have tested your patch. It seems the tolerance is much better than > before with your patch. Yep. > I'm going to commit your patch if there's no objection. This seems fine with me. The formula change, and just this one, should probably be backported somehow, as this is a bug, wherever pgbench resides in older versions. It is just 's/nthreads/nclients/' in the printResult formula for computing tps_exclude. -- Fabien.
>> I'm not sure about this. I think pgbench will not start transactions >> until all clients establish connections to PostgreSQL. > > I think that is true if "!is_connect", all client connections are > performed at the beginning of threadRun, but under -C each client has > its own connect/deconnect integrated within doCustom, so it is done in > parallel to other clients having their transactions processed, hence > the issue with the formula. Really? I have tested with pgpool-II which is set to accept up to 2 connections, then run pgbench with -C and -c 32. pgbench was blocked as expected and I attached gdb and got stack trace: (gdb) bt #0 0x00007f48d5f17110 in __poll_nocancel () at ../sysdeps/unix/syscall-template.S:81 #1 0x00007f48d6724056 in pqSocketCheck () from /usr/local/src/pgsql/current/lib/libpq.so.5 #2 0x00007f48d6724940 in pqWaitTimed () from /usr/local/src/pgsql/current/lib/libpq.so.5 #3 0x00007f48d671f3e2 in connectDBComplete () from /usr/local/src/pgsql/current/lib/libpq.so.5 #4 0x00007f48d671fbcf in PQconnectdbParams () from /usr/local/src/pgsql/current/lib/libpq.so.5 #5 0x0000000000402b2b in doConnect () at pgbench.c:650 #6 0x0000000000404591 in doCustom (thread=0x25c2f40, st=0x25c2770, conn_time=0x25c2f90, logfile=0x0, agg=0x7fffba224260)at pgbench.c:1353 #7 0x000000000040a1d5 in threadRun (arg=0x25c2f40) at pgbench.c:3581 #8 0x0000000000409ab4 in main (argc=12, argv=0x7fffba224668) at pgbench.c:3455 As you can see, one of threads wants to connect to PostgreSQL (actually pgpool-II) and waits for reply. In threadRun() around line 3581: /* send start up queries in async manner */for (i = 0; i < nstate; i++){ CState *st = &state[i]; Command **commands= sql_files[st->use_file]; int prev_ecnt = st->ecnt; st->use_file = getrand(thread, 0, num_files - 1); if (!doCustom(thread, st, &thread->conn_time, logfile, &aggs)) remains--; /* I've aborted */ if (st->ecnt > prev_ecnt && commands[st->state]->type == META_COMMAND) { fprintf(stderr, "client %d abortedin state %d; execution of meta-command failed\n", i, st->state); remains--; /* I'veaborted */ PQfinish(st->con); st->con = NULL; }} Here doCustome() is called with st->con == NULL. In doCustom() around line 1353: if (st->con == NULL){ instr_time start, end; INSTR_TIME_SET_CURRENT(start); if ((st->con = doConnect()) == NULL) { doConnect() blocks until PostgreSQL (pgpool-II) allows to be connected. Because outer loop in threadRun() wants to loop over until all threads succefully connects to PostgreSQL, pgbench is blocked here. /* send start up queries in async manner */for (i = 0; i < nstate; i++) >> I'm going to commit your patch if there's no objection. > > This seems fine with me. > > The formula change, and just this one, should probably be backported > somehow, as this is a bug, wherever pgbench resides in older > versions. It is just 's/nthreads/nclients/' in the printResult formula > for computing tps_exclude. Yeah, that's definitely a bug but I'm afraid the fix will change the TPS number and may break the backward compatibility. Since we have lived with bug for years, I hesitate to back port to older stable branches... Best regards, -- Tatsuo Ishii SRA OSS, Inc. Japan English: http://www.sraoss.co.jp/index_en.php Japanese:http://www.sraoss.co.jp
> #5 0x0000000000402b2b in doConnect () at pgbench.c:650 > #6 0x0000000000404591 in doCustom (thread=0x25c2f40, st=0x25c2770, > conn_time=0x25c2f90, logfile=0x0, agg=0x7fffba224260) at pgbench.c:1353 > #7 0x000000000040a1d5 in threadRun (arg=0x25c2f40) at pgbench.c:3581 > #8 0x0000000000409ab4 in main (argc=12, argv=0x7fffba224668) at pgbench.c:3455 Hmm. Ok, whatever the connection position (from doCustom or from threadRun), doConnect would block the thread. On the other hand, if you start several threads, probably those threads which could connect all their client would proceed. What looks to be needed would be some timeout when connecting to the server. >> The formula change, and just this one, should probably be backported >> somehow, as this is a bug, wherever pgbench resides in older >> versions. It is just 's/nthreads/nclients/' in the printResult formula >> for computing tps_exclude. > > Yeah, that's definitely a bug but I'm afraid the fix will change the > TPS number and may break the backward compatibility. Since we have > lived with bug for years, I hesitate to back port to older stable > branches... My 2¥: I do not think of a good argument to keep wrong tps numbers once it is known that there are plain wrong, especially as it is not a behavioral change as such which could block applications or whatever, just a different number printed at the end of a run. So I would not bother much with upward compatibility consideration in this case. -- Fabien.
>> Yeah, that's definitely a bug but I'm afraid the fix will change the >> TPS number and may break the backward compatibility. Since we have >> lived with bug for years, I hesitate to back port to older stable >> branches... > > My 2¥: I do not think of a good argument to keep wrong tps numbers > once it is known that there are plain wrong, especially as it is not a > behavioral change as such which could block applications or whatever, > just a different number printed at the end of a run. So I would not > bother much with upward compatibility consideration in this case. > > -- > Fabien. FYI, the bug was there since the thread support was introduced in commit: da0dfb4b1460c3701abc8ed5f516d138dc4654c That was in 9.0, which was released in 2010. Best regards, -- Tatsuo Ishii SRA OSS, Inc. Japan English: http://www.sraoss.co.jp/index_en.php Japanese:http://www.sraoss.co.jp
Fabien COELHO <coelho@cri.ensmp.fr> writes: >> Yeah, that's definitely a bug but I'm afraid the fix will change the >> TPS number and may break the backward compatibility. Since we have >> lived with bug for years, I hesitate to back port to older stable >> branches... > My 2�: I do not think of a good argument to keep wrong tps numbers once it > is known that there are plain wrong, especially as it is not a behavioral > change as such which could block applications or whatever, just a > different number printed at the end of a run. So I would not bother much > with upward compatibility consideration in this case. FWIW, I vote with Tatsuo-san. Such a change will break comparability of results with all previous versions, which means it's not something to do in minor releases, even if we now believe the previous results were somewhat bogus. Arguing that it's "not a behavioral change" seems quite loony to me: for most people, the TPS numbers are the only interesting output of pgbench. I think we should fix it in 9.5 and document it as an incompatible change. (Note: I've not read the patch, so this is not an opinion about its correctness.) regards, tom lane
Hello Tom, > FWIW, I vote with Tatsuo-san. Such a change will break comparability of > results I would not classify a performance measure as a "result compatibility" issue. What matters is the *correction* of the results. When a bug is fixed anywhere in pg it may change performance significantly for some tests, and I have never seen this as a relevant consideration not to fix a problem... > with all previous versions, which means it's not something to do > in minor releases, even if we now believe the previous results were > somewhat bogus. Arguing that it's "not a behavioral change" seems > quite loony to me: for most people, the TPS numbers are the only > interesting output of pgbench. I think that if people are interested in tps without reconnecting on each transaction they would not run with "-C" to trigger reconnections and then look at the "tps without connections" for the figure they want... so I do not think that keeping this error is worth anything. On the other hand, and on the principle, keeping the bug looks wrong. I cannot agree with the logic behind shipping something which is known bad, especially to display *optimistic* performances... It looks too much like the VW way:-) > I think we should fix it in 9.5 and document it as an incompatible > change. Hmm. > (Note: I've not read the patch, so this is not an opinion about its > correctness.) That is another question:-) -- Fabien.
> FWIW, I vote with Tatsuo-san. Such a change will break comparability of > results with all previous versions, which means it's not something to do > in minor releases, even if we now believe the previous results were > somewhat bogus. Arguing that it's "not a behavioral change" seems > quite loony to me: for most people, the TPS numbers are the only > interesting output of pgbench. > > I think we should fix it in 9.5 and document it as an incompatible change. Plus if 9.4 or earlier version of PostgreSQL user wants to use fixed version of pgbench, he/she can always use pgbench coming with 9.5 or later version of PostgreSQL. > (Note: I've not read the patch, so this is not an opinion about its > correctness.) As Fabien anayzed the problem was that pgbench simply uses wrong variable: nthreads (number of threads, specified by "-j" option) vs. nclients (number of clients, specified by "-c" option). @@ -2616,7 +2616,7 @@ printResults(int ttype, int64 normal_xacts, int nclients, time_include = INSTR_TIME_GET_DOUBLE(total_time); tps_include = normal_xacts / time_include; tps_exclude = normal_xacts / (time_include- - (INSTR_TIME_GET_DOUBLE(conn_total_time) / nthreads)); + (INSTR_TIME_GET_DOUBLE(conn_total_time) / nclients)); Here conn_total_time is the sum of time to establish connection to PostgreSQL. Since establishing connections to PostgreSQL is done in serial rather than in parallel, conn_total_time should have been divided by nclients. Best regards, -- Tatsuo Ishii SRA OSS, Inc. Japan English: http://www.sraoss.co.jp/index_en.php Japanese:http://www.sraoss.co.jp
>> (Note: I've not read the patch, so this is not an opinion about its >> correctness.) > > As Fabien anayzed the problem was that pgbench simply uses wrong > variable: nthreads (number of threads, specified by "-j" option) > vs. nclients (number of clients, specified by "-c" option). > > @@ -2616,7 +2616,7 @@ printResults(int ttype, int64 normal_xacts, int nclients, > time_include = INSTR_TIME_GET_DOUBLE(total_time); > tps_include = normal_xacts / time_include; > tps_exclude = normal_xacts / (time_include - > - (INSTR_TIME_GET_DOUBLE(conn_total_time) / nthreads)); > + (INSTR_TIME_GET_DOUBLE(conn_total_time) / nclients)); > > Here conn_total_time is the sum of time to establish connection to > PostgreSQL. Since establishing connections to PostgreSQL is done in > serial rather than in parallel, conn_total_time should have been > divided by nclients. After some more thinking and looking again at the connection code, I revise slightly my diagnostic: - the amount of parallelism is "nclients", as discussed above, when reconnecting on each transaction (-C) because the connections are managed in parallel from doCustom, * BUT * - if there is no reconnections (not -C) the connections are performed in threadRun in a sequential way, all clients wait for the connections of other clients in the same thread before starting processing transactions, so "nthreads" is the right amount of parallelism in this case. So on second thought the formula should rather be: ... / (is_connect? nclients: nthreads) Here is a v2 with this formula. Note that it does not take into account thread creation time, which might be significant, the start and end of a pgbench run are quite fuzzy. I've removed the doCustom parameter change as it is included in a larger patch I submitted about reworking stat collections in pgbench, so this attached patch is bug fix only. For the record, I still plainly disagree with the idea of shipping a performance measuring tool which knowingly displays wrong and optimistic figures under some conditions. -- Fabien.
>> Here conn_total_time is the sum of time to establish connection to >> PostgreSQL. Since establishing connections to PostgreSQL is done in >> serial rather than in parallel, conn_total_time should have been >> divided by nclients. > > After some more thinking and looking again at the connection code, I > revise slightly my diagnostic: > > - the amount of parallelism is "nclients", as discussed above, when > - reconnecting on each transaction (-C) because the connections are > - managed in parallel from doCustom, > > * BUT * > > - if there is no reconnections (not -C) the connections are performed in > - threadRun in a sequential way, all clients wait for the connections of > - other clients in the same thread before starting processing > - transactions, so "nthreads" is the right amount of parallelism in this > - case. > > So on second thought the formula should rather be: > > ... / (is_connect? nclients: nthreads) I don't think this is quite correct. If is_connect is false, then following loop is executed in threadRun(): /* make connections to the database */ for (i = 0; i < nstate; i++) { if ((state[i].con = doConnect()) ==NULL) goto done; } Here, nstate is nclients/nthreads. Suppose nclients = 16 and nthreads = 2, then 2 threads run in parallel, and each thread is connecting 8 times (nstate = 8) in *serial*. The total connection time for this thread is calculated by "the time ends the loop" - "the time starts the loop". So if the time to establish a connection is 1 second, the total connection time for a thread will be 8 seconds. Thus grand total of connection time will be 2 * 8 = 16 seconds. If is_connect is true, following loop is executed. /* send start up queries in async manner */for (i = 0; i < nstate; i++){ CState *st = &state[i]; Command **commands= sql_files[st->use_file]; int prev_ecnt = st->ecnt; st->use_file = getrand(thread, 0, num_files - 1); if (!doCustom(thread, st, &thread->conn_time, logfile, &aggs)) In the loop, exactly same thing happens as is_connect = false case. If t = 1, total connection time will be same as is_connect = false case, i.e. 16 seconds. In summary, I see no reason to change the v1 patch. Best regards, -- Tatsuo Ishii SRA OSS, Inc. Japan English: http://www.sraoss.co.jp/index_en.php Japanese:http://www.sraoss.co.jp
>> FWIW, I vote with Tatsuo-san. Such a change will break comparability of >> results with all previous versions, which means it's not something to do >> in minor releases, even if we now believe the previous results were >> somewhat bogus. Arguing that it's "not a behavioral change" seems >> quite loony to me: for most people, the TPS numbers are the only >> interesting output of pgbench. >> >> I think we should fix it in 9.5 and document it as an incompatible change. > > Plus if 9.4 or earlier version of PostgreSQL user wants to use fixed > version of pgbench, he/she can always use pgbench coming with 9.5 or > later version of PostgreSQL. > >> (Note: I've not read the patch, so this is not an opinion about its >> correctness.) > > As Fabien anayzed the problem was that pgbench simply uses wrong > variable: nthreads (number of threads, specified by "-j" option) > vs. nclients (number of clients, specified by "-c" option). > > @@ -2616,7 +2616,7 @@ printResults(int ttype, int64 normal_xacts, int nclients, > time_include = INSTR_TIME_GET_DOUBLE(total_time); > tps_include = normal_xacts / time_include; > tps_exclude = normal_xacts / (time_include - > - (INSTR_TIME_GET_DOUBLE(conn_total_time) / nthreads)); > + (INSTR_TIME_GET_DOUBLE(conn_total_time) / nclients)); > > Here conn_total_time is the sum of time to establish connection to > PostgreSQL. Since establishing connections to PostgreSQL is done in > serial rather than in parallel, conn_total_time should have been > divided by nclients. Fix committed to master and 9.5 stable branches. Best regards, -- Tatsuo Ishii SRA OSS, Inc. Japan English: http://www.sraoss.co.jp/index_en.php Japanese:http://www.sraoss.co.jp
Hello Tatsuo, >> So on second thought the formula should rather be: >> >> ... / (is_connect? nclients: nthreads) > > I don't think this is quite correct. > > If is_connect is false, then following loop is executed in threadRun(): > > /* make connections to the database */ > for (i = 0; i < nstate; i++) > { > if ((state[i].con = doConnect()) == NULL) > goto done; > } Yep. The loop initializes all client connections *BEFORE* starting any transactions on any client, the thread does only do connections at this time, which is included conn_time. > Here, nstate is nclients/nthreads. Suppose nclients = 16 and nthreads > = 2, then 2 threads run in parallel, and each thread is connecting 8 > times (nstate = 8) in *serial*. Yes. > The total connection time for this thread is calculated by "the time > ends the loop" - "the time starts the loop". So if the time to establish > a connection is 1 second, the total connection time for a thread will be > 8 seconds. Thus grand total of connection time will be 2 * 8 = 16 > seconds. Yes, 16 seconds in 2 threads, 8 seconds per thread of the "real time" of the test is spend in the connection, and no > If is_connect is true, following loop is executed. > > /* send start up queries in async manner */ > for (i = 0; i < nstate; i++) > { > CState *st = &state[i]; > Command **commands = sql_files[st->use_file]; > int prev_ecnt = st->ecnt; > > st->use_file = getrand(thread, 0, num_files - 1); > if (!doCustom(thread, st, &thread->conn_time, logfile, &aggs)) > > In the loop, exactly same thing happens as is_connect = false case. If > t = 1, total connection time will be same as is_connect = false case, > i.e. 16 seconds. Without -C, 1 thread, 2 clients, if transactions take same time as connections: Client 1: C-|TTTTTTTTTTTTTTTTTTTTTTTTTTT Client 2: -C|TTTTTTTTTTTTTTTTTTTTTTTTTTT <> connection time (initialloop in threadRun) <----------------------------> whole execution time <------------------------->transaction time The actual transaction time to consider on this drawing is whole time minus the connection time of the thread, which is serialised. It is not whole execution time minus connection time / 2 (number of clients), because of the '|' synchronisation (clients do not start before all other clients of the thread are connected). With -C, the is no initial connection, the connections are managed within doCustom, which is doing transaction processing asynchronously. Client 1: CTCTCTCTCTCTCTCTCTCTCTCTCTCT- Client 2: -CTCTCTCTCTCTCTCTCTCTCTCTCTCT <--------------------------->whole execution time <--------------------------> measured connection time While a client is connecting, the other client is performing its transaction in an asynchronous manner, so the measured connection time may be arbitrary close to the execution time, this was the bug you detected. > In summary, I see no reason to change the v1 patch. I still think that my revised thinking is the right one... I hope that the above drawings make my thinking clearer. For me, the initial formula was the right one when not using -C, only the -C case need fixing. -- Fabien.
Hi, On 2015-09-25 20:35:45 +0200, Fabien COELHO wrote: > > Hello Tatsuo, > > > Hmmm... I never use -C. The formula seems ok: > > > > tps_exclude = normal_xacts / (time_include - > > (INSTR_TIME_GET_DOUBLE(conn_total_time) / nthreads)); > > Hmmm... it is not:-) > > I think that the degree of parallelism to consider is nclients, not > nthreads: while connection time is accumulated in conn_time, other clients > are possibly doing their transactions, in parallel, even if it is in the > same thread, so it is not "stopped time" for all clients. It starts to > matter with "-j 1 -c 30" and slow transactions, the cumulated conn_time in > each thread may be arbitrary close to the whole time if there are many > clients. I think this pretty much entirely broke the tps_exclude logic when not using -C, especially when -c and -j differ. The wait time there is actually per thread, not per client. In this example I set post_auth_delay=1s on the server. Pgbench tells me: pgbench -M prepared -c 180 -j 180 -T 10 -P1 -S tps = 897607.544862 (including connections establishing) tps = 1004793.708611 (excluding connections establishing) pgbench -M prepared -c 180 -j 60 -T 10 -P1 -S tps = 739502.979613 (including connections establishing) tps = 822639.038779 (excluding connections establishing) pgbench -M prepared -c 180 -j 30 -T 10 -P1 -S tps = 376468.177081 (including connections establishing) tps = 418554.527585 (excluding connections establishing) which pretty obviously is bogus. While I'd not expect it'd to work perfectly, the "excluding" number should stay roughly constant. The fundamental issue is that without -C *none* of the connections in each thread gets to actually execute work before all of them have established a connection. So dividing conn_total_time by / nclients is wrong. For more realistic connection delays this leads to the 'excluding' number being way too close to the 'including' number, even if a substantial portion of the time is spent on it. Greetings, Andres Freund
On 2020-02-27 12:26:36 -0800, Andres Freund wrote: > Hi, > > On 2015-09-25 20:35:45 +0200, Fabien COELHO wrote: > > > > Hello Tatsuo, > > > > > Hmmm... I never use -C. The formula seems ok: > > > > > > tps_exclude = normal_xacts / (time_include - > > > (INSTR_TIME_GET_DOUBLE(conn_total_time) / nthreads)); > > > > Hmmm... it is not:-) > > > > I think that the degree of parallelism to consider is nclients, not > > nthreads: while connection time is accumulated in conn_time, other clients > > are possibly doing their transactions, in parallel, even if it is in the > > same thread, so it is not "stopped time" for all clients. It starts to > > matter with "-j 1 -c 30" and slow transactions, the cumulated conn_time in > > each thread may be arbitrary close to the whole time if there are many > > clients. > > I think this pretty much entirely broke the tps_exclude logic when not > using -C, especially when -c and -j differ. The wait time there is > actually per thread, not per client. > > In this example I set post_auth_delay=1s on the server. Pgbench > tells me: > pgbench -M prepared -c 180 -j 180 -T 10 -P1 -S > tps = 897607.544862 (including connections establishing) > tps = 1004793.708611 (excluding connections establishing) > > pgbench -M prepared -c 180 -j 60 -T 10 -P1 -S > tps = 739502.979613 (including connections establishing) > tps = 822639.038779 (excluding connections establishing) > > pgbench -M prepared -c 180 -j 30 -T 10 -P1 -S > tps = 376468.177081 (including connections establishing) > tps = 418554.527585 (excluding connections establishing) > > which pretty obviously is bogus. While I'd not expect it'd to work > perfectly, the "excluding" number should stay roughly constant. > > > The fundamental issue is that without -C *none* of the connections in > each thread gets to actually execute work before all of them have > established a connection. So dividing conn_total_time by / nclients is > wrong. > > For more realistic connection delays this leads to the 'excluding' > number being way too close to the 'including' number, even if a > substantial portion of the time is spent on it. Not suggesting it as an actual fix, but just multiplying the computed conn_time within runThread() by the number of connections the thread handles (for the non -C case) results in much more reasonable output. pgbench -M prepared -c 180 -j 30 -T 10 -P1 -S before: tps = 378393.985650 (including connections establishing) tps = 420691.025930 (excluding connections establishing) after: tps = 379818.929680 (including connections establishing) tps = 957405.785600 (excluding connections establishing) pgbench -M prepared -c 180 -j 180 -T 10 -P1 -S before: tps = 906662.031099 (including connections establishing) tps = 1012223.500880 (excluding connections establishing) after: tps = 906178.154876 (including connections establishing) tps = 1012431.154463 (excluding connections establishing) The numbers are still a bit bogus because of thread startup overhead being included, and conn_time being computed relative to the thread creation. But they at least make some basic sense. diff --git i/src/bin/pgbench/pgbench.c w/src/bin/pgbench/pgbench.c index 1159757acb0..3bc45107136 100644 --- i/src/bin/pgbench/pgbench.c +++ w/src/bin/pgbench/pgbench.c @@ -6265,6 +6265,16 @@ threadRun(void *arg) INSTR_TIME_SET_CURRENT(thread->conn_time); INSTR_TIME_SUBTRACT(thread->conn_time, thread->start_time); + /* add once for each other connection */ + if (!is_connect) + { + instr_time e = thread->conn_time; + for (i = 0; i < (nstate - 1); i++) + { + INSTR_TIME_ADD(thread->conn_time, e); + } + } + /* explicitly initialize the state machines */ for (i = 0; i < nstate; i++) { Greetings, Andres Freund