Re: Doubt in pgbench TPS number - Mailing list pgsql-hackers

From Fabien COELHO
Subject Re: Doubt in pgbench TPS number
Msg-id alpine.DEB.2.10.1509300818520.22913@sto
In response to Re: Doubt in pgbench TPS number  (Tatsuo Ishii <ishii@postgresql.org>)
List pgsql-hackers
Hello Tatsuo,

>> So on second thought the formula should rather be:
>>
>>   ...  / (is_connect? nclients: nthreads)
>
> I don't think this is quite correct.
>
> If is_connect is false, then following loop is executed in threadRun():
>
>         /* make connections to the database */
>         for (i = 0; i < nstate; i++)
>         {
>             if ((state[i].con = doConnect()) == NULL)
>                 goto done;
>         }

Yep. The loop initializes all client connections *BEFORE* starting any 
transaction on any client; at this point the thread only opens connections, 
and that time is included in conn_time.

> Here, nstate is nclients/nthreads. Suppose nclients = 16 and nthreads
> = 2, then 2 threads run in parallel, and each thread is connecting 8
> times (nstate = 8) in *serial*.

Yes.

> The total connection time for this thread is calculated by "the time 
> ends the loop" - "the time starts the loop". So if the time to establish 
> a connection is 1 second, the total connection time for a thread will be 
> 8 seconds. Thus grand total of connection time will be 2 * 8 = 16 
> seconds.

Yes, 16 seconds in 2 threads: 8 seconds per thread of the "real time" of 
the test is spent in the connection, and no

> If is_connect is true, following loop is executed.
>
>     /* send start up queries in async manner */
>     for (i = 0; i < nstate; i++)
>     {
>         CState       *st = &state[i];
>         Command   **commands = sql_files[st->use_file];
>         int            prev_ecnt = st->ecnt;
>
>         st->use_file = getrand(thread, 0, num_files - 1);
>         if (!doCustom(thread, st, &thread->conn_time, logfile, &aggs))
>
> In the loop, exactly same thing happens as is_connect = false case. If
> t = 1, total connection time will be same as is_connect = false case,
> i.e. 16 seconds.

Without -C, 1 thread, 2 clients, if transactions take the same time as 
connections:
  Client 1:  C-|TTTTTTTTTTTTTTTTTTTTTTTTTTT
  Client 2:  -C|TTTTTTTTTTTTTTTTTTTTTTTTTTT
             <> connection time (initial loop in threadRun)
             <----------------------------> whole execution time
                <-------------------------> transaction time

The actual transaction time to consider on this drawing is the whole time 
minus the connection time of the thread, which is serialised. It is not 
the whole execution time minus connection time / 2 (number of clients), 
because of the '|' synchronisation (clients do not start before all other 
clients of the thread are connected).
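
A minimal sketch of this accounting, not pgbench code: the function name and the 60-second run length are hypothetical, and the other figures are the ones quoted above (16 clients, 2 threads, 1 second per connection). It assumes every thread opens its connections serially before any transaction starts.

```c
#include <assert.h>

/* Hypothetical helper, not part of pgbench: wall-clock time left for
 * transactions when each thread first connects all its clients serially. */
double
xact_time_without_reconnect(double whole_time, double conn_per_client,
                            int nclients, int nthreads)
{
    int    nstate;       /* clients handled by one thread */
    double thread_conn;  /* duration of the serial connection loop in one thread */
    double conn_total;   /* connection time summed over all threads */

    assert(nthreads > 0 && nclients % nthreads == 0);

    nstate = nclients / nthreads;
    thread_conn = nstate * conn_per_client;
    conn_total = nthreads * thread_conn;

    /* Threads run in parallel, so the wall-clock share is total / nthreads. */
    return whole_time - conn_total / nthreads;
}
```

With these figures, xact_time_without_reconnect(60.0, 1.0, 16, 2) yields 52 seconds: the summed 16 seconds of connection time cost only 8 seconds of real time, because the two threads connect in parallel.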

With -C, there is no initial connection; the connections are managed within
doCustom, which does the transaction processing asynchronously.
  Client 1:  CTCTCTCTCTCTCTCTCTCTCTCTCTCT-
  Client 2:  -CTCTCTCTCTCTCTCTCTCTCTCTCTCT
             <---------------------------> whole execution time
             <--------------------------> measured connection time

While a client is connecting, the other client is performing its 
transaction in an asynchronous manner, so the measured connection time may 
be arbitrarily close to the execution time; this was the bug you detected.
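
To make the effect concrete, here is a rough sketch of the divisor choice from the formula quoted at the top. It is not pgbench code: the function name is hypothetical, and it assumes the elided numerator is the connection time summed over all threads. The figures are illustrative, in the spirit of the drawing: 1 thread, 2 clients, a 29-unit run with 28 units of measured connection time.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper, not part of pgbench. conn_total is assumed to be
 * the connection time summed over all threads; is_connect mirrors -C. */
double
time_excluding_connections(double whole_time, double conn_total,
                           int nclients, int nthreads, bool is_connect)
{
    /* With -C every client reconnects concurrently with the others, so
     * the sum is spread over clients; otherwise over threads. */
    int divisor = is_connect ? nclients : nthreads;

    assert(divisor > 0);
    return whole_time - conn_total / divisor;
}
```

Dividing by the number of threads leaves 29 - 28 = 1 unit of transaction time, which would inflate the reported TPS enormously. Dividing by the number of clients leaves 29 - 14 = 15 units, close to the 14 units each client actually spends in transactions.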

> In summary, I see no reason to change the v1 patch.

I still think that my revised thinking is the right one... I hope that the 
above drawings make my thinking clearer. For me, the initial formula was 
the right one when not using -C; only the -C case needs fixing.

-- 
Fabien.


