Thread: About the tps explanation of pgbench, please help

About the tps explanation of pgbench, please help

From

Yanrui Hu

Date:

18 November 2014, 08:33:31

I am working on an evaluation of moving a DB client outside the datacenter, to understand how the network would affect the business.

After several rounds of testing, I have a question about the two tps results in the pgbench output.

Test A:

Client and DB server are in the same AWS datacenter.

transaction type: Custom query

scaling factor: 500

query mode: simple

number of clients: 25

number of threads: 25

duration: 600 s

number of transactions actually processed: 54502

tps = 90.814930 (including connections establishing)

tps = 204.574432 (excluding connections establishing)

Test B:

Client and DB server are in different AWS datacenters (west and east).

transaction type: Custom query

scaling factor: 500

query mode: simple

number of clients: 25

number of threads: 25

duration: 600 s

number of transactions actually processed: 13966

tps = 23.235705 (including connections establishing)

tps = 42.915990 (excluding connections establishing)

Obviously both tps figures drop when the client and server are not in the same datacenter, since the network connection has more latency.

But I cannot explain why the tps excluding connection establishing changes so much.

As I understand it, the tps excluding connection establishing removes the time spent creating the socket. That means that in the two test cases above (only the network differs), the tps excluding connection establishing should be very close, right? The database and its capacity are the same; only the network latency is different.

Best Regards,

Yanrui Hu (Ray)

Re: About the tps explanation of pgbench, please help

From

John R Pierce

Date:

18 November 2014, 08:39:37

On 11/18/2014 12:33 AM, Yanrui Hu wrote:
> the tps excluding connections establishing should be very close,
> right? Because the database is same and capability is same only
> network latency is different.

that much greater latency is added to every SQL command you send and get
results from.   why would you expect anything different?

--
john r pierce                                      37N 122W
somewhere on the middle of the left coast

Re: About the tps explanation of pgbench, please help

From

Yanrui Hu

Date:

18 November 2014, 09:46:41

Does "connection establishing" mean the socket connect? If so, that's the only difference between my cases A and B.

So the results for "excluding connections establishing" should be similar, right?

My overall goal is to figure out the impact of my client accessing the DB server from outside the datacenter (e.g. over the internet).


Best Regards,

Yanrui Hu (Ray)

Re: About the tps explanation of pgbench, please help

From

Francisco Olarte

Date:

18 November 2014, 10:27:05

Hi Yanrui:

On Tue, Nov 18, 2014 at 10:46 AM, Yanrui Hu <yhu@appannie.com> wrote:

> Does "connection establishing" mean the socket connect? If so, that's the only difference between my cases A and B.
> So the results for "excluding connections establishing" should be similar, right?

When you connect to the database you need to establish a socket connection, the server has to do some things (for example, the postmaster forks a backend process to serve the new socket), and then you have to negotiate passwords and that kind of thing. This is the connection establishing, and it is amortized over the life of a socket connection, even if you keep it open for months. As you have to send a lot of things over the net, this time depends on the network bandwidth and latency, typically more on latency, as it normally consists of several small packet exchanges.

Then on every transaction / query you make, you have to send the queries to the server and it has to transmit you the result, so this phase also has a network dependency. Normally more on latency than bandwidth too (present networks are not normally short and thin (low latency, small bandwidth), more likely balanced, or fat and long in the extreme), but in some cases (like sending a select * from a big table, which needs very little server work) this can be dominated by bandwidth.

BOTH of the operations depend on the network, that is why you get two sets of numbers, but (in the pgbench case) one is more dependent than the other. The including-connection time is roughly equivalent to starting psql, sending the query, exiting, rinse and repeat. The excluding one is roughly equivalent to starting psql, sending many queries, and exiting at the end. Even if you open psql and keep it open forever, if your connection bounces thrice to geostationary orbit each way you'll get slow queries, but you'll save the couple of extra seconds to open it on each query.
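One rough way to read the two figures in the posted output, offered as a sketch rather than a description of pgbench internals: if each of the 25 clients issues its next transaction only after the previous one completes, the average time a client spends per transaction is clients / tps. The function name `per_client_latency` is made up for illustration, and attributing the gap between the two figures to connection setup assumes a reconnect per transaction (the `-C` flag in the parameters quoted later in the thread, which may not be the exact run shown here).

```python
def per_client_latency(tps: float, clients: int) -> float:
    """Average wall-clock seconds one client spends per transaction.

    Assumes a closed loop: each client starts its next transaction only
    after the previous one finishes, so tps = clients / latency.
    """
    return clients / tps


CLIENTS = 25  # "number of clients" in the posted output

# (tps including connections, tps excluding connections), copied from the thread
runs = {
    "A (same datacenter)": (90.814930, 204.574432),
    "B (west to east)": (23.235705, 42.915990),
}

results = {}
for name, (tps_incl, tps_excl) in runs.items():
    with_connect = per_client_latency(tps_incl, CLIENTS)
    queries_only = per_client_latency(tps_excl, CLIENTS)
    results[name] = (with_connect, queries_only)
    print(
        f"{name}: {with_connect * 1000:.0f} ms per transaction including connect, "
        f"{queries_only * 1000:.0f} ms excluding, "
        f"{(with_connect - queries_only) * 1000:.0f} ms attributed to connecting"
    )
```

Under these assumptions the time per transaction excluding connection setup goes from roughly 0.12 s to roughly 0.58 s, so most of the slowdown sits in the queries themselves and not only in connecting.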

Francisco Olarte.

Re: About the tps explanation of pgbench, please help

From

Adrian Klaver

Date:

18 November 2014, 14:35:23

On 11/18/2014 12:33 AM, Yanrui Hu wrote:
> I am working on an evaluation of moving a DB client outside the datacenter,
> to understand how the network would affect the business.
> After several rounds of testing, I have a question about the two
> tps results in the pgbench output.
>
> Test A:
> Client and DB server exist in same AWS datacenter.
> transaction type: Custom query
> scaling factor: 500
> query mode: simple
> number of clients: 25
> number of threads: 25
> duration: 600 s
> number of transactions actually processed: 54502
> tps = 90.814930 (including connections establishing)
> tps = 204.574432 (excluding connections establishing)
>
> Test B:
> Client and DB server exist in different AWS datacenter (west and east).
> transaction type: Custom query
> scaling factor: 500
> query mode: simple
> number of clients: 25
> number of threads: 25
> duration: 600 s
> number of transactions actually processed: 13966
> tps = 23.235705 (including connections establishing)
> tps = 42.915990 (excluding connections establishing)
>
> Obviously both tps figures drop when the client and server are not
> in the same datacenter, since the network connection has more latency.
> But I cannot explain why the tps excluding connection
> establishing changes so much.
> As I understand it, the tps excluding connection establishing removes
> the time spent creating the socket. That means that in the two test
> cases above (only the network differs), the tps excluding connection
> establishing should be very close, right?

Not that I can see from the numbers. In the same-datacenter case you
processed 54,502 transactions over 600s and in the cross-datacenter case 13,966
transactions over 600s. Even if you factor out the connection
establishment you have fewer transactions over the same time period for
the cross-datacenter case. So there is no way the tps can be equivalent. As
others have pointed out, this is due to the effect of network latency on the
processing of the queries.

You might want to take a look at the Notes section here:

http://www.postgresql.org/docs/9.3/static/pgbench.html

In particular the different logging options that are available. They may
make it easier to see what is going on.

> Because the database is same
> and capability is same only network latency is different.
>
>
> --
> Best Regards,
>
> Yanrui Hu (Ray)


--
Adrian Klaver
adrian.klaver@aklaver.com

Re: About the tps explanation of pgbench, please help

From

Yanrui Hu

Date:

19 November 2014, 01:48:35

Thanks Francisco,

I partly understand your explanation.

The "including connection establishing" case covers the whole DB connection setup, not only the socket connection, while the "excluding connection establishing" case covers many DB queries and depends more on socket latency, right?

And what do you suggest for my testing? (I want to test the network impact on my client's user experience. My client in the production environment is one dedicated machine, already connected to the database server with several connections, which occasionally sends a select or update to the database server depending on the business.)

My current test parameters are "-U pgbench -c 150 -j 150 -n -s 500 -T 60 -f script_1.sql -r -C"

Based on your mail, my scenario is more like the "excluding connection establishing" case, right? So the network change reduces the capacity to about 1/5 (204->42). The network change impacts the system so much!


Best Regards,

Yanrui Hu (Ray)

Re: About the tps explanation of pgbench, please help

From

Yanrui Hu

Date:

19 November 2014, 01:53:45

Adrian,

I saw that one case has 54502 transactions and the other 13966, but that is caused by the capacity decrease.

And tps is transactions per second, so it is not the number of transactions but the rate per second that matters, so I don't think the difference in total transactions is a problem.

Please point out if my understanding is not correct.

My initial plan is to learn the impact of moving that DB client (which is also a server running a web server with a RESTful API) outside to the internet.


Best Regards,

Yanrui Hu (Ray)

Re: About the tps explanation of pgbench, please help

From

Adrian Klaver

Date:

19 November 2014, 02:58:51

On 11/18/2014 05:53 PM, Yanrui Hu wrote:
> Adrian,
> I saw that one case has 54502 transactions and the other 13966,
> but that is caused by the capacity decrease.
> And tps is transactions per second, so it is not the number of
> transactions but the rate per second that matters, so I don't think
> the difference in total transactions is a problem.
> Please point out if my understanding is not correct.

Alright

If:

kph = kilometer per hour = kilometer/hour

100 km/1 hr = 100 km/hr

200 km/1 hr = 200 km/hr

If you cover 100 km in 1 hour you have an average speed of 100
km/hr; if you cover 200 km in 1 hour your average speed is 200 km/hr.

then

tps = transactions per second = transactions/sec

54502 transactions/600 sec = 90.84 transactions/sec

13966 transactions/600 sec = 23.28 transactions/sec

The numbers are not exactly the same as the ones pgbench reported, but that
is probably down to rounding error. They pass the close enough rule though :)
Any way you look at it, if you run two tests over the same time period and
one does fewer transactions than the other, you will have different
transaction rates (tps). You were asking about the why behind the
different tps rates; the answer is above. In other words, you cannot
ignore the raw numbers for the transactions.
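
The same arithmetic as a small Python sketch. The helper name `tps` and the use of the nominal 600 s duration are assumptions for illustration; pgbench divides by the elapsed time it actually measured, which may be why its "including connections" figures come out slightly lower.

```python
def tps(transactions: int, seconds: float) -> float:
    """Transactions per second: a count divided by the time it took."""
    return transactions / seconds


NOMINAL_DURATION_S = 600  # "duration: 600 s" in both posted runs

test_a = tps(54502, NOMINAL_DURATION_S)  # client and server in the same datacenter
test_b = tps(13966, NOMINAL_DURATION_S)  # client and server in different datacenters

print(f"Test A: {test_a:.2f} tps")
print(f"Test B: {test_b:.2f} tps")
print(f"B as a fraction of A: {test_b / test_a:.2f}")
```

Because the duration is the same in both runs, the ratio of the two rates is just the ratio of the two transaction counts.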



--
Adrian Klaver
adrian.klaver@aklaver.com

Re: About the tps explanation of pgbench, please help

From

Yanrui Hu

Date:

19 November 2014, 05:36:22

Adrian,

I understand your explanation of tps.

What I would like to know is how much of the tps change between the two cases is due to the network change.


Best Regards,

Yanrui Hu (Ray)

Re: About the tps explanation of pgbench, please help

From

John R Pierce

Date:

19 November 2014, 06:52:59

On 11/18/2014 5:48 PM, Yanrui Hu wrote:
> Based on your mail, my scenario is more like the "excluding connection
> establishing" case, right? So the network change reduces the capacity
> to about 1/5 (204->42). The network change impacts the system so much!
>

have you measured the packet latency, with ping or whatever, between the
'same DC' client and server, and the 'remote' client to server ?


--
john r pierce                                      37N 122W
somewhere on the middle of the left coast

Re: About the tps explanation of pgbench, please help

From

John R Pierce

Date:

19 November 2014, 06:54:29

On 11/18/2014 9:36 PM, Yanrui Hu wrote:
> What I would like to know is how much the network changed impact on
> the tps changes in two cases.

you just measured that.



--
john r pierce                                      37N 122W
somewhere on the middle of the left coast

Re: About the tps explanation of pgbench, please help

From

Francisco Olarte

Date:

19 November 2014, 08:24:49

Hi Yanrui:

On Wed, Nov 19, 2014 at 2:48 AM, Yanrui Hu <yhu@appannie.com> wrote:

> I partly understand your explanation.

Enough for me, it isn't a simple thing to write :)

> The "including connection establishing" case covers the whole DB connection setup, not only the socket connection, while the "excluding connection establishing" case covers many DB queries and depends more on socket latency, right?

Right, as I said before, if you were doing the queries with psql, including would be the time from shell prompt to shell prompt, excluding from psql prompt to psql prompt.

> And what do you suggest for my testing? (I want to test the network impact on my client's user experience. My client in the production environment is one dedicated machine, already connected to the database server with several connections, which occasionally sends a select or update to the database server depending on the business.)
> My current test parameters are "-U pgbench -c 150 -j 150 -n -s 500 -T 60 -f script_1.sql -r -C"
> Based on your mail, my scenario is more like the "excluding connection establishing" case, right? So the network change reduces the capacity to about 1/5 (204->42). The network change impacts the system so much!

I'm not too familiar with pgbench so I cannot comment on it, but it seems your network is slow and your DB fast. Bear in mind 42 transactions a second are quite a few. IIRC pgbench can be scripted to use the same type of queries as your DB, but if your client is a single machine with a single connection, you can easily write a simulator and test it.

I think the fastest you can work with the DB with a normal client is about one query per RTT in autocommit mode, one per two RTTs with explicit commits. Not knowing your exact network latency I cannot recommend anything, but if you go from one query every 5 ms to one every 25 ms and your RTT is 20 ms, I doubt anything can be done.
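
The rule of thumb above can be written down as a small sketch. The function name `max_tps_per_connection` is hypothetical, the RTT values are only examples (20 ms is the figure mentioned above, the others are made up), and the model ignores server processing time entirely, so it gives a ceiling, not a prediction.

```python
def max_tps_per_connection(rtt_ms: float, round_trips_per_txn: int = 1) -> float:
    """Upper bound on transactions per second for ONE connection.

    Each round trip costs at least one RTT, so a transaction that needs
    N round trips cannot finish faster than N * RTT.
    """
    return 1000.0 / (rtt_ms * round_trips_per_txn)


for rtt_ms in (1, 20, 80):  # example RTTs in milliseconds
    autocommit = max_tps_per_connection(rtt_ms, round_trips_per_txn=1)
    explicit_commit = max_tps_per_connection(rtt_ms, round_trips_per_txn=2)
    print(
        f"RTT {rtt_ms:>2} ms: at most {autocommit:.1f} tps in autocommit mode, "
        f"{explicit_commit:.1f} tps with a separate COMMIT round trip"
    )
```

With a 20 ms RTT the ceiling is 50 queries per second per connection in autocommit mode, and half that when each query is followed by its own commit.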

And network changes impact the system a lot, as it is a network server. Basically, measure the RTT (use at least 100 copies of a 1k ping); you are not going to be able to extract more than 1 query per RTT per connection. Solutions are increasing connections (if your pipe is fat and long it can do wonders, but the application needs to be able to do it) and minimising round trips (for this I've had good results pushing everything inside a single query in autocommit mode using stored procedures, so I only need one RTT per op), but given the few details you've given on your setup I cannot tell you more. Anyway, those are the generic recommendations for any network service having trouble due to a slow network.
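
To compare the two remedies, here is a minimal sketch that estimates how many connections are needed to reach a target rate, before and after collapsing several round trips into one call. The function name `connections_needed` and every number in the example (200 ops per second, 80 ms RTT, 4 round trips per operation) are illustrative assumptions, not measurements from this thread, and server-side time is again ignored.

```python
import math


def connections_needed(target_tps: int, rtt_ms: int, round_trips_per_op: int) -> int:
    """Minimum number of parallel connections to sustain target_tps.

    One connection completes at most 1000 / (rtt_ms * round_trips_per_op)
    operations per second, so the target is divided by that ceiling.
    """
    return math.ceil(target_tps * rtt_ms * round_trips_per_op / 1000)


TARGET_TPS = 200  # hypothetical required throughput
RTT_MS = 80       # hypothetical cross-datacenter round-trip time

several_statements = connections_needed(TARGET_TPS, RTT_MS, round_trips_per_op=4)
single_procedure_call = connections_needed(TARGET_TPS, RTT_MS, round_trips_per_op=1)

print(f"4 round trips per op: {several_statements} connections")
print(f"1 round trip per op:  {single_procedure_call} connections")
```

Cutting the round trips per operation reduces the number of connections the application has to keep busy by the same factor, which is why the two remedies are usually combined.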

Regards.

Francisco Olarte.