Thread: client performance v.s. server statistics

client performance v.s. server statistics

From
Zhou Han
Date:
Hi,

I am checking a performance problem encountered after porting an old embedded DB to PostgreSQL. The system is real-time sensitive, so we are concerned about per-query cost. In our environment a sequential scan (select * from ...) over a table with tens of thousands of records costs 1-2 seconds, whether measured through the ODBC driver or via the "timing" result shown in the psql client (which in turn relies on libpq). However, according to EXPLAIN ANALYZE, or the statistics in the pg_stat_statements view, the query costs less than 100 ms.

So, is the client interface's (ODBC, libpq) cost mainly due to TCP? Do pg_stat_statements and EXPLAIN ANALYZE include the cost of copying tuples from shared buffers to the result set?

Could you experts share your views on this big gap? And any suggestions to optimise?

P.S. In our original embedded DB a "fastpath" interface is provided to read records directly from shared memory, which gives extremely real-time access (of course sacrificing some other features, such as consistency).

Best regards,
Han

Re: client performance v.s. server statistics

From
Zhou Han
Date:
Hi,

I have tried the unix domain socket and the performance is similar to the TCP socket. It is a MIPS architecture, so memory copies to/from the kernel can take much of the time, and apparently using a unix domain socket is no different from TCP in terms of memory copies.

But the ten-fold gap between the client-side and the server-side statistics is still hard to believe. So I want to know exactly what operations are included in the server-side statistics from EXPLAIN ANALYZE. I may check the code later when I get time.

For the query itself, it was just for performance comparison. There are other index-based queries, which are of course much faster, but they still show a similar ten-fold time gap between the client-side and server-side statistics.

I am thinking of a client interface that does not involve the kernel: is there such an option, or do I have to develop one from scratch?

Best regards,
Han

On Wed, Feb 15, 2012 at 1:23 PM, Amit Kapila <amit.kapila@huawei.com> wrote:

>> So, is the client interface's (ODBC, libpq) cost mainly due to TCP?

The difference you are seeing compared to your embedded DB seems to be mainly due to TCP.

One optimization is to use PostgreSQL's Unix-domain socket mode. You can refer to the unix_socket_directory parameter in postgresql.conf and other related parameters.
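For illustration, a minimal sketch of that configuration (the socket directory path here is just an example):

```
# postgresql.conf -- Unix-domain socket settings (path is an example)
unix_socket_directory = '/var/run/postgresql'   # directory for the socket file
unix_socket_permissions = 0777                  # filesystem permissions on the socket
```

A local client then connects by passing the directory as the host, e.g. psql -h /var/run/postgresql -d mydb, and libpq uses the socket file instead of TCP.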

I am suggesting this because you were using an embedded DB earlier, so your client and server should be on the same machine. If that is not the case, it will not work.

Can you please clarify some more things:

1. After the sequential scan, does the client need all the records from the table being scanned? If only a few records are needed, why have you not created an index?

2. What is the exact scenario for fetching records?

From: pgsql-hackers-owner@postgresql.org [mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Zhou Han
Sent: Wednesday, February 15, 2012 9:30 AM
To: pgsql-hackers@postgresql.org
Subject: [HACKERS] client performance v.s. server statistics

Hi,

I am checking a performance problem encountered after porting an old embedded DB to PostgreSQL. The system is real-time sensitive, so we are concerned about per-query cost. In our environment a sequential scan (select * from ...) over a table with tens of thousands of records costs 1-2 seconds, whether measured through the ODBC driver or via the "timing" result shown in the psql client (which in turn relies on libpq). However, according to EXPLAIN ANALYZE, or the statistics in the pg_stat_statements view, the query costs less than 100 ms.

So, is the client interface's (ODBC, libpq) cost mainly due to TCP? Do pg_stat_statements and EXPLAIN ANALYZE include the cost of copying tuples from shared buffers to the result set?

Could you experts share your views on this big gap? And any suggestions to optimise?

P.S. In our original embedded DB a "fastpath" interface is provided to read records directly from shared memory, which gives extremely real-time access (of course sacrificing some other features, such as consistency).

Best regards,
Han


Re: client performance v.s. server statistics

From
Amit Kapila
Date:
>> So, is the client interface's (ODBC, libpq) cost mainly due to TCP?

The difference you are seeing compared to your embedded DB seems to be mainly due to TCP.

One optimization is to use PostgreSQL's Unix-domain socket mode. You can refer to the unix_socket_directory parameter in postgresql.conf and other related parameters.

I am suggesting this because you were using an embedded DB earlier, so your client and server should be on the same machine. If that is not the case, it will not work.

Can you please clarify some more things:

1. After the sequential scan, does the client need all the records from the table being scanned? If only a few records are needed, why have you not created an index?

2. What is the exact scenario for fetching records?

From: pgsql-hackers-owner@postgresql.org [mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Zhou Han
Sent: Wednesday, February 15, 2012 9:30 AM
To: pgsql-hackers@postgresql.org
Subject: [HACKERS] client performance v.s. server statistics

Hi,

I am checking a performance problem encountered after porting an old embedded DB to PostgreSQL. The system is real-time sensitive, so we are concerned about per-query cost. In our environment a sequential scan (select * from ...) over a table with tens of thousands of records costs 1-2 seconds, whether measured through the ODBC driver or via the "timing" result shown in the psql client (which in turn relies on libpq). However, according to EXPLAIN ANALYZE, or the statistics in the pg_stat_statements view, the query costs less than 100 ms.

So, is the client interface's (ODBC, libpq) cost mainly due to TCP? Do pg_stat_statements and EXPLAIN ANALYZE include the cost of copying tuples from shared buffers to the result set?

Could you experts share your views on this big gap? And any suggestions to optimise?

P.S. In our original embedded DB a "fastpath" interface is provided to read records directly from shared memory, which gives extremely real-time access (of course sacrificing some other features, such as consistency).

Best regards,
Han

Re: client performance v.s. server statistics

From
Amit Kapila
Date:

>> So I want to know exactly what operations are included in the server-side statistics from EXPLAIN ANALYZE

It gives the time for execution of the query on the server. To my knowledge, it does not account for sending the data over TCP.
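As an illustration, the two numbers can be put side by side in psql (a sketch: big_table is a hypothetical table name, and the last query assumes the pg_stat_statements extension is installed):

```sql
-- Client-side wall-clock time: includes planning, execution, transfer
-- over the socket, and libpq materializing the whole result set.
\timing on
SELECT * FROM big_table;

-- Server-side execution time only; the rows are never sent to the client.
EXPLAIN ANALYZE SELECT * FROM big_table;

-- Cumulative server-side statistics for the same statement.
SELECT query, calls, total_time
FROM pg_stat_statements
WHERE query LIKE 'SELECT * FROM big_table%';
```

The gap between the \timing figure and the EXPLAIN ANALYZE total is roughly the cost of formatting, transferring, and buffering the result on the client side.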

 

From: Zhou Han [mailto:zhouhan@gmail.com]
Sent: Wednesday, February 15, 2012 12:32 PM
To: Amit Kapila; pgsql-hackers@postgresql.org
Subject: Re: [HACKERS] client performance v.s. server statistics

Hi,

I have tried the unix domain socket and the performance is similar to the TCP socket. It is a MIPS architecture, so memory copies to/from the kernel can take much of the time, and apparently using a unix domain socket is no different from TCP in terms of memory copies.

But the ten-fold gap between the client-side and the server-side statistics is still hard to believe. So I want to know exactly what operations are included in the server-side statistics from EXPLAIN ANALYZE. I may check the code later when I get time.

For the query itself, it was just for performance comparison. There are other index-based queries, which are of course much faster, but they still show a similar ten-fold time gap between the client-side and server-side statistics.

I am thinking of a client interface that does not involve the kernel: is there such an option, or do I have to develop one from scratch?

Best regards,
Han

On Wed, Feb 15, 2012 at 1:23 PM, Amit Kapila <amit.kapila@huawei.com> wrote:

>> So, is the client interface's (ODBC, libpq) cost mainly due to TCP?

The difference you are seeing compared to your embedded DB seems to be mainly due to TCP.

One optimization is to use PostgreSQL's Unix-domain socket mode. You can refer to the unix_socket_directory parameter in postgresql.conf and other related parameters.

I am suggesting this because you were using an embedded DB earlier, so your client and server should be on the same machine. If that is not the case, it will not work.

Can you please clarify some more things:

1. After the sequential scan, does the client need all the records from the table being scanned? If only a few records are needed, why have you not created an index?

2. What is the exact scenario for fetching records?

From: pgsql-hackers-owner@postgresql.org [mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Zhou Han
Sent: Wednesday, February 15, 2012 9:30 AM
To: pgsql-hackers@postgresql.org
Subject: [HACKERS] client performance v.s. server statistics

Hi,

I am checking a performance problem encountered after porting an old embedded DB to PostgreSQL. The system is real-time sensitive, so we are concerned about per-query cost. In our environment a sequential scan (select * from ...) over a table with tens of thousands of records costs 1-2 seconds, whether measured through the ODBC driver or via the "timing" result shown in the psql client (which in turn relies on libpq). However, according to EXPLAIN ANALYZE, or the statistics in the pg_stat_statements view, the query costs less than 100 ms.

So, is the client interface's (ODBC, libpq) cost mainly due to TCP? Do pg_stat_statements and EXPLAIN ANALYZE include the cost of copying tuples from shared buffers to the result set?

Could you experts share your views on this big gap? And any suggestions to optimise?

P.S. In our original embedded DB a "fastpath" interface is provided to read records directly from shared memory, which gives extremely real-time access (of course sacrificing some other features, such as consistency).

Best regards,
Han