Thread: Should PQconsumeInput/PQisBusy be expensive to use?

Should PQconsumeInput/PQisBusy be expensive to use?

From
Michael Clark
Date:
Hello everyone.

I have been investigating the PG async calls and trying to determine whether I should go down the road of using them.

In doing some experiments I found that using PQsendQueryParams/PQconsumeInput/PQisBusy/PQgetResult produces slower results than simply calling PQexecParams.
Upon some investigation I found that not calling PQconsumeInput/PQisBusy produces results in line with PQexecParams (which PQexecParams seems to be doing under the hood).

I profiled my test and found this calling stack:
(This is OS X 10.6)

lo_unix_scall
   recvfrom$UNIX2003
      recv$UNIX2003
         pqsecure_read
            pqReadData
               PQconsumeInput
                  .....


This showed up as the hottest part of the execution by far.  This was a pretty simple test of fetching 6000+ rows.

If I remove the PQconsumeInput/PQisBusy calls, which essentially makes the code blocking, this hot spot goes away.

Fetching 1000 rows goes from <.5 seconds to >3 seconds when I have the PQconsumeInput/PQisBusy calls in.


I was wondering if maybe I am doing something wrong, or if there is a technique that might help reduce this penalty?

Thanks in advance for any suggestions,
Michael.

P.S. here is a code snippet of what I am doing basically:
(please keep in mind this is just test code and rather simplistic...)

    int send_result = PQsendQueryParams(self.db,
                                        [sql UTF8String],
                                        i,
                                        NULL,
                                        (const char *const *)vals,
                                        (const int *)lens,
                                        (const int *)formats,
                                        kTextResultFormat);
    int consume_result = 0;
    int is_busy_result = 0;
    
    while ( ((consume_result = PQconsumeInput(self.db)) == 1) && ((is_busy_result = PQisBusy(self.db)) == 1) )
        ;
    
    if (consume_result != 1)
        NSLog(@"Got an error in PQconsumeInput");

    PGresult* res = PQgetResult(self.db);
    while (PQgetResult(self.db) != NULL)
        NSLog(@"Oops, seems we got an extra response?");

Re: Should PQconsumeInput/PQisBusy be expensive to use?

From
Alex Hunsaker
Date:
On Wed, Oct 27, 2010 at 15:02, Michael Clark <codingninja@gmail.com> wrote:
> Hello everyone.
> Upon some investigation I found that not calling PQconsumeInput/PQisBusy
> produces results in line with PQexecParams (which PQexecParams seems to be
> doing under the hood).

> (please keep in mind this is just test code and rather simplistic...)
>     int send_result = PQsendQueryParams(self.db,
>                                         [sql UTF8String],
>                                         i,
>                                         NULL,
>                                         (const char *const *)vals,
>                                         (const int *)lens,
>                                         (const int *)formats,
>                                         kTextResultFormat);
>     int consume_result = 0;
>     int is_busy_result = 0;
>
>     while ( ((consume_result = PQconsumeInput(self.db)) == 1) &&
> ((is_busy_result = PQisBusy(self.db)) == 1) )
>         ;

You really want to select() or equivalent here...  This basically is a
busy loop using 100% cpu; neither PQconsumeInput nor PQisBusy does any
kind of sleeping...

Something like:
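fd_set read_mask;
int sock = PQsocket(self.db);

while (1)
{
  struct timeval tv = {5, 0};

  /* select() overwrites the set, so rebuild it on every pass */
  FD_ZERO(&read_mask);
  FD_SET(sock, &read_mask);
  select(sock + 1, &read_mask, NULL, NULL, &tv);

  if (!PQconsumeInput(self.db))
    break;  /* trouble reading from the connection; check PQerrorMessage() */
  if (!PQisBusy(self.db))
    break;  /* a result is ready for PQgetResult() */
}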

or something...

Re: Should PQconsumeInput/PQisBusy be expensive to use?

From
David Wilson
Date:


On Wed, Oct 27, 2010 at 5:02 PM, Michael Clark <codingninja@gmail.com> wrote:

    while ( ((consume_result = PQconsumeInput(self.db)) == 1) && ((is_busy_result = PQisBusy(self.db)) == 1) )
        ;
    

The problem with this code is that it's effectively useless as a test. You're just spinning in a loop; if you don't have anything else to be doing while waiting for responses, then this sort of calling pattern is always going to be worse than just blocking.

Only do async if you actually have an async problem, and only do a performance test on it if you're actually doing a real async test, otherwise the results are fairly useless.

--
- David T. Wilson
david.t.wilson@gmail.com
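
The difference between spinning and blocking can be made visible without a database. The following is only a rough sketch, assuming a POSIX system: a pipe fed by a delayed child process stands in for the libpq socket, and the helper names (start_delayed_writer, spin_until_data, block_until_data, reap_writer) are made up for this illustration and are not part of libpq. Each wait function returns the number of system calls it needed before one byte arrived.

```c
#define _POSIX_C_SOURCE 200809L  /* for nanosleep() under strict -std modes */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Fork a child that writes one byte after delay_ms milliseconds.
 * Returns the non-blocking read end of the pipe, or -1 on failure. */
int start_delayed_writer(int delay_ms, pid_t *child)
{
    int p[2];
    assert(delay_ms >= 0 && child != NULL);
    if (pipe(p) != 0)
        return -1;
    pid_t pid = fork();
    if (pid < 0) {
        close(p[0]);
        close(p[1]);
        return -1;
    }
    if (pid == 0) {
        struct timespec ts = { delay_ms / 1000, (delay_ms % 1000) * 1000000L };
        close(p[0]);
        nanosleep(&ts, NULL);
        _exit(write(p[1], "x", 1) == 1 ? 0 : 1);
    }
    close(p[1]);
    fcntl(p[0], F_SETFL, fcntl(p[0], F_GETFL) | O_NONBLOCK);
    *child = pid;
    return p[0];
}

/* Close the read end and wait for the child to exit. */
void reap_writer(int fd, pid_t child)
{
    close(fd);
    waitpid(child, NULL, 0);
}

/* Busy-wait: retry a non-blocking read() until the byte shows up.
 * Returns the number of read() calls made, or -1 on error. */
long spin_until_data(int fd)
{
    char c;
    long calls = 0;
    for (;;) {
        ssize_t got = read(fd, &c, 1);
        calls++;
        if (got == 1)
            return calls;
        if (got == 0)
            return -1;  /* writer went away without sending anything */
        if (errno != EAGAIN && errno != EWOULDBLOCK && errno != EINTR)
            return -1;
    }
}

/* Blocking wait: sleep inside poll() until readable, then read once.
 * Returns the number of system calls made (poll + read), or -1 on error. */
long block_until_data(int fd)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN, .revents = 0 };
    long calls = 0;
    char c;
    for (;;) {
        int rc = poll(&pfd, 1, -1);
        calls++;
        if (rc > 0)
            break;
        if (rc < 0 && errno != EINTR)
            return -1;
    }
    calls++;
    return read(fd, &c, 1) == 1 ? calls : -1;
}
```

With a 50 ms delay, the spinning version should make a large number of read() calls that all fail with EAGAIN, while the blocking version normally needs just one poll() and one read(). The exact spin count depends on the machine, so treat it as an order-of-magnitude illustration only.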

Re: Should PQconsumeInput/PQisBusy be expensive to use?

From
Tom Lane
Date:
Michael Clark <codingninja@gmail.com> writes:
> In doing some experiments I found that using
> PQsendQueryParams/PQconsumeInput/PQisBusy/PQgetResult produces slower
> results than simply calling PQexecParams.

Well, PQconsumeInput involves at least one extra kernel call (to see
whether data is available) so I don't know why this surprises you.
The value of those functions is if your application can do something
else useful while it's waiting.  If the data comes back so fast that
you can't afford any extra cycles expended on the client side, then
you don't have any use for those functions.

However, if you do have something useful to do, the problem with
this example code is that it's not doing that, it's just spinning:

>     while ( ((consume_result = PQconsumeInput(self.db)) == 1) &&
> ((is_busy_result = PQisBusy(self.db)) == 1) )
>         ;

That's a busy-wait loop, so it's no wonder you're eating cycles there.
You want to sleep, or more likely do something else productive, when
there is no data available from the underlying socket.  Usually the
idea is to include libpq's socket in the set of files being watched
by select() or poll(), and dispatch off to something that absorbs
the data whenever you see some data is available to read.

            regards, tom lane
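
The "watch the socket alongside everything else and dispatch" idea described above might look roughly like this with poll(). This is a minimal sketch, not code from libpq: struct watched, wait_and_dispatch and count_bytes are hypothetical names invented for the example, and plain file descriptors are used so it runs without a server. In a real program one entry would presumably hold the descriptor returned by PQsocket(), and its callback would call PQconsumeInput() and then check PQisBusy().

```c
#include <assert.h>
#include <poll.h>
#include <unistd.h>

/* Hypothetical callback type: run when its descriptor has input waiting. */
typedef void (*ready_cb)(int fd, void *ctx);

/* One entry in the application's set of watched descriptors. */
struct watched {
    int fd;
    ready_cb on_readable;
    void *ctx;
};

/*
 * Sleep in poll() until at least one watched descriptor is readable or the
 * timeout expires, then run the callback of every ready descriptor.
 * Returns the number of callbacks run, 0 on timeout, -1 on poll() error.
 */
int wait_and_dispatch(struct watched *w, int n, int timeout_ms)
{
    struct pollfd pfds[16];

    assert(w != NULL && n > 0 && n <= 16);
    for (int i = 0; i < n; i++) {
        pfds[i].fd = w[i].fd;
        pfds[i].events = POLLIN;
        pfds[i].revents = 0;
    }

    int rc = poll(pfds, (nfds_t)n, timeout_ms);
    if (rc <= 0)
        return rc;  /* timeout (0) or error (-1): nothing to dispatch */

    int dispatched = 0;
    for (int i = 0; i < n; i++) {
        /* Also dispatch on hangup/error so the callback can notice it. */
        if (pfds[i].revents & (POLLIN | POLLHUP | POLLERR)) {
            w[i].on_readable(w[i].fd, w[i].ctx);
            dispatched++;
        }
    }
    return dispatched;
}

/* Example callback: absorb one chunk of input and add its size to *ctx.
 * A libpq callback would call PQconsumeInput()/PQisBusy() here instead. */
void count_bytes(int fd, void *ctx)
{
    char buf[256];
    ssize_t got = read(fd, buf, sizeof buf);
    if (got > 0)
        *(long *)ctx += got;
}
```

The process sleeps in the kernel while nothing is readable, so no cycles are burned waiting, and the timeout leaves room for periodic work between wakeups.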

Re: Should PQconsumeInput/PQisBusy be expensive to use?

From
Michael Clark
Date:
Hello all.

Thanks a lot for the responses, they are appreciated.

I think I now understand the folly of my loop, and how that was negatively impacting my "test".

I tried the suggestion Alex and Tom made to change my loop with a select() and my results are now very close to the non-async version.

The main reason for looking at this API is not to support async in our applications, that is being achieved architecturally in a PG agnostic way.  It is to give our PG agnostic layer the ability to cancel queries. (Admittedly the queries I mention in these emails are not candidates for cancelling...).

Again, thanks so much for the help.
Michael.



Re: Should PQconsumeInput/PQisBusy be expensive to use?

From
"A.M."
Date:
On Oct 28, 2010, at 11:08 AM, Michael Clark wrote:

> Hello all.
>
> Thanks a lot for the responses, they are appreciated.
>
> I think I now understand the folly of my loop, and how that was negatively
> impacting my "test".
>
> I tried the suggestion Alex and Tom made to change my loop with a select()
> and my results are now very close to the non-async version.
>
> The main reason for looking at this API is not to support async in our
> applications, that is being achieved architecturally in a PG agnostic way.
> It is to give our PG agnostic layer the ability to cancel queries.
> (Admittedly the queries I mention in these emails are not candidates for
> cancelling...).

Hm- I'm not sure how the async API will allow you to cancel queries. In PostgreSQL, query canceling is implemented by
opening a second connection and passing specific data which is received from the first connection (effectively sending a
cancel signal to the connection instead of a specific query). This implementation is necessitated by the fact that the
PostgreSQL backend isn't asynchronous.

Even if you cancel the query, you still need to consume the socket input. Query cancellation is available for libpq
both in sync and async modes.

Cheers,
M

Re: Should PQconsumeInput/PQisBusy be expensive to use?

From
Michael Clark
Date:


On Thu, Oct 28, 2010 at 11:15 AM, A.M. <agentm@themactionfaction.com> wrote:

On Oct 28, 2010, at 11:08 AM, Michael Clark wrote:

> Hello all.
>
> Thanks a lot for the responses, they are appreciated.
>
> I think I now understand the folly of my loop, and how that was negatively
> impacting my "test".
>
> I tried the suggestion Alex and Tom made to change my loop with a select()
> and my results are now very close to the non-async version.
>
> The main reason for looking at this API is not to support async in our
> applications, that is being achieved architecturally in a PG agnostic way.
> It is to give our PG agnostic layer the ability to cancel queries.
> (Admittedly the queries I mention in these emails are not candidates for
> cancelling...).

> Hm- I'm not sure how the async API will allow you to cancel queries. In PostgreSQL, query canceling is implemented by opening a second connection and passing specific data which is received from the first connection (effectively sending a cancel signal to the connection instead of a specific query). This implementation is necessitated by the fact that the PostgreSQL backend isn't asynchronous.
>
> Even if you cancel the query, you still need to consume the socket input. Query cancellation is available for libpq both in sync and async modes.


Oh.  I misunderstood that.

I guess I can have one thread performing the query using the non async PG calls, then from another thread issue the cancellation.  Both threads accessing the same PGconn ?

I am glad I added that extra bit of info in my reply, and that you caught it!!

Thank you!
Michael.
 

Re: Should PQconsumeInput/PQisBusy be expensive to use?

From
"Daniel Verite"
Date:
    A.M. wrote:

> In PostgreSQL, query canceling is implemented by opening a
> second connection and passing specific data which is received
> from the first connection

With libpq's PQcancel(), a second connection is not necessary.

Best regards,
--
Daniel
PostgreSQL-powered mail user agent and storage: http://www.manitou-mail.org

Re: Should PQconsumeInput/PQisBusy be expensive to use?

From
"Daniel Verite"
Date:
    Michael Clark wrote:

> I guess I can have one thread performing the query using the non async PG
> calls, then from another thread issue the cancellation.  Both threads
> accessing the same PGconn ?

Yes. See http://www.postgresql.org/docs/9.0/static/libpq-cancel.html

Best regards,
--
Daniel
PostgreSQL-powered mail user agent and storage: http://www.manitou-mail.org

Re: Should PQconsumeInput/PQisBusy be expensive to use?

From
"A.M."
Date:
On Oct 28, 2010, at 12:04 PM, Daniel Verite wrote:

>     A.M. wrote:
>
>> In PostgreSQL, query canceling is implemented by opening a
>> second connection and passing specific data which is received
>> from the first connection
>
> With libpq's PQcancel(), a second connection is not necessary.

To clarify, PQcancel() opens a new socket to the backend and sends the cancel message. (The server's socket address is
passed as part of the cancel structure to PQcancel.)


http://git.postgresql.org/gitweb?p=postgresql.git;a=blob;f=src/interfaces/libpq/fe-connect.c;h=8f318a1a8cc5bf2d49b2605dd76581609cf9be32;hb=HEAD#l2964

The point is that a query can be cancelled from anywhere really and cancellation will not use the original connection
socket.

Cheers,
M