Thread: C libpq frontend library fetchsize

C libpq frontend library fetchsize

From: Yeb Havinga
Hello list,

I'm wondering if there would be community support for adding use of the
Execute protocol message with a row limit > 0 to the C libpq client
library, as it is used by the JDBC driver with setFetchSize.

kind regards,
Yeb Havinga



Re: C libpq frontend library fetchsize

From: Robert Haas
On Fri, Feb 26, 2010 at 3:28 PM, Yeb Havinga <yebhavinga@gmail.com> wrote:
> I'm wondering if there would be community support for adding use of the
> Execute protocol message with a row limit > 0 to the C libpq client
> library, as it is used by the JDBC driver with setFetchSize.

Not sure I follow what you're asking...  what would the new/changed
function signature be?

...Robert


Re: C libpq frontend library fetchsize

From: Takahiro Itagaki
Yeb Havinga <yebhavinga@gmail.com> wrote:

> I'm wondering if there would be community support for adding use of the
> Execute protocol message with a row limit > 0 to the C libpq client
> library, as it is used by the JDBC driver with setFetchSize.

The setFetchSize for libpq is difficult because of the interface
mismatch -- libpq uses array-based APIs (PGresult) and JDBC uses a
cursor-like API (ResultSet). Instead, you can use DECLARE CURSOR and
FETCH commands to retrieve rows in separate PGresult objects.
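
For example, a minimal (untested) sketch of that approach; the
connection string, table name, and batch size are placeholders:

#include <stdio.h>
#include <libpq-fe.h>

int main(void)
{
    /* Placeholder connection string. */
    PGconn *conn = PQconnectdb("dbname=test");
    if (PQstatus(conn) != CONNECTION_OK)
    {
        fprintf(stderr, "connection failed: %s", PQerrorMessage(conn));
        return 1;
    }

    /* DECLARE CURSOR must run inside a transaction block.
     * Error checking on BEGIN/DECLARE is elided for brevity. */
    PQclear(PQexec(conn, "BEGIN"));
    PQclear(PQexec(conn, "DECLARE c CURSOR FOR SELECT * FROM big_table"));

    for (;;)
    {
        /* Each FETCH returns one batch of rows as its own PGresult. */
        PGresult *res = PQexec(conn, "FETCH 1000 FROM c");
        if (PQresultStatus(res) != PGRES_TUPLES_OK)
        {
            fprintf(stderr, "FETCH failed: %s", PQerrorMessage(conn));
            PQclear(res);
            break;
        }
        int ntup = PQntuples(res);
        for (int i = 0; i < ntup; i++)
            printf("%s\n", PQgetvalue(res, i, 0));
        PQclear(res);
        if (ntup == 0)          /* cursor exhausted */
            break;
    }

    PQclear(PQexec(conn, "CLOSE c"));
    PQclear(PQexec(conn, "COMMIT"));
    PQfinish(conn);
    return 0;
}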

Regards,
---
Takahiro Itagaki
NTT Open Source Software Center




Re: C libpq frontend library fetchsize

From: Yeb Havinga
Takahiro Itagaki wrote:
> Yeb Havinga <yebhavinga@gmail.com> wrote:
>> I'm wondering if there would be community support for adding use of the
>> Execute protocol message with a row limit > 0 to the C libpq client
>> library, as it is used by the JDBC driver with setFetchSize.
>>     
>
> The setFetchSize for libpq is difficult because of the interface
> mismatch -- libpq uses array-based APIs (PGresult) and JDBC uses a
> cursor-like API (ResultSet). Instead, you can use DECLARE CURSOR and
> FETCH commands to retrieve rows in separate PGresult objects.
>   
Hello Takahiro,

Thank you for your reply. Yes, there is a large overlap with SQL's
DECLARE CURSOR together with FETCH, but intuitively it feels strange
that the application needs to be changed for something that could be
kept internal to the communication protocol. The application is forced
to start a transaction explicitly, and I also tried to declare a cursor
with parameters from SQL but did not succeed, nor could I declare a
cursor in combination with a prepared statement. Building fetch-size
support into libpq seems like a general solution that could work for
all PGresult fetching.

Together with a colleague of mine I worked on this subject some time
ago, and we thought it might be interesting enough for others as well.
In short, this is what we did:
- add a boolean 'complete' to the PGresult object, to indicate whether
the portal run was complete
- add a PQresultComplete API call that returns the boolean above
- add PQsendSync
- in the query guts routines, do not send Sync messages
- call PQsendSync when a complete PGresult is received, or if result
fetching is stopped before fetching the last result
- the fetch size was a defined constant in our application, but it
would be trivial to replace it with a PQsetFetchsize call (a sketch of
how calling code might use all this follows below)
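
To illustrate, a hypothetical sketch of how an application might use
these additions. None of these functions exist in stock libpq; the
signatures are guesses for illustration only, and handle_row stands in
for application code:

#include <libpq-fe.h>

/* Proposed additions -- NOT part of stock libpq; signatures are
 * guesses for illustration only. */
extern void PQsetFetchsize(PGconn *conn, int rows);
extern int  PQresultComplete(const PGresult *res);
extern int  PQsendSync(PGconn *conn);

/* Placeholder for application row handling. */
extern void handle_row(const PGresult *res, int row);

static void fetch_in_batches(PGconn *conn)
{
    PQsetFetchsize(conn, 1000);   /* proposed: rows per Execute message */
    PQsendQuery(conn, "SELECT * FROM big_table");

    for (;;)
    {
        /* Each batch of up to 1000 rows arrives as its own PGresult. */
        PGresult *res = PQgetResult(conn);
        if (res == NULL)
            break;
        for (int i = 0; i < PQntuples(res); i++)
            handle_row(res, i);

        int done = PQresultComplete(res); /* proposed: portal run complete? */
        PQclear(res);
        if (done)
        {
            /* Proposed: Sync is sent only once the last batch arrived. */
            PQsendSync(conn);
            break;
        }
    }
}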

There are some intricacies when reading with libpq under an
asynchronous event handler. One was that parseInput does not 'eat' the
whole message buffer, i.e. there can be another complete message in the
buffer, but a new read event might not be triggered because no more
data arrives in the buffer. For this purpose we also added
- PQmsgAvail - returns true if the event handler may fire again
immediately (to process more results)
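
Roughly, the event handler then takes this shape (a sketch; PQmsgAvail
is the proposed addition, resched stands in for however the
application's event loop re-arms a handler, and the rest is standard
libpq asynchronous processing):

#include <libpq-fe.h>

/* Proposed addition -- not part of stock libpq. */
extern int PQmsgAvail(PGconn *conn);

/* Placeholder: re-arm this handler immediately in the event loop. */
extern void resched(PGconn *conn);

/* Called whenever the connection's socket becomes readable. */
static void on_readable(PGconn *conn)
{
    if (!PQconsumeInput(conn))
        return;                    /* read error: handle elsewhere */

    /* Hand over every result that is already complete. */
    while (!PQisBusy(conn))
    {
        PGresult *res = PQgetResult(conn);
        if (res == NULL)
            return;                /* no results pending */
        /* ... process one batch of rows ... */
        PQclear(res);
    }

    /* Another complete message may already sit in libpq's buffer, and
     * no new read event will fire for data that has already arrived. */
    if (PQmsgAvail(conn))
        resched(conn);
}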

The biggest challenge was when to send Sync messages. We have that
covered currently, but it is done by our application outside of libpq
(hence the PQsendSync visible in the API, as well as PQresultComplete).
It would be better if all of that could be kept inside libpq itself. In
the end we believe this could provide useful functions for callers of
the libpq API, such as PHP.

regards,
Yeb Havinga



Re: C libpq frontend library fetchsize

From: Yeb Havinga
Robert Haas wrote:
> On Fri, Feb 26, 2010 at 3:28 PM, Yeb Havinga <yebhavinga@gmail.com> wrote:
>   
>> I'm wondering if there would be community support for adding use of the
>> Execute protocol message with a row limit > 0 to the C libpq client
>> library, as it is used by the JDBC driver with setFetchSize.
>>     
>
> Not sure I follow what you're asking...  what would the new/changed
> function signature be?
>   
Hello Robert, list

I'm sorry I did not catch your reply until I searched the archives on
libpq; I hope you are not offended. However, I think the question is
answered somewhat in a reply I sent to Takahiro Itagaki, viz.
http://archives.postgresql.org/pgsql-hackers/2010-03/msg00015.php

The slowness in the recent posting on -performance, where someone
compares MySQL and PostgreSQL speed, is caused by libpq transferring
the whole PGresult at one time.
(http://archives.postgresql.org/pgsql-performance/2010-03/msg00228.php)

ISTM that using cursors and then FETCH is not an adequate solution,
because 1) someone must realise that the PGresult object is
gathered/transferred under the hood of libpq completely before the
first row can be used by the application, and 2) the structure of the
application layer is altered to make use of partial results.

What if the default operation of e.g. PHP using libpq would be as
follows: set some default fetch size (e.g. 1000 rows), then just issue
getrow. In the PHP pg handling, a function like getnextrow would wait
for the first PGresult with 1000 rows. Then, if that PGresult is
depleted or almost depleted, request the next PGresult automatically. I
see a lot of benefits, like lower memory requirements in libpq, fewer
new users asking why their query is so slow before the first row, and
almost no concerns; a small overhead of row description messages,
perhaps. Maybe the biggest benefit of a PQsetFetchsize API call would
be to raise awareness of the fact that PGresults are transferred
completely (or partially, if there is enough interest for me and a
colleague of mine to work on a patch for this). A rough sketch of the
getnextrow idea is below.
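
In C terms, such a getnextrow could be layered over batched FETCH even
today; a rough sketch under that assumption, with all names made up for
illustration:

#include <stdio.h>
#include <libpq-fe.h>

/* Illustrative only: a row-at-a-time iterator that refills from the
 * server in batches of 1000, the way a PHP getnextrow could behave. */
typedef struct
{
    PGconn   *conn;
    PGresult *batch;    /* current batch of rows */
    int       row;      /* next row within the batch */
} RowStream;

static int stream_open(RowStream *rs, PGconn *conn, const char *query)
{
    char decl[1024];

    rs->conn = conn;
    rs->batch = NULL;
    rs->row = 0;

    /* Cursors require a transaction block. */
    PQclear(PQexec(conn, "BEGIN"));
    snprintf(decl, sizeof(decl), "DECLARE rs CURSOR FOR %s", query);
    PGresult *res = PQexec(conn, decl);
    int ok = (PQresultStatus(res) == PGRES_COMMAND_OK);
    PQclear(res);
    return ok;
}

/* Yields the batch and row index of the next row; returns 0 when the
 * cursor is exhausted or a FETCH fails. The caller never sees the
 * batch boundaries, only one row at a time. */
static int stream_next(RowStream *rs, PGresult **res, int *row)
{
    if (rs->batch == NULL || rs->row >= PQntuples(rs->batch))
    {
        if (rs->batch)
            PQclear(rs->batch);
        rs->batch = PQexec(rs->conn, "FETCH 1000 FROM rs");
        rs->row = 0;
        if (PQresultStatus(rs->batch) != PGRES_TUPLES_OK ||
            PQntuples(rs->batch) == 0)
            return 0;
    }
    *res = rs->batch;
    *row = rs->row++;
    return 1;
}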

Besides that, another approach to get data to clients faster could
perhaps be to use LZO, much in the same way that Google uses Zippy (see
e.g.
http://feedblog.org/2008/10/12/google-bigtable-compression-zippy-and-bmdiff/)
to speed up data transfer and delivery. LZO has been mentioned before
on the mailing lists for pg_dump compression, but I think that with an
--enable-lzo option, libpq could benefit too.
(http://archives.postgresql.org/pgsql-performance/2009-08/msg00053.php)

regards,
Yeb Havinga



Re: C libpq frontend library fetchsize

From: Tom Lane
Yeb Havinga <yebhavinga@gmail.com> writes:
> What if the default operation of e.g. PHP using libpq would be as
> follows: set some default fetch size (e.g. 1000 rows), then just issue
> getrow. In the PHP pg handling, a function like getnextrow would wait
> for the first PGresult with 1000 rows. Then, if that PGresult is
> depleted or almost depleted, request the next PGresult automatically. I
> see a lot of benefits, like lower memory requirements in libpq, fewer
> new users asking why their query is so slow before the first row, and
> almost no concerns.

You are blithely ignoring the reasons why libpq doesn't do this.  The
main one being that it's impossible to cope sanely with queries that
fail partway through execution.  The described implementation would not
cope tremendously well with nonsequential access to the resultset, either.
        regards, tom lane


Re: C libpq frontend library fetchsize

From: Yeb Havinga
Tom Lane wrote:
> Yeb Havinga <yebhavinga@gmail.com> writes:
>   
>> What if the default operation of e.g. PHP using libpq would be as
>> follows: set some default fetch size (e.g. 1000 rows), then just issue
>> getrow. In the PHP pg handling, a function like getnextrow would wait
>> for the first PGresult with 1000 rows. Then, if that PGresult is
>> depleted or almost depleted, request the next PGresult automatically. I
>> see a lot of benefits, like lower memory requirements in libpq, fewer
>> new users asking why their query is so slow before the first row, and
>> almost no concerns.
>>     
>
> You are blithely ignoring the reasons why libpq doesn't do this.  The
> main one being that it's impossible to cope sanely with queries that
> fail partway through execution.
I'm sorry I forgot to add a reference to your post at
http://archives.postgresql.org/pgsql-general/2010-02/msg00956.php,
which is the only reference to queries failing partway that I know of.
But 'blithely' is not a good description of me ignoring it. I thought
about how queries could fail, but cannot think of anything other than
e.g. memory exhaustion, and that is just one of the things that is
improved this way. Maybe a user-defined type that errors on certain
data values, but then the same argument could be made: why support
UDTs? And if a query fails during execution, does that mean that the
rows returned until that point are wrong?
>   The described implementation would not
> cope tremendously well with nonsequential access to the resultset, either.
>   
That's why I'm not proposing to replace the current way PGresults are
made complete, but just an extra option that lets developers using the
libpq library make the choice themselves.

regards,
Yeb Havinga



Re: C libpq frontend library fetchsize

From: Robert Haas
On Thu, Mar 18, 2010 at 1:21 PM, Yeb Havinga <yebhavinga@gmail.com> wrote:
> Tom Lane wrote:
>>
>> Yeb Havinga <yebhavinga@gmail.com> writes:
>>
>>>
>>> What if the default operation of e.g. PHP using libpq would be as
>>> follows: set some default fetch size (e.g. 1000 rows), then just issue
>>> getrow. In the PHP pg handling, a function like getnextrow would wait for
>>> the first PGresult with 1000 rows. Then, if that PGresult is depleted or
>>> almost depleted, request the next PGresult automatically. I see a lot of
>>> benefits, like lower memory requirements in libpq, fewer new users asking
>>> why their query is so slow before the first row, and almost no concerns.
>>>
>>
>> You are blithely ignoring the reasons why libpq doesn't do this.  The
>> main one being that it's impossible to cope sanely with queries that
>> fail partway through execution.
>
> I'm sorry I forgot to add a reference to your post at
> http://archives.postgresql.org/pgsql-general/2010-02/msg00956.php, which
> is the only reference to queries failing partway that I know of. But
> 'blithely' is not a good description of me ignoring it. I thought about
> how queries could fail, but cannot think of anything other than e.g.
> memory exhaustion, and that is just one of the things that is improved
> this way. Maybe a user-defined type that errors on certain data values,
> but then the same argument could be made: why support UDTs? And if a
> query fails during execution, does that mean that the rows returned
> until that point are wrong?
>>
>>  The described implementation would not
>> cope tremendously well with nonsequential access to the resultset, either.
>>
>
> That's why I'm not proposing to replace the current way PGresults are
> made complete, but just an extra option that lets developers using the
> libpq library make the choice themselves.

This seems pretty reasonable to me, especially considering that JDBC
is apparently already doing it.  I suppose there will always be
projects that want to reimplement the backend protocol so that they
can be "pure" some-language, but chipping away at the list of other
reasons why someone might not want to use libpq still seems like a
good idea.

...Robert


Re: C libpq frontend library fetchsize

From: Yeb Havinga
Robert Haas wrote:
> This seems pretty reasonable to me, especially considering that JDBC
> is apparently already doing it.  I suppose there will always be
> projects that want to reimplement the backend protocol so that they
> can be "pure" some-language, but chipping away at the list of other
> reasons why someone might not want to use libpq still seems like a
> good idea.
>
> ...Robert
>   
Hello Robert,

Thank you for your support, but I am unsure whether it is wise to put
time into it, given the other reactions.

regards,
Yeb Havinga