Thread: C libpq frontend library fetchsize
Hello list,

I'm wondering if there would be community support for adding the use of the Execute protocol message with a row limit > 0 in the C libpq client library, as the JDBC driver uses it with setFetchSize.

kind regards,
Yeb Havinga
On Fri, Feb 26, 2010 at 3:28 PM, Yeb Havinga <yebhavinga@gmail.com> wrote:
> I'm wondering if there would be community support for adding using the
> execute message with a rownum > 0 in the c libpq client library, as it is
> used by the jdbc driver with setFetchSize.

Not sure I follow what you're asking... what would the new/changed function signature be?

...Robert
Yeb Havinga <yebhavinga@gmail.com> wrote:
> I'm wondering if there would be community support for adding using the
> execute message with a rownum > 0 in the c libpq client library, as it
> is used by the jdbc driver with setFetchSize.

The setFetchSize for libpq is difficult because of the interface mismatch -- libpq uses array-based APIs (PGresult) and JDBC uses a cursor-like API (ResultSet). Instead, you can use CURSOR and FETCH commands to retrieve rows in separate PGresult objects.

Regards,
---
Takahiro Itagaki
NTT Open Source Software Center
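[Editor's note: Takahiro's suggestion, fetching in bounded batches with DECLARE CURSOR and FETCH so that each batch arrives as its own PGresult, might look like the following minimal C sketch. The connection string and table name are placeholders, a live server is required, and error handling is abbreviated.]

```c
#include <stdio.h>
#include <libpq-fe.h>

int main(void)
{
    /* conninfo is a placeholder; adjust for your environment */
    PGconn *conn = PQconnectdb("dbname=test");
    if (PQstatus(conn) != CONNECTION_OK)
    {
        fprintf(stderr, "connection failed: %s", PQerrorMessage(conn));
        PQfinish(conn);
        return 1;
    }

    /* cursors only live inside a transaction block */
    PQclear(PQexec(conn, "BEGIN"));
    PQclear(PQexec(conn, "DECLARE c CURSOR FOR SELECT * FROM big_table"));

    for (;;)
    {
        /* each FETCH yields a separate, bounded PGresult */
        PGresult *res = PQexec(conn, "FETCH 1000 FROM c");
        if (PQresultStatus(res) != PGRES_TUPLES_OK)
        {
            fprintf(stderr, "FETCH failed: %s", PQerrorMessage(conn));
            PQclear(res);
            break;
        }
        int ntuples = PQntuples(res);
        for (int i = 0; i < ntuples; i++)
            printf("%s\n", PQgetvalue(res, i, 0));
        PQclear(res);
        if (ntuples == 0)        /* cursor exhausted */
            break;
    }

    PQclear(PQexec(conn, "CLOSE c"));
    PQclear(PQexec(conn, "COMMIT"));
    PQfinish(conn);
    return 0;
}
```

This is exactly the pattern Yeb objects to below: the batching is visible in the application code rather than hidden inside libpq.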
Takahiro Itagaki wrote:
> Yeb Havinga <yebhavinga@gmail.com> wrote:
>> I'm wondering if there would be community support for adding using the
>> execute message with a rownum > 0 in the c libpq client library, as it
>> is used by the jdbc driver with setFetchSize.
>
> The setFetchSize for libpq is difficult because of the interface
> mismatch -- libpq uses array-based APIs (PGresult) and JDBC uses a
> cursor-like API (ResultSet). Instead, you can use CURSOR and FETCH
> commands to retrieve rows in separated PGresult objects.

Hello Takahiro,

Thank you for your reply. Yes, there is a large overlap with SQL's DECLARE CURSOR together with FETCH, but intuitively it feels strange that the application needs to be changed for something that could be kept internal to the communication protocol. The application is forced to start a transaction explicitly, and I also tried to declare a cursor with parameters from SQL but did not succeed, nor could I declare a cursor in combination with a prepared statement. Building a fetchsize into libpq seems like a general solution that could work for all PGresult fetching.

Together with a colleague of mine I worked on this subject some time ago, and we thought it might be interesting enough for others as well. In short, this is what we did:

- add a boolean 'complete' to the PGresult object, to indicate whether the PortalRun was complete
- add a PQresultComplete API call that returns the boolean above
- add PQsendSync
- in the query guts routines, do not send Sync messages
- call PQsendSync when a complete PGresult is received, or if result fetching is stopped before fetching the last result
- the fetchsize was a defined constant in our application, but it would be trivial to replace it with a PQsetFetchsize call

There are some intricacies when reading with libpq through an asynchronous event handler. One was that parseInput does not 'eat' the whole message buffer, i.e. there can be another complete message in the buffer, but a new read event might not be triggered because no more data arrives into the buffer. For this purpose we also added:

- PQmsgAvail - returns true if the event handler may fire again immediately (to process more results)

The biggest challenge was when to send Sync messages. We have that covered currently, but it is done by our application outside of libpq (hence PQsendSync being visible in the API, as well as PQresultComplete). It would be better if all of that could be kept inside libpq itself. In the end we believe this could provide useful functions for callers of the libpq API, such as PHP.

regards,
Yeb Havinga
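[Editor's note: a purely hypothetical sketch of how a caller might use the proposed additions. PQsetFetchsize, PQresultComplete, PQsendSync, and PQmsgAvail are the functions proposed in this email and do not exist in stock libpq; process() is an application-defined placeholder.]

```c
/* HYPOTHETICAL: PQsetFetchsize / PQresultComplete / PQsendSync / PQmsgAvail
 * are the proposed API additions, not existing libpq functions. */
PQsetFetchsize(conn, 1000);          /* proposed: Execute with rownum 1000 */
PQsendQuery(conn, "SELECT * FROM big_table");

PGresult *res;
while ((res = PQgetResult(conn)) != NULL)
{
    process(res);                    /* application-defined row handling */
    if (PQresultComplete(res))       /* proposed: portal ran to completion */
    {
        PQclear(res);
        PQsendSync(conn);            /* proposed: finish the protocol cycle */
        break;
    }
    PQclear(res);

    /* proposed: another complete message may already sit in the buffer,
     * so the event handler must be allowed to fire again without a new
     * read event arriving on the socket */
    while (PQmsgAvail(conn))
        PQconsumeInput(conn);
}
```

The point of the sketch is that the batching stays inside the PQsendQuery/PQgetResult flow the application already uses; only the fetchsize call is new.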
Robert Haas wrote:
> On Fri, Feb 26, 2010 at 3:28 PM, Yeb Havinga <yebhavinga@gmail.com> wrote:
>> I'm wondering if there would be community support for adding using the
>> execute message with a rownum > 0 in the c libpq client library, as it is
>> used by the jdbc driver with setFetchSize.
>
> Not sure I follow what you're asking... what would the new/changed
> function signature be?

Hello Robert, list,

I'm sorry I did not catch your reply until I searched the archives on libpq; I hope you are not offended. I think the question is answered somewhat in a reply I sent to Takahiro Itagaki, viz:
http://archives.postgresql.org/pgsql-hackers/2010-03/msg00015.php

The recent posting in PERFORM where someone compares MySQL vs PostgreSQL speed is caused by libpq transferring the whole PGresult at one time (http://archives.postgresql.org/pgsql-performance/2010-03/msg00228.php). ISTM that using cursors and then FETCH is not an adequate solution, because 1) someone must realise that the PGresult object is gathered/transferred under the hood of libpq completely before the first row can be used by the application, and 2) the structure of the application layer is altered to make use of partial results.

What if the default operation of e.g. PHP using libpq were as follows: set some default fetchsize (e.g. 1000 rows), then just issue getrow. In the PHP pg handling, a function like getnextrow would wait for the first PGresult with 1000 rows. Then, if the PGresult is depleted or almost depleted, request the next PGresult automatically. I see a lot of benefits, like smaller memory requirements in libpq and fewer new users asking why their query is so slow before the first row, and almost no concerns; a small overhead of RowDescription messages perhaps. Maybe the biggest benefit of a PQsetFetchsize API call would be to raise awareness of the fact that PGresults are transferred completely (or partially, if there is interest in me/a colleague of mine working on a patch for this).
Besides that, another approach to getting data to clients faster could be to use LZO, much in the same way that Google uses Zippy (see e.g. http://feedblog.org/2008/10/12/google-bigtable-compression-zippy-and-bmdiff/) to speed up data transfer and delivery. LZO has been mentioned before on the mailing lists for pg_dump compression, but I think that with an --enable-lzo switch, libpq could benefit too. (http://archives.postgresql.org/pgsql-performance/2009-08/msg00053.php)

regards,
Yeb Havinga
Yeb Havinga <yebhavinga@gmail.com> writes:
> What if the default operation of e.g. php using libpq would be as
> follows: set some default fetchsize (e.g. 1000 rows), then just issue
> getrow. In the php pg handling, a function like getnextrow would wait
> for the first pgresult with 1000 rows. Then if the pgresult is depleted
> or almost depleted, request the next pgresult automatically. I see a lot
> of benefits like less memory requirements in libpq, less new users with
> why is my query so slow before the first row, and almost no concerns.

You are blithely ignoring the reasons why libpq doesn't do this. The main one being that it's impossible to cope sanely with queries that fail partway through execution. The described implementation would not cope tremendously well with nonsequential access to the resultset, either.

regards, tom lane
Tom Lane wrote:
> Yeb Havinga <yebhavinga@gmail.com> writes:
>> What if the default operation of e.g. php using libpq would be as
>> follows: set some default fetchsize (e.g. 1000 rows), then just issue
>> getrow. In the php pg handling, a function like getnextrow would wait
>> for the first pgresult with 1000 rows. Then if the pgresult is depleted
>> or almost depleted, request the next pgresult automatically. I see a lot
>> of benefits like less memory requirements in libpq, less new users with
>> why is my query so slow before the first row, and almost no concerns.
>
> You are blithely ignoring the reasons why libpq doesn't do this. The
> main one being that it's impossible to cope sanely with queries that
> fail partway through execution.

I'm sorry I forgot to add a reference to your post http://archives.postgresql.org/pgsql-general/2010-02/msg00956.php, which is the only reference to queries failing partway through that I know of. But "blithely" is not a good description of me ignoring it. I thought about how queries could fail, but can't think of anything other than e.g. memory exhaustion, and that is just one of the things that is improved this way. Maybe a user-defined type with an error on certain data values, but then the same argument could be made: why support UDTs? And if a query fails during execution, does that mean that the rows returned until that point are wrong?

> The described implementation would not
> cope tremendously well with nonsequential access to the resultset, either.

That's why I'm not proposing to replace the current way PGresults are made complete, but just to add an extra option that lets developers using the libpq library make the choice themselves.

regards,
Yeb Havinga
On Thu, Mar 18, 2010 at 1:21 PM, Yeb Havinga <yebhavinga@gmail.com> wrote:
> Tom Lane wrote:
>> Yeb Havinga <yebhavinga@gmail.com> writes:
>>> What if the default operation of e.g. php using libpq would be as
>>> follows: set some default fetchsize (e.g. 1000 rows), then just issue
>>> getrow. In the php pg handling, a function like getnextrow would wait for
>>> the first pgresult with 1000 rows. Then if the pgresult is depleted or
>>> almost depleted, request the next pgresult automatically. I see a lot of
>>> benefits like less memory requirements in libpq, less new users with why is
>>> my query so slow before the first row, and almost no concerns.
>>
>> You are blithely ignoring the reasons why libpq doesn't do this. The
>> main one being that it's impossible to cope sanely with queries that
>> fail partway through execution.
>
> I'm sorry I forgot to add a reference to your post of
> http://archives.postgresql.org/pgsql-general/2010-02/msg00956.php which is
> the only reference to queries failing partway that I know of. But blithely
> is not a good description of me ignoring it. I thought about how queries
> could fail, but can't think of anything else than e.g. memory exhaustion,
> and that is just one of the things that is improved this way. Maybe a user
> defined type with an error on certain data values, but then the same argument
> could be: why support UDT? And if a query fails during execution, does that
> mean that the rows returned until that point are wrong?
>
>> The described implementation would not
>> cope tremendously well with nonsequential access to the resultset, either.
>
> That's why I'm not proposing to replace the current way pgresults are made
> complete, but just an extra option to enable developers using the libpq
> library making the choice themselves.

This seems pretty reasonable to me, especially considering that JDBC is apparently already doing it. I suppose there will always be projects that want to reimplement the backend protocol so that they can be "pure" some-language, but chipping away at the list of other reasons why someone might not want to use libpq still seems like a good idea.

...Robert
Robert Haas wrote:
> This seems pretty reasonable to me, especially considering that JDBC
> is apparently already doing it. I suppose there will always be
> projects that want to reimplement the backend protocol so that they
> can be "pure" some-language, but chipping away at the list of other
> reasons why someone might not want to use libpq still seems like a
> good idea.
>
> ...Robert

Hello Robert,

Thank you for your support, but I am unsure whether it is wise to put time into this given the other reactions.

regards,
Yeb Havinga