Thread: [patch] libpq one-row-at-a-time API

[patch] libpq one-row-at-a-time API

From
Marko Kreen
Date:
The row-processor API is now in 9.2, but it solves only the
"different-row-storage" problem, not the "one-row-at-a-time"
problem, as libpq is still in control until all rows are received.

This means libpq still cannot be used to implement the iterative
result processing that almost all high-level languages use.

We discussed a potential API for fetching one row at a time,
but did not reach a conclusion.  Basic arguments were:

1) Tom: PQisBusy must keep current behaviour.  Thus also PQgetResult()
   must keep current behaviour:
   * PQisBusy() -> 0: need to call PQgetResult(), which returns PGresult
   * PQisBusy() -> 1: need to call PQconsumeInput()
   * PQisBusy() must be callable several times in a row, thus be
     stateless from the client's POV.

2) Me: behaviour must not be controlled by a callback, but by client code
   that uses PQgetResult() + PQisBusy().

Now, looking at the problem with some perspective, the solution
is obvious: when in single-row mode, the PQgetResult() must return
proper PGresult for that single row.  And everything else follows that.

Such API is implemented in attached patch:

* PQsetSingleRowMode(conn): sets single-row mode.

* PQisBusy(): stops after each row in single-row mode, sets PGASYNC_ROW_READY.
  Thus keeping the property of being repeatedly callable.

* PQgetResult(): returns a copy of the row if PGASYNC_ROW_READY.  Sets the row's
  resultStatus to PGRES_SINGLE_TUPLE.  This needs to be different from
  PGRES_TUPLES_OK so that resultset end can be detected.

* PQgetRowData(): can be called instead of PQgetResult() to get raw row data
  in the buffer, for more efficient processing.  This is an optional feature
  that provides the original row-callback promise of avoiding an unnecessary
  row data copy.

* Although PQgetRowData() makes the callback API unnecessary, it is still
  fully compatible with it - the callback should not see any difference
  whether the resultset is processed in single-row mode or the
  old single-PGresult mode.  Unless it wants to - it can check
  PGRES_TUPLES_OK vs. PGRES_SINGLE_TUPLE.
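To make the loop concrete, here is a minimal synchronous sketch of the API described above (a sketch only: connection settings come from the PG* environment variables, and the query is a placeholder):

```c
/*
 * Sketch of synchronous single-row processing with the proposed API.
 * Needs a running server; connection settings come from PG* env vars.
 */
#include <stdio.h>
#include <stdlib.h>
#include <libpq-fe.h>

int main(void)
{
    PGconn *db = PQconnectdb("");
    PGresult *r;

    if (PQstatus(db) != CONNECTION_OK)
    {
        fprintf(stderr, "connect failed: %s", PQerrorMessage(db));
        return 1;
    }

    /* single-row mode requires the async PQsend* functions */
    if (!PQsendQuery(db, "SELECT x FROM generate_series(1, 5) AS s(x)") ||
        !PQsetSingleRowMode(db))
    {
        fprintf(stderr, "send failed: %s", PQerrorMessage(db));
        return 1;
    }

    while ((r = PQgetResult(db)) != NULL)
    {
        switch (PQresultStatus(r))
        {
            case PGRES_SINGLE_TUPLE:
                /* this PGresult holds exactly one row */
                printf("row: %s\n", PQgetvalue(r, 0, 0));
                break;
            case PGRES_TUPLES_OK:
                /* zero-row PGresult marking the end of the resultset */
                break;
            default:
                fprintf(stderr, "query failed: %s", PQerrorMessage(db));
                break;
        }
        PQclear(r);
    }

    PQfinish(db);
    return 0;
}
```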

There is some duplicate code here that can be refactored (callback exec),
but I did not do it yet to avoid affecting existing code too much.

--
marko

PS. If you squint, it seems like a fix of an existing API instead of a new feature,
so perhaps it can still fit into 9.2?


Attachment

Re: [patch] libpq one-row-at-a-time API

From
Merlin Moncure
Date:
On Fri, Jun 15, 2012 at 1:21 PM, Marko Kreen <markokr@gmail.com> wrote:
> The row-processor API is now in 9.2, but it solves only the
> "different-row-storage" problem, not the "one-row-at-a-time"
> problem, as libpq is still in control until all rows are received.
>
> This means libpq still cannot be used to implement the iterative
> result processing that almost all high-level languages use.
>
> We discussed a potential API for fetching one row at a time,
> but did not reach a conclusion.  Basic arguments were:
>
> 1) Tom: PQisBusy must keep current behaviour.  Thus also PQgetResult()
>   must keep current behaviour:
>   * PQisBusy() -> 0: need to call PQgetResult(), which returns PGresult
>   * PQisBusy() -> 1: need to call PQconsumeInput()
>   * PQisBusy() must be callable several times in a row, thus be
>     stateless from clients POV.
>
> 2) Me: behaviour must not be controlled by callback, but client code
>   that uses PQgetResult() + PQisBusy().
>
> Now, looking at the problem with some perspective, the solution
> is obvious: when in single-row mode, the PQgetResult() must return
> proper PGresult for that single row.  And everything else follows that.
>
> Such API is implemented in attached patch:
>
> * PQsetSingleRowMode(conn): sets single-row mode.
>
> * PQisBusy(): stops after each row in single-row mode, sets PGASYNC_ROW_READY.
>  Thus keeping the property of being repeatedly callable.
>
> * PQgetResult(): returns copy of the row if PGASYNC_ROW_READY.  Sets row
>  resultStatus to PGRES_SINGLE_TUPLE.  This needs to be different from
>  PGRES_TUPLES_OK to detect resultset end.
>
> * PQgetRowData(): can be called instead of PQgetResult() to get raw row data
>  in buffer, for more efficient processing.  This is optional feature
>  that provides the original row-callback promise of avoiding unnecessary
>  row data copy.
>
> * Although PQgetRowData() makes callback API unnecessary, it is still
>  fully compatible with it - the callback should not see any difference
>  whether the resultset is processed in single-row mode or
>  old single-PGresult mode.  Unless it wants to - it can check
>  PGRES_TUPLES_OK vs. PGRES_SINGLE_TUPLE.
>
> There is some duplicate code here that can be refactored (callback exec),
> but I did not do it yet to avoid affecting existing code too much.
>
> --
> marko
>
> PS. If you squint, it seems like a fix of an existing API instead of a new feature,
> so perhaps it can still fit into 9.2?

+1 on rushing in row processing for 9.2, but only if the API feels
right (I'll spend some time reviewing it).  I found the lack of iterative
row processing to be really unfortunate.

merlin


Re: [patch] libpq one-row-at-a-time API

From
Marko Kreen
Date:
Demos:

https://github.com/markokr/libpq-rowproc-demos/blob/master/demo-onerow-sync.c
https://github.com/markokr/libpq-rowproc-demos/blob/master/demo-onerow-async.c

A few clarifications below.

On Fri, Jun 15, 2012 at 9:21 PM, Marko Kreen <markokr@gmail.com> wrote:
> Now, looking at the problem with some perspective, the solution
> is obvious: when in single-row mode, the PQgetResult() must return
> proper PGresult for that single row.  And everything else follows that.
>
> Such API is implemented in attached patch:
>
> * PQsetSingleRowMode(conn): sets single-row mode.

The function can be called only after PQsend* and before any
rows have arrived.  This guarantees there will be no surprises
for PQexec* users who expect the full resultset at once.  It also
guarantees that the user will process the resultset with a PQgetResult()
loop, either sync or async.  The next PQexec/PQsend call will
reset the flag, so it is active only for the duration of processing
the results of one command.

Currently it returns FALSE and does nothing if called in the wrong
place.  The only question I see here is whether it should also set an
error state on the connection.  That does not seem to be an
improvement.
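To make the valid call window concrete, here is an async sketch (handle_row() is a hypothetical consumer, not a libpq function; the rest is the libpq API as described in this thread):

```c
/*
 * Sketch of the intended async call sequence for single-row mode.
 * handle_row() is a hypothetical per-row consumer, not part of libpq.
 */
#include <sys/select.h>
#include <libpq-fe.h>

extern void handle_row(PGresult *r);

/* block until the connection's socket is readable */
static void wait_for_socket(int fd)
{
    fd_set rfds;
    FD_ZERO(&rfds);
    FD_SET(fd, &rfds);
    select(fd + 1, &rfds, NULL, NULL, NULL);
}

static int fetch_single_rows(PGconn *conn, const char *query)
{
    if (!PQsendQuery(conn, query))
        return -1;

    /* valid only here: after PQsend*, before any rows have arrived;
       returns FALSE (and does nothing) if called too late */
    if (!PQsetSingleRowMode(conn))
        return -1;

    for (;;)
    {
        PGresult *r;

        if (PQisBusy(conn))
        {
            /* stateless: safe to call repeatedly until input arrives */
            wait_for_socket(PQsocket(conn));
            if (!PQconsumeInput(conn))
                return -1;
            continue;
        }

        r = PQgetResult(conn);
        if (r == NULL)
            break;                  /* command fully processed */

        if (PQresultStatus(r) == PGRES_SINGLE_TUPLE)
            handle_row(r);          /* exactly one row per PGresult */
        /* PGRES_TUPLES_OK here is the zero-row end-of-resultset marker */

        PQclear(r);
    }
    return 0;
}
```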

> * PQgetRowData(): can be called instead of PQgetResult() to get raw row data
>  in buffer, for more efficient processing.  This is optional feature
>  that provides the original row-callback promise of avoiding unnecessary
>  row data copy.
>
> * Although PQgetRowData() makes callback API unnecessary, it is still
>  fully compatible with it - the callback should not see any difference
>  whether the resultset is processed in single-row mode or
>  old single-PGresult mode.  Unless it wants to - it can check
>  PGRES_TUPLES_OK vs. PGRES_SINGLE_TUPLE.

PQgetResult() is compatible with the callbacks; PQgetRowData()
bypasses them.
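For comparison, a sketch of the zero-copy path.  This is a hedged reconstruction: the PQgetRowData() signature and the PGdataValue layout (a length/value pair per column) are assumptions based on the patch description, not a committed libpq API, and handle_null()/handle_value() are hypothetical helpers:

```c
/*
 * ASSUMED interface from the proposed patch, not committed libpq API:
 *   typedef struct { int len; const char *value; } PGdataValue;
 *   int PQgetRowData(PGconn *conn, PGresult **hdr, PGdataValue **columns);
 *
 * The returned pointers reference libpq's network buffer, so each row
 * must be consumed before the next libpq call invalidates it.
 */
PGresult *hdr;
PGdataValue *cols;

while (PQgetRowData(conn, &hdr, &cols))
{
    int i;

    /* hdr is a zero-row PGresult carrying only the row description */
    for (i = 0; i < PQnfields(hdr); i++)
    {
        if (cols[i].len < 0)
            handle_null(i);                              /* hypothetical */
        else
            handle_value(i, cols[i].value, cols[i].len); /* no copy made */
    }
}
```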

--
marko


Re: [patch] libpq one-row-at-a-time API

From
Tom Lane
Date:
Marko Kreen <markokr@gmail.com> writes:
>> Now, looking at the problem with some perspective, the solution
>> is obvious: when in single-row mode, the PQgetResult() must return
>> proper PGresult for that single row.  And everything else follows that.
>> 
>> * PQgetRowData(): can be called instead PQgetResult() to get raw row data
>>  in buffer, for more efficient processing.  This is optional feature
>>  that provides the original row-callback promise of avoiding unnecessary
>>  row data copy.
>> 
>> * Although PQgetRowData() makes callback API unnecessary, it is still
>>  fully compatible with it - the callback should not see any difference
>>  whether the resultset is processed in single-row mode or
>>  old single-PGresult mode.  Unless it wants to - it can check
>>  PGRES_TUPLES_OK vs. PGRES_SINGLE_TUPLE.

I guess this raises the question of whether we ought to revert the
row-callback patch entirely and support only this approach.  IMO
it is (barely) not too late to do that for 9.2, if we want to.
If we don't want to, then this is just another new feature and
should be considered for 9.3.

What I like about this is the greatly simpler and harder-to-misuse
API.  The only arguable drawback is that there's still at least one
malloc/free cycle per tuple, imposed by the creation of a PGresult
for each one, whereas the callback approach avoids that.  But worrying
about that could be considered to be vast overoptimization; the backend
has certainly spent a lot more overhead than that generating the tuple.
        regards, tom lane


Re: [patch] libpq one-row-at-a-time API

From
Marko Kreen
Date:
On Sat, Jun 16, 2012 at 6:09 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I guess this raises the question of whether we ought to revert the
> row-callback patch entirely and support only this approach.  IMO
> it is (barely) not too late to do that for 9.2, if we want to.
> If we don't want to, then this is just another new feature and
> should be considered for 9.3.

I think the row-callback is a dangerous API that does not solve any
important problems.

But I do like the 2-phase processing the rowproc patch introduced
and having a way to bypass unnecessary malloc()+copy.

So my preference would be to simply remove the callback API
but keep the processing and provide PQgetRowData() instead.

Although the win it brings is significantly smaller thanks
to the single-row PQgetResult().  So if it does not sound interesting
to others, it can be dropped - the single-row processing
is the important feature we need; the rest is extra.

--
marko


Re: [patch] libpq one-row-at-a-time API

From
Simon Riggs
Date:
On 16 June 2012 23:09, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Marko Kreen <markokr@gmail.com> writes:
>>> Now, looking at the problem with some perspective, the solution
>>> is obvious: when in single-row mode, the PQgetResult() must return
>>> proper PGresult for that single row.  And everything else follows that.
>>>
>>> * PQgetRowData(): can be called instead PQgetResult() to get raw row data
>>>  in buffer, for more efficient processing.  This is optional feature
>>>  that provides the original row-callback promise of avoiding unnecessary
>>>  row data copy.
>>>
>>> * Although PQgetRowData() makes callback API unnecessary, it is still
>>>  fully compatible with it - the callback should not see any difference
>>>  whether the resultset is processed in single-row mode or
>>>  old single-PGresult mode.  Unless it wants to - it can check
>>>  PGRES_TUPLES_OK vs. PGRES_SINGLE_TUPLE.
>
> I guess this raises the question of whether we ought to revert the
> row-callback patch entirely and support only this approach.  IMO
> it is (barely) not too late to do that for 9.2, if we want to.
> If we don't want to, then this is just another new feature and
> should be considered for 9.3.
>
> What I like about this is the greatly simpler and harder-to-misuse
> API.  The only arguable drawback is that there's still at least one
> malloc/free cycle per tuple, imposed by the creation of a PGresult
> for each one, whereas the callback approach avoids that.  But worrying
> about that could be considered to be vast overoptimization; the backend
> has certainly spent a lot more overhead than that generating the tuple.

I prefer the description of Marko's API to the one we have now.

Adopting one API in 9.2 and another in 9.3 would be fairly bad.
Perhaps we can have both?

Can we see a performance test? "Add a row processor API to libpq for
better handling of large result sets". The idea is that we do this many,
many times, so we need to double-check that the extra overhead is not a
problem in cases where the dumping overhead is significant.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [patch] libpq one-row-at-a-time API

From
Marko Kreen
Date:
On Sat, Jun 16, 2012 at 7:58 PM, Marko Kreen <markokr@gmail.com> wrote:
> So my preference would be to simply remove the callback API
> but keep the processing and provide PQgetRowData() instead.

This is implemented in attached patch.  It also
converts dblink to use single-row API.

The patch should be applied on top of previous
single-row patch.

Both can be seen also here:

  https://github.com/markokr/postgres/commits/single-row

--
marko

Attachment

Re: [patch] libpq one-row-at-a-time API

From
Marko Kreen
Date:
On Sun, Jun 17, 2012 at 2:07 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
> I prefer the description of Marko's API to the one we have now.
>
> Adopting one API in 9.2 and another in 9.3 would be fairly bad.
> Perhaps we can have both?

I see no reason to keep the (public) callback API around,
except if we don't bother to remove it now.

> Can we see a performance test? "Add a row processor API to libpq for
> better handling of large result sets". So idea is we do this many,
> many times so we need to double check the extra overhead is not a
> problem in cases where the dumping overhead is significant.

Not sure what you want to performance-test.

PQgetRowData() uses exactly the same pipeline
that the callbacks used.  It will use a few more C calls;
not sure it makes sense to benchmark them.

The recent dblink change replaced the palloc() + copy +
zero-termination dance with PQgetResult(), which
does a malloc() + copy dance internally.  This malloc
vs. palloc might be benchmarkable, but it veers into
micro-benchmarking territory, as the big win came
from avoiding buffering rows.  So yeah, maybe using
PQgetRowData() might be a tiny bit faster, but I don't
expect much difference.

But all this affects new users only.  The thing that affects
everybody was the 2-step row processing change
that was made during the rowproc patch.

I did benchmark it, and it seems there are column-size
+ column-count patterns where the new way is faster,
and some patterns where the old way is faster.  But the
difference did not rise above test noise, so I concluded
it is insignificant and the malloc+copy avoidance is worth it.

Of course, any additional benchmarking is welcome, so feel
free to pick any situation you care about.

-- 
marko


Re: [patch] libpq one-row-at-a-time API

From
Simon Riggs
Date:
On 17 June 2012 19:37, Marko Kreen <markokr@gmail.com> wrote:
> On Sun, Jun 17, 2012 at 2:07 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
>> I prefer the description of Marko's API to the one we have now.
>>
>> Adopting one API in 9.2 and another in 9.3 would be fairly bad.
>> Perhaps we can have both?
>
> I see no reason to keep the (public) callback API around,
> except if we don't bother to remove it now.

OK by me.

>> Can we see a performance test? "Add a row processor API to libpq for
>> better handling of large result sets". So idea is we do this many,
>> many times so we need to double check the extra overhead is not a
>> problem in cases where the dumping overhead is significant.
...
> I did benchmark it, and it seems there are column-size
> + column-count patterns where the new way is faster,
> and some patterns where the old way is faster.  But the
> difference did not rise above test noise, so I concluded
> it is insignificant and the malloc+copy avoidance is worth it.

As long as we've checked that's fine.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [patch] libpq one-row-at-a-time API

From
Tom Lane
Date:
Marko Kreen <markokr@gmail.com> writes:
> Now, looking at the problem with some perspective, the solution
> is obvious: when in single-row mode, the PQgetResult() must return
> proper PGresult for that single row.  And everything else follows that.

> Such API is implemented in attached patch:

I'm starting to look at this patch now.  I think we could drop the
PQgetRowData() API: it complicates matters for little gain that I can
see.  The argument for it was to avoid the cost of creating a PGresult
per row, but we're already going to pay the cost of creating a
PGresult in order to return the PGRES_SINGLE_TUPLE status.  And as was
pointed out upthread, any per-tuple malloc costs are going to be in
the noise compared to the server-side effort expended to create the
tuple, anyway.  The point of this feature is to avoid accumulating the
entire resultset in memory, not to micro-optimize linear-time costs.

Moreover, if the argument for changing 9.2 at this late date is to get
rid of a fragile, breakable API, surely an API that's designed around
returning pointers into the library's network buffer ought to be a
prime target.

And lastly, since the proposed patch for dblink doesn't use
PQgetRowData, there's not even much reason to think that it's
bug-free.
        regards, tom lane


Re: [patch] libpq one-row-at-a-time API

From
Marko Kreen
Date:
On Mon, Jul 16, 2012 at 4:11 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Marko Kreen <markokr@gmail.com> writes:
>> Now, looking at the problem with some perspective, the solution
>> is obvious: when in single-row mode, the PQgetResult() must return
>> proper PGresult for that single row.  And everything else follows that.
>
>> Such API is implemented in attached patch:
>
> I'm starting to look at this patch now.  I think we could drop the
> PQgetRowData() API: it complicates matters for little gain that I can
> see.  The argument for it was to avoid the cost of creating a PGresult
> per row, but we're already going to pay the cost of creating a
> PGresult in order to return the PGRES_SINGLE_TUPLE status.

No.  Please look again, it is supposed to be called instead of PQgetResult().

-- 
marko


Re: [patch] libpq one-row-at-a-time API

From
Tom Lane
Date:
Marko Kreen <markokr@gmail.com> writes:
> On Mon, Jul 16, 2012 at 4:11 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> I'm starting to look at this patch now.  I think we could drop the
>> PQgetRowData() API: it complicates matters for little gain that I can
>> see.  The argument for it was to avoid the cost of creating a PGresult
>> per row, but we're already going to pay the cost of creating a
>> PGresult in order to return the PGRES_SINGLE_TUPLE status.

> No.  Please look again, it is supposed to be called instead of PQgetResult().

Mm.  I still think we should drop it, because it's still a dangerous API
that's not necessary for the principal benefit of this feature.
        regards, tom lane


Re: [patch] libpq one-row-at-a-time API

From
Marko Kreen
Date:
On Mon, Jul 16, 2012 at 4:33 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Marko Kreen <markokr@gmail.com> writes:
>> On Mon, Jul 16, 2012 at 4:11 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> I'm starting to look at this patch now.  I think we could drop the
>>> PQgetRowData() API: it complicates matters for little gain that I can
>>> see.  The argument for it was to avoid the cost of creating a PGresult
>>> per row, but we're already going to pay the cost of creating a
>>> PGresult in order to return the PGRES_SINGLE_TUPLE status.
>
>> No.  Please look again, it is supposed to be called instead of PQgetResult().
>
> Mm.  I still think we should drop it, because it's still a dangerous API
> that's not necessary for the principal benefit of this feature.

Yes, it is a secondary feature, but it fits the needs of the actual target
audience of the single-row feature - various high-level wrappers of libpq.

Also it is needed for high-performance situations, where
single-row mode fits well even for C clients, except that the
advantage is negated by the new malloc-per-row overhead.

-- 
marko


Re: [patch] libpq one-row-at-a-time API

From
Tom Lane
Date:
Marko Kreen <markokr@gmail.com> writes:
> On Mon, Jul 16, 2012 at 4:33 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Mm.  I still think we should drop it, because it's still a dangerous API
>> that's not necessary for the principal benefit of this feature.

> Yes, it is a secondary feature, but it fits the needs of the actual target
> audience of the single-row feature - various high-level wrappers of libpq.

> Also it is needed for high-performance situations, where
> single-row mode fits well even for C clients, except that the
> advantage is negated by the new malloc-per-row overhead.

Absolutely no evidence has been presented that there's any useful
performance gain to be had there.  Moreover, if there were, we could
probably work a bit harder at making PGresult creation cheaper, rather
than having to expose a dangerous API.
        regards, tom lane


Re: [patch] libpq one-row-at-a-time API

From
Marko Kreen
Date:
On Mon, Jul 16, 2012 at 6:47 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Marko Kreen <markokr@gmail.com> writes:
>> On Mon, Jul 16, 2012 at 4:33 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> Mm.  I still think we should drop it, because it's still a dangerous API
>>> that's not necessary for the principal benefit of this feature.
>
>> Yes, it is a secondary feature, but it fits the needs of the actual target
>> audience of the single-row feature - various high-level wrappers of libpq.
>
>> Also it is needed for high-performance situations, where the
>> single-row-mode fits well even for C clients, except the
>> advantage is negated by new malloc-per-row overhead.
>
> Absolutely no evidence has been presented that there's any useful
> performance gain to be had there.  Moreover, if there were, we could
> probably work a bit harder at making PGresult creation cheaper, rather
> than having to expose a dangerous API.

Ok, I'm more interested in the primary feature,
so no more objections from me.

-- 
marko


Re: [patch] libpq one-row-at-a-time API

From
Marko Kreen
Date:
Here are 2 approaches for getting to a state where only PQsetSingleRowMode()
is available.  Both are on top of the REL9_2_STABLE branch.

a) Remove callback hooks, keep rowBuf, implement single-row mode on
   top of that.  This was posted before; I just additionally removed
   the PQgetRowData() function.

    git pull git://github.com/markokr/postgres.git single-row-mode1
    https://github.com/markokr/postgres/commits/single-row-mode1

   Commits:
    libpq: Single-row based processing
    libpq, dblink: Remove row processor API

   Advantage: makes it easier to play with PQgetRowData() or potential
   faster PGresult creation methods.  Smaller change relative to the
   9.2beta libpq than b).

b) Revert the row-callback changes completely, implement single-row mode on
   top of virgin libpq.  The only problem here was keeping the fixes implemented
   as part of the row-callback patch.  Single-row mode itself is quite simple.

    git pull git://github.com/markokr/postgres.git single-row-mode2
    https://github.com/markokr/postgres/commits/single-row-mode2

   Feature patch:
    https://github.com/markokr/postgres/commit/b5e822125c655f189875401c61317242705143b9

   Commits:
    dblink: revert conversion to row processor API patch
    libpq: revert row processor API patch
    libpq: random fixes
    libpq: single-row mode
    dblink: use single-row-mode

   Advantage: smaller change relative to the 9.1 libpq than a).

As the patch has suffered a lot from trying to provide both macro- and
micro-optimization (on-the-fly row processing vs. more efficient row
processing), maybe b) is the safer choice for 9.2?

In case somebody wants to look at the patches without git (or web),
I attached them as a tgz too.

--
marko


Attachment

Re: [patch] libpq one-row-at-a-time API

From
Tom Lane
Date:
Marko Kreen <markokr@gmail.com> writes:
> Here are 2 approaches for getting to a state where only PQsetSingleRowMode()
> is available.  Both are on top of the REL9_2_STABLE branch.

> a) Remove callback hooks, keep rowBuf, implement single-row-mode on
>    top of that.   This was posted before, I just additionally removed
>    the PQgetRowData() function.

> b) Revert row-callback changes completely, implement single-row-mode on
>    top of virgin libpq.  Only problem here was keeping fixes implemented
>    as part of row-callback patch.  Single-row-mode itself is quite simple.

Yeah, I was wondering if we shouldn't revisit the whole patch given that
where we're ending up looks little like where we thought we were going.
I hadn't had time to pursue the idea, but I'm glad you did.  Will look
at these.

One reason to stick with approach (a) is that it gives us the option
to easily add PQgetRowData(), in case future testing shows that that's
actually worth the risk and usage restrictions.  While I'm a bit dubious
of that, I'm prepared to be proven wrong.
        regards, tom lane


Re: [patch] libpq one-row-at-a-time API

From
Marko Kreen
Date:
Here is a simple test program that takes a SELECT
query, reads it, and outputs a COPY-formatted stream
to standard output, to simulate some activity.

It operates in 3 modes, specified by command-line switches:

-f   Load full resultset at once - old way.
-s   Single-Row mode using PQgetResult().
-z   Single-Row mode using PQgetRowData().

It is compiled with 2 different libpqs that correspond to
single-row-modeX branches in my github repo:

rowdump1 - libpq with rowBuf + PQgetRowData().   rowBuf is
           required for PQgetRowData.
           [ https://github.com/markokr/postgres/tree/single-row-mode1 ]

rowdump2 - Plain libpq patched for single-row mode.
           No PQgetRowData() here.
           [ https://github.com/markokr/postgres/tree/single-row-mode2 ]

Notes:

* The hardest part is picking realistic queries that matter.
  It's possible to construct artificial queries that make
  the results go either way.

* It does not make sense to compare -f with the others.  But it
  does make sense to compare -f between the differently patched libpqs
  to detect any potential slowdowns.

* The time measured is the User Time of the client process.

-------------------------------------------------------
QUERY: select 10000,200,300000,rpad('x',30,'z') from
generate_series(1,5000000)
./rowdump1 -f:   3.90   3.90   3.93  avg:  3.91
./rowdump2 -f:   4.03   4.13   4.05  avg:  4.07
./rowdump1 -s:   6.26   6.33   6.49  avg:  6.36
./rowdump2 -s:   7.48   7.46   7.50  avg:  7.48
./rowdump1 -z:   2.88   2.90   2.79  avg:  2.86
QUERY: select
rpad('x',10,'z'),rpad('x',20,'z'),rpad('x',30,'z'),rpad('x',40,'z'),rpad('x',50,'z'),rpad('x',60,'z')
from generate_series(1,3000000)
./rowdump1 -f:   6.29   6.36   6.14  avg:  6.26
./rowdump2 -f:   6.79   6.69   6.72  avg:  6.73
./rowdump1 -s:   7.71   7.72   7.80  avg:  7.74
./rowdump2 -s:   8.14   8.16   8.57  avg:  8.29
./rowdump1 -z:   6.45   5.15   5.16  avg:  5.59
QUERY: select
1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100
from generate_series(1,800000)
./rowdump1 -f:   5.68   5.66   5.72  avg:  5.69
./rowdump2 -f:   5.69   5.84   5.67  avg:  5.73
./rowdump1 -s:   7.68   7.76   7.67  avg:  7.70
./rowdump2 -s:   7.57   7.54   7.62  avg:  7.58
./rowdump1 -z:   2.78   2.82   2.72  avg:  2.77
QUERY: select 1000,rpad('x', 400, 'z'),rpad('x', 4000, 'z') from
generate_series(1,100000)
./rowdump1 -f:   2.71   2.66   2.58  avg:  2.65
./rowdump2 -f:   3.11   3.14   3.16  avg:  3.14
./rowdump1 -s:   2.64   2.61   2.64  avg:  2.63
./rowdump2 -s:   3.15   3.11   3.11  avg:  3.12
./rowdump1 -z:   2.53   2.51   2.46  avg:  2.50
-------------------------------------------------------

Test code attached.  Please play with it.

By this test, both rowBuf and PQgetRowData() look good.

--
marko


Attachment

Re: [patch] libpq one-row-at-a-time API

From
Merlin Moncure
Date:
On Mon, Jul 23, 2012 at 2:05 PM, Marko Kreen <markokr@gmail.com> wrote:
>
> Here is a simple test program that takes a SELECT
> query, reads it and outputs a COPY-formatted stream
> to standard output, to simulate some activity.
>
> It operates on 3 modes, specified by commant-line switches:
>
> -f   Load full resultset at once - old way.
> -s   Single-Row mode using PQgetResult().
> -z   Single-Row mode using PQgetRowData().
>
> It is compiled with 2 different libpqs that correspond to
> single-row-modeX branches in my github repo:
>
> rowdump1 - libpq with rowBuf + PQgetRowData().   rowBuf is
>            required for PQgetRowData.
>            [ https://github.com/markokr/postgres/tree/single-row-mode1 ]
>
> rowdump2 - Plain libpq patched for single-row mode.
>            No PQgetRowData() here.
>            [ https://github.com/markokr/postgres/tree/single-row-mode2 ]
>
> Notes:
>
> * Hardest part is picking realistic queries that matter.
>   It's possible to construct artificial queries that make
>   results go either way.
>
> * It does not make sense for compare -f with others.  But it
>   does make sense to compare -f from differently patched libpqs
>   to detect any potential slowdowns.
>
> * The time measured is User Time of client process.
>
> -------------------------------------------------------
> QUERY: select 10000,200,300000,rpad('x',30,'z') from
> generate_series(1,5000000)
> ./rowdump1 -f:   3.90   3.90   3.93  avg:  3.91
> ./rowdump2 -f:   4.03   4.13   4.05  avg:  4.07
> ./rowdump1 -s:   6.26   6.33   6.49  avg:  6.36
> ./rowdump2 -s:   7.48   7.46   7.50  avg:  7.48
> ./rowdump1 -z:   2.88   2.90   2.79  avg:  2.86
> QUERY: select
> rpad('x',10,'z'),rpad('x',20,'z'),rpad('x',30,'z'),rpad('x',40,'z'),rpad('x',50,'z'),rpad('x',60,'z')
> from generate_series(1,3000000)
> ./rowdump1 -f:   6.29   6.36   6.14  avg:  6.26
> ./rowdump2 -f:   6.79   6.69   6.72  avg:  6.73
> ./rowdump1 -s:   7.71   7.72   7.80  avg:  7.74
> ./rowdump2 -s:   8.14   8.16   8.57  avg:  8.29
> ./rowdump1 -z:   6.45   5.15   5.16  avg:  5.59
> QUERY: select
> 1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100
> from generate_series(1,800000)
> ./rowdump1 -f:   5.68   5.66   5.72  avg:  5.69
> ./rowdump2 -f:   5.69   5.84   5.67  avg:  5.73
> ./rowdump1 -s:   7.68   7.76   7.67  avg:  7.70
> ./rowdump2 -s:   7.57   7.54   7.62  avg:  7.58
> ./rowdump1 -z:   2.78   2.82   2.72  avg:  2.77
> QUERY: select 1000,rpad('x', 400, 'z'),rpad('x', 4000, 'z') from
> generate_series(1,100000)
> ./rowdump1 -f:   2.71   2.66   2.58  avg:  2.65
> ./rowdump2 -f:   3.11   3.14   3.16  avg:  3.14
> ./rowdump1 -s:   2.64   2.61   2.64  avg:  2.63
> ./rowdump2 -s:   3.15   3.11   3.11  avg:  3.12
> ./rowdump1 -z:   2.53   2.51   2.46  avg:  2.50
> -------------------------------------------------------
>
> Test code attached.  Please play with it.
>
> By this test, both rowBuf and PQgetRowData() look good.

I agree on performance grounds.   It's important for libpq to be fast.

It seems odd (but maybe ok) that you have to set the single row mode
on the connection only to have the server reset it whenever you call a
send function: maybe rename to PQsetResultSingleRowMode?

Does PQgetRowData() break the ability to call PQgetvalue() against the
result as well as other functions like PQgetisnull()?  If so, I
object.

merlin


Re: [patch] libpq one-row-at-a-time API

From
Marko Kreen
Date:
On Tue, Jul 24, 2012 at 12:27 AM, Merlin Moncure <mmoncure@gmail.com> wrote:
> On Mon, Jul 23, 2012 at 2:05 PM, Marko Kreen <markokr@gmail.com> wrote:
> It seems odd (but maybe ok) that you have to set the single row mode
> on the connection only to have the server reset it whenever you call a
> send function: maybe rename to PQsetResultSingleRowMode?

The server does not reset it; it's a purely client-side flag.  It is reset
by the next PQsend*/PQexec* call.  If there are several resultsets
sent by the server for one query, they all share the same setting.

I don't think extra-long function names that "describe exactly"
the function's behavior are an improvement over shorter but inexact
names, as you need to read the docs anyway to know
the real behavior.  But it's a matter of taste, so it can be
renamed if people want it.

The alternative is to create parallel PQsend* functions that set the
flag.  That gets rid of the possibility of any extra activity between
PQsend* and PQsetSingleRowMode().  But it seems messy to do that
just for single-row mode; instead it makes sense to have PQsend*
variants that take a flags argument to tune their behavior.  But that
is worth doing only if we have more flags than just this one.

> Does PQgetRowData() break the ability to call PQgetvalue() against the
> result as well as other functions like PQgetisnull()?  If so, I
> object.

I don't get what you are objecting to.  PQgetRowData()
is called instead of PQgetResult(), and it returns a zero-row PGresult
for the row header plus a list of PGdataValues that point to the actual
data.  You can call the value functions and they won't crash, but as
the result has zero rows (the data is not copied into it)
they don't do anything useful either.

The whole point of the API is to avoid the unnecessary copying.

You can mix calls to PQgetRowData() and PQgetResult()
during one resultset; it works fine, although it does not seem
that useful.
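To make that concrete, here is a rough sketch of a PQgetRowData()
consumer.  The signature (int PQgetRowData(PGconn *, PGresult **hdr,
PGdataValue **columns)) and the PGdataValue len/value layout are
assumed from the patch under discussion, not from released libpq, and
error handling and result cleanup are omitted:

```c
/* Sketch only: PQgetRowData() and PGdataValue are from the proposed
 * patch.  The column values point into libpq's network buffer and
 * are valid only until the next libpq call. */
PGresult *hdr;        /* zero-row result carrying column metadata */
PGdataValue *cols;    /* per-column {len, value} into the buffer */

while (PQgetRowData(conn, &hdr, &cols))
{
    int nfields = PQnfields(hdr);   /* metadata accessors still work */

    for (int i = 0; i < nfields; i++)
    {
        if (cols[i].len < 0)
            fputs("NULL", stdout);  /* negative len marks a NULL */
        else
            fwrite(cols[i].value, 1, cols[i].len, stdout);
        putchar('\t');
    }
    putchar('\n');
    /* cols[] contents are invalid after the next libpq call */
}
```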

I guess you fear that some code could unexpectedly see the
new behavior, and that is exactly why the row-callback API
needs to be scrapped - it allowed that possibility.

But with PQsetSingleRowMode and PQgetRowData, the
new behavior is seen only by new code that clearly
expects it, so I don't see what the problem is.

-- 
marko


Re: [patch] libpq one-row-at-a-time API

From
Merlin Moncure
Date:
On Mon, Jul 23, 2012 at 5:07 PM, Marko Kreen <markokr@gmail.com> wrote:
>> Does PQgetRowData() break the ability to call PQgetvalue() against the
>> result as well as other functions like PQgetisnull()?  If so, I
>> object.
>
> I don't get what are you objecting to.  The PQgetRowData()
> is called instead of PQgetResult() and it returns zero-row PGresult
> for row header and list of PGdataValues that point to actual
> data. You call the value functions, they don't crash, but as
> the result has zero rows (the data is not copied to it)
> they don't do anything useful either.
>
> The whole point of the API is to avoid the unnecessary copying.
>
> You can mix the calls to PQgetRowData() and PQgetResult()
> during one resultset, it works fine although does not seem
> that useful.
>
> I guess you fear that some code can unexpectedly see
> new behavior, and that exactly why the row-callback API
> needs to be scrapped - it allowed such possibility.
>
> But with PQsetSingleRowMode and PQgetRowData, the
> new behavior is seen only by new code that clearly
> expects it, so I don't see what the problem is.

Well, for one, it breaks libpqtypes (see line 171@
http://libpqtypes.esilo.com/browse_source.html?file=exec.c), or at
least makes it unable to use the row-processing mode.  libpqtypes
wraps the libpq getter functions and exposes a richer API using only
the result object.  When writing libpqtypes we went to great
pains to keep access to server-side data through the existing result
API.  Note, I'm not arguing that compatibility with libpqtypes is a
desired objective, but the wrapping model it uses is pretty
likely to be used by other code -- the awkwardness there
should be a red flag of sorts.

I'm arguing that *all* data access must continue to go through the
result object; bypassing the result to get at data breaks the
result abstraction in the libpq API.  I bet you can still maintain
data access through the result object while avoiding extra copies.  For
example, maybe PQsetSingleRowMode should return a result object?

merlin


Re: [patch] libpq one-row-at-a-time API

From
Tom Lane
Date:
Merlin Moncure <mmoncure@gmail.com> writes:
> I'm arguing that *all* data getting must continue to do so through the
> result object, and bypassing the result to get at data is breaking the
> result abstraction in the libpq api.

That's a fair point, but the single-row mode without PQgetRowData still
fits that model, doesn't it?  From the point of view of libpqtypes it
just looks like you got a lot of one-row query results.
        regards, tom lane


Re: [patch] libpq one-row-at-a-time API

From
Merlin Moncure
Date:
On Tue, Jul 24, 2012 at 10:49 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Merlin Moncure <mmoncure@gmail.com> writes:
>> I'm arguing that *all* data getting must continue to do so through the
>> result object, and bypassing the result to get at data is breaking the
>> result abstraction in the libpq api.
>
> That's a fair point, but the single-row mode without PQgetRowData still
> fits that model, doesn't it?  From the point of view of libpqtypes it
> just looks like you got a lot of one-row query results.

Sure: Marko's exec_query_single_row example looks like 100% reasonable
libpq code.  That said, I'd still spend a few cycles to think this
through and make sure we aren't walling ourselves off from 'copy-free'
behavior, even if that's reserved for a future improvement.  In
particular, I'd like to explore whether PQsetSingleRowMode() should be
returning a result.

merlin


Re: [patch] libpq one-row-at-a-time API

From
Marko Kreen
Date:
On Tue, Jul 24, 2012 at 6:18 PM, Merlin Moncure <mmoncure@gmail.com> wrote:
> On Mon, Jul 23, 2012 at 5:07 PM, Marko Kreen <markokr@gmail.com> wrote:
>>> Does PQgetRowData() break the ability to call PQgetvalue() against the
>>> result as well as other functions like PQgetisnull()?  If so, I
>>> object.
>>
>> I don't get what are you objecting to.  The PQgetRowData()
>> is called instead of PQgetResult() and it returns zero-row PGresult
>> for row header and list of PGdataValues that point to actual
>> data. You call the value functions, they don't crash, but as
>> the result has zero rows (the data is not copied to it)
>> they don't do anything useful either.
>>
>> The whole point of the API is to avoid the unnecessary copying.
>>
>> You can mix the calls to PQgetRowData() and PQgetResult()
>> during one resultset, it works fine although does not seem
>> that useful.
>>
>> I guess you fear that some code can unexpectedly see
>> new behavior, and that exactly why the row-callback API
>> needs to be scrapped - it allowed such possibility.
>>
>> But with PQsetSingleRowMode and PQgetRowData, the
>> new behavior is seen only by new code that clearly
>> expects it, so I don't see what the problem is.
>
> Well, for one, it breaks libpqtypes (see line 171@
> http://libpqtypes.esilo.com/browse_source.html?file=exec.c), or at
> least makes it unable to use the row-processing mode.   libpqtypes
> wraps the libpq getter functions and exposes a richer api using only
> the result object.  When writing libpqtypes we went through great
> pains to keep access to server side data through the existing result
> API.  Note, I'm not arguing that compatibility with libpqtypes is a
> desired objective, but the wrapping model that it uses is pretty
> reasonably going to be used by other code -- the awkwardness there
> should be a red flag of sorts.
>
> I'm arguing that *all* data getting must continue to do so through the
> result object, and bypassing the result to get at data is breaking the
> result abstraction in the libpq api.  I bet you can still maintain
> data access through result object while avoiding extra copies.

Well, the main problem this week is whether we should
apply PQsetSingleRowMode() from the single-row-mode1
branch or the single-row-mode2 branch.

PQgetRowData() is unimportant, as it just exposes
the rowBuf to the user, and that's all.

Do you have an opinion on that?

>  For example, maybe PQsetSingleRowMode maybe should return a result object?

What do you mean by that?  And have you thought about both
sync and async usage patterns?

-- 
marko


Re: [patch] libpq one-row-at-a-time API

From
Merlin Moncure
Date:
On Tue, Jul 24, 2012 at 11:08 AM, Marko Kreen <markokr@gmail.com> wrote:
>> I'm arguing that *all* data getting must continue to do so through the
>> result object, and bypassing the result to get at data is breaking the
>> result abstraction in the libpq api.  I bet you can still maintain
>> data access through result object while avoiding extra copies.
>
> Well, the main problem this week is whether we should
> apply PQsetSingleRowMode() from single-row-mode1
> or from single-row-mode2 branch?
>
> The PQgetRowData() is unimportant as it just exposes
> the rowBuf to user and thats all.

Right: branch 1 (containing PQgetRowData) seems wrong to me, so
given that choice I'd argue for branch 2, forcing a PGresult pull on
each row.  However, what you were gunning for via branch 1 -- extra
performance from removing the extra allocs -- is important and
useful; hopefully we can get the best of both worlds, or punt and
settle on branch 2.

>>  For example, maybe PQsetSingleRowMode maybe should return a result object?
>
> What do you mean by that?  And have you though about both
> sync and async usage patterns?

No, I haven't -- at least not very well.  The basic idea is that
PQsetSingleRowMode turns into PQgetSingleRowResult() and returns a
result object.  For row-by-row processing an extra API call (maybe
PQgetNextRow(PGconn, PGresult)) does the behind-the-scenes work
under the existing result object.  This is the same general structure
you have in branch 2, but the result allocation is moved out of the
loop.  Presumably sync and async would then follow the same pattern,
but we'd still have to be able to guarantee non-blocking behavior in
the async API.
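As a sketch, the shape I have in mind looks like this.  Note that
PQgetSingleRowResult and PQgetNextRow are hypothetical names floated
in this message only -- they do not exist in libpq:

```c
/* Hypothetical API sketch -- none of these functions exist;
 * the names are the ones proposed in this message. */
PQsendQuery(conn, "select id, payload from big_table");

/* Allocate the result object once; this call would never block. */
PGresult *res = PQgetSingleRowResult(conn);

/* In async code, guard each iteration with
 * PQconsumeInput()/PQisBusy() before pulling the next row. */
while (PQgetNextRow(conn, res) > 0)
{
    /* res now describes exactly one row; normal accessors apply */
    printf("%s\n", PQgetvalue(res, 0, 0));
}
PQclear(res);
```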

merlin


Re: [patch] libpq one-row-at-a-time API

From
Marko Kreen
Date:
On Tue, Jul 24, 2012 at 7:08 PM, Marko Kreen <markokr@gmail.com> wrote:
> The PQgetRowData() is unimportant as it just exposes
> the rowBuf to user and thats all.

So to get back to the more interesting PQgetRowData() discussion...

Did you notice that it's up to the app to decide whether it calls
PQgetResult() or PQgetRowData() after PQsetSingleRowMode()?
So there is no way for it to conflict with anything.
If libpqtypes keeps using PQgetResult it keeps getting a
PGresult.  No problem.

PQgetRowData() is meant for use cases where PGresult is *not* the
main data structure the app operates on.  If the app does a dumb
copy out of the PGresult as soon as possible, it can move to PQgetRowData().

If the app wants to do complex operations with the PGresult, or just
store it, then it can keep doing that.  Nobody forces the use
of PQgetRowData().


Now, about the idea of a more optimized PGresult - one way of doing
it would be to create a zero-copy PGresult that points into the network
buffer, like PQgetRowData() does.  But this breaks all expectations
about PGresult lifetime rules, so it seems like a bad idea.

The second way is to optimize the copying step - basically just
one malloc and 2 or 3 memcpys - for the struct, the headers,
and the data.  This leaves a standalone PGresult around that
behaves as expected.  But for apps that don't care about
PGresult it is still an unnecessary copy.  So even if we
optimize PGresult that way, it still seems worthwhile
to have PQgetRowData() around.


Hm, I think it's possible to rig the test to do a dummy
copy of the PGresult, so we can see what kind
of speed is achievable.  Will try.

-- 
marko


Re: [patch] libpq one-row-at-a-time API

From
Marko Kreen
Date:
On Tue, Jul 24, 2012 at 7:25 PM, Merlin Moncure <mmoncure@gmail.com> wrote:
> On Tue, Jul 24, 2012 at 11:08 AM, Marko Kreen <markokr@gmail.com> wrote:
>>> I'm arguing that *all* data getting must continue to do so through the
>>> result object, and bypassing the result to get at data is breaking the
>>> result abstraction in the libpq api.  I bet you can still maintain
>>> data access through result object while avoiding extra copies.
>>
>> Well, the main problem this week is whether we should
>> apply PQsetSingleRowMode() from single-row-mode1
>> or from single-row-mode2 branch?
>>
>> The PQgetRowData() is unimportant as it just exposes
>> the rowBuf to user and thats all.
>
> right. branch 1 (containing PQgetRowData) seems wrong to me.

Indeed, you are missing the fact that PQgetResult() works the same
in both branches.

And please see the benchmark I posted.

Even better, run it yourself...

>> What do you mean by that?  And have you though about both
>> sync and async usage patterns?
>
> No, I haven't -- at least not very well.  The basic idea is that
> PQsetSingleRowMode turns into PQgetSingleRowResult() and returns a
> result object.  For row by row an extra API call gets called (maybe
> PQgetNextRow(PGconn, PGresult)) that does the behind the scenes work
> under the existing result object.  This is the same general structure
> you have in branch 2, but the result allocation is move out of the
> loop.  Presumably sync and async would then follow the same pattern,
> but we'd still have to be able to guarantee non-blocking behavior in
> the async api.

Well, as discussed previously, it's worthwhile to keep standard functions
like PQisBusy() and PQgetResult() working sensibly in the new mode,
and currently they do.
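For illustration, the loop that keeps working looks like ordinary
libpq code.  This is a sketch against the patch's API (PQsetSingleRowMode()
and PGRES_SINGLE_TUPLE come from the patch, and it needs a live server
to actually run):

```c
#include <stdio.h>
#include <libpq-fe.h>

/* Sketch of single-row-mode consumption with the proposed API. */
static void dump_rows(PGconn *conn)
{
    if (!PQsendQuery(conn, "select generate_series(1, 1000000)"))
        return;
    PQsetSingleRowMode(conn);   /* before consuming any input */

    PGresult *res;
    while ((res = PQgetResult(conn)) != NULL)
    {
        switch (PQresultStatus(res))
        {
            case PGRES_SINGLE_TUPLE:
                /* exactly one row; the usual accessors work unchanged */
                printf("%s\n", PQgetvalue(res, 0, 0));
                break;
            case PGRES_TUPLES_OK:
                /* zero-row terminator: the resultset is complete */
                break;
            default:
                fprintf(stderr, "%s", PQerrorMessage(conn));
                break;
        }
        PQclear(res);
    }
}
```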

If you add new functions, you also need to define the behavior
when new and old ones are mixed, and it gets messy quickly.

-- 
marko


Re: [patch] libpq one-row-at-a-time API

From
Merlin Moncure
Date:
On Tue, Jul 24, 2012 at 11:39 AM, Marko Kreen <markokr@gmail.com> wrote:
> On Tue, Jul 24, 2012 at 7:25 PM, Merlin Moncure <mmoncure@gmail.com> wrote:
>> On Tue, Jul 24, 2012 at 11:08 AM, Marko Kreen <markokr@gmail.com> wrote:
>>>> I'm arguing that *all* data getting must continue to do so through the
>>>> result object, and bypassing the result to get at data is breaking the
>>>> result abstraction in the libpq api.  I bet you can still maintain
>>>> data access through result object while avoiding extra copies.
>>>
>>> Well, the main problem this week is whether we should
>>> apply PQsetSingleRowMode() from single-row-mode1
>>> or from single-row-mode2 branch?
>>>
>>> The PQgetRowData() is unimportant as it just exposes
>>> the rowBuf to user and thats all.
>>
>> right. branch 1 (containing PQgetRowData) seems wrong to me.
>
> Indeed, you are missing the fact that PGgetResult works same
> in both branches.,
>
> And please see the benchmart I posted.
>
> Even better, run it yourself...
>
>>> What do you mean by that?  And have you though about both
>>> sync and async usage patterns?
>>
>> No, I haven't -- at least not very well.  The basic idea is that
>> PQsetSingleRowMode turns into PQgetSingleRowResult() and returns a
>> result object.  For row by row an extra API call gets called (maybe
>> PQgetNextRow(PGconn, PGresult)) that does the behind the scenes work
>> under the existing result object.  This is the same general structure
>> you have in branch 2, but the result allocation is move out of the
>> loop.  Presumably sync and async would then follow the same pattern,
>> but we'd still have to be able to guarantee non-blocking behavior in
>> the async api.
>
> Well, as discussed previously it's worthwhile to keep standard functions
> like PQisBusy() and PQgetResult() working sensibly in new mode,
> and currently they do so.
>
> If you add new functions, you also need to define the behavior
> when new and old one's are mixed and it gets messy quickly.

Right, I'm aware that you can use the 'slow' method in branch 1.  I also
saw the benchmarks and agree that single-row mode should be at least
competitive with current methods for pretty much all cases.

But the faster rowBuf method is a generally incompatible way of
dealing with data vs. current libpq -- this is bad.  If it's truly
impossible to get those benefits without bypassing the result API,
then I remove my objection on the grounds that it's optional behavior
(my gut tells me it is possible though).

I think the dummy copy of PGresult is plausible (if by that you mean
optimizing PQgetResult when in single row mode).  That would be even
better: you'd remove the need for the rowbuf mode.

merlin


Re: [patch] libpq one-row-at-a-time API

From
Marko Kreen
Date:
On Tue, Jul 24, 2012 at 7:52 PM, Merlin Moncure <mmoncure@gmail.com> wrote:
> But, the faster rowbuf method is a generally incompatible way of
> dealing with data vs current libpq -- this is bad.  If it's truly
> impossible to get those benefits without bypassing result API that
> then I remove my objection on the grounds it's optional behavior (my
> gut tells me it is possible though).

Um, please clarify what you are talking about here.

What is the incompatibility of PGresult from branch 1?

-- 
marko


Re: [patch] libpq one-row-at-a-time API

From
Tom Lane
Date:
Merlin Moncure <mmoncure@gmail.com> writes:
> I think the dummy copy of PGresult is plausible (if by that you mean
> optimizing PQgetResult when in single row mode).  That would be even
> better: you'd remove the need for the rowbuf mode.

I haven't spent any time looking at this, but my gut tells me that a big
chunk of the expense is copying the PGresult's metadata (the column
names, types, etc).  It has to be possible to make that cheaper.

One idea is to rearrange the internal storage so that that part reduces
to one memcpy().  Another thought is to allow PGresults to share
metadata by treating the metadata as a separate reference-counted
object.  The second could be a bit hazardous though, as we advertise
that PGresults are independent objects that can be manipulated by
separate threads.  I don't want to introduce mutexes into PGresults,
but I'm not sure reference-counted metadata can be safe without them.
So maybe the memcpy idea is the only workable one.
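The memcpy idea can be sketched in isolation: store each column's
metadata as offsets inside one contiguous blob, so cloning the whole
set becomes a single malloc plus one memcpy, with no per-column
allocations.  Everything below (names, struct layout) is invented for
this illustration and is not libpq's actual internal representation:

```c
#include <stdlib.h>
#include <string.h>

typedef struct {
    size_t name_off;   /* offset of column name within the blob */
    int    typid;      /* column type OID */
} ColDesc;

typedef struct {
    size_t size;       /* total size of this blob */
    int    ncols;
    /* ColDesc[ncols] follows, then the packed name strings */
} MetaBlob;

static ColDesc *meta_cols(MetaBlob *m) {
    return (ColDesc *) (m + 1);
}

static const char *meta_name(MetaBlob *m, int col) {
    return (const char *) m + meta_cols(m)[col].name_off;
}

/* Build a metadata blob from parallel arrays of names and type OIDs. */
static MetaBlob *meta_create(int ncols, const char **names, const int *typids) {
    size_t size = sizeof(MetaBlob) + (size_t) ncols * sizeof(ColDesc);
    for (int i = 0; i < ncols; i++)
        size += strlen(names[i]) + 1;

    MetaBlob *m = malloc(size);
    if (!m) return NULL;
    m->size = size;
    m->ncols = ncols;

    size_t off = sizeof(MetaBlob) + (size_t) ncols * sizeof(ColDesc);
    for (int i = 0; i < ncols; i++) {
        meta_cols(m)[i].name_off = off;
        meta_cols(m)[i].typid = typids[i];
        strcpy((char *) m + off, names[i]);
        off += strlen(names[i]) + 1;
    }
    return m;
}

/* The payoff: because names are stored as offsets rather than
 * pointers, copying the whole metadata set is a single memcpy. */
static MetaBlob *meta_clone(const MetaBlob *src) {
    MetaBlob *m = malloc(src->size);
    if (m) memcpy(m, src, src->size);
    return m;
}
```

The offset trick is what makes the clone position-independent; a
pointer-based layout would need a per-column fixup pass after the copy.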
        regards, tom lane


Re: [patch] libpq one-row-at-a-time API

From
Merlin Moncure
Date:
On Tue, Jul 24, 2012 at 11:57 AM, Marko Kreen <markokr@gmail.com> wrote:
> On Tue, Jul 24, 2012 at 7:52 PM, Merlin Moncure <mmoncure@gmail.com> wrote:
>> But, the faster rowbuf method is a generally incompatible way of
>> dealing with data vs current libpq -- this is bad.  If it's truly
>> impossible to get those benefits without bypassing result API that
>> then I remove my objection on the grounds it's optional behavior (my
>> gut tells me it is possible though).
>
> Um, please clarify what are you talking about here?
>
> What is the incompatibility of PGresult from branch 1?

Incompatibility in terms of usage -- we should be getting data with
PQgetvalue.  I think you're suspecting that I incorrectly believe
you're forced to use the rowBuf API -- I don't (although I wasn't clear
on that earlier).  Basically I'm saying that we should only buy into
that if all alternative routes to the faster performance are
exhausted.

On Tue, Jul 24, 2012 at 11:59 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Merlin Moncure <mmoncure@gmail.com> writes:
>> I think the dummy copy of PGresult is plausible (if by that you mean
>> optimizing PQgetResult when in single row mode).  That would be even
>> better: you'd remove the need for the rowbuf mode.
>
> I haven't spent any time looking at this, but my gut tells me that a big
> chunk of the expense is copying the PGresult's metadata (the column
> names, types, etc).  It has to be possible to make that cheaper.
>
> One idea is to rearrange the internal storage so that that part reduces
> to one memcpy().  Another thought is to allow PGresults to share
> metadata by treating the metadata as a separate reference-counted
> object.  The second could be a bit hazardous though, as we advertise
> that PGresults are independent objects that can be manipulated by
> separate threads.  I don't want to introduce mutexes into PGresults,
> but I'm not sure reference-counted metadata can be safe without them.
> So maybe the memcpy idea is the only workable one.

Yeah -- we had a very similar problem in libpqtypes and we solved it
exactly as you're thinking.  libpqtypes potentially has to create a
result with each row iteration (we expose rows and composites as
on-the-fly-created result objects) and stores some non-trivial extra
data with the result.  We solved it with the optimized-memcpy method
(look here: http://libpqtypes.esilo.com/browse_source.html?file=libpqtypes.h)
and you'll see that all the important structs like PGtypeHandler are
somewhat haphazardly designed to be run through a memcpy.  We
couldn't do anything about internal libpq issues though, but some
micro-optimization of PQsetResultAttrs (which is called via
PQcopyResult) might fit the bill.

The 'source' result (or source data that would be copied into the
destination result) would be stored in the PGconn, right? So the idea
is that when you set up single-row mode the connection generates a
template PGresult which is then copied out repeatedly during row-by-row
processing.  I like it, but only if we're reasonably confident the
PGresult can be sufficiently optimized like that.

merlin


Re: [patch] libpq one-row-at-a-time API

From
Merlin Moncure
Date:
On Tue, Jul 24, 2012 at 1:33 PM, Merlin Moncure <mmoncure@gmail.com> wrote:
> The 'source' result (or source data that would be copied into the
> destination result) would be stored in the PGconn, right? So, the idea
> is that when you set up single row mode the connection generates a
> template PGconn which is then copied out repeatedly during row-by-row
> processing.  I like it, but only if we're reasonably confident the
> PGresult can be sufficiently optimized like that.

hm, PGresAttDesc is unfortunately in the public header and as such
probably can't be changed?

merlin


Re: [patch] libpq one-row-at-a-time API

From
Merlin Moncure
Date:
On Tue, Jul 24, 2012 at 1:49 PM, Merlin Moncure <mmoncure@gmail.com> wrote:
> On Tue, Jul 24, 2012 at 1:33 PM, Merlin Moncure <mmoncure@gmail.com> wrote:
>> The 'source' result (or source data that would be copied into the
>> destination result) would be stored in the PGconn, right? So, the idea
>> is that when you set up single row mode the connection generates a
>> template PGconn which is then copied out repeatedly during row-by-row
>> processing.  I like it, but only if we're reasonably confident the
>> PGresult can be sufficiently optimized like that.
>
> hm, PGresAttDesc is unfortunately in the public header and as such
> probably can't be changed?

It can't -- at least not without breaking compatibility.  So, as
attractive as it sounds, it looks like a memcpy-based PGresult copy is
out, unless we break the rules and change it anyway (with the probably
safe assumption that the only userland code that cares is libpqtypes,
which we'd certainly change).

However, it's not clear that a shared-metadata implementation would
require a mutex -- if you stored the shared data in the conn and were
willing to walk the results in the event the PGconn was freed before
the results, you'd probably be ok.  That's really unpleasant though.

Either way, it looks like there are plausible paths to optimizing
repeated result fetches without having to expose an alternate
data-fetching API, and the proposed implementation doesn't appear to
be a wall in terms of getting there.  So I'm firmly in the
non-exposed-rowBuf camp, even if we can't expose an optimal
implementation in time for 9.2.

merlin


Re: [patch] libpq one-row-at-a-time API

From
Tom Lane
Date:
Merlin Moncure <mmoncure@gmail.com> writes:
> Either way, it looks like there's plausible paths to optimizing
> repeated result fetch without having expose an alternate data-fetching
> API and that the proposed implementation doesn't appear to be a wall
> in terms of getting there. So I'm firmly in the non-exposed-rowbuf
> camp, even if we can't expose an optimal implementation in time for
> 9.2.

Yeah, the schedule argument is a strong one.  If we put in PQgetRowData
now, we'll be stuck with it forevermore.  It would be better to push
that part of the patch to 9.3 so we can have more time to think it over
and investigate alternatives.  The single-row mode is already enough to
solve the demonstrated client-side performance issues we know about.
So IMO it would be a lot smarter to be spending our time right now on
making sure we have *that* part of the patch right.

In particular I agree that PQsetSingleRowMode is a bit ugly, so I'm
wondering about your thoughts on providing PQgetSingleRowResult instead.
I don't see how to make that work in async mode.  I think the library
needs to be aware of whether it's supposed to return a single-row result
in advance of actually doing so --- for instance, how can PQisBusy
possibly give the correct answer if it doesn't know whether having
received the first row is sufficient?  (Yeah, I know we could invent
a separate flavor of PQisBusy for single-row usage, but ick.  There are
implementation problems too, such as possibly having already copied a
lot of rows into a work-in-progress PGresult.)

The thing I fundamentally don't like about PQsetSingleRowMode is that
there's such a narrow window of time to use it correctly -- as soon as
you've done even one PQconsumeInput, it's too late.  (Or maybe not, if
the server is slow, which makes it timing-dependent whether you'll
observe a bug.  Maybe we should add more internal state so that we can
consistently throw error if you've done any PQconsumeInput?)  The most
obvious alternative is to invent new versions of the PQsendXXX functions
with an additional flag parameter; but there are enough of them to make
that not very attractive either.
        regards, tom lane


Re: [patch] libpq one-row-at-a-time API

From
Marko Kreen
Date:
> Hm, I think it's possible to rig the test to do dummy
> copy of pgresult, thus it's possible to see what kind
> of speed is possible..  Will try.

I added a new method (-x) to rowdump where it asks for the row
with PQgetRowData() and then emulates a super-efficient
PGresult copy, then loads the data from that PGresult.

Quick reference:
rowdump1 - single-row-mode1 [~ libpq 9.2]
rowdump2 - single-row-mode2 [~ libpq 9.1]

-s - single row mode with PQgetResult()
-z - single row mode with PQgetRowData()
-x - simulated optimized PQgetResult()

-------------------------------------------------------------
QUERY: select 10000,200,300000,rpad('x',30,'z') from
generate_series(1,5000000)
./rowdump1 -s:   6.28   6.25   6.39  avg:  6.31 [ 100.00 % ]
./rowdump2 -s:   7.49   7.40   7.57  avg:  7.49 [ 118.71 % ]
./rowdump1 -z:   2.86   2.77   2.79  avg:  2.81 [ 44.50 % ]
./rowdump1 -x:   3.46   3.27   3.29  avg:  3.34 [ 52.96 % ]
QUERY: select
rpad('x',10,'z'),rpad('x',20,'z'),rpad('x',30,'z'),rpad('x',40,'z'),rpad('x',50,'z'),rpad('x',60,'z')
from generate_series(1,3000000)
./rowdump1 -s:   7.76   7.76   7.68  avg:  7.73 [ 100.00 % ]
./rowdump2 -s:   8.24   8.12   8.66  avg:  8.34 [ 107.85 % ]
./rowdump1 -z:   5.34   5.07   5.23  avg:  5.21 [ 67.41 % ]
./rowdump1 -x:   5.53   5.61   5.61  avg:  5.58 [ 72.20 % ]
QUERY: select

1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100
from generate_series(1,800000)
./rowdump1 -s:   7.49   7.66   7.59  avg:  7.58 [ 100.00 % ]
./rowdump2 -s:   7.56   8.12   7.95  avg:  7.88 [ 103.91 % ]
./rowdump1 -z:   2.77   2.76   2.76  avg:  2.76 [ 36.46 % ]
./rowdump1 -x:   3.07   3.05   3.18  avg:  3.10 [ 40.90 % ]
QUERY: select 1000,rpad('x', 400, 'z'),rpad('x', 4000, 'z') from
generate_series(1,100000)
./rowdump1 -s:   2.66   2.62   2.67  avg:  2.65 [ 100.00 % ]
./rowdump2 -s:   3.11   3.14   3.11  avg:  3.12 [ 117.74 % ]
./rowdump1 -z:   2.49   2.46   2.47  avg:  2.47 [ 93.33 % ]
./rowdump1 -x:   2.59   2.57   2.57  avg:  2.58 [ 97.23 % ]
-----------------------------------------------------------------

It shows that even if the actual "fast" row copy turns out slower
than the one simulated here, it's still a quite competitive approach
compared to PQgetRowData(), as long as it's not too slow.

So an optimized PQgetResult() may be good enough, and we
can drop the idea of PQgetRowData().

Code attached, also in https://github.com/markokr/pqtest repo.

--
marko



Re: [patch] libpq one-row-at-a-time API

From
Merlin Moncure
Date:
On Tue, Jul 24, 2012 at 2:37 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Merlin Moncure <mmoncure@gmail.com> writes:
>> Either way, it looks like there's plausible paths to optimizing
>> repeated result fetch without having expose an alternate data-fetching
>> API and that the proposed implementation doesn't appear to be a wall
>> in terms of getting there. So I'm firmly in the non-exposed-rowbuf
>> camp, even if we can't expose an optimal implementation in time for
>> 9.2.
>
> Yeah, the schedule argument is a strong one.  If we put in PQgetRowData
> now, we'll be stuck with it forevermore.  It would be better to push
> that part of the patch to 9.3 so we can have more time to think it over
> and investigate alternatives.  The single-row mode is already enough to
> solve the demonstrated client-side performance issues we know about.
> So IMO it would be a lot smarter to be spending our time right now on
> making sure we have *that* part of the patch right.
>
> In particular I agree that PQsetSingleRowMode is a bit ugly, so I'm
> wondering about your thoughts on providing PQgetSingleRowResult instead.
> I don't see how to make that work in async mode.  I think the library
> needs to be aware of whether it's supposed to return a single-row result
> in advance of actually doing so --- for instance, how can PQisBusy
> possibly give the correct answer if it doesn't know whether having
> received the first row is sufficient?

Well (Marko probably wants to slap me with a halibut by now), the
original intent was for it not to have to: PQgetSingleRowResult is
more 'construct a result for single-row iteration', so it would never
block -- it only sets the internal single-row mode and instantiates
the result object.  After that, you do standard
consumeinput/isbusy processing on the conn.  The actual iteration
routine would be something like PQiterateResult, which you could guard
with PQisBusy.  Like I said: it's the same general structure but the
result construction is moved out of the loop.

I don't think this breaks the result scoping rules... the conn keeps a
copy of the result during 'result construction', which we are going to
define as ending when the query terminates.  Until the query resolves,
the result remains 'under construction' and loses the const-ness
expectation you normally get: you don't get to interleave
threads reading the result with threads iterating from the connection
(which is fine), and you have to avoid doing stupid things like closing
the connection before all the data has been read through.

(but maybe all this is moot per Marko's latest benchmarks)

merlin


Re: [patch] libpq one-row-at-a-time API

From
Marko Kreen
Date:
On Tue, Jul 24, 2012 at 11:34 PM, Merlin Moncure <mmoncure@gmail.com> wrote:
> On Tue, Jul 24, 2012 at 2:37 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> In particular I agree that PQsetSingleRowMode is a bit ugly, so I'm
>> wondering about your thoughts on providing PQgetSingleRowResult instead.
>> I don't see how to make that work in async mode.  I think the library
>> needs to be aware of whether it's supposed to return a single-row result
>> in advance of actually doing so --- for instance, how can PQisBusy
>> possibly give the correct answer if it doesn't know whether having
>> received the first row is sufficient?
>
> Well (Marko probably wants to slap me with a halibut by now), the
> original intent was for it not to have to: PQgetSingleRowResult is
> more 'construct result for single row iteration', so it would never
> block -- it only sets the internal single row mode and instantiates
> the result object.  After that, you can do do standard
> consumeinput/isbusy processing on the conn.  The actual iteration
> routine would be like PQiterateResult which you could guard with
> PQisBusy.  Like I said: it's the same general structure but the result
> construction is moved out of the loop.

If you really don't like PQsetSingleRowMode, then I would prefer
new PQsend* functions to new fetch functions.

Because while send is a one-shot affair, receive is a complex state
machine which requires multiple function calls.
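(To illustrate that state machine: a rough sketch of the receive side under the proposed single-row mode.  Error handling and the select()/poll() wait are elided, and process_row() is a placeholder, so take it as a sketch rather than working code.)

```c
/* Sketch: consume a query's rows one at a time.  Assumes a query was
 * just sent with PQsendQuery(); PQsetSingleRowMode() must be called
 * before any rows have been consumed. */
static void
consume_rows(PGconn *conn)
{
    PGresult   *res;

    if (!PQsetSingleRowMode(conn))
        return;                     /* called too late */

    for (;;)
    {
        if (PQisBusy(conn))
        {
            /* wait for the socket to become readable here, then: */
            if (!PQconsumeInput(conn))
                break;              /* connection trouble */
            continue;
        }
        res = PQgetResult(conn);
        if (res == NULL)
            break;                  /* query fully processed */
        if (PQresultStatus(res) == PGRES_SINGLE_TUPLE)
            process_row(res);       /* exactly one row in this result */
        /* PGRES_TUPLES_OK here is the zero-row result marking the end */
        PQclear(res);
    }
}
```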

-- 
marko


Re: [patch] libpq one-row-at-a-time API

From
Marko Kreen
Date:
On Tue, Jul 24, 2012 at 10:37 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Merlin Moncure <mmoncure@gmail.com> writes:
>> Either way, it looks like there's plausible paths to optimizing
>> repeated result fetch without having expose an alternate data-fetching
>> API and that the proposed implementation doesn't appear to be a wall
>> in terms of getting there. So I'm firmly in the non-exposed-rowbuf
>> camp, even if we can't expose an optimal implementation in time for
>> 9.2.
>
> Yeah, the schedule argument is a strong one.  If we put in PQgetRowData
> now, we'll be stuck with it forevermore.  It would be better to push
> that part of the patch to 9.3 so we can have more time to think it over
> and investigate alternatives.  The single-row mode is already enough to
> solve the demonstrated client-side performance issues we know about.
> So IMO it would be a lot smarter to be spending our time right now on
> making sure we have *that* part of the patch right.

Just to show agreement: both PQgetRowData() and optimized PGresult
do not belong to 9.2.

Only open questions are:

* Is there a better API than PQsetSingleRowMode()?  New PQsend* functions are my alternative.

* Should we roll back the rowBuf change?  I think not, as per the benchmark it performs better than the old code.

> The thing I fundamentally don't like about PQsetSingleRowMode is that
> there's such a narrow window of time to use it correctly -- as soon as
> you've done even one PQconsumeInput, it's too late.  (Or maybe not, if
> the server is slow, which makes it timing-dependent whether you'll
> observe a bug.  Maybe we should add more internal state so that we can
> consistently throw error if you've done any PQconsumeInput?)  The most
> obvious alternative is to invent new versions of the PQsendXXX functions
> with an additional flag parameter; but there are enough of them to make
> that not very attractive either.

It belongs logically together with PQsend, so if you don't like the
current separation, then the proper fix is to make them a single
function call.

-- 
marko


Re: [patch] libpq one-row-at-a-time API

From
Merlin Moncure
Date:
On Tue, Jul 24, 2012 at 3:52 PM, Marko Kreen <markokr@gmail.com> wrote:
> * Is there better API than PQsetSingleRowMode()?  New PQsend*
>   functions is my alternative.

I would prefer to have PQsetSingleRowMode() over new flavor of PQsend.

To consolidate my argument above: since you're throwing PQgetResult
into the main iteration loop, you're potentially walling yourself off
from one important optimization: not copying the result at all and
reusing the old one.  That's going to produce the fastest possible
code, since it recovers all the attribute structures that have already
been set up.  Unless, that is, you're willing to do the following:

Reading your original patch, what if at the line inside PQgetResult:

res = pqSingleRowResult(conn);

you instead manipulated the result the connection already had and
directly returned it -- discarding the result data but not the
attributes etc.?  Actually, you could even keep your API 'as is' if
you're willing to adjust the behavior of PQclear to be a no-op while
the connection is doing row-by-row result processing, until it is done.

Single-row results' data would then be volatile across iterations,
not const -- even if you save off the pointer, the connection can and
will rewrite the data portion of the PGresult.  Any pointers to
PQgetvalue'd data would naturally have to be copied off between
iterations (or the user can copy off using the user-facing result-copy
function).  This would be extremely efficient, since we'd only need to
extend the PGresult buffer if a particular row was longer than the
longest one already fetched.

If all that's a bridge too far, we can look at re-jiggering the API
like I was thinking earlier to make the adjusted result scoping more
user-visible, or run your non-rowbuf-exposing patch as-is and hope for
optimizations down the line.

merlin


Re: [patch] libpq one-row-at-a-time API

From
Marko Kreen
Date:
On Wed, Jul 25, 2012 at 12:35 AM, Merlin Moncure <mmoncure@gmail.com> wrote:
> On Tue, Jul 24, 2012 at 3:52 PM, Marko Kreen <markokr@gmail.com> wrote:
>> * Is there better API than PQsetSingleRowMode()?  New PQsend*
>>   functions is my alternative.
>
> I would prefer to have PQsetSingleRowMode() over new flavor of PQsend.
>
> To consolidate my argument above: since you're throwing PQgetResult in
> the main iteration loop you're potentially walling yourself off from
> one important optimization: not copying the result at all and reusing
> the old one; that's going to produce the fastest possible code since
> you're recovering all the attribute structures that have already been
> set up unless you're willing to do the following:

I am not forgetting such an optimization, I've already implemented it:
it's called PQgetRowData().  The main reason it works, and so simply,
is that it does not try to give you a PGresult.

PGresult carries various historical assumptions:
- Fully user-controlled lifetime.
- Zero-terminated strings.
- Aligned binary values.

Breaking them all does not seem like a good idea.

Trying to make them work with zero-copy does not seem
worth the pain.

And if you are ready to introduce new accessor functions,
why care about PGresult at all?  Why not give the user
PQgetRowData()?

Btw, PQgetRowData(), or any other potential zero-copy API,
is compatible with both a slow PQgetResult() and an optimized
PQgetResult().

So if we give only PQgetResult() in 9.2, I do not see that we
are locked out from any interesting optimizations.

-- 
marko


Re: [patch] libpq one-row-at-a-time API

From
Merlin Moncure
Date:
On Tue, Jul 24, 2012 at 4:59 PM, Marko Kreen <markokr@gmail.com> wrote:
> So if we give only PQgetResult() in 9.2, I do not see that we
> are locked out from any interesting optimizations.

Well, you are locked out of having PQgetResult reuse the conn's result
since that would then introduce potentially breaking changes to user
code.  Here are the approaches I see:

1) introduce the rowbuf (but not user-exposed rowbuf) patch:
Not the fastest method.  Users would begin using the feature and the API
behaviors would be locked in -- the only possible optimization route would
be to try to make PQcopyResult as fast as possible.

2) expose PQgetRowData
Very fast, but foreign and somewhat awkward.  Existing libpq wrappers
(libpqtypes, libpqxx etc.) could not be converted to use it without
extensive revision, if at all, and would be stuck with the slower
version of iteration.  Also, a side benefit here is that optimizing
result copying later has usefulness outside of row-by-row processing.

3) as #1, but instead of copying result, overwrite the data area.
this is bending the rule 'user defined lifetime of result' since each
iteration clears the data area of the PGresult. This is almost as fast
as #2, and existing libpq wrappers would be trivially converted to the
API.

4) as #3, but introduce a new iteration function
(PQiterateResult(conn, result)) that would be called instead of a
paired PQgetResult/PQclear.  This would also be fast, and could almost
lie directly on top of #2 as an alternate optimization strategy -- the
set-result-mode function would have to be modified (or an additional
function introduced) to return a result.
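(For concreteness, the calling convention under #4 might look something like this.  PQgetSingleRowResult and PQiterateResult are the hypothetical names used above, not existing libpq functions, and handle_row() is a placeholder.)

```c
/* Hypothetical user loop under option #4: one result object, its data
 * area rewritten on each iteration, a single PQclear at the end. */
PGresult *res = PQgetSingleRowResult(conn);   /* never blocks */

while (PQiterateResult(conn, res))            /* guard with PQisBusy in async code */
    handle_row(res);                          /* copy out anything you keep */

PQclear(res);
```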

I feel like you're partial to #2 and appreciate that, but I'm really
quite skeptical about it as noted.  #3 and #4 are not as well thought
out as what you've put together and would need some extra work and
buy-in to get out the door, so #2 seems like the lowest common
denominator (it would permanently preclude #3 and would require #4 to
introduce two new functions instead of just one).  #1 of course would
bolt on to #2.

merlin


Re: [patch] libpq one-row-at-a-time API

From
Merlin Moncure
Date:
On Tue, Jul 24, 2012 at 5:29 PM, Merlin Moncure <mmoncure@gmail.com> wrote:
> so #2 seems like the lowest common
> denominator (it would permanently preclude #3 and would require #4 to
> introduce two new functions instead of just one).  #1 of course would
> bolt on to #2.

oops, got #1 and #2 backwards there.

merlin


Re: [patch] libpq one-row-at-a-time API

From
Marko Kreen
Date:
On Wed, Jul 25, 2012 at 1:29 AM, Merlin Moncure <mmoncure@gmail.com> wrote:
> On Tue, Jul 24, 2012 at 4:59 PM, Marko Kreen <markokr@gmail.com> wrote:
>> So if we give only PQgetResult() in 9.2, I do not see that we
>> are locked out from any interesting optimizations.
>
> Well, you are locked out of having PQgetResult reuse the conn's result
> since that would then introduce potentially breaking changes to user
> code.

You can specify special flags to PQsend or have special PQgetResultWeird()
calls to get PGresults with unusual behavior.  Like I did with PQgetRowData().

I see no reason here to reject PQgetResult() that returns normal PGresult.

-- 
marko


Re: [patch] libpq one-row-at-a-time API

From
Merlin Moncure
Date:
On Tuesday, July 24, 2012, Marko Kreen <markokr@gmail.com> wrote:
> On Wed, Jul 25, 2012 at 1:29 AM, Merlin Moncure <mmoncure@gmail.com> wrote:
>> On Tue, Jul 24, 2012 at 4:59 PM, Marko Kreen <markokr@gmail.com> wrote:
>>> So if we give only PQgetResult() in 9.2, I do not see that we
>>> are locked out from any interesting optimizations.
>>
>> Well, you are locked out of having PQgetResult reuse the conn's result
>> since that would then introduce potentially breaking changes to user
>> code.
>
> You can specify special flags to PQsend or have special PQgetResultWeird()
> calls to get PGresults with unusual behavior.  Like I did with PQgetRowData().
>
> I see no reason here to reject PQgetResult() that returns normal PGresult.

Yeah -- I agree.  So, given the scheduling, I think we should go with the
non-PQgetRowData patch and reserve the optimized path for 9.3.

merlin

Re: [patch] libpq one-row-at-a-time API

From
Leon Smith
Date:
Hey, this thread was pointed out to me just a few days ago, but I'll start by saying that I think this thread is on exactly the right track.  I don't like the callback API, and think that PQsetSingleRowMode should be offered in place of it.  But I do have one

On Sat, Jun 16, 2012 at 10:22 AM, Marko Kreen <markokr@gmail.com> wrote:
> The function can be called only after PQsend* and before any
> rows have arrived.  This guarantees there will be no surprises
> to PQexec* users who expect full resultset at once.

Ok, I'm guessing you mean "before you call PQgetResult or PQgetRowData", or maybe "before you call PQgetResult or PQgetRowData and it returns a result or partial result" -- because it would be a race condition if you meant exactly what you said.  (Though I don't understand how this could possibly be implemented without some source of concurrency, which libpq doesn't do.)  Maybe this is a little overly pedantic, but I do want to confirm the intention here.

One other possibility,  Tom Lane fretted ever so slightly about the use of malloc/free per row... what about instead of PQsetSingleRowMode,  you have PQsetChunkedRowMode that takes a chunkSize parameter.   A chunkSize <= 0 would be equivalent to what we have today,   a chunkSize of 1 means you get what you have from PQsetSingleRowMode,  and larger chunkSizes would wait until n rows have been received before returning them all in a single result.      I don't know that this suggestion is all that important, but it seems like an obvious generalization that might possibly be useful.
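(Just to pin down the batching rule: a tiny standalone sketch of how many rows each returned result would carry under the hypothetical PQsetChunkedRowMode.  It models only the chunk sizes, not libpq itself.)

```c
#include <assert.h>

/* Models the hypothetical chunked mode: given a total row count and a
 * chunkSize, write the number of rows each successive result would
 * carry into out[], returning how many results are produced.
 * chunk_size <= 0 reproduces today's behaviour: one full result. */
static int
chunk_batches(int total_rows, int chunk_size, int out[], int max_out)
{
    int n = 0;

    if (chunk_size <= 0)
    {
        out[n++] = total_rows;
        return n;
    }
    while (total_rows > 0 && n < max_out)
    {
        int batch = total_rows < chunk_size ? total_rows : chunk_size;

        out[n++] = batch;       /* one PGresult carrying `batch` rows */
        total_rows -= batch;
    }
    return n;
}
```

So 10 rows with chunkSize 3 would come back as four results of 3, 3, 3, and 1 rows; chunkSize 1 degenerates to single-row mode.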

Best,
Leon

Re: [patch] libpq one-row-at-a-time API

From
Jan Wieck
Date:
On 7/30/2012 8:11 PM, Leon Smith wrote:
> One other possibility,  Tom Lane fretted ever so slightly about the use
> of malloc/free per row... what about instead of PQsetSingleRowMode,  you
> have PQsetChunkedRowMode that takes a chunkSize parameter.   A chunkSize
> <= 0 would be equivalent to what we have today,   a chunkSize of 1 means
> you get what you have from PQsetSingleRowMode,  and larger chunkSizes
> would wait until n rows have been received before returning them all in
> a single result.      I don't know that this suggestion is all that
> important, but it seems like an obvious generalization that might
> possibly be useful.

It is questionable whether that actually adds any useful functionality. Any 
"collecting" of multiple rows only runs the risk of stalling receipt of 
the following rows while processing this batch. Processing each row as 
soon as it is available ensures the best use of the network buffers.

Collecting multiple rows, as the FETCH command for cursors does, makes 
sense when each batch introduces a network round trip. But does it make 
any sense for a true streaming mode, like what is discussed here?


Jan

-- 
Anyone who trades liberty for security deserves neither
liberty nor security. -- Benjamin Franklin


Re: [patch] libpq one-row-at-a-time API

From
Leon Smith
Date:


On Mon, Jul 30, 2012 at 9:59 PM, Jan Wieck <JanWieck@yahoo.com> wrote:
> On 7/30/2012 8:11 PM, Leon Smith wrote:
>> One other possibility,  Tom Lane fretted ever so slightly about the use
>> of malloc/free per row... what about instead of PQsetSingleRowMode,  you
>> have PQsetChunkedRowMode that takes a chunkSize parameter.   A chunkSize
>> <= 0 would be equivalent to what we have today,   a chunkSize of 1 means
>> you get what you have from PQsetSingleRowMode,  and larger chunkSizes
>> would wait until n rows have been received before returning them all in
>> a single result.      I don't know that this suggestion is all that
>> important, but it seems like an obvious generalization that might
>> possibly be useful.
>
> It is questionable whether that actually adds any useful functionality.

This is true; I'm not sure my suggestion is necessarily useful.  I'm just throwing it out there.
 
> Any "collecting" of multiple rows only runs the risk of stalling receipt of the following rows while processing this batch. Processing each row as soon as it is available ensures the best use of the network buffers.

This is not necessarily true,  on multiple levels.   I mean,  some of the programs I write are highly concurrent,  and this form of batching would have almost no risk of stalling the network buffer.    And the possible use case would be when you are dealing with very small rows,  when there would typically be several rows inside a single network packet or network buffer.  


> Collecting multiple rows, as the FETCH command for cursors does, makes sense when each batch introduces a network round trip. But does it make any sense for a true streaming mode, like what is discussed here?

Maybe?    I mean,  I anticipate that there are (probably) still use cases for FETCH,  even when the row-at-a-time interface is a viable option and the transport between postgres and the client has reasonable flow-control.

Leon

Re: [patch] libpq one-row-at-a-time API

From
Jan Wieck
Date:
On 7/30/2012 10:31 PM, Leon Smith wrote:
> This is not necessarily true,  on multiple levels.   I mean,  some of
> the programs I write are highly concurrent,  and this form of batching
> would have almost no risk of stalling the network buffer.    And
> the possible use case would be when you are dealing with very small
> rows,  when there would typically be several rows inside a single
> network packet or network buffer.

With "highly concurrent" you mean multi-threaded? Like one thread reads 
the rows in batches and pushes them into a queue while another thread 
processes them from that queue?

If that is the case, then you just added a useless layer of buffering 
and the need for thread/thread context switches to PQsetSingleRowMode. 
Libpq's "receiver thread" is the kernel itself. Libpq already tries 
never to read partial kernel buffers: it always makes sure that there is 
at least 8K of free space in the inBuffer. In the case you describe 
above, where several rows fit into a single packet, libpq will receive 
them with a single system call in one read(2), then the application can 
get them as fast as possible, without causing any further context 
switches because they are already in the inBuffer.

I've written that sort of code myself in the past. Look at the Slony 
worker thread prior to 2.2. We switched to the COPY protocol instead of 
waiting for the single row mode and got rid of all that extra buffering 
already (and then some more).


Jan

-- 
Anyone who trades liberty for security deserves neither
liberty nor security. -- Benjamin Franklin


Re: [patch] libpq one-row-at-a-time API

From
Merlin Moncure
Date:
On Mon, Jul 30, 2012 at 10:26 PM, Jan Wieck <JanWieck@yahoo.com> wrote:
> On 7/30/2012 10:31 PM, Leon Smith wrote:
>>
>> This is not necessarily true,  on multiple levels.   I mean,  some of
>> the programs I write are highly concurrent,  and this form of batching
>> would have almost no risk of stalling the network buffer.    And
>> the possible use case would be when you are dealing with very small
>> rows,  when there would typically be several rows inside a single
>> network packet or network buffer.
>
>
> With "highly concurrent" you mean multi-threaded? Like one thread reads the
> rows in batches and pushes them into a queue while another thread processes
> them from that queue?
>
> If that is the case, then you just added a useless layer of buffering and
> the need for thread/thread context switches to PQsetSingleRowMode. Libpq's
> "receiver thread" is the kernel itself. Libpq tries to never read partial
> kernel buffers already. It always makes sure that there are at least 8K of
> free space in the inBuffer. In the case you describe above, where several
> rows fit into a single packet, libpq will receive them with a single system
> call in one read(2), then the application can get them as fast as possible,
> without causing any further context switches because they are already in the
> inBuffer.

Yeah: with asynchronous query processing the query gets sent and
control returns immediately to your code: that's the whole point.
Even if some data races to the network buffer, libpq doesn't 'see' any
data until you tell it to by asking for a result (which can block) or
draining the buffers with PQconsumeInput.  So there is no race in the
traditional sense and I'm ok with the PQsetSingleRowMode as such.

Removing malloc/free on row iteration seems only to be possible via
one of two methods: either a) you introduce a non-PGresult based
method of data extraction or b) you preserve the PGresult across row
iterations.

merlin


Re: [patch] libpq one-row-at-a-time API

From
Tom Lane
Date:
[ getting back to this now after assorted distractions ]

Marko Kreen <markokr@gmail.com> writes:
> Just to show agreement: both PQgetRowData() and optimized PGresult
> do not belong to 9.2.

OK, we're all on board with leaving those for later.

> Only open questions are:

> * Is there better API than PQsetSingleRowMode()?  New PQsend*
>   functions is my alternative.

After thinking it over, I'm really unexcited about adding new versions
of all the PQsend functions.  If we had the prospect of more flags in
future that could be added to a bitmask flags argument, it would be
more palatable, but it's far from clear what those might be.  So I'm
now leaning towards using PQsetSingleRowMode as-is.

> * Should we rollback rowBuf change? I think no, as per benchmark
>   it performs better than old code.

I had already pretty much come to that conclusion just based on code
review, without thinking about performance.  In particular, we had done
some nontrivial work towards improving error-case handling in the data
message parsing code, and I don't really want to give that up, nor
rewrite it on the fly now.  About the only reason I could see for
reverting rowBuf was that I thought it might hurt performance; so now
that you've proven the opposite, we should leave it alone.

So I'm working from the first set of patches in your message
<20120721194907.GA28021@gmail.com>.
        regards, tom lane


Re: [patch] libpq one-row-at-a-time API

From
Marko Kreen
Date:
On Wed, Aug 1, 2012 at 6:18 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Marko Kreen <markokr@gmail.com> writes:
>> * Is there better API than PQsetSingleRowMode()?  New PQsend*
>>   functions is my alternative.
>
> After thinking it over, I'm really unexcited about adding new versions
> of all the PQsend functions.  If we had the prospect of more flags in
> future that could be added to a bitmask flags argument, it would be
> more palatable, but it's far from clear what those might be.  So I'm
> now leaning towards using PQsetSingleRowMode as-is.

I can imagine such a flag - I'd like to have a flag that allows sending
more queries to the server without waiting for the old ones to finish.

Currently, when a batch-processing app wants to keep the
backend busy, it needs to stuff many statements into a
single query, which is ugly.  Among other things it
loses the possibility to separate arguments from the query,
and it breaks error reporting.  The flag would fix this,
and it would be useful in other scenarios as well.

Interestingly, although it affects a different area, it would
also be a flag for PQsend* and not for PQexec*.  So maybe
the direction is not completely wrong.

But I cannot imagine why PQsetSingleRowMode would be
hard to keep working even if we later add PQsend functions with flags,
so I'm not worried about getting it exactly right from the start.

>> * Should we rollback rowBuf change? I think no, as per benchmark
>>   it performs better than old code.
>
> I had already pretty much come to that conclusion just based on code
> review, without thinking about performance.  In particular, we had done
> some nontrivial work towards improving error-case handling in the data
> message parsing code, and I don't really want to give that up, nor
> rewrite it on the fly now.  About the only reason I could see for
> reverting rowBuf was that I thought it might hurt performance; so now
> that you've proven the opposite, we should leave it alone.

An additional argument for rowBuf is Merlin's wish for weirdly optimized
PGresults.  Basically, with rowBuf, anybody who wants to experiment
with different ways to process row data just needs to implement a
single function which processes the data and then advances the state
machine, like I did with PQgetRowData().

Without it, quite a lot of code needs to be patched.

> So I'm working from the first set of patches in your message
> <20120721194907.GA28021@gmail.com>.

Great!

-- 
marko


Re: [patch] libpq one-row-at-a-time API

From
Tom Lane
Date:
Marko Kreen <markokr@gmail.com> writes:
> On Wed, Aug 1, 2012 at 6:18 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> So I'm working from the first set of patches in your message
>> <20120721194907.GA28021@gmail.com>.

> Great!

Here's an updated draft patch.  I've not looked at the docs yet so
this doesn't include that, but I'm reasonably happy with the code
changes now.  The main difference from what you had is that I pushed
the creation of the single-row PGresults into pqRowProcessor, so that
they're formed immediately while scanning the input message.  That
other method with postponing examination of the rowBuf does not work,
any more than it did with the original patch, because you can't assume
that the network input buffer is going to hold still.  For example,
calling PQconsumeInput after parseInput has absorbed a row-data message
but before calling PQgetResult would likely break things.

In principle I suppose we could hack PQconsumeInput enough that it
didn't damage the row buffer while still meeting its API contract of
clearing the read-ready condition on the socket; but it wouldn't be
simple or cheap to do so.

            regards, tom lane

diff --git a/contrib/dblink/dblink.c b/contrib/dblink/dblink.c
index 1e62d8091a9d2bdf60af6745d5a01ee14ee5cf5a..7cf86176c2385b9e4ee37c72d7e3c662ea079f7a 100644
*** a/contrib/dblink/dblink.c
--- b/contrib/dblink/dblink.c
*************** typedef struct storeInfo
*** 70,75 ****
--- 70,78 ----
      AttInMetadata *attinmeta;
      MemoryContext tmpcontext;
      char      **cstrs;
+     /* temp storage for results to avoid leaks on exception */
+     PGresult   *last_res;
+     PGresult   *cur_res;
  } storeInfo;

  /*
*************** static void materializeQueryResult(Funct
*** 83,90 ****
                         const char *conname,
                         const char *sql,
                         bool fail);
! static int storeHandler(PGresult *res, const PGdataValue *columns,
!              const char **errmsgp, void *param);
  static remoteConn *getConnectionByName(const char *name);
  static HTAB *createConnHash(void);
  static void createNewConnection(const char *name, remoteConn *rconn);
--- 86,93 ----
                         const char *conname,
                         const char *sql,
                         bool fail);
! static PGresult *storeQueryResult(storeInfo *sinfo, PGconn *conn, const char *sql);
! static void storeRow(storeInfo *sinfo, PGresult *res, bool first);
  static remoteConn *getConnectionByName(const char *name);
  static HTAB *createConnHash(void);
  static void createNewConnection(const char *name, remoteConn *rconn);
*************** materializeResult(FunctionCallInfo fcinf
*** 927,934 ****
  /*
   * Execute the given SQL command and store its results into a tuplestore
   * to be returned as the result of the current function.
   * This is equivalent to PQexec followed by materializeResult, but we make
!  * use of libpq's "row processor" API to reduce per-row overhead.
   */
  static void
  materializeQueryResult(FunctionCallInfo fcinfo,
--- 930,939 ----
  /*
   * Execute the given SQL command and store its results into a tuplestore
   * to be returned as the result of the current function.
+  *
   * This is equivalent to PQexec followed by materializeResult, but we make
!  * use of libpq's single-row mode to avoid accumulating the whole result
!  * inside libpq before it gets transferred to the tuplestore.
   */
  static void
  materializeQueryResult(FunctionCallInfo fcinfo,
*************** materializeQueryResult(FunctionCallInfo
*** 944,962 ****
      /* prepTuplestoreResult must have been called previously */
      Assert(rsinfo->returnMode == SFRM_Materialize);

      PG_TRY();
      {
!         /* initialize storeInfo to empty */
!         memset(&sinfo, 0, sizeof(sinfo));
!         sinfo.fcinfo = fcinfo;
!
!         /* We'll collect tuples using storeHandler */
!         PQsetRowProcessor(conn, storeHandler, &sinfo);
!
!         res = PQexec(conn, sql);
!
!         /* We don't keep the custom row processor installed permanently */
!         PQsetRowProcessor(conn, NULL, NULL);

          if (!res ||
              (PQresultStatus(res) != PGRES_COMMAND_OK &&
--- 949,962 ----
      /* prepTuplestoreResult must have been called previously */
      Assert(rsinfo->returnMode == SFRM_Materialize);

+     /* initialize storeInfo to empty */
+     memset(&sinfo, 0, sizeof(sinfo));
+     sinfo.fcinfo = fcinfo;
+
      PG_TRY();
      {
!         /* execute query, collecting any tuples into the tuplestore */
!         res = storeQueryResult(&sinfo, conn, sql);

          if (!res ||
              (PQresultStatus(res) != PGRES_COMMAND_OK &&
*************** materializeQueryResult(FunctionCallInfo
*** 975,982 ****
          else if (PQresultStatus(res) == PGRES_COMMAND_OK)
          {
              /*
!              * storeHandler didn't get called, so we need to convert the
!              * command status string to a tuple manually
               */
              TupleDesc    tupdesc;
              AttInMetadata *attinmeta;
--- 975,982 ----
          else if (PQresultStatus(res) == PGRES_COMMAND_OK)
          {
              /*
!              * storeRow didn't get called, so we need to convert the command
!              * status string to a tuple manually
               */
              TupleDesc    tupdesc;
              AttInMetadata *attinmeta;
*************** materializeQueryResult(FunctionCallInfo
*** 1008,1032 ****
              tuplestore_puttuple(tupstore, tuple);

              PQclear(res);
          }
          else
          {
              Assert(PQresultStatus(res) == PGRES_TUPLES_OK);
!             /* storeHandler should have created a tuplestore */
              Assert(rsinfo->setResult != NULL);

              PQclear(res);
          }
      }
      PG_CATCH();
      {
-         /* be sure to unset the custom row processor */
-         PQsetRowProcessor(conn, NULL, NULL);
          /* be sure to release any libpq result we collected */
!         if (res)
!             PQclear(res);
          /* and clear out any pending data in libpq */
!         while ((res = PQskipResult(conn)) != NULL)
              PQclear(res);
          PG_RE_THROW();
      }
--- 1008,1037 ----
              tuplestore_puttuple(tupstore, tuple);

              PQclear(res);
+             res = NULL;
          }
          else
          {
              Assert(PQresultStatus(res) == PGRES_TUPLES_OK);
!             /* storeRow should have created a tuplestore */
              Assert(rsinfo->setResult != NULL);

              PQclear(res);
+             res = NULL;
          }
+         PQclear(sinfo.last_res);
+         sinfo.last_res = NULL;
+         PQclear(sinfo.cur_res);
+         sinfo.cur_res = NULL;
      }
      PG_CATCH();
      {
          /* be sure to release any libpq result we collected */
!         PQclear(res);
!         PQclear(sinfo.last_res);
!         PQclear(sinfo.cur_res);
          /* and clear out any pending data in libpq */
!         while ((res = PQgetResult(conn)) != NULL)
              PQclear(res);
          PG_RE_THROW();
      }
*************** materializeQueryResult(FunctionCallInfo
*** 1034,1056 ****
  }

  /*
!  * Custom row processor for materializeQueryResult.
!  * Prototype of this function must match PQrowProcessor.
   */
! static int
! storeHandler(PGresult *res, const PGdataValue *columns,
!              const char **errmsgp, void *param)
  {
-     storeInfo  *sinfo = (storeInfo *) param;
      int            nfields = PQnfields(res);
-     char      **cstrs = sinfo->cstrs;
      HeapTuple    tuple;
-     char       *pbuf;
-     int            pbuflen;
      int            i;
      MemoryContext oldcontext;

!     if (columns == NULL)
      {
          /* Prepare for new result set */
          ReturnSetInfo *rsinfo = (ReturnSetInfo *) sinfo->fcinfo->resultinfo;
--- 1039,1108 ----
  }

  /*
!  * Execute query, and send any result rows to sinfo->tuplestore.
   */
! static PGresult *
! storeQueryResult(storeInfo *sinfo, PGconn *conn, const char *sql)
! {
!     bool        first = true;
!     PGresult   *res;
!
!     if (!PQsendQuery(conn, sql))
!         return PQgetResult(conn);
!
!     if (!PQsetSingleRowMode(conn))        /* shouldn't fail */
!         elog(ERROR, "dblink: failed to set single-row mode");
!
!     for (;;)
!     {
!         CHECK_FOR_INTERRUPTS();
!
!         sinfo->cur_res = PQgetResult(conn);
!         if (!sinfo->cur_res)
!             break;
!
!         if (PQresultStatus(sinfo->cur_res) == PGRES_SINGLE_TUPLE)
!         {
!             /* got one row from possibly-bigger resultset */
!             storeRow(sinfo, sinfo->cur_res, first);
!
!             PQclear(sinfo->cur_res);
!             sinfo->cur_res = NULL;
!             first = false;
!         }
!         else
!         {
!             /* if empty resultset, fill tuplestore header */
!             if (first && PQresultStatus(sinfo->cur_res) == PGRES_TUPLES_OK)
!                 storeRow(sinfo, sinfo->cur_res, first);
!
!             /* store completed result at last_res */
!             PQclear(sinfo->last_res);
!             sinfo->last_res = sinfo->cur_res;
!             sinfo->cur_res = NULL;
!             first = true;
!         }
!     }
!
!     /* return last_res */
!     res = sinfo->last_res;
!     sinfo->last_res = NULL;
!     return res;
! }
!
! /*
!  * Send single row to sinfo->tuplestore.
!  * If "first" is true, create the tuplestore using PGresult's metadata.
!  */
! static void
! storeRow(storeInfo *sinfo, PGresult *res, bool first)
  {
      int            nfields = PQnfields(res);
      HeapTuple    tuple;
      int            i;
      MemoryContext oldcontext;

!     if (first)
      {
          /* Prepare for new result set */
          ReturnSetInfo *rsinfo = (ReturnSetInfo *) sinfo->fcinfo->resultinfo;
*************** storeHandler(PGresult *res, const PGdata
*** 1098,1110 ****
          sinfo->attinmeta = TupleDescGetAttInMetadata(tupdesc);

          /* Create a new, empty tuplestore */
!         oldcontext = MemoryContextSwitchTo(
!                                     rsinfo->econtext->ecxt_per_query_memory);
          sinfo->tuplestore = tuplestore_begin_heap(true, false, work_mem);
          rsinfo->setResult = sinfo->tuplestore;
          rsinfo->setDesc = tupdesc;
          MemoryContextSwitchTo(oldcontext);

          /*
           * Set up sufficiently-wide string pointers array; this won't change
           * in size so it's easy to preallocate.
--- 1150,1165 ----
          sinfo->attinmeta = TupleDescGetAttInMetadata(tupdesc);

          /* Create a new, empty tuplestore */
!         oldcontext = MemoryContextSwitchTo(rsinfo->econtext->ecxt_per_query_memory);
          sinfo->tuplestore = tuplestore_begin_heap(true, false, work_mem);
          rsinfo->setResult = sinfo->tuplestore;
          rsinfo->setDesc = tupdesc;
          MemoryContextSwitchTo(oldcontext);

+         /* Done if empty resultset */
+         if (PQntuples(res) == 0)
+             return;
+
          /*
           * Set up sufficiently-wide string pointers array; this won't change
           * in size so it's easy to preallocate.
*************** storeHandler(PGresult *res, const PGdata
*** 1121,1131 ****
                                        ALLOCSET_DEFAULT_MINSIZE,
                                        ALLOCSET_DEFAULT_INITSIZE,
                                        ALLOCSET_DEFAULT_MAXSIZE);
-
-         return 1;
      }

!     CHECK_FOR_INTERRUPTS();

      /*
       * Do the following work in a temp context that we reset after each tuple.
--- 1176,1185 ----
                                        ALLOCSET_DEFAULT_MINSIZE,
                                        ALLOCSET_DEFAULT_INITSIZE,
                                        ALLOCSET_DEFAULT_MAXSIZE);
      }

!     /* Should have a single-row result if we get here */
!     Assert(PQntuples(res) == 1);

      /*
       * Do the following work in a temp context that we reset after each tuple.
*************** storeHandler(PGresult *res, const PGdata
*** 1135,1180 ****
      oldcontext = MemoryContextSwitchTo(sinfo->tmpcontext);

      /*
!      * The strings passed to us are not null-terminated, but the datatype
!      * input functions we're about to call require null termination.  Copy the
!      * strings and add null termination.  As a micro-optimization, allocate
!      * all the strings with one palloc.
       */
-     pbuflen = nfields;            /* count the null terminators themselves */
-     for (i = 0; i < nfields; i++)
-     {
-         int            len = columns[i].len;
-
-         if (len > 0)
-             pbuflen += len;
-     }
-     pbuf = (char *) palloc(pbuflen);
-
      for (i = 0; i < nfields; i++)
      {
!         int            len = columns[i].len;
!
!         if (len < 0)
!             cstrs[i] = NULL;
          else
!         {
!             cstrs[i] = pbuf;
!             memcpy(pbuf, columns[i].value, len);
!             pbuf += len;
!             *pbuf++ = '\0';
!         }
      }

      /* Convert row to a tuple, and add it to the tuplestore */
!     tuple = BuildTupleFromCStrings(sinfo->attinmeta, cstrs);

      tuplestore_puttuple(sinfo->tuplestore, tuple);

      /* Clean up */
      MemoryContextSwitchTo(oldcontext);
      MemoryContextReset(sinfo->tmpcontext);
-
-     return 1;
  }

  /*
--- 1189,1212 ----
      oldcontext = MemoryContextSwitchTo(sinfo->tmpcontext);

      /*
!      * Fill cstrs with null-terminated strings of column values.
       */
      for (i = 0; i < nfields; i++)
      {
!         if (PQgetisnull(res, 0, i))
!             sinfo->cstrs[i] = NULL;
          else
!             sinfo->cstrs[i] = PQgetvalue(res, 0, i);
      }

      /* Convert row to a tuple, and add it to the tuplestore */
!     tuple = BuildTupleFromCStrings(sinfo->attinmeta, sinfo->cstrs);

      tuplestore_puttuple(sinfo->tuplestore, tuple);

      /* Clean up */
      MemoryContextSwitchTo(oldcontext);
      MemoryContextReset(sinfo->tmpcontext);
  }

  /*
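For reference (not part of the patch), plain client code follows the same single-row consumption pattern as the `storeQueryResult()` loop above. A minimal sketch, assuming an already-established `PGconn` and abbreviating error handling:

```c
#include <stdio.h>
#include <libpq-fe.h>

static void
fetch_rows(PGconn *conn, const char *sql)
{
    PGresult   *res;

    if (!PQsendQuery(conn, sql))
    {
        fprintf(stderr, "send failed: %s", PQerrorMessage(conn));
        return;
    }

    /* must be called after PQsendQuery(), before any PQgetResult() */
    if (!PQsetSingleRowMode(conn))
        fprintf(stderr, "could not enter single-row mode\n");

    while ((res = PQgetResult(conn)) != NULL)
    {
        if (PQresultStatus(res) == PGRES_SINGLE_TUPLE)
        {
            /* exactly one row in this PGresult */
            if (PQnfields(res) > 0)
                printf("col0 = %s\n",
                       PQgetisnull(res, 0, 0) ? "NULL" : PQgetvalue(res, 0, 0));
        }
        else if (PQresultStatus(res) != PGRES_TUPLES_OK)
        {
            /* PGRES_TUPLES_OK marks normal end of resultset;
             * anything else is an error */
            fprintf(stderr, "query failed: %s", PQresultErrorMessage(res));
        }
        PQclear(res);
    }
}
```

Note that `PGRES_TUPLES_OK` (with zero tuples) still arrives at the end of each resultset, which is how the client detects resultset boundaries; requires linking against libpq and a live connection, so this cannot run standalone.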
diff --git a/src/interfaces/libpq/exports.txt b/src/interfaces/libpq/exports.txt
index 1251455f1f6d92c74e206b5a9b8dcdeb36ac9b98..9d95e262be3fbf26731c0926013db56dbc4e00ab 100644
*** a/src/interfaces/libpq/exports.txt
--- b/src/interfaces/libpq/exports.txt
*************** PQconnectStartParams      157
*** 160,165 ****
  PQping                    158
  PQpingParams              159
  PQlibVersion              160
! PQsetRowProcessor         161
! PQgetRowProcessor         162
! PQskipResult              163
--- 160,163 ----
  PQping                    158
  PQpingParams              159
  PQlibVersion              160
! PQsetSingleRowMode        161
diff --git a/src/interfaces/libpq/fe-connect.c b/src/interfaces/libpq/fe-connect.c
index a32258a8cbab5e655484d0471c031864fa5084de..adaab7aaade60a980b7eaf5b6e9734444c72b930 100644
*** a/src/interfaces/libpq/fe-connect.c
--- b/src/interfaces/libpq/fe-connect.c
*************** makeEmptyPGconn(void)
*** 2709,2716 ****
      /* Zero all pointers and booleans */
      MemSet(conn, 0, sizeof(PGconn));

!     /* install default row processor and notice hooks */
!     PQsetRowProcessor(conn, NULL, NULL);
      conn->noticeHooks.noticeRec = defaultNoticeReceiver;
      conn->noticeHooks.noticeProc = defaultNoticeProcessor;

--- 2709,2715 ----
      /* Zero all pointers and booleans */
      MemSet(conn, 0, sizeof(PGconn));

!     /* install default notice hooks */
      conn->noticeHooks.noticeRec = defaultNoticeReceiver;
      conn->noticeHooks.noticeProc = defaultNoticeProcessor;

*************** conninfo_uri_parse_options(PQconninfoOpt
*** 4658,4664 ****
          if (p == host)
          {
              printfPQExpBuffer(errorMessage,
!             libpq_gettext("IPv6 host address may not be empty in URI: \"%s\"\n"),
                                uri);
              goto cleanup;
          }
--- 4657,4663 ----
          if (p == host)
          {
              printfPQExpBuffer(errorMessage,
!                               libpq_gettext("IPv6 host address may not be empty in URI: \"%s\"\n"),
                                uri);
              goto cleanup;
          }
*************** conninfo_uri_parse_params(char *params,
*** 4878,4884 ****

              printfPQExpBuffer(errorMessage,
                                libpq_gettext(
!                                      "invalid URI query parameter: \"%s\"\n"),
                                keyword);
              return false;
          }
--- 4877,4883 ----

              printfPQExpBuffer(errorMessage,
                                libpq_gettext(
!                                     "invalid URI query parameter: \"%s\"\n"),
                                keyword);
              return false;
          }
*************** conninfo_uri_decode(const char *str, PQE
*** 4943,4949 ****
              if (!(get_hexdigit(*q++, &hi) && get_hexdigit(*q++, &lo)))
              {
                  printfPQExpBuffer(errorMessage,
!                         libpq_gettext("invalid percent-encoded token: \"%s\"\n"),
                                    str);
                  free(buf);
                  return NULL;
--- 4942,4948 ----
              if (!(get_hexdigit(*q++, &hi) && get_hexdigit(*q++, &lo)))
              {
                  printfPQExpBuffer(errorMessage,
!                     libpq_gettext("invalid percent-encoded token: \"%s\"\n"),
                                    str);
                  free(buf);
                  return NULL;
*************** static void
*** 5594,5601 ****
  dot_pg_pass_warning(PGconn *conn)
  {
      /* If it was 'invalid authorization', add .pgpass mention */
-     if (conn->dot_pgpass_used && conn->password_needed && conn->result &&
      /* only works with >= 9.0 servers */
          strcmp(PQresultErrorField(conn->result, PG_DIAG_SQLSTATE),
                 ERRCODE_INVALID_PASSWORD) == 0)
      {
--- 5593,5600 ----
  dot_pg_pass_warning(PGconn *conn)
  {
      /* If it was 'invalid authorization', add .pgpass mention */
      /* only works with >= 9.0 servers */
+     if (conn->dot_pgpass_used && conn->password_needed && conn->result &&
          strcmp(PQresultErrorField(conn->result, PG_DIAG_SQLSTATE),
                 ERRCODE_INVALID_PASSWORD) == 0)
      {
diff --git a/src/interfaces/libpq/fe-exec.c b/src/interfaces/libpq/fe-exec.c
index badc0b32a8e8f3e33e42729a8f8899475a0de31b..53516db723492f11fc9f4bdc6d4d351693c4c22f 100644
*** a/src/interfaces/libpq/fe-exec.c
--- b/src/interfaces/libpq/fe-exec.c
*************** char       *const pgresStatus[] = {
*** 38,44 ****
      "PGRES_BAD_RESPONSE",
      "PGRES_NONFATAL_ERROR",
      "PGRES_FATAL_ERROR",
!     "PGRES_COPY_BOTH"
  };

  /*
--- 38,45 ----
      "PGRES_BAD_RESPONSE",
      "PGRES_NONFATAL_ERROR",
      "PGRES_FATAL_ERROR",
!     "PGRES_COPY_BOTH",
!     "PGRES_SINGLE_TUPLE"
  };

  /*
*************** static bool static_std_strings = false;
*** 51,58 ****

  static PGEvent *dupEvents(PGEvent *events, int count);
  static bool pqAddTuple(PGresult *res, PGresAttValue *tup);
- static int pqStdRowProcessor(PGresult *res, const PGdataValue *columns,
-                   const char **errmsgp, void *param);
  static bool PQsendQueryStart(PGconn *conn);
  static int PQsendQueryGuts(PGconn *conn,
                  const char *command,
--- 52,57 ----
*************** static int PQsendQueryGuts(PGconn *conn,
*** 64,71 ****
                  const int *paramFormats,
                  int resultFormat);
  static void parseInput(PGconn *conn);
- static int dummyRowProcessor(PGresult *res, const PGdataValue *columns,
-                   const char **errmsgp, void *param);
  static bool PQexecStart(PGconn *conn);
  static PGresult *PQexecFinish(PGconn *conn);
  static int PQsendDescribe(PGconn *conn, char desc_type,
--- 63,68 ----
*************** PQmakeEmptyPGresult(PGconn *conn, ExecSt
*** 181,186 ****
--- 178,184 ----
              case PGRES_COPY_OUT:
              case PGRES_COPY_IN:
              case PGRES_COPY_BOTH:
+             case PGRES_SINGLE_TUPLE:
                  /* non-error cases */
                  break;
              default:
*************** PQclear(PGresult *res)
*** 698,703 ****
--- 696,703 ----

  /*
   * Handy subroutine to deallocate any partially constructed async result.
+  *
+  * Any "next" result gets cleared too.
   */
  void
  pqClearAsyncResult(PGconn *conn)
*************** pqClearAsyncResult(PGconn *conn)
*** 705,710 ****
--- 705,713 ----
      if (conn->result)
          PQclear(conn->result);
      conn->result = NULL;
+     if (conn->next_result)
+         PQclear(conn->next_result);
+     conn->next_result = NULL;
  }

  /*
*************** pqPrepareAsyncResult(PGconn *conn)
*** 758,764 ****
       * conn->errorMessage.
       */
      res = conn->result;
-     conn->result = NULL;        /* handing over ownership to caller */
      if (!res)
          res = PQmakeEmptyPGresult(conn, PGRES_FATAL_ERROR);
      else
--- 761,766 ----
*************** pqPrepareAsyncResult(PGconn *conn)
*** 771,776 ****
--- 773,788 ----
          appendPQExpBufferStr(&conn->errorMessage,
                               PQresultErrorMessage(res));
      }
+
+     /*
+      * Replace conn->result with next_result, if any.  In the normal case
+      * there isn't a next result and we're just dropping ownership of the
+      * current result.    In single-row mode this restores the situation to what
+      * it was before we created the current single-row result.
+      */
+     conn->result = conn->next_result;
+     conn->next_result = NULL;
+
      return res;
  }

*************** pqSaveParameterStatus(PGconn *conn, cons
*** 981,1065 ****


  /*
!  * PQsetRowProcessor
!  *      Set function that copies row data out from the network buffer,
!  *      along with a passthrough parameter for it.
!  */
! void
! PQsetRowProcessor(PGconn *conn, PQrowProcessor func, void *param)
! {
!     if (!conn)
!         return;
!
!     if (func)
!     {
!         /* set custom row processor */
!         conn->rowProcessor = func;
!         conn->rowProcessorParam = param;
!     }
!     else
!     {
!         /* set default row processor */
!         conn->rowProcessor = pqStdRowProcessor;
!         conn->rowProcessorParam = conn;
!     }
! }
!
! /*
!  * PQgetRowProcessor
!  *      Get current row processor of PGconn.
!  *      If param is not NULL, also store the passthrough parameter at *param.
!  */
! PQrowProcessor
! PQgetRowProcessor(const PGconn *conn, void **param)
! {
!     if (!conn)
!     {
!         if (param)
!             *param = NULL;
!         return NULL;
!     }
!
!     if (param)
!         *param = conn->rowProcessorParam;
!     return conn->rowProcessor;
! }
!
! /*
!  * pqStdRowProcessor
!  *      Add the received row to the PGresult structure
!  *      Returns 1 if OK, -1 if error occurred.
   *
!  * Note: "param" should point to the PGconn, but we don't actually need that
!  * as of the current coding.
   */
! static int
! pqStdRowProcessor(PGresult *res, const PGdataValue *columns,
!                   const char **errmsgp, void *param)
  {
      int            nfields = res->numAttributes;
      PGresAttValue *tup;
      int            i;

!     if (columns == NULL)
      {
!         /* New result set ... we have nothing to do in this function. */
!         return 1;
      }

      /*
       * Basically we just allocate space in the PGresult for each field and
       * copy the data over.
       *
!      * Note: on malloc failure, we return -1 leaving *errmsgp still NULL,
!      * which caller will take to mean "out of memory".    This is preferable to
!      * trying to set up such a message here, because evidently there's not
!      * enough memory for gettext() to do anything.
       */
      tup = (PGresAttValue *)
          pqResultAlloc(res, nfields * sizeof(PGresAttValue), TRUE);
      if (tup == NULL)
!         return -1;

      for (i = 0; i < nfields; i++)
      {
--- 993,1047 ----


  /*
!  * pqRowProcessor
!  *      Add the received row to the current async result (conn->result).
!  *      Returns 1 if OK, 0 if error occurred.
   *
!  * On error, *errmsgp can be set to an error string to be returned.
!  * If it is left NULL, the error is presumed to be "out of memory".
!  *
!  * In single-row mode, we create a new result holding just the current row,
!  * stashing the previous result in conn->next_result so that it becomes
!  * active again after pqPrepareAsyncResult().  This allows the result metadata
!  * (column descriptions) to be carried forward to each result row.
   */
! int
! pqRowProcessor(PGconn *conn, const char **errmsgp)
  {
+     PGresult   *res = conn->result;
      int            nfields = res->numAttributes;
+     const PGdataValue *columns = conn->rowBuf;
      PGresAttValue *tup;
      int            i;

!     /*
!      * In single-row mode, make a new PGresult that will hold just this one
!      * row; the original conn->result is left unchanged so that it can be used
!      * again as the template for future rows.
!      */
!     if (conn->singleRowMode)
      {
!         /* Copy everything that should be in the result at this point */
!         res = PQcopyResult(res,
!                            PG_COPYRES_ATTRS | PG_COPYRES_EVENTS |
!                            PG_COPYRES_NOTICEHOOKS);
!         if (!res)
!             return 0;
      }

      /*
       * Basically we just allocate space in the PGresult for each field and
       * copy the data over.
       *
!      * Note: on malloc failure, we return 0 leaving *errmsgp still NULL, which
!      * caller will take to mean "out of memory".  This is preferable to trying
!      * to set up such a message here, because evidently there's not enough
!      * memory for gettext() to do anything.
       */
      tup = (PGresAttValue *)
          pqResultAlloc(res, nfields * sizeof(PGresAttValue), TRUE);
      if (tup == NULL)
!         goto fail;

      for (i = 0; i < nfields; i++)
      {
*************** pqStdRowProcessor(PGresult *res, const P
*** 1078,1084 ****

              val = (char *) pqResultAlloc(res, clen + 1, isbinary);
              if (val == NULL)
!                 return -1;

              /* copy and zero-terminate the data (even if it's binary) */
              memcpy(val, columns[i].value, clen);
--- 1060,1066 ----

              val = (char *) pqResultAlloc(res, clen + 1, isbinary);
              if (val == NULL)
!                 goto fail;

              /* copy and zero-terminate the data (even if it's binary) */
              memcpy(val, columns[i].value, clen);
*************** pqStdRowProcessor(PGresult *res, const P
*** 1091,1100 ****

      /* And add the tuple to the PGresult's tuple array */
      if (!pqAddTuple(res, tup))
!         return -1;

-     /* Success */
      return 1;
  }


--- 1073,1102 ----

      /* And add the tuple to the PGresult's tuple array */
      if (!pqAddTuple(res, tup))
!         goto fail;
!
!     /*
!      * Success.  In single-row mode, make the result available to the client
!      * immediately.
!      */
!     if (conn->singleRowMode)
!     {
!         /* Change result status to special single-row value */
!         res->resultStatus = PGRES_SINGLE_TUPLE;
!         /* Stash old result for re-use later */
!         conn->next_result = conn->result;
!         conn->result = res;
!         /* And mark the result ready to return */
!         conn->asyncStatus = PGASYNC_READY;
!     }

      return 1;
+
+ fail:
+     /* release locally allocated PGresult, if we made one */
+     if (res != conn->result)
+         PQclear(res);
+     return 0;
  }


*************** PQsendQueryStart(PGconn *conn)
*** 1343,1348 ****
--- 1345,1354 ----

      /* initialize async result-accumulation state */
      conn->result = NULL;
+     conn->next_result = NULL;
+
+     /* reset single-row processing mode */
+     conn->singleRowMode = false;

      /* ready to send command message */
      return true;
*************** pqHandleSendFailure(PGconn *conn)
*** 1548,1553 ****
--- 1554,1584 ----
  }

  /*
+  * Select row-by-row processing mode
+  */
+ int
+ PQsetSingleRowMode(PGconn *conn)
+ {
+     /*
+      * Only allow setting the flag when we have launched a query and not yet
+      * received any results.
+      */
+     if (!conn)
+         return 0;
+     if (conn->asyncStatus != PGASYNC_BUSY)
+         return 0;
+     if (conn->queryclass != PGQUERY_SIMPLE &&
+         conn->queryclass != PGQUERY_EXTENDED)
+         return 0;
+     if (conn->result)
+         return 0;
+
+     /* OK, set flag */
+     conn->singleRowMode = true;
+     return 1;
+ }
+
+ /*
   * Consume any available input from the backend
   * 0 return: some kind of trouble
   * 1 return: no problem
*************** PQconsumeInput(PGconn *conn)
*** 1587,1595 ****
   * parseInput: if appropriate, parse input data from backend
   * until input is exhausted or a stopping state is reached.
   * Note that this function will NOT attempt to read more data from the backend.
-  *
-  * Note: callers of parseInput must be prepared for a longjmp exit when we are
-  * in PGASYNC_BUSY state, since an external row processor might do that.
   */
  static void
  parseInput(PGconn *conn)
--- 1618,1623 ----
*************** PQgetResult(PGconn *conn)
*** 1737,1785 ****
      return res;
  }

- /*
-  * PQskipResult
-  *      Get the next PGresult produced by a query, but discard any data rows.
-  *
-  * This is mainly useful for cleaning up after a longjmp out of a row
-  * processor, when resuming processing of the current query result isn't
-  * wanted.    Note that this is of little value in an async-style application,
-  * since any preceding calls to PQisBusy would have already called the regular
-  * row processor.
-  */
- PGresult *
- PQskipResult(PGconn *conn)
- {
-     PGresult   *res;
-     PQrowProcessor savedRowProcessor;
-
-     if (!conn)
-         return NULL;
-
-     /* temporarily install dummy row processor */
-     savedRowProcessor = conn->rowProcessor;
-     conn->rowProcessor = dummyRowProcessor;
-     /* no need to save/change rowProcessorParam */
-
-     /* fetch the next result */
-     res = PQgetResult(conn);
-
-     /* restore previous row processor */
-     conn->rowProcessor = savedRowProcessor;
-
-     return res;
- }
-
- /*
-  * Do-nothing row processor for PQskipResult
-  */
- static int
- dummyRowProcessor(PGresult *res, const PGdataValue *columns,
-                   const char **errmsgp, void *param)
- {
-     return 1;
- }
-

  /*
   * PQexec
--- 1765,1770 ----
*************** PQexecStart(PGconn *conn)
*** 1886,1892 ****
       * Silently discard any prior query result that application didn't eat.
       * This is probably poor design, but it's here for backward compatibility.
       */
!     while ((result = PQskipResult(conn)) != NULL)
      {
          ExecStatusType resultStatus = result->resultStatus;

--- 1871,1877 ----
       * Silently discard any prior query result that application didn't eat.
       * This is probably poor design, but it's here for backward compatibility.
       */
!     while ((result = PQgetResult(conn)) != NULL)
      {
          ExecStatusType resultStatus = result->resultStatus;

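Since `PQisBusy()` keeps its stateless, repeatedly-callable contract and simply starts returning 0 as each row completes, the familiar non-blocking `PQconsumeInput()`/`PQisBusy()`/`PQgetResult()` cycle works unchanged in single-row mode. A sketch of that async style, where `handle_row()` and `wait_on_socket()` are hypothetical application helpers (not libpq functions):

```c
#include <libpq-fe.h>

/* hypothetical application helpers, assumed to exist elsewhere */
extern void handle_row(const PGresult *res);
extern void wait_on_socket(int sock);   /* e.g. select()/poll() wrapper */

static void
drain_query(PGconn *conn)
{
    for (;;)
    {
        if (!PQconsumeInput(conn))
            return;                 /* connection trouble */

        /*
         * PQisBusy() returning 0 now also means "a single row is ready";
         * it stays safe to call any number of times in a row.
         */
        while (!PQisBusy(conn))
        {
            PGresult   *res = PQgetResult(conn);

            if (res == NULL)
                return;             /* end of this query */
            if (PQresultStatus(res) == PGRES_SINGLE_TUPLE)
                handle_row(res);    /* process one row immediately */
            PQclear(res);
        }

        /* no complete row or result buffered; wait for more input */
        wait_on_socket(PQsocket(conn));
    }
}
```

This is exactly the property argued for in point 1 above: the client, not a callback, drives iteration, and `PQisBusy()`/`PQgetResult()` keep their documented behaviour.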
diff --git a/src/interfaces/libpq/fe-lobj.c b/src/interfaces/libpq/fe-lobj.c
index 13fd98c2f913d3818758e75bac96822306981b52..f3a6d0341c13ce644a7f1b2f15919385b85d6a6e 100644
*** a/src/interfaces/libpq/fe-lobj.c
--- b/src/interfaces/libpq/fe-lobj.c
*************** lo_initialize(PGconn *conn)
*** 682,689 ****
      int            n;
      const char *query;
      const char *fname;
-     PQrowProcessor savedRowProcessor;
-     void       *savedRowProcessorParam;
      Oid            foid;

      if (!conn)
--- 682,687 ----
*************** lo_initialize(PGconn *conn)
*** 732,747 ****
              "or proname = 'loread' "
              "or proname = 'lowrite'";

-     /* Ensure the standard row processor is used to collect the result */
-     savedRowProcessor = conn->rowProcessor;
-     savedRowProcessorParam = conn->rowProcessorParam;
-     PQsetRowProcessor(conn, NULL, NULL);
-
      res = PQexec(conn, query);
-
-     conn->rowProcessor = savedRowProcessor;
-     conn->rowProcessorParam = savedRowProcessorParam;
-
      if (res == NULL)
      {
          free(lobjfuncs);
--- 730,736 ----
diff --git a/src/interfaces/libpq/fe-protocol2.c b/src/interfaces/libpq/fe-protocol2.c
index 8dbd6b6982395850fc167eb20cb8d5590d33122f..1ba5885cd3b419a97f4fc5ea897eac209f364897 100644
*** a/src/interfaces/libpq/fe-protocol2.c
--- b/src/interfaces/libpq/fe-protocol2.c
*************** static int    getNotify(PGconn *conn);
*** 49,67 ****
  PostgresPollingStatusType
  pqSetenvPoll(PGconn *conn)
  {
-     PostgresPollingStatusType result;
      PGresult   *res;
-     PQrowProcessor savedRowProcessor;
-     void       *savedRowProcessorParam;

      if (conn == NULL || conn->status == CONNECTION_BAD)
          return PGRES_POLLING_FAILED;

-     /* Ensure the standard row processor is used to collect any results */
-     savedRowProcessor = conn->rowProcessor;
-     savedRowProcessorParam = conn->rowProcessorParam;
-     PQsetRowProcessor(conn, NULL, NULL);
-
      /* Check whether there are any data for us */
      switch (conn->setenv_state)
      {
--- 49,59 ----
*************** pqSetenvPoll(PGconn *conn)
*** 77,86 ****
                  if (n < 0)
                      goto error_return;
                  if (n == 0)
!                 {
!                     result = PGRES_POLLING_READING;
!                     goto normal_return;
!                 }

                  break;
              }
--- 69,75 ----
                  if (n < 0)
                      goto error_return;
                  if (n == 0)
!                     return PGRES_POLLING_READING;

                  break;
              }
*************** pqSetenvPoll(PGconn *conn)
*** 94,101 ****

              /* Should we raise an error if called when not active? */
          case SETENV_STATE_IDLE:
!             result = PGRES_POLLING_OK;
!             goto normal_return;

          default:
              printfPQExpBuffer(&conn->errorMessage,
--- 83,89 ----

              /* Should we raise an error if called when not active? */
          case SETENV_STATE_IDLE:
!             return PGRES_POLLING_OK;

          default:
              printfPQExpBuffer(&conn->errorMessage,
*************** pqSetenvPoll(PGconn *conn)
*** 192,201 ****
              case SETENV_STATE_CLIENT_ENCODING_WAIT:
                  {
                      if (PQisBusy(conn))
!                     {
!                         result = PGRES_POLLING_READING;
!                         goto normal_return;
!                     }

                      res = PQgetResult(conn);

--- 180,186 ----
              case SETENV_STATE_CLIENT_ENCODING_WAIT:
                  {
                      if (PQisBusy(conn))
!                         return PGRES_POLLING_READING;

                      res = PQgetResult(conn);

*************** pqSetenvPoll(PGconn *conn)
*** 220,229 ****
              case SETENV_STATE_OPTION_WAIT:
                  {
                      if (PQisBusy(conn))
!                     {
!                         result = PGRES_POLLING_READING;
!                         goto normal_return;
!                     }

                      res = PQgetResult(conn);

--- 205,211 ----
              case SETENV_STATE_OPTION_WAIT:
                  {
                      if (PQisBusy(conn))
!                         return PGRES_POLLING_READING;

                      res = PQgetResult(conn);

*************** pqSetenvPoll(PGconn *conn)
*** 262,278 ****
                          goto error_return;

                      conn->setenv_state = SETENV_STATE_QUERY1_WAIT;
!                     result = PGRES_POLLING_READING;
!                     goto normal_return;
                  }

              case SETENV_STATE_QUERY1_WAIT:
                  {
                      if (PQisBusy(conn))
!                     {
!                         result = PGRES_POLLING_READING;
!                         goto normal_return;
!                     }

                      res = PQgetResult(conn);

--- 244,256 ----
                          goto error_return;

                      conn->setenv_state = SETENV_STATE_QUERY1_WAIT;
!                     return PGRES_POLLING_READING;
                  }

              case SETENV_STATE_QUERY1_WAIT:
                  {
                      if (PQisBusy(conn))
!                         return PGRES_POLLING_READING;

                      res = PQgetResult(conn);

*************** pqSetenvPoll(PGconn *conn)
*** 349,365 ****
                          goto error_return;

                      conn->setenv_state = SETENV_STATE_QUERY2_WAIT;
!                     result = PGRES_POLLING_READING;
!                     goto normal_return;
                  }

              case SETENV_STATE_QUERY2_WAIT:
                  {
                      if (PQisBusy(conn))
!                     {
!                         result = PGRES_POLLING_READING;
!                         goto normal_return;
!                     }

                      res = PQgetResult(conn);

--- 327,339 ----
                          goto error_return;

                      conn->setenv_state = SETENV_STATE_QUERY2_WAIT;
!                     return PGRES_POLLING_READING;
                  }

              case SETENV_STATE_QUERY2_WAIT:
                  {
                      if (PQisBusy(conn))
!                         return PGRES_POLLING_READING;

                      res = PQgetResult(conn);

*************** pqSetenvPoll(PGconn *conn)
*** 406,413 ****
                      {
                          /* Query finished, so we're done */
                          conn->setenv_state = SETENV_STATE_IDLE;
!                         result = PGRES_POLLING_OK;
!                         goto normal_return;
                      }
                      break;
                  }
--- 380,386 ----
                      {
                          /* Query finished, so we're done */
                          conn->setenv_state = SETENV_STATE_IDLE;
!                         return PGRES_POLLING_OK;
                      }
                      break;
                  }
*************** pqSetenvPoll(PGconn *conn)
*** 425,436 ****

  error_return:
      conn->setenv_state = SETENV_STATE_IDLE;
!     result = PGRES_POLLING_FAILED;
!
! normal_return:
!     conn->rowProcessor = savedRowProcessor;
!     conn->rowProcessorParam = savedRowProcessorParam;
!     return result;
  }


--- 398,404 ----

  error_return:
      conn->setenv_state = SETENV_STATE_IDLE;
!     return PGRES_POLLING_FAILED;
  }


*************** normal_return:
*** 438,446 ****
   * parseInput: if appropriate, parse input data from backend
   * until input is exhausted or a stopping state is reached.
   * Note that this function will NOT attempt to read more data from the backend.
-  *
-  * Note: callers of parseInput must be prepared for a longjmp exit when we are
-  * in PGASYNC_BUSY state, since an external row processor might do that.
   */
  void
  pqParseInput2(PGconn *conn)
--- 406,411 ----
*************** getRowDescriptions(PGconn *conn)
*** 746,776 ****
      /* Success! */
      conn->result = result;

!     /*
!      * Advance inStart to show that the "T" message has been processed.  We
!      * must do this before calling the row processor, in case it longjmps.
!      */
      conn->inStart = conn->inCursor;

!     /* Give the row processor a chance to initialize for new result set */
!     errmsg = NULL;
!     switch ((*conn->rowProcessor) (result, NULL, &errmsg,
!                                    conn->rowProcessorParam))
!     {
!         case 1:
!             /* everything is good */
!             return 0;
!
!         case -1:
!             /* error, report the errmsg below */
!             break;

!         default:
!             /* unrecognized return code */
!             errmsg = libpq_gettext("unrecognized return value from row processor");
!             break;
!     }
!     goto set_error_result;

  advance_and_error:

--- 711,726 ----
      /* Success! */
      conn->result = result;

!     /* Advance inStart to show that the "T" message has been processed. */
      conn->inStart = conn->inCursor;

!     /*
!      * We could perform additional setup for the new result set here, but for
!      * now there's nothing else to do.
!      */

!     /* And we're done. */
!     return 0;

  advance_and_error:

*************** advance_and_error:
*** 781,788 ****
       */
      conn->inStart = conn->inEnd;

- set_error_result:
-
      /*
       * Replace partially constructed result with an error result. First
       * discard the old result to try to win back some memory.
--- 731,736 ----
*************** set_error_result:
*** 790,796 ****
      pqClearAsyncResult(conn);

      /*
!      * If row processor didn't provide an error message, assume "out of
       * memory" was meant.  The advantage of having this special case is that
       * freeing the old result first greatly improves the odds that gettext()
       * will succeed in providing a translation.
--- 738,744 ----
      pqClearAsyncResult(conn);

      /*
!      * If preceding code didn't provide an error message, assume "out of
       * memory" was meant.  The advantage of having this special case is that
       * freeing the old result first greatly improves the odds that gettext()
       * will succeed in providing a translation.
*************** getAnotherTuple(PGconn *conn, bool binar
*** 937,967 ****
          free(bitmap);
      bitmap = NULL;

!     /*
!      * Advance inStart to show that the "D" message has been processed.  We
!      * must do this before calling the row processor, in case it longjmps.
!      */
      conn->inStart = conn->inCursor;

!     /* Pass the completed row values to rowProcessor */
      errmsg = NULL;
!     switch ((*conn->rowProcessor) (result, rowbuf, &errmsg,
!                                    conn->rowProcessorParam))
!     {
!         case 1:
!             /* everything is good */
!             return 0;
!
!         case -1:
!             /* error, report the errmsg below */
!             break;

!         default:
!             /* unrecognized return code */
!             errmsg = libpq_gettext("unrecognized return value from row processor");
!             break;
!     }
!     goto set_error_result;

  advance_and_error:

--- 885,899 ----
          free(bitmap);
      bitmap = NULL;

!     /* Advance inStart to show that the "D" message has been processed. */
      conn->inStart = conn->inCursor;

!     /* Process the collected row */
      errmsg = NULL;
!     if (pqRowProcessor(conn, &errmsg))
!         return 0;                /* normal, successful exit */

!     goto set_error_result;        /* pqRowProcessor failed, report it */

  advance_and_error:

*************** set_error_result:
*** 981,987 ****
      pqClearAsyncResult(conn);

      /*
!      * If row processor didn't provide an error message, assume "out of
       * memory" was meant.  The advantage of having this special case is that
       * freeing the old result first greatly improves the odds that gettext()
       * will succeed in providing a translation.
--- 913,919 ----
      pqClearAsyncResult(conn);

      /*
!      * If preceding code didn't provide an error message, assume "out of
       * memory" was meant.  The advantage of having this special case is that
       * freeing the old result first greatly improves the odds that gettext()
       * will succeed in providing a translation.
diff --git a/src/interfaces/libpq/fe-protocol3.c b/src/interfaces/libpq/fe-protocol3.c
index 173af2e0a79ef3b443de515cda851e277deaed2d..d289f82285fea00d5de20542e43ea103493f9e58 100644
*** a/src/interfaces/libpq/fe-protocol3.c
--- b/src/interfaces/libpq/fe-protocol3.c
*************** static int build_startup_packet(const PG
*** 61,69 ****
   * parseInput: if appropriate, parse input data from backend
   * until input is exhausted or a stopping state is reached.
   * Note that this function will NOT attempt to read more data from the backend.
-  *
-  * Note: callers of parseInput must be prepared for a longjmp exit when we are
-  * in PGASYNC_BUSY state, since an external row processor might do that.
   */
  void
  pqParseInput3(PGconn *conn)
--- 61,66 ----
*************** handleSyncLoss(PGconn *conn, char id, in
*** 446,455 ****
   * Returns: 0 if processed message successfully, EOF to suspend parsing
   * (the latter case is not actually used currently).
   * In either case, conn->inStart has been advanced past the message.
-  *
-  * Note: the row processor could also choose to longjmp out of libpq,
-  * in which case the library's state must allow for resumption at the
-  * next message.
   */
  static int
  getRowDescriptions(PGconn *conn, int msgLength)
--- 443,448 ----
*************** getRowDescriptions(PGconn *conn, int msg
*** 564,573 ****
      /* Success! */
      conn->result = result;

!     /*
!      * Advance inStart to show that the "T" message has been processed.  We
!      * must do this before calling the row processor, in case it longjmps.
!      */
      conn->inStart = conn->inCursor;

      /*
--- 557,563 ----
      /* Success! */
      conn->result = result;

!     /* Advance inStart to show that the "T" message has been processed. */
      conn->inStart = conn->inCursor;

      /*
*************** getRowDescriptions(PGconn *conn, int msg
*** 580,604 ****
          return 0;
      }

!     /* Give the row processor a chance to initialize for new result set */
!     errmsg = NULL;
!     switch ((*conn->rowProcessor) (result, NULL, &errmsg,
!                                    conn->rowProcessorParam))
!     {
!         case 1:
!             /* everything is good */
!             return 0;
!
!         case -1:
!             /* error, report the errmsg below */
!             break;

!         default:
!             /* unrecognized return code */
!             errmsg = libpq_gettext("unrecognized return value from row processor");
!             break;
!     }
!     goto set_error_result;

  advance_and_error:
      /* Discard unsaved result, if any */
--- 570,582 ----
          return 0;
      }

!     /*
!      * We could perform additional setup for the new result set here, but for
!      * now there's nothing else to do.
!      */

!     /* And we're done. */
!     return 0;

  advance_and_error:
      /* Discard unsaved result, if any */
*************** advance_and_error:
*** 608,615 ****
      /* Discard the failed message by pretending we read it */
      conn->inStart += 5 + msgLength;

- set_error_result:
-
      /*
       * Replace partially constructed result with an error result. First
       * discard the old result to try to win back some memory.
--- 586,591 ----
*************** set_error_result:
*** 617,624 ****
      pqClearAsyncResult(conn);

      /*
!      * If row processor didn't provide an error message, assume "out of
!      * memory" was meant.
       */
      if (!errmsg)
          errmsg = libpq_gettext("out of memory for query result");
--- 593,602 ----
      pqClearAsyncResult(conn);

      /*
!      * If preceding code didn't provide an error message, assume "out of
!      * memory" was meant.  The advantage of having this special case is that
!      * freeing the old result first greatly improves the odds that gettext()
!      * will succeed in providing a translation.
       */
      if (!errmsg)
          errmsg = libpq_gettext("out of memory for query result");
*************** failure:
*** 695,704 ****
   * Returns: 0 if processed message successfully, EOF to suspend parsing
   * (the latter case is not actually used currently).
   * In either case, conn->inStart has been advanced past the message.
-  *
-  * Note: the row processor could also choose to longjmp out of libpq,
-  * in which case the library's state must allow for resumption at the
-  * next message.
   */
  static int
  getAnotherTuple(PGconn *conn, int msgLength)
--- 673,678 ----
*************** getAnotherTuple(PGconn *conn, int msgLen
*** 778,808 ****
          goto advance_and_error;
      }

!     /*
!      * Advance inStart to show that the "D" message has been processed.  We
!      * must do this before calling the row processor, in case it longjmps.
!      */
      conn->inStart = conn->inCursor;

!     /* Pass the completed row values to rowProcessor */
      errmsg = NULL;
!     switch ((*conn->rowProcessor) (result, rowbuf, &errmsg,
!                                    conn->rowProcessorParam))
!     {
!         case 1:
!             /* everything is good */
!             return 0;
!
!         case -1:
!             /* error, report the errmsg below */
!             break;

!         default:
!             /* unrecognized return code */
!             errmsg = libpq_gettext("unrecognized return value from row processor");
!             break;
!     }
!     goto set_error_result;

  advance_and_error:
      /* Discard the failed message by pretending we read it */
--- 752,766 ----
          goto advance_and_error;
      }

!     /* Advance inStart to show that the "D" message has been processed. */
      conn->inStart = conn->inCursor;

!     /* Process the collected row */
      errmsg = NULL;
!     if (pqRowProcessor(conn, &errmsg))
!         return 0;                /* normal, successful exit */

!     goto set_error_result;        /* pqRowProcessor failed, report it */

  advance_and_error:
      /* Discard the failed message by pretending we read it */
*************** set_error_result:
*** 817,823 ****
      pqClearAsyncResult(conn);

      /*
!      * If row processor didn't provide an error message, assume "out of
       * memory" was meant.  The advantage of having this special case is that
       * freeing the old result first greatly improves the odds that gettext()
       * will succeed in providing a translation.
--- 775,781 ----
      pqClearAsyncResult(conn);

      /*
!      * If preceding code didn't provide an error message, assume "out of
       * memory" was meant.  The advantage of having this special case is that
       * freeing the old result first greatly improves the odds that gettext()
       * will succeed in providing a translation.
diff --git a/src/interfaces/libpq/libpq-fe.h b/src/interfaces/libpq/libpq-fe.h
index 67db6119bbaa35ae31fb58bb902a455b093ea23f..9d05dd20605a84ab4c4ec9d61ef8697fb2f3b77e 100644
*** a/src/interfaces/libpq/libpq-fe.h
--- b/src/interfaces/libpq/libpq-fe.h
*************** typedef enum
*** 90,96 ****
                                   * backend */
      PGRES_NONFATAL_ERROR,        /* notice or warning message */
      PGRES_FATAL_ERROR,            /* query failed */
!     PGRES_COPY_BOTH                /* Copy In/Out data transfer in progress */
  } ExecStatusType;

  typedef enum
--- 90,97 ----
                                   * backend */
      PGRES_NONFATAL_ERROR,        /* notice or warning message */
      PGRES_FATAL_ERROR,            /* query failed */
!     PGRES_COPY_BOTH,            /* Copy In/Out data transfer in progress */
!     PGRES_SINGLE_TUPLE            /* single tuple from larger resultset */
  } ExecStatusType;

  typedef enum
*************** typedef struct pg_conn PGconn;
*** 129,145 ****
   */
  typedef struct pg_result PGresult;

- /* PGdataValue represents a data field value being passed to a row processor.
-  * It could be either text or binary data; text data is not zero-terminated.
-  * A SQL NULL is represented by len < 0; then value is still valid but there
-  * are no data bytes there.
-  */
- typedef struct pgDataValue
- {
-     int            len;            /* data length in bytes, or <0 if NULL */
-     const char *value;            /* data value, without zero-termination */
- } PGdataValue;
-
  /* PGcancel encapsulates the information needed to cancel a running
   * query on an existing connection.
   * The contents of this struct are not supposed to be known to applications.
--- 130,135 ----
*************** typedef struct pgNotify
*** 161,170 ****
      struct pgNotify *next;        /* list link */
  } PGnotify;

- /* Function type for row-processor callback */
- typedef int (*PQrowProcessor) (PGresult *res, const PGdataValue *columns,
-                                            const char **errmsgp, void *param);
-
  /* Function types for notice-handling callbacks */
  typedef void (*PQnoticeReceiver) (void *arg, const PGresult *res);
  typedef void (*PQnoticeProcessor) (void *arg, const char *message);
--- 151,156 ----
*************** extern int PQsendQueryPrepared(PGconn *c
*** 403,419 ****
                      const int *paramLengths,
                      const int *paramFormats,
                      int resultFormat);
  extern PGresult *PQgetResult(PGconn *conn);
- extern PGresult *PQskipResult(PGconn *conn);

  /* Routines for managing an asynchronous query */
  extern int    PQisBusy(PGconn *conn);
  extern int    PQconsumeInput(PGconn *conn);

- /* Override default per-row processing */
- extern void PQsetRowProcessor(PGconn *conn, PQrowProcessor func, void *param);
- extern PQrowProcessor PQgetRowProcessor(const PGconn *conn, void **param);
-
  /* LISTEN/NOTIFY support */
  extern PGnotify *PQnotifies(PGconn *conn);

--- 389,401 ----
                      const int *paramLengths,
                      const int *paramFormats,
                      int resultFormat);
+ extern int    PQsetSingleRowMode(PGconn *conn);
  extern PGresult *PQgetResult(PGconn *conn);

  /* Routines for managing an asynchronous query */
  extern int    PQisBusy(PGconn *conn);
  extern int    PQconsumeInput(PGconn *conn);

  /* LISTEN/NOTIFY support */
  extern PGnotify *PQnotifies(PGconn *conn);

diff --git a/src/interfaces/libpq/libpq-int.h b/src/interfaces/libpq/libpq-int.h
index 4bc89269fababe5e4d8ecbf6e80ca1a8625d4bd5..2bac59c3d879ecabce42ceab3b5133df03a0886a 100644
*** a/src/interfaces/libpq/libpq-int.h
--- b/src/interfaces/libpq/libpq-int.h
*************** typedef struct pgLobjfuncs
*** 277,282 ****
--- 277,293 ----
      Oid            fn_lo_write;    /* OID of backend function LOwrite        */
  } PGlobjfuncs;

+ /* PGdataValue represents a data field value being passed to a row processor.
+  * It could be either text or binary data; text data is not zero-terminated.
+  * A SQL NULL is represented by len < 0; then value is still valid but there
+  * are no data bytes there.
+  */
+ typedef struct pgDataValue
+ {
+     int            len;            /* data length in bytes, or <0 if NULL */
+     const char *value;            /* data value, without zero-termination */
+ } PGdataValue;
+
  /*
   * PGconn stores all the state data associated with a single connection
   * to a backend.
*************** struct pg_conn
*** 324,333 ****
      /* Optional file to write trace info to */
      FILE       *Pfdebug;

-     /* Callback procedure for per-row processing */
-     PQrowProcessor rowProcessor;    /* function pointer */
-     void       *rowProcessorParam;        /* passthrough argument */
-
      /* Callback procedures for notice message processing */
      PGNoticeHooks noticeHooks;

--- 335,340 ----
*************** struct pg_conn
*** 346,351 ****
--- 353,359 ----
      bool        options_valid;    /* true if OK to attempt connection */
      bool        nonblocking;    /* whether this connection is using nonblock
                                   * sending semantics */
+     bool        singleRowMode;    /* return current query result row-by-row? */
      char        copy_is_binary; /* 1 = copy binary, 0 = copy text */
      int            copy_already_done;        /* # bytes already returned in COPY
                                           * OUT */
*************** struct pg_conn
*** 406,411 ****
--- 414,420 ----

      /* Status for asynchronous result construction */
      PGresult   *result;            /* result being constructed */
+     PGresult   *next_result;    /* next result (used in single-row mode) */

      /* Assorted state for SSL, GSS, etc */

*************** extern void pqSaveMessageField(PGresult
*** 517,522 ****
--- 526,532 ----
                     const char *value);
  extern void pqSaveParameterStatus(PGconn *conn, const char *name,
                        const char *value);
+ extern int    pqRowProcessor(PGconn *conn, const char **errmsgp);
  extern void pqHandleSendFailure(PGconn *conn);

  /* === in fe-protocol2.c === */

Re: [patch] libpq one-row-at-a-time API

From
Marko Kreen
Date:
On Thu, Aug 2, 2012 at 8:19 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Marko Kreen <markokr@gmail.com> writes:
>> On Wed, Aug 1, 2012 at 6:18 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> So I'm working from the first set of patches in your message
>>> <20120721194907.GA28021@gmail.com>.
>
>> Great!
>
> Here's an updated draft patch.  I've not looked at the docs yet so
> this doesn't include that, but I'm reasonably happy with the code
> changes now.  The main difference from what you had is that I pushed
> the creation of the single-row PGresults into pqRowProcessor, so that
> they're formed immediately while scanning the input message.  That
> other method with postponing examination of the rowBuf does not work,
> any more than it did with the original patch, because you can't assume
> that the network input buffer is going to hold still.  For example,
> calling PQconsumeInput after parseInput has absorbed a row-data message
> but before calling PQgetResult would likely break things.
>
> In principle I suppose we could hack PQconsumeInput enough that it
> didn't damage the row buffer while still meeting its API contract of
> clearing the read-ready condition on the socket; but it wouldn't be
> simple or cheap to do so.

Any code using single-row mode is new.  Any code calling PQconsumeInput
before fully draining rows from the buffer is buggy, as it allows
unlimited growth of the network buffer, which defeats the whole reason
to use single-row mode.

It was indeed a bug in the previous patch that it did not handle this
situation, but IMHO it should be handled with an error, not by allowing
such code to work.

Previously, in whatever sequence the fetch functions were called, the
app's max memory usage was 1x resultset size, or at most 2x resultset
size if it messed up the sequence somehow.  But no more than that.  So
if an app knew that the resultset was big, it needed to split it up.

Now with single-row mode, apps can assume their max memory usage is
1x row size + network buffer; let's simplify that to 2x row size.
But no more than that.  And now they can process huge resultsets
assuming their memory usage will be no more than 2x row size.

And now libpq allows such apps to go from 2x row size to the full
resultset size with unfortunate coding.  Why?


I have opinions about the reorg details too, but those are mostly a
matter of taste, so rather unimportant compared to the question of
whether we should allow code to break the guarantees that
single-row mode gives.

-- 
marko


Re: [patch] libpq one-row-at-a-time API

From
Tom Lane
Date:
Marko Kreen <markokr@gmail.com> writes:
> On Thu, Aug 2, 2012 at 8:19 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> In principle I suppose we could hack PQconsumeInput enough that it
>> didn't damage the row buffer while still meeting its API contract of
>> clearing the read-ready condition on the socket; but it wouldn't be
>> simple or cheap to do so.

> Any code using single-row mode is new.  Any code calling PQconsumeInput
> before fully draining rows from the buffer is buggy, as it allows
> unlimited growth of the network buffer, which defeats the whole reason
> to use single-row mode.

Sorry, that argument will not fly.  The API specification for
PQconsumeInput is that it can be called at any time to drain the kernel
input buffer, without changing the logical state of the PGconn.  It's
not acceptable to insert a parenthetical "oh, but you'd better not try
it in single-row mode" caveat.

The reason I find this of importance is that PQconsumeInput is meant to
be used in an application's main event loop for the sole purpose of
clearing the read-ready condition on the connection's socket, so that
it can wait for other conditions.  This could very well be decoupled
from the logic that is involved in testing PQisBusy and
fetching/processing actual results.  (If we had not intended to allow
those activities to be decoupled, we'd not have separated PQconsumeInput
and PQisBusy in the first place.)  Introducing a dependency between
these things could lead to non-reproducible, timing-dependent, very
hard-to-find bugs.

By the same token, if somebody were trying to put single-row-mode
support into an existing async application, they'd be fooling with the
parts that issue new queries and collect results, but probably not with
the main event loop.  Thus, I do not believe your argument above that
"any code using single-row mode is new".  The whole app is not going to
be new.  It could be very hard to guarantee that there is no path of
control that invokes PQconsumeInput before the new data is actually
processed, because there was never before a reason to avoid extra
PQconsumeInput calls.

As I said, this is the exact same objection I had to the original scheme
of exposing the row buffer directly.  Putting a libpq function in front
of the row buffer doesn't solve the problem if that function is
implemented in a way that's still vulnerable to misuse or race conditions.

> And now libpq allows such apps to go from 2x row size to the full
> resultset size with unfortunate coding.  Why?

This argument is completely nonsensical.  The contract for
PQconsumeInput has always been that it consumes whatever the kernel has
available.  If you don't extract data from the library before calling it
again, the library's internal buffer may grow to more than the minimum
workable size, but that's a tradeoff you may be willing to make to
simplify your application logic.  I do not think that it's an
improvement to change the API spec to say either that you can't call
PQconsumeInput in certain states, or that you can but it won't absorb
data from the kernel.
        regards, tom lane


Re: [patch] libpq one-row-at-a-time API

From
Marko Kreen
Date:
On Fri, Aug 3, 2012 at 12:01 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Marko Kreen <markokr@gmail.com> writes:
>> On Thu, Aug 2, 2012 at 8:19 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> In principle I suppose we could hack PQconsumeInput enough that it
>>> didn't damage the row buffer while still meeting its API contract of
>>> clearing the read-ready condition on the socket; but it wouldn't be
>>> simple or cheap to do so.
>
>> Any code using single-row mode is new.  Any code calling PQconsumeInput
>> before fully draining rows from the buffer is buggy, as it allows
>> unlimited growth of the network buffer, which defeats the whole reason
>> to use single-row mode.
>
> Sorry, that argument will not fly.  The API specification for
> PQconsumeInput is that it can be called at any time to drain the kernel
> input buffer, without changing the logical state of the PGconn.  It's
> not acceptable to insert a parenthetical "oh, but you'd better not try
> it in single-row mode" caveat.

Well, if the old API contract must be kept, then so be it.

> The reason I find this of importance is that PQconsumeInput is meant to
> be used in an application's main event loop for the sole purpose of
> clearing the read-ready condition on the connection's socket, so that
> it can wait for other conditions.  This could very well be decoupled
> from the logic that is involved in testing PQisBusy and
> fetching/processing actual results.  (If we had not intended to allow
> those activities to be decoupled, we'd not have separated PQconsumeInput
> and PQisBusy in the first place.)  Introducing a dependency between
> these things could lead to non-reproducible, timing-dependent, very
> hard-to-find bugs.
>
> By the same token, if somebody were trying to put single-row-mode
> support into an existing async application, they'd be fooling with the
> parts that issue new queries and collect results, but probably not with
> the main event loop.  Thus, I do not believe your argument above that
> "any code using single-row mode is new".  The whole app is not going to
> be new.  It could be very hard to guarantee that there is no path of
> control that invokes PQconsumeInput before the new data is actually
> processed, because there was never before a reason to avoid extra
> PQconsumeInput calls.

I've thought about this.  At first glance, indeed, if an async app has
both a reader and a writer registered in the event loop, it might be
simpler to keep reading from the source socket even if the writer stalls.

But this is exactly the situation where an error from PQconsumeInput
would help.  Imagine a fast source and a slow target (a common
scenario - a database query for each row).  Reading from the source
*must* be stopped to get predictable memory usage.

> As I said, this is the exact same objection I had to the original scheme
> of exposing the row buffer directly.  Putting a libpq function in front
> of the row buffer doesn't solve the problem if that function is
> implemented in a way that's still vulnerable to misuse or race conditions.
>
>> And now libpq allows such apps to go from 2x row size to the full
>> resultset size with unfortunate coding.  Why?
>
> This argument is completely nonsensical.  The contract for
> PQconsumeInput has always been that it consumes whatever the kernel has
> available.  If you don't extract data from the library before calling it
> again, the library's internal buffer may grow to more than the minimum
> workable size, but that's a tradeoff you may be willing to make to
> simplify your application logic.  I do not think that it's an
> improvement to change the API spec to say either that you can't call
> PQconsumeInput in certain states, or that you can but it won't absorb
> data from the kernel.

The old patch had a nice property: we could replace PQgetResult()
with something else.  To allow that, and also PQconsumeInput(), we
could store offsets in rowBuf and then skip the realignment in
PQconsumeInput().

Not sure the replacement reason is worth keeping, but the offsets may
be useful even now, as they might give an additional speedup by
decreasing the per-column storage from 16 to 8 bytes on 64-bit
architectures.

May be better left for 9.3 though...

-- 
marko