Thread: [patch] libpq one-row-at-a-time API
The row-processor API is now in 9.2, but it solves only the "different-row-storage" problem, not the "one-row-at-a-time" problem, as libpq is still in control until all rows are received. This means libpq still cannot be used to implement the iterative result processing that almost all high-level languages use.

We discussed a potential API for fetching a single row at a time, but did not reach a conclusion. The basic arguments were:

1) Tom: PQisBusy must keep its current behaviour, and thus PQgetResult() must also keep its current behaviour:
   * PQisBusy() -> 0: need to call PQgetResult(), which returns a PGresult
   * PQisBusy() -> 1: need to call PQconsumeInput()
   * PQisBusy() must be callable several times in a row, thus be stateless from the client's POV.

2) Me: behaviour must not be controlled by a callback, but by the client code that uses PQgetResult() + PQisBusy().

Now, looking at the problem with some perspective, the solution is obvious: when in single-row mode, PQgetResult() must return a proper PGresult for that single row. And everything else follows from that.

Such an API is implemented in the attached patch:

* PQsetSingleRowMode(conn): sets single-row mode.

* PQisBusy(): stops after each row in single-row mode, sets PGASYNC_ROW_READY. Thus it keeps the property of being repeatedly callable.

* PQgetResult(): returns a copy of the row if PGASYNC_ROW_READY. Sets the row's resultStatus to PGRES_SINGLE_TUPLE. This needs to be different from PGRES_TUPLES_OK in order to detect resultset end.

* PQgetRowData(): can be called instead of PQgetResult() to get the raw row data in the buffer, for more efficient processing. This is an optional feature that provides the original row-callback promise of avoiding an unnecessary row data copy.

* Although PQgetRowData() makes the callback API unnecessary, it is still fully compatible with it - the callback should not see any difference whether the resultset is processed in single-row mode or the old single-PGresult mode. Unless it wants to - it can check PGRES_TUPLES_OK vs. PGRES_SINGLE_TUPLE.
There is some duplicate code here that can be refactored (callback exec), but I did not do it yet to avoid affecting existing code too much.

-- marko

PS. If you squint, it looks like a fix of the existing API rather than a new feature, so perhaps it can still fit into 9.2?
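To make the proposed calling convention concrete, here is a minimal sketch of the synchronous single-row loop. This is an illustration only: it assumes the patched libpq from this thread (PQsetSingleRowMode() and PGRES_SINGLE_TUPLE do not exist in an unpatched 9.2 libpq), and error handling is abbreviated.

```c
/* Sketch of the proposed synchronous single-row loop.
 * Requires the patched libpq discussed in this thread. */
#include <stdio.h>
#include <libpq-fe.h>

static void process_query(PGconn *conn, const char *sql)
{
    PGresult *res;

    if (!PQsendQuery(conn, sql))
    {
        fprintf(stderr, "send failed: %s", PQerrorMessage(conn));
        return;
    }

    /* Must be called after PQsend* and before any rows have arrived. */
    if (!PQsetSingleRowMode(conn))
        fprintf(stderr, "could not enter single-row mode\n");

    while ((res = PQgetResult(conn)) != NULL)
    {
        switch (PQresultStatus(res))
        {
            case PGRES_SINGLE_TUPLE:
                /* Exactly one row; process it, then free it immediately
                 * so memory use stays constant regardless of resultset size. */
                printf("%s\n", PQgetvalue(res, 0, 0));
                break;
            case PGRES_TUPLES_OK:
                /* Zero-row result marking the end of the resultset. */
                break;
            default:
                fprintf(stderr, "error: %s", PQresultErrorMessage(res));
                break;
        }
        PQclear(res);
    }
}
```

The key property is that existing PQgetResult()-loop code keeps working unchanged; only the extra PGRES_SINGLE_TUPLE case is new.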
Attachment
On Fri, Jun 15, 2012 at 1:21 PM, Marko Kreen <markokr@gmail.com> wrote: > The row-processor API is now in 9.2, but it solves only the > "different-row-storage" problem, but not the "one-row-at-a-time" > problem, as libpq is still in control until all rows are received. > > This means libpq cannet still be used to implement iterative > result processing that almost all high-level languages are using. > > We discussed potential API for fetching on single row at a time, > but did not reach conclusion. Basic arguments were: > > 1) Tom: PQisBusy must keep current behaviour. Thus also PQgetResult() > must keep current behaviour: > * PQisBusy() -> 0: need to call PQgetResult(), which returns PGresult > * PQisBusy() -> 1: need to call PQconsumeInput() > * PQisBusy() must be callable several times in a row, thus be > stateless from clients POV. > > 2) Me: behaviour must not be controlled by callback, but client code > that uses PQgetResult() + PQisBusy(). > > Now, looking at the problem with some perspective, the solution > is obvious: when in single-row mode, the PQgetResult() must return > proper PGresult for that single row. And everything else follows that. > > Such API is implemented in attached patch: > > * PQsetSingleRowMode(conn): set's single-row mode. > > * PQisBusy(): stops after each row in single-row mode, sets PGASYNC_ROW_READY. > Thus keeping the property of being repeatedly callable. > > * PQgetResult(): returns copy of the row if PGASYNC_ROW_READY. Sets row > resultStatus to PGRES_SINGLE_TUPLE. This needs to be different from > PGRES_TUPLES_OK to detect resultset end. > > * PQgetRowData(): can be called instead PQgetResult() to get raw row data > in buffer, for more efficient processing. This is optional feature > that provides the original row-callback promise of avoiding unnecessary > row data copy. 
> > * Although PQgetRowData() makes callback API unnecessary, it is still > fully compatible with it - the callback should not see any difference > whether the resultset is processed in single-row mode or > old single-PGresult mode. Unless it wants to - it can check > PGRES_TUPLES_OK vs. PGRES_SINGLE_TUPLE. > > There is some duplicate code here that can be refactored (callback exec), > but I did not do it yet to avoid affecting existing code too much. > > -- > marko > > PS. If a squint it seems like fix of exising API instead of new feature, > so perhaps it can still fit into 9.2? +1 on rushing in row processing for 9.2, but only if the API feels right (i'll spend some time to review). I found the lack of iterative row processing to be really unfortunate. merlin
Demos:

https://github.com/markokr/libpq-rowproc-demos/blob/master/demo-onerow-sync.c
https://github.com/markokr/libpq-rowproc-demos/blob/master/demo-onerow-async.c

A few clarifications below.

On Fri, Jun 15, 2012 at 9:21 PM, Marko Kreen <markokr@gmail.com> wrote:
> Now, looking at the problem with some perspective, the solution
> is obvious: when in single-row mode, the PQgetResult() must return
> proper PGresult for that single row. And everything else follows that.
>
> Such API is implemented in attached patch:
>
> * PQsetSingleRowMode(conn): set's single-row mode.

The function can be called only after PQsend* and before any rows have arrived. This guarantees there will be no surprises for PQexec* users who expect the full resultset at once. It also guarantees that the user will process the resultset with a PQgetResult() loop, either sync or async.

The next PQexec/PQsend call will reset the flag, so it is active only for the duration of processing the results from one command.

Currently it returns FALSE if called in the wrong place, and does nothing. The only question I see here is whether it should also set an error state on the connection or not. That does not seem to be an improvement.

> * PQgetRowData(): can be called instead PQgetResult() to get raw row data
> in buffer, for more efficient processing. This is optional feature
> that provides the original row-callback promise of avoiding unnecessary
> row data copy.
>
> * Although PQgetRowData() makes callback API unnecessary, it is still
> fully compatible with it - the callback should not see any difference
> whether the resultset is processed in single-row mode or
> old single-PGresult mode. Unless it wants to - it can check
> PGRES_TUPLES_OK vs. PGRES_SINGLE_TUPLE.

PQgetResult() is compatible with callbacks; PQgetRowData() bypasses them.

-- marko
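The ordering constraint described above - call PQsetSingleRowMode() right after PQsend*, before consuming any input - can be sketched in a nonblocking loop roughly like this (again assuming the patched libpq from this thread; select()-based waiting is shown for brevity, and error handling is minimal):

```c
/* Sketch of the asynchronous single-row loop.
 * Requires the patched libpq discussed in this thread. */
#include <stdio.h>
#include <sys/select.h>
#include <libpq-fe.h>

static int run_async(PGconn *conn, const char *sql)
{
    if (!PQsendQuery(conn, sql))
        return -1;

    /* Must happen before any rows have been consumed. */
    if (!PQsetSingleRowMode(conn))
        fprintf(stderr, "single-row mode refused\n");

    for (;;)
    {
        /* Drain everything that is ready without blocking.
         * PQisBusy() stops after each row in single-row mode,
         * but stays safely re-callable. */
        while (!PQisBusy(conn))
        {
            PGresult *res = PQgetResult(conn);

            if (res == NULL)
                return 0;               /* command fully processed */
            if (PQresultStatus(res) == PGRES_SINGLE_TUPLE)
                printf("row: %s\n", PQgetvalue(res, 0, 0));
            PQclear(res);
        }

        /* PQisBusy() -> 1: wait for the socket, then read more data. */
        {
            fd_set rfds;
            int sock = PQsocket(conn);

            FD_ZERO(&rfds);
            FD_SET(sock, &rfds);
            if (select(sock + 1, &rfds, NULL, NULL, NULL) < 0)
                return -1;
            if (!PQconsumeInput(conn))
                return -1;
        }
    }
}
```

This is the same PQisBusy()/PQconsumeInput() shape as existing async libpq code; single-row mode only changes how often PQgetResult() yields a result.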
Marko Kreen <markokr@gmail.com> writes: >> Now, looking at the problem with some perspective, the solution >> is obvious: when in single-row mode, the PQgetResult() must return >> proper PGresult for that single row. And everything else follows that. >> >> * PQgetRowData(): can be called instead PQgetResult() to get raw row data >> in buffer, for more efficient processing. This is optional feature >> that provides the original row-callback promise of avoiding unnecessary >> row data copy. >> >> * Although PQgetRowData() makes callback API unnecessary, it is still >> fully compatible with it - the callback should not see any difference >> whether the resultset is processed in single-row mode or >> old single-PGresult mode. Unless it wants to - it can check >> PGRES_TUPLES_OK vs. PGRES_SINGLE_TUPLE. I guess this raises the question of whether we ought to revert the row-callback patch entirely and support only this approach. IMO it is (barely) not too late to do that for 9.2, if we want to. If we don't want to, then this is just another new feature and should be considered for 9.3. What I like about this is the greatly simpler and harder-to-misuse API. The only arguable drawback is that there's still at least one malloc/free cycle per tuple, imposed by the creation of a PGresult for each one, whereas the callback approach avoids that. But worrying about that could be considered to be vast overoptimization; the backend has certainly spent a lot more overhead than that generating the tuple. regards, tom lane
On Sat, Jun 16, 2012 at 6:09 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I guess this raises the question of whether we ought to revert the
> row-callback patch entirely and support only this approach. IMO
> it is (barely) not too late to do that for 9.2, if we want to.
> If we don't want to, then this is just another new feature and
> should be considered for 9.3.

I think the row-callback is a dangerous API that does not solve any important problems. But I do like the 2-phase processing the rowproc patch introduced, and having a way to bypass the unnecessary malloc()+copy.

So my preference would be to simply remove the callback API but keep the processing and provide PQgetRowData() instead. Although the win it brings is significantly smaller thanks to the single-row PQgetResult(), so if it does not sound interesting to others, it can be dropped. The single-row processing is the important feature we need; the rest is extra.

-- marko
On 16 June 2012 23:09, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Marko Kreen <markokr@gmail.com> writes:
>>> Now, looking at the problem with some perspective, the solution
>>> is obvious: when in single-row mode, the PQgetResult() must return
>>> proper PGresult for that single row. And everything else follows that.
>>>
>>> * PQgetRowData(): can be called instead PQgetResult() to get raw row data
>>> in buffer, for more efficient processing. This is optional feature
>>> that provides the original row-callback promise of avoiding unnecessary
>>> row data copy.
>>>
>>> * Although PQgetRowData() makes callback API unnecessary, it is still
>>> fully compatible with it - the callback should not see any difference
>>> whether the resultset is processed in single-row mode or
>>> old single-PGresult mode. Unless it wants to - it can check
>>> PGRES_TUPLES_OK vs. PGRES_SINGLE_TUPLE.
>
> I guess this raises the question of whether we ought to revert the
> row-callback patch entirely and support only this approach. IMO
> it is (barely) not too late to do that for 9.2, if we want to.
> If we don't want to, then this is just another new feature and
> should be considered for 9.3.
>
> What I like about this is the greatly simpler and harder-to-misuse
> API. The only arguable drawback is that there's still at least one
> malloc/free cycle per tuple, imposed by the creation of a PGresult
> for each one, whereas the callback approach avoids that. But worrying
> about that could be considered to be vast overoptimization; the backend
> has certainly spent a lot more overhead than that generating the tuple.

I prefer the description of Marko's API to the one we have now.

Adopting one API in 9.2 and another in 9.3 would be fairly bad. Perhaps we can have both?

Can we see a performance test? "Add a row processor API to libpq for better handling of large result sets" - the idea is that we do this many, many times, so we need to double-check the extra overhead is not a problem in cases where the dumping overhead is significant.

-- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On Sat, Jun 16, 2012 at 7:58 PM, Marko Kreen <markokr@gmail.com> wrote: > So my preference would be to simply remove the callback API > but keep the processing and provide PQgetRowData() instead. This is implemented in attached patch. It also converts dblink to use single-row API. The patch should be applied on top of previous single-row patch. Both can be seen also here: https://github.com/markokr/postgres/commits/single-row -- marko
Attachment
On Sun, Jun 17, 2012 at 2:07 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
> I prefer the description of Marko's API than the one we have now.
>
> Adopting one API in 9.2 and another in 9.3 would be fairly bad.
> Perhaps we can have both?

I see no reason to keep the (public) callback API around, except if we don't bother to remove it now.

> Can we see a performance test? "Add a row processor API to libpq for
> better handling of large result sets". So idea is we do this many,
> many times so we need to double check the extra overhead is not a
> problem in cases where the dumping overhead is significant.

Not sure what you want to performance test. PQgetRowData() uses exactly the same pipeline that the callbacks used. It will use a few more C calls; not sure it makes sense to benchmark them.

The recent dblink change replaced the palloc() + copy + zero-termination dance with PQgetResult(), which does a malloc() + copy dance internally. This malloc vs. palloc might be benchmarkable, but it seems to head into micro-benchmarking territory, as the big win came from avoiding buffering of rows. So yeah, maybe using PQgetRowData() might be a tiny bit faster, but I don't expect much difference. But all this affects new users only.

The thing that affects everybody was the 2-step row processing change that was done during the rowproc patch. I did benchmark it, and it seems there are column-size + column-count patterns where the new way is faster, and some patterns where the old way is faster. But the difference did not rise above test noise, so I concluded it is insignificant and the malloc+copy avoidance is worth it.

Of course, any additional benchmarking is welcome, so feel free to pick any situation you care about.

-- marko
On 17 June 2012 19:37, Marko Kreen <markokr@gmail.com> wrote: > On Sun, Jun 17, 2012 at 2:07 PM, Simon Riggs <simon@2ndquadrant.com> wrote: >> I prefer the description of Marko's API than the one we have now. >> >> Adopting one API in 9.2 and another in 9.3 would be fairly bad. >> Perhaps we can have both? > > I see no reason the keep the (public) callback API around, > except if we don't bother to remove it now. OK by me. >> Can we see a performance test? "Add a row processor API to libpq for >> better handling of large result sets". So idea is we do this many, >> many times so we need to double check the extra overhead is not a >> problem in cases where the dumping overhead is significant. ... > I did benchmark it, and it seems there are column-size > + column-count patterns where new way is faster, > and some patterns where old way is faster. But the > difference did not raise above test noise so I concluded > it is insignificant and the malloc+copy avoidance is worth it. As long as we've checked that's fine. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Marko Kreen <markokr@gmail.com> writes: > Now, looking at the problem with some perspective, the solution > is obvious: when in single-row mode, the PQgetResult() must return > proper PGresult for that single row. And everything else follows that. > Such API is implemented in attached patch: I'm starting to look at this patch now. I think we could drop the PQgetRowData() API: it complicates matters for little gain that I can see. The argument for it was to avoid the cost of creating a PGresult per row, but we're already going to pay the cost of creating a PGresult in order to return the PGRES_SINGLE_TUPLE status. And as was pointed out upthread, any per-tuple malloc costs are going to be in the noise compared to the server-side effort expended to create the tuple, anyway. The point of this feature is to avoid accumulating the entire resultset in memory, not to micro-optimize linear-time costs. Moreover, if the argument for changing 9.2 at this late date is to get rid of a fragile, breakable API, surely an API that's designed around returning pointers into the library's network buffer ought to be a prime target. And lastly, since the proposed patch for dblink doesn't use PQgetRowData, there's not even much reason to think that it's bug-free. regards, tom lane
On Mon, Jul 16, 2012 at 4:11 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Marko Kreen <markokr@gmail.com> writes:
>> Now, looking at the problem with some perspective, the solution
>> is obvious: when in single-row mode, the PQgetResult() must return
>> proper PGresult for that single row. And everything else follows that.
>
>> Such API is implemented in attached patch:
>
> I'm starting to look at this patch now. I think we could drop the
> PQgetRowData() API: it complicates matters for little gain that I can
> see. The argument for it was to avoid the cost of creating a PGresult
> per row, but we're already going to pay the cost of creating a
> PGresult in order to return the PGRES_SINGLE_TUPLE status.

No. Please look again: it is supposed to be called instead of PQgetResult().

-- marko
Marko Kreen <markokr@gmail.com> writes: > On Mon, Jul 16, 2012 at 4:11 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> I'm starting to look at this patch now. I think we could drop the >> PQgetRowData() API: it complicates matters for little gain that I can >> see. The argument for it was to avoid the cost of creating a PGresult >> per row, but we're already going to pay the cost of creating a >> PGresult in order to return the PGRES_SINGLE_TUPLE status. > No. Please look again, it is supposed to be called instead of PGgetResult(). Mm. I still think we should drop it, because it's still a dangerous API that's not necessary for the principal benefit of this feature. regards, tom lane
On Mon, Jul 16, 2012 at 4:33 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Marko Kreen <markokr@gmail.com> writes: >> On Mon, Jul 16, 2012 at 4:11 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >>> I'm starting to look at this patch now. I think we could drop the >>> PQgetRowData() API: it complicates matters for little gain that I can >>> see. The argument for it was to avoid the cost of creating a PGresult >>> per row, but we're already going to pay the cost of creating a >>> PGresult in order to return the PGRES_SINGLE_TUPLE status. > >> No. Please look again, it is supposed to be called instead of PGgetResult(). > > Mm. I still think we should drop it, because it's still a dangerous API > that's not necessary for the principal benefit of this feature. Yes, it is a secondary feature, but it fits the needs of the actual target audience of the single-row feature - various high-level wrappers of libpq. Also it is needed for high-performance situations, where the single-row-mode fits well even for C clients, except the advantage is negated by new malloc-per-row overhead. -- marko
Marko Kreen <markokr@gmail.com> writes: > On Mon, Jul 16, 2012 at 4:33 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Mm. I still think we should drop it, because it's still a dangerous API >> that's not necessary for the principal benefit of this feature. > Yes, it is a secondary feature, but it fits the needs of the actual target > audience of the single-row feature - various high-level wrappers of libpq. > Also it is needed for high-performance situations, where the > single-row-mode fits well even for C clients, except the > advantage is negated by new malloc-per-row overhead. Absolutely no evidence has been presented that there's any useful performance gain to be had there. Moreover, if there were, we could probably work a bit harder at making PGresult creation cheaper, rather than having to expose a dangerous API. regards, tom lane
On Mon, Jul 16, 2012 at 6:47 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Marko Kreen <markokr@gmail.com> writes: >> On Mon, Jul 16, 2012 at 4:33 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >>> Mm. I still think we should drop it, because it's still a dangerous API >>> that's not necessary for the principal benefit of this feature. > >> Yes, it is a secondary feature, but it fits the needs of the actual target >> audience of the single-row feature - various high-level wrappers of libpq. > >> Also it is needed for high-performance situations, where the >> single-row-mode fits well even for C clients, except the >> advantage is negated by new malloc-per-row overhead. > > Absolutely no evidence has been presented that there's any useful > performance gain to be had there. Moreover, if there were, we could > probably work a bit harder at making PGresult creation cheaper, rather > than having to expose a dangerous API. Ok, I'm more interested in primary feature, so no more objections from me. -- marko
Here are 2 approaches for getting to a state where only PQsetSingleRowMode() is available. Both are on top of the REL9_2_STABLE branch.

a) Remove the callback hooks, keep rowBuf, implement single-row mode on top of that. This was posted before; I just additionally removed the PQgetRowData() function.

   git pull git://github.com/markokr/postgres.git single-row-mode1
   https://github.com/markokr/postgres/commits/single-row-mode1

   Commits:
     libpq: Single-row based processing
     libpq, dblink: Remove row processor API

   Advantage: makes it easier to play with PQgetRowData() or potential faster PGresult creation methods. Smaller change relative to the libpq in 9.2beta than b).

b) Revert the row-callback changes completely, implement single-row mode on top of virgin libpq. The only problem here was keeping the fixes implemented as part of the row-callback patch. Single-row mode itself is quite simple.

   git pull git://github.com/markokr/postgres.git single-row-mode2
   https://github.com/markokr/postgres/commits/single-row-mode2

   Feature patch:
     https://github.com/markokr/postgres/commit/b5e822125c655f189875401c61317242705143b9

   Commits:
     dblink: revert conversion to row processor API patch
     libpq: revert row processor API patch
     libpq: random fixes
     libpq: single-row mode
     dblink: use single-row-mode

   Advantage: smaller change relative to the libpq in 9.1 than a).

As the patch has suffered a lot from trying to provide both macro- and micro-optimization (on-the-fly row processing vs. more efficient row processing), maybe b) is the safer choice for 9.2?

In case somebody wants to look at the patches without git (or the web), I attached them as tgz too.

-- marko
Attachment
Marko Kreen <markokr@gmail.com> writes: > Here is 2 approaches how to get to state where only PQsetSingleRowMode() > is available. Both on top of REL9_2_STABLE branch. > a) Remove callback hooks, keep rowBuf, implement single-row-mode on > top of that. This was posted before, I just additionally removed > the PQgetRowData() function. > b) Revert row-callback changes completely, implement single-row-mode on > top of virgin libpq. Only problem here was keeping fixes implemented > as part of row-callback patch. Single-row-mode itself is quite simple. Yeah, I was wondering if we shouldn't revisit the whole patch given that where we're ending up looks little like where we thought we were going. I hadn't had time to pursue the idea, but I'm glad you did. Will look at these. One reason to stick with approach (a) is that it gives us the option to easily add PQgetRowData(), in case future testing shows that that's actually worth the risk and usage restrictions. While I'm a bit dubious of that, I'm prepared to be proven wrong. regards, tom lane
Here is a simple test program that takes a SELECT query, reads the result, and outputs a COPY-formatted stream to standard output, to simulate some activity.

It operates in 3 modes, specified by command-line switches:

  -f  Load the full resultset at once - the old way.
  -s  Single-row mode using PQgetResult().
  -z  Single-row mode using PQgetRowData().

It is compiled with 2 different libpqs that correspond to the single-row-modeX branches in my github repo:

  rowdump1 - libpq with rowBuf + PQgetRowData(). rowBuf is required for PQgetRowData.
             [ https://github.com/markokr/postgres/tree/single-row-mode1 ]

  rowdump2 - Plain libpq patched for single-row mode. No PQgetRowData() here.
             [ https://github.com/markokr/postgres/tree/single-row-mode2 ]

Notes:

* The hardest part is picking realistic queries that matter. It's possible to construct artificial queries that make the results go either way.

* It does not make sense to compare -f with the others. But it does make sense to compare -f between the differently patched libpqs, to detect any potential slowdowns.

* The time measured is the User Time of the client process.
-------------------------------------------------------
QUERY: select 10000,200,300000,rpad('x',30,'z') from generate_series(1,5000000)
  ./rowdump1 -f:  3.90  3.90  3.93  avg: 3.91
  ./rowdump2 -f:  4.03  4.13  4.05  avg: 4.07
  ./rowdump1 -s:  6.26  6.33  6.49  avg: 6.36
  ./rowdump2 -s:  7.48  7.46  7.50  avg: 7.48
  ./rowdump1 -z:  2.88  2.90  2.79  avg: 2.86

QUERY: select rpad('x',10,'z'),rpad('x',20,'z'),rpad('x',30,'z'),rpad('x',40,'z'),rpad('x',50,'z'),rpad('x',60,'z') from generate_series(1,3000000)
  ./rowdump1 -f:  6.29  6.36  6.14  avg: 6.26
  ./rowdump2 -f:  6.79  6.69  6.72  avg: 6.73
  ./rowdump1 -s:  7.71  7.72  7.80  avg: 7.74
  ./rowdump2 -s:  8.14  8.16  8.57  avg: 8.29
  ./rowdump1 -z:  6.45  5.15  5.16  avg: 5.59

QUERY: select 1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100 from generate_series(1,800000)
  ./rowdump1 -f:  5.68  5.66  5.72  avg: 5.69
  ./rowdump2 -f:  5.69  5.84  5.67  avg: 5.73
  ./rowdump1 -s:  7.68  7.76  7.67  avg: 7.70
  ./rowdump2 -s:  7.57  7.54  7.62  avg: 7.58
  ./rowdump1 -z:  2.78  2.82  2.72  avg: 2.77

QUERY: select 1000,rpad('x', 400, 'z'),rpad('x', 4000, 'z') from generate_series(1,100000)
  ./rowdump1 -f:  2.71  2.66  2.58  avg: 2.65
  ./rowdump2 -f:  3.11  3.14  3.16  avg: 3.14
  ./rowdump1 -s:  2.64  2.61  2.64  avg: 2.63
  ./rowdump2 -s:  3.15  3.11  3.11  avg: 3.12
  ./rowdump1 -z:  2.53  2.51  2.46  avg: 2.50
-------------------------------------------------------

Test code attached. Please play with it.

By this test, both rowBuf and PQgetRowData() look good.

-- marko
Attachment
On Mon, Jul 23, 2012 at 2:05 PM, Marko Kreen <markokr@gmail.com> wrote: > > Here is a simple test program that takes a SELECT > query, reads it and outputs a COPY-formatted stream > to standard output, to simulate some activity. > > It operates on 3 modes, specified by commant-line switches: > > -f Load full resultset at once - old way. > -s Single-Row mode using PQgetResult(). > -z Single-Row mode using PQgetRowData(). > > It is compiled with 2 different libpqs that correspond to > single-row-modeX branches in my github repo: > > rowdump1 - libpq with rowBuf + PQgetRowData(). rowBuf is > required for PQgetRowData. > [ https://github.com/markokr/postgres/tree/single-row-mode1 ] > > rowdump2 - Plain libpq patched for single-row mode. > No PQgetRowData() here. > [ https://github.com/markokr/postgres/tree/single-row-mode2 ] > > Notes: > > * Hardest part is picking realistic queries that matter. > It's possible to construct artificial queries that make > results go either way. > > * It does not make sense for compare -f with others. But it > does make sense to compare -f from differently patched libpqs > to detect any potential slowdowns. > > * The time measured is User Time of client process. 
> > ------------------------------------------------------- > QUERY: select 10000,200,300000,rpad('x',30,'z') from > generate_series(1,5000000) > ./rowdump1 -f: 3.90 3.90 3.93 avg: 3.91 > ./rowdump2 -f: 4.03 4.13 4.05 avg: 4.07 > ./rowdump1 -s: 6.26 6.33 6.49 avg: 6.36 > ./rowdump2 -s: 7.48 7.46 7.50 avg: 7.48 > ./rowdump1 -z: 2.88 2.90 2.79 avg: 2.86 > QUERY: select > rpad('x',10,'z'),rpad('x',20,'z'),rpad('x',30,'z'),rpad('x',40,'z'),rpad('x',50,'z'),rpad('x',60,'z') > from generate_series(1,3000000) > ./rowdump1 -f: 6.29 6.36 6.14 avg: 6.26 > ./rowdump2 -f: 6.79 6.69 6.72 avg: 6.73 > ./rowdump1 -s: 7.71 7.72 7.80 avg: 7.74 > ./rowdump2 -s: 8.14 8.16 8.57 avg: 8.29 > ./rowdump1 -z: 6.45 5.15 5.16 avg: 5.59 > QUERY: select > 1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100 > from generate_series(1,800000) > ./rowdump1 -f: 5.68 5.66 5.72 avg: 5.69 > ./rowdump2 -f: 5.69 5.84 5.67 avg: 5.73 > ./rowdump1 -s: 7.68 7.76 7.67 avg: 7.70 > ./rowdump2 -s: 7.57 7.54 7.62 avg: 7.58 > ./rowdump1 -z: 2.78 2.82 2.72 avg: 2.77 > QUERY: select 1000,rpad('x', 400, 'z'),rpad('x', 4000, 'z') from > generate_series(1,100000) > ./rowdump1 -f: 2.71 2.66 2.58 avg: 2.65 > ./rowdump2 -f: 3.11 3.14 3.16 avg: 3.14 > ./rowdump1 -s: 2.64 2.61 2.64 avg: 2.63 > ./rowdump2 -s: 3.15 3.11 3.11 avg: 3.12 > ./rowdump1 -z: 2.53 2.51 2.46 avg: 2.50 > ------------------------------------------------------- > > Test code attached. Please play with it. > > By this test, both rowBuf and PQgetRowData() look good. I agree on performance grounds. It's important for libpq to be fast. 
It seems odd (but maybe ok) that you have to set the single row mode on the connection only to have the server reset it whenever you call a send function: maybe rename to PQsetResultSingleRowMode? Does PQgetRowData() break the ability to call PQgetvalue() against the result as well as other functions like PQgetisnull()? If so, I object. merlin
On Tue, Jul 24, 2012 at 12:27 AM, Merlin Moncure <mmoncure@gmail.com> wrote:
> On Mon, Jul 23, 2012 at 2:05 PM, Marko Kreen <markokr@gmail.com> wrote:
> It seems odd (but maybe ok) that you have to set the single row mode
> on the connection only to have the server reset it whenever you call a
> send function: maybe rename to PQsetResultSingleRowMode?

The server does not reset it; it's a purely client-side flag. It is reset by the next PQsend*/PQexec* call. If the server sends several resultsets for one query, they all share the same setting.

I don't think extra-long function names that "describe exactly" the function's behavior are an improvement over shorter but inexact names, as you need to read the docs anyway to learn the real behavior. But it's a matter of taste, so it can be renamed if people want it.

An alternative is to create parallel PQsend* functions that set the flag. That gets rid of the possibility of any extra activity between PQsend and PQsetSingleRowMode(). But it seems messy to do that just for single-row mode; instead it would make sense to have PQsend* variants that take a flags argument to tune their behavior. But that is worth doing only if we have more flags than just this one.

> Does PQgetRowData() break the ability to call PQgetvalue() against the
> result as well as other functions like PQgetisnull()? If so, I
> object.

I don't get what you are objecting to. PQgetRowData() is called instead of PQgetResult(), and it returns a zero-row PGresult for the row header plus a list of PGdataValues that point to the actual data. You can call the value functions; they don't crash, but as the result has zero rows (the data is not copied into it) they don't do anything useful either.

The whole point of the API is to avoid the unnecessary copying.

You can mix calls to PQgetRowData() and PQgetResult() during one resultset; it works fine, although it does not seem that useful.
I guess you fear that some code could unexpectedly see the new behavior, and that is exactly why the row-callback API needs to be scrapped - it allowed such a possibility. But with PQsetSingleRowMode and PQgetRowData, the new behavior is seen only by new code that clearly expects it, so I don't see what the problem is.

-- marko
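For readers trying to picture the zero-copy path being debated here, a rough sketch of how the proposed PQgetRowData() would slot into the loop follows. Hedged heavily: PQgetRowData() was never merged into libpq, and the function signature and PGdataValue layout shown are inferred from the patch branches linked in this thread, so treat them as assumptions rather than a real API.

```c
/* Hypothetical usage of the proposed PQgetRowData(); NOT part of
 * released libpq. The signature and the PGdataValue {len, value}
 * layout are assumptions based on the row-processor patch. */
#include <stdio.h>
#include <libpq-fe.h>

static void dump_rows(PGconn *conn)
{
    PGresult *hdr;          /* zero-row PGresult carrying the row header */
    PGdataValue *cols;      /* per-column pointers into the network
                             * buffer - valid only until the next libpq
                             * call, hence the "dangerous API" upthread */

    while (PQgetRowData(conn, &hdr, &cols))
    {
        int i;
        int nfields = PQnfields(hdr);

        for (i = 0; i < nfields; i++)
        {
            if (cols[i].len < 0)        /* negative length marks NULL */
                printf("\\N\t");
            else
                printf("%.*s\t", cols[i].len, cols[i].value);
        }
        printf("\n");
        /* Do NOT keep the cols[] pointers past this iteration. */
    }
    /* Fall back to PQgetResult() here to collect the final
     * PGRES_TUPLES_OK (or error) status. */
}
```

The design point being argued is visible in the comments: the data pointers reference libpq's own buffer, which avoids the malloc+copy per row but ties the caller to strict usage rules.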
On Mon, Jul 23, 2012 at 5:07 PM, Marko Kreen <markokr@gmail.com> wrote: >> Does PQgetRowData() break the ability to call PQgetvalue() against the >> result as well as other functions like PQgetisnull()? If so, I >> object. > > I don't get what are you objecting to. The PQgetRowData() > is called instead of PQgetResult() and it returns zero-row PGresult > for row header and list of PGdataValues that point to actual > data. You call the value functions, they don't crash, but as > the result has zero rows (the data is not copied to it) > they don't do anything useful either. > > The whole point of the API is to avoid the unnecessary copying. > > You can mix the calls to PQgetRowData() and PQgetResult() > during one resultset, it works fine although does not seem > that useful. > > I guess you fear that some code can unexpectedly see > new behavior, and that exactly why the row-callback API > needs to be scrapped - it allowed such possibility. > > But with PQsetSingleRowMode and PQgetRowData, the > new behavior is seen only by new code that clearly > expects it, so I don't see what the problem is. Well, for one, it breaks libpqtypes (see line 171@ http://libpqtypes.esilo.com/browse_source.html?file=exec.c), or at least makes it unable to use the row-processing mode. libpqtypes wraps the libpq getter functions and exposes a richer api using only the result object. When writing libpqtypes we went through great pains to keep access to server side data through the existing result API. Note, I'm not arguing that compatibility with libpqtypes is a desired objective, but the wrapping model that it uses is pretty reasonably going to be used by other code -- the awkwardness there should be a red flag of sorts. I'm arguing that *all* data getting must continue to do so through the result object, and bypassing the result to get at data is breaking the result abstraction in the libpq api. I bet you can still maintain data access through result object while avoiding extra copies. 
For example, maybe PQsetSingleRowMode should return a result object? merlin
Merlin Moncure <mmoncure@gmail.com> writes: > I'm arguing that *all* data getting must continue to do so through the > result object, and bypassing the result to get at data is breaking the > result abstraction in the libpq api. That's a fair point, but the single-row mode without PQgetRowData still fits that model, doesn't it? From the point of view of libpqtypes it just looks like you got a lot of one-row query results. regards, tom lane
On Tue, Jul 24, 2012 at 10:49 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Merlin Moncure <mmoncure@gmail.com> writes: >> I'm arguing that *all* data getting must continue to do so through the >> result object, and bypassing the result to get at data is breaking the >> result abstraction in the libpq api. > > That's a fair point, but the single-row mode without PQgetRowData still > fits that model, doesn't it? From the point of view of libpqtypes it > just looks like you got a lot of one-row query results. Sure: Marko's exec_query_single_row example looks like 100% reasonable libpq code. That said, I'd still spend a few cycles to think this through and make sure we aren't walling ourselves off from 'copy free' behavior, even if that's reserved for a future improvement. In particular, I'd like to explore if PQsetSingleRowMode() should be returning a result. merlin
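For reference, the single-row loop being discussed looks roughly like this under the proposed API (a sketch only: error handling is trimmed and a live connection is required, so it is illustrative rather than runnable as-is):

```c
/* Sketch of the proposed 9.2 single-row API from the client's side. */
#include <stdio.h>
#include <libpq-fe.h>

static int
exec_query_single_row(PGconn *conn, const char *sql)
{
    PGresult   *res;

    if (!PQsendQuery(conn, sql))
        return -1;
    if (!PQsetSingleRowMode(conn))      /* before any rows are consumed */
        return -1;

    while ((res = PQgetResult(conn)) != NULL)
    {
        switch (PQresultStatus(res))
        {
            case PGRES_SINGLE_TUPLE:
                /* a normal PGresult holding exactly one row */
                printf("row: %s\n", PQgetvalue(res, 0, 0));
                break;
            case PGRES_TUPLES_OK:
                /* zero-row result marking the end of the resultset */
                break;
            default:
                fprintf(stderr, "%s", PQerrorMessage(conn));
                break;
        }
        PQclear(res);
    }
    return 0;
}
```

This is why libpqtypes-style wrappers keep working: they see nothing but ordinary PGresult objects, just many one-row ones.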
On Tue, Jul 24, 2012 at 6:18 PM, Merlin Moncure <mmoncure@gmail.com> wrote: > On Mon, Jul 23, 2012 at 5:07 PM, Marko Kreen <markokr@gmail.com> wrote: >>> Does PQgetRowData() break the ability to call PQgetvalue() against the >>> result as well as other functions like PQgetisnull()? If so, I >>> object. >> >> I don't get what are you objecting to. The PQgetRowData() >> is called instead of PQgetResult() and it returns zero-row PGresult >> for row header and list of PGdataValues that point to actual >> data. You call the value functions, they don't crash, but as >> the result has zero rows (the data is not copied to it) >> they don't do anything useful either. >> >> The whole point of the API is to avoid the unnecessary copying. >> >> You can mix the calls to PQgetRowData() and PQgetResult() >> during one resultset, it works fine although does not seem >> that useful. >> >> I guess you fear that some code can unexpectedly see >> new behavior, and that exactly why the row-callback API >> needs to be scrapped - it allowed such possibility. >> >> But with PQsetSingleRowMode and PQgetRowData, the >> new behavior is seen only by new code that clearly >> expects it, so I don't see what the problem is. > > Well, for one, it breaks libpqtypes (see line 171@ > http://libpqtypes.esilo.com/browse_source.html?file=exec.c), or at > least makes it unable to use the row-processing mode. libpqtypes > wraps the libpq getter functions and exposes a richer api using only > the result object. When writing libpqtypes we went through great > pains to keep access to server side data through the existing result > API. Note, I'm not arguing that compatibility with libpqtypes is a > desired objective, but the wrapping model that it uses is pretty > reasonably going to be used by other code -- the awkwardness there > should be a red flag of sorts. 
> > I'm arguing that *all* data getting must continue to do so through the > result object, and bypassing the result to get at data is breaking the > result abstraction in the libpq api. I bet you can still maintain > data access through result object while avoiding extra copies. Well, the main problem this week is whether we should apply PQsetSingleRowMode() from the single-row-mode1 branch or the single-row-mode2 branch. The PQgetRowData() is unimportant, as it just exposes the rowBuf to the user and that's all. Do you have an opinion about that? > For example, maybe PQsetSingleRowMode maybe should return a result object? What do you mean by that? And have you thought about both sync and async usage patterns? -- marko
On Tue, Jul 24, 2012 at 11:08 AM, Marko Kreen <markokr@gmail.com> wrote: >> I'm arguing that *all* data getting must continue to do so through the >> result object, and bypassing the result to get at data is breaking the >> result abstraction in the libpq api. I bet you can still maintain >> data access through result object while avoiding extra copies. > > Well, the main problem this week is whether we should > apply PQsetSingleRowMode() from single-row-mode1 > or from single-row-mode2 branch? > > The PQgetRowData() is unimportant as it just exposes > the rowBuf to user and thats all. right. branch 1 (containing PQgetRowData) seems wrong to me. so, if given that choice, I'd argue for branch 2, forcing a PGresult pull on each row. However, what you were gunning for via branch 1, namely extra performance from removing the extra allocs, is important and useful; hopefully we can get the best of both worlds, or punt and settle on branch 2. >> For example, maybe PQsetSingleRowMode maybe should return a result object? > > What do you mean by that? And have you though about both > sync and async usage patterns? No, I haven't -- at least not very well. The basic idea is that PQsetSingleRowMode turns into PQgetSingleRowResult() and returns a result object. For row-by-row processing an extra API call gets called (maybe PQgetNextRow(PGconn, PGresult)) that does the behind-the-scenes work under the existing result object. This is the same general structure you have in branch 2, but the result allocation is moved out of the loop. Presumably sync and async would then follow the same pattern, but we'd still have to be able to guarantee non-blocking behavior in the async api. merlin
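For concreteness, the calling pattern Merlin is proposing might look like this; PQgetSingleRowResult and PQgetNextRow are his suggested names and do not exist in libpq, so this is purely a sketch of the hypothetical API:

```c
/* Hypothetical API (Merlin's suggestion) -- these functions do not exist
 * in libpq; the sketch only illustrates the proposed calling pattern. */
PGresult *res;

PQsendQuery(conn, "select ...");
res = PQgetSingleRowResult(conn);    /* allocate the result object once */
while (PQgetNextRow(conn, res) > 0)  /* refill the same result per row */
    process_row(res);                /* user code sees one row at a time */
PQclear(res);
```

The attraction is that the attribute metadata inside `res` is set up once, outside the loop, instead of once per row.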
On Tue, Jul 24, 2012 at 7:08 PM, Marko Kreen <markokr@gmail.com> wrote: > The PQgetRowData() is unimportant as it just exposes > the rowBuf to user and thats all. So to get back to the more interesting PQgetRowData() discussion... Did you notice that it's up to the app to decide whether it calls PQgetResult() or PQgetRowData() after PQsetSingleRowMode()? So there is no way for it to get into conflict with anything. If libpqtypes keeps using PQgetResult() it keeps getting a PGresult. No problem. The PQgetRowData() is meant for use-cases where PGresult is *not* the main data structure the app operates on. If the app does a dumb copy out of the PGresult as soon as possible, it can move to PQgetRowData(). If the app wants to do complex operations with the PGresult or just store it, then it can keep doing so. Nobody forces the use of PQgetRowData(). Now, about the idea of a more optimized PGresult - one way of doing it would be to create a zero-copy PGresult that points into the network buffer like PQgetRowData() does. But this breaks all expectations about lifetime rules for PGresult, so it seems like a bad idea. The second way is to optimize the copying step - basically just do a malloc and 2 or 3 memcpys - for struct, headers and data. This leaves a standalone PGresult around that behaves as expected. But for apps that don't care about PGresult it is still an unnecessary copy. So even if we optimize PGresult that way, it still seems worthwhile to have PQgetRowData() around. Hm, I think it's possible to rig the test to do a dummy copy of the PGresult, and thus see what kind of speed is possible. Will try. -- marko
On Tue, Jul 24, 2012 at 7:25 PM, Merlin Moncure <mmoncure@gmail.com> wrote: > On Tue, Jul 24, 2012 at 11:08 AM, Marko Kreen <markokr@gmail.com> wrote: >>> I'm arguing that *all* data getting must continue to do so through the >>> result object, and bypassing the result to get at data is breaking the >>> result abstraction in the libpq api. I bet you can still maintain >>> data access through result object while avoiding extra copies. >> >> Well, the main problem this week is whether we should >> apply PQsetSingleRowMode() from single-row-mode1 >> or from single-row-mode2 branch? >> >> The PQgetRowData() is unimportant as it just exposes >> the rowBuf to user and thats all. > > right. branch 1 (containing PQgetRowData) seems wrong to me. Indeed, and you are missing the fact that PQgetResult() works the same in both branches. And please see the benchmark I posted. Even better, run it yourself... >> What do you mean by that? And have you though about both >> sync and async usage patterns? > > No, I haven't -- at least not very well. The basic idea is that > PQsetSingleRowMode turns into PQgetSingleRowResult() and returns a > result object. For row by row an extra API call gets called (maybe > PQgetNextRow(PGconn, PGresult)) that does the behind the scenes work > under the existing result object. This is the same general structure > you have in branch 2, but the result allocation is move out of the > loop. Presumably sync and async would then follow the same pattern, > but we'd still have to be able to guarantee non-blocking behavior in > the async api. Well, as discussed previously it's worthwhile to keep standard functions like PQisBusy() and PQgetResult() working sensibly in the new mode, and currently they do. If you add new functions, you also need to define the behavior when new and old ones are mixed, and it gets messy quickly. -- marko
On Tue, Jul 24, 2012 at 11:39 AM, Marko Kreen <markokr@gmail.com> wrote: > On Tue, Jul 24, 2012 at 7:25 PM, Merlin Moncure <mmoncure@gmail.com> wrote: >> On Tue, Jul 24, 2012 at 11:08 AM, Marko Kreen <markokr@gmail.com> wrote: >>>> I'm arguing that *all* data getting must continue to do so through the >>>> result object, and bypassing the result to get at data is breaking the >>>> result abstraction in the libpq api. I bet you can still maintain >>>> data access through result object while avoiding extra copies. >>> >>> Well, the main problem this week is whether we should >>> apply PQsetSingleRowMode() from single-row-mode1 >>> or from single-row-mode2 branch? >>> >>> The PQgetRowData() is unimportant as it just exposes >>> the rowBuf to user and thats all. >> >> right. branch 1 (containing PQgetRowData) seems wrong to me. > > Indeed, you are missing the fact that PGgetResult works same > in both branches., > > And please see the benchmart I posted. > > Even better, run it yourself... > >>> What do you mean by that? And have you though about both >>> sync and async usage patterns? >> >> No, I haven't -- at least not very well. The basic idea is that >> PQsetSingleRowMode turns into PQgetSingleRowResult() and returns a >> result object. For row by row an extra API call gets called (maybe >> PQgetNextRow(PGconn, PGresult)) that does the behind the scenes work >> under the existing result object. This is the same general structure >> you have in branch 2, but the result allocation is move out of the >> loop. Presumably sync and async would then follow the same pattern, >> but we'd still have to be able to guarantee non-blocking behavior in >> the async api. > > Well, as discussed previously it's worthwhile to keep standard functions > like PQisBusy() and PQgetResult() working sensibly in new mode, > and currently they do so. > > If you add new functions, you also need to define the behavior > when new and old one's are mixed and it gets messy quickly. 
Right, I'm aware that you can use the 'slow' method in branch 1. I also saw the benchmarks and agree that single-row mode should be at least competitive with current methods for pretty much all cases. But the faster rowbuf method is a generally incompatible way of dealing with data vs. current libpq -- this is bad. If it's truly impossible to get those benefits without bypassing the result API, then I remove my objection on the grounds that it's optional behavior (my gut tells me it is possible, though). I think the dummy copy of the PGresult is plausible (if by that you mean optimizing PQgetResult when in single-row mode). That would be even better: you'd remove the need for the rowbuf mode. merlin
On Tue, Jul 24, 2012 at 7:52 PM, Merlin Moncure <mmoncure@gmail.com> wrote: > But, the faster rowbuf method is a generally incompatible way of > dealing with data vs current libpq -- this is bad. If it's truly > impossible to get those benefits without bypassing result API that > then I remove my objection on the grounds it's optional behavior (my > gut tells me it is possible though). Um, please clarify what you are talking about here? What is the incompatibility of the PGresult from branch 1? -- marko
Merlin Moncure <mmoncure@gmail.com> writes: > I think the dummy copy of PGresult is plausible (if by that you mean > optimizing PQgetResult when in single row mode). That would be even > better: you'd remove the need for the rowbuf mode. I haven't spent any time looking at this, but my gut tells me that a big chunk of the expense is copying the PGresult's metadata (the column names, types, etc). It has to be possible to make that cheaper. One idea is to rearrange the internal storage so that that part reduces to one memcpy(). Another thought is to allow PGresults to share metadata by treating the metadata as a separate reference-counted object. The second could be a bit hazardous though, as we advertise that PGresults are independent objects that can be manipulated by separate threads. I don't want to introduce mutexes into PGresults, but I'm not sure reference-counted metadata can be safe without them. So maybe the memcpy idea is the only workable one. regards, tom lane
On Tue, Jul 24, 2012 at 11:57 AM, Marko Kreen <markokr@gmail.com> wrote: > On Tue, Jul 24, 2012 at 7:52 PM, Merlin Moncure <mmoncure@gmail.com> wrote: >> But, the faster rowbuf method is a generally incompatible way of >> dealing with data vs current libpq -- this is bad. If it's truly >> impossible to get those benefits without bypassing result API that >> then I remove my objection on the grounds it's optional behavior (my >> gut tells me it is possible though). > > Um, please clarify what are you talking about here? > > What is the incompatibility of PGresult from branch 1? Incompatibility in terms of usage -- we should be getting data with PQgetvalue. I think you're suspecting that I incorrectly believe you're forced to use the rowbuf API -- I don't (although I wasn't clear on that earlier). Basically I'm saying that we should only buy into that if all other alternative routes to getting the faster performance are exhausted. On Tue, Jul 24, 2012 at 11:59 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Merlin Moncure <mmoncure@gmail.com> writes: >> I think the dummy copy of PGresult is plausible (if by that you mean >> optimizing PQgetResult when in single row mode). That would be even >> better: you'd remove the need for the rowbuf mode. > > I haven't spent any time looking at this, but my gut tells me that a big > chunk of the expense is copying the PGresult's metadata (the column > names, types, etc). It has to be possible to make that cheaper. > > One idea is to rearrange the internal storage so that that part reduces > to one memcpy(). Another thought is to allow PGresults to share > metadata by treating the metadata as a separate reference-counted > object. The second could be a bit hazardous though, as we advertise > that PGresults are independent objects that can be manipulated by > separate threads. I don't want to introduce mutexes into PGresults, > but I'm not sure reference-counted metadata can be safe without them. 
> So maybe the memcpy idea is the only workable one. Yeah -- we had a very similar problem in libpqtypes and we solved it exactly as you're thinking. libpqtypes potentially has to create a result with each row iteration (we expose rows and composites as on-the-fly created result objects) and stores some extra non-trivial data with the result. We solved it with the optimized-memcpy method (look here: http://libpqtypes.esilo.com/browse_source.html?file=libpqtypes.h and you'll see that all the important structs like PGtypeHandler are somewhat haphazardly designed to be run through a memcpy). We couldn't do anything about internal libpq issues though, but some micro-optimization of PQsetResultAttrs (which is called via PQcopyResult) might fit the bill. The 'source' result (or source data that would be copied into the destination result) would be stored in the PGconn, right? So the idea is that when you set up single-row mode the connection generates a template PGresult which is then copied out repeatedly during row-by-row processing. I like it, but only if we're reasonably confident the PGresult can be sufficiently optimized like that. merlin
On Tue, Jul 24, 2012 at 1:33 PM, Merlin Moncure <mmoncure@gmail.com> wrote: > The 'source' result (or source data that would be copied into the > destination result) would be stored in the PGconn, right? So, the idea > is that when you set up single row mode the connection generates a > template PGconn which is then copied out repeatedly during row-by-row > processing. I like it, but only if we're reasonably confident the > PGresult can be sufficiently optimized like that. hm, PGresAttDesc is unfortunately in the public header and as such probably can't be changed? merlin
On Tue, Jul 24, 2012 at 1:49 PM, Merlin Moncure <mmoncure@gmail.com> wrote: > On Tue, Jul 24, 2012 at 1:33 PM, Merlin Moncure <mmoncure@gmail.com> wrote: >> The 'source' result (or source data that would be copied into the >> destination result) would be stored in the PGconn, right? So, the idea >> is that when you set up single row mode the connection generates a >> template PGconn which is then copied out repeatedly during row-by-row >> processing. I like it, but only if we're reasonably confident the >> PGresult can be sufficiently optimized like that. > > hm, PGresAttDesc is unfortunately in the public header and as such > probably can't be changed? It can't -- at least not without breaking compatibility. So, as attractive as it sounds, it looks like a memcpy-based PGresult copy is out unless we break the rules and change it anyway (with the probably safe assumption that the only userland code that cares is libpqtypes, which we'd certainly change). However, it's not clear that a shared-metadata implementation would require a mutex -- if you stored the shared data in the conn and were willing to walk the results in the event the PGconn was freed before the results, you'd probably be ok. That's really unpleasant though. Either way, it looks like there are plausible paths to optimizing repeated result fetch without having to expose an alternate data-fetching API, and the proposed implementation doesn't appear to be a wall in terms of getting there. So I'm firmly in the non-exposed-rowbuf camp, even if we can't expose an optimal implementation in time for 9.2. merlin
Merlin Moncure <mmoncure@gmail.com> writes: > Either way, it looks like there's plausible paths to optimizing > repeated result fetch without having expose an alternate data-fetching > API and that the proposed implementation doesn't appear to be a wall > in terms of getting there. So I'm firmly in the non-exposed-rowbuf > camp, even if we can't expose an optimal implementation in time for > 9.2. Yeah, the schedule argument is a strong one. If we put in PQgetRowData now, we'll be stuck with it forevermore. It would be better to push that part of the patch to 9.3 so we can have more time to think it over and investigate alternatives. The single-row mode is already enough to solve the demonstrated client-side performance issues we know about. So IMO it would be a lot smarter to be spending our time right now on making sure we have *that* part of the patch right. In particular I agree that PQsetSingleRowMode is a bit ugly, so I'm wondering about your thoughts on providing PQgetSingleRowResult instead. I don't see how to make that work in async mode. I think the library needs to be aware of whether it's supposed to return a single-row result in advance of actually doing so --- for instance, how can PQisBusy possibly give the correct answer if it doesn't know whether having received the first row is sufficient? (Yeah, I know we could invent a separate flavor of PQisBusy for single-row usage, but ick. There are implementation problems too, such as possibly having already copied a lot of rows into a work-in-progress PGresult.) The thing I fundamentally don't like about PQsetSingleRowMode is that there's such a narrow window of time to use it correctly -- as soon as you've done even one PQconsumeInput, it's too late. (Or maybe not, if the server is slow, which makes it timing-dependent whether you'll observe a bug. Maybe we should add more internal state so that we can consistently throw error if you've done any PQconsumeInput?) 
The most obvious alternative is to invent new versions of the PQsendXXX functions with an additional flag parameter; but there are enough of them to make that not very attractive either. regards, tom lane
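The async constraint described above is visible in the shape of the client loop: PQisBusy answers "would PQgetResult block?", and in single-row mode that must mean "is one full row buffered?" rather than "is the whole resultset buffered?" -- so the library has to know the mode before the first row is consumed. A sketch of the loop (error handling omitted and a live server required, so illustrative only):

```c
/* Sketch of async single-row processing; not runnable without a server. */
#include <sys/select.h>
#include <libpq-fe.h>

static void
process_async_single_rows(PGconn *conn, const char *sql)
{
    PQsendQuery(conn, sql);
    PQsetSingleRowMode(conn);       /* before any input is consumed */

    for (;;)
    {
        fd_set      rfds;

        FD_ZERO(&rfds);
        FD_SET(PQsocket(conn), &rfds);
        select(PQsocket(conn) + 1, &rfds, NULL, NULL, NULL);

        PQconsumeInput(conn);

        /* PQisBusy must now answer "is one full row buffered?" rather
         * than "is the whole resultset buffered?" -- which it can only
         * do if the mode was set before the first row arrived. */
        while (!PQisBusy(conn))
        {
            PGresult   *res = PQgetResult(conn);

            if (res == NULL)
                return;             /* query fully processed */
            PQclear(res);           /* one PGRES_SINGLE_TUPLE row */
        }
    }
}
```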
> Hm, I think it's possible to rig the test to do dummy > copy of pgresult, thus it's possible to see what kind > of speed is possible.. Will try.

I added a new method (-x) to rowdump where it asks for the row with PQgetRowData() and then tries to emulate a super-efficient PGresult copy, then loads data from that PGresult.

Quick reference:

  rowdump1 - single-row-mode1 [~ libpq 9.2]
  rowdump2 - single-row-mode2 [~ libpq 9.1]
  -s - single row mode with PQgetResult()
  -z - single row mode with PQgetRowData()
  -x - simulated optimized PQgetResult()

-------------------------------------------------------------
QUERY: select 10000,200,300000,rpad('x',30,'z') from generate_series(1,5000000)
  ./rowdump1 -s:  6.28  6.25  6.39  avg: 6.31  [ 100.00 % ]
  ./rowdump2 -s:  7.49  7.40  7.57  avg: 7.49  [ 118.71 % ]
  ./rowdump1 -z:  2.86  2.77  2.79  avg: 2.81  [  44.50 % ]
  ./rowdump1 -x:  3.46  3.27  3.29  avg: 3.34  [  52.96 % ]
QUERY: select rpad('x',10,'z'),rpad('x',20,'z'),rpad('x',30,'z'),rpad('x',40,'z'),rpad('x',50,'z'),rpad('x',60,'z') from generate_series(1,3000000)
  ./rowdump1 -s:  7.76  7.76  7.68  avg: 7.73  [ 100.00 % ]
  ./rowdump2 -s:  8.24  8.12  8.66  avg: 8.34  [ 107.85 % ]
  ./rowdump1 -z:  5.34  5.07  5.23  avg: 5.21  [  67.41 % ]
  ./rowdump1 -x:  5.53  5.61  5.61  avg: 5.58  [  72.20 % ]
QUERY: select 1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100 from generate_series(1,800000)
  ./rowdump1 -s:  7.49  7.66  7.59  avg: 7.58  [ 100.00 % ]
  ./rowdump2 -s:  7.56  8.12  7.95  avg: 7.88  [ 103.91 % ]
  ./rowdump1 -z:  2.77  2.76  2.76  avg: 2.76  [  36.46 % ]
  ./rowdump1 -x:  3.07  3.05  3.18  avg: 3.10  [  40.90 % ]
QUERY: select 1000,rpad('x', 400, 'z'),rpad('x', 4000, 'z') from generate_series(1,100000)
  ./rowdump1 -s:  2.66  2.62  2.67  avg: 2.65  [ 100.00 % ]
  ./rowdump2 -s:  3.11  3.14  3.11  avg: 3.12  [ 117.74 % ]
  ./rowdump1 -z:  2.49  2.46  2.47  avg: 2.47  [  93.33 % ]
  ./rowdump1 -x:  2.59  2.57  2.57  avg: 2.58  [  97.23 % ]
-----------------------------------------------------------------

It shows that even if the actual "fast" row copy turns out slower than the one simulated here, it's still a quite competitive approach compared to PQgetRowData(), as long as it's not too slow. So the optimized PQgetResult() may be good enough, and thus we can drop the idea of PQgetRowData(). Code attached, also in the https://github.com/markokr/pqtest repo. -- marko
On Tue, Jul 24, 2012 at 2:37 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Merlin Moncure <mmoncure@gmail.com> writes: >> Either way, it looks like there's plausible paths to optimizing >> repeated result fetch without having expose an alternate data-fetching >> API and that the proposed implementation doesn't appear to be a wall >> in terms of getting there. So I'm firmly in the non-exposed-rowbuf >> camp, even if we can't expose an optimal implementation in time for >> 9.2. > > Yeah, the schedule argument is a strong one. If we put in PQgetRowData > now, we'll be stuck with it forevermore. It would be better to push > that part of the patch to 9.3 so we can have more time to think it over > and investigate alternatives. The single-row mode is already enough to > solve the demonstrated client-side performance issues we know about. > So IMO it would be a lot smarter to be spending our time right now on > making sure we have *that* part of the patch right. > > In particular I agree that PQsetSingleRowMode is a bit ugly, so I'm > wondering about your thoughts on providing PQgetSingleRowResult instead. > I don't see how to make that work in async mode. I think the library > needs to be aware of whether it's supposed to return a single-row result > in advance of actually doing so --- for instance, how can PQisBusy > possibly give the correct answer if it doesn't know whether having > received the first row is sufficient? Well (Marko probably wants to slap me with a halibut by now), the original intent was for it not to have to: PQgetSingleRowResult is more 'construct result for single row iteration', so it would never block -- it only sets the internal single-row mode and instantiates the result object. After that, you can do standard consumeinput/isbusy processing on the conn. The actual iteration routine would be like PQiterateResult, which you could guard with PQisBusy. Like I said: it's the same general structure but the result construction is moved out of the loop. 
I don't think this breaks result scoping rules... the conn keeps a copy of the result during 'result construction', which we are going to define as ending when the query terminates. Until the query resolves, the result remains 'under construction' and avoids the expectation of const-ness that you normally get, so you don't get to interleave threads reading the result with threads iterating from the connection (which is fine), and you have to avoid doing stupid things like closing the connection before all the data has been read through. (But maybe all this is moot per Marko's latest benchmarks.) merlin
On Tue, Jul 24, 2012 at 11:34 PM, Merlin Moncure <mmoncure@gmail.com> wrote: > On Tue, Jul 24, 2012 at 2:37 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> In particular I agree that PQsetSingleRowMode is a bit ugly, so I'm >> wondering about your thoughts on providing PQgetSingleRowResult instead. >> I don't see how to make that work in async mode. I think the library >> needs to be aware of whether it's supposed to return a single-row result >> in advance of actually doing so --- for instance, how can PQisBusy >> possibly give the correct answer if it doesn't know whether having >> received the first row is sufficient? > > Well (Marko probably wants to slap me with a halibut by now), the > original intent was for it not to have to: PQgetSingleRowResult is > more 'construct result for single row iteration', so it would never > block -- it only sets the internal single row mode and instantiates > the result object. After that, you can do do standard > consumeinput/isbusy processing on the conn. The actual iteration > routine would be like PQiterateResult which you could guard with > PQisBusy. Like I said: it's the same general structure but the result > construction is moved out of the loop. If you really don't like PQsetSingleRowMode, then I would prefer new PQsend* functions to new fetch functions, because while send is a one-shot affair, receive is a complex state machine which requires multiple function calls. -- marko
On Tue, Jul 24, 2012 at 10:37 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Merlin Moncure <mmoncure@gmail.com> writes: >> Either way, it looks like there's plausible paths to optimizing >> repeated result fetch without having expose an alternate data-fetching >> API and that the proposed implementation doesn't appear to be a wall >> in terms of getting there. So I'm firmly in the non-exposed-rowbuf >> camp, even if we can't expose an optimal implementation in time for >> 9.2. > > Yeah, the schedule argument is a strong one. If we put in PQgetRowData > now, we'll be stuck with it forevermore. It would be better to push > that part of the patch to 9.3 so we can have more time to think it over > and investigate alternatives. The single-row mode is already enough to > solve the demonstrated client-side performance issues we know about. > So IMO it would be a lot smarter to be spending our time right now on > making sure we have *that* part of the patch right. Just to show agreement: both PQgetRowData() and the optimized PGresult do not belong in 9.2. The only open questions are:

* Is there a better API than PQsetSingleRowMode()? New PQsend* functions are my alternative.
* Should we roll back the rowBuf change? I think no, as per the benchmark it performs better than the old code.

> The thing I fundamentally don't like about PQsetSingleRowMode is that > there's such a narrow window of time to use it correctly -- as soon as > you've done even one PQconsumeInput, it's too late. (Or maybe not, if > the server is slow, which makes it timing-dependent whether you'll > observe a bug. Maybe we should add more internal state so that we can > consistently throw error if you've done any PQconsumeInput?) The most > obvious alternative is to invent new versions of the PQsendXXX functions > with an additional flag parameter; but there are enough of them to make > that not very attractive either. 
It belongs logically together with PQsend, so if you don't like the current separation, then the proper fix is to make them a single function call. -- marko
On Tue, Jul 24, 2012 at 3:52 PM, Marko Kreen <markokr@gmail.com> wrote: > * Is there better API than PQsetSingleRowMode()? New PQsend* > functions is my alternative. I would prefer to have PQsetSingleRowMode() over a new flavor of PQsend. To consolidate my argument above: since you're throwing PQgetResult in the main iteration loop you're potentially walling yourself off from one important optimization: not copying the result at all and reusing the old one. That's going to produce the fastest possible code, since you're recovering all the attribute structures that have already been set up, unless you're willing to do the following: Reading your original patch, what if at the line inside PQgetResult: res = pqSingleRowResult(conn); you instead manipulated the result the connection already had and directly returned it -- discarding the result data but not the attributes etc.? Actually, you could even keep your API 'as is' if you're willing to adjust the behavior of PQclear, while the connection is doing row-by-row results processing, to be a no-op until done. Single-row results' data would then be volatile across iterations, not const -- even if you save off the pointer, the connection can and will rewrite the data portion of the PGresult. Any pointers to PQgetvalue'd data would naturally have to be copied off between iterations (or the user can copy off using the user-facing copy-result function). This would be extremely efficient since we'd only ever need to extend the PGresult buffer if a particular row was longer than the longest one already fetched. If all that's a bridge too far, we can look at re-jiggering the API like I was thinking earlier to make the adjusted result scoping more user-visible, or run your non-rowbuf-exposing patch as-is and hope for optimizations down the line. merlin
On Wed, Jul 25, 2012 at 12:35 AM, Merlin Moncure <mmoncure@gmail.com> wrote: > On Tue, Jul 24, 2012 at 3:52 PM, Marko Kreen <markokr@gmail.com> wrote: >> * Is there better API than PQsetSingleRowMode()? New PQsend* >> functions is my alternative. > > I would prefer to have PQsetSingleRowMode() over new flavor of PQsend. > > To consolidate my argument above: since you're throwing PQgetResult in > the main iteration loop you're potentially walling yourself off from > one important optimization: not copying the result at all and reusing > the old one; that's going to produce the fastest possible code since > you're recovering all the attribute structures that have already been > set up unless you're willing to do the following: I am not forgetting such an optimization - I've already implemented it: it's called PQgetRowData(). The main reason why it works, and so simply, is that it does not try to give you a PGresult. PGresult carries various historical assumptions:

- Fully user-controlled lifetime.
- Zero-terminated strings.
- Aligned binary values.

Breaking them all does not seem like a good idea, and trying to make them work with zero-copy seems not worth the pain. And if you are ready to introduce new accessor functions, why care about PGresult at all? Why not give the user PQgetRowData()? Btw, PQgetRowData() and any other potential zero-copy API is compatible with both the slow PQgetResult() and an optimized PQgetResult(). So if we give only PQgetResult() in 9.2, I do not see that we are locked out of any interesting optimizations. -- marko
On Tue, Jul 24, 2012 at 4:59 PM, Marko Kreen <markokr@gmail.com> wrote:
> So if we give only PQgetResult() in 9.2, I do not see that we
> are locked out from any interesting optimizations.

Well, you are locked out of having PQgetResult reuse the conn's result, since that would then introduce potentially breaking changes to user code. Here are the approaches I see:

1) introduce rowbuf (but not user-exposed rowbuf) patch: Not the fastest method. Users would begin using the feature and API behaviors would be locked in -- the only possible optimization route would be to try and make PQcopyResult as fast as possible.

2) expose PQgetRowData: Very fast, but foreign and somewhat awkward. Existing libpq wrappers (libpqtypes, libpqxx, etc.) could not be converted to use it without extensive revision, if at all, and would be stuck with the slower version of iteration. Also, a side benefit here is that optimizing result copying later has usefulness outside of row-by-row processing.

3) as #1, but instead of copying the result, overwrite the data area. This is bending the rule 'user-defined lifetime of result', since each iteration clears the data area of the PGresult. This is almost as fast as #2, and existing libpq wrappers could be trivially converted to the API.

4) as #3, but introduce a new iteration function (PQiterateResult(conn, result)) that would be called instead of a paired PQgetResult/PQclear. This would also be fast, and could almost lay directly on top of #2 as an alternate optimization strategy -- the set-result mode would have to be modified (or an additional function introduced) to return a result.

I feel like you're partial to #2 and appreciate that, but I'm really quite skeptical about it as noted.
#3 and #4 are not as well thought out as what you've put together, and would need some extra work and buy-in to get out the door, so #2 seems like the lowest common denominator (it would permanently preclude #3 and would require #4 to introduce two new functions instead of just one). #1 of course would bolt on to #2.

merlin
On Tue, Jul 24, 2012 at 5:29 PM, Merlin Moncure <mmoncure@gmail.com> wrote:
> so #2 seems like the lowest common
> denominator (it would permanently preclude #3 and would require #4 to
> introduce two new functions instead of just one). #1 of course would
> bolt on to #2.

oops, got #1 and #2 backwards there.

merlin
On Wed, Jul 25, 2012 at 1:29 AM, Merlin Moncure <mmoncure@gmail.com> wrote:
> On Tue, Jul 24, 2012 at 4:59 PM, Marko Kreen <markokr@gmail.com> wrote:
>> So if we give only PQgetResult() in 9.2, I do not see that we
>> are locked out from any interesting optimizations.
>
> Well, you are locked out of having PQgetResult reuse the conn's result
> since that would then introduce potentially breaking changes to user
> code.

You can specify special flags to PQsend* or have special PQgetResultWeird() calls to get PGresults with unusual behavior, like I did with PQgetRowData(). I see no reason here to reject a PQgetResult() that returns a normal PGresult.

-- marko
On Tuesday, July 24, 2012, Marko Kreen <markokr@gmail.com> wrote:
> On Wed, Jul 25, 2012 at 1:29 AM, Merlin Moncure <mmoncure@gmail.com> wrote:
>> On Tue, Jul 24, 2012 at 4:59 PM, Marko Kreen <markokr@gmail.com> wrote:
>>> So if we give only PQgetResult() in 9.2, I do not see that we
>>> are locked out from any interesting optimizations.
>>
>> Well, you are locked out of having PQgetResult reuse the conn's result
>> since that would then introduce potentially breaking changes to user
>> code.
>
> You can specify special flags to PQsend or have special PQgetResultWeird()
> calls to get PGresults with unusual behavior. Like I did with PQgetRowData().
>
> I see no reason here to reject PQgetResult() that returns normal PGresult.

Yeah -- I agree. So, given the scheduling, I think we should go with the non-PQgetRowData patch and reserve the optimized path for 9.3.

merlin
Hey, this thread was pointed out to me just a few days ago, so I'll start by saying that I think it is on exactly the right track. I don't like the callback API, and think that PQsetSingleRowMode should be offered in place of it. But I do have one question:
On Sat, Jun 16, 2012 at 10:22 AM, Marko Kreen <markokr@gmail.com> wrote:
> The function can be called only after PQsend* and before any
> rows have arrived. This guarantees there will be no surprises
> to PQexec* users who expect full resultset at once.
Ok, I'm guessing you mean "before you call PQgetResult or PQgetRowData", or maybe "before you call PQgetResult or PQgetRowData and it returns a result or partial result", because it would be a race condition if you meant exactly what you said. (Though I don't understand how the literal reading could possibly be implemented without some source of concurrency, which libpq doesn't have.) Maybe this is a little overly pedantic, but I do want to confirm the intention here.
One other possibility, Tom Lane fretted ever so slightly about the use of malloc/free per row... what about instead of PQsetSingleRowMode, you have PQsetChunkedRowMode that takes a chunkSize parameter. A chunkSize <= 0 would be equivalent to what we have today, a chunkSize of 1 means you get what you have from PQsetSingleRowMode, and larger chunkSizes would wait until n rows have been received before returning them all in a single result. I don't know that this suggestion is all that important, but it seems like an obvious generalization that might possibly be useful.
Best,
Leon
On 7/30/2012 8:11 PM, Leon Smith wrote:
> One other possibility, Tom Lane fretted ever so slightly about the use
> of malloc/free per row... what about instead of PQsetSingleRowMode, you
> have PQsetChunkedRowMode that takes a chunkSize parameter. A chunkSize
> <= 0 would be equivalent to what we have today, a chunkSize of 1 means
> you get what you have from PQsetSingleRowMode, and larger chunkSizes
> would wait until n rows have been received before returning them all in
> a single result. I don't know that this suggestion is all that
> important, but it seems like an obvious generalization that might
> possibly be useful.

It is questionable if that actually adds any useful functionality.

Any "collecting" of multiple rows only runs the risk of stalling the reception of the following rows while processing the batch. Processing each row as soon as it is available will make the most use of the network buffers.

Collecting multiple rows, like the FETCH command for cursors does, makes sense when each batch introduces a network round trip, as it does for FETCH. But does it make any sense for a true streaming mode, like what is discussed here?

Jan

-- Anyone who trades liberty for security deserves neither liberty nor security. -- Benjamin Franklin
On Mon, Jul 30, 2012 at 9:59 PM, Jan Wieck <JanWieck@yahoo.com> wrote:
> On 7/30/2012 8:11 PM, Leon Smith wrote:
>> One other possibility, Tom Lane fretted ever so slightly about the use
>> of malloc/free per row... what about instead of PQsetSingleRowMode, you
>> have PQsetChunkedRowMode that takes a chunkSize parameter. A chunkSize
>> <= 0 would be equivalent to what we have today, a chunkSize of 1 means
>> you get what you have from PQsetSingleRowMode, and larger chunkSizes
>> would wait until n rows have been received before returning them all in
>> a single result. I don't know that this suggestion is all that
>> important, but it seems like an obvious generalization that might
>> possibly be useful.
>
> It is questionable if that actually adds any useful functionality.
This is true, I'm not sure my suggestion is necessarily useful. I'm just throwing it out there.
> Any "collecting" of multiple rows will only run the risk to stall receiving
> the following rows while processing this batch. Processing each row as soon
> as it is available will ensure making most use network buffers.
This is not necessarily true, on multiple levels. I mean, some of the programs I write are highly concurrent, and this form of batching would have almost no risk of stalling the network buffer. And the possible use case would be when you are dealing with very small rows, when there would typically be several rows inside a single network packet or network buffer.
> Collecting multiple rows, like in the FETCH command for cursors does, makes
> sense when each batch introduces a network round trip, like for the FETCH
> command. But does it make any sense for a true streaming mode, like what is
> discussed here?
Maybe? I mean, I anticipate that there are (probably) still use cases for FETCH, even when the row-at-a-time interface is a viable option and the transport between postgres and the client has reasonable flow-control.
Leon
On 7/30/2012 10:31 PM, Leon Smith wrote:
> This is not necessarily true, on multiple levels. I mean, some of
> the programs I write are highly concurrent, and this form of batching
> would have almost no risk of stalling the network buffer. And
> the possible use case would be when you are dealing with very small
> rows, when there would typically be several rows inside a single
> network packet or network buffer.

With "highly concurrent" you mean multi-threaded? Like one thread reads the rows in batches and pushes them into a queue while another thread processes them from that queue?

If that is the case, then you just added a useless layer of buffering, and the need for thread-to-thread context switches, on top of PQsetSingleRowMode. Libpq's "receiver thread" is the kernel itself. Libpq already tries to never read partial kernel buffers; it always makes sure that there are at least 8K of free space in the inBuffer. In the case you describe above, where several rows fit into a single packet, libpq will receive them all with a single read(2) system call, and then the application can get them as fast as possible, without causing any further context switches, because they are already in the inBuffer.

I've written that sort of code myself in the past. Look at the Slony worker thread prior to 2.2. We switched to the COPY protocol instead of waiting for single-row mode and got rid of all that extra buffering already (and then some more).

Jan
On Mon, Jul 30, 2012 at 10:26 PM, Jan Wieck <JanWieck@yahoo.com> wrote:
> On 7/30/2012 10:31 PM, Leon Smith wrote:
>>
>> This is not necessarily true, on multiple levels. I mean, some of
>> the programs I write are highly concurrent, and this form of batching
>> would have almost no risk of stalling the network buffer. And
>> the possible use case would be when you are dealing with very small
>> rows, when there would typically be several rows inside a single
>> network packet or network buffer.
>
> With "highly concurrent" you mean multi-threaded? Like one thread reads the
> rows in batches and pushes them into a queue while another thread processes
> them from that queue?
>
> If that is the case, then you just added a useless layer of buffering and
> the need for thread/thread context switches to PQsetSingleRowMode. Libpq's
> "receiver thread" is the kernel itself. Libpq tries to never read partial
> kernel buffers already. It always makes sure that there are at least 8K of
> free space in the inBuffer. In the case you describe above, where several
> rows fit into a single packet, libpq will receive them with a single system
> call in one read(2), then the application can get them as fast as possible,
> without causing any further context switches because they are already in the
> inBuffer.

Yeah: with asynchronous query processing, the query gets sent and control returns immediately to your code: that's the whole point. Even if some data races to the network buffer, libpq doesn't 'see' any data until you tell it to, by asking for a result (which can block) or by draining the buffers with PQconsumeInput. So there is no race in the traditional sense, and I'm OK with PQsetSingleRowMode as such.

Removing malloc/free on row iteration seems only to be possible via one of two methods: either a) you introduce a non-PGresult-based method of data extraction, or b) you preserve the PGresult across row iterations.

merlin
[ getting back to this now after assorted distractions ]

Marko Kreen <markokr@gmail.com> writes:
> Just to show agreement: both PQgetRowData() and optimized PGresult
> do not belong to 9.2.

OK, we're all on board with leaving those for later.

> Only open questions are:
> * Is there better API than PQsetSingleRowMode()? New PQsend*
> functions is my alternative.

After thinking it over, I'm really unexcited about adding new versions of all the PQsend functions. If we had the prospect of more flags in future that could be added to a bitmask flags argument, it would be more palatable, but it's far from clear what those might be. So I'm now leaning towards using PQsetSingleRowMode as-is.

> * Should we rollback rowBuf change? I think no, as per benchmark
> it performs better than old code.

I had already pretty much come to that conclusion just based on code review, without thinking about performance. In particular, we had done some nontrivial work towards improving error-case handling in the data message parsing code, and I don't really want to give that up, nor rewrite it on the fly now. About the only reason I could see for reverting rowBuf was that I thought it might hurt performance; so now that you've proven the opposite, we should leave it alone.

So I'm working from the first set of patches in your message <20120721194907.GA28021@gmail.com>.

regards, tom lane
On Wed, Aug 1, 2012 at 6:18 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Marko Kreen <markokr@gmail.com> writes:
>> * Is there better API than PQsetSingleRowMode()? New PQsend*
>> functions is my alternative.
>
> After thinking it over, I'm really unexcited about adding new versions
> of all the PQsend functions. If we had the prospect of more flags in
> future that could be added to a bitmask flags argument, it would be
> more palatable, but it's far from clear what those might be. So I'm
> now leaning towards using PQsetSingleRowMode as-is.

I can imagine such a flag - I'd like to have a flag to allow sending more queries to the server without waiting for the old ones to finish. Currently, when a batch-processing app wants to keep the backend busy, it needs to stuff many statements into a single query, which is ugly. Among other things it loses the possibility to separate arguments from the query, and it breaks error reporting. The flag would fix this, and it would be useful in other scenarios too.

Interestingly, although it affects a different area, it would also be a flag for PQsend* and not for PQexec*. So maybe the direction is not completely wrong. But I cannot imagine why PQsetSingleRowMode would be hard to keep working even if we have PQsend functions with flags, so I'm not worried about getting it exactly right from the start.

>> * Should we rollback rowBuf change? I think no, as per benchmark
>> it performs better than old code.
>
> I had already pretty much come to that conclusion just based on code
> review, without thinking about performance. In particular, we had done
> some nontrivial work towards improving error-case handling in the data
> message parsing code, and I don't really want to give that up, nor
> rewrite it on the fly now. About the only reason I could see for
> reverting rowBuf was that I thought it might hurt performance; so now
> that you've proven the opposite, we should leave it alone.

An additional argument for rowBuf is Merlin's wish for weirdly optimized PGresults. Basically, with rowBuf, anybody who wants to experiment with different ways to process row data just needs to implement a single function which processes the data and then advances the state machine, like I did with PQgetRowData(). Without it, quite a lot of code needs to be patched.

> So I'm working from the first set of patches in your message
> <20120721194907.GA28021@gmail.com>.

Great!

-- marko
Marko Kreen <markokr@gmail.com> writes:
> On Wed, Aug 1, 2012 at 6:18 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> So I'm working from the first set of patches in your message
>> <20120721194907.GA28021@gmail.com>.

> Great!

Here's an updated draft patch. I've not looked at the docs yet so this doesn't include that, but I'm reasonably happy with the code changes now.

The main difference from what you had is that I pushed the creation of the single-row PGresults into pqRowProcessor, so that they're formed immediately while scanning the input message. The other method, postponing examination of the rowBuf, does not work, any more than it did with the original patch, because you can't assume that the network input buffer is going to hold still. For example, calling PQconsumeInput after parseInput has absorbed a row-data message, but before calling PQgetResult, would likely break things. In principle I suppose we could hack PQconsumeInput enough that it didn't damage the row buffer while still meeting its API contract of clearing the read-ready condition on the socket; but it wouldn't be simple or cheap to do so.

regards, tom lane

diff --git a/contrib/dblink/dblink.c b/contrib/dblink/dblink.c index 1e62d8091a9d2bdf60af6745d5a01ee14ee5cf5a..7cf86176c2385b9e4ee37c72d7e3c662ea079f7a 100644 *** a/contrib/dblink/dblink.c --- b/contrib/dblink/dblink.c *************** typedef struct storeInfo *** 70,75 **** --- 70,78 ---- AttInMetadata *attinmeta; MemoryContext tmpcontext; char **cstrs; + /* temp storage for results to avoid leaks on exception */ + PGresult *last_res; + PGresult *cur_res; } storeInfo; /* *************** static void materializeQueryResult(Funct *** 83,90 **** const char *conname, const char *sql, bool fail); ! static int storeHandler(PGresult *res, const PGdataValue *columns, !
const char **errmsgp, void *param); static remoteConn *getConnectionByName(const char *name); static HTAB *createConnHash(void); static void createNewConnection(const char *name, remoteConn *rconn); --- 86,93 ---- const char *conname, const char *sql, bool fail); ! static PGresult *storeQueryResult(storeInfo *sinfo, PGconn *conn, const char *sql); ! static void storeRow(storeInfo *sinfo, PGresult *res, bool first); static remoteConn *getConnectionByName(const char *name); static HTAB *createConnHash(void); static void createNewConnection(const char *name, remoteConn *rconn); *************** materializeResult(FunctionCallInfo fcinf *** 927,934 **** /* * Execute the given SQL command and store its results into a tuplestore * to be returned as the result of the current function. * This is equivalent to PQexec followed by materializeResult, but we make ! * use of libpq's "row processor" API to reduce per-row overhead. */ static void materializeQueryResult(FunctionCallInfo fcinfo, --- 930,939 ---- /* * Execute the given SQL command and store its results into a tuplestore * to be returned as the result of the current function. + * * This is equivalent to PQexec followed by materializeResult, but we make ! * use of libpq's single-row mode to avoid accumulating the whole result ! * inside libpq before it gets transferred to the tuplestore. */ static void materializeQueryResult(FunctionCallInfo fcinfo, *************** materializeQueryResult(FunctionCallInfo *** 944,962 **** /* prepTuplestoreResult must have been called previously */ Assert(rsinfo->returnMode == SFRM_Materialize); PG_TRY(); { ! /* initialize storeInfo to empty */ ! memset(&sinfo, 0, sizeof(sinfo)); ! sinfo.fcinfo = fcinfo; ! ! /* We'll collect tuples using storeHandler */ ! PQsetRowProcessor(conn, storeHandler, &sinfo); ! ! res = PQexec(conn, sql); ! ! /* We don't keep the custom row processor installed permanently */ ! 
PQsetRowProcessor(conn, NULL, NULL); if (!res || (PQresultStatus(res) != PGRES_COMMAND_OK && --- 949,962 ---- /* prepTuplestoreResult must have been called previously */ Assert(rsinfo->returnMode == SFRM_Materialize); + /* initialize storeInfo to empty */ + memset(&sinfo, 0, sizeof(sinfo)); + sinfo.fcinfo = fcinfo; + PG_TRY(); { ! /* execute query, collecting any tuples into the tuplestore */ ! res = storeQueryResult(&sinfo, conn, sql); if (!res || (PQresultStatus(res) != PGRES_COMMAND_OK && *************** materializeQueryResult(FunctionCallInfo *** 975,982 **** else if (PQresultStatus(res) == PGRES_COMMAND_OK) { /* ! * storeHandler didn't get called, so we need to convert the ! * command status string to a tuple manually */ TupleDesc tupdesc; AttInMetadata *attinmeta; --- 975,982 ---- else if (PQresultStatus(res) == PGRES_COMMAND_OK) { /* ! * storeRow didn't get called, so we need to convert the command ! * status string to a tuple manually */ TupleDesc tupdesc; AttInMetadata *attinmeta; *************** materializeQueryResult(FunctionCallInfo *** 1008,1032 **** tuplestore_puttuple(tupstore, tuple); PQclear(res); } else { Assert(PQresultStatus(res) == PGRES_TUPLES_OK); ! /* storeHandler should have created a tuplestore */ Assert(rsinfo->setResult != NULL); PQclear(res); } } PG_CATCH(); { - /* be sure to unset the custom row processor */ - PQsetRowProcessor(conn, NULL, NULL); /* be sure to release any libpq result we collected */ ! if (res) ! PQclear(res); /* and clear out any pending data in libpq */ ! while ((res = PQskipResult(conn)) != NULL) PQclear(res); PG_RE_THROW(); } --- 1008,1037 ---- tuplestore_puttuple(tupstore, tuple); PQclear(res); + res = NULL; } else { Assert(PQresultStatus(res) == PGRES_TUPLES_OK); ! 
/* storeRow should have created a tuplestore */ Assert(rsinfo->setResult != NULL); PQclear(res); + res = NULL; } + PQclear(sinfo.last_res); + sinfo.last_res = NULL; + PQclear(sinfo.cur_res); + sinfo.cur_res = NULL; } PG_CATCH(); { /* be sure to release any libpq result we collected */ ! PQclear(res); ! PQclear(sinfo.last_res); ! PQclear(sinfo.cur_res); /* and clear out any pending data in libpq */ ! while ((res = PQgetResult(conn)) != NULL) PQclear(res); PG_RE_THROW(); } *************** materializeQueryResult(FunctionCallInfo *** 1034,1056 **** } /* ! * Custom row processor for materializeQueryResult. ! * Prototype of this function must match PQrowProcessor. */ ! static int ! storeHandler(PGresult *res, const PGdataValue *columns, ! const char **errmsgp, void *param) { - storeInfo *sinfo = (storeInfo *) param; int nfields = PQnfields(res); - char **cstrs = sinfo->cstrs; HeapTuple tuple; - char *pbuf; - int pbuflen; int i; MemoryContext oldcontext; ! if (columns == NULL) { /* Prepare for new result set */ ReturnSetInfo *rsinfo = (ReturnSetInfo *) sinfo->fcinfo->resultinfo; --- 1039,1108 ---- } /* ! * Execute query, and send any result rows to sinfo->tuplestore. */ ! static PGresult * ! storeQueryResult(storeInfo *sinfo, PGconn *conn, const char *sql) ! { ! bool first = true; ! PGresult *res; ! ! if (!PQsendQuery(conn, sql)) ! return PQgetResult(conn); ! ! if (!PQsetSingleRowMode(conn)) /* shouldn't fail */ ! elog(ERROR, "dblink: failed to set single-row mode"); ! ! for (;;) ! { ! CHECK_FOR_INTERRUPTS(); ! ! sinfo->cur_res = PQgetResult(conn); ! if (!sinfo->cur_res) ! break; ! ! if (PQresultStatus(sinfo->cur_res) == PGRES_SINGLE_TUPLE) ! { ! /* got one row from possibly-bigger resultset */ ! storeRow(sinfo, sinfo->cur_res, first); ! ! PQclear(sinfo->cur_res); ! sinfo->cur_res = NULL; ! first = false; ! } ! else ! { ! /* if empty resultset, fill tuplestore header */ ! if (first && PQresultStatus(sinfo->cur_res) == PGRES_TUPLES_OK) ! 
storeRow(sinfo, sinfo->cur_res, first); ! ! /* store completed result at last_res */ ! PQclear(sinfo->last_res); ! sinfo->last_res = sinfo->cur_res; ! sinfo->cur_res = NULL; ! first = true; ! } ! } ! ! /* return last_res */ ! res = sinfo->last_res; ! sinfo->last_res = NULL; ! return res; ! } ! ! /* ! * Send single row to sinfo->tuplestore. ! * If "first" is true, create the tuplestore using PGresult's metadata. ! */ ! static void ! storeRow(storeInfo *sinfo, PGresult *res, bool first) { int nfields = PQnfields(res); HeapTuple tuple; int i; MemoryContext oldcontext; ! if (first) { /* Prepare for new result set */ ReturnSetInfo *rsinfo = (ReturnSetInfo *) sinfo->fcinfo->resultinfo; *************** storeHandler(PGresult *res, const PGdata *** 1098,1110 **** sinfo->attinmeta = TupleDescGetAttInMetadata(tupdesc); /* Create a new, empty tuplestore */ ! oldcontext = MemoryContextSwitchTo( ! rsinfo->econtext->ecxt_per_query_memory); sinfo->tuplestore = tuplestore_begin_heap(true, false, work_mem); rsinfo->setResult = sinfo->tuplestore; rsinfo->setDesc = tupdesc; MemoryContextSwitchTo(oldcontext); /* * Set up sufficiently-wide string pointers array; this won't change * in size so it's easy to preallocate. --- 1150,1165 ---- sinfo->attinmeta = TupleDescGetAttInMetadata(tupdesc); /* Create a new, empty tuplestore */ ! oldcontext = MemoryContextSwitchTo(rsinfo->econtext->ecxt_per_query_memory); sinfo->tuplestore = tuplestore_begin_heap(true, false, work_mem); rsinfo->setResult = sinfo->tuplestore; rsinfo->setDesc = tupdesc; MemoryContextSwitchTo(oldcontext); + /* Done if empty resultset */ + if (PQntuples(res) == 0) + return; + /* * Set up sufficiently-wide string pointers array; this won't change * in size so it's easy to preallocate. *************** storeHandler(PGresult *res, const PGdata *** 1121,1131 **** ALLOCSET_DEFAULT_MINSIZE, ALLOCSET_DEFAULT_INITSIZE, ALLOCSET_DEFAULT_MAXSIZE); - - return 1; } ! 
CHECK_FOR_INTERRUPTS(); /* * Do the following work in a temp context that we reset after each tuple. --- 1176,1185 ---- ALLOCSET_DEFAULT_MINSIZE, ALLOCSET_DEFAULT_INITSIZE, ALLOCSET_DEFAULT_MAXSIZE); } ! /* Should have a single-row result if we get here */ ! Assert(PQntuples(res) == 1); /* * Do the following work in a temp context that we reset after each tuple. *************** storeHandler(PGresult *res, const PGdata *** 1135,1180 **** oldcontext = MemoryContextSwitchTo(sinfo->tmpcontext); /* ! * The strings passed to us are not null-terminated, but the datatype ! * input functions we're about to call require null termination. Copy the ! * strings and add null termination. As a micro-optimization, allocate ! * all the strings with one palloc. */ - pbuflen = nfields; /* count the null terminators themselves */ - for (i = 0; i < nfields; i++) - { - int len = columns[i].len; - - if (len > 0) - pbuflen += len; - } - pbuf = (char *) palloc(pbuflen); - for (i = 0; i < nfields; i++) { ! int len = columns[i].len; ! ! if (len < 0) ! cstrs[i] = NULL; else ! { ! cstrs[i] = pbuf; ! memcpy(pbuf, columns[i].value, len); ! pbuf += len; ! *pbuf++ = '\0'; ! } } /* Convert row to a tuple, and add it to the tuplestore */ ! tuple = BuildTupleFromCStrings(sinfo->attinmeta, cstrs); tuplestore_puttuple(sinfo->tuplestore, tuple); /* Clean up */ MemoryContextSwitchTo(oldcontext); MemoryContextReset(sinfo->tmpcontext); - - return 1; } /* --- 1189,1212 ---- oldcontext = MemoryContextSwitchTo(sinfo->tmpcontext); /* ! * Fill cstrs with null-terminated strings of column values. */ for (i = 0; i < nfields; i++) { ! if (PQgetisnull(res, 0, i)) ! sinfo->cstrs[i] = NULL; else ! sinfo->cstrs[i] = PQgetvalue(res, 0, i); } /* Convert row to a tuple, and add it to the tuplestore */ ! 
tuple = BuildTupleFromCStrings(sinfo->attinmeta, sinfo->cstrs); tuplestore_puttuple(sinfo->tuplestore, tuple); /* Clean up */ MemoryContextSwitchTo(oldcontext); MemoryContextReset(sinfo->tmpcontext); } /* diff --git a/src/interfaces/libpq/exports.txt b/src/interfaces/libpq/exports.txt index 1251455f1f6d92c74e206b5a9b8dcdeb36ac9b98..9d95e262be3fbf26731c0926013db56dbc4e00ab 100644 *** a/src/interfaces/libpq/exports.txt --- b/src/interfaces/libpq/exports.txt *************** PQconnectStartParams 157 *** 160,165 **** PQping 158 PQpingParams 159 PQlibVersion 160 ! PQsetRowProcessor 161 ! PQgetRowProcessor 162 ! PQskipResult 163 --- 160,163 ---- PQping 158 PQpingParams 159 PQlibVersion 160 ! PQsetSingleRowMode 161 diff --git a/src/interfaces/libpq/fe-connect.c b/src/interfaces/libpq/fe-connect.c index a32258a8cbab5e655484d0471c031864fa5084de..adaab7aaade60a980b7eaf5b6e9734444c72b930 100644 *** a/src/interfaces/libpq/fe-connect.c --- b/src/interfaces/libpq/fe-connect.c *************** makeEmptyPGconn(void) *** 2709,2716 **** /* Zero all pointers and booleans */ MemSet(conn, 0, sizeof(PGconn)); ! /* install default row processor and notice hooks */ ! PQsetRowProcessor(conn, NULL, NULL); conn->noticeHooks.noticeRec = defaultNoticeReceiver; conn->noticeHooks.noticeProc = defaultNoticeProcessor; --- 2709,2715 ---- /* Zero all pointers and booleans */ MemSet(conn, 0, sizeof(PGconn)); ! /* install default notice hooks */ conn->noticeHooks.noticeRec = defaultNoticeReceiver; conn->noticeHooks.noticeProc = defaultNoticeProcessor; *************** conninfo_uri_parse_options(PQconninfoOpt *** 4658,4664 **** if (p == host) { printfPQExpBuffer(errorMessage, ! libpq_gettext("IPv6 host address may not be empty in URI: \"%s\"\n"), uri); goto cleanup; } --- 4657,4663 ---- if (p == host) { printfPQExpBuffer(errorMessage, ! 
libpq_gettext("IPv6 host address may not be empty in URI: \"%s\"\n"), uri); goto cleanup; } *************** conninfo_uri_parse_params(char *params, *** 4878,4884 **** printfPQExpBuffer(errorMessage, libpq_gettext( ! "invalid URI query parameter: \"%s\"\n"), keyword); return false; } --- 4877,4883 ---- printfPQExpBuffer(errorMessage, libpq_gettext( ! "invalid URI query parameter: \"%s\"\n"), keyword); return false; } *************** conninfo_uri_decode(const char *str, PQE *** 4943,4949 **** if (!(get_hexdigit(*q++, &hi) && get_hexdigit(*q++, &lo))) { printfPQExpBuffer(errorMessage, ! libpq_gettext("invalid percent-encoded token: \"%s\"\n"), str); free(buf); return NULL; --- 4942,4948 ---- if (!(get_hexdigit(*q++, &hi) && get_hexdigit(*q++, &lo))) { printfPQExpBuffer(errorMessage, ! libpq_gettext("invalid percent-encoded token: \"%s\"\n"), str); free(buf); return NULL; *************** static void *** 5594,5601 **** dot_pg_pass_warning(PGconn *conn) { /* If it was 'invalid authorization', add .pgpass mention */ - if (conn->dot_pgpass_used && conn->password_needed && conn->result && /* only works with >= 9.0 servers */ strcmp(PQresultErrorField(conn->result, PG_DIAG_SQLSTATE), ERRCODE_INVALID_PASSWORD) == 0) { --- 5593,5600 ---- dot_pg_pass_warning(PGconn *conn) { /* If it was 'invalid authorization', add .pgpass mention */ /* only works with >= 9.0 servers */ + if (conn->dot_pgpass_used && conn->password_needed && conn->result && strcmp(PQresultErrorField(conn->result, PG_DIAG_SQLSTATE), ERRCODE_INVALID_PASSWORD) == 0) { diff --git a/src/interfaces/libpq/fe-exec.c b/src/interfaces/libpq/fe-exec.c index badc0b32a8e8f3e33e42729a8f8899475a0de31b..53516db723492f11fc9f4bdc6d4d351693c4c22f 100644 *** a/src/interfaces/libpq/fe-exec.c --- b/src/interfaces/libpq/fe-exec.c *************** char *const pgresStatus[] = { *** 38,44 **** "PGRES_BAD_RESPONSE", "PGRES_NONFATAL_ERROR", "PGRES_FATAL_ERROR", ! 
"PGRES_COPY_BOTH" }; /* --- 38,45 ---- "PGRES_BAD_RESPONSE", "PGRES_NONFATAL_ERROR", "PGRES_FATAL_ERROR", ! "PGRES_COPY_BOTH", ! "PGRES_SINGLE_TUPLE" }; /* *************** static bool static_std_strings = false; *** 51,58 **** static PGEvent *dupEvents(PGEvent *events, int count); static bool pqAddTuple(PGresult *res, PGresAttValue *tup); - static int pqStdRowProcessor(PGresult *res, const PGdataValue *columns, - const char **errmsgp, void *param); static bool PQsendQueryStart(PGconn *conn); static int PQsendQueryGuts(PGconn *conn, const char *command, --- 52,57 ---- *************** static int PQsendQueryGuts(PGconn *conn, *** 64,71 **** const int *paramFormats, int resultFormat); static void parseInput(PGconn *conn); - static int dummyRowProcessor(PGresult *res, const PGdataValue *columns, - const char **errmsgp, void *param); static bool PQexecStart(PGconn *conn); static PGresult *PQexecFinish(PGconn *conn); static int PQsendDescribe(PGconn *conn, char desc_type, --- 63,68 ---- *************** PQmakeEmptyPGresult(PGconn *conn, ExecSt *** 181,186 **** --- 178,184 ---- case PGRES_COPY_OUT: case PGRES_COPY_IN: case PGRES_COPY_BOTH: + case PGRES_SINGLE_TUPLE: /* non-error cases */ break; default: *************** PQclear(PGresult *res) *** 698,703 **** --- 696,703 ---- /* * Handy subroutine to deallocate any partially constructed async result. + * + * Any "next" result gets cleared too. */ void pqClearAsyncResult(PGconn *conn) *************** pqClearAsyncResult(PGconn *conn) *** 705,710 **** --- 705,713 ---- if (conn->result) PQclear(conn->result); conn->result = NULL; + if (conn->next_result) + PQclear(conn->next_result); + conn->next_result = NULL; } /* *************** pqPrepareAsyncResult(PGconn *conn) *** 758,764 **** * conn->errorMessage. 
*/ res = conn->result; - conn->result = NULL; /* handing over ownership to caller */ if (!res) res = PQmakeEmptyPGresult(conn, PGRES_FATAL_ERROR); else --- 761,766 ---- *************** pqPrepareAsyncResult(PGconn *conn) *** 771,776 **** --- 773,788 ---- appendPQExpBufferStr(&conn->errorMessage, PQresultErrorMessage(res)); } + + /* + * Replace conn->result with next_result, if any. In the normal case + * there isn't a next result and we're just dropping ownership of the + * current result. In single-row mode this restores the situation to what + * it was before we created the current single-row result. + */ + conn->result = conn->next_result; + conn->next_result = NULL; + return res; } *************** pqSaveParameterStatus(PGconn *conn, cons *** 981,1065 **** /* ! * PQsetRowProcessor ! * Set function that copies row data out from the network buffer, ! * along with a passthrough parameter for it. ! */ ! void ! PQsetRowProcessor(PGconn *conn, PQrowProcessor func, void *param) ! { ! if (!conn) ! return; ! ! if (func) ! { ! /* set custom row processor */ ! conn->rowProcessor = func; ! conn->rowProcessorParam = param; ! } ! else ! { ! /* set default row processor */ ! conn->rowProcessor = pqStdRowProcessor; ! conn->rowProcessorParam = conn; ! } ! } ! ! /* ! * PQgetRowProcessor ! * Get current row processor of PGconn. ! * If param is not NULL, also store the passthrough parameter at *param. ! */ ! PQrowProcessor ! PQgetRowProcessor(const PGconn *conn, void **param) ! { ! if (!conn) ! { ! if (param) ! *param = NULL; ! return NULL; ! } ! ! if (param) ! *param = conn->rowProcessorParam; ! return conn->rowProcessor; ! } ! ! /* ! * pqStdRowProcessor ! * Add the received row to the PGresult structure ! * Returns 1 if OK, -1 if error occurred. * ! * Note: "param" should point to the PGconn, but we don't actually need that ! * as of the current coding. */ ! static int ! pqStdRowProcessor(PGresult *res, const PGdataValue *columns, ! 
const char **errmsgp, void *param) { int nfields = res->numAttributes; PGresAttValue *tup; int i; ! if (columns == NULL) { ! /* New result set ... we have nothing to do in this function. */ ! return 1; } /* * Basically we just allocate space in the PGresult for each field and * copy the data over. * ! * Note: on malloc failure, we return -1 leaving *errmsgp still NULL, ! * which caller will take to mean "out of memory". This is preferable to ! * trying to set up such a message here, because evidently there's not ! * enough memory for gettext() to do anything. */ tup = (PGresAttValue *) pqResultAlloc(res, nfields * sizeof(PGresAttValue), TRUE); if (tup == NULL) ! return -1; for (i = 0; i < nfields; i++) { --- 993,1047 ---- /* ! * pqRowProcessor ! * Add the received row to the current async result (conn->result). ! * Returns 1 if OK, 0 if error occurred. * ! * On error, *errmsgp can be set to an error string to be returned. ! * If it is left NULL, the error is presumed to be "out of memory". ! * ! * In single-row mode, we create a new result holding just the current row, ! * stashing the previous result in conn->next_result so that it becomes ! * active again after pqPrepareAsyncResult(). This allows the result metadata ! * (column descriptions) to be carried forward to each result row. */ ! int ! pqRowProcessor(PGconn *conn, const char **errmsgp) { + PGresult *res = conn->result; int nfields = res->numAttributes; + const PGdataValue *columns = conn->rowBuf; PGresAttValue *tup; int i; ! /* ! * In single-row mode, make a new PGresult that will hold just this one ! * row; the original conn->result is left unchanged so that it can be used ! * again as the template for future rows. ! */ ! if (conn->singleRowMode) { ! /* Copy everything that should be in the result at this point */ ! res = PQcopyResult(res, ! PG_COPYRES_ATTRS | PG_COPYRES_EVENTS | ! PG_COPYRES_NOTICEHOOKS); ! if (!res) ! 
return 0; } /* * Basically we just allocate space in the PGresult for each field and * copy the data over. * ! * Note: on malloc failure, we return 0 leaving *errmsgp still NULL, which ! * caller will take to mean "out of memory". This is preferable to trying ! * to set up such a message here, because evidently there's not enough ! * memory for gettext() to do anything. */ tup = (PGresAttValue *) pqResultAlloc(res, nfields * sizeof(PGresAttValue), TRUE); if (tup == NULL) ! goto fail; for (i = 0; i < nfields; i++) { *************** pqStdRowProcessor(PGresult *res, const P *** 1078,1084 **** val = (char *) pqResultAlloc(res, clen + 1, isbinary); if (val == NULL) ! return -1; /* copy and zero-terminate the data (even if it's binary) */ memcpy(val, columns[i].value, clen); --- 1060,1066 ---- val = (char *) pqResultAlloc(res, clen + 1, isbinary); if (val == NULL) ! goto fail; /* copy and zero-terminate the data (even if it's binary) */ memcpy(val, columns[i].value, clen); *************** pqStdRowProcessor(PGresult *res, const P *** 1091,1100 **** /* And add the tuple to the PGresult's tuple array */ if (!pqAddTuple(res, tup)) ! return -1; - /* Success */ return 1; } --- 1073,1102 ---- /* And add the tuple to the PGresult's tuple array */ if (!pqAddTuple(res, tup)) ! goto fail; ! ! /* ! * Success. In single-row mode, make the result available to the client ! * immediately. ! */ ! if (conn->singleRowMode) ! { ! /* Change result status to special single-row value */ ! res->resultStatus = PGRES_SINGLE_TUPLE; ! /* Stash old result for re-use later */ ! conn->next_result = conn->result; ! conn->result = res; ! /* And mark the result ready to return */ ! conn->asyncStatus = PGASYNC_READY; ! 
} return 1; + + fail: + /* release locally allocated PGresult, if we made one */ + if (res != conn->result) + PQclear(res); + return 0; } *************** PQsendQueryStart(PGconn *conn) *** 1343,1348 **** --- 1345,1354 ---- /* initialize async result-accumulation state */ conn->result = NULL; + conn->next_result = NULL; + + /* reset single-row processing mode */ + conn->singleRowMode = false; /* ready to send command message */ return true; *************** pqHandleSendFailure(PGconn *conn) *** 1548,1553 **** --- 1554,1584 ---- } /* + * Select row-by-row processing mode + */ + int + PQsetSingleRowMode(PGconn *conn) + { + /* + * Only allow setting the flag when we have launched a query and not yet + * received any results. + */ + if (!conn) + return 0; + if (conn->asyncStatus != PGASYNC_BUSY) + return 0; + if (conn->queryclass != PGQUERY_SIMPLE && + conn->queryclass != PGQUERY_EXTENDED) + return 0; + if (conn->result) + return 0; + + /* OK, set flag */ + conn->singleRowMode = true; + return 1; + } + + /* * Consume any available input from the backend * 0 return: some kind of trouble * 1 return: no problem *************** PQconsumeInput(PGconn *conn) *** 1587,1595 **** * parseInput: if appropriate, parse input data from backend * until input is exhausted or a stopping state is reached. * Note that this function will NOT attempt to read more data from the backend. - * - * Note: callers of parseInput must be prepared for a longjmp exit when we are - * in PGASYNC_BUSY state, since an external row processor might do that. */ static void parseInput(PGconn *conn) --- 1618,1623 ---- *************** PQgetResult(PGconn *conn) *** 1737,1785 **** return res; } - /* - * PQskipResult - * Get the next PGresult produced by a query, but discard any data rows. - * - * This is mainly useful for cleaning up after a longjmp out of a row - * processor, when resuming processing of the current query result isn't - * wanted. 
Note that this is of little value in an async-style application, - * since any preceding calls to PQisBusy would have already called the regular - * row processor. - */ - PGresult * - PQskipResult(PGconn *conn) - { - PGresult *res; - PQrowProcessor savedRowProcessor; - - if (!conn) - return NULL; - - /* temporarily install dummy row processor */ - savedRowProcessor = conn->rowProcessor; - conn->rowProcessor = dummyRowProcessor; - /* no need to save/change rowProcessorParam */ - - /* fetch the next result */ - res = PQgetResult(conn); - - /* restore previous row processor */ - conn->rowProcessor = savedRowProcessor; - - return res; - } - - /* - * Do-nothing row processor for PQskipResult - */ - static int - dummyRowProcessor(PGresult *res, const PGdataValue *columns, - const char **errmsgp, void *param) - { - return 1; - } - /* * PQexec --- 1765,1770 ---- *************** PQexecStart(PGconn *conn) *** 1886,1892 **** * Silently discard any prior query result that application didn't eat. * This is probably poor design, but it's here for backward compatibility. */ ! while ((result = PQskipResult(conn)) != NULL) { ExecStatusType resultStatus = result->resultStatus; --- 1871,1877 ---- * Silently discard any prior query result that application didn't eat. * This is probably poor design, but it's here for backward compatibility. */ ! 
while ((result = PQgetResult(conn)) != NULL) { ExecStatusType resultStatus = result->resultStatus; diff --git a/src/interfaces/libpq/fe-lobj.c b/src/interfaces/libpq/fe-lobj.c index 13fd98c2f913d3818758e75bac96822306981b52..f3a6d0341c13ce644a7f1b2f15919385b85d6a6e 100644 *** a/src/interfaces/libpq/fe-lobj.c --- b/src/interfaces/libpq/fe-lobj.c *************** lo_initialize(PGconn *conn) *** 682,689 **** int n; const char *query; const char *fname; - PQrowProcessor savedRowProcessor; - void *savedRowProcessorParam; Oid foid; if (!conn) --- 682,687 ---- *************** lo_initialize(PGconn *conn) *** 732,747 **** "or proname = 'loread' " "or proname = 'lowrite'"; - /* Ensure the standard row processor is used to collect the result */ - savedRowProcessor = conn->rowProcessor; - savedRowProcessorParam = conn->rowProcessorParam; - PQsetRowProcessor(conn, NULL, NULL); - res = PQexec(conn, query); - - conn->rowProcessor = savedRowProcessor; - conn->rowProcessorParam = savedRowProcessorParam; - if (res == NULL) { free(lobjfuncs); --- 730,736 ---- diff --git a/src/interfaces/libpq/fe-protocol2.c b/src/interfaces/libpq/fe-protocol2.c index 8dbd6b6982395850fc167eb20cb8d5590d33122f..1ba5885cd3b419a97f4fc5ea897eac209f364897 100644 *** a/src/interfaces/libpq/fe-protocol2.c --- b/src/interfaces/libpq/fe-protocol2.c *************** static int getNotify(PGconn *conn); *** 49,67 **** PostgresPollingStatusType pqSetenvPoll(PGconn *conn) { - PostgresPollingStatusType result; PGresult *res; - PQrowProcessor savedRowProcessor; - void *savedRowProcessorParam; if (conn == NULL || conn->status == CONNECTION_BAD) return PGRES_POLLING_FAILED; - /* Ensure the standard row processor is used to collect any results */ - savedRowProcessor = conn->rowProcessor; - savedRowProcessorParam = conn->rowProcessorParam; - PQsetRowProcessor(conn, NULL, NULL); - /* Check whether there are any data for us */ switch (conn->setenv_state) { --- 49,59 ---- *************** pqSetenvPoll(PGconn *conn) *** 77,86 
**** if (n < 0) goto error_return; if (n == 0) ! { ! result = PGRES_POLLING_READING; ! goto normal_return; ! } break; } --- 69,75 ---- if (n < 0) goto error_return; if (n == 0) ! return PGRES_POLLING_READING; break; } *************** pqSetenvPoll(PGconn *conn) *** 94,101 **** /* Should we raise an error if called when not active? */ case SETENV_STATE_IDLE: ! result = PGRES_POLLING_OK; ! goto normal_return; default: printfPQExpBuffer(&conn->errorMessage, --- 83,89 ---- /* Should we raise an error if called when not active? */ case SETENV_STATE_IDLE: ! return PGRES_POLLING_OK; default: printfPQExpBuffer(&conn->errorMessage, *************** pqSetenvPoll(PGconn *conn) *** 192,201 **** case SETENV_STATE_CLIENT_ENCODING_WAIT: { if (PQisBusy(conn)) ! { ! result = PGRES_POLLING_READING; ! goto normal_return; ! } res = PQgetResult(conn); --- 180,186 ---- case SETENV_STATE_CLIENT_ENCODING_WAIT: { if (PQisBusy(conn)) ! return PGRES_POLLING_READING; res = PQgetResult(conn); *************** pqSetenvPoll(PGconn *conn) *** 220,229 **** case SETENV_STATE_OPTION_WAIT: { if (PQisBusy(conn)) ! { ! result = PGRES_POLLING_READING; ! goto normal_return; ! } res = PQgetResult(conn); --- 205,211 ---- case SETENV_STATE_OPTION_WAIT: { if (PQisBusy(conn)) ! return PGRES_POLLING_READING; res = PQgetResult(conn); *************** pqSetenvPoll(PGconn *conn) *** 262,278 **** goto error_return; conn->setenv_state = SETENV_STATE_QUERY1_WAIT; ! result = PGRES_POLLING_READING; ! goto normal_return; } case SETENV_STATE_QUERY1_WAIT: { if (PQisBusy(conn)) ! { ! result = PGRES_POLLING_READING; ! goto normal_return; ! } res = PQgetResult(conn); --- 244,256 ---- goto error_return; conn->setenv_state = SETENV_STATE_QUERY1_WAIT; ! return PGRES_POLLING_READING; } case SETENV_STATE_QUERY1_WAIT: { if (PQisBusy(conn)) ! return PGRES_POLLING_READING; res = PQgetResult(conn); *************** pqSetenvPoll(PGconn *conn) *** 349,365 **** goto error_return; conn->setenv_state = SETENV_STATE_QUERY2_WAIT; ! 
result = PGRES_POLLING_READING; ! goto normal_return; } case SETENV_STATE_QUERY2_WAIT: { if (PQisBusy(conn)) ! { ! result = PGRES_POLLING_READING; ! goto normal_return; ! } res = PQgetResult(conn); --- 327,339 ---- goto error_return; conn->setenv_state = SETENV_STATE_QUERY2_WAIT; ! return PGRES_POLLING_READING; } case SETENV_STATE_QUERY2_WAIT: { if (PQisBusy(conn)) ! return PGRES_POLLING_READING; res = PQgetResult(conn); *************** pqSetenvPoll(PGconn *conn) *** 406,413 **** { /* Query finished, so we're done */ conn->setenv_state = SETENV_STATE_IDLE; ! result = PGRES_POLLING_OK; ! goto normal_return; } break; } --- 380,386 ---- { /* Query finished, so we're done */ conn->setenv_state = SETENV_STATE_IDLE; ! return PGRES_POLLING_OK; } break; } *************** pqSetenvPoll(PGconn *conn) *** 425,436 **** error_return: conn->setenv_state = SETENV_STATE_IDLE; ! result = PGRES_POLLING_FAILED; ! ! normal_return: ! conn->rowProcessor = savedRowProcessor; ! conn->rowProcessorParam = savedRowProcessorParam; ! return result; } --- 398,404 ---- error_return: conn->setenv_state = SETENV_STATE_IDLE; ! return PGRES_POLLING_FAILED; } *************** normal_return: *** 438,446 **** * parseInput: if appropriate, parse input data from backend * until input is exhausted or a stopping state is reached. * Note that this function will NOT attempt to read more data from the backend. - * - * Note: callers of parseInput must be prepared for a longjmp exit when we are - * in PGASYNC_BUSY state, since an external row processor might do that. */ void pqParseInput2(PGconn *conn) --- 406,411 ---- *************** getRowDescriptions(PGconn *conn) *** 746,776 **** /* Success! */ conn->result = result; ! /* ! * Advance inStart to show that the "T" message has been processed. We ! * must do this before calling the row processor, in case it longjmps. ! */ conn->inStart = conn->inCursor; ! /* Give the row processor a chance to initialize for new result set */ ! errmsg = NULL; ! 
switch ((*conn->rowProcessor) (result, NULL, &errmsg, ! conn->rowProcessorParam)) ! { ! case 1: ! /* everything is good */ ! return 0; ! ! case -1: ! /* error, report the errmsg below */ ! break; ! default: ! /* unrecognized return code */ ! errmsg = libpq_gettext("unrecognized return value from row processor"); ! break; ! } ! goto set_error_result; advance_and_error: --- 711,726 ---- /* Success! */ conn->result = result; ! /* Advance inStart to show that the "T" message has been processed. */ conn->inStart = conn->inCursor; ! /* ! * We could perform additional setup for the new result set here, but for ! * now there's nothing else to do. ! */ ! /* And we're done. */ ! return 0; advance_and_error: *************** advance_and_error: *** 781,788 **** */ conn->inStart = conn->inEnd; - set_error_result: - /* * Replace partially constructed result with an error result. First * discard the old result to try to win back some memory. --- 731,736 ---- *************** set_error_result: *** 790,796 **** pqClearAsyncResult(conn); /* ! * If row processor didn't provide an error message, assume "out of * memory" was meant. The advantage of having this special case is that * freeing the old result first greatly improves the odds that gettext() * will succeed in providing a translation. --- 738,744 ---- pqClearAsyncResult(conn); /* ! * If preceding code didn't provide an error message, assume "out of * memory" was meant. The advantage of having this special case is that * freeing the old result first greatly improves the odds that gettext() * will succeed in providing a translation. *************** getAnotherTuple(PGconn *conn, bool binar *** 937,967 **** free(bitmap); bitmap = NULL; ! /* ! * Advance inStart to show that the "D" message has been processed. We ! * must do this before calling the row processor, in case it longjmps. ! */ conn->inStart = conn->inCursor; ! /* Pass the completed row values to rowProcessor */ errmsg = NULL; ! 
switch ((*conn->rowProcessor) (result, rowbuf, &errmsg, ! conn->rowProcessorParam)) ! { ! case 1: ! /* everything is good */ ! return 0; ! ! case -1: ! /* error, report the errmsg below */ ! break; ! default: ! /* unrecognized return code */ ! errmsg = libpq_gettext("unrecognized return value from row processor"); ! break; ! } ! goto set_error_result; advance_and_error: --- 885,899 ---- free(bitmap); bitmap = NULL; ! /* Advance inStart to show that the "D" message has been processed. */ conn->inStart = conn->inCursor; ! /* Process the collected row */ errmsg = NULL; ! if (pqRowProcessor(conn, &errmsg)) ! return 0; /* normal, successful exit */ ! goto set_error_result; /* pqRowProcessor failed, report it */ advance_and_error: *************** set_error_result: *** 981,987 **** pqClearAsyncResult(conn); /* ! * If row processor didn't provide an error message, assume "out of * memory" was meant. The advantage of having this special case is that * freeing the old result first greatly improves the odds that gettext() * will succeed in providing a translation. --- 913,919 ---- pqClearAsyncResult(conn); /* ! * If preceding code didn't provide an error message, assume "out of * memory" was meant. The advantage of having this special case is that * freeing the old result first greatly improves the odds that gettext() * will succeed in providing a translation. diff --git a/src/interfaces/libpq/fe-protocol3.c b/src/interfaces/libpq/fe-protocol3.c index 173af2e0a79ef3b443de515cda851e277deaed2d..d289f82285fea00d5de20542e43ea103493f9e58 100644 *** a/src/interfaces/libpq/fe-protocol3.c --- b/src/interfaces/libpq/fe-protocol3.c *************** static int build_startup_packet(const PG *** 61,69 **** * parseInput: if appropriate, parse input data from backend * until input is exhausted or a stopping state is reached. * Note that this function will NOT attempt to read more data from the backend. 
- * - * Note: callers of parseInput must be prepared for a longjmp exit when we are - * in PGASYNC_BUSY state, since an external row processor might do that. */ void pqParseInput3(PGconn *conn) --- 61,66 ---- *************** handleSyncLoss(PGconn *conn, char id, in *** 446,455 **** * Returns: 0 if processed message successfully, EOF to suspend parsing * (the latter case is not actually used currently). * In either case, conn->inStart has been advanced past the message. - * - * Note: the row processor could also choose to longjmp out of libpq, - * in which case the library's state must allow for resumption at the - * next message. */ static int getRowDescriptions(PGconn *conn, int msgLength) --- 443,448 ---- *************** getRowDescriptions(PGconn *conn, int msg *** 564,573 **** /* Success! */ conn->result = result; ! /* ! * Advance inStart to show that the "T" message has been processed. We ! * must do this before calling the row processor, in case it longjmps. ! */ conn->inStart = conn->inCursor; /* --- 557,563 ---- /* Success! */ conn->result = result; ! /* Advance inStart to show that the "T" message has been processed. */ conn->inStart = conn->inCursor; /* *************** getRowDescriptions(PGconn *conn, int msg *** 580,604 **** return 0; } ! /* Give the row processor a chance to initialize for new result set */ ! errmsg = NULL; ! switch ((*conn->rowProcessor) (result, NULL, &errmsg, ! conn->rowProcessorParam)) ! { ! case 1: ! /* everything is good */ ! return 0; ! ! case -1: ! /* error, report the errmsg below */ ! break; ! default: ! /* unrecognized return code */ ! errmsg = libpq_gettext("unrecognized return value from row processor"); ! break; ! } ! goto set_error_result; advance_and_error: /* Discard unsaved result, if any */ --- 570,582 ---- return 0; } ! /* ! * We could perform additional setup for the new result set here, but for ! * now there's nothing else to do. ! */ ! /* And we're done. */ ! 
return 0; advance_and_error: /* Discard unsaved result, if any */ *************** advance_and_error: *** 608,615 **** /* Discard the failed message by pretending we read it */ conn->inStart += 5 + msgLength; - set_error_result: - /* * Replace partially constructed result with an error result. First * discard the old result to try to win back some memory. --- 586,591 ---- *************** set_error_result: *** 617,624 **** pqClearAsyncResult(conn); /* ! * If row processor didn't provide an error message, assume "out of ! * memory" was meant. */ if (!errmsg) errmsg = libpq_gettext("out of memory for query result"); --- 593,602 ---- pqClearAsyncResult(conn); /* ! * If preceding code didn't provide an error message, assume "out of ! * memory" was meant. The advantage of having this special case is that ! * freeing the old result first greatly improves the odds that gettext() ! * will succeed in providing a translation. */ if (!errmsg) errmsg = libpq_gettext("out of memory for query result"); *************** failure: *** 695,704 **** * Returns: 0 if processed message successfully, EOF to suspend parsing * (the latter case is not actually used currently). * In either case, conn->inStart has been advanced past the message. - * - * Note: the row processor could also choose to longjmp out of libpq, - * in which case the library's state must allow for resumption at the - * next message. */ static int getAnotherTuple(PGconn *conn, int msgLength) --- 673,678 ---- *************** getAnotherTuple(PGconn *conn, int msgLen *** 778,808 **** goto advance_and_error; } ! /* ! * Advance inStart to show that the "D" message has been processed. We ! * must do this before calling the row processor, in case it longjmps. ! */ conn->inStart = conn->inCursor; ! /* Pass the completed row values to rowProcessor */ errmsg = NULL; ! switch ((*conn->rowProcessor) (result, rowbuf, &errmsg, ! conn->rowProcessorParam)) ! { ! case 1: ! /* everything is good */ ! return 0; ! ! case -1: ! 
/* error, report the errmsg below */ ! break; ! default: ! /* unrecognized return code */ ! errmsg = libpq_gettext("unrecognized return value from row processor"); ! break; ! } ! goto set_error_result; advance_and_error: /* Discard the failed message by pretending we read it */ --- 752,766 ---- goto advance_and_error; } ! /* Advance inStart to show that the "D" message has been processed. */ conn->inStart = conn->inCursor; ! /* Process the collected row */ errmsg = NULL; ! if (pqRowProcessor(conn, &errmsg)) ! return 0; /* normal, successful exit */ ! goto set_error_result; /* pqRowProcessor failed, report it */ advance_and_error: /* Discard the failed message by pretending we read it */ *************** set_error_result: *** 817,823 **** pqClearAsyncResult(conn); /* ! * If row processor didn't provide an error message, assume "out of * memory" was meant. The advantage of having this special case is that * freeing the old result first greatly improves the odds that gettext() * will succeed in providing a translation. --- 775,781 ---- pqClearAsyncResult(conn); /* ! * If preceding code didn't provide an error message, assume "out of * memory" was meant. The advantage of having this special case is that * freeing the old result first greatly improves the odds that gettext() * will succeed in providing a translation. diff --git a/src/interfaces/libpq/libpq-fe.h b/src/interfaces/libpq/libpq-fe.h index 67db6119bbaa35ae31fb58bb902a455b093ea23f..9d05dd20605a84ab4c4ec9d61ef8697fb2f3b77e 100644 *** a/src/interfaces/libpq/libpq-fe.h --- b/src/interfaces/libpq/libpq-fe.h *************** typedef enum *** 90,96 **** * backend */ PGRES_NONFATAL_ERROR, /* notice or warning message */ PGRES_FATAL_ERROR, /* query failed */ ! PGRES_COPY_BOTH /* Copy In/Out data transfer in progress */ } ExecStatusType; typedef enum --- 90,97 ---- * backend */ PGRES_NONFATAL_ERROR, /* notice or warning message */ PGRES_FATAL_ERROR, /* query failed */ ! 
PGRES_COPY_BOTH, /* Copy In/Out data transfer in progress */ ! PGRES_SINGLE_TUPLE /* single tuple from larger resultset */ } ExecStatusType; typedef enum *************** typedef struct pg_conn PGconn; *** 129,145 **** */ typedef struct pg_result PGresult; - /* PGdataValue represents a data field value being passed to a row processor. - * It could be either text or binary data; text data is not zero-terminated. - * A SQL NULL is represented by len < 0; then value is still valid but there - * are no data bytes there. - */ - typedef struct pgDataValue - { - int len; /* data length in bytes, or <0 if NULL */ - const char *value; /* data value, without zero-termination */ - } PGdataValue; - /* PGcancel encapsulates the information needed to cancel a running * query on an existing connection. * The contents of this struct are not supposed to be known to applications. --- 130,135 ---- *************** typedef struct pgNotify *** 161,170 **** struct pgNotify *next; /* list link */ } PGnotify; - /* Function type for row-processor callback */ - typedef int (*PQrowProcessor) (PGresult *res, const PGdataValue *columns, - const char **errmsgp, void *param); - /* Function types for notice-handling callbacks */ typedef void (*PQnoticeReceiver) (void *arg, const PGresult *res); typedef void (*PQnoticeProcessor) (void *arg, const char *message); --- 151,156 ---- *************** extern int PQsendQueryPrepared(PGconn *c *** 403,419 **** const int *paramLengths, const int *paramFormats, int resultFormat); extern PGresult *PQgetResult(PGconn *conn); - extern PGresult *PQskipResult(PGconn *conn); /* Routines for managing an asynchronous query */ extern int PQisBusy(PGconn *conn); extern int PQconsumeInput(PGconn *conn); - /* Override default per-row processing */ - extern void PQsetRowProcessor(PGconn *conn, PQrowProcessor func, void *param); - extern PQrowProcessor PQgetRowProcessor(const PGconn *conn, void **param); - /* LISTEN/NOTIFY support */ extern PGnotify *PQnotifies(PGconn 
*conn); --- 389,401 ---- const int *paramLengths, const int *paramFormats, int resultFormat); + extern int PQsetSingleRowMode(PGconn *conn); extern PGresult *PQgetResult(PGconn *conn); /* Routines for managing an asynchronous query */ extern int PQisBusy(PGconn *conn); extern int PQconsumeInput(PGconn *conn); /* LISTEN/NOTIFY support */ extern PGnotify *PQnotifies(PGconn *conn); diff --git a/src/interfaces/libpq/libpq-int.h b/src/interfaces/libpq/libpq-int.h index 4bc89269fababe5e4d8ecbf6e80ca1a8625d4bd5..2bac59c3d879ecabce42ceab3b5133df03a0886a 100644 *** a/src/interfaces/libpq/libpq-int.h --- b/src/interfaces/libpq/libpq-int.h *************** typedef struct pgLobjfuncs *** 277,282 **** --- 277,293 ---- Oid fn_lo_write; /* OID of backend function LOwrite */ } PGlobjfuncs; + /* PGdataValue represents a data field value being passed to a row processor. + * It could be either text or binary data; text data is not zero-terminated. + * A SQL NULL is represented by len < 0; then value is still valid but there + * are no data bytes there. + */ + typedef struct pgDataValue + { + int len; /* data length in bytes, or <0 if NULL */ + const char *value; /* data value, without zero-termination */ + } PGdataValue; + /* * PGconn stores all the state data associated with a single connection * to a backend. *************** struct pg_conn *** 324,333 **** /* Optional file to write trace info to */ FILE *Pfdebug; - /* Callback procedure for per-row processing */ - PQrowProcessor rowProcessor; /* function pointer */ - void *rowProcessorParam; /* passthrough argument */ - /* Callback procedures for notice message processing */ PGNoticeHooks noticeHooks; --- 335,340 ---- *************** struct pg_conn *** 346,351 **** --- 353,359 ---- bool options_valid; /* true if OK to attempt connection */ bool nonblocking; /* whether this connection is using nonblock * sending semantics */ + bool singleRowMode; /* return current query result row-by-row? 
*/ char copy_is_binary; /* 1 = copy binary, 0 = copy text */ int copy_already_done; /* # bytes already returned in COPY * OUT */ *************** struct pg_conn *** 406,411 **** --- 414,420 ---- /* Status for asynchronous result construction */ PGresult *result; /* result being constructed */ + PGresult *next_result; /* next result (used in single-row mode) */ /* Assorted state for SSL, GSS, etc */ *************** extern void pqSaveMessageField(PGresult *** 517,522 **** --- 526,532 ---- const char *value); extern void pqSaveParameterStatus(PGconn *conn, const char *name, const char *value); + extern int pqRowProcessor(PGconn *conn, const char **errmsgp); extern void pqHandleSendFailure(PGconn *conn); /* === in fe-protocol2.c === */
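[Editor's note: for readers of the patch, here is a minimal usage sketch of the new PQsetSingleRowMode() API. It is not part of the patch; the table and column names are made up, the connection is assumed to already exist, and error handling is abbreviated.]

```c
/*
 * Hypothetical client-side sketch of single-row mode, using the API
 * added by the patch above.  Assumes an already-established PGconn.
 */
#include <stdio.h>
#include <libpq-fe.h>

static void
fetch_big_table(PGconn *conn)
{
    PGresult   *res;

    if (!PQsendQuery(conn, "SELECT id, payload FROM big_table"))
    {
        fprintf(stderr, "dispatch failed: %s", PQerrorMessage(conn));
        return;
    }

    /* Must be called after dispatch, before any result has arrived */
    if (!PQsetSingleRowMode(conn))
        fprintf(stderr, "could not enter single-row mode\n");

    while ((res = PQgetResult(conn)) != NULL)
    {
        switch (PQresultStatus(res))
        {
            case PGRES_SINGLE_TUPLE:
                /* exactly one row; process and free it immediately */
                printf("id=%s\n", PQgetvalue(res, 0, 0));
                break;
            case PGRES_TUPLES_OK:
                /* zero-row terminator: marks the end of the resultset */
                break;
            default:
                fprintf(stderr, "query failed: %s",
                        PQresultErrorMessage(res));
                break;
        }
        PQclear(res);
    }
}
```

Note how PGRES_SINGLE_TUPLE vs. PGRES_TUPLES_OK lets the caller detect resultset end, as described in the patch comments; memory usage stays bounded by one row (plus the network buffer) rather than the whole resultset.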
On Thu, Aug 2, 2012 at 8:19 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Marko Kreen <markokr@gmail.com> writes:
>> On Wed, Aug 1, 2012 at 6:18 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> So I'm working from the first set of patches in your message
>>> <20120721194907.GA28021@gmail.com>.
>
>> Great!
>
> Here's an updated draft patch.  I've not looked at the docs yet so
> this doesn't include that, but I'm reasonably happy with the code
> changes now.  The main difference from what you had is that I pushed
> the creation of the single-row PGresults into pqRowProcessor, so that
> they're formed immediately while scanning the input message.  That
> other method of postponing examination of the rowBuf does not work,
> any more than it did with the original patch, because you can't assume
> that the network input buffer is going to hold still.  For example,
> calling PQconsumeInput after parseInput has absorbed a row-data message
> but before calling PQgetResult would likely break things.
>
> In principle I suppose we could hack PQconsumeInput enough that it
> didn't damage the row buffer while still meeting its API contract of
> clearing the read-ready condition on the socket; but it wouldn't be
> simple or cheap to do so.

Any code using single-row mode is new.  Any code calling PQconsumeInput
before fully draining rows from the buffer is buggy, as it allows
unlimited growth of the network buffer, which defeats the whole reason
to use single-row mode.

It was indeed a bug in the previous patch that it did not handle this
situation, but IMHO it should be handled with an error, not by allowing
such code to work.

Previously, in whatever sequence the fetch functions were called, an
app's maximum memory usage was either 1x the resultset size, or at most
2x if it messed up the sequence somehow.  But no more than that.  So if
an app knew the resultset was big, it needed to split it up.
Now with single-row mode, apps can assume their maximum memory usage is
1x row size plus the network buffer; let's simplify that to 2x row
size.  But no more than that.  So they can process huge resultsets
assuming their memory usage will stay within 2x row size.  And yet
libpq now allows such apps to go from 2x row size to the full resultset
size with unfortunate coding.  Why?

I have opinions about the reorg details too, but those are mostly a
matter of taste, and rather unimportant compared to the question of
whether we should allow code to break the guarantees that single-row
mode gives.

--
marko
Marko Kreen <markokr@gmail.com> writes:
> On Thu, Aug 2, 2012 at 8:19 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> In principle I suppose we could hack PQconsumeInput enough that it
>> didn't damage the row buffer while still meeting its API contract of
>> clearing the read-ready condition on the socket; but it wouldn't be
>> simple or cheap to do so.

> Any code using single-row mode is new.  Any code calling PQconsumeInput
> before fully draining rows from the buffer is buggy, as it allows
> unlimited growth of the network buffer, which defeats the whole reason
> to use single-row mode.

Sorry, that argument will not fly.  The API specification for
PQconsumeInput is that it can be called at any time to drain the kernel
input buffer, without changing the logical state of the PGconn.  It's
not acceptable to insert a parenthetical "oh, but you'd better not try
it in single-row mode" caveat.

The reason I find this of importance is that PQconsumeInput is meant to
be used in an application's main event loop for the sole purpose of
clearing the read-ready condition on the connection's socket, so that
it can wait for other conditions.  This could very well be decoupled
from the logic that is involved in testing PQisBusy and
fetching/processing actual results.  (If we had not intended to allow
those activities to be decoupled, we'd not have separated PQconsumeInput
and PQisBusy in the first place.)  Introducing a dependency between
these things could lead to non-reproducible, timing-dependent, very
hard-to-find bugs.

By the same token, if somebody were trying to put single-row-mode
support into an existing async application, they'd be fooling with the
parts that issue new queries and collect results, but probably not with
the main event loop.  Thus, I do not believe your argument above that
"any code using single-row mode is new".  The whole app is not going to
be new.
It could be very hard to guarantee that there is no path of control that invokes PQconsumeInput before the new data is actually processed, because there was never before a reason to avoid extra PQconsumeInput calls.

As I said, this is the exact same objection I had to the original scheme of exposing the row buffer directly. Putting a libpq function in front of the row buffer doesn't solve the problem if that function is implemented in a way that's still vulnerable to misuse or race conditions.

> And now libpq allows such apps to go from 2x row size to full resultset
> size with unfortunate coding. Why?

This argument is completely nonsensical. The contract for PQconsumeInput has always been that it consumes whatever the kernel has available. If you don't extract data from the library before calling it again, the library's internal buffer may grow to more than the minimum workable size, but that's a tradeoff you may be willing to make to simplify your application logic. I do not think that it's an improvement to change the API spec to say either that you can't call PQconsumeInput in certain states, or that you can but it won't absorb data from the kernel.

			regards, tom lane
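[The decoupling Tom describes might be sketched like this: the event loop calls PQconsumeInput() only to clear the socket's read-ready condition, while PQisBusy()/PQgetResult() run in logically separate result-handling code. This is a hypothetical select()-based sketch, not code from the patch; a real app would multiplex other fds too:]

```c
/*
 * Sketch of an async main loop where PQconsumeInput() is decoupled from
 * result processing.  PQconsumeInput() may be called in any state purely
 * to drain the kernel buffer and clear read-readiness on the socket.
 */
#include <sys/select.h>
#include <libpq-fe.h>

static void
event_loop(PGconn *conn)
{
	int			sock = PQsocket(conn);

	for (;;)
	{
		fd_set		readable;

		FD_ZERO(&readable);
		FD_SET(sock, &readable);
		/* a real app would also watch its other fds here */
		if (select(sock + 1, &readable, NULL, NULL, NULL) < 0)
			break;

		if (FD_ISSET(sock, &readable))
		{
			/*
			 * Drain the kernel buffer.  Per the API contract this is safe
			 * at any time and does not change the PGconn's logical state.
			 */
			if (!PQconsumeInput(conn))
				break;			/* connection trouble */
		}

		/*
		 * Result processing is a separate concern: it polls PQisBusy()
		 * and pulls results when they are complete.
		 */
		while (!PQisBusy(conn))
		{
			PGresult   *res = PQgetResult(conn);

			if (res == NULL)
				return;			/* query fully processed */
			/* ... application's result handling would go here ... */
			PQclear(res);
		}
	}
}
```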
On Fri, Aug 3, 2012 at 12:01 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Marko Kreen <markokr@gmail.com> writes:
>> On Thu, Aug 2, 2012 at 8:19 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> In principle I suppose we could hack PQconsumeInput enough that it
>>> didn't damage the row buffer while still meeting its API contract of
>>> clearing the read-ready condition on the socket; but it wouldn't be
>>> simple or cheap to do so.
>
>> Any code using single-row-mode is new. Any code calling consumeInput
>> before fully draining rows from buffer is buggy, as it allows unlimited growth
>> of network buffer, which breaks whole reason to use single-row mode.
>
> Sorry, that argument will not fly. The API specification for
> PQconsumeInput is that it can be called at any time to drain the kernel
> input buffer, without changing the logical state of the PGconn. It's
> not acceptable to insert a parenthetical "oh, but you'd better not try
> it in single-row mode" caveat.

Well, if the old API contract must be kept, then so be it.

> The reason I find this of importance is that PQconsumeInput is meant to
> be used in an application's main event loop for the sole purpose of
> clearing the read-ready condition on the connection's socket, so that
> it can wait for other conditions. This could very well be decoupled
> from the logic that is involved in testing PQisBusy and
> fetching/processing actual results. (If we had not intended to allow
> those activities to be decoupled, we'd not have separated PQconsumeInput
> and PQisBusy in the first place.) Introducing a dependency between
> these things could lead to non-reproducible, timing-dependent, very
> hard-to-find bugs.
>
> By the same token, if somebody were trying to put single-row-mode
> support into an existing async application, they'd be fooling with the
> parts that issue new queries and collect results, but probably not with
> the main event loop.
> Thus, I do not believe your argument above that
> "any code using single-row mode is new". The whole app is not going to
> be new. It could be very hard to guarantee that there is no path of
> control that invokes PQconsumeInput before the new data is actually
> processed, because there was never before a reason to avoid extra
> PQconsumeInput calls.

I've thought about this. At first glance, indeed, if an async app has both a reader and a writer registered in the event loop, it might be simpler to keep reading from the source socket even if the writer stalls. But this is exactly the situation where an error from consumeInput would help. Imagine a fast source and a slow target (a common scenario: a database query for each row). Reading from the source *must* be stopped to get predictable memory usage.

> As I said, this is the exact same objection I had to the original scheme
> of exposing the row buffer directly. Putting a libpq function in front
> of the row buffer doesn't solve the problem if that function is
> implemented in a way that's still vulnerable to misuse or race conditions.
>
>> And now libpq allows such apps to go from 2x row size to full resultset
>> size with unfortunate coding. Why?
>
> This argument is completely nonsensical. The contract for
> PQconsumeInput has always been that it consumes whatever the kernel has
> available. If you don't extract data from the library before calling it
> again, the library's internal buffer may grow to more than the minimum
> workable size, but that's a tradeoff you may be willing to make to
> simplify your application logic. I do not think that it's an
> improvement to change the API spec to say either that you can't call
> PQconsumeInput in certain states, or that you can but it won't absorb
> data from the kernel.

The old patch had a nice property: we could later replace PQgetResult() with something else. To allow that and also PQconsumeInput(), we could store offsets in rowBuf, and then skip the realign in PQconsumeInput().
Not sure if the replacement reason is worth keeping, but the offsets may be useful even now, as they might give an additional speedup by decreasing the per-column storage from 16 to 8 bytes on 64-bit architectures. May be better left for 9.3, though.

-- marko