Thread: libpq custom row processing
There is an item in the current commitfest that might be interesting to Psycopg - a way to provide a custom callback for result row processing. By default, libpq takes row data from the network buffer and puts it into a PGresult. This can now be overridden - column data can be converted immediately to its final format and even processed in-flight:

https://commitfest.postgresql.org/action/patch_view?id=769

Attached is the current state of the doc.

Any feedback is appreciated, either about the API, documentation clarity/typos, or whether all interesting scenarios are handled.

-- marko
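The in-flight conversion idea can be sketched in plain Python. This is only a toy model of the proposed behavior, not the libpq C API itself; every name below is invented for the illustration:

```python
# Toy sketch: contrast the default "buffer everything into a result"
# behavior with a per-row callback that converts column data in-flight,
# which is roughly what the patch proposes for libpq.

def fetch_buffered(wire_rows):
    """Default behavior: accumulate raw rows into a result object."""
    return [row for row in wire_rows]          # all raw rows held in memory

def fetch_with_callback(wire_rows, row_processor):
    """Callback behavior: each row is handed to user code as it arrives."""
    for row in wire_rows:
        row_processor(row)                     # convert/consume immediately

# Simulated network data: rows arrive as lists of byte strings.
wire = [[b"1", b"alice"], [b"2", b"bob"]]

# Buffered: raw data first, conversion happens later, separately.
result = fetch_buffered(wire)

# Callback: convert to the final format (int, str) while still "on the wire".
converted = []
fetch_with_callback(wire, lambda r: converted.append((int(r[0]), r[1].decode())))
```

The point of the callback variant is that no second copy of the raw data ever needs to exist: the application sees each row exactly once, already in its final form.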
On 14/02/12 13:11, Marko Kreen wrote:
> There is an item in current commitfest that might be interesting
> to Psycopg - a way to provide custom callback to result
> row processing. By default, libpq takes row data from
> network buffer and puts it into PGresult. This can now
> be overridden - column data can be converted immediately
> to final format and even processed in-flight:
>
> https://commitfest.postgresql.org/action/patch_view?id=769
>
> Attached is current state of the doc.
>
> Any feedback is appreciated, either about API or
> documentation clarity/typos or whether all interesting
> scenarios are handled.

Extremely interesting. Thank you very much for this link.

federico
Hi All -

I am configuring a Mac laptop to use for coding, and of course I wanted psycopg2 right away.. But the more I look into this, the more questions are raised.. I do not want to use the hack to turn off 64-bit Python.. why? If I am not using Python 3, then at least I should be able to keep 64-bit, no?

The system Postgres is well hidden.. I installed PgAdmin 3 via .dmg and I have no problems there.. there is a libpq private to PgAdmin, but the pg_config found in /usr/bin appears to be for the hidden Postgres.

I have stopped my installs at this point, other than a few KyngChaos Frameworks for geo processing.. like QGis.. What is the story? Is Mac OS X an orphan now?

(I believe I am similarly stuck with no ipython, also)

-- Brian Hamlin
GeoCal OSGeo California Chapter
415-717-4462 cell
On Tue, Feb 14, 2012 at 3:28 PM, Federico Di Gregorio <fog@dndg.it> wrote:
> On 14/02/12 13:11, Marko Kreen wrote:
>> There is an item in current commitfest that might be interesting
>> to Psycopg - a way to provide custom callback to result
>> row processing. By default, libpq takes row data from
>> network buffer and puts it into PGresult. This can now
>> be overridden - column data can be converted immediately
>> to final format and even processed in-flight:

The callback API has been rolled back; the final 9.2 will have single-row mode instead:

http://www.postgresql.org/docs/devel/static/libpq-single-row-mode.html

The callback API was hard to use and did not support iterative result processing. The new API should fix both problems.

-- marko
On 07/08/12 12:51, Marko Kreen wrote:
> The callback API has been rolled back and final 9.2 will have
> single-row-mode instead:
>
> http://www.postgresql.org/docs/devel/static/libpq-single-row-mode.html
>
> Callback API was hard to use and did not support iterative
> result processing. This API should fix both problems.

Interesting, but this is quite intimidating:

    While processing a query, the server may return some rows and then
    encounter an error, causing the query to be aborted. Ordinarily, libpq
    discards any such rows and reports only the error. But in single-row
    mode, those rows will have already been returned to the application.
    Hence, the application will see some PGRES_SINGLE_TUPLE PGresult
    objects followed by a PGRES_FATAL_ERROR object. For proper
    transactional behavior, the application must be designed to discard or
    undo whatever has been done with the previously-processed rows, if the
    query ultimately fails.

Making sure old code continues to work and doesn't process the initial tuples before an exception is thrown will be challenging.

federico
--
Federico Di Gregorio                         federico.digregorio@dndg.it
Studio Associato Di Nunzio e Di Gregorio                  http://dndg.it
 Everything will be OK at the end. If it's not OK, it's not the end.
                                                              -- Unknown
On Tue, Aug 7, 2012 at 2:15 PM, Federico Di Gregorio <fog@dndg.it> wrote:
> On 07/08/12 12:51, Marko Kreen wrote:
>> The callback API has been rolled back and final 9.2 will have
>> single-row-mode instead:
>>
>> http://www.postgresql.org/docs/devel/static/libpq-single-row-mode.html
>>
>> Callback API was hard to use and did not support iterative
>> result processing. This API should fix both problems.
>
> Interesting, but this is quite intimidating:
>
>     While processing a query, the server may return some rows and then
>     encounter an error, causing the query to be aborted. Ordinarily,
>     libpq discards any such rows and reports only the error. But in
>     single-row mode, those rows will have already been returned to the
>     application. Hence, the application will see some PGRES_SINGLE_TUPLE
>     PGresult objects followed by a PGRES_FATAL_ERROR object. For proper
>     transactional behavior, the application must be designed to discard
>     or undo whatever has been done with the previously-processed rows,
>     if the query ultimately fails.
>
> Making sure old code continues to work and doesn't process the initial
> tuples before an exception is thrown will be challenging.

The same thing happens when you fetch a result in a transaction and a later query fails with an error, invalidating the earlier processing. So nothing new.

Or how about FETCH 100 from a cursor in a transaction, where the first few fetches succeed and a later one fails?

It's up to user code to handle such cases correctly, and "correct" here depends on the actual business logic of the transaction.

The warning is there because there is now a new failure scenario, not because the failure needs any kind of special handling.

-- marko
On 07/08/12 13:41, Marko Kreen wrote:
> Same thing happens when you fetch the result in transaction
> and later query fails with error thus invalidating earlier
> processing. So nothing new.
>
> Or how about FETCH 100 from cursor in transaction,
> and first few succeed and later one fails.
>
> It's up to user code to handle such cases correctly
> and "correct" here depends on actual business logic
> of the transaction.
>
> The warning is there because there is now new
> failure scenario, but not because the failure
> needs any kind of special handling.

I don't agree. Simple code like:

    curs.execute("SELECT * FROM xs")
    for x in curs.fetchall():
        # do something like writing to the file system with x

will have a different effect if row-by-row processing is enabled. Before, nothing would be changed on the file system in case of error: the fetchall() is "atomic". Now you write to the file system until the row that causes the error is processed.

federico
--
Federico Di Gregorio                         federico.digregorio@dndg.it
Studio Associato Di Nunzio e Di Gregorio                  http://dndg.it
 There's no greys, only white that's got grubby. I'm surprised you
 don't know that. And sin, young man, is when you treat people as
 things. Including yourself.                      -- Granny Weatherwax
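The two behaviors being debated can be simulated without a database. This is a toy sketch only (the generator stands in for the server, the lists stand in for file-system writes; all names are invented):

```python
# Simulate a query that returns two rows and then aborts with an error,
# and compare "fetchall then process" with row-by-row processing.

def failing_rows():
    """Fake row source: two rows arrive, then the query aborts."""
    yield ("row-1",)
    yield ("row-2",)
    raise RuntimeError("server aborted the query")

# fetchall()-style: everything is buffered first, so the error surfaces
# before any side effect happens and `written` stays empty.
written = []
try:
    rows = list(failing_rows())   # error raised here, nothing processed yet
    for row in rows:
        written.append(row)       # never reached
except RuntimeError:
    pass

# Row-by-row: side effects happen before the error arrives, so the
# application must be prepared to discard or undo them, as the libpq
# single-row-mode docs warn.
written_streaming = []
rows_processed_before_error = 0
try:
    for row in failing_rows():
        written_streaming.append(row)   # "write to the file system"
except RuntimeError:
    rows_processed_before_error = len(written_streaming)
    written_streaming.clear()           # the undo/discard step
```

In the streaming case two rows were already "written" when the error arrived, which is exactly the difference Federico is pointing at: the undo step is new work that buffered code never needed.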
On Tue, Aug 7, 2012 at 2:46 PM, Federico Di Gregorio <fog@dndg.it> wrote:
> On 07/08/12 13:41, Marko Kreen wrote:
>> Same thing happens when you fetch the result in transaction
>> and later query fails with error thus invalidating earlier
>> processing. So nothing new.
>>
>> Or how about FETCH 100 from cursor in transaction,
>> and first few succeed and later one fails.
>>
>> It's up to user code to handle such cases correctly
>> and "correct" here depends on actual business logic
>> of the transaction.
>>
>> The warning is there because there is now new
>> failure scenario, but not because the failure
>> needs any kind of special handling.
>
> I don't agree. Simple code like:
>
>     curs.execute("SELECT * FROM xs")
>     for x in curs.fetchall():
>         # do something like writing to the file system with x
>
> will have a different effect if row-by-row processing is enabled. Before
> nothing would be changed on the file system in case of error: the
> fetchall() is "atomic"; while now you write to the file system until the
> row that causes the error is processed.

When in a transaction, the best analogy is reading from a cursor with FETCH. When outside, in autocommit mode, it's COPY (SELECT ..) which you then process line by line.

My point is that the behavior is not something completely new that no-one has seen before.

But it is indeed different from the libpq default, so it's not something psycopg can convert to using unconditionally. As an optional feature, though, it should be quite useful.

Note - we are talking about the libpq world here; Npgsql uses such a mode by default, and maybe pgjdbc does too.

-- marko
On 07/08/12 15:14, Marko Kreen wrote:
> My point is that the behavior is not something completely new,
> that no-one has seen before.
>
> But it's different indeed from libpq default, so it's not something
> psycopg can convert to using unconditionally. But as optional feature
> it should be quite useful.

I agree. As an opt-in feature it would be quite useful for large datasets, but then, named cursors already cover that ground. Not that I am against it, I'd just like to see why:

    curs = conn.cursor(row_by_row=True)

would be better than:

    curs = conn.cursor("row_by_row")

Is row by row faster than fetching from a named cursor? Does it add less overhead? If that's the case, then it would be nice to have it as a feature for optimizing queries returning large datasets.

federico
--
Federico Di Gregorio                         federico.digregorio@dndg.it
Studio Associato Di Nunzio e Di Gregorio                  http://dndg.it
 And anyone who yells "fork" deserves to get one stuck in them.
                                                       -- Dan Winship
On Tue, Aug 7, 2012 at 3:25 PM, Federico Di Gregorio <fog@dndg.it> wrote:
> On 07/08/12 15:14, Marko Kreen wrote:
>> My point is that the behavior is not something completely new,
>> that no-one has seen before.
>>
>> But it's different indeed from libpq default, so it's not something
>> psycopg can convert to using unconditionally. But as optional feature
>> it should be quite useful.
>
> I agree. As an opt-in feature would be quite useful for large datasets
> but then, named cursors already cover that ground. Not that I am against
> it, just I'd like to see why:
>
>     curs = conn.cursor(row_by_row=True)
>
> would be better than:
>
>     curs = conn.cursor("row_by_row")
>
> Is row by row faster than fetching from a named cursor? Does it add less
> overhead. If that's the case then would be nice to have it as a feature
> for optimizing queries returning large datasets.

A big win would be that you don't need to keep the whole dataset in memory, wouldn't it? As you're looping through it, you can throw away the old results...

-- Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/
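That memory win can be illustrated with plain Python iterators - no database involved, just a sketch where a list stands in for a fully buffered result and a generator stands in for row-by-row delivery:

```python
import sys

# A materialized result: all rows exist in memory at once,
# like a fully buffered PGresult.
materialized = [(i, "x" * 10) for i in range(100_000)]

def streamed(n):
    """A streamed result: rows are produced one at a time and can be
    garbage-collected as soon as the loop moves past them."""
    for i in range(n):
        yield (i, "x" * 10)

# Consuming the stream still sees every row...
total = sum(1 for _ in streamed(100_000))

# ...but the generator object itself stays tiny regardless of row count,
# while the list's size grows with the dataset.
list_size = sys.getsizeof(materialized)
gen_size = sys.getsizeof(streamed(100_000))
```

(`sys.getsizeof` only measures the container, not the row tuples, but that is exactly the point: the streaming side never holds more than one row's worth of tuples alive at a time.)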
On Tue, Aug 7, 2012 at 4:25 PM, Federico Di Gregorio <fog@dndg.it> wrote:
> On 07/08/12 15:14, Marko Kreen wrote:
>> My point is that the behavior is not something completely new,
>> that no-one has seen before.
>>
>> But it's different indeed from libpq default, so it's not something
>> psycopg can convert to using unconditionally. But as optional feature
>> it should be quite useful.
>
> I agree. As an opt-in feature would be quite useful for large datasets
> but then, named cursors already cover that ground. Not that I am against
> it, just I'd like to see why:
>
>     curs = conn.cursor(row_by_row=True)
>
> would be better than:
>
>     curs = conn.cursor("row_by_row")
>
> Is row by row faster than fetching from a named cursor? Does it add less
> overhead. If that's the case then would be nice to have it as a feature
> for optimizing queries returning large datasets.

It avoids network roundtrips while buffering a minimal amount of data. Roundtrips may not be noticeable within a single colo, but they are definitely troublesome when working between different colos.

It also saves CPU and memory on both server and client (less cache usage, fewer context switches), but that gets into micro-optimization territory, so it is harder to measure.

-- marko