Re: Resdhift's lack of cursors and PQsetSingleRowMode - Mailing list psycopg

From Daniele Varrazzo
Subject Re: Resdhift's lack of cursors and PQsetSingleRowMode
Date
Msg-id CA+mi_8acYYr+iRFYhngWow9vxtBfKLDvRi6M-+EkmcireksKwQ@mail.gmail.com
Whole thread Raw
In response to Resdhift's lack of cursors and PQsetSingleRowMode  (Ryan Kelly <rpkelly22@gmail.com>)
Responses Re: Resdhift's lack of cursors and PQsetSingleRowMode  (Marko Kreen <markokr@gmail.com>)
List psycopg
On Tue, Dec 24, 2013 at 4:45 PM, Ryan Kelly <rpkelly22@gmail.com> wrote:
> Hi:
>
> We're using psycopg2 to connect to Amazon's Redshift, and one problem
> Redshift has is that it lacks server-side cursors. What I'd like is to
> be able to use PQsetSingleRowMode to avoid consuming all the data at
> once (in specific circumstances, of course). I'm not entirely sure what
> in psycopg2 would need to be modified, but it seems like the changes
> would mostly be restricted to the `async == 0` portion of `pq_execute`,
> but I'm not familiar enough with the source to be certain.

Hello Ryan,

supporting SingleRowMode doesn't look like a trivial change. I have
personally no plan to work on it but of course I'm not against
contributions. Here's a bunch of random thoughts.

General implementation of psycopg with SingleRowMode seems relatively
hard: for instance implementing fetchmany(10) would require 10 calls
and handling the result from 10 PQresult structures into a temporary
storage we currently don't need and don't have: currently we have a
single result both with client-side and server-side cursor. It would
be easier to implement a subset of the interface, e.g. forward only
iteration with no bookkeeping (no scroll(), no rownumber...).

ISTM that in single row mode you can only get a single row per network
operation. If you have a large dataset (likely if you want this
feature) you may end up in a large overhead, the sort of problem we
have solved with cursor.itersize but whose solution doesn't seem to
apply for SingleRowMode.

The async codepath is already lacking several features: I wouldn't
bother for SingleRowMode with it, at least in a first iteration (I
haven't analyzed if it's doable or not).

If you want to work on the feature you may also start hacking on the
pure python implementation of the driver
(https://github.com/chtd/psycopg2cffi is the most up to date): it is
quite a straightforward porting so reimplementing the changeset in C
wouldn't be too hard.

I may be wrong, but in my opinion the one-roundtrip-per-row would be a
performance killer and you may end up better paginating manually via
LIMIT/OFFSET, although the server wouldn't love you for that (the
server still computes the rows before the offset, only to discard
them, so querying a large dataset in pages is a quadratic operation).

-- Daniele


psycopg by date:

Previous
From: Ryan Kelly
Date:
Subject: Resdhift's lack of cursors and PQsetSingleRowMode
Next
From: Marko Kreen
Date:
Subject: Re: Resdhift's lack of cursors and PQsetSingleRowMode