Re: Proposal: efficient iter on named cursors - Mailing list psycopg

From Daniele Varrazzo
Subject Re: Proposal: efficient iter on named cursors
Date
Msg-id AANLkTi=y7x32LTLA65JKm5BSD3kN0rDRq+wC6HhjAqcg@mail.gmail.com
Whole thread Raw
In response to Re: Proposal: efficient iter on named cursors  (Federico Di Gregorio <federico.digregorio@dndg.it>)
Responses Re: Proposal: efficient iter on named cursors  (Federico Di Gregorio <federico.digregorio@dndg.it>)
List psycopg
On Thu, Jan 13, 2011 at 4:46 PM, Federico Di Gregorio
<federico.digregorio@dndg.it> wrote:
> On 13/01/2011 17:37, Daniele Varrazzo wrote:

>> So, I'd like to modify the cursor so that in case of __iter__, a
>> certain number of record is fetched and iteration is performed on
>> them. The cursor already has the state to keep the dataset so probably
>> only the code would require change, not so much the data structures.

> I mostly agree. I'd like to see a .fetchsize parameter both on the
> module, on the connection and on the cursor (usual psycopg cascade
> initialization). A good default value, supposing rows of 10-20 columns
> with a mix of textual and numeric data, is ~2000 because it will
> probably fetch less than 100KB of data and that seems a good compromise
> between the number of rows transferred and data you'll throw away if you
> decide to leave the loop early.

The feature discussed here is included in the current betas. In
particular, in the released implementation, we are using the already
existing 'arraysize' attribute (used as default for 'fetchmany()') as
the number of records to fetch per roundtrip. But because the default
for arraysize is 1 (per DB-API), and this value basically disables the
feature, 'iter()' currently uses the proposed value of 2000 as
default.

However the ticket #41
<http://psycopg.lighthouseapp.com/projects/62710/tickets/41>, while
invalid for me as I don't see "for record in cur" as a synonym for
"fetch one record at time",  has made me wonder if we are too
aggressive with the default: maybe discarding arraysize=1 is not the
best option. Albeit somebody can still use the "while True + /
fetchone() + break" pattern to force record-per-record fetching, I
think if she is careful enough to use named cursors for its task she
may also care to set an appropriate value > 1 for arraysize. Named
cursors are still easier to use, but I don't want to make people think
they can be a replacement for *all* cursors - they still require more
resources on the server, so are better used only when required.

On this reasoning, I've committed this patch
<https://github.com/dvarrazzo/psycopg/commit/1dd71947bff415f1c79a5e6b81b6ba89d717ecb9>
in a separate branch: it makes iteration respect arraysize in its
default value 1 too, and improves the documentation explaining the
complete picture. If ok, the patch will be merged in devel.

Comments?

-- Daniele

psycopg by date:

Previous
From: Daniele Varrazzo
Date:
Subject: Re: psycopg used in a ASP page fails
Next
From: Daniele Varrazzo
Date:
Subject: Re: psycopg used in a ASP page fails