Thread: plpython SPI cursors

plpython SPI cursors

From
Jan Urbański
Date:
Hi,

attached is a patch implementing the usage of SPI cursors in PL/Python.
Currently when trying to process a large table in PL/Python you have
slurp it all into memory (that's what plpy.execute does).

This patch allows reading the result set in smaller chunks, using a SPI
cursor behind the scenes.

Example usage:

cursor = plpy.cursor("select a, b from hugetable")
for row in cursor:
    plpy.info("a is %s and b is %s" % (row['a'], row['b']))

The patch itself is simple, but there's a lot of boilerplate dedicated
to opening a subtransaction and handling prepared plans. I'd like to do
some refactoring of they way PL/Python uses SPI to reduce the amount of
boilerplate needed, but that'll come as a separate patch (just before
the patch to split plpython.c in smaller chunks).

This feature has been sponsored by Nomao.

Cheers,
Jan

PS: I already added it to the November CF.

J

Attachment

Re: plpython SPI cursors

From
Steve Singer
Date:
On 11-10-15 07:28 PM, Jan Urbański wrote:
> Hi,
>
> attached is a patch implementing the usage of SPI cursors in PL/Python.
> Currently when trying to process a large table in PL/Python you have
> slurp it all into memory (that's what plpy.execute does).
>
> J

I found a few bugs  (see my testing section below) that will need fixing 
+ a few questions about the code


Overview & Feature Review
-----------
This patch adds cursor support to plpython.  SPI cursors will allow
a plpython function to read process a large results set without having to
read it all into memory at once.  This is a good thing.  Without this
patch I think you could accomplish the same with using SQL DECLARE CURSOR
and SQL fetch.    This feature allows you to use a python cursor as
an iterator resulting in much cleaner python code than the SQL FETCH
approach.   I think the feature is worth having


Usability Review
---------------------- The patch adds the user methods
cursor=plpy.cursor(query_or_plan)
cursor.fetch(100)
cursor.close()

Do we like the name plpy.cursor or would we rather call it something like
plpy.execute_cursor(...) or plpy.cursor_open(...) or plpy.create_cursor(...)
Since we will be mostly stuck with the API once we release 9.2 this is worth
some opinions on.  I like cursor() but if anyone disagrees now is the time.

This patch does not provide a wrapper around SPI_cursor_move.  The patch
is useful without that and I don't see anything that preculdes someone else
adding that later if they see a need.


Documentation Review
---------------------

The patch includes documentation updates that describes the new feature.
The Database Access page doesn't provide a API style list of database access
functions like the plperl 
http://www.postgresql.org/docs/9.1/interactive/plperl-builtins.html page 
does.  I think the organization of the perl page is
clearer than the python one and we should think about a doing some
documentaiton refactoring.  That should be done as a seperate patch and
shouldn't be a barrier to committing this one.




Code Review
----------------

in PLy_cursor_plan line 4080
+     PG_TRY();
+     {
+         Portal        portal;
+         char       *volatile nulls;
+         volatile int j;
+
+         if (nargs > 0)
+             nulls = palloc(nargs * sizeof(char));
+         else
+             nulls = NULL;
+
+         for (j = 0; j < nargs; j++)
+         {
+             PyObject   *elem;
I am probably not seeing a code path or misunderstanding something
about the setjmp/longjump usages but I don't see why nulls and j need to be
volatile here.

line 444 PLy_cursor(PyObject *self, PyObject *args)
+ {
+     char            *query;
+     PyObject        *plan;
+     PyObject           *planargs = NULL;
+
+     if (PyArg_ParseTuple(args, "s", &query))
+         return PLy_cursor_query(query);
+

Should query be freed with PyMem_free()



Testing
-----------

I tested both python 2.6 and 3 on a Linux system



create or replace function x() returns text as $$
cur=None
try:  with plpy.subtransaction():    cur=plpy.cursor('select generate_series(1,1000)')    rows=cur.fetch(10);
plpy.execute('selectf()')
 

except plpy.SPIError:  rows=cur.fetch(10);  return rows[0]['generate_series']
return 'none'
$$ language plpythonu;
select x();

crashes the backend
test=# select x();
The connection to the server was lost. Attempting reset: LOG:  server 
process (PID 3166) was terminated by signal 11: Segmentation fault

The below test gives me a strange error message:

create or replace function x1() returns text as $$
plan=None
try:  with plpy.subtransaction():    plpy.execute('CREATE TEMP TABLE z AS select generate_series(1,1000)')
plan=plpy.prepare('select* FROM z')    plpy.execute('select * FROM does_not_exist')
 
except plpy.SPIError, e:    cur=plpy.cursor(plan)    rows=cur.fetch(10)    return rows[0]['generate_series']
return '1'
$$ language plpythonu;
select x1();

test=# select x1()
test-# ;
ERROR:  TypeError: Expected sequence of 82187072 arguments, got 0: <NULL>
CONTEXT:  Traceback (most recent call last):      PL/Python function "x1", line 9, in <module>
cur=plpy.cursor(plan)   PL/Python function "x1"
 
STATEMENT:  select x1()


I was expecting an error from the function just a bit more useful one.


Performance
-------------------
I did not do any specific performance testing but I don't see this patch 
as having any impact to performance




Re: plpython SPI cursors

From
Jan Urbański
Date:
On 20/11/11 19:14, Steve Singer wrote:
> On 11-10-15 07:28 PM, Jan Urbański wrote:
>> Hi,
>>
>> attached is a patch implementing the usage of SPI cursors in PL/Python.

> I found a few bugs (see my testing section below) that will need fixing
> + a few questions about the code

Hi Steve,

thanks a lot for the review, I'll investigate the errors you were 
getting and post a follow-up.

Good catch on trying cursors with explicit subtransactions, I didn't 
think about how they would interact.

Cheers,
Jan


Re: plpython SPI cursors

From
Jan Urbański
Date:
On 20/11/11 19:14, Steve Singer wrote:
> On 11-10-15 07:28 PM, Jan Urbański wrote:
>> Hi,
>>
>> attached is a patch implementing the usage of SPI cursors in PL/Python.
>> Currently when trying to process a large table in PL/Python you have
>> slurp it all into memory (that's what plpy.execute does).
>>
>> J
>
> I found a few bugs (see my testing section below) that will need fixing
> + a few questions about the code

Responding now to all questions and attaching a revised patch based on
your comments.

> Do we like the name plpy.cursor or would we rather call it something like
> plpy.execute_cursor(...) or plpy.cursor_open(...) or
> plpy.create_cursor(...)
> Since we will be mostly stuck with the API once we release 9.2 this is
> worth
> some opinions on. I like cursor() but if anyone disagrees now is the time.

We use plpy.subtransaction() to create Subxact objects, so I though
plpy.cursor() would be most appropriate.

> This patch does not provide a wrapper around SPI_cursor_move. The patch
> is useful without that and I don't see anything that preculdes someone else
> adding that later if they see a need.

My idea is to add keyword arguments to plpy.cursor() that will allow you
to decide whether you want a scrollable cursor and after that provide a
move() method.

> The patch includes documentation updates that describes the new feature.
> The Database Access page doesn't provide a API style list of database
> access
> functions like the plperl
> http://www.postgresql.org/docs/9.1/interactive/plperl-builtins.html page
> does. I think the organization of the perl page is
> clearer than the python one and we should think about a doing some
> documentaiton refactoring. That should be done as a seperate patch and
> shouldn't be a barrier to committing this one.

Yeah, the PL/Python docs are a bit chaotic right now. I haven't yet
summoned force to overhaul them.

> in PLy_cursor_plan line 4080
> + PG_TRY();
> + {
> + Portal portal;
> + char *volatile nulls;
> + volatile int j;

> I am probably not seeing a code path or misunderstanding something
> about the setjmp/longjump usages but I don't see why nulls and j need to be
> volatile here.

It looked like you could drop volatile there (and in
PLy_spi_execute_plan, where this is copied from (did I mention there's
quite some code duplication in PL/Python?)) but digging in git I found
this commit:

http://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=2789b7278c11785750dd9d2837856510ffc67000

that added the original volatile qualification, so I guess there's a reason.

> line 444
> PLy_cursor(PyObject *self, PyObject *args)
> + {
> + char *query;
> + PyObject *plan;
> + PyObject *planargs = NULL;
> +
> + if (PyArg_ParseTuple(args, "s", &query))
> + return PLy_cursor_query(query);
> +
>
> Should query be freed with PyMem_free()

No, PyArg_ParseTuple returns a string on the stack, I check that
repeatedly creating a cursor with a plan argument does not leak memory
and that adding PyMem_Free there promptly leads to a segfault.


> I tested both python 2.6 and 3 on a Linux system
>
> [test cases demonstrating bugs]

Turns out it's a really bad idea to store pointers to Portal structures,
because they get invalidated by the subtransaction abort hooks.

I switched to storing the cursor name and looking it up in the
appropriate hash table every time it's used. The examples you sent
(which I included as regression tests) now cause a ValueError to be
raised with a message stating that the cursor has been created in an
aborted subtransaction.

Not sure about the wording of the error message, though.

Thanks again for the review!

Cheers,
Jan

Attachment

Re: plpython SPI cursors

From
Steve Singer
Date:
On 11-11-23 01:58 PM, Jan Urbański wrote: <blockquote cite="mid:4ECD426F.5060702@wulczer.org" type="cite">On 20/11/11
19:14,Steve Singer wrote: <br /><blockquote type="cite">On 11-10-15 07:28 PM, Jan Urbański wrote: <br /><blockquote
type="cite">Hi,<br /><br /> attached is a patch implementing the usage of SPI cursors in PL/Python. <br /> Currently
whentrying to process a large table in PL/Python you have <br /> slurp it all into memory (that's what plpy.execute
does).<br /><br /> J <br /></blockquote><br /> I found a few bugs (see my testing section below) that will need fixing
<br/> + a few questions about the code <br /></blockquote><br /> Responding now to all questions and attaching a
revisedpatch based on your comments. <br /><br /></blockquote><br /> I've looked over the revised version of the patch
andit now seems fine.<br /><br /> Ready for committer.<br /><br /><br /><br /><blockquote
cite="mid:4ECD426F.5060702@wulczer.org"type="cite"><blockquote type="cite">Do we like the name plpy.cursor or would we
rathercall it something like <br /> plpy.execute_cursor(...) or plpy.cursor_open(...) or <br /> plpy.create_cursor(...)
<br/> Since we will be mostly stuck with the API once we release 9.2 this is <br /> worth <br /> some opinions on. I
likecursor() but if anyone disagrees now is the time. <br /></blockquote><br /> We use plpy.subtransaction() to create
Subxactobjects, so I though plpy.cursor() would be most appropriate. <br /><br /><blockquote type="cite">This patch
doesnot provide a wrapper around SPI_cursor_move. The patch <br /> is useful without that and I don't see anything that
preculdessomeone else <br /> adding that later if they see a need. <br /></blockquote><br /> My idea is to add keyword
argumentsto plpy.cursor() that will allow you to decide whether you want a scrollable cursor and after that provide a
move()method. <br /><br /><blockquote type="cite">The patch includes documentation updates that describes the new
feature.<br /> The Database Access page doesn't provide a API style list of database <br /> access <br /> functions
likethe plperl <br /><a class="moz-txt-link-freetext"
href="http://www.postgresql.org/docs/9.1/interactive/plperl-builtins.html">http://www.postgresql.org/docs/9.1/interactive/plperl-builtins.html</a>
page<br /> does. I think the organization of the perl page is <br /> clearer than the python one and we should think
abouta doing some <br /> documentaiton refactoring. That should be done as a seperate patch and <br /> shouldn't be a
barrierto committing this one. <br /></blockquote><br /> Yeah, the PL/Python docs are a bit chaotic right now. I
haven'tyet summoned force to overhaul them. <br /><br /><blockquote type="cite">in PLy_cursor_plan line 4080 <br /> +
PG_TRY();<br /> + { <br /> + Portal portal; <br /> + char *volatile nulls; <br /> + volatile int j; <br
/></blockquote><br/><blockquote type="cite">I am probably not seeing a code path or misunderstanding something <br />
aboutthe setjmp/longjump usages but I don't see why nulls and j need to be <br /> volatile here. <br /></blockquote><br
/>It looked like you could drop volatile there (and in PLy_spi_execute_plan, where this is copied from (did I mention
there'squite some code duplication in PL/Python?)) but digging in git I found this commit: <br /><br /><a
class="moz-txt-link-freetext"
href="http://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=2789b7278c11785750dd9d2837856510ffc67000">http://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=2789b7278c11785750dd9d2837856510ffc67000</a><br
/><br/> that added the original volatile qualification, so I guess there's a reason. <br /><br /><blockquote
type="cite">line444 <br /> PLy_cursor(PyObject *self, PyObject *args) <br /> + { <br /> + char *query; <br /> +
PyObject*plan; <br /> + PyObject *planargs = NULL; <br /> + <br /> + if (PyArg_ParseTuple(args, "s", &query)) <br
/>+ return PLy_cursor_query(query); <br /> + <br /><br /> Should query be freed with PyMem_free() <br
/></blockquote><br/> No, PyArg_ParseTuple returns a string on the stack, I check that repeatedly creating a cursor with
aplan argument does not leak memory and that adding PyMem_Free there promptly leads to a segfault. <br /><br /><br
/><blockquotetype="cite">I tested both python 2.6 and 3 on a Linux system <br /><br /> [test cases demonstrating bugs]
<br/></blockquote><br /> Turns out it's a really bad idea to store pointers to Portal structures, because they get
invalidatedby the subtransaction abort hooks. <br /><br /> I switched to storing the cursor name and looking it up in
theappropriate hash table every time it's used. The examples you sent (which I included as regression tests) now cause
aValueError to be raised with a message stating that the cursor has been created in an aborted subtransaction. <br
/><br/> Not sure about the wording of the error message, though. <br /><br /> Thanks again for the review! <br /><br />
Cheers,<br /> Jan <br /><pre wrap="">
 
<fieldset class="mimeAttachmentHeader"></fieldset>

</pre></blockquote><br />

Re: plpython SPI cursors

From
Peter Eisentraut
Date:
On lör, 2011-11-26 at 11:21 -0500, Steve Singer wrote:
> I've looked over the revised version of the patch and it now seems
> fine.
> 
> Ready for committer. 

I can take it from here.



Re: plpython SPI cursors

From
Peter Eisentraut
Date:
On ons, 2011-11-23 at 19:58 +0100, Jan Urbański wrote:
> On 20/11/11 19:14, Steve Singer wrote:
> > On 11-10-15 07:28 PM, Jan Urbański wrote:
> >> Hi,
> >>
> >> attached is a patch implementing the usage of SPI cursors in PL/Python.
> >> Currently when trying to process a large table in PL/Python you have
> >> slurp it all into memory (that's what plpy.execute does).
> >>
> >> J
> >
> > I found a few bugs (see my testing section below) that will need fixing
> > + a few questions about the code
> 
> Responding now to all questions and attaching a revised patch based on 
> your comments.

Committed

Please refresh the other patch.





Re: plpython SPI cursors

From
Jan Urbański
Date:
On 05/12/11 18:58, Peter Eisentraut wrote:
> On ons, 2011-11-23 at 19:58 +0100, Jan Urbański wrote:
>> On 20/11/11 19:14, Steve Singer wrote:
>> Responding now to all questions and attaching a revised patch based on 
>> your comments.
> 
> Committed
> 
> Please refresh the other patch.

Great, thanks!

I'll try to send an updated version of the other patch this evening.

Cheers,
Jan


Re: plpython SPI cursors

From
Bruce Momjian
Date:
Jan Urbański wrote:
> On 05/12/11 18:58, Peter Eisentraut wrote:
> > On ons, 2011-11-23 at 19:58 +0100, Jan Urba?ski wrote:
> >> On 20/11/11 19:14, Steve Singer wrote:
> >> Responding now to all questions and attaching a revised patch based on
> >> your comments.
> >
> > Committed
> >
> > Please refresh the other patch.
>
> Great, thanks!
>
> I'll try to send an updated version of the other patch this evening.

I assume this is _not_ related to this TODO item:
Add a DB-API compliant interface on top of the SPI interface

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + It's impossible for everything to be true. +


Re: plpython SPI cursors

From
Jan Urbański
Date:
On 05/12/11 19:12, Bruce Momjian wrote:
> Jan Urbański wrote:
>> On 05/12/11 18:58, Peter Eisentraut wrote:
>>> On ons, 2011-11-23 at 19:58 +0100, Jan Urba?ski wrote:
>>>> On 20/11/11 19:14, Steve Singer wrote:
>>>> Responding now to all questions and attaching a revised patch based on 
>>>> your comments.
>>>
>>> Committed
>>>
>>> Please refresh the other patch.
>>
>> Great, thanks!
>>
>> I'll try to send an updated version of the other patch this evening.
> 
> I assume this is _not_ related to this TODO item:
> 
>     Add a DB-API compliant interface on top of the SPI interface 

No,  not related.


Re: plpython SPI cursors

From
Peter Eisentraut
Date:
On mån, 2011-12-05 at 13:12 -0500, Bruce Momjian wrote:
> Jan Urbański wrote:
> > On 05/12/11 18:58, Peter Eisentraut wrote:
> > > On ons, 2011-11-23 at 19:58 +0100, Jan Urba?ski wrote:
> > >> On 20/11/11 19:14, Steve Singer wrote:
> > >> Responding now to all questions and attaching a revised patch based on 
> > >> your comments.
> > > 
> > > Committed
> > > 
> > > Please refresh the other patch.
> > 
> > Great, thanks!
> > 
> > I'll try to send an updated version of the other patch this evening.
> 
> I assume this is _not_ related to this TODO item:
> 
>     Add a DB-API compliant interface on top of the SPI interface 

No, but this is:
http://petereisentraut.blogspot.com/2011/11/plpydbapi-db-api-for-plpython.html