Re: plpython SPI cursors - Mailing list pgsql-hackers
From | Steve Singer |
---|---|
Subject | Re: plpython SPI cursors |
Date | |
Msg-id | BLU0-SMTP4530C1907601DFB9AEAE778ECA0@phx.gbl Whole thread Raw |
In response to | plpython SPI cursors (Jan Urbański <wulczer@wulczer.org>) |
Responses |
Re: plpython SPI cursors
Re: plpython SPI cursors |
List | pgsql-hackers |
On 11-10-15 07:28 PM, Jan Urbański wrote: > Hi, > > attached is a patch implementing the usage of SPI cursors in PL/Python. > Currently when trying to process a large table in PL/Python you have > slurp it all into memory (that's what plpy.execute does). > > J I found a few bugs (see my testing section below) that will need fixing + a few questions about the code Overview & Feature Review ----------- This patch adds cursor support to plpython. SPI cursors will allow a plpython function to read process a large results set without having to read it all into memory at once. This is a good thing. Without this patch I think you could accomplish the same with using SQL DECLARE CURSOR and SQL fetch. This feature allows you to use a python cursor as an iterator resulting in much cleaner python code than the SQL FETCH approach. I think the feature is worth having Usability Review ---------------------- The patch adds the user methods cursor=plpy.cursor(query_or_plan) cursor.fetch(100) cursor.close() Do we like the name plpy.cursor or would we rather call it something like plpy.execute_cursor(...) or plpy.cursor_open(...) or plpy.create_cursor(...) Since we will be mostly stuck with the API once we release 9.2 this is worth some opinions on. I like cursor() but if anyone disagrees now is the time. This patch does not provide a wrapper around SPI_cursor_move. The patch is useful without that and I don't see anything that preculdes someone else adding that later if they see a need. Documentation Review --------------------- The patch includes documentation updates that describes the new feature. The Database Access page doesn't provide a API style list of database access functions like the plperl http://www.postgresql.org/docs/9.1/interactive/plperl-builtins.html page does. I think the organization of the perl page is clearer than the python one and we should think about a doing some documentaiton refactoring. That should be done as a seperate patch and shouldn't be a barrier to committing this one. Code Review ---------------- in PLy_cursor_plan line 4080 + PG_TRY(); + { + Portal portal; + char *volatile nulls; + volatile int j; + + if (nargs > 0) + nulls = palloc(nargs * sizeof(char)); + else + nulls = NULL; + + for (j = 0; j < nargs; j++) + { + PyObject *elem; I am probably not seeing a code path or misunderstanding something about the setjmp/longjump usages but I don't see why nulls and j need to be volatile here. line 444 PLy_cursor(PyObject *self, PyObject *args) + { + char *query; + PyObject *plan; + PyObject *planargs = NULL; + + if (PyArg_ParseTuple(args, "s", &query)) + return PLy_cursor_query(query); + Should query be freed with PyMem_free() Testing ----------- I tested both python 2.6 and 3 on a Linux system create or replace function x() returns text as $$ cur=None try: with plpy.subtransaction(): cur=plpy.cursor('select generate_series(1,1000)') rows=cur.fetch(10); plpy.execute('selectf()') except plpy.SPIError: rows=cur.fetch(10); return rows[0]['generate_series'] return 'none' $$ language plpythonu; select x(); crashes the backend test=# select x(); The connection to the server was lost. Attempting reset: LOG: server process (PID 3166) was terminated by signal 11: Segmentation fault The below test gives me a strange error message: create or replace function x1() returns text as $$ plan=None try: with plpy.subtransaction(): plpy.execute('CREATE TEMP TABLE z AS select generate_series(1,1000)') plan=plpy.prepare('select* FROM z') plpy.execute('select * FROM does_not_exist') except plpy.SPIError, e: cur=plpy.cursor(plan) rows=cur.fetch(10) return rows[0]['generate_series'] return '1' $$ language plpythonu; select x1(); test=# select x1() test-# ; ERROR: TypeError: Expected sequence of 82187072 arguments, got 0: <NULL> CONTEXT: Traceback (most recent call last): PL/Python function "x1", line 9, in <module> cur=plpy.cursor(plan) PL/Python function "x1" STATEMENT: select x1() I was expecting an error from the function just a bit more useful one. Performance ------------------- I did not do any specific performance testing but I don't see this patch as having any impact to performance
pgsql-hackers by date: