Thread: splitting plpython into smaller parts

splitting plpython into smaller parts

From
Jan Urbański
Date:
Hi,

attached are two incremental patches that refactor plpython into smaller
modules.

The first one factors out some bolerplate related to executing SPI
functions in subtransactions (and idea borrowed from pltcl.c).

The second one is the actual split. plpython.c has been split into 11
separate files and one header. The separate files are:

  * plpython.c - top-level handlers and stuff that everything else uses
  * plpython_io.c - transforming Python objects to PG structures and
vice versa
  * plpython_procedure.c - handling and caching PLyProcedure objects
  * plpython_exec.c - actually executing the Python code
  * plpython_plpy.c - defining the global plpy module and setting up the
Python interpreter
  * plpython_spi.c - interface to SPI functions
  * plpython_result.c, plpython_plan.c, plpython_subtrasaction.c - a
file per Python class created by plpython with their method definitions
  * plpython_functions.c - Python functions available from the plpy module
  * plpython_elog.c - transforming Python errors into Postgres elogs

All regression tests pass, I tested on Python 2.3, 2.7 and 3.1.

The other plpython patch I submitted (cursor support) is not included
here. If it gets accepted, I'll update this patch to add a
plpython_cursor.c file. If this gets accepted first, I'll update the
cursor patch accordingly.

There's still a lot of room for refactoring and getting rid of
repetitive code from plpython, but that split should be fundamental to
make it a bit more manageable (it's almost 5K lines now).

I've tried to change as little code as possible during the split, apart
from making a bunch of functions non-static I only had to change the
type initialization to call functions from the
plpython_{result,plan,...} modules to avoid exposing the PyTypeObject
structs outside of their respective files and get rid of the
is_PLyPlanObject macro in favour of a function.

Cheers,
Jan

PS: the patches are gzipped because they're rather big - 270K uncompressed.

J

PPS: I guess a README in the plpython dir would be in order. If we
accept these patches, I'll write one up based on the contents of this mail.

J

Attachment

Re: splitting plpython into smaller parts

From
Greg Smith
Date:
On 11/13/2011 09:45 AM, Jan Urbański wrote:
> The first one factors out some bolerplate related to executing SPI 
> functions in subtransactions (and idea borrowed from pltcl.c).

While I haven't looked at the code, this seems worthwhile from the 
description.

> The second one is the actual split. plpython.c has been split into 11 
> separate files and one header. 

Could you comment a bit more about what the goal of this is?  We don't 
have a reviewer for this patch yet, and I think part of the reason is 
because it's not really obvious what it's supposed to be doing, and why 
that's useful.




Re: splitting plpython into smaller parts

From
Jan Urbański
Date:
On 28/11/11 11:00, Greg Smith wrote:
> On 11/13/2011 09:45 AM, Jan Urbański wrote:
>> The second one is the actual split. plpython.c has been split into 11
>> separate files and one header.
>
> Could you comment a bit more about what the goal of this is? We don't
> have a reviewer for this patch yet, and I think part of the reason is
> because it's not really obvious what it's supposed to be doing, and why
> that's useful.

The idea of splitting plpython.c (an almost 5k lines file) into 
something more modular.

It's been floated around here:

http://postgresql.1045698.n5.nabble.com/Large-C-files-tt4766446.html#a4773493

and I think at other occasions, too.

The patch introduces no behavioural changes, it's only shuffling code 
around. The only goal is to improve the maintainability.

I guess the reviewer could verify that a) I haven't botched the split 
and it all still compiles and workds b) the choice of which modules were 
defined is correct

Cheers,
Jan


Re: splitting plpython into smaller parts

From
Peter Eisentraut
Date:
On mån, 2011-11-28 at 02:00 -0800, Greg Smith wrote:
> On 11/13/2011 09:45 AM, Jan Urbański wrote:
> > The first one factors out some bolerplate related to executing SPI 
> > functions in subtransactions (and idea borrowed from pltcl.c).
> 
> While I haven't looked at the code, this seems worthwhile from the 
> description.
> 
> > The second one is the actual split. plpython.c has been split into 11 
> > separate files and one header. 
> 
> Could you comment a bit more about what the goal of this is?  We don't 
> have a reviewer for this patch yet, and I think part of the reason is 
> because it's not really obvious what it's supposed to be doing, and why 
> that's useful.

I will look into it.



Re: splitting plpython into smaller parts

From
Jan Urbański
Date:
Rebased against master after the SPI cursor patch has been committed.

The first patch removes SPI boilerplate from the cursor functions as
well and the second patch creates a plpython_cursor.c file.

A side effect of creating a separate file for cursors is that I had to
make PLy_spi_transaction_{begin,commit,abort} helper functions external
since they're used both by regular SPI execution functions and the
cursor functions.

They live the plpython_spi.c which is not an ideal place for them, but
IMHO it's not bad either.

Cheers,
Jan

Attachment

Re: splitting plpython into smaller parts

From
Peter Eisentraut
Date:
How to people feel about naming the files (as proposed)

! OBJS = plpython.o plpython_io.o plpython_procedure.o plpython_exec.o \
!        plpython_plpy.o plpython_spi.o plpython_result.o plpython_cursor.o \
!        plpython_plan.o plpython_subtransaction.o plpython_functions.o \
!        plpython_elog.o

vs. say

! OBJS = main.o io.o procedure.o exec.o plpy.o spi.o result.o cursor.o \
!        plan.o subtransaction.o functions.o elog.o

?




Re: splitting plpython into smaller parts

From
Alvaro Herrera
Date:
Excerpts from Peter Eisentraut's message of jue dic 15 12:00:13 -0300 2011:
> How to people feel about naming the files (as proposed)
>
> ! OBJS = plpython.o plpython_io.o plpython_procedure.o plpython_exec.o \
> !        plpython_plpy.o plpython_spi.o plpython_result.o plpython_cursor.o \
> !        plpython_plan.o plpython_subtransaction.o plpython_functions.o \
> !        plpython_elog.o
>
> vs. say
>
> ! OBJS = main.o io.o procedure.o exec.o plpy.o spi.o result.o cursor.o \
> !        plan.o subtransaction.o functions.o elog.o
>
> ?

I find the extra prefix unnecessary and ugly; if we had to had a
prefix, I'd choose a shorter one (maybe "py" instead of "plpython_").

--
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


Re: splitting plpython into smaller parts

From
Tom Lane
Date:
Alvaro Herrera <alvherre@commandprompt.com> writes:
> Excerpts from Peter Eisentraut's message of jue dic 15 12:00:13 -0300 2011:
>> How to people feel about naming the files (as proposed)
>> 
>> ! OBJS = plpython.o plpython_io.o plpython_procedure.o plpython_exec.o \
>> !        plpython_plpy.o plpython_spi.o plpython_result.o plpython_cursor.o \
>> !        plpython_plan.o plpython_subtransaction.o plpython_functions.o \
>> !        plpython_elog.o
>> 
>> vs. say
>> 
>> ! OBJS = main.o io.o procedure.o exec.o plpy.o spi.o result.o cursor.o \
>> !        plan.o subtransaction.o functions.o elog.o
>> 
>> ?

> I find the extra prefix unnecessary and ugly; if we had to had a
> prefix, I'd choose a shorter one (maybe "py" instead of "plpython_").

+1 for a prefix, mainly because the shorter names duplicate some
names already in use elsewhere in our tree.  But I agree with Alvaro
that "py" would be sufficient.
        regards, tom lane


Re: splitting plpython into smaller parts

From
Peter Eisentraut
Date:
On tis, 2011-12-06 at 00:58 +0100, Jan Urbański wrote:
> Rebased against master after the SPI cursor patch has been committed.
> 
> The first patch removes SPI boilerplate from the cursor functions as
> well and the second patch creates a plpython_cursor.c file.
> 
> A side effect of creating a separate file for cursors is that I had to
> make PLy_spi_transaction_{begin,commit,abort} helper functions external
> since they're used both by regular SPI execution functions and the
> cursor functions.
> 
> They live the plpython_spi.c which is not an ideal place for them, but
> IMHO it's not bad either.

Committed now.

I moved a few more things around.  I split up the plpython.h header file
to create a separate header file for each .c file, so that the
hierarchical releationship of the modules is clearer.  (The only cases
of circular relationships should be caused by use of global variables.)
Also, I named the files that contain classes or modules more like they
are in the CPython source code, e.g., plpy_cursorobject.c.



Re: splitting plpython into smaller parts

From
Jan Urbański
Date:
On 18/12/11 20:53, Peter Eisentraut wrote:
> On tis, 2011-12-06 at 00:58 +0100, Jan Urbański wrote:
>> Rebased against master after the SPI cursor patch has been committed.
>>
>> The first patch removes SPI boilerplate from the cursor functions as
>> well and the second patch creates a plpython_cursor.c file.
>>
>> A side effect of creating a separate file for cursors is that I had to
>> make PLy_spi_transaction_{begin,commit,abort} helper functions external
>> since they're used both by regular SPI execution functions and the
>> cursor functions.
>>
>> They live the plpython_spi.c which is not an ideal place for them, but
>> IMHO it's not bad either.
>
> Committed now.

Great, thanks! I hope this will make for a more maintanable PL/Python.

By the way, the buildfarm is turning red because it's missing the
attached patch.

Cheers,
Jan

Attachment