Thread: plpython

plpython

From

James Pye

Date:

04 September 2003, 19:02:18

Greetings,
I've recently been spending some quality time with the plpython module, and I think I'm well on the road to an improved
version of it (although, nothing about a trusted variant). By improved, I mostly mean cleaned up and reorganized.

Here are some of the changes that I have made in my own version:
- Compilation and execution have been greatly simplified and should be faster (at least execution should be).

- Caching of compiled code no longer references a Python dictionary (PLyProcedureCache). The handler keeps its own
  vector of procedure structs (should be faster, and is trivial).

- Removal of the plpython-generated dictionaries SD and GD. They don't seem to be very useful, as they are forgotten
  when the postmaster exits and not remembered when a new one starts. SD is questionable; does/did anyone find SD very
  useful? GD seems almost pointless, as the global keyword should be sufficient. Although, I do think there was a
  mention of GD being "safe globals", but I don't know why it would be safer than "global var".

- Removal of the built-in "plpy" Python module that plpython creates. This is done because it provides interfaces to
  pgsql functions that I feel should be located elsewhere; elsewhere being another Python module. I've already
  generated a preliminary interface to elog and SPI_* with SWIG that at first glance seems quite functional (it links,
  and is at least able to properly call elog; I haven't really tested SPI).

- Improvement to tracebacks, as it now NOTICEs the Python tracebacks (there is already an ERROR, so I don't think
  WARNING is necessary). PLy_traceback, originally, seemed to ignore the tb of the PyErr_Fetch.

- Removal of plpython type conversion routines and data structures. This was done because I felt that there was a
  better way to do it. Not sure what yet, as it is one of my questions to the list, but it will probably end up being
  a similar implementation.

- I also plan to make some changes to trigger handling, but I haven't done anything worth mentioning yet.

Type conversion
plpython's current type conversion implementation appears to be dependent on strings as the common format. This is
fine, but not very extensible as is, unless you don't mind explicitly parsing strings inside each function that takes
an unsupported data type.

I was thinking that a better solution would be creating a Python object type inside the database, thus allowing users
to write casts to and from non-standard or unimplemented data types with little difficulty (well, maybe some :). This
would allow conversion in an extensible way, which doesn't require modification to plpython. Storage could be easily
achieved by pickling the object.

Another thought would be to just pass valid PyObject pointers in and out of conversion procedures, effectively
disallowing storage (outside the process in which the object was created), unless it is possible to have a persistent
storage mechanism that makes it possible to go through pickle? (Yeah, I'm new to pgsql dev.)
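
To make the pickling idea concrete, here is a minimal sketch in plain Python, outside the database. The helper names `to_storage` and `from_storage` are hypothetical and are not part of plpython; the sketch only shows that a Python value can round-trip through a byte string, which is the form a stored "Python object" column would presumably hold.

```python
import pickle

def to_storage(obj):
    # Serialize the object to bytes (hypothetical helper, not a plpython API).
    return pickle.dumps(obj)

def from_storage(data):
    # Rebuild the Python object from the stored bytes.
    # Note: unpickling data from an untrusted source is unsafe.
    return pickle.loads(data)

# A value with no dedicated conversion, represented with built-in types.
point = {"kind": "point", "coords": (0, 1)}

stored = to_storage(point)
restored = from_storage(stored)
print(type(stored).__name__, restored == point)
```

Whether such bytes could be fed through PostgreSQL casts is exactly the open question raised above; the sketch says nothing about that part.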

Python PostgreSQL Interface
plpython, currently, implements its own built-in module to interface with a few pgsql routines, and it works, but I
feel it should be located elsewhere, as I said before.

For the most part, I can only see most people using elog and SPI within plpy, but perhaps that is too narrow of a
view. Perhaps it would be useful to many to have access to some backend routines through plpy, but I'm not sure and
that is why I'm asking the list.

How far should such a PostgreSQL interface module go?

What should its name be if a full/semi-full interface is created? I was thinking simply py-pgsql as the package name,
and the module name, of course, would be pgsql.

What should the name be if it was only elog and SPI? py-pgspi?

I'm leaning towards py-pgsql, a partial interface consisting of elog and SPI and perhaps a few other useful routines,
but with the module as a package so as to allow easy extensions to the package as subpackages.

From this interface, a DB-API 2.0 compatible SPI interface will come as well.

My version has a short ways to go before it is ready for usage, but if you want to see what I've done, just drop me an
e-mail.

Comments? Criticisms? Feature suggestions?
Anyone else doing significant work on plpython?

-James

Re: plpython

From

elein

Date:

05 September 2003, 10:22:54

NO!!! Don't remove SD and GD!!! They are useful.
I use them in several applications, primarily
for running aggregates.

What needs to be fixed is that the SD needs to be
initialized at the start of each statement.
Joe Conway just implemented this in Pl/R and
Tom Lane had an idea about it too.

See http://www.varlena.com/GeneralBits/TidBits
for the talk and code I gave on running aggregation
with plpython at OSCON.  It illustrates the
initialization problem.

And don't remove plpy.  You can move it or replace
its implementation, but do not remove it.  People
are really using these things.

People are also depending on python's loose
type conversion from strings.  If you add another
kind of conversion interpretation, you must keep the backward
compatibility or call it something different.

It seems to me that you need to talk more to
people using plpython.  I am just one person.
There are others.  I hope I've misunderstood you 
about some of these things...

elein@varlena.com


Re: plpython

From

elein

Date:

06 September 2003, 15:46:39

The key value of having both SD and GD is scope.
We *do* want to be able to have dictionaries with
scope that is function specific,
statement specific and global (available to all functions).

I do use plpython primarily for running aggregates.

Having the different scopes (if they all worked
correctly) would enable
1) multiple calls to the same function within
   a statement to use the same SD["whatever"] to
   store data cleanly and clearly without
   overwriting the other instance's values.
2) Allow any functions within the same statement
   access to a statement dictionary.
3) Global allows any function instance in
   any statement access to the same values.
   You can change the name if you must, but
   I don't see the point.
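
A rough sketch of these scopes, emulated in plain Python rather than inside the database. Everything here is an assumption for illustration: `call` stands in for the language handler, `start_statement` stands in for the per-statement initialization discussed in this thread, and `running_sum` is a made-up function. It is not how plpython implements SD and GD.

```python
from collections import defaultdict

GD = {}                  # global scope: shared by every function and statement
_SD = defaultdict(dict)  # function scope: one private dictionary per function

def running_sum(value, SD):
    # Keep the running total in the function-private dictionary.
    SD["total"] = SD.get("total", 0) + value
    return SD["total"]

def call(func, *args):
    # Stand-in for the handler: count the call globally, then pass the
    # function its own SD.
    GD["calls"] = GD.get("calls", 0) + 1
    return func(*args, SD=_SD[func.__name__])

def start_statement():
    # Reset function-private state at the start of each statement.
    _SD.clear()

start_statement()
first = [call(running_sum, v) for v in (1, 2, 3)]
start_statement()
second = [call(running_sum, v) for v in (10, 20)]
print(first, second, GD["calls"])
```

Without the reset in `start_statement`, the second statement would continue from the first statement's total, which is the initialization problem described for running aggregates.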
 

I don't understand why you would want to move
plpy out.  If you are reimplementing, why don't
you do it under the plpy wrappers?  Am I missing
something?  Taking out the functionality to
execute functions and calling notice and replacing
them with other named calls also seems pointless to me.

One of two primary features of good pl languages *is*
the ability to run queries and interface with the
database.  I can certainly understand the separation
with regards to implementation of the language,
however I cannot see that it is appropriate for
the interface level.

The discussions with regards to scoping were not
necessarily on list.  Most of it was in person.
I suggest contacting Joe and Tom directly or via
this discussion.  I strongly suspect Jan knows
about this as well as I believe tcl supports this too.

Exactly how will your type conversions change
what people program?  Your description and
explanation are not clear.  Remember it is the
interface that must remain stable.

Be sure that you do not
eliminate capabilities that are common in
all/most of the languages.  Pl/R is a relatively
new language implemented by Joe Conway.  It
would be really good to talk over some of the
design decisions with him.  Jan, et al., have
done a particularly thorough job with pltcl.
He also should be a key person with whom to
discuss major interface changes.

Make sure your vision of plpython matches
the basic framework of other procedural
languages in postgres.

elein@varlena.com
=============================================================
elein@varlena.com        www.varlena.com
PostgreSQL Consulting & Support
PostgreSQL General Bits   http://www.varlena.com/GeneralBits/
=============================================================
"Sometimes we are confronted with more data than we can really use,
and it may be wisest to forget and to destroy most of it"
-- Donald Knuth, The Art of Computer Programming


On Thu, Sep 04, 2003 at 11:41:14PM -0700, James Pye wrote:
> Greetings,
> 
>     Thanks for your e-mail, I really do appreciate the feedback. :)
> 
>     First of all, I was planning on calling it plpy (should have said something in the e-mail), as to not necessarily
> show backward compatibility with plpython, but I can understand the annoyance that it would be for whatever users
> that wanted to take advantage of whatever improvements that I am able to generate.
>     Perhaps it would be wise to provide "legacy" support as a compilation option for some period of time, if I
> actually move away from some/most of plpython's supplied features. It should be fairly easy to do.
> 
> > NO!!! Don't remove SD and GD!!! They are useful.
> > I use them in several applications, primarily
> > for running aggregates.
> 
>     Perhaps I will keep SD, but I will talk to some more users. SD could easily be emulated through the use of
> globals, but dealing with initialization inside every procedure would be a hassle, so I probably will keep it.
>     I disagree with you about GD, as I said in my e-mail you can easily, and more naturally (IMO, as it is part of
> Python itself), use the Python "global" keyword. As far as it being unsafe (I think the docs declare that somewhere,
> or that GD is safe), I don't understand what makes it safer than `global myglobal`, other than the fact that it
> doesn't deal with the real globals dictionary.
>     Although, I suppose it may be nice to have the extra clarification that the coder is wanting to access global
> data by explicitly specifying GD everywhere it is accessed, but does that really justify its creation (not a serious
> issue, but why have it)?
> 
> 
> > What needs to be fixed is that the SD needs to be
> > initialized at the start of each statement.
> > Joe Conway just implemented this in Pl/R and
> > Tom Lane had an idea about it too.
> 
>     Very good point. I will look into this more. Do you remember what the subject of the hacker's thread was that
> discussed this? Was it recent? I'll search the archives a bit and see if I can find it...
>     Although, it seems to me that despite that aggregate data is static, it is only static in a statement-instance
> context/scope, which SD does not care about. It seems to me that SD does the job that it is supposed to do, as it
> was not made for a statement-instance context. What I mean by this, is SD should not be changed, but rather a
> special dictionary, or variable should be created and maintained that is static on the statement-instance level. I
> don't know if it is possible to make the necessary distinctions from fcinfo. (Again, I'm pretty new to pgsql
> development, so I'll make this a point of research.)
> 
>     Other than aggregates, what do you or would you use SD for?
> 
> 
> > And don't remove plpy.  You can move it or replace
> > its implementation, but do not remove it.  People
> > are really using these things.
> 
>     I was planning on replacing it. One of my questions to the list was about the Python PostgreSQL interface
> module (ie, what is now the built-in 'plpy' module). I feel that it should not be *built-in* to plpython, but rather
> an extra module that may be installed. It makes more sense to me to have it as a separate project that specifies the
> pgsql interface module, as I believe it is logically a separate module. Thus, you would `import pgsql`, or whatever
> the wrapper's module name becomes, when you need its functionality.
> 
> 
> > People are also depending on python's loose
> > type conversion from strings.  If you add another
> > kind of conversion interpretation, you must keep the backward
> > compatibility or call it something different.
> 
>     Hrm, well I was planning on having a "failsafe" to convert the Datum to a string, so there shouldn't be much
> trouble there. The point of the new system would be to allow explicit specification of conversion functions (using
> CASTS) inside the database, without any need to change anything inside the plpython module to provide "clean"
> support for a type.
>     Considering that the pl's ability to convert is dependent upon CASTS, one could easily exclude conversion support
> for a given type if his current procedures depend on receiving a string, thus receiving a string that the "failsafe"
> would provide.
> 
> 
> > It seems to me is that you need to talk more to
> > people using plpython.  I am just one person.
> > There are others.  I hope I've misunderstood you 
> > about some of these things...
>     
>     I completely agree. The whole point of my e-mail to the list was to introduce my intentions and, more
> importantly, to get some feedback (especially about type conversion and the pgsql interface).
> 
>     I hope this clears things up a bit.
>     Hrm, I was hoping for more responses to my message to the hackers list (as of right now, yours is the only one),
> perhaps I will post some questions to the general pgsql list.
> 
> 
> Thanks again for your comments, more are welcome as well. :)
> 
> -James
> 
> 
> 
> > elein@varlena.com

Re: plpython

From

Tilo Schwarz

Date:

12 September 2003, 17:12:06

James Pye writes:

> Type conversion
>
>     plpython's current type conversion implementation appears to be dependent
> on strings as the common format. This is fine, but not very extensible as
> is, unless you don't mind explicitly parsing strings inside each function
> that takes an unsupported data type. I was thinking that a better solution
> would be creating a python object type inside the database. Thus allowing
> users to write casts to and from non-standard or unimplemented data types
> with little difficulty(well, maybe some :). This would allow conversion in
> an extensible way, which doesn't require modification to plpython. Storage
> could be easily achieved by pickling the object. Another thought would be
> to just pass valid PyObject pointers in and out of conversion procedures,
> effectively disallowing storage(outside the process in which the object was
> created in), unless it is possible to have a persistent storage mechanism
> that makes it possible to go through pickle?.?..(yeah, I'm new to pgsql
> dev).

As a first step I would already be happy if plpython would use more Python
datatypes, for example:

- currently, a Point (or a box, polygon, etc.) is returned as a string
"(0, 1)"
instead of a Python tuple (or list)
(0, 1)

- the same holds for arrays: instead of getting an array as string (which I
have to parse into a python list), I would like to get a python list in the
first place.
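
For illustration, this is roughly the parsing that each function has to do by hand today. The helper names `parse_point` and `parse_int_array` are made up for this sketch, and it assumes the usual text forms "(x, y)" for a point and "{1,2,3}" for a flat integer array; it does not handle NULLs, quoting, or nested arrays.

```python
import ast

def parse_point(text):
    # "(0, 1)" -> (0.0, 1.0)
    x, y = ast.literal_eval(text)
    return (float(x), float(y))

def parse_int_array(text):
    # "{1,2,3}" -> [1, 2, 3]; flat integer arrays only
    inner = text.strip()[1:-1].strip()
    if not inner:
        return []
    return [int(item) for item in inner.split(",")]

print(parse_point("(0, 1)"))
print(parse_int_array("{1,2,3}"))
```

If plpython handed over a tuple and a list directly, neither helper would be needed.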

Regards,
Tilo