Thread: pg_proc probin misuse

pg_proc probin misuse

From: James William Pye
Hi,

In PL/Py, I had the bright idea of storing bytecode in the probin field of the
function's pg_proc row. However, this idea has lately become rather dim as I
have recently rediscovered (thanks, Adrian) that this breaks dumps; pg_dump outputs
a PL/Py function as "CREATE FUNCTION x() RETURNS y LANGUAGE python AS
'<bytecode>', '<source>'". Of course, when loading this, it fails:
'ERROR:  only one AS item needed for language "python"'.

So is this "fix your broken PL" or "pg_dump should only be doing that for C
language functions"? I imagine the former, so if that is the case perhaps
the 'probin' column description at [1] should be reworded to ensure others don't
get the same bright idea(the "language specific" part in particular).
Ugh, even if it were the latter, I would still be breaking existing versions,
so I'm inclined to fix it regardless..
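
In case the trick isn't familiar, the idea was roughly the following -- a
plain-Python sketch with made-up names, not the actual PL/Py code:

    import marshal

    # Hypothetical function body, as it might appear in prosrc.
    prosrc = "def __plpy_main(x):\n    return x + 1\n"

    # Compile once at CREATE FUNCTION time and keep the marshalled bytecode;
    # this byte string is what PL/Py stashed in probin (suitably encoded,
    # since probin is a text column).
    code = compile(prosrc, "<prosrc>", "exec")
    probin = marshal.dumps(code)

    # On function load, the stored bytes become a code object again,
    # skipping recompilation of the source text.
    restored = marshal.loads(probin)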

Have a good evening (afternoon, morning, etc :).

[1] http://www.postgresql.org/docs/8.1/static/catalog-pg-proc.html
[Yeah, I do see the clarification at the bottom of the page. :( ]


Re: pg_proc probin misuse

From: Tom Lane
James William Pye <pgsql@jwp.name> writes:
> In PL/Py, I had the bright idea of storing bytecode in the probin field of the
> function's pg_proc row. However, this idea has lately become rather dim as I
> have recently rediscovered (thanks, Adrian) that this breaks dumps; pg_dump outputs
> a PL/Py function as "CREATE FUNCTION x() RETURNS y LANGUAGE python AS
> '<bytecode>', '<source>'". Of course, when loading this, it fails:
> 'ERROR:  only one AS item needed for language "python"'.

> So is this "fix your broken PL" or "pg_dump should only be doing that for C
> language functions"?

Offhand it seems to me that pg_dump is behaving reasonably: it's storing
probin if it sees something there to be stored.  The asymmetry is in the
backend, specifically functioncmds.c's interpret_AS_clause(): it has a
hardwired assumption that probin is only relevant to C functions.

Feel free to propose a saner definition.  AFAICS the current coding
makes probin useless for all except C functions, so I think it could
be improved.
        regards, tom lane


Re: pg_proc probin misuse

From: James William Pye
On Fri, May 26, 2006 at 11:21:32PM -0400, Tom Lane wrote:
> James William Pye <pgsql@jwp.name> writes:
> > So is this "fix your broken PL" or "pg_dump should only be doing that for C
> > language functions"?
> 
> Offhand it seems to me that pg_dump is behaving reasonably: it's storing
> probin if it sees something there to be stored.  The asymmetry is in the
> backend, specifically functioncmds.c's interpret_AS_clause(): it has a
> hardwired assumption that probin is only relevant to C functions.
>
> Feel free to propose a saner definition.  AFAICS the current coding
> makes probin useless for all except C functions, so I think it could
> be improved.

I guess there are two ways to go about it. Simply remove the assumption that
probin is only relevant to C functions, perhaps allowing a hardwired exception
for builtin languages where allowing probin to be set would be deemed unsightly
(i.e., the easy way ;). Or, add a column to pg_language that specifies the
language's probin usage, so that pg_dump and the backend have an idea of how to
handle these things for the given language (the "takes a bit more work" way).
[I imagine the former could gracefully lead into the latter as well.]


Re: pg_proc probin misuse

From: Tom Lane
James William Pye <pgsql@jwp.name> writes:
> I guess there are two ways to go about it. Simply remove the
> assumption that probin is only relevant to C functions, perhaps
> allowing a hardwired exception for builtin languages where allowing
> probin to be set would be deemed unsightly (i.e., the easy way ;). Or,
> add a column to pg_language that specifies the language's probin usage,
> so that pg_dump and the backend have an idea of how to handle these
> things for the given language (the "takes a bit more work" way).  [I
> imagine the former could gracefully lead into the latter as well.]

I believe the reason interpret_AS_clause() is written the way that it is
is to provide some error checking, ie, not let the user specify a probin
clause for languages where it's not meaningful.  That check predates the
invention of language validator functions, IIRC.  It'd probably make
sense to get rid of the centralized check and expect the validator
functions to do it instead.

But we're still avoiding the central issue: does it make sense to dump a
probin clause at all for plpython functions?  If it's a compiled form of
prosrc then it probably doesn't belong in the dump.

On reflection I'm kind of inclined to think that plpython is abusing the
column.  If it were really expensive to derive bytecode from source text
then maybe it'd make sense to do what you're doing, but surely that's
not all that expensive.  Everyone else manages to parse prosrc on the
fly and cache the result in memory; why isn't plpython doing that?
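
The usual shape of that, as an illustrative Python sketch (made-up names, not
the real plpython handler code), is little more than a per-backend dictionary:

    # Cache of compiled code for this session, keyed by the function's OID.
    _compiled = {}

    def get_compiled(fn_oid, prosrc):
        code = _compiled.get(fn_oid)
        if code is None:
            # First call in this backend: derive the bytecode on the fly.
            code = compile(prosrc, "<prosrc>", "exec")
            _compiled[fn_oid] = code
        return code

A real handler also drops the entry when the pg_proc row changes (e.g. after
CREATE OR REPLACE FUNCTION), but the compiled form lives in memory, not in the
catalog.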

If we think that plpython is leading the wave of the future, I'd be kind
of inclined to invent a new pg_proc column in which derived text can be
stored, rather than trying to use probin for the purpose.  Although
arguably probin itself was once meant to do that, there's too much
baggage now.
        regards, tom lane


Re: pg_proc probin misuse

From: PFC
> If it were really expensive to derive bytecode from source text
> then maybe it'd make sense to do what you're doing, but surely that's
> not all that expensive.  Everyone else manages to parse prosrc on the
> fly and cache the result in memory; why isn't plpython doing that?
It depends on the number of imported modules in the function. If it  
imports a lot of modules, it can take some time to compile a python  
function (especially if the modules have some initialisation code which  
must be run on import).


Re: pg_proc probin misuse

From: James William Pye
On Sun, May 28, 2006 at 09:12:34PM -0400, Tom Lane wrote:
> But we're still avoiding the central issue: does it make sense to dump a
> probin clause at all for plpython functions?  If it's a compiled form of
> prosrc then it probably doesn't belong in the dump.

That's why I initially thought pg_dump or I was the dirty one.
Even if CREATE FUNCTION would take it, the probin value would be ignored (well,
overwritten).

> On reflection I'm kind of inclined to think that plpython is abusing the
> column.  If it were really expensive to derive bytecode from source text
> then maybe it'd make sense to do what you're doing, but surely that's
> not all that expensive.  Everyone else manages to parse prosrc on the
> fly and cache the result in memory; why isn't plpython doing that?

Yeah, I don't think it's expensive. It wasn't a feature that I implemented
out of any kind of demand or testing. Rather, I knew I could marshal code
objects, and I figured it would likely yield some improvement on initial loads,
so I implemented it.

> If we think that plpython is leading the wave of the future, I'd be kind
> of inclined to invent a new pg_proc column in which derived text can be
> stored, rather than trying to use probin for the purpose.  Although
> arguably probin itself was once meant to do that, there's too much
> baggage now.

I think having something like that in pg_proc could be useful. Certainly my case
may not really be demanding, but I guess there may be some languages that could
see a real benefit from avoiding recompilation. Though such a column seems like
it would be more of a mere convenience for PL authors: if initial load were
truly that expensive, I would think that would justify creating a table
containing the compiled code and taking the extra lookup hit on initial load.


Re: pg_proc probin misuse

From: Tom Lane
PFC <lists@peufeu.com> writes:
>> If it were really expensive to derive bytecode from source text
>> then maybe it'd make sense to do what you're doing, but surely that's
>> not all that expensive.  Everyone else manages to parse prosrc on the
>> fly and cache the result in memory; why isn't plpython doing that?

>     It depends on the number of imported modules in the function. If it  
> imports a lot of modules, it can take some time to compile a python  
> function (especially if the modules have some initialisation code which  
> must be run on import).

Surely the initialization code would have to be run anyway ... and if
the function does import a pile of modules, do you really want to cache
all that in its pg_proc entry?  What happens if some of the modules get
updated later?
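
To spell out the distinction in plain Python (an illustrative sketch, not
plpython's actual code path): compile() only produces bytecode, so caching the
marshalled result saves the cheap step, while the imports and their
initialization code still run at first execution either way.

    import time

    prosrc = "import json\nimport decimal\n\ndef body(x):\n    return json.dumps(x)\n"

    t0 = time.time()
    code = compile(prosrc, "<prosrc>", "exec")   # bytecode only; no imports yet
    t1 = time.time()

    ns = {}
    exec(code, ns)                               # imports and their init code run here
    t2 = time.time()

    print("compile: %.6fs   first exec (imports): %.6fs" % (t1 - t0, t2 - t1))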
        regards, tom lane


Re: pg_proc probin misuse

From: Bruce Momjian
Is there a TODO here?

---------------------------------------------------------------------------

Tom Lane wrote:
> PFC <lists@peufeu.com> writes:
> >> If it were really expensive to derive bytecode from source text
> >> then maybe it'd make sense to do what you're doing, but surely that's
> >> not all that expensive.  Everyone else manages to parse prosrc on the
> >> fly and cache the result in memory; why isn't plpython doing that?
> 
> >     It depends on the number of imported modules in the function. If it  
> > imports a lot of modules, it can take some time to compile a python  
> > function (especially if the modules have some initialisation code which  
> > must be run on import).
> 
> Surely the initialization code would have to be run anyway ... and if
> the function does import a pile of modules, do you really want to cache
> all that in its pg_proc entry?  What happens if some of the modules get
> updated later?
> 
>             regards, tom lane
> 

--
  Bruce Momjian                  http://candle.pha.pa.us
  EnterpriseDB                   http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +


Re: pg_proc probin misuse

From: PFC
Hm, thinking again, I guess Tom Lane is right.

>> Surely the initialization code would have to be run anyway ... and if
>> the function does import a pile of modules, do you really want to cache
>> all that in its pg_proc entry?  What happens if some of the modules get
>> updated later?
Besides, what happens if you store compiled bytecode in a table and then
upgrade the Python interpreter to a new version... would it still be
compatible? I suppose so, but I don't really know...

Persistent connections should be used anyway, which makes caching in RAM
work well...