Thread: pg_proc probin misuse
Hi,

In PL/Py, I had the bright idea of storing bytecode in the probin field of the
function's pg_proc row. However, this idea has lately become rather dim, as I
have recently rediscovered (thanks, Adrian) that it breaks dumps: pg_dump
outputs a PL/Py function as

    CREATE FUNCTION x() RETURNS y LANGUAGE python AS '<bytecode>', '<source>'

and, of course, loading this fails with:

    ERROR: only one AS item needed for language "python"

So is this "fix your broken PL" or "pg_dump should only be doing that for C
language functions"? I imagine the former, so if that is the case, perhaps the
probin column description at [1] should be reworded to ensure others don't get
the same bright idea (the "language specific" part in particular).

Ugh, even if it were the latter, I would still be breaking existing versions,
so I'm inclined to fix it regardless.

Have a good evening (afternoon, morning, etc. :).

[1] http://www.postgresql.org/docs/8.1/static/catalog-pg-proc.html
    [Yeah, I do see the clarification at the bottom of the page. :(]
James William Pye <pgsql@jwp.name> writes:
> In PL/Py, I had the bright idea of storing bytecode in the probin field of
> the function's pg_proc row. However, this idea has lately become rather dim
> as I have recently rediscovered (thanks Adrian) that this breaks dumps;
> pg_dump outputs a PL/Py function as "CREATE FUNCTION x() RETURNS y LANGUAGE
> python AS '<bytecode>', '<source>'". Of course, when loading this, it fails:
> 'ERROR: only one AS item needed for language "python"'.

> So is this "fix your broken PL" or "pg_dump should only be doing that for C
> language functions"?

Offhand it seems to me that pg_dump is behaving reasonably: it's storing
probin if it sees something there to be stored. The asymmetry is in the
backend, specifically functioncmds.c's interpret_AS_clause(): it has a
hardwired assumption that probin is only relevant to C functions.

Feel free to propose a saner definition. AFAICS the current coding
makes probin useless for all except C functions, so I think it could
be improved.

			regards, tom lane
On Fri, May 26, 2006 at 11:21:32PM -0400, Tom Lane wrote:
> James William Pye <pgsql@jwp.name> writes:
> > So is this "fix your broken PL" or "pg_dump should only be doing that for
> > C language functions"?
>
> Offhand it seems to me that pg_dump is behaving reasonably: it's storing
> probin if it sees something there to be stored. The asymmetry is in the
> backend, specifically functioncmds.c's interpret_AS_clause(): it has a
> hardwired assumption that probin is only relevant to C functions.
>
> Feel free to propose a saner definition. AFAICS the current coding
> makes probin useless for all except C functions, so I think it could
> be improved.

I guess there are two ways to go about it. Simply remove the assumption that
probin is only relevant to C functions, perhaps allowing a hardwired exception
for builtin languages where allowing probin to be set would be deemed
unsightly (i.e., the easy way ;). Or, add a column to pg_language that
specifies the language's probin usage, so that pg_dump and the backend have an
idea of how to handle these things for the given language (the "takes a bit
more work" way).

[I imagine the former could gracefully lead into the latter as well.]
James William Pye <pgsql@jwp.name> writes:
> I guess there are two ways to go about it. Simply remove the
> assumption that probin is only relevant to C functions; perhaps
> allowing a hardwired exception for builtin languages where allowing
> probin to be set would be deemed unsightly (ie, the easy way ;). Or,
> add a column to pg_language that specifies the language's probin usage
> so that pg_dump and the backend have an idea of how to handle these
> things for the given language (the "takes a bit more work" way). [I
> imagine the former could gracefully lead into the latter as well.]

I believe the reason interpret_AS_clause() is written the way that it is
is to provide some error checking, ie, not let the user specify a probin
clause for languages where it's not meaningful. That check predates the
invention of language validator functions, IIRC. It'd probably make
sense to get rid of the centralized check and expect the validator
functions to do it instead.

But we're still avoiding the central issue: does it make sense to dump a
probin clause at all for plpython functions? If it's a compiled form of
prosrc then it probably doesn't belong in the dump.

On reflection I'm kind of inclined to think that plpython is abusing the
column. If it were really expensive to derive bytecode from source text
then maybe it'd make sense to do what you're doing, but surely that's
not all that expensive. Everyone else manages to parse prosrc on the
fly and cache the result in memory; why isn't plpython doing that?

If we think that plpython is leading the wave of the future, I'd be kind
of inclined to invent a new pg_proc column in which derived text can be
stored, rather than trying to use probin for the purpose. Although
arguably probin itself was once meant to do that, there's too much
baggage now.

			regards, tom lane
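[Editorial sketch: the parse-on-the-fly-and-cache approach described above can be as simple as the following Python; `fn_oid`, `prosrc`, and the module-level cache dict are illustrative names, not actual plpython internals.]

```python
# A per-backend, in-memory cache of compiled code objects, keyed by the
# function's OID: compile prosrc once, reuse the code object thereafter.
_code_cache = {}

def get_code(fn_oid, prosrc):
    """Return the compiled code object for a function, compiling prosrc
    only on the first call in this backend."""
    code = _code_cache.get(fn_oid)
    if code is None:
        code = compile(prosrc, "<plpython function %s>" % fn_oid, "exec")
        _code_cache[fn_oid] = code
    return code
```

Nothing derived needs to survive the session, so nothing extra lands in pg_proc or in dumps.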
> If it were really expensive to derive bytecode from source text
> then maybe it'd make sense to do what you're doing, but surely that's
> not all that expensive. Everyone else manages to parse prosrc on the
> fly and cache the result in memory; why isn't plpython doing that?

It depends on the number of imported modules in the function. If it
imports a lot of modules, it can take some time to compile a python
function (especially if the modules have some initialisation code which
must be run on import).
On Sun, May 28, 2006 at 09:12:34PM -0400, Tom Lane wrote:
> But we're still avoiding the central issue: does it make sense to dump a
> probin clause at all for plpython functions? If it's a compiled form of
> prosrc then it probably doesn't belong in the dump.

That's why I initially thought pg_dump or I was the dirty one. Even if
CREATE FUNCTION would take it, the probin value would be ignored (well,
overwritten).

> On reflection I'm kind of inclined to think that plpython is abusing the
> column. If it were really expensive to derive bytecode from source text
> then maybe it'd make sense to do what you're doing, but surely that's
> not all that expensive. Everyone else manages to parse prosrc on the
> fly and cache the result in memory; why isn't plpython doing that?

Yeah, I don't think it's expensive. It wasn't a feature that I implemented
out of any kind of demand or testing. Rather, I knew I could marshal code
objects, and I figured it would likely yield some improvement on initial
loads, so I implemented it.

> If we think that plpython is leading the wave of the future, I'd be kind
> of inclined to invent a new pg_proc column in which derived text can be
> stored, rather than trying to use probin for the purpose. Although
> arguably probin itself was once meant to do that, there's too much
> baggage now.

I think having something like that in pg_proc could be useful. Certainly my
case may not really be demanding, but I imagine there may be some languages
that could enjoy a good benefit from avoiding recompilation. Though, such a
column seems like it would be more of a mere convenience for PL authors. If
initial load were truly that expensive, I would think that it would justify
creating a table containing compiled code and taking the extra lookup hit on
initial load.
PFC <lists@peufeu.com> writes:
>> If it were really expensive to derive bytecode from source text
>> then maybe it'd make sense to do what you're doing, but surely that's
>> not all that expensive. Everyone else manages to parse prosrc on the
>> fly and cache the result in memory; why isn't plpython doing that?

> It depends on the number of imported modules in the function. If it
> imports a lot of modules, it can take some time to compile a python
> function (especially if the modules have some initialisation code which
> must be run on import).

Surely the initialization code would have to be run anyway ... and if
the function does import a pile of modules, do you really want to cache
all that in its pg_proc entry? What happens if some of the modules get
updated later?

			regards, tom lane
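[Editorial sketch: the compile-versus-initialize distinction above is easy to demonstrate with the marshal module, which is (as described earlier in the thread) how PL/Py serializes code objects. Compiling and marshaling a body containing an import does not execute that import; module initialization only runs when the restored code object is executed. The source string and filenames below are hypothetical.]

```python
import marshal

src = "import math\nresult = math.sqrt(16)"
code = compile(src, "<plpy function>", "exec")

blob = marshal.dumps(code)       # the sort of value stored in probin
restored = marshal.loads(blob)   # deserializing does not run the import

ns = {}
exec(restored, ns)               # math is imported (and initialized) here
print(ns["result"])              # 4.0
```

So caching bytecode skips only the compile step; any import-time initialization cost is paid on every fresh execution regardless.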
Is there a TODO here?

---------------------------------------------------------------------------

Tom Lane wrote:
> PFC <lists@peufeu.com> writes:
> >> If it were really expensive to derive bytecode from source text
> >> then maybe it'd make sense to do what you're doing, but surely that's
> >> not all that expensive. Everyone else manages to parse prosrc on the
> >> fly and cache the result in memory; why isn't plpython doing that?
>
> > It depends on the number of imported modules in the function. If it
> > imports a lot of modules, it can take some time to compile a python
> > function (especially if the modules have some initialisation code which
> > must be run on import).
>
> Surely the initialization code would have to be run anyway ... and if
> the function does import a pile of modules, do you really want to cache
> all that in its pg_proc entry? What happens if some of the modules get
> updated later?
>
> 			regards, tom lane

--
  Bruce Momjian   http://candle.pha.pa.us
  EnterpriseDB    http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +
Hm, thinking again, I guess Tom Lane is right.

>> Surely the initialization code would have to be run anyway ... and if
>> the function does import a pile of modules, do you really want to cache
>> all that in its pg_proc entry? What happens if some of the modules get
>> updated later?

Besides, what happens if you store compiled bytecode in a table, then
upgrade the python interpreter to a new version... would it be compatible?
I suppose so, but I don't really know... Persistent connections should be
used anyway, which makes the RAM caching good...
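[Editorial sketch: the compatibility question has a concrete answer — CPython's marshal format is not guaranteed stable across interpreter releases, which is why .pyc files are tagged with a per-version magic number. A store of compiled code would want the same check, falling back to recompiling from prosrc on mismatch. `store_bytecode` and `load_bytecode` are hypothetical helpers, not PL/Py functions.]

```python
import importlib.util
import marshal

# The magic number CPython stamps into .pyc files; it changes whenever
# the bytecode format does.
MAGIC = importlib.util.MAGIC_NUMBER

def store_bytecode(src):
    """Compile source and prefix the marshaled code with this
    interpreter's bytecode magic number."""
    return MAGIC + marshal.dumps(compile(src, "<plpy>", "exec"))

def load_bytecode(blob):
    """Return the stored code object, or None when the blob was written
    by a different bytecode format and must be recompiled."""
    if blob[:len(MAGIC)] != MAGIC:
        return None
    return marshal.loads(blob[len(MAGIC):])
```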