Thread: First feature patch for plperl - draft [PATCH]

First feature patch for plperl - draft [PATCH]

From
Tim Bunce
Date:
Building on my earlier plperl refactoring patch, here's a draft of my
first plperl feature patch.

Significant changes in this patch:

- New GUC plperl.on_perl_init='...perl...' for admin use.
- New GUC plperl.on_trusted_init='...perl...' for plperl user use.
- New GUC plperl.on_untrusted_init='...perl...' for plperlu user use.
- END blocks now run at backend exit (fixes bug #5066).
- Stored procedure subs are now given names ($name__$oid).
- More error checking and reporting.
- Warnings no longer have an extra newline in the NOTICE text.
- Various minor optimizations like pre-growing data structures.

I'm working on adding tests and documentation now, meanwhile I'd very
much appreciate any feedback on the patch.

Tim.

p.s. Once this patch is complete I plan to work on patches that:
- add quote_literal and quote_identifier functions in C.
- generalize the Safe setup code to enable more control.
- formalize namespace usage, moving things out of main::
- add a way to perform inter-sub calling (at least for simple cases).
- possibly rewrite _plperl_to_pg_array in C.



Re: First feature patch for plperl - draft [PATCH]

From
"David E. Wheeler"
Date:
On Dec 3, 2009, at 3:30 PM, Tim Bunce wrote:

> - New GUC plperl.on_perl_init='...perl...' for admin use.
> - New GUC plperl.on_trusted_init='...perl...' for plperl user use.
> - New GUC plperl.on_untrusted_init='...perl...' for plperlu user use.

Since there is no documentation yet, how do these work, exactly? Or should I just wait for the docs?

> - END blocks now run at backend exit (fixes bug #5066).
> - Stored procedure subs are now given names ($name__$oid).
> - More error checking and reporting.
> - Warnings no longer have an extra newline in the NOTICE text.
> - Various minor optimizations like pre-growing data structures.

Nice.

> I'm working on adding tests and documentation now, meanwhile I'd very
> much appreciate any feedback on the patch.
>
> Tim.
>
> p.s. Once this patch is complete I plan to work on patches that:
> - add quote_literal and quote_identifier functions in C.

I expect you can just use the C versions in PostgreSQL. They're in utils/builtins.h, along with quote_nullable(), which
might also be useful to add.

> - generalize the Safe setup code to enable more control.
> - formalize namespace usage, moving things out of main::

Nice.

> - add a way to perform inter-sub calling (at least for simple cases).
> - possibly rewrite _plperl_to_pg_array in C.

Sounds great, Tim. I'm not really qualified to say anything about the C code, but I'd be happy to try it out once there
are docs.

Best,

David




Re: First feature patch for plperl - draft [PATCH]

From
Tim Bunce
Date:
On Thu, Dec 03, 2009 at 04:53:47PM -0800, David E. Wheeler wrote:
> On Dec 3, 2009, at 3:30 PM, Tim Bunce wrote:
> 
> > - New GUC plperl.on_perl_init='...perl...' for admin use.
> > - New GUC plperl.on_trusted_init='...perl...' for plperl user use.
> > - New GUC plperl.on_untrusted_init='...perl...' for plperlu user use.
> 
> Since there is no documentation yet, how do these work, exactly? Or should I just wait for the docs?

The perl code in plperl.on_perl_init gets eval'd as soon as an
interpreter is created. That could be at server startup if
shared_preload_libraries is used. plperl.on_perl_init can only be set by
an admin (PGC_SUSET).

The perl code in plperl.on_trusted_init gets eval'd when an interpreter
is initialized into trusted mode, e.g., used for the plperl language.
The perl code is eval'd inside the Safe compartment.
plperl.on_trusted_init can be set by users but it's only useful if set
before the plperl interpreter is first used.

plperl.on_untrusted_init acts like plperl.on_trusted_init but for
plperlu code.

So, if all three were set then, before any perl stored procedure or DO
block is executed, the interpreter would have executed either
on_perl_init and then on_trusted_init (for plperl), or on_perl_init and
then on_untrusted_init (for plperlu).
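
The ordering described above can be modelled in a few lines. This is only a sketch in Python, not PL/Perl code: the function name `init_sequence` and the `settings` dictionary are hypothetical, and only the three GUC names come from the patch description.

```python
# Hypothetical model of the initialization order described above.
# Only the GUC names come from the patch; nothing here is PL/Perl's API.

def init_sequence(language, settings):
    """Return the names of the init hooks that would run, in order.

    language: "plperl" (trusted) or "plperlu" (untrusted).
    settings: dict mapping a GUC name to its Perl snippet; GUCs that
              are unset or empty are skipped.
    """
    if language == "plperl":
        mode_hook = "plperl.on_trusted_init"
    elif language == "plperlu":
        mode_hook = "plperl.on_untrusted_init"
    else:
        raise ValueError(f"unknown language: {language!r}")

    # on_perl_init always comes first; the mode-specific hook follows.
    order = ["plperl.on_perl_init", mode_hook]
    return [name for name in order if settings.get(name)]
```

With all three GUCs set, `init_sequence("plperl", settings)` yields on_perl_init followed by on_trusted_init, and on_untrusted_init never appears for a trusted interpreter.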

> > - END blocks now run at backend exit (fixes bug #5066).
> > - Stored procedure subs are now given names ($name__$oid).
> > - More error checking and reporting.
> > - Warnings no longer have an extra newline in the NOTICE text.
> > - Various minor optimizations like pre-growing data structures.
> 
> Nice.

Thanks.

> > I'm working on adding tests and documentation now, meanwhile I'd very
> > much appreciate any feedback on the patch.
> > 
> > Tim.
> > 
> > p.s. Once this patch is complete I plan to work on patches that:
> > - add quote_literal and quote_identifier functions in C.
> 
> I expect you can just use the C versions in PostgreSQL. They're in utils/builtins.h,

That's my plan. (I've been discussing this and other issues with Andrew
Dunstan via IM.)

> along with quote_nullable(), which might also be useful to add.

I was planning to build that behaviour into quote_literal since it fits
naturally into perl's idea of undef and mirrors DBI's quote() method.
So:
   quote_literal(undef) => "NULL"
   quote_literal('foo') => "'foo'"
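
A minimal sketch of that behaviour, written in Python with `None` standing in for Perl's undef. It is a simplification under stated assumptions: only single-quote doubling is modelled, and the backslash handling that the server-side function performs is left out.

```python
def quote_literal(value):
    """Quote a value for use as an SQL literal.

    None (standing in for Perl's undef) becomes the bare keyword NULL,
    mirroring the DBI quote() behaviour mentioned above.
    Simplification: backslash handling is not modelled here.
    """
    if value is None:
        return "NULL"
    # Double any embedded single quotes, then wrap in single quotes.
    return "'" + str(value).replace("'", "''") + "'"
```

Folding the NULL case into one function means callers need no separate quote_nullable() equivalent.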

> > - generalize the Safe setup code to enable more control.

Specifically control what gets loaded into the Compartment, what gets
shared with it (e.g. sharing *a & *b as a workaround for the sort bug),
and what class to use for Safe (to enable deeper changes if desired via
subclassing).  Naturally all this is only possible for admin (via
plperl.on_perl_init).

> > - formalize namespace usage, moving things out of main::
> 
> Nice.
>
> > - add a way to perform inter-sub calling (at least for simple cases).

My current plan here is to use an SP::AUTOLOAD to handle loading and
dispatching.  So calling SP::some_random_procedure(...) will trigger
SP::AUTOLOAD to try to resolve "some_random_procedure" to a particular
stored procedure. There are three tricky parts: handling polymorphism (at
least "well enough"), making autoloading of stored procedures work
inside Safe, making it fast. I think I have reasonable approaches for
those but I won't know for sure till I work on it.
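
The dispatch idea can be illustrated with Python's `__getattr__`, which, like AUTOLOAD, is only invoked when a name is not already defined. This is a sketch, not the planned implementation: the class name `StoredProcedures` and the `lookup` callable are hypothetical, and polymorphism and Safe are not modelled.

```python
class StoredProcedures:
    """Resolve attribute access to stored procedures on first use."""

    def __init__(self, lookup):
        # lookup: hypothetical callable that maps a procedure name to a
        # Python callable, or returns None if no such procedure exists.
        self._lookup = lookup

    def __getattr__(self, name):
        # Called only when normal attribute lookup fails, much as
        # AUTOLOAD is called only for subs that are not yet defined.
        if name.startswith("_"):
            raise AttributeError(name)
        proc = self._lookup(name)
        if proc is None:
            raise AttributeError(f"no stored procedure named {name!r}")
        # Cache the resolved procedure so later calls skip __getattr__.
        setattr(self, name, proc)
        return proc
```

Caching the resolved callable on the object addresses the "making it fast" concern in this model: the lookup cost is paid once per name.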

> > - possibly rewrite _plperl_to_pg_array in C.
> 
> Sounds great, Tim. I'm not really qualified to say anything about the
> C code, but I'd be happy to try it out once there are docs.

Great. Thanks David.

Tim.


Re: First feature patch for plperl - draft [PATCH]

From
Jeff
Date:
On Dec 4, 2009, at 6:18 AM, Tim Bunce wrote:
>
>>> - generalize the Safe setup code to enable more control.
>

Is there any possible way to enable "use strict;" for plperl (trusted)  
modules?
I would love to have that feature. Sure does help cut down on bugs and  
makes things nicer.

--
Jeff Trout <jeff@jefftrout.com>
http://www.stuarthamm.net/
http://www.dellsmartexitin.com/





Re: First feature patch for plperl - draft [PATCH]

From
Tom Lane
Date:
Jeff <threshar@threshar.is-a-geek.com> writes:
> Is there any possible way to enable "use strict;" for plperl (trusted)  
> modules?

The plperl manual shows a way to do it using some weird syntax or
other.  It'd sure be nice to be able to use the regular syntax though.
        regards, tom lane


Re: First feature patch for plperl - draft [PATCH]

From
"David E. Wheeler"
Date:
On Dec 4, 2009, at 3:18 AM, Tim Bunce wrote:

> The perl code in plperl.on_perl_init gets eval'd as soon as an
> interpreter is created. That could be at server startup if
> shared_preload_libraries is used. plperl.on_perl_init can only be set by
> an admin (PGC_SUSET).

Are multiline GUCs allowed in the postgresql.conf file?

> The perl code in plperl.on_trusted_init gets eval'd when an interpreter
> is initialized into trusted mode, e.g., used for the plperl language.
> The perl code is eval'd inside the Safe compartment.
> plperl.on_trusted_init can be set by users but it's only useful if set
> before the plperl interpreter is first used.

So immediately after connecting would be the place to make sure you do it, IOW.

> plperl.on_untrusted_init acts like plperl.on_trusted_init but for
> plperlu code.
>
> So, if all three were set then, before any perl stored procedure or DO
> block is executed, the interpreter would have executed either
> on_perl_init and then on_trusted_init (for plperl), or on_perl_init and
> then on_untrusted_init (for plperlu).

Awesome, thanks! This is really a great feature.

>> along with quote_nullable(), which might also be useful to add.
>
> I was planning to build that behaviour into quote_literal since it fits
> naturally into perl's idea of undef and mirrors DBI's quote() method.
> So:
>    quote_literal(undef) => "NULL"
>    quote_literal('foo') => "'foo'"

Is there an existing `quote_literal()` in PL/Perl? If so, you might not want to change its behavior.

>>> - generalize the Safe setup code to enable more control.
>
> Specifically control what gets loaded into the Compartment, what gets
> shared with it (e.g. sharing *a & *b as a workaround for the sort bug),
> and what class to use for Safe (to enable deeper changes if desired via
> subclassing).  Naturally all this is only possible for admin (via
> plperl.on_perl_init).

Sounds good.

>>> - formalize namespace usage, moving things out of main::
>>
>> Nice.
>>
>>> - add a way to perform inter-sub calling (at least for simple cases).
>
> My current plan here is to use an SP::AUTOLOAD to handle loading and
> dispatching.  So calling SP::some_random_procedure(...) will trigger
> SP::AUTOLOAD to try to resolve "some_random_procedure" to a particular
> stored procedure. There are three tricky parts: handling polymorphism (at
> least "well enough"), making autoloading of stored procedures work
> inside Safe, making it fast. I think I have reasonable approaches for
> those but I won't know for sure till I work on it.

I'm wondering if there might be some way to use some sort of attributes to identify data types passed to a PL/Perl
function called from another PL/Perl function. Maybe some other functions that identify types, in the case of
ambiguities?
 foo(int(1), text('bar'));

? Kind of ugly, but perhaps only to be used if there are ambiguities? Not sure it's a great idea, mind. Just thinking
out loud (so to speak).

Best,

David





Re: First feature patch for plperl - draft [PATCH]

From
Tom Lane
Date:
"David E. Wheeler" <david@kineticode.com> writes:
> On Dec 4, 2009, at 3:18 AM, Tim Bunce wrote:
>> The perl code in plperl.on_perl_init gets eval'd as soon as an
>> interpreter is created. That could be at server startup if
>> shared_preload_libraries is used. plperl.on_perl_init can only be set by
>> an admin (PGC_SUSET).

> Are multiline GUCs allowed in the postgresql.conf file?

I don't think so.  In any case this seems like an extreme abuse of the
concept of a GUC, as well as being a solution in search of a problem,
as well as being something that should absolutely not ever happen inside
the postmaster process for both reliability and security reasons.
I vote a big no on this.
        regards, tom lane


Re: First feature patch for plperl - draft [PATCH]

From
"David E. Wheeler"
Date:
On Dec 4, 2009, at 10:36 AM, Tom Lane wrote:

>> Are multiline GUCs allowed in the postgresql.conf file?
>
> I don't think so.  In any case this seems like an extreme abuse of the
> concept of a GUC, as well as being a solution in search of a problem,
> as well as being something that should absolutely not ever happen inside
> the postmaster process for both reliability and security reasons.
> I vote a big no on this.

That's fine. It's relatively simple for an admin to create a Perl module that does everything she wants, call it PGInit
or something, and then just make the GUC:
   plperl.on_perl_init = 'use PGInit;'

Best,

David

Re: First feature patch for plperl - draft [PATCH]

From
Andrew Dunstan
Date:

Tom Lane wrote:
> Jeff <threshar@threshar.is-a-geek.com> writes:
>   
>> Is there any possible way to enable "use strict;" for plperl (trusted)  
>> modules?
>>     
>
> The plperl manual shows a way to do it using some weird syntax or
> other.  It'd sure be nice to be able to use the regular syntax though.
>
>             
>   

As is documented, all you have to do is have:
   custom_variable_classes = 'plperl'
   plperl.use_strict = 'true'

in your config. You only need to put the documented BEGIN block in your 
function body if you want to do use strict mode on a case by case basis.

We can't allow an unrestricted "use strict;" in plperl functions because 
it invokes an operation (require) that Safe.pm rightly regards as unsafe.

cheers

andrew


Re: First feature patch for plperl - draft [PATCH]

From
Tom Lane
Date:
"David E. Wheeler" <david@kineticode.com> writes:
> On Dec 4, 2009, at 10:36 AM, Tom Lane wrote:
>> I vote a big no on this.

> That's fine. It's relatively simple for an admin to create a Perl module that does everything she wants, call it
> PGInit or something, and then just make the GUC:

>     plperl.on_perl_init = 'use PGInit;'

No, you missed the point: I'm objecting to having any such thing as
plperl.on_perl_init, full stop.

Aside from the points I already made, it's not even well defined.
What is to happen if the admin changes the value when the system
is already up?
        regards, tom lane


Re: First feature patch for plperl - draft [PATCH]

From
Jeff
Date:
On Dec 4, 2009, at 1:44 PM, Andrew Dunstan wrote:

>
> As is documented, all you have to do is have:
>
>   custom_variable_classes = 'plperl'
>   plperl.use_strict = 'true'
>
> in your config. You only need to put the documented BEGIN block in  
> your function body if you want to do use strict mode on a case by  
> case basis.
>
> We can't allow an unrestricted "use strict;" in plperl functions  
> because it invokes an operation (require) that Safe.pm rightly  
> regards as unsafe.
>

Yeah, saw that in the manual in the plperl functions & arguments page  
(at the bottom).
I think my confusion came up because I'd read the trust/untrusted  
thing which removes the ability to use use/require.

Maybe a blurb or moving that chunk of doc to the trusted/untrusted  
page might make that tidbit easier to find?

--
Jeff Trout <jeff@jefftrout.com>
http://www.stuarthamm.net/
http://www.dellsmartexitin.com/





Re: First feature patch for plperl - draft [PATCH]

From
Robert Haas
Date:
On Fri, Dec 4, 2009 at 1:51 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> "David E. Wheeler" <david@kineticode.com> writes:
>> On Dec 4, 2009, at 10:36 AM, Tom Lane wrote:
>>> I vote a big no on this.
>
>> That's fine. It's relatively simple for an admin to create a Perl module that does everything she wants, call it
>> PGInit or something, and then just make the GUC:
>
>>     plperl.on_perl_init = 'use PGInit;'
>
> No, you missed the point: I'm objecting to having any such thing as
> plperl.on_perl_init, full stop.
>
> Aside from the points I already made, it's not even well defined.
> What is to happen if the admin changes the value when the system
> is already up?

So, do we look for another way to provide the functionality besides
having a GUC, or is the functionality itself bad?

...Robert


Re: First feature patch for plperl - draft [PATCH]

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> So, do we look for another way to provide the functionality besides
> having a GUC, or is the functionality itself bad?

I don't think we want random Perl code running inside the postmaster,
no matter what the API to cause it is.  I might hold my nose for "on
load" code if it can only run in backends, though I still say that
it's a badly designed concept because of the uncertainty about who
will run what when.  Shlib load time is not an event that ought to be
user-visible.
        regards, tom lane


Re: First feature patch for plperl - draft [PATCH]

From
"David E. Wheeler"
Date:
On Dec 4, 2009, at 10:51 AM, Tom Lane wrote:

>>    plperl.on_perl_init = 'use PGInit;'
> 
> No, you missed the point: I'm objecting to having any such thing as
> plperl.on_perl_init, full stop.
> 
> Aside from the points I already made, it's not even well defined.
> What is to happen if the admin changes the value when the system
> is already up?

Nothing. Hence the "init".

Best,

David


Re: First feature patch for plperl - draft [PATCH]

From
"David E. Wheeler"
Date:
On Dec 4, 2009, at 11:05 AM, Tom Lane wrote:

>> So, do we look for another way to provide the functionality besides
>> having a GUC, or is the functionality itself bad?
>
> I don't think we want random Perl code running inside the postmaster,
> no matter what the API to cause it is.  I might hold my nose for "on
> load" code if it can only run in backends, though I still say that
> it's a badly designed concept because of the uncertainty about who
> will run what when.  Shlib load time is not an event that ought to be
> user-visible.

So only the child processes would be allowed to load the code? That could make connections even slower if there's a lot
of Perl code to be added, though that's also the issue we have today. I guess I could live with that, though I'd rather
have such code shared across processes.

If it's a badly designed concept, do you have any ideas that are less bad?

Best,

David

Re: First feature patch for plperl - draft [PATCH]

From
Robert Haas
Date:
On Fri, Dec 4, 2009 at 2:05 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> So, do we look for another way to provide the functionality besides
>> having a GUC, or is the functionality itself bad?
>
> I don't think we want random Perl code running inside the postmaster,
> no matter what the API to cause it is.  I might hold my nose for "on
> load" code if it can only run in backends, though I still say that
> it's a badly designed concept because of the uncertainty about who
> will run what when.  Shlib load time is not an event that ought to be
> user-visible.

I agree that the uncertainty is not a wonderful thing, but e.g. Apache
has the same problem with mod_perl, and you just deal with it.  I
choose to deal with it by doing "apachectl graceful" every time I
change the source code; or you can install Perl modules that check
whether the mod-times on the other modules you've loaded have changed
and reload them if so.  In practice, being able to pre-load the Perl
libraries you're going to want to execute is absolutely essential if
you don't want performance to be in the toilet.  My code base is so
large now that it takes 3 or 4 seconds for Apache to pull it all in on
my crappy dev box, but it's blazingly fast once it's up and running.
Having that be something that happens on the production server only
once a week or once a month when I roll out a new release rather than
any more frequently is really important.
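
The mod-time check mentioned above can be sketched as follows. This is a Python illustration of the general technique, not the code of any particular mod_perl reloader; the function names are hypothetical.

```python
import os

def snapshot_mtimes(paths):
    """Record the current modification time of each preloaded file."""
    return {path: os.stat(path).st_mtime_ns for path in paths}

def changed_files(snapshot):
    """Return the sorted paths whose mtime differs from the snapshot.

    A file that has disappeared since the snapshot counts as changed.
    """
    changed = []
    for path, old_mtime in snapshot.items():
        try:
            current = os.stat(path).st_mtime_ns
        except FileNotFoundError:
            changed.append(path)
            continue
        if current != old_mtime:
            changed.append(path)
    return sorted(changed)
```

A long-lived process would take the snapshot after preloading and call `changed_files` before serving a request, reloading whatever it returns.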

...Robert


Re: First feature patch for plperl - draft [PATCH]

From
Tim Bunce
Date:
On Fri, Dec 04, 2009 at 11:01:42AM -0500, Tom Lane wrote:
> Jeff <threshar@threshar.is-a-geek.com> writes:
> > Is there any possible way to enable "use strict;" for plperl (trusted)  
> > modules?
> 
> The plperl manual shows a way to do it using some weird syntax or
> other.  It'd sure be nice to be able to use the regular syntax though.

Finding a solution is definitely on my list. I've spent a little time
exploring this already but haven't found a simple solution yet.

The neatest would have been overriding &CORE::GLOBAL::require but sadly
the Safe/Opcode mechanism takes priority over that and forbids compiling
code that does a use/require.

I may end up re-enabling the require opcode but redirecting it to run
some C code in plperl.c (the same 'opcode redirection' technique used by
my NYTProf profiler). That C code would only need to throw an exception
if the module hasn't been loaded already.
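
The check that the redirected opcode would perform is small. Here it is modelled in Python rather than C, as a sketch only: the exception class, the function name, and the error text are hypothetical, and `loaded_modules` stands in for whatever record of already loaded modules the interpreter keeps.

```python
class ModuleNotPreloaded(Exception):
    """Raised when code asks for a module that was not loaded in advance."""

def guarded_require(module, loaded_modules):
    """Model of a 'require' that never loads anything new.

    Succeeds as a no-op when the module is already loaded and raises
    otherwise, so no file is ever read on behalf of the caller.
    """
    if module not in loaded_modules:
        raise ModuleNotPreloaded(f"module {module} has not been preloaded")
    return True
```

Under this model a plain "use strict;" would compile inside the compartment as long as strict had been loaded beforehand, which is the regular syntax asked for earlier in the thread.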

Tim.


Re: First feature patch for plperl - draft [PATCH]

From
Alvaro Herrera
Date:
David E. Wheeler escribió:

> If it's a badly designed concept, do you have any ideas that are less bad?

I'm not sure that we want to duplicate this idea today, but in pltcl
there's a pltcl_modules table that is scanned on interpreter init and
loads user-defined code.

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.


Re: First feature patch for plperl - draft [PATCH]

From
Tim Bunce
Date:
On Fri, Dec 04, 2009 at 02:05:28PM -0500, Tom Lane wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
> > So, do we look for another way to provide the functionality besides
> > having a GUC, or is the functionality itself bad?
> 
> I don't think we want random Perl code running inside the postmaster,
> no matter what the API to cause it is.  I might hold my nose for "on
> load" code if it can only run in backends, though I still say that
> it's a badly designed concept because of the uncertainty about who
> will run what when.

Robert's comparison with mod_perl is very apt. Preloading code gives
dramatic performance gains in production situations where there's a
significant codebase and connections are frequent.

The docs for plperl.on_perl_init could include a section relating to
its use with shared_preload_libraries. That could document any issues
and caveats you feel are important.

Tim.


Re: First feature patch for plperl - draft [PATCH]

From
Dimitri Fontaine
Date:
Le 4 déc. 2009 à 20:40, Tim Bunce a écrit :
> Robert's comparison with mod_perl is very apt. Preloading code gives
> dramatic performance gains in production situations where there's a
> significant codebase and connections are frequent.

How far do you go with using a connection pooler such as pgbouncer?

--
dim

Re: First feature patch for plperl - draft [PATCH]

From
"David E. Wheeler"
Date:
On Dec 4, 2009, at 11:40 AM, Tim Bunce wrote:

> Robert's comparison with mod_perl is very apt. Preloading code gives
> dramatic performance gains in production situations where there's a
> significant codebase and connections are frequent.
> 
> The docs for plperl.on_perl_init could include a section relating to
> its use with shared_preload_libraries. That could document any issues
> and caveats you feel are important.

+1

Tom, what's your objection to Shlib load time being user-visible?

Best,

David


Re: First feature patch for plperl - draft [PATCH]

From
Andrew Dunstan
Date:

Tom Lane wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>   
>> So, do we look for another way to provide the functionality besides
>> having a GUC, or is the functionality itself bad?
>>     
>
> I don't think we want random Perl code running inside the postmaster,
> no matter what the API to cause it is.  I might hold my nose for "on
> load" code if it can only run in backends, though I still say that
> it's a badly designed concept because of the uncertainty about who
> will run what when.  Shlib load time is not an event that ought to be
> user-visible.
>
>   

But you can load an arbitrary shared lib inside the postmaster and it 
can do what it likes, so I'm not clear that your caution is actually 
saving us from much.

cheers

andrew




Re: First feature patch for plperl - draft [PATCH]

From
Tom Lane
Date:
"David E. Wheeler" <david@kineticode.com> writes:
> Tom, what's your objection to Shlib load time being user-visible?

It's not really designed to be user-visible.  Let me give you just
two examples:

* We call a plperl function for the first time in a session, causing
plperl.so to be loaded.  Later the transaction fails and is rolled
back.  If loading plperl.so caused some user-visible things to happen,
should those be rolled back?  If so, how do we get perl to play along?
If not, how do we get postgres to play along?

* We call a plperl function for the first time in a session, causing
plperl.so to be loaded.  This happens in the context of a superuser
calling a non-superuser security definer function, or perhaps vice
versa.  Whose permissions apply to whatever the on_load code tries
to do?  (Hint: every answer is wrong.)

That doesn't even begin to cover the problems with allowing any of
this to happen inside the postmaster.  Recall that the postmaster
does not have any database access.  Furthermore, it is a very long
established reliability principle around here that the postmaster
process should do as little as possible, because every thing that it
does creates another opportunity to have a nonrecoverable failure.
The postmaster can recover if a child crashes, but the other way
round, not so much.
        regards, tom lane


Re: First feature patch for plperl - draft [PATCH]

From
Tim Bunce
Date:
On Sat, Dec 05, 2009 at 01:21:22AM -0500, Tom Lane wrote:
> "David E. Wheeler" <david@kineticode.com> writes:
> > Tom, what's your objection to Shlib load time being user-visible?
> 
> It's not really designed to be user-visible.  Let me give you just
> two examples:
> 
> * We call a plperl function for the first time in a session, causing
> plperl.so to be loaded.  Later the transaction fails and is rolled
> back.  If loading plperl.so caused some user-visible things to happen,
> should those be rolled back?

No. Establishing initial state, no matter how that's triggered, is not
part of a transaction.

> * We call a plperl function for the first time in a session, causing
> plperl.so to be loaded.  This happens in the context of a superuser
> calling a non-superuser security definer function, or perhaps vice
> versa.  Whose permissions apply to whatever the on_load code tries
> to do?  (Hint: every answer is wrong.)

I'll modify the patch to disable the SPI functions during
initialization (both on_perl_init and on_(un)trusted_init). 

Would that address your concerns?

> That doesn't even begin to cover the problems with allowing any of
> this to happen inside the postmaster.  Recall that the postmaster
> does not have any database access.  Furthermore, it is a very long
> established reliability principle around here that the postmaster
> process should do as little as possible, because every thing that it
> does creates another opportunity to have a nonrecoverable failure.
> The postmaster can recover if a child crashes, but the other way
> round, not so much.

I hope the combination of disabling the SPI functions during
initialization, and documenting the risks of combining on_perl_init and
shared_preload_libraries, is sufficient.

Tim.


Re: First feature patch for plperl - draft [PATCH]

From
Tom Lane
Date:
Tim Bunce <Tim.Bunce@pobox.com> writes:
> I'll modify the patch to disable the SPI functions during
> initialization (both on_perl_init and on_(un)trusted_init). 

Yeah, in the shower this morning I was thinking that not loading
SPI till after the on_init code runs would alleviate the concerns
about transactionality and permissions --- that would ensure that
whatever on_init does affects only the Perl world and not the database
world.
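
A sketch of that arrangement, modelled in Python: database access is refused until every init hook has finished. The `Interpreter` class and the error text are hypothetical; `spi_exec_query` is used only as a familiar name for a database call and returns a placeholder string here.

```python
class Interpreter:
    """Model of an interpreter whose SPI access is off during init."""

    def __init__(self, init_hooks):
        # SPI stays disabled while the on_*_init hooks run.
        self.spi_enabled = False
        for hook in init_hooks:
            hook(self)
        # Only after all hooks have finished is database access allowed.
        self.spi_enabled = True

    def spi_exec_query(self, sql):
        if not self.spi_enabled:
            raise RuntimeError("SPI functions are not available during initialization")
        # Placeholder result; a real interpreter would run the query.
        return f"executed: {sql}"
```

Because a hook can change only interpreter state, the rollback and permission questions raised earlier do not arise for it in this model.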

However, we're not out of the woods yet.  In a trusted interpreter
(plperl not plperlu), is the on_init code executed before we lock down
the interpreter with Safe?  I would think it has to be since the main
point AFAICS is to let you preload code via "use".  But then what is
left of the security guarantees of plperl?  I can hardly imagine DBAs
wanting to vet a few thousand lines of random Perl code to see if it
contains anything that could be subverted.  For example, the ability
to scribble on database files (like say pg_hba.conf) would almost surely
be easy to come by.

If you're willing to also confine the feature to plperlu, then maybe
the risk level could be decreased from insane to merely unreasonable.
        regards, tom lane


Re: First feature patch for plperl - draft [PATCH]

From
Andrew Dunstan
Date:

Tim Bunce wrote:
>> That doesn't even begin to cover the problems with allowing any of
>> this to happen inside the postmaster.  Recall that the postmaster
>> does not have any database access.  Furthermore, it is a very long
>> established reliability principle around here that the postmaster
>> process should do as little as possible, because every thing that it
>> does creates another opportunity to have a nonrecoverable failure.
>> The postmaster can recover if a child crashes, but the other way
>> round, not so much.
>>     
>
> I hope the combination of disabling the SPI functions during
> initialization, and documenting the risks of combining on_perl_init and
> shared_preload_libraries, is sufficient.
>
>
>   

We already do a lot during library load - plperl's _PG_init() calls 
plperl_init_interp() which sets up an interpreter, runs the boot code, 
loads the Dynaloader and bootstraps the SPI module.

Pre-loading perl libraries in forking servers has well known benefits, 
as Robert Haas noted.

We're not talking about touching the database at all.

If we turn Tim's proposal down, I suspect someone will create a fork of 
plperl that allows it anyway - it's not like it needs anything changed 
elsewhere in the backend - it would be a drop-in replacement, pretty much.

Here's a concrete example of something I was working on just yesterday, 
where it would be useful. One of my clients has a Postgres based 
application that needs to talk to a number of foreign databases, mostly 
SQLServer. In some cases it pulls data from them, in this new case we 
are pushing lots of data at arbitrary times into SQLServer, using 
plperlu with DBI/DBD::Sybase. We would probably get a significant 
performance gain if we could have DBI and DBD::Sybase preloaded. The 
application does use connection pooling, but every so often a function 
call will take significantly longer because it occurs in a new backend 
that is having to reload the libraries.

I think if we do this the on_perl_init setting should probably be 
PGC_POSTMASTER, which would remove any issue about it changing 
underneath us.

cheers

andrew






Re: First feature patch for plperl - draft [PATCH]

From
Tom Lane
Date:
Andrew Dunstan <andrew@dunslane.net> writes:
> If we turn Tim's proposal down, I suspect someone will create a fork of 
> plperl that allows it anyway - it's not like it needs anything changed 
> elsewhere in the backend - it would be a drop-in replacement, pretty much.

The question is not about whether we think it's useful; the question
is about whether it's safe.

> I think if we do this the on_perl_init setting should probably be 
> PGC_POSTMASTER, which would remove any issue about it changing 
> underneath us.

Yes, if the main intended usage is in combination with preloading perl
at postmaster start, it would be pointless to imagine that PGC_SIGHUP
is useful anyway.
        regards, tom lane


Re: First feature patch for plperl - draft [PATCH]

From
Tim Bunce
Date:
On Sat, Dec 05, 2009 at 11:41:36AM -0500, Tom Lane wrote:
> Tim Bunce <Tim.Bunce@pobox.com> writes:
> > I'll modify the patch to disable the SPI functions during
> > initialization (both on_perl_init and on_(un)trusted_init). 
> 
> Yeah, in the shower this morning I was thinking that not loading
> SPI till after the on_init code runs would alleviate the concerns
> about transactionality and permissions --- that would ensure that
> whatever on_init does affects only the Perl world and not the database
> world.
> 
> However, we're not out of the woods yet.  In a trusted interpreter
> (plperl not plperlu), is the on_init code executed before we lock down
> the interpreter with Safe?

The on_perl_init code (PGC_SUSET) is run before Safe is loaded.

The on_trusted_init code (PGC_USERSET) is run inside Safe.

> I would think it has to be since the main point AFAICS is to let you
> preload code via "use".

The main use case being targeted at the moment for on_trusted_init
is setting values in %_SHARED, perhaps to enable debugging.

Inside Safe you'll only be able to 'use' modules that have already been
loaded inside Safe. In my draft patch that's currently just strict and
warnings.

(I am also adding an interface to enable DBAs to configure what gets
loaded into the Safe compartment and what gets shared with it.
That'll be the way extra modules can be used by plperl.
It'll be used via on_perl_init so be controlled via the DBA.)

> I can hardly imagine DBAs wanting to vet a few thousand lines of
> random Perl code to see if it contains anything that could be
> subverted.  For example, the ability to scribble on database files
> (like say pg_hba.conf) would almost surely be easy to come by.

It's surely better to give the DBA that option than to remove the choice
entirely.

> If you're willing to also confine the feature to plperlu, then maybe
> the risk level could be decreased from insane to merely unreasonable.

I believe I can arrange for the SPI functions to be disabled during
on_*_init for both plperl and plperlu. Hopefully then the default risk
level will be better than unreasonable :)

Tim.


Re: First feature patch for plperl - draft [PATCH]

From
Alvaro Herrera
Date:
Tom Lane escribió:
> "David E. Wheeler" <david@kineticode.com> writes:
> > Tom, what's your objection to Shlib load time being user-visible?
> 
> It's not really designed to be user-visible.  Let me give you just
> two examples:
> 
> * We call a plperl function for the first time in a session, causing
> plperl.so to be loaded.  Later the transaction fails and is rolled
> back.

I don't think there's any way for this to work sanely unless the library
has been loaded previously.  What about allowing those settings only if
plperl is specified in shared_preload_libraries?

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support