Thread: Possible major bug in PlPython (plus some other ideas)

Possible major bug in PlPython (plus some other ideas)

From
Kevin Jacobs
Date:
Hi everyone,

I have noticed a possibly major issues in Plpython that may need to be
addressed before 7.2 is released:
 1) If Plpython is installed as a trusted language, and from what little I    can glean from the documentation, it
shouldnot have any filesystem access.    However, the default behavior of the restricted execution environment    being
usedallows read-only filesystem access.
 
    Here is the current behavior (from the Python Library Reference):
    r_open(filename[, mode[, bufsize]])
      Method called when open() is called in the restricted environment.      The arguments are identical to those of
open(),and a file object (or      a class instance compatible with file objects) should be returned.      RExec's
defaultbehavior is allow opening any file for reading, but      forbidding any attempt to write a file. See the example
belowfor an      implementation of a less restrictive r_open().
 
    It is fairly easy to override this method to unconditionally raise an    access exception.

I have some other suggestions that may not be appropriate for the 7.2
release, but think should be addressed before too long:
 2) I'm not sure why the TD dictionary exists.  Why not create objects    'new', 'old' or 'event' in the global
namespacewhen the interpreter is    called in the appropriate contexts?  The current way is unwieldy, not    very
'Pythonic'(a frequent justification for change in the Python    world), and not consistent with other PostgreSQL
proceduralbackends.    Its possible to keep TD for backward compatibility, so there is no    downside.
 
 3) 'old' and 'new' should also provide class-like syntax:
      e.g.       old.foo, new.baz         (using getitem)
      instead of                 old['foo'], new['baz']   (using getattr)
    Of course we cannot drop the getattr interface, since many valid column    names are not valid python identifiers
(Ithink -- I haven't looked at    the SQL grammar lately, though I'm guessing that is the case).
 
 4) Plpython does not use the standard Python boolean checks, which is also    not very Pythonic and somewhat
confusing:
    e.g.
         CREATE OR REPLACE FUNCTION py_true() RETURNS bool AS '           return "a"         ' LANGUAGE 'plpython';
         CREATE OR REPLACE FUNCTION py_false() RETURNS bool AS '           return {}         ' LANGUAGE 'plpython';
         # select py_true();         ERROR:  Bad boolean external representation 'a'         select py_false();
ERROR: Bad boolean external representation '{}'
 
    These should return:
         # select py_true(); -- non-empty strings evaluate to true          bool         ------          t         (1
row)        select py_false();  -- empty dictionaries evaluate to false          bool         ------          f
(1row)
 
    I suggest changing the semantics of boolean return values to use    PyObject_IsTrue(PyObject *o) to properly test
fortruth values.
 
 5) It should be trivial to create an "untrusted" version of Plpython that    bypasses the restricted execution
environment. This is worthy of some    consideration, since it may be very useful and can be implemented with
relativeease.
 
 6) [Very low priority] Its not insane to consider a Plpython procedure    that spawns off a Python thread to do
backgroundprocessing tasks.    This is obviously something that will only be possible in an untrusted    version of the
interpreter. Also, if the SPI interface is thread-safe,    then it may be useful to release the Python interpreter lock
around   some of the SPI calls.
 

OK, so I've got a laundry list of issues.  Only the first issue is a real
show-stopper in my mind, though I'd like to see at least 1-4 addressed
before 7.2 or 7.2.1, if at all possible.  After some discussion, I'll even
by happy to implement most/all of these items, though I'd like more
collaboration than just submitting patches blindly for consideration.

Thanks,
-Kevin Jacobs

PS: Oh, I'd like to thank everyone working on PostgreSQL for the wonderful   job they've done.  I'm a _very_ new user
andam in the process of   porting a _very large_ project from MySQL/MSSQL to PostgresQL, for   obvious reasons.  I had
expectedthe process to be painful and tedious,   but instead it has been a real pleasure and I have enjoyed exploring
all  of the wonderful things you all have given me to play with.
 


--
Kevin Jacobs
The OPAL Group - Enterprise Systems Architect
Voice: (216) 986-0710 x 19         E-mail: jacobs@theopalgroup.com
Fax:   (216) 986-0714              WWW:    http://www.theopalgroup.com



Re: Possible major bug in PlPython (plus some other ideas)

From
Bradley McLean
Date:
* Kevin Jacobs (jacobs@penguin.theopalgroup.com) [011109 08:59]:
> 
>   1) If Plpython is installed as a trusted language, and from what little I
>      can glean from the documentation, it should not have any filesystem access.
>      However, the default behavior of the restricted execution environment
>      being used allows read-only filesystem access.
>... 
>      It is fairly easy to override this method to unconditionally raise an
>      access exception.

Agreed.  Just to amplify the point below, there is currently no
distinction between trusted and untrusted in PLpython (hmm, need
a doc change to identify this?).  lanpltrusted appears nowhere
in the implementation.

>   2) I'm not sure why the TD dictionary exists.  Why not create objects
>      'new', 'old' or 'event' in the global namespace when the interpreter is
>      called in the appropriate contexts?  The current way is unwieldy, not
>      very 'Pythonic' (a frequent justification for change in the Python
>      world), and not consistent with other PostgreSQL procedural backends.
>      Its possible to keep TD for backward compatibility, so there is no
>      downside.
> 
>   3) 'old' and 'new' should also provide class-like syntax:
> 
>        e.g.       old.foo, new.baz         (using getitem)
> 
>        instead of
>                   old['foo'], new['baz']   (using getattr)
> 
>      Of course we cannot drop the getattr interface, since many valid column
>      names are not valid python identifiers (I think -- I haven't looked at
>      the SQL grammar lately, though I'm guessing that is the case).

Agree on both.
>   4) Plpython does not use the standard Python boolean checks, which is also
>      not very Pythonic and somewhat confusing:
> ...
>      I suggest changing the semantics of boolean return values to use
>      PyObject_IsTrue(PyObject *o) to properly test for truth values.

Agree.  Is this the only type that needs special treatment?
>   5) It should be trivial to create an "untrusted" version of Plpython that
>      bypasses the restricted execution environment.  This is worthy of some
>      consideration, since it may be very useful and can be implemented with
>      relative ease.

Strongly agree.  I'd like to see it for your "7.2.1" proposal.
>   6) [Very low priority] Its not insane to consider a Plpython procedure
>      that spawns off a Python thread to do background processing tasks.
>      This is obviously something that will only be possible in an untrusted
>      version of the interpreter.  Also, if the SPI interface is thread-safe,
>      then it may be useful to release the Python interpreter lock around
>      some of the SPI calls.

Three weeks ago, I really wanted a feature like this, but I've slowly
been convincing myself that it's a bad idea.  Consider the effects of
a race condition in user generated threaded python code that results
in a deadlock within the backend.  Extend that to a backend that's
holding a database lock of some kind.  The expertise required to
diagnose and debug such a condition would be rather substantial -
threading, python threading, sql, postgres internals, ... in one
person.

I solved my design issue with the use of an external process and
a NOTIFY, and I'm much happier for it.  The threaded python stuff
can stay out in another process where the interaction concerns are
minimized.

I will say that the discrepancy between executing SQL from plpython
and from python with the pg driver does cause some cognitive
disconnect, because despite working in the same language, I have
to use different APIs to the same functionality.  I don't have a
good proposal to solve it, though.
> OK, so I've got a laundry list of issues.  Only the first issue is a real
> show-stopper in my mind, though I'd like to see at least 1-4 addressed
> before 7.2 or 7.2.1, if at all possible.  After some discussion, I'll even
> by happy to implement most/all of these items, though I'd like more
> collaboration than just submitting patches blindly for consideration.

I'll happily help.

One last issue:

7)  Would it be completely impossible to make an extra-global
dictionary that was shared between multiple back-ends (place
in shared memory)?  Anything I'm likely to place in GD, I'm likely
to initialize once and want in all backends.

-Brad McLean


Re: Possible major bug in PlPython (plus some other ideas)

From
Hannu Krosing
Date:
Kevin Jacobs wrote:
> 
> > >   1) If Plpython is installed as a trusted language, and from what little I
> > >      can glean from the documentation, it should not have any filesystem access.
> > >      However, the default behavior of the restricted execution environment
> > >      being used allows read-only filesystem access.
> >
> > we have 'read-only filesystem access anyhow' :
> 
> Then I consider this a bug if a non-super-user can do this.

It's not that bad - only postgresql superuser can use copy to/from file
.

-------------
Hannu


Re: Possible major bug in PlPython (plus some other ideas)

From
Hannu Krosing
Date:
Kevin Jacobs wrote:
> 
> Hi everyone,
> 
> I have noticed a possibly major issues in Plpython that may need to be
> addressed before 7.2 is released:
> 
>   1) If Plpython is installed as a trusted language, and from what little I
>      can glean from the documentation, it should not have any filesystem access.
>      However, the default behavior of the restricted execution environment
>      being used allows read-only filesystem access.

we have 'read-only filesystem access anyhow' :

pg72b2=# create table hack(row text);
CREATE
pg72b2=# copy hack from '/home/pg72b2/data/pg_hba.conf' DELIMITERS
'\01';
COPY
pg72b2=# select * from hack limit 10;                                    
row                                      
-------------------------------------------------------------------------------# #                PostgreSQL HOST-BASED
ACCESS(HBA) CONTROL FILE# # # This file controls:#      o which hosts are allowed to connect#      o how users are
authenticatedon each host#      o databases accessible by each host# # It is read on postmaster startup and when the
postmasterreceives a
 
SIGHUP.
(10 rows)

do I can't consider having it in plputhon any bigger security threat.

using copy xxx to '/file/' we have even read-write access, we just can't 
overwrite 0600 files. And you can do only what the postgres user can do.

>   2) I'm not sure why the TD dictionary exists.  Why not create objects
>      'new', 'old' or 'event' in the global namespace when the interpreter is
>      called in the appropriate contexts?  The current way is unwieldy, not
>      very 'Pythonic' (a frequent justification for change in the Python
>      world), and not consistent with other PostgreSQL procedural backends.
>      Its possible to keep TD for backward compatibility, so there is no
>      downside.
> 
>   3) 'old' and 'new' should also provide class-like syntax:
> 
>        e.g.       old.foo, new.baz         (using getitem)
> 
>        instead of
>                   old['foo'], new['baz']   (using getattr)
> 
>      Of course we cannot drop the getattr interface, since many valid column
>      names are not valid python identifiers (I think -- I haven't looked at
>      the SQL grammar lately, though I'm guessing that is the case).

You can have almost anything in an identifier if it is quoted.

-----------
Hannu


Re: Possible major bug in PlPython (plus some other ideas)

From
Tom Lane
Date:
Kevin Jacobs <jacobs@penguin.theopalgroup.com> writes:
> I have noticed a possibly major issues in Plpython that may need to be
> addressed before 7.2 is released:

>   1) If Plpython is installed as a trusted language, and from what little I
>      can glean from the documentation, it should not have any filesystem access.
>      However, the default behavior of the restricted execution environment
>      being used allows read-only filesystem access.

I agree, this is not good.  If it's easy to patch, please submit a
patch.

What worries me is not so much this particular hole, which is easily
plugged now that we know about it, as that it suggests that Python's
idea of a restricted environment is considerably less restricted than
we would like.  Perhaps there are other facilities that need to be
turned off as well?

The alternative we could consider is to mark plpython as untrusted for
7.2, until someone has time for a more complete review of possible
security problems.

> I have some other suggestions that may not be appropriate for the 7.2
> release, but think should be addressed before too long:

This would all be good stuff to address in 7.3 or further in the future.
As far as I'm concerned, all the PL languages except plpgsql are barely
out of the "proof of concept" stage; they all need a lot of work from
interested people to bring them to the "industrial strength" stage.
If you want to be one of those people, step right up!

>   6) [Very low priority] Its not insane to consider a Plpython procedure
>      that spawns off a Python thread to do background processing tasks.
>      This is obviously something that will only be possible in an untrusted
>      version of the interpreter.  Also, if the SPI interface is thread-safe,
>      then it may be useful to release the Python interpreter lock around
>      some of the SPI calls.

SPI is not thread-safe; in fact no part of the backend is thread-safe
or designed for multithreading at all.  This one I'd view with great
wariness.
        regards, tom lane


Re: Possible major bug in PlPython (plus some other ideas)

From
Bradley McLean
Date:
* Kevin Jacobs (jacobs@penguin.theopalgroup.com) [011109 10:53]:
> 
> On Fri, 9 Nov 2001, Bradley McLean wrote:
> 
> Actually, I'm mostly unaware of how createlang works, but I assumed that
> plpython was installed as a trusted backend.  Can anyone clarify _exactly_
> what is meant by a 'trusted'?

IANA Expert here.  createlang (shell script, just read) defines plpython
as "TRUSTED", but you can install it either way with CREATE LANGUAGE.

Looking at the pl/tcl and pl/perl implementations, plpython.c needs to
look at the Form_pg_language->lanpltrusted element to get the flag.

Then, plprocedure_compile needs to be modified to set up either exec or
r_exec as appropriate.  Also looks like PLy_init and it's sub procedures
need some rework as well.

Note that I'm just another newcomer who has submitted one more patch
than you, but has a keen interest in a strong python PL.

> To fix this issue, is it better to create a C-API method override to RExec
> or should we implement a real plpy.py module that does this (and some of the
> rest of my suggestions).  I'd prefer the module, since it is _much_ easier
> to do, though it does incur some extra overhead.

I think we need to continue to work in the C-API, and not at the Python
level.  It's not *that* much harder, and I don't see a continuing need
to work at the python level after this situation is fixed.

> Virtually all the other basic types will usually do the right thing.
> We could add support for array conversions:
> 
>    Python [1,2,3] => Postgres INTEGER[] {1,2,3}
> 
> Can you think of any others?

No, and I couldn't before; just asking the design question.

> > Three weeks ago, I really wanted a feature like this, but I've slowly
> > been convincing myself that it's a bad idea.  Consider the effects of
> > a race condition in user generated threaded python code that results
> > in a deadlock within the backend.
> 
> I'm not completely sold on this either.  I'm going to let this idea
> percolate for a while and wait to see if I can find a need for it.

Sounds like a plan.

> Other than not having the same modules available, what is annoying you most?

CREATE FUNCTION one() returns int4 AS '
result = plpy.execute("SELECT 1 as one")
return result[0]["one"]
' language 'plpython';

vs.

from pg import DB
def one():   db = DB(dbname="foo")   result = db.query("SELECT 1 as one")   return result.getresult()[0][0]

----- 

query vs execute, getresult() in one case and not the other,
results by column name vs column index.

Again, I'm not sure I have a strong proposal to fix it, since
the two APIs run in very different environments, and one is
influenced by the python database api, the other by the postgres
SPI api.  I'm just whining because when they're up in two
emacs windows, I'm forever putting the wrong one in the wrong
place.

> > 7)  Would it be completely impossible to make an extra-global
> > dictionary that was shared between multiple back-ends (place
> > in shared memory)?  Anything I'm likely to place in GD, I'm likely
> > to initialize once and want in all backends.
> 
> It will be very tricky to do in full generality.  Anything that is inserted
> into the "shared global" dictionary will have to be copied, re-allocated on
> the shared heap, and hidden from the garbage collector.  This is fairly easy
> to do for a simple types {int,float,string}; much harder for more complex
> types.

Agreed it would be tricky.  But is it useful?  Would write-once (constant)
semantics help?  

-Brad


Re: Possible major bug in PlPython (plus some other ideas)

From
Tom Lane
Date:
Hannu Krosing <hannu@tm.ee> writes:
>> However, the default behavior of the restricted execution environment
>> being used allows read-only filesystem access.

> we have 'read-only filesystem access anyhow' :

> pg72b2=# create table hack(row text);
> CREATE
> pg72b2=# copy hack from '/home/pg72b2/data/pg_hba.conf' DELIMITERS
> '\01';

Only if you're superuser, which is exactly the point of the trusted
vs untrusted function restriction.  The plpython problem lets
non-superusers read any file that the postgres user can read, which
is not cool.
        regards, tom lane


Re: Possible major bug in PlPython (plus some other ideas)

From
Doug McNaught
Date:
Tom Lane <tgl@sss.pgh.pa.us> writes:

> What worries me is not so much this particular hole, which is easily
> plugged now that we know about it, as that it suggests that Python's
> idea of a restricted environment is considerably less restricted than
> we would like.  Perhaps there are other facilities that need to be
> turned off as well?

Could be.  FWIW, Zope (www.zope.org) allows for Python scripts, created 
and managed through the web, that run in a "sandbox" with many of the
same restrictions as PG puts on untrusted languages--they actually
disallow regex matching so you can't hang the webserver thread with a
regex that backtracks forever.  Might be worthhhile for the plpython
folks to take a look at Zope.

> The alternative we could consider is to mark plpython as untrusted for
> 7.2, until someone has time for a more complete review of possible
> security problems.

This sounds like a good idea to me.

-Doug
-- 
Let us cross over the river, and rest under the shade of the trees.  --T. J. Jackson, 1863


Re: Possible major bug in PlPython (plus some other ideas)

From
teg@redhat.com (Trond Eivind Glomsrød)
Date:
Tom Lane <tgl@sss.pgh.pa.us> writes:

> Hannu Krosing <hannu@tm.ee> writes:
> >> However, the default behavior of the restricted execution environment
> >> being used allows read-only filesystem access.
> 
> > we have 'read-only filesystem access anyhow' :
> 
> > pg72b2=# create table hack(row text);
> > CREATE
> > pg72b2=# copy hack from '/home/pg72b2/data/pg_hba.conf' DELIMITERS
> > '\01';
> 
> Only if you're superuser, which is exactly the point of the trusted
> vs untrusted function restriction.  The plpython problem lets
> non-superusers read any file that the postgres user can read, which
> is not cool.

If a fix is made, will it be backported to the 7.1 branch so vendors
can upgrade their packages if this is necesarry?

-- 
Trond Eivind Glomsrød
Red Hat, Inc.


Re: Possible major bug in PlPython (plus some other ideas)

From
Doug McNaught
Date:
Doug McNaught <doug@wireboard.com> writes:

>        FWIW, Zope (www.zope.org) allows for Python scripts, created 
> and managed through the web, that run in a "sandbox" with many of the
> same restrictions as PG puts on untrusted languages--they actually                                  ^^^^^^^^

Er, I meant 'trusted' here, of course.

-Doug
-- 
Let us cross over the river, and rest under the shade of the trees.  --T. J. Jackson, 1863


Re: Possible major bug in PlPython (plus some other ideas)

From
"Ross J. Reedstrom"
Date:
On Fri, Nov 09, 2001 at 03:25:04PM -0500, Doug McNaught wrote:
> Tom Lane <tgl@sss.pgh.pa.us> writes:
> 
> > What worries me is not so much this particular hole, which is easily
> > plugged now that we know about it, as that it suggests that Python's
> > idea of a restricted environment is considerably less restricted than
> > we would like.  Perhaps there are other facilities that need to be
> > turned off as well?
> 
> Could be.  FWIW, Zope (www.zope.org) allows for Python scripts, created 
> and managed through the web, that run in a "sandbox" with many of the
> same restrictions as PG puts on untrusted languages--they actually
> disallow regex matching so you can't hang the webserver thread with a
> regex that backtracks forever.  Might be worthhhile for the plpython
> folks to take a look at Zope.

And it took _forever_ to convince the Zope folks to put it in, for this
very reason. Those who wanted python scripts (through the web interface,
as opposed to through the filesystem) had to jump through all the hoops
to make it safe enough.

Ross


Re: Possible major bug in PlPython (plus some other ideas)

From
Hannu Krosing
Date:
"Ross J. Reedstrom" wrote:
> 
> On Fri, Nov 09, 2001 at 03:25:04PM -0500, Doug McNaught wrote:
> > Tom Lane <tgl@sss.pgh.pa.us> writes:
> >
> > > What worries me is not so much this particular hole, which is easily
> > > plugged now that we know about it, as that it suggests that Python's
> > > idea of a restricted environment is considerably less restricted than
> > > we would like.  Perhaps there are other facilities that need to be
> > > turned off as well?

Perhaps we need some general guidelines for trusted PLs - lists of
things 
to restrict, e.t.c. 

Perhaps we even need something like CodeBase Principals from 
Netscape/Javascript ?

Or just more fine-grained PRIVILEGEs ?

The python way of defining restricted execution is quite flexible in
what 
to allow. In Zope they used to restict even some kinds of loops and list 
constructors to prevent users from shooting themselves in the foot. 

They are now relaxing some restrictions, as there are too many 
unrestricted ways for doing it to warrant making common operations 
cumbersome.

> > Could be.  FWIW, Zope (www.zope.org) allows for Python scripts, created
> > and managed through the web, that run in a "sandbox" with many of the
> > same restrictions as PG puts on untrusted languages--they actually
> > disallow regex matching so you can't hang the webserver thread with a
> > regex that backtracks forever.

Are there any plans to disallow regex matching in postgreSQL as well ???

AFAIK there are simpler ways to hog a DB server as anyone writing 
SQL queries can tell you ;) 

A more reliable approach for DB server may be establishing per-user
memory/time/cpu quotas and just rolling back queries that exceed them.

> >  Might be worthhhile for the plpython folks to take a look at Zope.
> 
> And it took _forever_ to convince the Zope folks to put it in, for this
> very reason. Those who wanted python scripts (through the web interface,
> as opposed to through the filesystem) had to jump through all the hoops
> to make it safe enough.

-----------------
Hannu


Re: Possible major bug in PlPython (plus some other ideas)

From
Tom Lane
Date:
Hannu Krosing <hannu@tm.ee> writes:
> Are there any plans to disallow regex matching in postgreSQL as well ???
> AFAIK there are simpler ways to hog a DB server as anyone writing 
> SQL queries can tell you ;) 

That was my reaction too --- disabling regexes is not appropriate for
the Postgres environment.  But we do need to prevent access to the
filesystem, even if it's just read-only access.
        regards, tom lane


Re: Possible major bug in PlPython (plus some other ideas)

From
Peter Eisentraut
Date:
Trond Eivind Glomsrød writes:

> If a fix is made, will it be backported to the 7.1 branch so vendors
> can upgrade their packages if this is necesarry?

Probably not, since there is no PL/Python in 7.1.

-- 
Peter Eisentraut   peter_e@gmx.net



Re: Possible major bug in PlPython (plus some other ideas)

From
Bruce Momjian
Date:
I would like to have a patch for this into 7.2 because it is a security
problem.


---------------------------------------------------------------------------

> Kevin Jacobs <jacobs@penguin.theopalgroup.com> writes:
> > I have noticed a possibly major issues in Plpython that may need to be
> > addressed before 7.2 is released:
> 
> >   1) If Plpython is installed as a trusted language, and from what little I
> >      can glean from the documentation, it should not have any filesystem access.
> >      However, the default behavior of the restricted execution environment
> >      being used allows read-only filesystem access.
> 
> I agree, this is not good.  If it's easy to patch, please submit a
> patch.
> 
> What worries me is not so much this particular hole, which is easily
> plugged now that we know about it, as that it suggests that Python's
> idea of a restricted environment is considerably less restricted than
> we would like.  Perhaps there are other facilities that need to be
> turned off as well?
> 
> The alternative we could consider is to mark plpython as untrusted for
> 7.2, until someone has time for a more complete review of possible
> security problems.
> 
> > I have some other suggestions that may not be appropriate for the 7.2
> > release, but think should be addressed before too long:
> 
> This would all be good stuff to address in 7.3 or further in the future.
> As far as I'm concerned, all the PL languages except plpgsql are barely
> out of the "proof of concept" stage; they all need a lot of work from
> interested people to bring them to the "industrial strength" stage.
> If you want to be one of those people, step right up!
> 
> >   6) [Very low priority] Its not insane to consider a Plpython procedure
> >      that spawns off a Python thread to do background processing tasks.
> >      This is obviously something that will only be possible in an untrusted
> >      version of the interpreter.  Also, if the SPI interface is thread-safe,
> >      then it may be useful to release the Python interpreter lock around
> >      some of the SPI calls.
> 
> SPI is not thread-safe; in fact no part of the backend is thread-safe
> or designed for multithreading at all.  This one I'd view with great
> wariness.
> 
>             regards, tom lane
> 
> ---------------------------(end of broadcast)---------------------------
> TIP 5: Have you checked our extensive FAQ?
> 
> http://www.postgresql.org/users-lounge/docs/faq.html
> 

--  Bruce Momjian                        |  http://candle.pha.pa.us pgman@candle.pha.pa.us               |  (610)
853-3000+  If your life is a hard drive,     |  830 Blythe Avenue +  Christ can be your backup.        |  Drexel Hill,
Pennsylvania19026
 


Re: Possible major bug in PlPython (plus some other ideas)

From
Bradley McLean
Date:
I'm going to do this today (Kevin, I'm not stepping on your toes, am I?)

I have some other problems with the module to be addressed (there are
about a million, or at least a dozen ways to get it to return Null pointers
to the backend, crash the backend, and cause a general shared mem
corruption message from the postmaster).

-Brad

* Bruce Momjian (pgman@candle.pha.pa.us) [011112 00:41]:
> 
> I would like to have a patch for this into 7.2 because it is a security
> problem.
> 
> 
> ---------------------------------------------------------------------------
> 
> > Kevin Jacobs <jacobs@penguin.theopalgroup.com> writes:
> > > I have noticed a possibly major issues in Plpython that may need to be
> > > addressed before 7.2 is released:
> > 
> > >   1) If Plpython is installed as a trusted language, and from what little I
> > >      can glean from the documentation, it should not have any filesystem access.
> > >      However, the default behavior of the restricted execution environment
> > >      being used allows read-only filesystem access.
> > 
> > I agree, this is not good.  If it's easy to patch, please submit a
> > patch.
> > 
> > What worries me is not so much this particular hole, which is easily
> > plugged now that we know about it, as that it suggests that Python's
> > idea of a restricted environment is considerably less restricted than
> > we would like.  Perhaps there are other facilities that need to be
> > turned off as well?
> > 
> > The alternative we could consider is to mark plpython as untrusted for
> > 7.2, until someone has time for a more complete review of possible
> > security problems.
> > 
> > > I have some other suggestions that may not be appropriate for the 7.2
> > > release, but think should be addressed before too long:
> > 
> > This would all be good stuff to address in 7.3 or further in the future.
> > As far as I'm concerned, all the PL languages except plpgsql are barely
> > out of the "proof of concept" stage; they all need a lot of work from
> > interested people to bring them to the "industrial strength" stage.
> > If you want to be one of those people, step right up!
> > 
> > >   6) [Very low priority] Its not insane to consider a Plpython procedure
> > >      that spawns off a Python thread to do background processing tasks.
> > >      This is obviously something that will only be possible in an untrusted
> > >      version of the interpreter.  Also, if the SPI interface is thread-safe,
> > >      then it may be useful to release the Python interpreter lock around
> > >      some of the SPI calls.
> > 
> > SPI is not thread-safe; in fact no part of the backend is thread-safe
> > or designed for multithreading at all.  This one I'd view with great
> > wariness.
> > 
> >             regards, tom lane
> > 
> > ---------------------------(end of broadcast)---------------------------
> > TIP 5: Have you checked our extensive FAQ?
> > 
> > http://www.postgresql.org/users-lounge/docs/faq.html
> > 
> 
> -- 
>   Bruce Momjian                        |  http://candle.pha.pa.us
>   pgman@candle.pha.pa.us               |  (610) 853-3000
>   +  If your life is a hard drive,     |  830 Blythe Avenue
>   +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
> 
> ---------------------------(end of broadcast)---------------------------
> TIP 5: Have you checked our extensive FAQ?
> 
> http://www.postgresql.org/users-lounge/docs/faq.html


Re: Possible major bug in PlPython (plus some other ideas)

From
Tom Lane
Date:
Kevin Jacobs <jacobs@penguin.theopalgroup.com> writes:
> I had to guess what a "trusted" language should or should not do. For
> example, I do not allow Python to report the platform it is running on,
> though I do allow it to report the native byte-order and interpreter
> version.

Do you mean platform type, or host name?  "select version()" reports the
platform type, so I see no reason to hide that in plpython.  Otherwise
the changes sound reasonable in brief.  I don't know Python well enough
to review the change properly, though.  D'Arcy or someone, please check
it...
        regards, tom lane


Re: Possible major bug in PlPython (plus some other ideas)

From
Kevin Jacobs
Date:
Hi Bradley,

Thanks for the response!  I'm very relieved to get feedback on my
suggestions.

On Fri, 9 Nov 2001, Bradley McLean wrote:

> * Kevin Jacobs (jacobs@penguin.theopalgroup.com) [011109 08:59]:
> >
> >   1) If Plpython is installed as a trusted language, and from what little I
> >      can glean from the documentation, it should not have any filesystem access.
> >      However, the default behavior of the restricted execution environment
> >      being used allows read-only filesystem access.
> >...
> >      It is fairly easy to override this method to unconditionally raise an
> >      access exception.
>
> Agreed.  Just to amplify the point below, there is currently no
> distinction between trusted and untrusted in PLpython (hmm, need
> a doc change to identify this?).  lanpltrusted appears nowhere
> in the implementation.

Actually, I'm mostly unaware of how createlang works, but I assumed that
plpython was installed as a trusted backend.  Can anyone clarify _exactly_
what is meant by a 'trusted'?

To fix this issue, is it better to create a C-API method override to RExec
or should we implement a real plpy.py module that does this (and some of the
rest of my suggestions).  I'd prefer the module, since it is _much_ easier
to do, though it does incur some extra overhead.

> >   4) Plpython does not use the standard Python boolean checks, which is also
> >      not very Pythonic and somewhat confusing:
> > ...
> >      I suggest changing the semantics of boolean return values to use
> >      PyObject_IsTrue(PyObject *o) to properly test for truth values.
>
> Agree.  Is this the only type that needs special treatment?

Virtually all the other basic types will usually do the right thing.
We could add support for array conversions:
  Python [1,2,3] => Postgres INTEGER[] {1,2,3}

Can you think of any others?

> >   6) [Very low priority] Its not insane to consider a Plpython procedure
> >      that spawns off a Python thread to do background processing tasks.
> >      This is obviously something that will only be possible in an untrusted
> >      version of the interpreter.  Also, if the SPI interface is thread-safe,
> >      then it may be useful to release the Python interpreter lock around
> >      some of the SPI calls.
>
> Three weeks ago, I really wanted a feature like this, but I've slowly
> been convincing myself that it's a bad idea.  Consider the effects of
> a race condition in user generated threaded python code that results
> in a deadlock within the backend.

I'm not completely sold on this either.  I'm going to let this idea
percolate for a while and wait to see if I can find a need for it.

> I will say that the discrepancy between executing SQL from plpython
> and from python with the pg driver does cause some cognitive
> disconnect, because despite working in the same language, I have
> to use different APIs to the same functionality.  I don't have a
> good proposal to solve it, though.

Other than not having the same modules available, what is annoying you most?

> One last issue:
>
> 7)  Would it be completely impossible to make an extra-global
> dictionary that was shared between multiple back-ends (place
> in shared memory)?  Anything I'm likely to place in GD, I'm likely
> to initialize once and want in all backends.

It will be very tricky to do in full generality.  Anything that is inserted
into the "shared global" dictionary will have to be copied, re-allocated on
the shared heap, and hidden from the garbage collector.  This is fairly easy
to do for a simple types {int,float,string}; much harder for more complex
types.

-Kevin

--
Kevin Jacobs
The OPAL Group - Enterprise Systems Architect
Voice: (216) 986-0710 x 19         E-mail: jacobs@theopalgroup.com
Fax:   (216) 986-0714              WWW:    http://www.theopalgroup.com



Re: Possible major bug in PlPython (plus some other ideas)

From
Kevin Jacobs
Date:
On 9 Nov 2001, Doug McNaught wrote:
> Doug McNaught <doug@wireboard.com> writes:
>
> >        FWIW, Zope (www.zope.org) allows for Python scripts, created
> > and managed through the web, that run in a "sandbox" with many of the
> > same restrictions as PG puts on untrusted languages--they actually
>
> Er, I meant 'trusted' here, of course.

plpython does this too -- the problem I've found is due to a bad default
setting in the "sandbox".

-Kevin

--
Kevin Jacobs
The OPAL Group - Enterprise Systems Architect
Voice: (216) 986-0710 x 19         E-mail: jacobs@theopalgroup.com
Fax:   (216) 986-0714              WWW:    http://www.theopalgroup.com




Re: Possible major bug in PlPython (plus some other ideas)

From
Kevin Jacobs
Date:
On Fri, 9 Nov 2001, Hannu Krosing wrote:
> Kevin Jacobs wrote:
> >
> > > >   1) If Plpython is installed as a trusted language, and from what little I
> > > >      can glean from the documentation, it should not have any filesystem access.
> > > >      However, the default behavior of the restricted execution environment
> > > >      being used allows read-only filesystem access.
> > >
> > > we have 'read-only filesystem access anyhow' :
> >
> > Then I consider this a bug if a non-super-user can do this.
>
> It's not that bad - only postgresql superuser can use copy to/from file

Ah -- then it still means we should take read-only filesystem access away
from plpython for now.  If we want to implemente a trusted mode, then we can
add it back in.

-Kevin

--
Kevin Jacobs
The OPAL Group - Enterprise Systems Architect
Voice: (216) 986-0710 x 19         E-mail: jacobs@theopalgroup.com
Fax:   (216) 986-0714              WWW:    http://www.theopalgroup.com




Re: Possible major bug in PlPython (plus some other ideas)

From
Kevin Jacobs
Date:
On Fri, 9 Nov 2001, Tom Lane wrote:
> Kevin Jacobs <jacobs@penguin.theopalgroup.com> writes:
> > I have noticed a possibly major issues in Plpython that may need to be
> > addressed before 7.2 is released:
>
> >   1) If Plpython is installed as a trusted language, and from what little I
> >      can glean from the documentation, it should not have any filesystem access.
> >      However, the default behavior of the restricted execution environment
> >      being used allows read-only filesystem access.
>
> I agree, this is not good.  If it's easy to patch, please submit a
> patch.

I'll have something ready by Monday.

> What worries me is not so much this particular hole, which is easily
> plugged now that we know about it, as that it suggests that Python's
> idea of a restricted environment is considerably less restricted than
> we would like.  Perhaps there are other facilities that need to be
> turned off as well?

I'm going to do a very careful review of the code.  Upfront, I expect that
I've found the only major problem.  There is already a very good "restricted
execution" enviornment in place.  The read-only filesystem issue slipped
through the cracks because it is the default behavior for the evironment.
I'll spend the time to go over any nooks and crannies that bear careful
scrutiny.

> The alternative we could consider is to mark plpython as untrusted for
> 7.2, until someone has time for a more complete review of possible
> security problems.

If I don't feel that the code is 100% then I'll vote for this option too.

-Kevin

--
Kevin Jacobs
The OPAL Group - Enterprise Systems Architect
Voice: (216) 986-0710 x 19         E-mail: jacobs@theopalgroup.com
Fax:   (216) 986-0714              WWW:    http://www.theopalgroup.com




Re: Possible major bug in PlPython (plus some other ideas)

From
Kevin Jacobs
Date:
> >   1) If Plpython is installed as a trusted language, and from what little I
> >      can glean from the documentation, it should not have any filesystem access.
> >      However, the default behavior of the restricted execution environment
> >      being used allows read-only filesystem access.
>
> we have 'read-only filesystem access anyhow' :

Then I consider this a bug if a non-super-user can do this.

> using copy xxx to '/file/' we have even read-write access, we just can't
> overwrite 0600 files. And you can do only what the postgres user can do.

This is an even bigger bug.  I didn't think I needed to run PostgreSQL in a
chroot jail, but its looking more like that may be needed.  Any comments
from other developers?  Is this really the security model you want?

If keep telling me things like this, I'll stop using Postgres!

-Kevin

--
Kevin Jacobs
The OPAL Group - Enterprise Systems Architect
Voice: (216) 986-0710 x 19         E-mail: jacobs@theopalgroup.com
Fax:   (216) 986-0714              WWW:    http://www.theopalgroup.com




Re: Possible major bug in PlPython (plus some other ideas)

From
Kevin Jacobs
Date:
On Mon, 12 Nov 2001, Tom Lane wrote:
> Kevin Jacobs <jacobs@penguin.theopalgroup.com> writes:
> > I had to guess what a "trusted" language should or should not do. For
> > example, I do not allow Python to report the platform it is running on,
> > though I do allow it to report the native byte-order and interpreter
> > version.
>
> Do you mean platform type, or host name?  "select version()" reports the
> platform type, so I see no reason to hide that in plpython.

I meant platform type.  Since its already available in Postgre, I'll make it
available in my next path.

-Kevin

--
Kevin Jacobs
The OPAL Group - Enterprise Systems Architect
Voice: (216) 986-0710 x 19         E-mail: jacobs@theopalgroup.com
Fax:   (216) 986-0714              WWW:    http://www.theopalgroup.com




Re: Possible major bug in PlPython (plus some other ideas)

From
Bradley McLean
Date:
Kevin, I've been using / reviewing your patch and it looks good.

Did you submit it to patches?

(Everyone)  Would a patch to add trusted language support be accepted
for 7.2, or is it too late?

-Brad


Re: Possible major bug in PlPython (plus some other ideas)

From
Tom Lane
Date:
Bradley McLean <brad@bradm.net> writes:
> (Everyone)  Would a patch to add trusted language support be accepted
> for 7.2, or is it too late?

I think the code in there already is the trusted case, no?  The addition
would be an untrusted mode for plpython.

trusted = language handler prevents security violations, so unprivileged
users are allowed to define functions in the language (ie, we trust the
language itself to prevent security breaches)

untrusted = language allows user to access things outside database,
so only Postgres superusers are allowed to define functions in the
language (ie, we must trust the function author instead of the language)

In any case, a second security level in plpython would clearly be a new
feature, and so I'd say it's too late to consider it for 7.2.  All that
we want to do at this point is verify Kevin's proposed patch for the
existing security level.  But certainly a "plpythonu" addition would
be welcome for 7.3.
        regards, tom lane


Re: Possible major bug in PlPython (plus some other ideas)

From
Kevin Jacobs
Date:
On Tue, 13 Nov 2001, Bradley McLean wrote:
> Kevin, I've been using / reviewing your patch and it looks good.

Great!  I'll have a very slightly updated version ready by tomorrow that
incorprates some suggestions from Tom Lane.

> Did you submit it to patches?

I sent it to hackers, though the list manager really hates that I'm not
subscribed.

> (Everyone)  Would a patch to add trusted language support be accepted
> for 7.2, or is it too late?

I've got a patch about half-done to do this.  If this gets the green light,
then lets work together on this.

-Kevin



Re: Possible major bug in PlPython (plus some other ideas)

From
Bruce Momjian
Date:
Has this all been addressed?  Are there any TODO items here?

---------------------------------------------------------------------------

> Bradley McLean <brad@bradm.net> writes:
> > (Everyone)  Would a patch to add trusted language support be accepted
> > for 7.2, or is it too late?
> 
> I think the code in there already is the trusted case, no?  The addition
> would be an untrusted mode for plpython.
> 
> trusted = language handler prevents security violations, so unprivileged
> users are allowed to define functions in the language (ie, we trust the
> language itself to prevent security breaches)
> 
> untrusted = language allows user to access things outside database,
> so only Postgres superusers are allowed to define functions in the
> language (ie, we must trust the function author instead of the language)
> 
> In any case, a second security level in plpython would clearly be a new
> feature, and so I'd say it's too late to consider it for 7.2.  All that
> we want to do at this point is verify Kevin's proposed patch for the
> existing security level.  But certainly a "plpythonu" addition would
> be welcome for 7.3.
> 
>             regards, tom lane
> 
> ---------------------------(end of broadcast)---------------------------
> TIP 4: Don't 'kill -9' the postmaster
> 

--  Bruce Momjian                        |  http://candle.pha.pa.us pgman@candle.pha.pa.us               |  (610)
853-3000+  If your life is a hard drive,     |  830 Blythe Avenue +  Christ can be your backup.        |  Drexel Hill,
Pennsylvania19026
 


Re: Possible major bug in PlPython (plus some other ideas)

From
Bruce Momjian
Date:
> On Sat, 17 Nov 2001, Bruce Momjian wrote:
> > Has this all been addressed?  Are there any TODO items here?
> 
> All of the security related _problems_ that affect the rest of 7.2 have been
> solved, to the best of my knowledge. The discussion below pretains to adding
> an additional untrusted mode like plperl has.  Since this is a new feature,
> it is on the TODO list for 7.3.

OK, added to TODO:
* Add untrusted version of plpython


--  Bruce Momjian                        |  http://candle.pha.pa.us pgman@candle.pha.pa.us               |  (610)
853-3000+  If your life is a hard drive,     |  830 Blythe Avenue +  Christ can be your backup.        |  Drexel Hill,
Pennsylvania19026
 


Re: Possible major bug in PlPython (plus some other ideas)

From
Kevin Jacobs
Date:
On Sat, 17 Nov 2001, Bruce Momjian wrote:
> Has this all been addressed?  Are there any TODO items here?

All of the security related _problems_ that affect the rest of 7.2 have been
solved, to the best of my knowledge. The discussion below pretains to adding
an additional untrusted mode like plperl has.  Since this is a new feature,
it is on the TODO list for 7.3.

Regards,
-Kevin Jacobs

>
> ---------------------------------------------------------------------------
>
> > Bradley McLean <brad@bradm.net> writes:
> > > (Everyone)  Would a patch to add trusted language support be accepted
> > > for 7.2, or is it too late?
> >
> > I think the code in there already is the trusted case, no?  The addition
> > would be an untrusted mode for plpython.
> >
> > trusted = language handler prevents security violations, so unprivileged
> > users are allowed to define functions in the language (ie, we trust the
> > language itself to prevent security breaches)
> >
> > untrusted = language allows user to access things outside database,
> > so only Postgres superusers are allowed to define functions in the
> > language (ie, we must trust the function author instead of the language)
> >
> > In any case, a second security level in plpython would clearly be a new
> > feature, and so I'd say it's too late to consider it for 7.2.  All that
> > we want to do at this point is verify Kevin's proposed patch for the
> > existing security level.  But certainly a "plpythonu" addition would
> > be welcome for 7.3.
> >
> >             regards, tom lane
> >
> > ---------------------------(end of broadcast)---------------------------
> > TIP 4: Don't 'kill -9' the postmaster
> >
>
>

--
Kevin Jacobs
The OPAL Group - Enterprise Systems Architect
Voice: (216) 986-0710 x 19         E-mail: jacobs@theopalgroup.com
Fax:   (216) 986-0714              WWW:    http://www.theopalgroup.com