Thread: [patch] plproxy v2

[patch] plproxy v2

From

"Marko Kreen"

Date:

28 June 2008, 10:36:12

We have written a FAQ, where are common issues collected, there
are also discussion about design decisions and why concentrating
only to functions is good idea.

 http://plproxy.projects.postgresql.org/faq/faq.html

The patch itself is attached, and also made available here:

 http://plproxy.projects.postgresql.org/plproxy-v2.diff.gz

Changes since v1:
- dynamic record patch from Lei Yonghua.  That makes functions
  that require result types specified with AS clause work.
- accepts int2/4/8 from hash function
- removed get_cluster_config()
- draft sgml documentation
- disabled binary I/O by default.

- parser.y: remove use of alloca, define malloc/free to palloc/pfree
- scanner.l: fix crash if palloc() throws exception by reinitializing
  flex properly
- make installcheck work
- removed backwards compat code
- use bool instead bitfields to conform to Postgres coding style.

Todo:
- the docs should be more fleshed out
- should regtest moved to src/test/regress?
- make GUC vars for binary i/o and connection_lifetime?

I mentioned that I planned to remove SELECT/CONNECT too.
Now I've thought about it more and it seems to me that its better
to keep them.  As they give additional flexibility.

CONNECT - allows to test plproxy without creating plproxy schema.
Good for quick hacks, RPC-s.  Main problem is that as soon
user starts to have several environments - dev/test/live -
he will get hurt, but that could be solved by giving suggestions
in docs how to avoid that. Eg. suggest using service= or fake hostnames.

SELECT - main reason to keep it would be to allow calling
different functions, having different argument/result names
or types.

--
marko

Attachment

plproxy-v2.diff.gz

Re: [patch] plproxy v2

From

"Marko Kreen"

Date:

02 July 2008, 08:10:06

I took the libery and changed the status from WIP to Pending Review,
because the pending cleanups do not affect reviewing nor committing.

And anyway, because of the patch size I expect reviewers requesting
furthers changes so I'd like to push the tiny stuff to final version
of the patch.

-- 
marko

Re: [patch] plproxy v2

From

Simon Riggs

Date:

08 July 2008, 06:16:16

On Sat, 2008-06-28 at 16:36 +0300, Marko Kreen wrote:

> I mentioned that I planned to remove SELECT/CONNECT too.
> Now I've thought about it more and it seems to me that its better
> to keep them.  As they give additional flexibility.

I very much like PL/Proxy and support your vision. Including the
features of PL/Proxy in core seems like a great idea to me.

If we have just a couple of commands, would it be easier to include
those features by some additional attributes on pg_proc? That way we
could include the features in a more native way, similar to the way we
have integrated text search, without needing a plugin language at all. 
 CREATE CLUSTER foo ...
 CREATE FUNCTION bar() CLUSTER foo RUN ON ANY ...

If we did that, we might also include a similar proxy feature for
tables, making the feature exciting for more users than just those who
can specify implementing all logic through functions. It would also
remove the need for a specific SELECT command in PL/Proxy.
 CREATE TABLE bar CLUSTER foo RUN ON ANY ...

If we're running a SELECT and all tables accessed run on the same
cluster we ship the whole SQL statement according to the RUN ON clause. 
It would effectively bring some parts of dblink into core.

If all tables not on same cluster we throw an error in this release, but
in later releases we might introduce distributed join features and full
distributed DML support.

Having the PL/Proxy features available via the catalog will allow a
clear picture of what runs where without parsing the function text. It
will also allow things like a pg_dump of all objects relating to a
cluster.

Adding this feature for tables would be interesting with Hot Standby,
since it would allow you to offload SELECT statements onto the standby
automatically.

This would be considerably easier to integrate than text search was.

-- Simon Riggs           www.2ndQuadrant.comPostgreSQL Training, Services and Support

Re: [patch] plproxy v2

From

"Joshua D. Drake"

Date:

08 July 2008, 12:24:33


On Tue, 2008-07-08 at 10:21 +0100, Simon Riggs wrote:
> On Sat, 2008-06-28 at 16:36 +0300, Marko Kreen wrote:
> 
> I very much like PL/Proxy and support your vision. Including the
> features of PL/Proxy in core seems like a great idea to me.

> Adding this feature for tables would be interesting with Hot Standby,
> since it would allow you to offload SELECT statements onto the standby
> automatically.
> 
> This would be considerably easier to integrate than text search was.

First let me say that I too enjoy PL/Proxy quite a bit. However, I don't
think it needs to be in core. I wouldn't mind seeing it in contrib (or
better yet modules/ should we ever get around to that).

Sincerely,

Joshua D. Drake

Re: [patch] plproxy v2

From

"Marko Kreen"

Date:

08 July 2008, 12:29:21

On 7/8/08, Simon Riggs <simon@2ndquadrant.com> wrote:
>  On Sat, 2008-06-28 at 16:36 +0300, Marko Kreen wrote:
>  > I mentioned that I planned to remove SELECT/CONNECT too.
>  > Now I've thought about it more and it seems to me that its better
>  > to keep them.  As they give additional flexibility.
>
> I very much like PL/Proxy and support your vision. Including the
>  features of PL/Proxy in core seems like a great idea to me.
>
>  If we have just a couple of commands, would it be easier to include
>  those features by some additional attributes on pg_proc? That way we
>  could include the features in a more native way, similar to the way we
>  have integrated text search, without needing a plugin language at all.
>
>   CREATE CLUSTER foo ...
>
>   CREATE FUNCTION bar() CLUSTER foo RUN ON ANY ...
>
>  If we did that, we might also include a similar proxy feature for
>  tables, making the feature exciting for more users than just those who
>  can specify implementing all logic through functions. It would also
>  remove the need for a specific SELECT command in PL/Proxy.
>
>   CREATE TABLE bar CLUSTER foo RUN ON ANY ...
>
>  If we're running a SELECT and all tables accessed run on the same
>  cluster we ship the whole SQL statement according to the RUN ON clause.
>  It would effectively bring some parts of dblink into core.
>
>  If all tables not on same cluster we throw an error in this release, but
>  in later releases we might introduce distributed join features and full
>  distributed DML support.
>
>  Having the PL/Proxy features available via the catalog will allow a
>  clear picture of what runs where without parsing the function text. It
>  will also allow things like a pg_dump of all objects relating to a
>  cluster.
>
>  Adding this feature for tables would be interesting with Hot Standby,
>  since it would allow you to offload SELECT statements onto the standby
>  automatically.
>
>  This would be considerably easier to integrate than text search was.

Interesting proposal.

First I want to say - we can forget the SELECT/CONNECT statements
when discussing this approach.  They are in because they were easy
to add and gave some additional flexibility.  But they are not important.
If they don't fit some new approach, there is no problem dropping them.

So that leaves functions in form:
   CLUSTER <expr>;   RUN ON <expr>;

and potentially SPREAD BY as discussed in:
   http://lists.pgfoundry.org/pipermail/plproxy-users/2008-June/000093.html

which sends different arguments to different partitions.  I'm not yet
sure it's worthwhile addition, but I work mostly on OLTP databases
and that feature would target OLAP ones.  So I let others decide.

Now few technical points about your proposal:

- One feature that current function-based configuration approach gives is that we can manage cluster configuration
centrallyand replicate to actual proxy databases.  And this is something I would like to keep.
 
 This can be solved by using also plain table or functions behind the scenes.

- How about CREATE REMOTE FUNCTION / TABLE .. ; for syntax?

- Currently both hash and cluster selection expressions can be quite free-form.  So parsing them out to some pg_proc
fieldwould not be much help actually.
 

And some philosophical points:

- PL/Proxy main use-case is complex read-write transactions in OLTP setting.  But remote table/views target simple
read-onlytransactions with free-form queries.
 

- PL/Proxy has concrete argument list and free-form cluster and partition selection.  Remote tables have free-form
arguments,maybe they want more rigid cluster / partition selection?
 

If the syntax and backend implementation can be merged, its good,
but it should not be forced.  So before we start adding syntax
to core, maybe it would be good to have concrete idea how the remote
tables will look like and what representation they want for a cluster?

Especially if you want to do stuff like distributed joins.

OTOH, if you say that current PL/Proxy approach fits remote tables
as well, I'm not against doing it SQL level.

-- 
marko

Re: [patch] plproxy v2

From

"Marko Kreen"

Date:

08 July 2008, 12:43:46

On 7/8/08, Joshua D. Drake <jd@commandprompt.com> wrote:
>  On Tue, 2008-07-08 at 10:21 +0100, Simon Riggs wrote:
>  > On Sat, 2008-06-28 at 16:36 +0300, Marko Kreen wrote:
>
> > I very much like PL/Proxy and support your vision. Including the
>  > features of PL/Proxy in core seems like a great idea to me.
>
>
> > Adding this feature for tables would be interesting with Hot Standby,
>  > since it would allow you to offload SELECT statements onto the standby
>  > automatically.
>  >
>  > This would be considerably easier to integrate than text search was.
>
>
> First let me say that I too enjoy PL/Proxy quite a bit. However, I don't
>  think it needs to be in core. I wouldn't mind seeing it in contrib (or
>  better yet modules/ should we ever get around to that).

I'm not against contrib/ considering that the docs are now nicely
integrated, but then, whats the difference between src/pl/ and contrib/?

OTOH, if you argue LANGUAGE plproxy vs. CREATE REMOTE FUNCTION,
then thats different matter.  It seems to be we should do REMOTE FUNCTION
after, not before REMOTE TABLE as the table/view implementation
needs to dictate the actual details.

-- 
marko

Re: [patch] plproxy v2

From

Simon Riggs

Date:

08 July 2008, 12:47:42

On Tue, 2008-07-08 at 18:29 +0300, Marko Kreen wrote:
> and potentially SPREAD BY as discussed in:
> 
> 
> http://lists.pgfoundry.org/pipermail/plproxy-users/2008-June/000093.html
> 

That *sounds* cool, but its just the first part of the implementation of
a massively parallel executor. You'll quickly end up wanting to do
something else as well. Redistributing data is the hard part of a hard
problem. I'd steer clear of that. Skytools are good cause they do simple
things well.

-- Simon Riggs           www.2ndQuadrant.comPostgreSQL Training, Services and Support

Re: [patch] plproxy v2

From

"Joshua D. Drake"

Date:

08 July 2008, 12:58:07


On Tue, 2008-07-08 at 18:43 +0300, Marko Kreen wrote:
> On 7/8/08, Joshua D. Drake <jd@commandprompt.com> wrote:
> > First let me say that I too enjoy PL/Proxy quite a bit. However, I don't
> >  think it needs to be in core. I wouldn't mind seeing it in contrib (or
> >  better yet modules/ should we ever get around to that).
> 
> I'm not against contrib/ considering that the docs are now nicely
> integrated, but then, whats the difference between src/pl/ and contrib/?
> 

I am actually against adding to the grammar which is what Simon was
suggesting. If it wants to go into src/pl I wouldn't have a problem with
that. Sorry for not being more clear.

Joshua D. Drake

Re: [patch] plproxy v2

From

Simon Riggs

Date:

08 July 2008, 13:06:31

On Tue, 2008-07-08 at 08:58 -0700, Joshua D. Drake wrote:
> 
> On Tue, 2008-07-08 at 18:43 +0300, Marko Kreen wrote:
> > On 7/8/08, Joshua D. Drake <jd@commandprompt.com> wrote:
> > > First let me say that I too enjoy PL/Proxy quite a bit. However, I don't
> > >  think it needs to be in core. I wouldn't mind seeing it in contrib (or
> > >  better yet modules/ should we ever get around to that).
> > 
> > I'm not against contrib/ considering that the docs are now nicely
> > integrated, but then, whats the difference between src/pl/ and contrib/?
> > 
> 
> I am actually against adding to the grammar 

Why? 

-- Simon Riggs           www.2ndQuadrant.comPostgreSQL Training, Services and Support

Re: [patch] plproxy v2

From

"Marko Kreen"

Date:

08 July 2008, 13:13:22

On 7/8/08, Simon Riggs <simon@2ndquadrant.com> wrote:
>  On Tue, 2008-07-08 at 18:29 +0300, Marko Kreen wrote:
>  > and potentially SPREAD BY as discussed in:
>  >
>  > http://lists.pgfoundry.org/pipermail/plproxy-users/2008-June/000093.html
>
> That *sounds* cool, but its just the first part of the implementation of
>  a massively parallel executor. You'll quickly end up wanting to do
>  something else as well. Redistributing data is the hard part of a hard
>  problem. I'd steer clear of that. Skytools are good cause they do simple
>  things well.

Well, for PL/Proxy it would be the _last_ part.  Yes, now the user
can build parallel OLAP executor, but all of this will be up to user.
PL/Proxy itself will stay dumb and simple.  It would not need do to
any guesswork, all the data will be provided by user.

The amount of code needed to make the SPREAD work would be minimal,
mostly reactoring of existing code is needed.  So it fits the
current design.

The point is - PL/Proxy already executes single query with same
arguments in parallel.  With the SPREAD feature it could execute
single query with different arguments in parallel.  And the
next step of executing different queries in parallel does not make
sense for PL/Proxy as it's main concept is function-calls not queries.

But ofcourse we can decide we don't want do go that way,
and that's ok also.

-- 
marko

Re: [patch] plproxy v2

From

"Marko Kreen"

Date:

08 July 2008, 13:16:41

On 7/8/08, Joshua D. Drake <jd@commandprompt.com> wrote:
>  On Tue, 2008-07-08 at 18:43 +0300, Marko Kreen wrote:
>  > On 7/8/08, Joshua D. Drake <jd@commandprompt.com> wrote:
>
> > > First let me say that I too enjoy PL/Proxy quite a bit. However, I don't
>  > >  think it needs to be in core. I wouldn't mind seeing it in contrib (or
>  > >  better yet modules/ should we ever get around to that).
>  >
>  > I'm not against contrib/ considering that the docs are now nicely
>  > integrated, but then, whats the difference between src/pl/ and contrib/?
>
> I am actually against adding to the grammar which is what Simon was
>  suggesting. If it wants to go into src/pl I wouldn't have a problem with
>  that. Sorry for not being more clear.

Current patch is for src/pl/.  Diffstat for patch v2:
doc/src/sgml/filelist.sgml                         |    1doc/src/sgml/plproxy.sgml                          |  221
++++++doc/src/sgml/postgres.sgml                        |    1src/include/catalog/pg_pltemplate.h                |
1src/pl/Makefile                                   |    2src/pl/plproxy/Makefile                            |   89
++src/pl/plproxy/cluster.c                          |  469 +++++++++++++src/pl/plproxy/execute.c
  |  724 +++++++++++++++++++++src/pl/plproxy/expected/plproxy_clustermap.out     |   71
++src/pl/plproxy/expected/plproxy_dynamic_record.out|   51 +src/pl/plproxy/expected/plproxy_errors.out         |   66
+src/pl/plproxy/expected/plproxy_init.out          |    2src/pl/plproxy/expected/plproxy_many.out           |  116
+++src/pl/plproxy/expected/plproxy_select.out        |   37 +src/pl/plproxy/expected/plproxy_test.out           |  312
+++++++++src/pl/plproxy/function.c                         |  479 +++++++++++++src/pl/plproxy/main.c
         |  214 ++++++src/pl/plproxy/parser.y                            |  203 +++++src/pl/plproxy/plproxy.h
               |  301 ++++++++src/pl/plproxy/poll_compat.c                       |  140
++++src/pl/plproxy/poll_compat.h                      |   58 +src/pl/plproxy/query.c                             |  316
+++++++++src/pl/plproxy/result.c                           |  222 ++++++src/pl/plproxy/rowstamp.h
  |   27src/pl/plproxy/scanner.l                           |  320 +++++++++src/pl/plproxy/sql/plproxy_clustermap.sql
     |   56 +src/pl/plproxy/sql/plproxy_dynamic_record.sql      |   43 +src/pl/plproxy/sql/plproxy_errors.sql
  |   63 +src/pl/plproxy/sql/plproxy_init.sql                |   57 +src/pl/plproxy/sql/plproxy_many.sql
|  66 +src/pl/plproxy/sql/plproxy_select.sql              |   37 +src/pl/plproxy/sql/plproxy_test.sql                |
200+++++src/pl/plproxy/type.c                              |  336 +++++++++33 files changed, 5299 insertions(+), 2
modifications(!)


-- 
marko

Re: [patch] plproxy v2

From

Tom Lane

Date:

21 July 2008, 17:07:41

"Marko Kreen" <markokr@gmail.com> writes:
> [ plproxy ]

I looked through this a bit, and my principal reaction was "what are
the security implications?"  It seems like it'd be very easy to create
functions that allow untrusted users to execute arbitrary SQL on
other databases in the plproxy cluster.  As far as I saw there was
no privilege-checking within plproxy itself, you were just relying
on SQL-level permissions checking --- so even though plproxy functions
can only be *created* by superusers, by default they can be *used* by
anybody.  So it'd take a great deal of care to avoid making
unintentional security holes.

I'm not sure about a good solution to this problem, but I think it needs
to be solved before plproxy can be recommended for widespread use.
The first thought that comes to mind is to somehow propagate the userid
on the calling server to the execution on the remote, so that a user
can't get more privilege on the remote than if he were executing there
directly.  I'm imagining that the definition of a cluster would include
a map from local to remote userids (and thereby imply that anyone not
explicitly listed can't do remote access at all).
        regards, tom lane

Re: [patch] plproxy v2

From

"Marko Kreen"

Date:

21 July 2008, 17:22:05

On 7/21/08, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> "Marko Kreen" <markokr@gmail.com> writes:
>  > [ plproxy ]
>
>  I looked through this a bit, and my principal reaction was "what are
>  the security implications?"  It seems like it'd be very easy to create
>  functions that allow untrusted users to execute arbitrary SQL on
>  other databases in the plproxy cluster.  As far as I saw there was
>  no privilege-checking within plproxy itself, you were just relying
>  on SQL-level permissions checking --- so even though plproxy functions
>  can only be *created* by superusers, by default they can be *used* by
>  anybody.  So it'd take a great deal of care to avoid making
>  unintentional security holes.
>
>  I'm not sure about a good solution to this problem, but I think it needs
>  to be solved before plproxy can be recommended for widespread use.
>  The first thought that comes to mind is to somehow propagate the userid
>  on the calling server to the execution on the remote, so that a user
>  can't get more privilege on the remote than if he were executing there
>  directly.  I'm imagining that the definition of a cluster would include
>  a map from local to remote userids (and thereby imply that anyone not
>  explicitly listed can't do remote access at all).

There are 2 aspects to it:

1.  Function can be created only by superuser.
2.  If cluster connection strings do not have 'user=' key,   ' user=' || current_username() is appended to it.  Note
that  connections are per-backend, not shared.  Also, plroxy does   _nothing_ with passwords.  That means the password
forremote   connection must be in postgres user's .pgpass, or there   is pooler between plproxy and remote database who
handles  passwords.

What else do you see is needed?  I'm not sure a map is a good idea,
is seems to create unnecessary coplexity.  Ofcourse, it can be done.

But I don't think plproxy can and should protect dumb admins who
create remote_exec(sql) function and allow untrusted users to
execute it.

-- 
marko

Re: [patch] plproxy v2

From

Tom Lane

Date:

21 July 2008, 22:33:19

"Marko Kreen" <markokr@gmail.com> writes:
> On 7/21/08, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> I looked through this a bit, and my principal reaction was "what are
>> the security implications?

> There are 2 aspects to it:

> 1.  Function can be created only by superuser.

What I'm concerned about is who they can be *called* by.  I'd be happier
if the default behavior was that there was no public execute privilege
for plproxy functions.

I think right now that could be enforced by having plproxy's validator
procedure replace any null proacl entry with something that explicitly
refuses public execute.  That's a bit of a hack though.  Maybe it'd be
worth inventing per-PL default ACLs, instead of having a
one-size-fits-all policy?

> 2.  If cluster connection strings do not have 'user=' key,
>     ' user=' || current_username() is appended to it.

Cool, I missed that.  At minimum the documentation has to explain this
point and emphasize the security implications.  Is it a good idea
to allow user= in the cluster strings at all?

>     Also, plroxy does
>     _nothing_ with passwords.  That means the password for remote
>     connection must be in postgres user's .pgpass,

That seems *exactly* backwards, because putting the password in postgres
user's .pgpass is as good as disabling password auth altogether.
Consider that it would also hand all the keys to the kingdom over to
someone who had access to dblink on the same machine (not even the same
cluster, so long as it was run by the same postgres user!).

> But I don't think plproxy can and should protect dumb admins who
> create remote_exec(sql) function and allow untrusted users to
> execute it.

We regularly get beat up about any aspect of our security apparatus
that isn't "secure by default".  This definitely isn't, and from
a PR point of view (if nothing else) that doesn't seem a good idea.

I repeat that I don't feel comfortable in the least with plproxy's
security model.
        regards, tom lane

Re: [patch] plproxy v2

From

Andrew Sullivan

Date:

22 July 2008, 11:40:08

On Mon, Jul 21, 2008 at 09:32:57PM -0400, Tom Lane wrote:
> "Marko Kreen" <markokr@gmail.com> writes:

> > 2.  If cluster connection strings do not have 'user=' key,
> >     ' user=' || current_username() is appended to it.
> 
> Cool, I missed that.  At minimum the documentation has to explain this
> point and emphasize the security implications.  Is it a good idea
> to allow user= in the cluster strings at all?

I wondered about this myself.  Is there anything at all preventing me
from doing 'user=' for some other user?  If not. . .

> >     Also, plroxy does
> >     _nothing_ with passwords.  That means the password for remote
> >     connection must be in postgres user's .pgpass,
> 
> That seems *exactly* backwards, because putting the password in postgres
> user's .pgpass is as good as disabling password auth altogether.

. . .this means that any user on system1 for which there is at least
one user on system2 with plproxy access automatically also has that
access on system2.  (Plus what Tom noted).

> We regularly get beat up about any aspect of our security apparatus
> that isn't "secure by default".  This definitely isn't, and from
> a PR point of view (if nothing else) that doesn't seem a good idea.

I'm less worried about the PR, and more worried about the truck-sized
hole this opens in any authentication controls.  It seems to me that
it's a fairly serious problem.

A

-- 
Andrew Sullivan
ajs@commandprompt.com
+1 503 667 4564 x104
http://www.commandprompt.com/

Re: [patch] plproxy v2

From

"Marko Kreen"

Date:

22 July 2008, 11:50:32

On 7/22/08, Andrew Sullivan <ajs@commandprompt.com> wrote:
> On Mon, Jul 21, 2008 at 09:32:57PM -0400, Tom Lane wrote:
>  > "Marko Kreen" <markokr@gmail.com> writes:
> > > 2.  If cluster connection strings do not have 'user=' key,
>  > >     ' user=' || current_username() is appended to it.
>  >
>  > Cool, I missed that.  At minimum the documentation has to explain this
>  > point and emphasize the security implications.  Is it a good idea
>  > to allow user= in the cluster strings at all?
>
>
> I wondered about this myself.  Is there anything at all preventing me
>  from doing 'user=' for some other user?  If not. . .

For that you need to overwrite the plproxy.get_cluster_partitions()
function or the data it operates on.

I don't see any hole in this, unless explicitly created.

>  > >     Also, plroxy does
>  > >     _nothing_ with passwords.  That means the password for remote
>  > >     connection must be in postgres user's .pgpass,
>  >
>  > That seems *exactly* backwards, because putting the password in postgres
>  > user's .pgpass is as good as disabling password auth altogether.
>
>
> . . .this means that any user on system1 for which there is at least
>  one user on system2 with plproxy access automatically also has that
>  access on system2.  (Plus what Tom noted).

For that the system2 needs to be added as partion to a cluster.
Or specified explicitly in CONNECT statement.

And user can execute only pre-determines queries/functions on system2.

>  > We regularly get beat up about any aspect of our security apparatus
>  > that isn't "secure by default".  This definitely isn't, and from
>  > a PR point of view (if nothing else) that doesn't seem a good idea.
>
> I'm less worried about the PR, and more worried about the truck-sized
>  hole this opens in any authentication controls.  It seems to me that
>  it's a fairly serious problem.

Do you still see a big hole?

-- 
marko

Re: [patch] plproxy v2

From

Tom Lane

Date:

22 July 2008, 12:09:49

Andrew Sullivan <ajs@commandprompt.com> writes:
> On Mon, Jul 21, 2008 at 09:32:57PM -0400, Tom Lane wrote:
>> "Marko Kreen" <markokr@gmail.com> writes:
>>> 2.  If cluster connection strings do not have 'user=' key,
>>> ' user=' || current_username() is appended to it.
>> 
>> Cool, I missed that.  At minimum the documentation has to explain this
>> point and emphasize the security implications.  Is it a good idea
>> to allow user= in the cluster strings at all?

> I wondered about this myself.  Is there anything at all preventing me
> from doing 'user=' for some other user?  If not. . .

I think the assumption is that the cluster connection info would be set
up by a superuser.  However, if there's any way for a non-superuser to
subvert the info returned by the plproxy configuration functions, you
got trouble.  So a lot would depend on how carefully those are coded.
        regards, tom lane

Re: [patch] plproxy v2

From

"Marko Kreen"

Date:

22 July 2008, 12:52:33

On 7/22/08, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> "Marko Kreen" <markokr@gmail.com> writes:
>  > On 7/21/08, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >> I looked through this a bit, and my principal reaction was "what are
>  >> the security implications?
> > There are 2 aspects to it:
>  > 1.  Function can be created only by superuser.
>
> What I'm concerned about is who they can be *called* by.  I'd be happier
>  if the default behavior was that there was no public execute privilege
>  for plproxy functions.
>
>  I think right now that could be enforced by having plproxy's validator
>  procedure replace any null proacl entry with something that explicitly
>  refuses public execute.  That's a bit of a hack though.  Maybe it'd be
>  worth inventing per-PL default ACLs, instead of having a
>  one-size-fits-all policy?

Note1 that if user (admin) wants he can also do user filtering/mapping
in plproxy.* functions.

Note2 - instead of restricting privileges on actual functions, we could
instead restrict privilege on the 2 functions under 'plproxy' schema,
or directly on schema.  Seems simpler.  Eg. create simple default
installation with REVOKE ALL FROM PUBLIC.

>  > 2.  If cluster connection strings do not have 'user=' key,
>  >     ' user=' || current_username() is appended to it.
>
>
> Cool, I missed that.  At minimum the documentation has to explain this
>  point and emphasize the security implications.

Ok.

>  Is it a good idea
>  to allow user= in the cluster strings at all?

I think so - if the plproxy database itself already is main point of
authentication, both can-connect and can-execute sense, then it's
good to avoid complicating setup and send queries away under single
user that has minimal rights on partition dbs and can only execute
requested functions.

>  >     Also, plroxy does
>  >     _nothing_ with passwords.  That means the password for remote
>  >     connection must be in postgres user's .pgpass,
>
>
> That seems *exactly* backwards, because putting the password in postgres
>  user's .pgpass is as good as disabling password auth altogether.
>  Consider that it would also hand all the keys to the kingdom over to
>  someone who had access to dblink on the same machine (not even the same
>  cluster, so long as it was run by the same postgres user!).

Good point.  Some ideas for password handling:

1. Require that user always provider both username and password in  plproxy.get_cluster_partitions().  We could add
separatefunction  for that or add fields to plproxy.get_partitions(), although  this is not necessary - user can add
themsimply to connect string.

  Main problems with this is that maybe you don't want to show the  passwords to anyone who can execute plproxy.*
functions?

2. Let PL/Proxy fetch user password hash from pg_shadow, add API to libpq  to use the hash on authentication instead
plaintextpassword.  This ofcourse expects that remote server uses same auth method as  current one.

  Despite appearance it does not have security problems - the hashes  are already equivalent to plaintext password.

>  > But I don't think plproxy can and should protect dumb admins who
>  > create remote_exec(sql) function and allow untrusted users to
>  > execute it.
>
> We regularly get beat up about any aspect of our security apparatus
>  that isn't "secure by default".  This definitely isn't, and from
>  a PR point of view (if nothing else) that doesn't seem a good idea.
>
>  I repeat that I don't feel comfortable in the least with plproxy's
>  security model.

Btw, I'm very thankful for your review.  I would really like improve the
security of plproxy whatever the merge decision will be, so hopefully
we can discuss it further.

To make discussion easier, here are list of possible problems/fixes
discussed thus far (as I see):

Problems:

- restrict users who can access remote dbs by default.
- avoid spreading passwords too far. - .pgpass gives them to any libpq client - inside connect string they are visible
tocalling user   (although only his own?)

Documentation/default setup fixes:

1. Restrict access to 'plproxy' schema or functions under that schema.  Only users that have grants can use plproxy
functions.

2. Create default setup that does user filtering/mapping by default.  Have the permissions on the functions and tables
carefullytuned  to allow minimal access.

3. Make the default setup also handle passwords from tables.  So instead user adding password handling insecurely, he
canuse it  or remove it from already secure setup.

Code fixes:

4. Create plproxy functions without execute permissions by default.  (Seems unnecessary as 1, 2 already give that?)

5. Let plproxy use user password hash directly from pg_shadow.  (Unless user= or password= is given on connection
string?)

Seems like restricting access is easy, but only 5) gives secure
password handling.

-- 
marko

Re: [patch] plproxy v2

From

Tom Lane

Date:

22 July 2008, 12:56:26

"Marko Kreen" <markokr@gmail.com> writes:
> And user can execute only pre-determines queries/functions on system2.

If that were actually the case then the security issue wouldn't loom
quite so large, but the dynamic_query example in the plproxy regression
tests provides a perfect example of how to ruin your security.

> Do you still see a big hole?

Truck-sized, at least.

The complaint here is not that it's impossible to use plproxy securely;
the complaint is that it's so very easy to use it insecurely.
        regards, tom lane

Re: [patch] plproxy v2

From

"Marko Kreen"

Date:

23 July 2008, 06:48:30

On 7/22/08, Marko Kreen <markokr@gmail.com> wrote:
> On 7/22/08, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>  > "Marko Kreen" <markokr@gmail.com> writes:
>  >  >     Also, plroxy does
>  >  >     _nothing_ with passwords.  That means the password for remote
>  >  >     connection must be in postgres user's .pgpass,
>  >
>  >
>  > That seems *exactly* backwards, because putting the password in postgres
>  >  user's .pgpass is as good as disabling password auth altogether.
>  >  Consider that it would also hand all the keys to the kingdom over to
>  >  someone who had access to dblink on the same machine (not even the same
>  >  cluster, so long as it was run by the same postgres user!).
>
> Good point.

I happened to take a look at dblink and I'm not convinced anymore.

Any untrusted PL can be used for remote calls and in all cases,
including PL/Proxy, the superuser must create the function that
specifies both connect string and query.  To allow regular user
specify either connect string or query, the superuser must take
explicit action by coding the function that way.

But for dblink, the *normal* mode of action is that regular user
can (heh, *must*) specify both connect string and query...
And the require-password-hack for dblink makes things only worse
as it obfuscates the security implications of using it.

So indeed, there is a "hole a truck can drive through" and
it's called 'dblink'.  And I don't I should make life miserable
for PL/Proxy users just because core has merged insecure module.

Currently the PL/Proxy acts the same as any other libpq-using PL,
using .pgpass is quite natural for them and I don't see any reason
to lock it down further for no good reason.

I know you had a fiasco with dblink security already, but PL/Proxy
is not in dblink camp.  It's in untrusted-PL camp.

So, now we have clear technical argument for immediate merge -
as a replacement for dblink, as it's more secure and better designed.

And only thing PL/Proxy really needs is more documentation, there needs
to be list of various ways to set it up and security implications of each.

-- 
marko

Re: [patch] plproxy v2

From

Hannu Krosing

Date:

24 July 2008, 13:05:34

On Tue, 2008-07-22 at 11:25 -0400, Tom Lane wrote:
> "Marko Kreen" <markokr@gmail.com> writes:
> > And user can execute only pre-determines queries/functions on system2.
> 
> If that were actually the case then the security issue wouldn't loom
> quite so large, but the dynamic_query example in the plproxy regression
> tests provides a perfect example of how to ruin your security.

The idea is to allow the pl/proxy user only access to the needed
functions and nothing else on the remote db side.

dynamic_query ruins your security, if your pl/proxy remote user has too
much privileges.

> > Do you still see a big hole?
> 
> Truck-sized, at least.
> 
> The complaint here is not that it's impossible to use plproxy securely;
> the complaint is that it's so very easy to use it insecurely.

You mean "easy" like it is very easy to always use your OS as root ?

On Unix this is fixed by stating it as a bad idea in docs (and numerous
books), on windows you have a "privileged" checkbox when creating new
users.

---------------
Hannu