Thread: [patch] plproxy v2
We have written a FAQ where common issues are collected; it also discusses design decisions and why concentrating only on functions is a good idea:

  http://plproxy.projects.postgresql.org/faq/faq.html

The patch itself is attached, and is also available here:

  http://plproxy.projects.postgresql.org/plproxy-v2.diff.gz

Changes since v1:

- dynamic record patch from Lei Yonghua; this makes functions that require result types specified with an AS clause work
- accept int2/4/8 from the hash function
- removed get_cluster_config()
- draft SGML documentation
- disabled binary I/O by default
- parser.y: removed use of alloca, defined malloc/free to palloc/pfree
- scanner.l: fixed a crash when palloc() throws an exception, by reinitializing flex properly
- made installcheck work
- removed backwards-compatibility code
- use bool instead of bitfields to conform to Postgres coding style

Todo:

- the docs should be fleshed out more
- should the regression tests be moved to src/test/regress?
- make GUC vars for binary I/O and connection_lifetime?

I mentioned that I planned to remove SELECT/CONNECT too. Now that I've thought about it more, it seems better to keep them, as they give additional flexibility.

CONNECT - allows testing plproxy without creating the plproxy schema. Good for quick hacks and RPCs. The main problem is that as soon as a user starts to have several environments - dev/test/live - he will get hurt, but that could be solved by suggesting in the docs how to avoid it, e.g. by using service= or fake hostnames.

SELECT - the main reason to keep it is to allow calling different functions, or functions with different argument/result names or types.

-- marko
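[Editor's note] For readers new to PL/Proxy, here is a rough sketch of how the two statements under discussion appear in function bodies. The object names and connection strings are invented, and the exact syntax rules should be checked against the FAQ and the draft docs:

    -- CONNECT: proxy the call to a single, explicitly named database;
    -- by default a same-named function is assumed to exist and is
    -- called on the remote side.
    CREATE FUNCTION get_user_email(i_username text)
    RETURNS text AS $$
        CONNECT 'dbname=userdb host=db2.example.com';
    $$ LANGUAGE plproxy;

    -- SELECT: instead of calling a same-named remote function,
    -- run an explicit query on the target database.
    CREATE FUNCTION get_user_email2(i_username text)
    RETURNS SETOF text AS $$
        CONNECT 'dbname=userdb host=db2.example.com';
        SELECT email FROM users WHERE username = i_username;
    $$ LANGUAGE plproxy;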
I took the liberty of changing the status from WIP to Pending Review, because the pending cleanups do not affect reviewing or committing. And anyway, given the patch size I expect reviewers to request further changes, so I'd like to push the tiny stuff into the final version of the patch.

-- marko
On Sat, 2008-06-28 at 16:36 +0300, Marko Kreen wrote:
> I mentioned that I planned to remove SELECT/CONNECT too. Now that
> I've thought about it more, it seems better to keep them, as they
> give additional flexibility.

I very much like PL/Proxy and support your vision. Including the features of PL/Proxy in core seems like a great idea to me.

If we have just a couple of commands, would it be easier to include those features as some additional attributes on pg_proc? That way we could include the features in a more native way, similar to the way we have integrated text search, without needing a plugin language at all.

  CREATE CLUSTER foo ...

  CREATE FUNCTION bar() CLUSTER foo RUN ON ANY ...

If we did that, we might also include a similar proxy feature for tables, making the feature exciting for more users than just those who are able to implement all their logic through functions. It would also remove the need for a specific SELECT command in PL/Proxy.

  CREATE TABLE bar CLUSTER foo RUN ON ANY ...

If we're running a SELECT and all tables accessed run on the same cluster, we ship the whole SQL statement according to the RUN ON clause. It would effectively bring some parts of dblink into core.

If not all tables are on the same cluster, we throw an error in this release, but in later releases we might introduce distributed-join features and full distributed DML support.

Having the PL/Proxy features available via the catalog will allow a clear picture of what runs where without parsing the function text. It will also allow things like a pg_dump of all objects relating to a cluster.

Adding this feature for tables would be interesting with Hot Standby, since it would allow you to offload SELECT statements onto the standby automatically.

This would be considerably easier to integrate than text search was.

-- 
Simon Riggs
www.2ndQuadrant.com
PostgreSQL Training, Services and Support
On Tue, 2008-07-08 at 10:21 +0100, Simon Riggs wrote:
> On Sat, 2008-06-28 at 16:36 +0300, Marko Kreen wrote:
>
> I very much like PL/Proxy and support your vision. Including the
> features of PL/Proxy in core seems like a great idea to me.
>
> Adding this feature for tables would be interesting with Hot Standby,
> since it would allow you to offload SELECT statements onto the standby
> automatically.
>
> This would be considerably easier to integrate than text search was.

First let me say that I too enjoy PL/Proxy quite a bit. However, I don't think it needs to be in core. I wouldn't mind seeing it in contrib (or better yet modules/, should we ever get around to that).

Sincerely,

Joshua D. Drake
On 7/8/08, Simon Riggs <simon@2ndquadrant.com> wrote:
> On Sat, 2008-06-28 at 16:36 +0300, Marko Kreen wrote:
> > I mentioned that I planned to remove SELECT/CONNECT too. Now that
> > I've thought about it more, it seems better to keep them, as they
> > give additional flexibility.
>
> I very much like PL/Proxy and support your vision. Including the
> features of PL/Proxy in core seems like a great idea to me.
>
> If we have just a couple of commands, would it be easier to include
> those features as some additional attributes on pg_proc? That way we
> could include the features in a more native way, similar to the way we
> have integrated text search, without needing a plugin language at all.
>
>   CREATE CLUSTER foo ...
>
>   CREATE FUNCTION bar() CLUSTER foo RUN ON ANY ...
>
> If we did that, we might also include a similar proxy feature for
> tables, making the feature exciting for more users than just those who
> are able to implement all their logic through functions. It would also
> remove the need for a specific SELECT command in PL/Proxy.
>
>   CREATE TABLE bar CLUSTER foo RUN ON ANY ...
>
> If we're running a SELECT and all tables accessed run on the same
> cluster, we ship the whole SQL statement according to the RUN ON clause.
> It would effectively bring some parts of dblink into core.
>
> If not all tables are on the same cluster, we throw an error in this
> release, but in later releases we might introduce distributed-join
> features and full distributed DML support.
>
> Having the PL/Proxy features available via the catalog will allow a
> clear picture of what runs where without parsing the function text. It
> will also allow things like a pg_dump of all objects relating to a
> cluster.
>
> Adding this feature for tables would be interesting with Hot Standby,
> since it would allow you to offload SELECT statements onto the standby
> automatically.
>
> This would be considerably easier to integrate than text search was.

Interesting proposal.

First I want to say: we can forget the SELECT/CONNECT statements when discussing this approach. They are in because they were easy to add and gave some additional flexibility, but they are not important. If they don't fit some new approach, there is no problem dropping them.

So that leaves functions of the form:

  CLUSTER <expr>;
  RUN ON <expr>;

and potentially SPREAD BY, as discussed in:

  http://lists.pgfoundry.org/pipermail/plproxy-users/2008-June/000093.html

which sends different arguments to different partitions. I'm not yet sure it's a worthwhile addition, but I work mostly on OLTP databases and that feature would target OLAP ones, so I'll let others decide.

Now a few technical points about your proposal:

- One feature the current function-based configuration approach gives us is that we can manage cluster configuration centrally and replicate it to the actual proxy databases. This is something I would like to keep. It could also be solved by using a plain table or functions behind the scenes.

- How about CREATE REMOTE FUNCTION / TABLE ...; for syntax?

- Currently both the hash and the cluster selection expressions can be quite free-form, so parsing them out into some pg_proc field would not actually be much help.

And some philosophical points:

- PL/Proxy's main use case is complex read-write transactions in an OLTP setting, but remote tables/views target simple read-only transactions with free-form queries.

- PL/Proxy has a concrete argument list and free-form cluster and partition selection. Remote tables have free-form arguments; maybe they want more rigid cluster/partition selection? If the syntax and backend implementation can be merged, that's good, but it should not be forced.

So before we start adding syntax to core, maybe it would be good to have a concrete idea of how remote tables will look and what representation they want for a cluster, especially if you want to do stuff like distributed joins.

OTOH, if you say that the current PL/Proxy approach fits remote tables as well, I'm not against doing it at the SQL level.

-- marko
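[Editor's note] For reference, a minimal PL/Proxy function of the CLUSTER/RUN ON form mentioned above might look roughly like this. Cluster, function, and column names are invented for illustration, and the partition databases are assumed to carry a same-named function:

    -- Route the call to one partition of cluster 'userdb', chosen by
    -- hashing the username argument; the same-named function is then
    -- executed on that partition.
    CREATE FUNCTION get_user_email(i_username text)
    RETURNS SETOF text AS $$
        CLUSTER 'userdb';
        RUN ON hashtext(i_username);
    $$ LANGUAGE plproxy;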
On 7/8/08, Joshua D. Drake <jd@commandprompt.com> wrote:
> On Tue, 2008-07-08 at 10:21 +0100, Simon Riggs wrote:
> > On Sat, 2008-06-28 at 16:36 +0300, Marko Kreen wrote:
> >
> > I very much like PL/Proxy and support your vision. Including the
> > features of PL/Proxy in core seems like a great idea to me.
> >
> > Adding this feature for tables would be interesting with Hot Standby,
> > since it would allow you to offload SELECT statements onto the standby
> > automatically.
> >
> > This would be considerably easier to integrate than text search was.
>
> First let me say that I too enjoy PL/Proxy quite a bit. However, I don't
> think it needs to be in core. I wouldn't mind seeing it in contrib (or
> better yet modules/, should we ever get around to that).

I'm not against contrib/, considering that the docs are now nicely integrated, but then, what's the difference between src/pl/ and contrib/?

OTOH, if you argue LANGUAGE plproxy vs. CREATE REMOTE FUNCTION, then that's a different matter. It seems to me we should do REMOTE FUNCTION after, not before, REMOTE TABLE, as the table/view implementation needs to dictate the actual details.

-- marko
On Tue, 2008-07-08 at 18:29 +0300, Marko Kreen wrote:
> and potentially SPREAD BY, as discussed in:
>
>   http://lists.pgfoundry.org/pipermail/plproxy-users/2008-June/000093.html

That *sounds* cool, but it's just the first part of the implementation of a massively parallel executor. You'll quickly end up wanting to do something else as well. Redistributing data is the hard part of a hard problem. I'd steer clear of that. Skytools are good because they do simple things well.

-- 
Simon Riggs
www.2ndQuadrant.com
PostgreSQL Training, Services and Support
On Tue, 2008-07-08 at 18:43 +0300, Marko Kreen wrote:
> On 7/8/08, Joshua D. Drake <jd@commandprompt.com> wrote:
> > First let me say that I too enjoy PL/Proxy quite a bit. However, I don't
> > think it needs to be in core. I wouldn't mind seeing it in contrib (or
> > better yet modules/, should we ever get around to that).
>
> I'm not against contrib/, considering that the docs are now nicely
> integrated, but then, what's the difference between src/pl/ and contrib/?

I am actually against adding to the grammar, which is what Simon was suggesting. If it wants to go into src/pl I wouldn't have a problem with that. Sorry for not being more clear.

Joshua D. Drake
On Tue, 2008-07-08 at 08:58 -0700, Joshua D. Drake wrote:
> On Tue, 2008-07-08 at 18:43 +0300, Marko Kreen wrote:
> > On 7/8/08, Joshua D. Drake <jd@commandprompt.com> wrote:
> > > First let me say that I too enjoy PL/Proxy quite a bit. However, I don't
> > > think it needs to be in core. I wouldn't mind seeing it in contrib (or
> > > better yet modules/, should we ever get around to that).
> >
> > I'm not against contrib/, considering that the docs are now nicely
> > integrated, but then, what's the difference between src/pl/ and contrib/?
>
> I am actually against adding to the grammar

Why?

-- 
Simon Riggs
www.2ndQuadrant.com
PostgreSQL Training, Services and Support
On 7/8/08, Simon Riggs <simon@2ndquadrant.com> wrote:
> On Tue, 2008-07-08 at 18:29 +0300, Marko Kreen wrote:
> > and potentially SPREAD BY, as discussed in:
> >
> >   http://lists.pgfoundry.org/pipermail/plproxy-users/2008-June/000093.html
>
> That *sounds* cool, but it's just the first part of the implementation of
> a massively parallel executor. You'll quickly end up wanting to do
> something else as well. Redistributing data is the hard part of a hard
> problem. I'd steer clear of that. Skytools are good because they do simple
> things well.

Well, for PL/Proxy it would be the _last_ part. Yes, the user could then build a parallel OLAP executor, but all of that would be up to the user. PL/Proxy itself will stay dumb and simple; it would not need to do any guesswork, as all the data will be provided by the user.

The amount of code needed to make SPREAD work would be minimal, mostly refactoring of existing code, so it fits the current design.

The point is: PL/Proxy already executes a single query with the same arguments in parallel. With the SPREAD feature it could execute a single query with different arguments in parallel. The next step, executing different queries in parallel, does not make sense for PL/Proxy, as its main concept is function calls, not queries.

But of course we can decide we don't want to go that way, and that's OK too.

-- marko
On 7/8/08, Joshua D. Drake <jd@commandprompt.com> wrote:
> On Tue, 2008-07-08 at 18:43 +0300, Marko Kreen wrote:
> > On 7/8/08, Joshua D. Drake <jd@commandprompt.com> wrote:
> > > First let me say that I too enjoy PL/Proxy quite a bit. However, I don't
> > > think it needs to be in core. I wouldn't mind seeing it in contrib (or
> > > better yet modules/, should we ever get around to that).
> >
> > I'm not against contrib/, considering that the docs are now nicely
> > integrated, but then, what's the difference between src/pl/ and contrib/?
>
> I am actually against adding to the grammar, which is what Simon was
> suggesting. If it wants to go into src/pl I wouldn't have a problem with
> that. Sorry for not being more clear.

The current patch is for src/pl/. Diffstat for patch v2:

 doc/src/sgml/filelist.sgml                          |    1
 doc/src/sgml/plproxy.sgml                           |  221 ++++++
 doc/src/sgml/postgres.sgml                          |    1
 src/include/catalog/pg_pltemplate.h                 |    1
 src/pl/Makefile                                     |    2
 src/pl/plproxy/Makefile                             |   89 ++
 src/pl/plproxy/cluster.c                            |  469 +++++++++++++
 src/pl/plproxy/execute.c                            |  724 +++++++++++++++++++++
 src/pl/plproxy/expected/plproxy_clustermap.out      |   71 ++
 src/pl/plproxy/expected/plproxy_dynamic_record.out  |   51 +
 src/pl/plproxy/expected/plproxy_errors.out          |   66 +
 src/pl/plproxy/expected/plproxy_init.out            |    2
 src/pl/plproxy/expected/plproxy_many.out            |  116 +++
 src/pl/plproxy/expected/plproxy_select.out          |   37 +
 src/pl/plproxy/expected/plproxy_test.out            |  312 +++++++++
 src/pl/plproxy/function.c                           |  479 +++++++++++++
 src/pl/plproxy/main.c                               |  214 ++++++
 src/pl/plproxy/parser.y                             |  203 +++++
 src/pl/plproxy/plproxy.h                            |  301 ++++++++
 src/pl/plproxy/poll_compat.c                        |  140 ++++
 src/pl/plproxy/poll_compat.h                        |   58 +
 src/pl/plproxy/query.c                              |  316 +++++++++
 src/pl/plproxy/result.c                             |  222 ++++++
 src/pl/plproxy/rowstamp.h                           |   27
 src/pl/plproxy/scanner.l                            |  320 +++++++++
 src/pl/plproxy/sql/plproxy_clustermap.sql           |   56 +
 src/pl/plproxy/sql/plproxy_dynamic_record.sql       |   43 +
 src/pl/plproxy/sql/plproxy_errors.sql               |   63 +
 src/pl/plproxy/sql/plproxy_init.sql                 |   57 +
 src/pl/plproxy/sql/plproxy_many.sql                 |   66 +
 src/pl/plproxy/sql/plproxy_select.sql               |   37 +
 src/pl/plproxy/sql/plproxy_test.sql                 |  200 +++++
 src/pl/plproxy/type.c                               |  336 +++++++++
 33 files changed, 5299 insertions(+), 2 modifications(!)

-- marko
"Marko Kreen" <markokr@gmail.com> writes: > [ plproxy ] I looked through this a bit, and my principal reaction was "what are the security implications?" It seems like it'd be very easy to create functions that allow untrusted users to execute arbitrary SQL on other databases in the plproxy cluster. As far as I saw there was no privilege-checking within plproxy itself, you were just relying on SQL-level permissions checking --- so even though plproxy functions can only be *created* by superusers, by default they can be *used* by anybody. So it'd take a great deal of care to avoid making unintentional security holes. I'm not sure about a good solution to this problem, but I think it needs to be solved before plproxy can be recommended for widespread use. The first thought that comes to mind is to somehow propagate the userid on the calling server to the execution on the remote, so that a user can't get more privilege on the remote than if he were executing there directly. I'm imagining that the definition of a cluster would include a map from local to remote userids (and thereby imply that anyone not explicitly listed can't do remote access at all). regards, tom lane
On 7/21/08, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> "Marko Kreen" <markokr@gmail.com> writes:
> > [ plproxy ]
>
> I looked through this a bit, and my principal reaction was "what are
> the security implications?" It seems like it'd be very easy to create
> functions that allow untrusted users to execute arbitrary SQL on
> other databases in the plproxy cluster. As far as I saw there was
> no privilege checking within plproxy itself; you were just relying
> on SQL-level permissions checking --- so even though plproxy functions
> can only be *created* by superusers, by default they can be *used* by
> anybody. So it'd take a great deal of care to avoid making
> unintentional security holes.
>
> I'm not sure about a good solution to this problem, but I think it needs
> to be solved before plproxy can be recommended for widespread use.
> The first thought that comes to mind is to somehow propagate the userid
> on the calling server to the execution on the remote, so that a user
> can't get more privilege on the remote than if he were executing there
> directly. I'm imagining that the definition of a cluster would include
> a map from local to remote userids (and thereby imply that anyone not
> explicitly listed can't do remote access at all).

There are two aspects to it:

1. Functions can be created only by a superuser.

2. If the cluster connection strings do not have a 'user=' key, ' user=' || current_username() is appended to them. Note that connections are per-backend, not shared.

Also, plproxy does _nothing_ with passwords. That means the password for a remote connection must be in the postgres user's .pgpass, or there is a pooler between plproxy and the remote database that handles passwords.

What else do you see is needed?

I'm not sure a map is a good idea; it seems to create unnecessary complexity. Of course, it can be done. But I don't think plproxy can and should protect dumb admins who create a remote_exec(sql) function and allow untrusted users to execute it.

-- marko
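[Editor's note] To illustrate point 2 with made-up values: if the cluster configuration returns a partition connect string such as

    dbname=part00 host=p0.example.com

then a call made while logged in as role bob is effectively sent using

    dbname=part00 host=p0.example.com user=bob

so the remote server still authenticates and authorizes the original role name, unless the cluster configuration supplies an explicit user= key.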
"Marko Kreen" <markokr@gmail.com> writes: > On 7/21/08, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> I looked through this a bit, and my principal reaction was "what are >> the security implications? > There are 2 aspects to it: > 1. Function can be created only by superuser. What I'm concerned about is who they can be *called* by. I'd be happier if the default behavior was that there was no public execute privilege for plproxy functions. I think right now that could be enforced by having plproxy's validator procedure replace any null proacl entry with something that explicitly refuses public execute. That's a bit of a hack though. Maybe it'd be worth inventing per-PL default ACLs, instead of having a one-size-fits-all policy? > 2. If cluster connection strings do not have 'user=' key, > ' user=' || current_username() is appended to it. Cool, I missed that. At minimum the documentation has to explain this point and emphasize the security implications. Is it a good idea to allow user= in the cluster strings at all? > Also, plroxy does > _nothing_ with passwords. That means the password for remote > connection must be in postgres user's .pgpass, That seems *exactly* backwards, because putting the password in postgres user's .pgpass is as good as disabling password auth altogether. Consider that it would also hand all the keys to the kingdom over to someone who had access to dblink on the same machine (not even the same cluster, so long as it was run by the same postgres user!). > But I don't think plproxy can and should protect dumb admins who > create remote_exec(sql) function and allow untrusted users to > execute it. We regularly get beat up about any aspect of our security apparatus that isn't "secure by default". This definitely isn't, and from a PR point of view (if nothing else) that doesn't seem a good idea. I repeat that I don't feel comfortable in the least with plproxy's security model. regards, tom lane
On Mon, Jul 21, 2008 at 09:32:57PM -0400, Tom Lane wrote:
> "Marko Kreen" <markokr@gmail.com> writes:
> > 2. If the cluster connection strings do not have a 'user=' key,
> > ' user=' || current_username() is appended to them.
>
> Cool, I missed that. At minimum the documentation has to explain this
> point and emphasize the security implications. Is it a good idea
> to allow user= in the cluster strings at all?

I wondered about this myself. Is there anything at all preventing me from doing 'user=' for some other user? If not...

> > Also, plproxy does _nothing_ with passwords. That means the password
> > for a remote connection must be in the postgres user's .pgpass,
>
> That seems *exactly* backwards, because putting the password in the
> postgres user's .pgpass is as good as disabling password auth altogether.

...this means that any user on system1 for which there is at least one user on system2 with plproxy access automatically also has that access on system2. (Plus what Tom noted.)

> We regularly get beat up about any aspect of our security apparatus
> that isn't "secure by default". This definitely isn't, and from
> a PR point of view (if nothing else) that doesn't seem a good idea.

I'm less worried about the PR, and more worried about the truck-sized hole this opens in any authentication controls. It seems to me that it's a fairly serious problem.

A

-- 
Andrew Sullivan
ajs@commandprompt.com
+1 503 667 4564 x104
http://www.commandprompt.com/
On 7/22/08, Andrew Sullivan <ajs@commandprompt.com> wrote:
> On Mon, Jul 21, 2008 at 09:32:57PM -0400, Tom Lane wrote:
> > "Marko Kreen" <markokr@gmail.com> writes:
> > > 2. If the cluster connection strings do not have a 'user=' key,
> > > ' user=' || current_username() is appended to them.
> >
> > Cool, I missed that. At minimum the documentation has to explain this
> > point and emphasize the security implications. Is it a good idea
> > to allow user= in the cluster strings at all?
>
> I wondered about this myself. Is there anything at all preventing me
> from doing 'user=' for some other user? If not...

For that you need to overwrite the plproxy.get_cluster_partitions() function or the data it operates on. I don't see any hole in this, unless explicitly created.

> > > Also, plproxy does _nothing_ with passwords. That means the password
> > > for a remote connection must be in the postgres user's .pgpass,
> >
> > That seems *exactly* backwards, because putting the password in the
> > postgres user's .pgpass is as good as disabling password auth altogether.
>
> ...this means that any user on system1 for which there is at least
> one user on system2 with plproxy access automatically also has that
> access on system2. (Plus what Tom noted.)

For that, system2 needs to be added as a partition to a cluster, or specified explicitly in a CONNECT statement. And the user can execute only pre-determined queries/functions on system2.

> > We regularly get beat up about any aspect of our security apparatus
> > that isn't "secure by default". This definitely isn't, and from
> > a PR point of view (if nothing else) that doesn't seem a good idea.
>
> I'm less worried about the PR, and more worried about the truck-sized
> hole this opens in any authentication controls. It seems to me that
> it's a fairly serious problem.

Do you still see a big hole?

-- marko
Andrew Sullivan <ajs@commandprompt.com> writes:
> On Mon, Jul 21, 2008 at 09:32:57PM -0400, Tom Lane wrote:
>> "Marko Kreen" <markokr@gmail.com> writes:
>>> 2. If the cluster connection strings do not have a 'user=' key,
>>> ' user=' || current_username() is appended to them.
>>
>> Cool, I missed that. At minimum the documentation has to explain this
>> point and emphasize the security implications. Is it a good idea
>> to allow user= in the cluster strings at all?

> I wondered about this myself. Is there anything at all preventing me
> from doing 'user=' for some other user? If not...

I think the assumption is that the cluster connection info would be set up by a superuser. However, if there's any way for a non-superuser to subvert the info returned by the plproxy configuration functions, you got trouble. So a lot would depend on how carefully those are coded.

regards, tom lane
On 7/22/08, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> "Marko Kreen" <markokr@gmail.com> writes:
> > On 7/21/08, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >> I looked through this a bit, and my principal reaction was "what are
> >> the security implications?"
>
> > There are two aspects to it:
> > 1. Functions can be created only by a superuser.
>
> What I'm concerned about is who they can be *called* by. I'd be happier
> if the default behavior was that there was no public execute privilege
> for plproxy functions.
>
> I think right now that could be enforced by having plproxy's validator
> procedure replace any null proacl entry with something that explicitly
> refuses public execute. That's a bit of a hack, though. Maybe it'd be
> worth inventing per-PL default ACLs, instead of having a
> one-size-fits-all policy?

Note 1: if the user (admin) wants, he can also do user filtering/mapping in the plproxy.* functions.

Note 2: instead of restricting privileges on the actual functions, we could instead restrict privileges on the two functions under the 'plproxy' schema, or directly on the schema. That seems simpler, e.g. ship a simple default installation with REVOKE ALL FROM PUBLIC.

> > 2. If the cluster connection strings do not have a 'user=' key,
> > ' user=' || current_username() is appended to them.
>
> Cool, I missed that. At minimum the documentation has to explain this
> point and emphasize the security implications.

Ok.

> Is it a good idea to allow user= in the cluster strings at all?

I think so. If the plproxy database itself is already the main point of authentication, in both the can-connect and can-execute sense, then it is good to avoid complicating the setup and to send queries away under a single user that has minimal rights on the partition dbs and can only execute the requested functions.

> > Also, plproxy does _nothing_ with passwords. That means the password
> > for a remote connection must be in the postgres user's .pgpass,
>
> That seems *exactly* backwards, because putting the password in the
> postgres user's .pgpass is as good as disabling password auth altogether.
> Consider that it would also hand all the keys to the kingdom over to
> someone who had access to dblink on the same machine (not even the same
> cluster, so long as it was run by the same postgres user!).

Good point. Some ideas for password handling:

1. Require that the user always provide both username and password in plproxy.get_cluster_partitions(). We could add a separate function for that or add fields to plproxy.get_partitions(), although this is not necessary - the user can simply add them to the connect string. The main problem with this is that maybe you don't want to show the passwords to anyone who can execute the plproxy.* functions?

2. Let PL/Proxy fetch the user's password hash from pg_shadow, and add an API to libpq to use the hash on authentication instead of the plaintext password. This of course expects that the remote server uses the same auth method as the current one. Despite appearances it does not have security problems - the hashes are already equivalent to the plaintext password.

> > But I don't think plproxy can and should protect dumb admins who
> > create a remote_exec(sql) function and allow untrusted users to
> > execute it.
>
> We regularly get beat up about any aspect of our security apparatus
> that isn't "secure by default". This definitely isn't, and from
> a PR point of view (if nothing else) that doesn't seem a good idea.
>
> I repeat that I don't feel comfortable in the least with plproxy's
> security model.

Btw, I'm very thankful for your review.
I would really like to improve the security of plproxy whatever the merge decision will be, so hopefully we can discuss it further. To make the discussion easier, here is a list of the possible problems/fixes discussed thus far (as I see it):

Problems:

- restrict users who can access remote dbs by default
- avoid spreading passwords too far: .pgpass gives them to any libpq client; inside the connect string they are visible to the calling user (although only his own?)

Documentation/default setup fixes:

1. Restrict access to the 'plproxy' schema or the functions under that schema. Only users that have grants can use plproxy functions.

2. Create a default setup that does user filtering/mapping by default. Have the permissions on the functions and tables carefully tuned to allow minimal access.

3. Make the default setup also handle passwords from tables. So instead of the user adding password handling insecurely, he can use it or remove it from an already secure setup.

Code fixes:

4. Create plproxy functions without execute permissions by default. (Seems unnecessary, as 1 and 2 already give that?)

5. Let plproxy use the user's password hash directly from pg_shadow. (Unless user= or password= is given in the connection string?)

Restricting access seems easy, but only 5 gives secure password handling.

-- marko
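[Editor's note] To make fixes 1 and 2 above concrete, here is a rough sketch of what such a locked-down default setup could look like. The table, role, cluster, and host names are invented, and the exact plproxy.get_cluster_partitions() signature should be checked against the draft docs; this only shows the shape of the idea:

    -- Fix 1: nobody can use the plproxy configuration schema unless granted.
    REVOKE ALL ON SCHEMA plproxy FROM PUBLIC;
    GRANT USAGE ON SCHEMA plproxy TO plproxy_users;

    -- Fix 2: keep a per-user mapping to the remote login, so anyone not
    -- listed cannot go remote at all (along the lines Tom suggested).
    CREATE TABLE plproxy.user_map (
        local_user  name PRIMARY KEY,
        remote_user text NOT NULL
    );

    CREATE OR REPLACE FUNCTION plproxy.get_cluster_partitions(cluster_name text)
    RETURNS SETOF text AS $$
    DECLARE
        r_user text;
    BEGIN
        SELECT remote_user INTO r_user
          FROM plproxy.user_map
         WHERE local_user = current_user;
        IF NOT FOUND THEN
            RAISE EXCEPTION 'user % may not use cluster %',
                            current_user, cluster_name;
        END IF;
        -- Partition list with an explicit user= key; passwords still come
        -- from .pgpass or a pooler, as discussed above.
        RETURN NEXT 'dbname=part00 host=p0.example.com user=' || r_user;
        RETURN NEXT 'dbname=part01 host=p1.example.com user=' || r_user;
        RETURN;
    END;
    $$ LANGUAGE plpgsql SECURITY DEFINER;

Whether the mapping lives in a table, in the function itself, or in the cluster definition is an open design question; the point is only that a default installation can refuse remote access for unlisted users.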
"Marko Kreen" <markokr@gmail.com> writes: > And user can execute only pre-determines queries/functions on system2. If that were actually the case then the security issue wouldn't loom quite so large, but the dynamic_query example in the plproxy regression tests provides a perfect example of how to ruin your security. > Do you still see a big hole? Truck-sized, at least. The complaint here is not that it's impossible to use plproxy securely; the complaint is that it's so very easy to use it insecurely. regards, tom lane
On 7/22/08, Marko Kreen <markokr@gmail.com> wrote:
> On 7/22/08, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > "Marko Kreen" <markokr@gmail.com> writes:
> > > Also, plproxy does _nothing_ with passwords. That means the password
> > > for a remote connection must be in the postgres user's .pgpass,
> >
> > That seems *exactly* backwards, because putting the password in the
> > postgres user's .pgpass is as good as disabling password auth altogether.
> > Consider that it would also hand all the keys to the kingdom over to
> > someone who had access to dblink on the same machine (not even the same
> > cluster, so long as it was run by the same postgres user!).
>
> Good point.

I happened to take a look at dblink and I'm not convinced anymore.

Any untrusted PL can be used for remote calls, and in all cases, including PL/Proxy, the superuser must create the function that specifies both the connect string and the query. To allow a regular user to specify either the connect string or the query, the superuser must take explicit action by coding the function that way.

But for dblink, the *normal* mode of operation is that a regular user can (heh, *must*) specify both the connect string and the query... And the require-password hack for dblink only makes things worse, as it obfuscates the security implications of using it.

So indeed, there is a "hole a truck can drive through", and it's called 'dblink'. And I don't think I should make life miserable for PL/Proxy users just because core has merged an insecure module.

Currently PL/Proxy acts the same as any other libpq-using PL; using .pgpass is quite natural for them and I don't see any reason to lock it down further for no good reason. I know you had a fiasco with dblink security already, but PL/Proxy is not in the dblink camp. It's in the untrusted-PL camp.

So now we have a clear technical argument for an immediate merge - as a replacement for dblink, as it's more secure and better designed. The only thing PL/Proxy really needs is more documentation; there needs to be a list of the various ways to set it up and the security implications of each.

-- marko
On Tue, 2008-07-22 at 11:25 -0400, Tom Lane wrote:
> "Marko Kreen" <markokr@gmail.com> writes:
> > And the user can execute only pre-determined queries/functions on system2.
>
> If that were actually the case then the security issue wouldn't loom
> quite so large, but the dynamic_query example in the plproxy regression
> tests provides a perfect example of how to ruin your security.

The idea is to allow the pl/proxy user access only to the needed functions and nothing else on the remote db side. dynamic_query ruins your security if your pl/proxy remote user has too many privileges.

> > Do you still see a big hole?
>
> Truck-sized, at least.
>
> The complaint here is not that it's impossible to use plproxy securely;
> the complaint is that it's so very easy to use it insecurely.

You mean "easy" like it is very easy to always use your OS as root? On Unix this is fixed by stating it is a bad idea in the docs (and numerous books); on Windows you have a "privileged" checkbox when creating new users.

---------------
Hannu