Re: [patch] plproxy v2 - Mailing list pgsql-hackers
From | Marko Kreen |
---|---|
Subject | Re: [patch] plproxy v2 |
Date | |
Msg-id | e51f66da0807080829t310c1ad5p3784e9543a20b6ff@mail.gmail.com |
In response to | Re: [patch] plproxy v2 (Simon Riggs <simon@2ndquadrant.com>) |
Responses | Re: [patch] plproxy v2 |
List | pgsql-hackers |
On 7/8/08, Simon Riggs <simon@2ndquadrant.com> wrote:
> On Sat, 2008-06-28 at 16:36 +0300, Marko Kreen wrote:
> > I mentioned that I planned to remove SELECT/CONNECT too.
> > Now I've thought about it more and it seems to me that its better
> > to keep them. As they give additional flexibility.
>
> I very much like PL/Proxy and support your vision. Including the
> features of PL/Proxy in core seems like a great idea to me.
>
> If we have just a couple of commands, would it be easier to include
> those features by some additional attributes on pg_proc? That way we
> could include the features in a more native way, similar to the way we
> have integrated text search, without needing a plugin language at all.
>
> CREATE CLUSTER foo ...
>
> CREATE FUNCTION bar() CLUSTER foo RUN ON ANY ...
>
> If we did that, we might also include a similar proxy feature for
> tables, making the feature exciting for more users than just those who
> can specify implementing all logic through functions. It would also
> remove the need for a specific SELECT command in PL/Proxy.
>
> CREATE TABLE bar CLUSTER foo RUN ON ANY ...
>
> If we're running a SELECT and all tables accessed run on the same
> cluster we ship the whole SQL statement according to the RUN ON clause.
> It would effectively bring some parts of dblink into core.
>
> If all tables not on same cluster we throw an error in this release, but
> in later releases we might introduce distributed join features and full
> distributed DML support.
>
> Having the PL/Proxy features available via the catalog will allow a
> clear picture of what runs where without parsing the function text. It
> will also allow things like a pg_dump of all objects relating to a
> cluster.
>
> Adding this feature for tables would be interesting with Hot Standby,
> since it would allow you to offload SELECT statements onto the standby
> automatically.
>
> This would be considerably easier to integrate than text search was.

Interesting proposal.

First I want to say - we can forget the SELECT/CONNECT statements
when discussing this approach. They are in because they were easy
to add and gave some additional flexibility. But they are not
important. If they don't fit some new approach, there is no problem
dropping them.

So that leaves functions in form:

  CLUSTER <expr>; RUN ON <expr>;

and potentially SPREAD BY as discussed in:

  http://lists.pgfoundry.org/pipermail/plproxy-users/2008-June/000093.html

which sends different arguments to different partitions. I'm not yet
sure it's worthwhile addition, but I work mostly on OLTP databases
and that feature would target OLAP ones. So I let others decide.

Now few technical points about your proposal:

- One feature that current function-based configuration approach
  gives is that we can manage cluster configuration centrally
  and replicate to actual proxy databases. And this is something
  I would like to keep. This can be solved by using also plain
  table or functions behind the scenes.

- How about CREATE REMOTE FUNCTION / TABLE .. ; for syntax?

- Currently both hash and cluster selection expressions can be
  quite free-form. So parsing them out to some pg_proc field
  would not be much help actually.

And some philosophical points:

- PL/Proxy main use-case is complex read-write transactions in
  OLTP setting. But remote table/views target simple read-only
  transactions with free-form queries.

- PL/Proxy has concrete argument list and free-form cluster and
  partition selection. Remote tables have free-form arguments,
  maybe they want more rigid cluster / partition selection?

If the syntax and backend implementation can be merged, it's good,
but it should not be forced.

So before we start adding syntax to core, maybe it would be good to
have concrete idea how the remote tables will look like and what
representation they want for a cluster? Especially if you want to
do stuff like distributed joins.

OTOH, if you say that current PL/Proxy approach fits remote tables
as well, I'm not against doing it at the SQL level.

-- 
marko
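
For readers unfamiliar with the RUN ON and SPREAD BY forms mentioned in the message, here is a rough Python model of the routing idea. It is only a sketch, not PL/Proxy code: it assumes the partition count is a power of two and that a partition is picked from the low bits of the hash value, which is how I understand RUN ON <expr> to behave. `toy_hash`, `pick_partition` and `spread_by` are hypothetical names, `zlib.crc32` merely stands in for a server-side hash such as hashtext(), and the grouping is just one possible reading of "sends different arguments to different partitions".

```python
import zlib


def toy_hash(key: str) -> int:
    # Stand-in hash for illustration only; it does NOT produce the same
    # values as PostgreSQL's hashtext().
    return zlib.crc32(key.encode("utf-8"))


def pick_partition(hash_value: int, n_partitions: int) -> int:
    # Assumed rule: the partition count is a power of two and the
    # partition index is taken from the low bits of the hash value.
    if n_partitions <= 0 or n_partitions & (n_partitions - 1):
        raise ValueError("partition count must be a power of two")
    return hash_value & (n_partitions - 1)


def spread_by(keys, n_partitions):
    # One possible reading of SPREAD BY: group the argument values by
    # target partition so each partition receives only its own batch.
    batches = {}
    for key in keys:
        part = pick_partition(toy_hash(key), n_partitions)
        batches.setdefault(part, []).append(key)
    return batches


if __name__ == "__main__":
    # Each user name is routed to exactly one of four partitions.
    for part, names in sorted(spread_by(["alice", "bob", "carol"], 4).items()):
        print(part, names)
```

The point of the model is that the routing expression can be any free-form computation on the arguments, which is why storing it in a fixed pg_proc field would capture little of it.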