Re: [patch] plproxy v2 - Mailing list pgsql-hackers

From Marko Kreen
Subject Re: [patch] plproxy v2
Date
Msg-id e51f66da0807080829t310c1ad5p3784e9543a20b6ff@mail.gmail.com
In response to Re: [patch] plproxy v2  (Simon Riggs <simon@2ndquadrant.com>)
Responses Re: [patch] plproxy v2
List pgsql-hackers
On 7/8/08, Simon Riggs <simon@2ndquadrant.com> wrote:
>  On Sat, 2008-06-28 at 16:36 +0300, Marko Kreen wrote:
>  > I mentioned that I planned to remove SELECT/CONNECT too.
>  > Now I've thought about it more and it seems to me that its better
>  > to keep them.  As they give additional flexibility.
>
> I very much like PL/Proxy and support your vision. Including the
>  features of PL/Proxy in core seems like a great idea to me.
>
>  If we have just a couple of commands, would it be easier to include
>  those features by some additional attributes on pg_proc? That way we
>  could include the features in a more native way, similar to the way we
>  have integrated text search, without needing a plugin language at all.
>
>   CREATE CLUSTER foo ...
>
>   CREATE FUNCTION bar() CLUSTER foo RUN ON ANY ...
>
>  If we did that, we might also include a similar proxy feature for
>  tables, making the feature exciting for more users than just those who
>  can specify implementing all logic through functions. It would also
>  remove the need for a specific SELECT command in PL/Proxy.
>
>   CREATE TABLE bar CLUSTER foo RUN ON ANY ...
>
>  If we're running a SELECT and all tables accessed run on the same
>  cluster we ship the whole SQL statement according to the RUN ON clause.
>  It would effectively bring some parts of dblink into core.
>
>  If all tables not on same cluster we throw an error in this release, but
>  in later releases we might introduce distributed join features and full
>  distributed DML support.
>
>  Having the PL/Proxy features available via the catalog will allow a
>  clear picture of what runs where without parsing the function text. It
>  will also allow things like a pg_dump of all objects relating to a
>  cluster.
>
>  Adding this feature for tables would be interesting with Hot Standby,
>  since it would allow you to offload SELECT statements onto the standby
>  automatically.
>
>  This would be considerably easier to integrate than text search was.

Interesting proposal.

First I want to say - we can forget the SELECT/CONNECT statements
when discussing this approach.  They are in because they were easy
to add and gave some additional flexibility.  But they are not important.
If they don't fit some new approach, there is no problem dropping them.

So that leaves functions in the form:
   CLUSTER <expr>;   RUN ON <expr>;

and potentially SPREAD BY as discussed in:
   http://lists.pgfoundry.org/pipermail/plproxy-users/2008-June/000093.html

which sends different arguments to different partitions.  I'm not yet
sure it's a worthwhile addition, but I work mostly on OLTP databases
and that feature would target OLAP ones.  So I'll let others decide.
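
To make that remaining form concrete, here is a minimal sketch of such a
function.  The function name, its argument and the cluster name 'userdb'
are made up for illustration; hashtext() is PostgreSQL's built-in text
hash, used here only as an example of a hash expression.

```sql
-- Illustrative only: get_user_email and 'userdb' are hypothetical names.
CREATE FUNCTION get_user_email(i_username text)
RETURNS text AS $$
    CLUSTER 'userdb';               -- cluster selection expression
    RUN ON hashtext(i_username);    -- hash expression that selects the partition
$$ LANGUAGE plproxy;
```

The proxy function contains no logic of its own; a function with the same
name and signature is assumed to exist on each partition and do the
actual work.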

Now a few technical points about your proposal:

- One feature that the current function-based configuration approach
  gives is that we can manage cluster configuration centrally and
  replicate it to the actual proxy databases.  And this is something
  I would like to keep.

  This can be solved by also using plain tables or functions behind
  the scenes.

- How about CREATE REMOTE FUNCTION / TABLE .. ; for syntax?

- Currently both hash and cluster selection expressions can be quite
  free-form.  So parsing them out to some pg_proc field would not be
  much help actually.
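
The centrally managed configuration from the first point can be sketched
roughly as below.  The table names and columns are assumptions made up
for illustration; only the plproxy.get_cluster_* function names come
from the current function-based configuration interface.  The two tables
could be maintained in one place and replicated to each proxy database.

```sql
-- Hypothetical tables holding the cluster layout (names are assumptions).
CREATE TABLE proxy_cluster (
    cluster_name text PRIMARY KEY,
    version      int4 NOT NULL      -- bump when the partition list changes
);

CREATE TABLE proxy_partition (
    cluster_name text NOT NULL REFERENCES proxy_cluster,
    part_nr      int4 NOT NULL,
    connstr      text NOT NULL,     -- libpq connection string (placeholder values)
    PRIMARY KEY (cluster_name, part_nr)
);

-- Configuration functions read from the tables above.
-- Assumes the plproxy schema already exists.
CREATE FUNCTION plproxy.get_cluster_version(i_cluster_name text)
RETURNS int4 AS $$
    SELECT c.version FROM proxy_cluster c WHERE c.cluster_name = $1;
$$ LANGUAGE sql;

CREATE FUNCTION plproxy.get_cluster_partitions(i_cluster_name text)
RETURNS SETOF text AS $$
    SELECT p.connstr FROM proxy_partition p
     WHERE p.cluster_name = $1
     ORDER BY p.part_nr;
$$ LANGUAGE sql;
```

The plproxy.get_cluster_config() function, which supplies connection
settings, is omitted here to keep the sketch short.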
 

And some philosophical points:

- PL/Proxy's main use case is complex read-write transactions in an
  OLTP setting.  But remote tables/views target simple read-only
  transactions with free-form queries.

- PL/Proxy has a concrete argument list and free-form cluster and
  partition selection.  Remote tables have free-form arguments, maybe
  they want more rigid cluster / partition selection?

If the syntax and backend implementation can be merged, it's good,
but it should not be forced.  So before we start adding syntax
to core, maybe it would be good to have a concrete idea of what the
remote tables will look like and what representation they want for
a cluster?

Especially if you want to do stuff like distributed joins.

OTOH, if you say that the current PL/Proxy approach fits remote tables
as well, I'm not against doing it at the SQL level.

-- 
marko

