Re: Federated Postgresql architecture ? - Mailing list pgsql-performance

From Marko Kreen
Subject Re: Federated Postgresql architecture ?
Msg-id e51f66da0806300616q47a3433bt52f78716427c5665@mail.gmail.com
In response to Re: Federated Postgresql architecture ?  (Chris Browne <cbbrowne@acm.org>)
List pgsql-performance
On 6/27/08, Chris Browne <cbbrowne@acm.org> wrote:
> josh@agliodbs.com (Josh Berkus) writes:
>  > Jonah,
>  >
>  >> Hmm, I didn't think the Skype tools could really provide federated
>  >> database functionality without a good amount of custom work.  Or, am I
>  >> mistaken?
>  >
>  > Sure, what do you think pl/proxy is for?
>
>
> Ah, but the thing is, it changes the model from a relational one,
>  where you can have fairly arbitrary "where clauses," to one where
>  parameterization of queries must be predetermined.
>
>  The "hard part" of federated database functionality at this point is
>  the [parenthesized portion] of...
>
>   select * from table@node [where criterion = x];
>
>  What we'd like to be able to do is to ascertain that [where criterion
>  = x] portion, and run it on the remote DBMS, so that only the relevant
>  tuples would come back.
>
>  Consider...
>
>  What if table@node is a remote table with 200 million tuples, and
>  [where criterion = x] restricts the result set to 200 of those.
>
>  If you *cannot* push the "where clause" down to the remote node, then
>  you're stuck with pulling all 200 million tuples, and filtering out,
>  on the "local" node, the 200 tuples that need to be kept.
>
>  To do better, with pl/proxy, requires having a predetermined function
>  that would do that filtering, and if it's missing, you're stuck
>  pulling 200M tuples, and throwing out nearly all of them.
>
>  In contrast, with the work David Fetter's looking at, the [where
>  criterion = x] clause would get pushed to the node which the data is
>  being drawn from, and so the query, when running on "table@node,"
>  could use indices, and return only the 200 tuples that are of
>  interest.
>
>  It's a really big win, if it works.
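
To make the cost difference above concrete, here is a minimal sketch in Python. It is not PL/Proxy or any real federation layer: an in-memory SQLite database stands in for the remote node, the table name `t`, the column `criterion`, and both helper functions are hypothetical, and the row counts are scaled down from 200 million to 20,000 so it runs instantly. It only contrasts how many rows cross the "wire" with and without pushing the where clause down.

```python
import sqlite3


def make_remote(n_rows=20000, n_matches=200):
    """Stand-in for the remote node: an in-memory SQLite table with an index."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, criterion INTEGER)")
    # The first n_matches rows get criterion = 1, all others criterion = 0.
    conn.executemany(
        "INSERT INTO t (id, criterion) VALUES (?, ?)",
        ((i, 1 if i < n_matches else 0) for i in range(n_rows)),
    )
    conn.execute("CREATE INDEX t_criterion_idx ON t (criterion)")
    conn.commit()
    return conn


def fetch_without_pushdown(conn, x):
    """Pull every row from the 'remote' side, then filter on the local side."""
    rows = conn.execute("SELECT id, criterion FROM t").fetchall()
    kept = [r for r in rows if r[1] == x]
    return kept, len(rows)  # (result, rows transferred)


def fetch_with_pushdown(conn, x):
    """Send the WHERE clause to the 'remote' side so it can use its index."""
    rows = conn.execute(
        "SELECT id, criterion FROM t WHERE criterion = ?", (x,)
    ).fetchall()
    return rows, len(rows)  # (result, rows transferred)


remote = make_remote()
local_kept, local_transferred = fetch_without_pushdown(remote, 1)
pushed_kept, pushed_transferred = fetch_with_pushdown(remote, 1)
print(local_transferred, pushed_transferred)
```

Both calls return the same matching rows; the difference is only in how many rows had to be fetched to get them.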

I agree that for free-form queries against a remote database,
PL/Proxy is not the right answer.  (Although the recent patch
to support dynamic records with an AS clause at least makes them work.)

But I want to clarify its goal: it is not to run "pre-determined
queries."  It is to run "pre-determined complex transactions."

And making those work in a "federated database" takes a huge amount
of complexity that PL/Proxy simply sidesteps, at the price of
requiring a function-based API.  But as a function-based API has
other advantages even without PL/Proxy, it seems a fine tradeoff.
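
As a rough illustration of the function-based approach, the Python sketch below routes each call to one partition by hashing a key and runs a multi-statement transaction entirely on that partition. This is not PL/Proxy code and does not use its syntax: the two in-memory SQLite databases, the `users` and `audit` tables, the `crc32` routing, and the functions `create_user` and `get_user_email` are all assumptions made up for the example.

```python
import sqlite3
import zlib


def make_partition():
    """Stand-in for one database node in the cluster."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (username TEXT PRIMARY KEY, email TEXT)")
    conn.execute("CREATE TABLE audit (username TEXT, action TEXT)")
    conn.commit()
    return conn


PARTITIONS = [make_partition(), make_partition()]


def partition_for(username):
    """Pick a partition from a stable hash of the key (hypothetical routing)."""
    return PARTITIONS[zlib.crc32(username.encode("utf-8")) % len(PARTITIONS)]


def create_user(username, email):
    """A predetermined multi-statement transaction, run on a single partition."""
    conn = partition_for(username)
    with conn:  # commits on success, rolls back if any statement raises
        conn.execute(
            "INSERT INTO users (username, email) VALUES (?, ?)", (username, email)
        )
        conn.execute(
            "INSERT INTO audit (username, action) VALUES (?, 'created')", (username,)
        )


def get_user_email(username):
    """A predetermined lookup, routed to the partition that owns the key."""
    row = partition_for(username).execute(
        "SELECT email FROM users WHERE username = ?", (username,)
    ).fetchone()
    return row[0] if row else None


create_user("alice", "alice@example.com")
print(get_user_email("alice"))
```

The caller never writes SQL against a partition directly; it only calls functions, so the whole transaction stays local to one node and no distributed query planning is needed.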

--
marko
