On 6/27/08, Chris Browne <cbbrowne@acm.org> wrote:
> josh@agliodbs.com (Josh Berkus) writes:
> > Jonah,
> >
> >> Hmm, I didn't think the Skype tools could really provide federated
> >> database functionality without a good amount of custom work. Or, am I
> >> mistaken?
> >
> > Sure, what do you think pl/proxy is for?
>
>
> Ah, but the thing is, it changes the model from a relational one,
> where you can have fairly arbitrary "where clauses," to one where
> parameterization of queries must be predetermined.
>
> The "hard part" of federated database functionality at this point is
> the [parenthesized portion] of...
>
> select * from table@node [where criterion = x];
>
> What we'd like to be able to do is to ascertain that [where criterion
> = x] portion, and run it on the remote DBMS, so that only the relevant
> tuples would come back.
>
> Consider...
>
> What if table@node is a remote table with 200 million tuples, and
> [where criterion = x] restricts the result set to 200 of those.
>
> If you *cannot* push the "where clause" down to the remote node, then
> you're stuck with pulling all 200 million tuples, and filtering out,
> on the "local" node, the 200 tuples that need to be kept.
>
> To do better, with pl/proxy, requires having a predetermined function
> that would do that filtering, and if it's missing, you're stuck
> pulling 200M tuples, and throwing out nearly all of them.
>
> In contrast, with the work David Fetter's looking at, the [where
> criterion = x] clause would get pushed to the node which the data is
> being drawn from, and so the query, when running on "table@node,"
> could use indices, and return only the 200 tuples that are of
> interest.
>
> It's a really big win, if it works.

I agree that for free-form queries on a remote database,
PL/Proxy is not the right answer. (Although the recent patch
to support dynamic records with an AS clause at least makes them work.)
But I want to clarify its goal: it is not to run "pre-determined
queries." It is to run "pre-determined complex transactions."
Making those work in a "federated database" takes a huge amount
of complexity, which PL/Proxy simply sidesteps, at the price of
requiring a function-based API. But as a function-based API has
other advantages even without PL/Proxy, it seems a fine tradeoff.
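
To make the tradeoff concrete, here is a rough sketch of the two access
patterns discussed above. It is only an illustration: an in-memory SQLite
table stands in for the remote node, and the table t, the helper
pull_all_then_filter() and the "remote function" get_by_criterion() are
hypothetical names, not PL/Proxy or PostgreSQL API.

```python
import sqlite3

# Stand-in for the remote node (assumption: SQLite instead of a real
# PostgreSQL node behind PL/Proxy). 20,000 rows, 20 of which match.
remote = sqlite3.connect(":memory:")
remote.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, criterion INTEGER)")
remote.executemany(
    "INSERT INTO t (criterion) VALUES (?)",
    [(i % 1000,) for i in range(20000)],
)
remote.execute("CREATE INDEX t_criterion_idx ON t (criterion)")


def pull_all_then_filter(x):
    """No push-down: every tuple is shipped, filtering happens locally."""
    rows = remote.execute("SELECT id, criterion FROM t").fetchall()
    return len(rows), [r for r in rows if r[1] == x]


def get_by_criterion(x):
    """Hypothetical predetermined function that runs on the remote node,
    so the filter can use the remote index and only matches are shipped."""
    rows = remote.execute(
        "SELECT id, criterion FROM t WHERE criterion = ?", (x,)
    ).fetchall()
    return len(rows), rows


shipped_all, kept_all = pull_all_then_filter(42)
shipped_fn, kept_fn = get_by_criterion(42)
print("shipped without push-down:", shipped_all)
print("shipped via remote function:", shipped_fn)
```

Both calls end up with the same matching tuples; the difference is how
many rows cross the wire. The function-based version only helps when
somebody has written the function in advance, which is exactly the
restriction under discussion.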
--
marko