Re: FDW for PostgreSQL - Mailing list pgsql-hackers

From Tom Lane
Subject Re: FDW for PostgreSQL
Date
Msg-id 24991.1361458388@sss.pgh.pa.us
Whole thread Raw
In response to Re: FDW for PostgreSQL  (Albe Laurenz <laurenz.albe@wien.gv.at>)
Responses Re: FDW for PostgreSQL  (Albe Laurenz <laurenz.albe@wien.gv.at>)
List pgsql-hackers
Albe Laurenz <laurenz.albe@wien.gv.at> writes:
> Tom Lane wrote:
>> As I mentioned earlier, I think it would be better to force the remote
>> session's search_path setting to just "pg_catalog" and then reduce the
>> amount of explicit schema naming in the queries --- any opinions about
>> that?

> I think that that would make the remore query much more readable.
> That would improve EXPLAIN VERBOSE output, which is a user visible
> improvement.

Yeah, that's really the main point.  OPERATOR() is tremendously ugly...

>> The bigger picture here though is that we're already relying on the user
>> to make sure that remote tables have column data types matching the local
>> definition, so why can't we say that they've got to make sure collations
>> match too?  So I think this is largely a documentation issue and we don't
>> need any automated enforcement mechanism, or at least it's silly to try
>> to enforce this when we're not enforcing column type matching (yet?).

> I think that the question what to push down is a different question
> from checking column data types, because there we can rely on the
> type input functions to reject bad values.

Unfortunately, that's a very myopic view of the situation: there
are many cases where datatype semantics can vary without the I/O
functions having any idea that anything is wrong.  To take one example,
what if the underlying column is type citext but the user wrote "text"
in the foreign table definition?  postgres_fdw would see no reason not
to push "col = 'foo'" across, but that clause would behave quite
differently on the remote side.  Another example is that float8 and
numeric will have different opinions about the truth of
"1.000000000000000000001 = 1.000000000000000000002", so you're going
to get into trouble if you declare an FT column as one when the
underlying column is the other, even though the I/O functions for these
types will happily take each other's output.

So I think (and have written in the committed docs) that users had
better be careful to ensure that FT columns are declared as the same
type as the underlying columns, even though we can't readily enforce
that, at least not for non-builtin types.

And there's really no difference between that situation and the
collation situation, though I agree with you that the latter is a lot
more likely to bite careless users.

> What about the following design principle:
> Only push down conditions which are sure to return the correct
> result, provided that the PostgreSQL system objects have not
> been tampered with.

That's a nice, simple, and useless-in-practice design principle,
because it will toss out many situations that users will want to work;
situations that in fact *would* work as long as the users adhere to
safe coding conventions.  I do not believe that when people ask "why
does performance of LIKE suck on my foreign table", they will accept an
answer of "we don't allow that to be pushed across because we think
you're too stupid to make the remote collation match".

If you want something provably correct, the way to get there is
to work out a way to check if the remote types and collations
really match.  But that's a hard problem AFAICT, so if we want
something usable in the meantime, we are going to have to accept
some uncertainty about what will happen if the user messes up.

> Would it be reasonable to push down operators and functions
> only if
> a) they are in the pg_catalog schema
> b) they have been around for a couple of releases
> c) they are not collation sensitive?

We don't track (b) nor (c), so this suggestion is entirely
unimplementable in the 9.3 time frame; nor is it provably safe,
unless by (b) you mean "must have been around since 7.4 or so".

On a longer time horizon this might be doable, but it would be a lot
of work to implement a solution that most people will find far too
restrictive.  I'd rather see the long-term focus be on doing
type/collation matching, so that we can expand not restrict the set
of things we can push across.
        regards, tom lane



pgsql-hackers by date:

Previous
From: Kevin Grittner
Date:
Subject: Re: Materialized views WIP patch
Next
From: Tom Lane
Date:
Subject: Re: FDW for PostgreSQL