Re: pgsql_fdw, FDW for PostgreSQL server - Mailing list pgsql-hackers

From Tom Lane
Subject Re: pgsql_fdw, FDW for PostgreSQL server
Date
Msg-id 12181.1331223482@sss.pgh.pa.us
Whole thread Raw
In response to Re: pgsql_fdw, FDW for PostgreSQL server  (Shigeru Hanada <shigeru.hanada@gmail.com>)
Responses Re: pgsql_fdw, FDW for PostgreSQL server  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
Shigeru Hanada <shigeru.hanada@gmail.com> writes:
> [ pgsql_fdw_v15.patch ]

I've been looking at this patch a little bit over the past day or so.
I'm pretty unhappy with deparse.c --- it seems like a real kluge,
inefficient and full of corner-case bugs.  After some thought I believe
that you're ultimately going to have to abandon depending on ruleutils.c
for reverse-listing services, and it would be best to bite that bullet
now and rewrite this code from scratch.  ruleutils.c is serving two
masters already (rule-dumping and EXPLAIN) and it's not going to be
practical to tweak its behavior further for this usage; yet there are
all sorts of clear problems that you are going to run into, boiling down
to the fact that names on the remote end aren't necessarily the same as
names on the local end.  For instance, ruleutils.c is not going to be
helpful at schema-qualifying function names in a way that's correct for
the foreign server environment.  Another issue is that as soon as you
try to push down join clauses for parameterized paths, you are going to
want Vars of other relations to be printed as parameters ($n, most
likely) and ruleutils is not going to know to do that.  Seeing that
semantic constraints will greatly limit the set of node types that can
ever be pushed down anyway, I think it's likely to be easiest to just
write your own node-printing code and not even try to use ruleutils.c.

There are a couple of other points that make me think we need to revisit
the PlanForeignScan API definition some more, too.  First, deparse.c is
far from cheap.  While this doesn't matter greatly as long as there's
only one possible path for a foreign table, as soon as you want to
create more than one it's going to be annoying to do all that work N
times and then throw away N-1 of the results.  I also choked on the fact
that the pushdown patch thinks it can modify baserel->baserestrictinfo.
That might accidentally fail to malfunction right now, but it's never
going to scale to multiple paths with potentially different sets of
remotely-applied constraints.  So I'm thinking we really need to let
FDWs in on the Path versus Plan distinction --- that is, a Path just
needs to be a cheap summary of a way to do things, and then at
createplan.c time you convert the selected Path into a full-fledged
Plan.  Most of the work done in deparse.c could be postponed to
createplan time and done only once, even with multiple paths.
The baserestrictinfo hack would be unnecessary too if the FDW had more
direct control over generation of the ForeignScan plan node.

Another thing I'm thinking we should let FDWs in on is the distinction
between rowcount estimation and path generation.  When we did the first
API design last year it was okay to expect a single call to do both,
but as of a couple months ago allpaths.c does those steps in two
separate passes over the baserels, and it'd be handy if FDWs would
cooperate.

So we need to break down what PlanForeignScan currently does into three
separate steps.  The first idea that comes to mind is to call them
GetForeignRelSize, GetForeignPaths, GetForeignPlan; but maybe somebody
has a better idea for names?
        regards, tom lane


pgsql-hackers by date:

Previous
From: Dimitri Fontaine
Date:
Subject: Re: Inline Extension
Next
From: Bruce Momjian
Date:
Subject: Re: pg_upgrade --logfile option documentation