Re: pgsql_fdw, FDW for PostgreSQL server - Mailing list pgsql-hackers

From Albe Laurenz
Subject Re: pgsql_fdw, FDW for PostgreSQL server
Date
Msg-id D960CB61B694CF459DCFB4B0128514C207AA8AFF@exadv11.host.magwien.gv.at
Whole thread Raw
In response to Re: pgsql_fdw, FDW for PostgreSQL server  (Shigeru HANADA <shigeru.hanada@gmail.com>)
Responses Re: pgsql_fdw, FDW for PostgreSQL server
Re: pgsql_fdw, FDW for PostgreSQL server
List pgsql-hackers
Shigeru HANADA wrote:
> I've implemented pgsql_fdw's own deparser and enhanced some features
> since last post.  Please apply attached patches in the order below:

> Changes from previous version
> =============================
>
> 1) Don't use remote EXPLAIN for cost/rows estimation, so now planner
> estimates result rows and costs on the basis of local statistics such
as
> pg_class and pg_statistic.  To update local statistics, I added
> pgsql_fdw_analyze() SQL function which updates local statistics of a
> foreign table by retrieving remote statistics, such as pg_class and
> pg_statistic, via libpq.  This would make the planning of pgsql_fdw
> simple and fast.  This function can be easily modified to handle
ANALYZE
> command invoked for a foreign table (Fujita-san is proposing this as
> common feature in another thread).
>
> 2) Defer planning stuffs as long as possible to clarify the role of
each
> function.  Currently GetRelSize just estimates result rows from local
> statistics, and GetPaths adds only one path which represents SeqScan
on
> remote side.  As result of this change, PgsqlFdwPlanState struct is
> obsolete.

I see the advantage of being able to do all this locally, but
I think there are a lot of downsides too:
- You have an additional maintenance task if you want to keep statistics for remote tables accurate.  I understand that
thismay get better in a future release. 
- You depend on the structure of pg_statistic, which means a potential incompatibility between server versions.  You
canadd cases to pgsql_fdw_analyze to cater for changes, but that is cumbersome and 
will only help for later PostgreSQL versions connecting to earlier ones.
- Planning and execution will change (improve, of course) between server versions.  The local planner may choose an
inferiorplan based on a wrong assumption of how a certain query can be handled on the remote. 
- You have no statistics if the foreign table points to a view on the remote system.

My gut feeling is that planning should be done by the server which
will execute the query.

> 3) Implement pgsql_fdw's own deparser which pushes down collation-free
> and immutable expressions in local WHERE clause.  This means that most
> of numeric conditions can be pushed down, but conditions using
character
> types are not.

I understand that this is simple and practical, but it is a pity that
this excludes equality and inequality conditions on strings.
Correct me if I am wrong, but I thought that these work the same
regardless of the collation.

Yours,
Laurenz Albe


pgsql-hackers by date:

Previous
From: Fujii Masao
Date:
Subject: Improvement of log messages in pg_basebackup
Next
From: Fujii Masao
Date:
Subject: Re: Reporting WAL file containing checkpoint's REDO record in pg_controldata's result