Re: pgsql_fdw, FDW for PostgreSQL server - Mailing list pgsql-hackers
| From | Shigeru HANADA |
|---|---|
| Subject | Re: pgsql_fdw, FDW for PostgreSQL server |
| Date | |
| Msg-id | 4F71A101.9060407@gmail.com |
| In response to | Re: pgsql_fdw, FDW for PostgreSQL server ("Albe Laurenz" <laurenz.albe@wien.gv.at>) |
| Responses | Re: pgsql_fdw, FDW for PostgreSQL server |
| List | pgsql-hackers |
Thanks for the comments.

(2012/03/27 18:36), Albe Laurenz wrote:
>> 2) Defer planning stuffs as long as possible to clarify the role of each
>> function.  Currently GetRelSize just estimates result rows from local
>> statistics, and GetPaths adds only one path which represents SeqScan on
>> remote side.  As result of this change, PgsqlFdwPlanState struct is
>> obsolete.
>
> I see the advantage of being able to do all this locally, but
> I think there are a lot of downsides too:
> - You have an additional maintenance task if you want to keep
>   statistics for remote tables accurate.  I understand that this may
>   get better in a future release.
> - You depend on the structure of pg_statistic, which means a potential
>   incompatibility between server versions.  You can add cases to
>   pgsql_fdw_analyze to cater for changes, but that is cumbersome and
>   will only help for later PostgreSQL versions connecting to earlier ones.
> - Planning and execution will change (improve, of course) between server
>   versions.  The local planner may choose an inferior plan based on a
>   wrong assumption of how a certain query can be handled on the remote.
> - You have no statistics if the foreign table points to a view on the
>   remote system.

Especially regarding the 2nd and 4th points, generating pg_statistic records without calling do_analyze_rel() seems impractical in a multi-version environment. As you pointed out, I had also missed another problem here where the semantics differ. If we want to keep statistics for foreign data locally, we would have to use do_analyze_rel() together with a custom sampling function which returns sample rows from the remote data source. This method would be usable by most FDWs, but it requires some changes in core. [I'll comment on Fujita-san's ANALYZE patch about this issue soon.]

> My gut feeling is that planning should be done by the server which
> will execute the query.

Agreed. If the selectivity of both the local filtering and the remote filtering were available, we could estimate the result rows correctly and choose a better plan. How about getting the row count estimate by executing EXPLAIN for the fully-fledged remote query (IOW, one that contains the pushed-down WHERE clause), and estimating the selectivity of the local filter on the basis of statistics generated by the FDW via do_analyze_rel() and an FDW-specific sampling function? In this design we could get a fairly accurate row estimate, because the filtering done on each side can be considered separately, though it requires an expensive remote EXPLAIN for each possible path. (A rough sketch of such a remote EXPLAIN call appears at the end of this message.)

>> 3) Implement pgsql_fdw's own deparser which pushes down collation-free
>> and immutable expressions in local WHERE clause.  This means that most
>> of numeric conditions can be pushed down, but conditions using character
>> types are not.
>
> I understand that this is simple and practical, but it is a pity that
> this excludes equality and inequality conditions on strings.
> Correct me if I am wrong, but I thought that these work the same
> regardless of the collation.

You are right; built-in equality and inequality operators don't cause collation problems. Allowing them would probably cover the significant cases of string comparison, but I'm not sure how to determine in a generic way whether an operator is = or !=. We might have to keep a list of OIDs of collation-safe operators/functions until we support ROUTINE MAPPING or something like that... Anyway, I'll fix pgsql_fdw to allow = and != for character types. (A rough sketch of one possible check appears at the end of this message.)

Regards,
--
Shigeru Hanada
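To make the remote-EXPLAIN idea above a bit more concrete, here is a minimal sketch of fetching a row estimate over an existing libpq connection. The helper name get_remote_rows_estimate, the fixed-size buffer, and the text parsing of the first EXPLAIN output line are illustrative assumptions, not part of the pgsql_fdw patch under discussion.

```c
/*
 * Sketch: ask the remote server for its own row estimate by running
 * EXPLAIN on the fully-fledged remote query (with the pushed-down WHERE
 * clause) and parsing "rows=" out of the top plan line, e.g.
 *   Seq Scan on t1  (cost=0.00..35.50 rows=2550 width=4)
 * The helper name and the simplistic error handling are illustrative only;
 * the query text is assumed to come from the FDW's own deparser.
 */
#include <stdio.h>
#include <string.h>
#include <libpq-fe.h>

static double
get_remote_rows_estimate(PGconn *conn, const char *sql)
{
	char		explain_sql[8192];
	PGresult   *res;
	const char *line;
	const char *p;
	double		rows = -1.0;

	snprintf(explain_sql, sizeof(explain_sql), "EXPLAIN %s", sql);

	res = PQexec(conn, explain_sql);
	if (PQresultStatus(res) == PGRES_TUPLES_OK && PQntuples(res) > 0)
	{
		/* The first row of EXPLAIN output holds the top node's estimate. */
		line = PQgetvalue(res, 0, 0);
		p = strstr(line, "rows=");
		if (p != NULL)
			sscanf(p, "rows=%lf", &rows);
	}
	PQclear(res);

	return rows;				/* -1.0 means "no estimate available" */
}
```

In the design discussed above, the path-generation callback would have to call something like this once per candidate remote query, which is exactly why the extra round trip per possible path is the expensive part.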
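As for telling = and <> (the catalog spelling of !=) apart from other operators, one possible heuristic, shown here purely as an assumed sketch rather than pgsql_fdw's actual rule, is to accept an operator only if it is built in (hand-assigned OID below FirstBootstrapObjectId), its catalog name is "=" or "<>", and its underlying function is immutable:

```c
/*
 * Sketch: decide whether an OpExpr is a built-in "=" or "<>" operator
 * whose implementation function is immutable, and hence (by the reasoning
 * above) arguably safe to push down even for collatable types.
 * This is an illustrative heuristic, not pgsql_fdw's actual rule.
 */
#include "postgres.h"

#include "access/transam.h"		/* FirstBootstrapObjectId */
#include "catalog/pg_proc.h"	/* PROVOLATILE_IMMUTABLE */
#include "nodes/primnodes.h"	/* OpExpr */
#include "utils/lsyscache.h"	/* get_opname, get_opcode, func_volatile */

static bool
is_builtin_equality_op(OpExpr *op)
{
	char	   *opname;
	bool		result = false;

	/* Only trust operators shipped with the core distribution. */
	if (op->opno >= FirstBootstrapObjectId)
		return false;

	/* The operator's underlying function must be immutable. */
	if (func_volatile(get_opcode(op->opno)) != PROVOLATILE_IMMUTABLE)
		return false;

	opname = get_opname(op->opno);
	if (opname != NULL)
	{
		result = (strcmp(opname, "=") == 0 || strcmp(opname, "<>") == 0);
		pfree(opname);
	}

	return result;
}
```

A hard-coded list of operator OIDs, as mentioned in the message, would be the more precise alternative until something like ROUTINE MAPPING exists.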