Attached are latest version of pgsql_fdw patches. Note that
pgsql_fdw_analyze.patch is only for test the effect of local statistics.
Please apply patches in the order below:
(1) pgsql_fdw_v18.patch
(2) pgsql_fdw_pushdown_v11.patch
(3) pgsql_fdw_analyze.patch (if you want to try local stats)
(2012/03/27 20:49), Albe Laurenz wrote:
>> How about getting # of rows estimate by executing EXPLAIN for
>> fully-fledged remote query (IOW, contains pushed-down WHERE clause),
> and
>> estimate selectivity of local filter on the basis of the statistics
>> which are generated by FDW via do_analyze_rel() and FDW-specific
>> sampling function? In this design, we would be able to use quite
>> correct rows estimate because we can consider filtering stuffs done on
>> each side separately, though it requires expensive remote EXPLAIN for
>> each possible path.
>
> That sounds nice.
> How would that work with a query that has one condition that could be
> pushed down and one that has to be filtered locally?
>
> Would you use the (local) statistics for the full table or can you
> somehow account for the fact that rows have already been filtered
> out remotely, which might influence the distribution?
First of all, pgsql_fdw sorts conditions into three groups:
(A) can be pushed down, and contains no Param node
(B) can be pushed down, and contains Param node
(C) can not be pushed down
Then pgsql_fdw generates SELECT statement which contains only (A), and
execute EXPLAIN for that query to get rough estimate. Then, pgsql
estimates selectivity of (B) and (C) by calling
clauselist_selectivity(). Finally pgsql_fdw multiply rough estimate,
selectivity of (B) and (C). Thus we can get estimate of # of rows
returned by the scan.
Aside that, before executing actual query, pgsql_fdw appends (B) to the
remote query string to use external parameter. Conditions in (C) are
evaluated on local side anyway.
>> You are right, built-in equality and inequality operators don't cause
>> collation problem. Perhaps allowing them would cover significant
> cases
>> of string comparison, but I'm not sure how to determine whether an
>> operator is = or != in generic way. We might have to hold list of oid
>> for collation-safe operator/functions until we support ROUTINE MAPPING
>> or something like that... Anyway, I'll fix pgsql_fdw to allow = and
> !=
>> for character types.
>
> I believe that this covers a significant percentage of real-world cases.
> I'd think that every built-in operator with name "=" or "<>" could
> be pushed down.
I've fixed pgsql_fdw to push down "=" and "<>" even if they take
collatable type as operands. Now collation check is done for only input
type of OpExpr and ScalarArrayOpExpr. In addition to the two, operators
below seem safe to push down.
concatenate : ||
LIKE : ~~, ~~*, !~~, !~~*
POSIX regex : ~, ~*, !~, !~*
Text Search : @@
Regards,
--
Shigeru HANADA