On Fri, Dec 18, 2015 at 8:09 AM, Albe Laurenz <laurenz.albe@wien.gv.at> wrote:
> My gut feeling is that for a join where all join predicates can be pushed down, it
> will usually be a win to push the join to the foreign server.
>
> So in your first scenario, I'd opt for always pushing down the join
> whenever possible when use_remote_estimate is OFF.
>
> Your second scenario essentially amounts to estimating that a
> pushed-down join will always be executed as a nested loop join, which
> will in most cases produce an unfairly negative estimate.
+1 to all that. Whatever the costing details turn out to be, it should
be set up so that the pushed-down join wins unless there's some pretty
tangible reason to think, in a particular case, that it will lose.
> What about using local statistics to come up with an estimated row count for
> the join and use that as the basis for an estimate? My idea here is that it
> will always be a win to push down a join unless the result set is so large that
> transferring it becomes the bottleneck.
This also sounds about right.
> Maybe, to come up with something remotely realistic, a formula like
>
> sum of locally estimated costs of sequential scan for the base table
> plus count of estimated result rows (times a factor)
Was this meant to say "the base tables", plural?
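
For concreteness, here's a minimal sketch of what that formula might
look like in planner-style C. To be clear, this is purely illustrative:
estimate_pushed_down_join_cost() and its arguments are hypothetical,
not anything in postgres_fdw today; the per-table arithmetic mirrors
the simplest part of a local seqscan estimate, and the fdw_tuple_cost
argument plays the role of the "times a factor" per-row transfer
charge, much like the existing fdw_tuple_cost option:

#include "postgres.h"
#include "nodes/pg_list.h"
#include "nodes/relation.h"
#include "optimizer/cost.h"

/*
 * Hypothetical sketch only: cost a pushed-down join as the sum of
 * locally estimated seqscan costs for each base table, plus a per-row
 * transfer charge for the estimated join result.
 */
static Cost
estimate_pushed_down_join_cost(List *baserels, double join_rows,
                               Cost fdw_tuple_cost)
{
    Cost        total_cost = 0;
    ListCell   *lc;

    foreach(lc, baserels)
    {
        RelOptInfo *baserel = (RelOptInfo *) lfirst(lc);

        /* locally estimated cost of seqscanning this base table */
        total_cost += baserel->pages * seq_page_cost +
            baserel->tuples * cpu_tuple_cost;
    }

    /* count of estimated result rows, times a factor */
    total_cost += join_rows * fdw_tuple_cost;

    return total_cost;
}

Something along these lines would of course only apply in the
use_remote_estimate = off case.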
I think whatever we do here should try to extend the logic in
postgres_fdw's estimate_path_cost_size() to foreign joins in some
reasonably natural way, but I'm not sure exactly what that should look
like. Maybe do what that function currently does for single-table
scans, and then add all the values up, or something like that. I'm a
little worried, though, that the planner might then view a query that
the remote server would execute as a nested loop with an inner index
scan as not worth pushing down, because in that case the join actually
will not touch every row from both tables, as a hash or merge join
would.
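
To put a rough number on that worry, here's a toy, self-contained
calculation (entirely made-up table sizes, default planner cost
constants) comparing the "sum of seqscans" estimate against what a
remote nested loop over an inner index scan would plausibly cost:

#include <stdio.h>

/* default planner cost constants, as in costsize.c */
#define SEQ_PAGE_COST     1.0
#define RANDOM_PAGE_COST  4.0
#define CPU_TUPLE_COST    0.01

int
main(void)
{
    /* made-up shapes: small outer table, big inner table */
    double  outer_pages = 10, outer_tuples = 1000;
    double  inner_pages = 10000, inner_tuples = 1000000;
    double  outer_matches = 100;    /* outer rows that join */

    /* "sum of seqscans" model charges for reading both tables fully */
    double  sum_model =
        (outer_pages * SEQ_PAGE_COST + outer_tuples * CPU_TUPLE_COST) +
        (inner_pages * SEQ_PAGE_COST + inner_tuples * CPU_TUPLE_COST);

    /*
     * A remote nested loop with an inner index scan reads the outer
     * table once and then does roughly one index probe per joining
     * outer row (crudely: one random page fetch each).
     */
    double  nestloop =
        (outer_pages * SEQ_PAGE_COST + outer_tuples * CPU_TUPLE_COST) +
        outer_matches * RANDOM_PAGE_COST;

    printf("sum-of-seqscans estimate: %.0f\n", sum_model);  /* 20020 */
    printf("remote nestloop cost:     %.0f\n", nestloop);   /* 420 */
    return 0;
}

On those made-up numbers the summed model overstates the pushed-down
path by nearly 50x, which is exactly the kind of case where we'd have
to be careful not to spuriously reject the pushdown.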
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company