Re: Costing foreign joins in postgres_fdw - Mailing list pgsql-hackers

From Robert Haas
Subject Re: Costing foreign joins in postgres_fdw
Date
Msg-id CA+TgmoZbbnCX_9c=kqUis9cMUb61GO+5EJP7rMCigVmYupOXzQ@mail.gmail.com
In response to Re: Costing foreign joins in postgres_fdw  (Albe Laurenz <laurenz.albe@wien.gv.at>)
Responses Re: Costing foreign joins in postgres_fdw  (Albe Laurenz <laurenz.albe@wien.gv.at>)
List pgsql-hackers
On Fri, Dec 18, 2015 at 8:09 AM, Albe Laurenz <laurenz.albe@wien.gv.at> wrote:
> My gut feeling is that for a join where all join predicates can be pushed down, it
> will usually be a win to push the join to the foreign server.
>
> So in your first scenario, I'd opt for always pushing down the join
> if possible if use_remote_estimate is OFF.
>
> Your second scenario is essentially to estimate that a pushed down join will
> always be executed as a nested loop join, which will in most cases produce
> an unfairly negative estimate.

+1 to all that.  Whatever we do here for costing in detail, it should
be set up so that the pushed-down join wins unless there's some pretty
tangible reason to think, in a particular case, that it will lose.

> What about using local statistics to come up with an estimated row count for
> the join and use that as the basis for an estimate?  My idea here is that it
> is always a win to push down a join unless the result set is so large that
> transferring it becomes the bottleneck.

This also sounds about right.

> Maybe, to come up with something remotely realistic, a formula like
>
> sum of locally estimated costs of sequential scan for the base table
> plus count of estimated result rows (times a factor)

Was this meant to say "the base tables", plural?
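
To make that concrete with made-up numbers (purely illustrative): if the
locally estimated sequential-scan costs of the two base tables are 1000 and
2000, the local join estimate is 50000 rows, and the transfer factor is 0.01,
the pushed-down join would be costed at 1000 + 2000 + 50000 * 0.01 = 3500.
The factor is doing the job of a per-row transfer cost, much like
fdw_tuple_cost does today for single-table scans.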

I think whatever we do here should try to extend the logic in
postgres_fdw's estimate_path_cost_size() to foreign joins in some
reasonably natural way, but I'm not sure exactly what that should look
like.  Maybe do what that function currently does for single-table
scans, and then add all the values up, or something like that.  I'm a
little worried, though, that the planner might then view a query that
will be executed remotely as a nested loop with inner index-scan as
not worth pushing down, because in that case the join actually will
not touch every row from both tables, as a hash or merge join would.
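
Just to sketch the shape of what I mean (hypothetical code, not a patch;
every name and the signature here are made up): take the per-relation
numbers that estimate_path_cost_size() already computes for single-table
scans when use_remote_estimate is off, add them up, and then charge for
shipping the locally estimated join rows back:

/*
 * Hypothetical sketch, not actual postgres_fdw code.  scan_startup[i] and
 * scan_total[i] are the per-base-relation estimates that
 * estimate_path_cost_size() already produces for single-table scans;
 * join_rows is the locally estimated size of the join result;
 * fdw_startup_cost and fdw_tuple_cost play the same role as the existing
 * postgres_fdw options of those names.
 */
typedef double Cost;            /* PostgreSQL defines Cost as double */

void
estimate_foreign_join_cost(const Cost *scan_startup, const Cost *scan_total,
                           int nrels, double join_rows,
                           Cost fdw_startup_cost, Cost fdw_tuple_cost,
                           Cost *p_startup_cost, Cost *p_total_cost)
{
    Cost        startup_cost = 0;
    Cost        run_cost = 0;
    int         i;

    /* sum the single-table scan estimates for every base relation */
    for (i = 0; i < nrels; i++)
    {
        startup_cost += scan_startup[i];
        run_cost += scan_total[i] - scan_startup[i];
    }

    /* then charge for shipping the join result back, as for one table */
    startup_cost += fdw_startup_cost;
    run_cost += fdw_tuple_cost * join_rows;

    *p_startup_cost = startup_cost;
    *p_total_cost = startup_cost + run_cost;
}

The weak spot is exactly the one above: when the remote server would really
run a parameterized nested loop, summing full scan costs overstates the
work, so the row-based transfer charge probably needs to dominate the total.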

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


