Thread: Updating row and width estimates in postgres_fdw

Updating row and width estimates in postgres_fdw

From: Ashutosh Bapat
Date:
Hi,
In postgresGetForeignJoinPaths(), I see

    /* Estimate costs for bare join relation */
    estimate_path_cost_size(root, joinrel, NIL, NIL, NULL,
                            &rows, &width, &startup_cost, &total_cost);
    /* Now update this information in the joinrel */
    joinrel->rows = rows;
    joinrel->reltarget->width = width;

This code is good as well as bad.

For a join relation, we estimate the number of rows in set_joinrel_size_estimates() inside build_*_join_rel() and set the width of the join when building the targetlist. For a foreign join, the size estimates may not be correct, but the width estimate should be. So updating the number of rows looks good, since it would be better than what set_joinrel_size_estimates() might come up with. But here are the problems with this code:
1. The rows estimated by estimate_path_cost_size() are better only when use_remote_estimate is true. So we should be doing this only when use_remote_estimate is true.
2. This function gets called after local paths for the first pair for this join have been added. So those paths are not being judged fairly and perhaps we might be throwing away better paths just because the local estimates with which they were created were very different from the remote estimates.
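
A minimal sketch of what point 1 asks for, written against stand-in types so that it compiles on its own. MockJoinRel and update_joinrel_estimates() are hypothetical names used only for illustration; they are not part of PostgreSQL or postgres_fdw, and the real code would operate on joinrel and the relation's fpinfo instead.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the join relation; NOT the PostgreSQL struct. */
typedef struct MockJoinRel
{
    double rows;    /* planner's row estimate for the join */
    int    width;   /* planner's average row width estimate */
} MockJoinRel;

/*
 * Overwrite the join's estimates only when they were obtained from the
 * remote server. With use_remote_estimate off, the locally computed
 * values are left untouched.
 */
void
update_joinrel_estimates(MockJoinRel *joinrel, bool use_remote_estimate,
                         double rows, int width)
{
    if (use_remote_estimate)
    {
        joinrel->rows = rows;
        joinrel->width = width;
    }
}
```

With the guard in place, a call with use_remote_estimate set to false leaves the join's rows and width exactly as the local planner computed them.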

A better way would be to get the estimates and set up fpinfo for a joinrel in build_join_rel() and later add paths, similar to what we do for base relations. That means we split the current hook GetForeignJoinPaths into two: one to get estimates and the other to set up fpinfo.

Comments?
--
Best Wishes,
Ashutosh Bapat

Re: Updating row and width estimates in postgres_fdw

From: Etsuro Fujita
Date:
Hi Ashutosh,

Long time no see!

On Thu, Feb 13, 2020 at 1:17 AM Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:
> In postgresGetForeignJoinPaths(), I see
>
>     /* Estimate costs for bare join relation */
>     estimate_path_cost_size(root, joinrel, NIL, NIL, NULL,
>                             &rows, &width, &startup_cost, &total_cost);
>     /* Now update this information in the joinrel */
>     joinrel->rows = rows;
>     joinrel->reltarget->width = width;
>
> This code is good as well as bad.
>
> For a join relation, we estimate the number of rows in
> set_joinrel_size_estimates() inside build_*_join_rel() and set the
> width of the join when building the targetlist. For a foreign join,
> the size estimates may not be correct, but the width estimate should
> be. So updating the number of rows looks good, since it would be
> better than what set_joinrel_size_estimates() might come up with. But
> here are the problems with this code:
> 1. The rows estimated by estimate_path_cost_size() are better only
> when use_remote_estimate is true. So we should be doing this only
> when use_remote_estimate is true.

I think it's actually harmless to do that even when
use_remote_estimate=false because in that case we get the rows
estimate from joinrel->rows in estimate_path_cost_size() and return to
the caller the estimate as-is, IIRC.
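
To illustrate why the write-back would be harmless in that case, here is a small self-contained mock. MockRel and mock_estimate_size() are hypothetical stand-ins, not the real estimate_path_cost_size(); the sketch also assumes that width is passed through the same way as rows, which is not stated above.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the join relation; NOT the PostgreSQL struct. */
typedef struct MockRel
{
    double rows;
    int    width;
} MockRel;

/*
 * Mock estimator. remote_rows and remote_width stand for whatever the
 * remote server would report. In the local case the relation's existing
 * estimates are handed back unchanged (assumption: width behaves like rows).
 */
void
mock_estimate_size(const MockRel *rel, bool use_remote_estimate,
                   double remote_rows, int remote_width,
                   double *p_rows, int *p_width)
{
    if (use_remote_estimate)
    {
        *p_rows = remote_rows;
        *p_width = remote_width;
    }
    else
    {
        *p_rows = rel->rows;
        *p_width = rel->width;
    }
}
```

Under these assumptions, assigning the returned values back to the relation when use_remote_estimate is false stores the values it already held, so the unconditional update changes nothing.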

> 2. This function gets called after local paths for the first pair for
> this join have been added. So those paths are not being judged fairly,
> and perhaps we might be throwing away better paths just because the
> local estimates with which they were created were very different from
> the remote estimates.

Yeah, but I'm not sure we really need to fix that, because I think the
remote-join path would usually win against any of the local-join paths.
Could you show me an example causing an issue?

Best regards,
Etsuro Fujita