Thread: Rowcounts marked by create_foreignscan_path()

Rowcounts marked by create_foreignscan_path()

From
Etsuro Fujita
Date:
Why does create_foreignscan_path() not set the rowcounts based on
ParamPathInfo when the path is a parameterized path?  Please find
attached a patch.

Thanks,

Best regards,
Etsuro Fujita

Attachment

Re: Rowcounts marked by create_foreignscan_path()

From
Tom Lane
Date:
Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp> writes:
> Why does create_foreignscan_path() not set the rowcounts based on
> ParamPathInfo when the path is a parameterized path?

The calling FDW is supposed to do that; note the header comment.
I'm not sure that it'd be an improvement to change the API spec
to be "create_foreignscan_path has no intelligence, except that
sometimes it will decide to override your rows estimate anyway;
nonetheless, it takes your cost estimate as gospel".
        regards, tom lane



Re: Rowcounts marked by create_foreignscan_path()

From
Etsuro Fujita
Date:
(2014/02/18 12:03), Tom Lane wrote:
> Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp> writes:
>> Why does create_foreignscan_path() not set the rowcounts based on
>> ParamPathInfo when the path is a parameterized path?

> The calling FDW is supposed to do that; note the header comment.

Understood.  However, ISTM postgresGetForeignPaths() doesn't work like
that.  It uses the same rowcount for all paths of a same parameterization?

Thanks,

Best regards,
Etsuro Fujita



Re: Rowcounts marked by create_foreignscan_path()

From
Tom Lane
Date:
Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp> writes:
> (2014/02/18 12:03), Tom Lane wrote:
>> The calling FDW is supposed to do that; note the header comment.

> Understood.  However, ISTM postgresGetForeignPaths() doesn't work like
> that.  It uses the same rowcount for all paths of a same parameterization?

That's what we want no?

Anyway, the point of using ppi_rows would be to enforce that all the
rowcount estimates for a given parameterized relation are the same.
In the FDW case, all those estimates are the FDW's responsibility,
and so making them consistent is also its responsibility IMO.

Another way of looking at this is that none of the pathnode creation
routines in pathnode.c are responsible for setting rowcount estimates.
That's done by whatever is setting the cost estimate; this must be so,
else the cost estimate is surely bogus.  So any way you slice it, the
FDW has to get it right.
        regards, tom lane



Re: Rowcounts marked by create_foreignscan_path()

From
Etsuro Fujita
Date:
(2014/02/18 12:37), Tom Lane wrote:
> Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp> writes:
>> (2014/02/18 12:03), Tom Lane wrote:
>>> The calling FDW is supposed to do that; note the header comment.
>
>> However, ISTM postgresGetForeignPaths() doesn't work like
>> that.  It uses the same rowcount for all paths of a same parameterization?
>
> That's what we want no?

Maybe I'm missing something.  But I don't think
postgresGetForeignPaths() marks the parameterized path with the correct
row estimate.  Also, that function doesn't seem to estimate the cost of
the parameterized path correctly.  Please find attached a patch.

> Anyway, the point of using ppi_rows would be to enforce that all the
> rowcount estimates for a given parameterized relation are the same.
> In the FDW case, all those estimates are the FDW's responsibility,
> and so making them consistent is also its responsibility IMO.
>
> Another way of looking at this is that none of the pathnode creation
> routines in pathnode.c are responsible for setting rowcount estimates.
> That's done by whatever is setting the cost estimate; this must be so,
> else the cost estimate is surely bogus.  So any way you slice it, the
> FDW has to get it right.

Understood.  Thank you for the analysis.

Sorry for the delay.

Best regards,
Etsuro Fujita

Attachment

Re: Rowcounts marked by create_foreignscan_path()

From
Tom Lane
Date:
Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp> writes:
> Maybe I'm missing something.  But I don't think
> postgresGetForeignPaths() marks the parameterized path with the correct
> row estimate.  Also, that function doesn't seem to estimate the cost of
> the parameterized path correctly.  Please find attached a patch.

[ looks at that... ]  Oh, I see what you're worried about: we select
a specific "safe" join clause to drive creation of the parameterized
path, but then there might be additional join clauses that come along
with that because they're also movable to the parameterized path.
You're right, the current postgres_fdw code is not accounting for that.

I experimented with this patch using this test case in the database
created by the postgres_fdw regression test:

explain verbose SELECT * FROM ft2 a, ft2 b WHERE a.c2 = 47 AND b.c1 = a.c1 and a.c6 = 'fooo' and b.c7=a.c7;

What I saw in this example was that with the patch, the join clause got
counted *twice* in the cost estimate, because the clause returned by
generate_implied_equalities_for_column() isn't pointer-equal to the one in
the ppi_clauses list (note the comment about such cases in
postgresGetForeignPlan).  So that's not right.

However, we've got bigger problems than that: with or without the patch,

explain verbose SELECT * FROM ft2 a, ft2 b WHERE a.c2 = 47 AND b.c1 = a.c1 and a.c6 = 'fooo' and b.c7=upper(a.c7);

fails with
TRAP: FailedAssertion("!(is_foreign_expr(root, baserel, rinfo->clause))", File: "postgres_fdw.c", Line: 759)

This happens because we select the "safe" joinclause b.c1=a.c1, and then
the PPI machinery adds all the other movable join clauses for that outer
relation, including the unsafe one involving an upper() call.
postgresGetForeignPlan believes that the only joinclauses it can find in
scan_clauses are the safe one(s) selected by postgresGetForeignPaths, but
that's completely wrong.  I think this is probably relatively simple to
fix: in postgresGetForeignPlan, we need to actually test whether a join
clause is safe or not, and shove it into the appropriate list.

So far as the original issue goes, I think what this consideration shows
is that postgresGetForeignPaths is going about things incorrectly.
I thought that considering only one join clause per path would be valid,
if perhaps sometimes dumb.  But the way the movable-clause machinery works,
we don't get to consider only one join clause; it's all-or-nothing for a
given outer relation.  So what we really ought to be doing here is
generating one parameterized path per valid outer relation.

What we could possibly do to resolve this without much extra code is to
keep using the same logic as a guide to which outer relations are
interesting, but for each such relation, do get_baserel_parampathinfo()
and then use the ppi_clauses list as the quals-to-apply for purposes of
cost/selectivity estimation.  postgresGetForeignPaths would have to check
each of those clauses for safety rather than assuming anything.  One
possible benefit is that we could record the results in the path and not
have to repeat the is_foreign_expr tests in postgresGetForeignPlan.
I'm not entirely sure if that's worth the trouble though.
        regards, tom lane