Thread: Rowcounts marked by create_foreignscan_path()
Why does create_foreignscan_path() not set the rowcounts based on ParamPathInfo when the path is a parameterized path? Please find attached a patch. Thanks, Best regards, Etsuro Fujita
Attachment
Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp> writes: > Why does create_foreignscan_path() not set the rowcounts based on > ParamPathInfo when the path is a parameterized path? The calling FDW is supposed to do that; note the header comment. I'm not sure that it'd be an improvement to change the API spec to be "create_foreignscan_path has no intelligence, except that sometimes it will decide to override your rows estimate anyway; nonetheless, it takes your cost estimate as gospel". regards, tom lane
(2014/02/18 12:03), Tom Lane wrote: > Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp> writes: >> Why does create_foreignscan_path() not set the rowcounts based on >> ParamPathInfo when the path is a parameterized path? > The calling FDW is supposed to do that; note the header comment. Understood. However, ISTM postgresGetForeignPaths() doesn't work like that. It uses the same rowcount for all paths of a same parameterization? Thanks, Best regards, Etsuro Fujita
Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp> writes: > (2014/02/18 12:03), Tom Lane wrote: >> The calling FDW is supposed to do that; note the header comment. > Understood. However, ISTM postgresGetForeignPaths() doesn't work like > that. It uses the same rowcount for all paths of a same parameterization? That's what we want no? Anyway, the point of using ppi_rows would be to enforce that all the rowcount estimates for a given parameterized relation are the same. In the FDW case, all those estimates are the FDW's responsibility, and so making them consistent is also its responsibility IMO. Another way of looking at this is that none of the pathnode creation routines in pathnode.c are responsible for setting rowcount estimates. That's done by whatever is setting the cost estimate; this must be so, else the cost estimate is surely bogus. So any way you slice it, the FDW has to get it right. regards, tom lane
(2014/02/18 12:37), Tom Lane wrote: > Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp> writes: >> (2014/02/18 12:03), Tom Lane wrote: >>> The calling FDW is supposed to do that; note the header comment. > >> However, ISTM postgresGetForeignPaths() doesn't work like >> that. It uses the same rowcount for all paths of a same parameterization? > > That's what we want no? Maybe I'm missing something. But I don't think postgresGetForeignPaths() marks the parameterized path with the correct row estimate. Also, that function doesn't seem to estimate the cost of the parameterized path correctly. Please find attached a patch. > Anyway, the point of using ppi_rows would be to enforce that all the > rowcount estimates for a given parameterized relation are the same. > In the FDW case, all those estimates are the FDW's responsibility, > and so making them consistent is also its responsibility IMO. > > Another way of looking at this is that none of the pathnode creation > routines in pathnode.c are responsible for setting rowcount estimates. > That's done by whatever is setting the cost estimate; this must be so, > else the cost estimate is surely bogus. So any way you slice it, the > FDW has to get it right. Understood. Thank you for the analysis. Sorry for the delay. Best regards, Etsuro Fujita
Attachment
Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp> writes: > Maybe I'm missing something. But I don't think > postgresGetForeignPaths() marks the parameterized path with the correct > row estimate. Also, that function doesn't seem to estimate the cost of > the parameterized path correctly. Please find attached a patch. [ looks at that... ] Oh, I see what you're worried about: we select a specific "safe" join clause to drive creation of the parameterized path, but then there might be additional join clauses that come along with that because they're also movable to the parameterized path. You're right, the current postgres_fdw code is not accounting for that. I experimented with this patch using this test case in the database created by the postgres_fdw regression test: explain verbose SELECT * FROM ft2 a, ft2 b WHERE a.c2 = 47 AND b.c1 = a.c1 and a.c6 = 'fooo' and b.c7=a.c7; What I saw in this example was that with the patch, the join clause got counted *twice* in the cost estimate, because the clause returned by generate_implied_equalities_for_column() isn't pointer-equal to the one in the ppi_clauses list (note the comment about such cases in postgresGetForeignPlan). So that's not right. However, we've got bigger problems than that: with or without the patch, explain verbose SELECT * FROM ft2 a, ft2 b WHERE a.c2 = 47 AND b.c1 = a.c1 and a.c6 = 'fooo' and b.c7=upper(a.c7); fails with TRAP: FailedAssertion("!(is_foreign_expr(root, baserel, rinfo->clause))", File: "postgres_fdw.c", Line: 759) This happens because we select the "safe" joinclause b.c1=a.c1, and then the PPI machinery adds all the other movable join clauses for that outer relation, including the unsafe one involving an upper() call. postgresGetForeignPlan believes that the only joinclauses it can find in scan_clauses are the safe one(s) selected by postgresGetForeignPaths, but that's completely wrong. I think this is probably relatively simple to fix: in postgresGetForeignPlan, we need to actually test whether a join clause is safe or not, and shove it into the appropriate list. So far as the original issue goes, I think what this consideration shows is that postgresGetForeignPaths is going about things incorrectly. I thought that considering only one join clause per path would be valid, if perhaps sometimes dumb. But the way the movable-clause machinery works, we don't get to consider only one join clause; it's all-or-nothing for a given outer relation. So what we really ought to be doing here is generating one parameterized path per valid outer relation. What we could possibly do to resolve this without much extra code is to keep using the same logic as a guide to which outer relations are interesting, but for each such relation, do get_baserel_parampathinfo() and then use the ppi_clauses list as the quals-to-apply for purposes of cost/selectivity estimation. postgresGetForeignPaths would have to check each of those clauses for safety rather than assuming anything. One possible benefit is that we could record the results in the path and not have to repeat the is_foreign_expr tests in postgresGetForeignPlan. I'm not entirely sure if that's worth the trouble though. regards, tom lane