Thread: [COMMITTERS] pgsql: Fix cardinality estimates for parallel joins.
Fix cardinality estimates for parallel joins.

For a partial path, the cardinality estimate needs to reflect the
number of rows we think each worker will see, rather than the total
number of rows; otherwise, costing will go wrong.  The previous coding
got this completely wrong for parallel joins.

Unfortunately, this change may destabilize plans for users of 9.6 who
have enabled parallel query, but since 9.6 is still fairly new I'm
hoping expectations won't be too settled yet.  Also, this is really a
brown-paper-bag bug, so leaving it unfixed for the entire lifetime of
9.6 seems unwise.

Related reports (whose import I initially failed to recognize) by
Tomas Vondra and Tom Lane.

Discussion: http://postgr.es/m/CA+TgmoaDxZ5z5Kw_oCQoymNxNoVaTCXzPaODcOuao=CzK8dMZw@mail.gmail.com

Branch
------
REL9_6_STABLE

Details
-------
http://git.postgresql.org/pg/commitdiff/2d443ae1b0121e15265864d2b2143509fa70e8e4

Modified Files
--------------
src/backend/optimizer/path/costsize.c | 74 +++++++++++++++++++++++------------
1 file changed, 48 insertions(+), 26 deletions(-)
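To make the per-worker point concrete, here is a minimal C sketch; it is not the committed code. Both function names are hypothetical, and the divisor formula (a leader contribution that shrinks by 0.3 per worker) is an assumption made for illustration rather than a quotation from costsize.c.

```c
/* Hypothetical stand-in for the planner's parallel divisor.  The 0.3
 * leader-contribution factor is an assumption for illustration. */
double
sketch_parallel_divisor(int parallel_workers)
{
    double divisor = parallel_workers;
    double leader_contribution = 1.0 - (0.3 * parallel_workers);

    /* Count the leader only while it still has spare capacity. */
    if (leader_contribution > 0)
        divisor += leader_contribution;
    return divisor;
}

/* Rows each participating process is expected to see. */
double
sketch_rows_per_worker(double total_rows, int parallel_workers)
{
    if (parallel_workers <= 0)
        return total_rows;      /* not a partial path: keep the total */
    return total_rows / sketch_parallel_divisor(parallel_workers);
}
```

Under these assumptions, 1000 total rows with 2 workers gives a divisor of about 2.4, so each process is costed for roughly 417 rows instead of 1000.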
On Sat, Jan 14, 2017 at 12:07 AM, Robert Haas <rhaas@postgresql.org> wrote:
> Fix cardinality estimates for parallel joins.
>

+ /*
+  * In the case of a parallel plan, the row count needs to represent
+  * the number of tuples processed per worker.
+  */
+ path->rows = clamp_row_est(path->rows / parallel_divisor);
  }

  path->startup_cost = startup_cost;
@@ -2014,6 +1996,10 @@ final_cost_nestloop(PlannerInfo *root, NestPath *path,
  else
      path->path.rows = path->path.parent->rows;

+ /* For partial paths, scale row estimate. */
+ if (path->path.parallel_workers > 0)
+     path->path.rows /= get_parallel_divisor(&path->path);

Isn't it better to call clamp_row_est in join costing functions as we
are doing in cost_seqscan()?  Is there a reason to keep those
different?

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
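The difference between the two quoted styles can be sketched in standalone C. This is only an illustration: `sketch_clamp_row_est` is a simplified, hypothetical stand-in for `clamp_row_est` (assumed here to round to a whole number and enforce a floor of one row), and the other two function names are likewise invented for the example.

```c
/* Simplified stand-in for clamp_row_est(): round to a whole number of
 * rows and never report fewer than one.  The cast-based rounding is an
 * assumption chosen to keep the sketch free of libm. */
double
sketch_clamp_row_est(double nrows)
{
    if (nrows <= 1.0)
        return 1.0;
    return (double) (long long) (nrows + 0.5);
}

/* Style quoted for the join costing hunk: plain division. */
double
sketch_scale_unclamped(double rows, double parallel_divisor)
{
    return rows / parallel_divisor;
}

/* Style quoted for the scan costing hunk: divide, then clamp. */
double
sketch_scale_clamped(double rows, double parallel_divisor)
{
    return sketch_clamp_row_est(rows / parallel_divisor);
}
```

With a one-row estimate and a divisor of 2.4, the unclamped form yields a fractional estimate below one row, while the clamped form stays at one row. That is the inconsistency the question above is pointing at.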