Thread: [COMMITTERS] pgsql: Fix cardinality estimates for parallel joins.

From
Robert Haas
Date:
Fix cardinality estimates for parallel joins.

For a partial path, the cardinality estimate needs to reflect the
number of rows we think each worker will see, rather than the total
number of rows; otherwise, costing will go wrong.  The previous coding
got this completely wrong for parallel joins.
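
As a rough illustration of the per-worker scaling described above, here is a minimal standalone sketch in C. The names sketch_parallel_divisor and sketch_rows_per_worker are hypothetical. The leader-contribution formula is an assumption modeled on the planner's parallel divisor logic, not a copy of costsize.c.

```c
/*
 * Hypothetical stand-in for the planner's parallel divisor.  Each worker
 * counts as 1.0; the leader is assumed to contribute a share that shrinks
 * by 0.3 per worker and disappears once it would go negative.
 */
double
sketch_parallel_divisor(int parallel_workers)
{
    double divisor = parallel_workers;
    double leader_contribution = 1.0 - 0.3 * parallel_workers;

    if (leader_contribution > 0)
        divisor += leader_contribution;
    return divisor;
}

/*
 * Rows that one participant is expected to see.  A path with no workers
 * is not a partial path, so its total row count is returned unchanged.
 */
double
sketch_rows_per_worker(double total_rows, int parallel_workers)
{
    if (parallel_workers <= 0)
        return total_rows;
    return total_rows / sketch_parallel_divisor(parallel_workers);
}
```

Under these assumptions, a join estimated at 1000 total rows with 4 workers is costed as 250 rows per participant. With 2 workers the divisor is about 2.4, because the leader is assumed to do some of the work.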

Unfortunately, this change may destabilize plans for users of 9.6 who
have enabled parallel query, but since 9.6 is still fairly new I'm
hoping expectations won't be too settled yet.  Also, this is really a
brown-paper-bag bug, so leaving it unfixed for the entire lifetime of
9.6 seems unwise.

Related reports (whose import I initially failed to recognize) by
Tomas Vondra and Tom Lane.

Discussion: http://postgr.es/m/CA+TgmoaDxZ5z5Kw_oCQoymNxNoVaTCXzPaODcOuao=CzK8dMZw@mail.gmail.com

Branch
------
REL9_6_STABLE

Details
-------
http://git.postgresql.org/pg/commitdiff/2d443ae1b0121e15265864d2b2143509fa70e8e4

Modified Files
--------------
src/backend/optimizer/path/costsize.c | 74 +++++++++++++++++++++++------------
1 file changed, 48 insertions(+), 26 deletions(-)


Re: [COMMITTERS] pgsql: Fix cardinality estimates for parallel joins.

From
Amit Kapila
Date:
On Sat, Jan 14, 2017 at 12:07 AM, Robert Haas <rhaas@postgresql.org> wrote:
> Fix cardinality estimates for parallel joins.
>

+       /*
+        * In the case of a parallel plan, the row count needs to represent
+        * the number of tuples processed per worker.
+        */
+       path->rows = clamp_row_est(path->rows / parallel_divisor);
    }

    path->startup_cost = startup_cost;
@@ -2014,6 +1996,10 @@ final_cost_nestloop(PlannerInfo *root, NestPath *path,
    else
        path->path.rows = path->path.parent->rows;

+   /* For partial paths, scale row estimate. */
+   if (path->path.parallel_workers > 0)
+       path->path.rows /= get_parallel_divisor(&path->path);


Wouldn't it be better to call clamp_row_est() in the join costing
functions, as cost_seqscan() does?  Is there a reason to keep them
different?
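
To make the difference concrete, here is a small standalone sketch in C comparing the two styles of scaling. All names are hypothetical. sketch_clamp_row_est is a simplified stand-in for clamp_row_est(): it rounds with a cast instead of rint(), which is an assumption that only holds for modest positive row counts.

```c
/*
 * Simplified stand-in for clamp_row_est(): never return less than one
 * row, otherwise round to a whole number.  The cast-based rounding is a
 * simplification for this sketch.
 */
double
sketch_clamp_row_est(double nrows)
{
    if (nrows <= 1.0)
        return 1.0;
    return (double) (long long) (nrows + 0.5);
}

/* Plain division, in the style of the join costing hunk quoted above. */
double
sketch_scale_unclamped(double rows, double divisor)
{
    return rows / divisor;
}

/* Division followed by clamping, in the style of the seqscan hunk. */
double
sketch_scale_clamped(double rows, double divisor)
{
    return sketch_clamp_row_est(rows / divisor);
}
```

With 3 rows and a divisor of 4, the unclamped form yields 0.75 rows, while the clamped form yields 1 row. The two forms can therefore diverge for small joins.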

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com