Re: [HACKERS] Re: Improve OR conditions on joined columns (commonstar schema problem) - Mailing list pgsql-hackers

From Noah Misch
Subject Re: [HACKERS] Re: Improve OR conditions on joined columns (commonstar schema problem)
Date
Msg-id 20180904015910.GA1797012@rfd.leadboat.com
Whole thread Raw
In response to Re: [HACKERS] Re: Improve OR conditions on joined columns (common star schema problem)  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: [HACKERS] Re: Improve OR conditions on joined columns (commonstar schema problem)
List pgsql-hackers
On Sun, Feb 12, 2017 at 09:32:36AM -0800, Tom Lane wrote:
> It's not so much poor choices as the cost of the optimization attempt ---
> if there's a K-relation OR clause, this will increase the cost of planning
> by a factor approaching K+1, whether or not you get a better plan out of
> it.  I ran the regression tests with some instrumentation and determined
> that this logic fires a dozen or two times, and fails to produce a plan
> that looks cheaper than the standard plan in any of those cases.  So if we
> go down this road, not only do we need a GUC but I suspect it had better
> default to off; only people using star schemas are really likely to get a
> win out of it.

I benchmarked using the attached script.  Highlights:

$ perl mk-bench-union-or.pl --dimensions=11 --tests=const | psql -X
# TRAP: FailedAssertion("!(uniq_exprs != ((List *) ((void *)0)))", File: "planunionor.c", Line: 335)

$ perl mk-bench-union-or.pl --dimensions=11 --tests=var --exhaustive --values-per-dimension=0 | psql -X
# TRAP: FailedAssertion("!(list_length(dest_tlist) == list_length(src_tlist))", File: "tlist.c", Line: 344)

$ perl mk-bench-union-or.pl --dimensions=7 --or-clauses=20 --exhaustive --tests=const,rowmark | psql -X
... (with planunionor.c plan)
 Planning Time: 1902.013 ms
 Execution Time: 291.629 ms
... (without planunionor.c plan)
 Planning Time: 11.100 ms
 Execution Time: 64.452 ms
# planning 170x slower, chosen plan 4.5x slower

I agree this warrants a GUC, but I propose a goal of making it fitting to
enable by default.  The is_suitable_join_or_clause() test is a good indicator
of promising queries, and I suspect it's cheap enough to run all the time.  As
a DBA, I'd struggle to predict when is_suitable_join_or_clause() will pass
while the optimization as a whole will lose; I would resort to testing each
important query both ways.  In other words, the GUC would boil down to a
planner hint, not to a declaration about the table schema.

On Thu, Aug 23, 2018 at 02:10:46PM -0400, Tom Lane wrote:
> Rebased up to HEAD, per cfbot nagging.  Still no substantive change from
> v2.

> +  * is retty mechanical, but we can't do it until we have a RelOptInfo for the

Jim Nasby had pointed out this s/retty/pretty/ typo.

> + void
> + finish_union_or_paths(PlannerInfo *root, RelOptInfo *joinrel,
> +                       List *union_or_subpaths)
> + {
...
> +         pathnode = makeNode(UniquePath);
...
> +         /* Estimate number of output rows */
> +         pathnode->path.rows = appendpath->rows;

If you're going to keep this highly-simplified estimate, please expand the
comment to say why it doesn't matter or what makes it hard to do better.  The
non-planunionor.c path for the same query computes its own estimate of the
same underlying quantity.  Though it may be too difficult in today's system,
one could copy the competing path's row count estimate here.  Perhaps it
doesn't matter because higher-level processing already assumes equal row count
among competitors?

Attachment

pgsql-hackers by date:

Previous
From: Amit Langote
Date:
Subject: Re: pointless check in RelationBuildPartitionDesc
Next
From: Amit Kapila
Date:
Subject: Re: pg_verify_checksums failure with hash indexes