Home > mailing lists

Re: Condition pushdown: why (=) is pushed down into join, but BETWEEN or >= is not? - Mailing list pgsql-hackers

From	Andy Fan
Subject	Re: Condition pushdown: why (=) is pushed down into join, but BETWEEN or >= is not?
Date	February 2, 2022 10:37:13
Msg-id	CAKU4AWrecbiuXaF4rVTji4=ow9VL-WBoZdjSCyQSDVmcgTL=Vw@mail.gmail.com Whole thread Raw
In response to	Re: Condition pushdown: why (=) is pushed down into join, but BETWEEN or >= is not? (Justin Pryzby <pryzby@telsasoft.com>)
Responses	Re: Condition pushdown: why (=) is pushed down into join, but BETWEEN or >= is not?
List	pgsql-hackers

Tree view

Hi Justin:

Thanks for your attention.

On Wed, Feb 2, 2022 at 1:13 AM Justin Pryzby <pryzby@telsasoft.com> wrote:

> Subject: [PATCH v1 1/6] Rebaee David's patch against the latest code.

If you use git-am, then the author/commit information is preserved.
It's probably good to include a link to the patch in any case.

Thanks for this reminder, I didn't pay enough attention to this. Fixed.

(The original patch looks like a diff file not a commit, I wrote a simple commit
message for this and link to the origin discussion link.)

> Subject: [PATCH v1 4/6] remove duplicated qual executing.

---
src/backend/optimizer/path/equivclass.c | 22 +++++++++++++++++++
src/backend/optimizer/plan/createplan.c | 29 +++++++++++++++++++++++--
src/include/optimizer/paths.h | 2 ++
src/test/regress/parallel_schedule | 2 ++
4 files changed, 53 insertions(+), 2 deletions(-)

I think the ./ec_filter test is missing from from this patch.

Indeed..

> Subject: [PATCH v1 6/6] adding some test cases for this feature and fix the existing case
The tests should be updated with the corresponding patches. It's common for
the patches to be commited separately, like if 0001 is ready but the others are
still evolving.

Yes, I agree with this. Just that in this case, the commit split is just for easy
review/discussion. they are unlikely to be able to commit separately. so I keep

it as it was and improve each commit message.

I'm not sure whether you think this patch is ready to be added to a commitfest,
but do you know about the CI infrastructure ? It allows running all the cfbot
tests for a github branch against 4 OS, which helps catch portability issues,
including memory errors and unstable explain output. See: src/tools/ci/README.

Added. https://commitfest.postgresql.org/37/3524/

There's an failure in postgres_fdw, probably the output needs to be updated.

For the postgres_fdw, I just refreshed the content. with this patch, the plan changed

from

Foreign Scan
Output: ft5.*, ft5.c1, ft5.c2, ft5.c3, ft4.c1, ft4.c2
Relations: (public.ft5) INNER JOIN (public.ft4)
Remote SQL: SELECT CASE WHEN (r1.*)::text IS NOT NULL THEN ROW(r1.c1, r1.c2, r1.c3) END, r1.c1, r1.c2, r1.c3, r2.c1, r2.c2 FROM ("S 1"."T 4" r1 INNER JOIN "S 1"."T 3" r2 ON (((r1.c1 = r2.c1)) AND ((r2.c1 >= 10)) AND ((r2.c1 <= 30)))) ORDER BY r1.c1 ASC NULLS LAST
(4 rows)

Sort (cost=108.02..108.04 rows=7 width=62)
Output: ft5.*, ft5.c1, ft5.c2, ft5.c3, ft4.c1, ft4.c2
Sort Key: ft5.c1
-> Foreign Scan (cost=100.00..107.92 rows=7 width=62)
Output: ft5.*, ft5.c1, ft5.c2, ft5.c3, ft4.c1, ft4.c2
Relations: (public.ft5) INNER JOIN (public.ft4)
Remote SQL: SELECT CASE WHEN (r1.*)::text IS NOT NULL THEN ROW(r1.c1, r1.c2, r1.c3) END, r1.c1, r1.c2, r1.c3, r2.c1, r2.c2 FROM ("S 1"."T 4" r1 INNER JOIN "S 1"."T 3" r2 ON (((r1.c1 = r2.c1)) AND ((r2.c1 >= 10)) AND ((r2.c1 <= 30)) AND ((r1.c1 >= 10)) AND ((r1.
c1 <= 30))))

But if I set enable_sort = off, we can still get the previous plan, which proves that

this patch doesn't make the above path unavailable, it is just not cheaper than

the new one. Here is the new commit messages:

commit e0a7838a09e73f831eecb23b5e7884cc34d71301
Author: David Rowley <dgrowleyml@gmail.com>
Date: Tue Feb 1 20:56:40 2022 +0800

Introudce ec_filters in EquivalenceClass struct, the semantics is the quals can

be applied to any EquivalenceMember in this EC. Later this information is used
to generate new RestrictInfo and was distributed to related RelOptInfo very
soon.

Author: David Rowley at 2015-12 [1]
Andy Fan rebase this patch to current latest code.

https://www.postgresql.org/message-id/CAKJS1f9FK_X_5HKcPcSeimy16Owe3EmPmmGsGWLcKkj_rW9s6A%40mail.gmail.com

commit 73f52d0909374446cd689457f0a4ef52addb035e
Author: Andy Fan <yizhi.fzh@alibaba-inc.com>
Date: Tue Feb 1 14:54:07 2022 +0800

After distributing the new derived RestrictInfo into RelOptInfo, then the rows
estimation is wrong at the joinrel part. The reason is well described at [1] and
[2], To fix this issue, I added a new field "EquivalenceClass *derived" in
RestrictInfo struct to indicate how this qual is generated. we would ignore such
qual during estimate of the rows size. All the set_xx_size should be taken care of, but
for now, just set_plain_rel_size is taken care of for the PoC purpose.

[1]
https://www.postgresql.org/message-id/flat/CAKJS1f9FK_X_5HKcPcSeimy16Owe3EmPmmGsGWLcKkj_rW9s6A%40mail.gmail.com
[2]
https://www.postgresql.org/message-id/flat/1727507.1620948117%40sss.pgh.pa.us#52ac3f46cf614acb0bdbddb7128f5bd2

commit 8439b4818410d860a4ca4be3458b54c04c6f8648
Author: Andy Fan <yizhi.fzh@alibaba-inc.com>
Date: Tue Feb 1 15:20:10 2022 +0800

Introduce RelOptInfo.filtered_rows.

Previously the Path.rows (shown in the explain output) and RelOptInfo.rows
which would be used to calculating joinrel's estimated rows are same
at many scan paths, like SeqScan, IndexScan, BitmapHeapScan and so on. But
they would be different after distributing a new restrictinfo from ec_filter.
So I developed RelOptInfo.filtered_rows to take some duty out of RelOptInfo.rows.

commit 11b3395bb5bcc4a2bcff6fed8078dbbf3cda81b1
Author: Andy Fan <yizhi.fzh@alibaba-inc.com>
Date: Tue Feb 1 17:37:27 2022 +0800

Remove duplicated qual executing for executor.

Take the SELECT * FROM t1, t2 WHERE t1.a = t2.a and t2.a > 3 for example,
we can derive t1.a > 3 with EC filter infrastructure. However if it generate a
plan like below, the new generated qual does not deserve to execute.

Nest Loop
Seq Scan (t1.a > 3)
Index Scan t2_a
(a = t1.a) (t2.a > 3)

This patch removes the "t2.a > 3" for the above case.

commit 2875a76136293589b6e409cb6be4defab87ade59
Author: Andy Fan <yizhi.fzh@alibaba-inc.com>
Date: Wed Feb 2 11:54:24 2022 +0800

Support ScalarArrayOpExpr and perudoconstant on ef_filter.

commit a4b21ab6fd0fd57902f5471ec962a77b59085158 (HEAD -> cf_v4)
Author: Andy Fan <yizhi.fzh@alibaba-inc.com>
Date: Wed Feb 2 11:59:53 2022 +0800

Added the testcase for this feature and fix the previous test case

as well. The new added test case needs outputting some runtime
statistics, which will probably be different at each run. I can think
of a way to make the test case stable if the patchsets are not wrong
at the first step.

Best Regards

Andy Fan

pgsql-hackers by date:

From: "David G. Johnston"
Date: 02 February 2022, 10:36:08
Subject: Re: Design of pg_stat_subscription_workers vs pgstats

From: Kyotaro Horiguchi
Date: 02 February 2022, 10:49:57
Subject: Re: RFC: Logging plan of the running query

Re: Condition pushdown: why (=) is pushed down into join, but BETWEEN or >= is not? - Mailing list pgsql-hackers

Previous

Next