Re: Condition pushdown: why (=) is pushed down into join, but BETWEEN or >= is not? - Mailing list pgsql-hackers
From | Andy Fan |
---|---|
Subject | Re: Condition pushdown: why (=) is pushed down into join, but BETWEEN or >= is not? |
Date | |
Msg-id | CAKU4AWrecbiuXaF4rVTji4=ow9VL-WBoZdjSCyQSDVmcgTL=Vw@mail.gmail.com Whole thread Raw |
In response to | Re: Condition pushdown: why (=) is pushed down into join, but BETWEEN or >= is not? (Justin Pryzby <pryzby@telsasoft.com>) |
Responses |
Re: Condition pushdown: why (=) is pushed down into join, but BETWEEN or >= is not?
|
List | pgsql-hackers |
Hi Justin:
Thanks for your attention.
On Wed, Feb 2, 2022 at 1:13 AM Justin Pryzby <pryzby@telsasoft.com> wrote:
> Subject: [PATCH v1 1/6] Rebaee David's patch against the latest code.
If you use git-am, then the author/commit information is preserved.
It's probably good to include a link to the patch in any case.
Thanks for this reminder, I didn't pay enough attention to this. Fixed.
(The original patch looks like a diff file not a commit, I wrote a simple commit
message for this and link to the origin discussion link.)
message for this and link to the origin discussion link.)
> Subject: [PATCH v1 4/6] remove duplicated qual executing.
---
src/backend/optimizer/path/equivclass.c | 22 +++++++++++++++++++
src/backend/optimizer/plan/createplan.c | 29 +++++++++++++++++++++++--
src/include/optimizer/paths.h | 2 ++
src/test/regress/parallel_schedule | 2 ++
4 files changed, 53 insertions(+), 2 deletions(-)
I think the ./ec_filter test is missing from from this patch.
Indeed..
> Subject: [PATCH v1 6/6] adding some test cases for this feature and fix the existing case
The tests should be updated with the corresponding patches. It's common for
the patches to be commited separately, like if 0001 is ready but the others are
still evolving.
review/discussion. they are unlikely to be able to commit separately. so I keep
it as it was and improve each commit message.
I'm not sure whether you think this patch is ready to be added to a commitfest,
but do you know about the CI infrastructure ? It allows running all the cfbot
tests for a github branch against 4 OS, which helps catch portability issues,
including memory errors and unstable explain output. See: src/tools/ci/README.
There's an failure in postgres_fdw, probably the output needs to be updated.
For the postgres_fdw, I just refreshed the content. with this patch, the plan changed
from
Foreign Scan
Output: ft5.*, ft5.c1, ft5.c2, ft5.c3, ft4.c1, ft4.c2
Relations: (public.ft5) INNER JOIN (public.ft4)
Remote SQL: SELECT CASE WHEN (r1.*)::text IS NOT NULL THEN ROW(r1.c1, r1.c2, r1.c3) END, r1.c1, r1.c2, r1.c3, r2.c1, r2.c2 FROM ("S 1"."T 4" r1 INNER JOIN "S 1"."T 3" r2 ON (((r1.c1 = r2.c1)) AND ((r2.c1 >= 10)) AND ((r2.c1 <= 30)))) ORDER BY r1.c1 ASC NULLS LAST
(4 rows)
Output: ft5.*, ft5.c1, ft5.c2, ft5.c3, ft4.c1, ft4.c2
Relations: (public.ft5) INNER JOIN (public.ft4)
Remote SQL: SELECT CASE WHEN (r1.*)::text IS NOT NULL THEN ROW(r1.c1, r1.c2, r1.c3) END, r1.c1, r1.c2, r1.c3, r2.c1, r2.c2 FROM ("S 1"."T 4" r1 INNER JOIN "S 1"."T 3" r2 ON (((r1.c1 = r2.c1)) AND ((r2.c1 >= 10)) AND ((r2.c1 <= 30)))) ORDER BY r1.c1 ASC NULLS LAST
(4 rows)
to
Sort (cost=108.02..108.04 rows=7 width=62)
Output: ft5.*, ft5.c1, ft5.c2, ft5.c3, ft4.c1, ft4.c2
Sort Key: ft5.c1
-> Foreign Scan (cost=100.00..107.92 rows=7 width=62)
Output: ft5.*, ft5.c1, ft5.c2, ft5.c3, ft4.c1, ft4.c2
Relations: (public.ft5) INNER JOIN (public.ft4)
Remote SQL: SELECT CASE WHEN (r1.*)::text IS NOT NULL THEN ROW(r1.c1, r1.c2, r1.c3) END, r1.c1, r1.c2, r1.c3, r2.c1, r2.c2 FROM ("S 1"."T 4" r1 INNER JOIN "S 1"."T 3" r2 ON (((r1.c1 = r2.c1)) AND ((r2.c1 >= 10)) AND ((r2.c1 <= 30)) AND ((r1.c1 >= 10)) AND ((r1.
c1 <= 30))))
Output: ft5.*, ft5.c1, ft5.c2, ft5.c3, ft4.c1, ft4.c2
Sort Key: ft5.c1
-> Foreign Scan (cost=100.00..107.92 rows=7 width=62)
Output: ft5.*, ft5.c1, ft5.c2, ft5.c3, ft4.c1, ft4.c2
Relations: (public.ft5) INNER JOIN (public.ft4)
Remote SQL: SELECT CASE WHEN (r1.*)::text IS NOT NULL THEN ROW(r1.c1, r1.c2, r1.c3) END, r1.c1, r1.c2, r1.c3, r2.c1, r2.c2 FROM ("S 1"."T 4" r1 INNER JOIN "S 1"."T 3" r2 ON (((r1.c1 = r2.c1)) AND ((r2.c1 >= 10)) AND ((r2.c1 <= 30)) AND ((r1.c1 >= 10)) AND ((r1.
c1 <= 30))))
But if I set enable_sort = off, we can still get the previous plan, which proves that
this patch doesn't make the above path unavailable, it is just not cheaper than
the new one. Here is the new commit messages:
commit e0a7838a09e73f831eecb23b5e7884cc34d71301
Author: David Rowley <dgrowleyml@gmail.com>
Date: Tue Feb 1 20:56:40 2022 +0800
Introudce ec_filters in EquivalenceClass struct, the semantics is the quals can
be applied to any EquivalenceMember in this EC. Later this information is used
to generate new RestrictInfo and was distributed to related RelOptInfo very
soon.
Author: David Rowley at 2015-12 [1]
Andy Fan rebase this patch to current latest code.
https://www.postgresql.org/message-id/CAKJS1f9FK_X_5HKcPcSeimy16Owe3EmPmmGsGWLcKkj_rW9s6A%40mail.gmail.com
commit 73f52d0909374446cd689457f0a4ef52addb035e
Author: Andy Fan <yizhi.fzh@alibaba-inc.com>
Date: Tue Feb 1 14:54:07 2022 +0800
After distributing the new derived RestrictInfo into RelOptInfo, then the rows
estimation is wrong at the joinrel part. The reason is well described at [1] and
[2], To fix this issue, I added a new field "EquivalenceClass *derived" in
RestrictInfo struct to indicate how this qual is generated. we would ignore such
qual during estimate of the rows size. All the set_xx_size should be taken care of, but
for now, just set_plain_rel_size is taken care of for the PoC purpose.
[1]
https://www.postgresql.org/message-id/flat/CAKJS1f9FK_X_5HKcPcSeimy16Owe3EmPmmGsGWLcKkj_rW9s6A%40mail.gmail.com
[2]
https://www.postgresql.org/message-id/flat/1727507.1620948117%40sss.pgh.pa.us#52ac3f46cf614acb0bdbddb7128f5bd2
commit 8439b4818410d860a4ca4be3458b54c04c6f8648
Author: Andy Fan <yizhi.fzh@alibaba-inc.com>
Date: Tue Feb 1 15:20:10 2022 +0800
Introduce RelOptInfo.filtered_rows.
Previously the Path.rows (shown in the explain output) and RelOptInfo.rows
which would be used to calculating joinrel's estimated rows are same
at many scan paths, like SeqScan, IndexScan, BitmapHeapScan and so on. But
they would be different after distributing a new restrictinfo from ec_filter.
So I developed RelOptInfo.filtered_rows to take some duty out of RelOptInfo.rows.
commit 11b3395bb5bcc4a2bcff6fed8078dbbf3cda81b1
Author: Andy Fan <yizhi.fzh@alibaba-inc.com>
Date: Tue Feb 1 17:37:27 2022 +0800
Remove duplicated qual executing for executor.
Take the SELECT * FROM t1, t2 WHERE t1.a = t2.a and t2.a > 3 for example,
we can derive t1.a > 3 with EC filter infrastructure. However if it generate a
plan like below, the new generated qual does not deserve to execute.
Nest Loop
Seq Scan (t1.a > 3)
Index Scan t2_a
(a = t1.a) (t2.a > 3)
This patch removes the "t2.a > 3" for the above case.
commit 2875a76136293589b6e409cb6be4defab87ade59
Author: Andy Fan <yizhi.fzh@alibaba-inc.com>
Date: Wed Feb 2 11:54:24 2022 +0800
Support ScalarArrayOpExpr and perudoconstant on ef_filter.
commit a4b21ab6fd0fd57902f5471ec962a77b59085158 (HEAD -> cf_v4)
Author: Andy Fan <yizhi.fzh@alibaba-inc.com>
Date: Wed Feb 2 11:59:53 2022 +0800
Added the testcase for this feature and fix the previous test case
as well. The new added test case needs outputting some runtime
statistics, which will probably be different at each run. I can think
of a way to make the test case stable if the patchsets are not wrong
at the first step.
Author: David Rowley <dgrowleyml@gmail.com>
Date: Tue Feb 1 20:56:40 2022 +0800
Introudce ec_filters in EquivalenceClass struct, the semantics is the quals can
be applied to any EquivalenceMember in this EC. Later this information is used
to generate new RestrictInfo and was distributed to related RelOptInfo very
soon.
Author: David Rowley at 2015-12 [1]
Andy Fan rebase this patch to current latest code.
https://www.postgresql.org/message-id/CAKJS1f9FK_X_5HKcPcSeimy16Owe3EmPmmGsGWLcKkj_rW9s6A%40mail.gmail.com
commit 73f52d0909374446cd689457f0a4ef52addb035e
Author: Andy Fan <yizhi.fzh@alibaba-inc.com>
Date: Tue Feb 1 14:54:07 2022 +0800
After distributing the new derived RestrictInfo into RelOptInfo, then the rows
estimation is wrong at the joinrel part. The reason is well described at [1] and
[2], To fix this issue, I added a new field "EquivalenceClass *derived" in
RestrictInfo struct to indicate how this qual is generated. we would ignore such
qual during estimate of the rows size. All the set_xx_size should be taken care of, but
for now, just set_plain_rel_size is taken care of for the PoC purpose.
[1]
https://www.postgresql.org/message-id/flat/CAKJS1f9FK_X_5HKcPcSeimy16Owe3EmPmmGsGWLcKkj_rW9s6A%40mail.gmail.com
[2]
https://www.postgresql.org/message-id/flat/1727507.1620948117%40sss.pgh.pa.us#52ac3f46cf614acb0bdbddb7128f5bd2
commit 8439b4818410d860a4ca4be3458b54c04c6f8648
Author: Andy Fan <yizhi.fzh@alibaba-inc.com>
Date: Tue Feb 1 15:20:10 2022 +0800
Introduce RelOptInfo.filtered_rows.
Previously the Path.rows (shown in the explain output) and RelOptInfo.rows
which would be used to calculating joinrel's estimated rows are same
at many scan paths, like SeqScan, IndexScan, BitmapHeapScan and so on. But
they would be different after distributing a new restrictinfo from ec_filter.
So I developed RelOptInfo.filtered_rows to take some duty out of RelOptInfo.rows.
commit 11b3395bb5bcc4a2bcff6fed8078dbbf3cda81b1
Author: Andy Fan <yizhi.fzh@alibaba-inc.com>
Date: Tue Feb 1 17:37:27 2022 +0800
Remove duplicated qual executing for executor.
Take the SELECT * FROM t1, t2 WHERE t1.a = t2.a and t2.a > 3 for example,
we can derive t1.a > 3 with EC filter infrastructure. However if it generate a
plan like below, the new generated qual does not deserve to execute.
Nest Loop
Seq Scan (t1.a > 3)
Index Scan t2_a
(a = t1.a) (t2.a > 3)
This patch removes the "t2.a > 3" for the above case.
commit 2875a76136293589b6e409cb6be4defab87ade59
Author: Andy Fan <yizhi.fzh@alibaba-inc.com>
Date: Wed Feb 2 11:54:24 2022 +0800
Support ScalarArrayOpExpr and perudoconstant on ef_filter.
commit a4b21ab6fd0fd57902f5471ec962a77b59085158 (HEAD -> cf_v4)
Author: Andy Fan <yizhi.fzh@alibaba-inc.com>
Date: Wed Feb 2 11:59:53 2022 +0800
Added the testcase for this feature and fix the previous test case
as well. The new added test case needs outputting some runtime
statistics, which will probably be different at each run. I can think
of a way to make the test case stable if the patchsets are not wrong
at the first step.
Best Regards
Andy Fan
pgsql-hackers by date: