Re: Condition pushdown: why (=) is pushed down into join, but BETWEEN or >= is not? - Mailing list pgsql-hackers

From Andy Fan
Subject Re: Condition pushdown: why (=) is pushed down into join, but BETWEEN or >= is not?
Date
Msg-id CAKU4AWrecbiuXaF4rVTji4=ow9VL-WBoZdjSCyQSDVmcgTL=Vw@mail.gmail.com
Whole thread Raw
In response to Re: Condition pushdown: why (=) is pushed down into join, but BETWEEN or >= is not?  (Justin Pryzby <pryzby@telsasoft.com>)
Responses Re: Condition pushdown: why (=) is pushed down into join, but BETWEEN or >= is not?  (Tomas Vondra <tomas.vondra@enterprisedb.com>)
List pgsql-hackers
Hi Justin:

   Thanks for your attention. 

On Wed, Feb 2, 2022 at 1:13 AM Justin Pryzby <pryzby@telsasoft.com> wrote:
> Subject: [PATCH v1 1/6] Rebaee David's patch against the latest code.

If you use git-am, then the author/commit information is preserved.
It's probably good to include a link to the patch in any case.


Thanks for this reminder, I didn't pay enough attention to this.  Fixed.

(The original patch looks like a diff file not a commit, I wrote a simple commit
message for this and link to the origin discussion link.)

 
> Subject: [PATCH v1 4/6] remove duplicated qual executing.

---                                                                                                                                                                                             
 src/backend/optimizer/path/equivclass.c | 22 +++++++++++++++++++                                                                                                                               
 src/backend/optimizer/plan/createplan.c | 29 +++++++++++++++++++++++--                                                                                                                         
 src/include/optimizer/paths.h           |  2 ++                                                                                                                                                 
 src/test/regress/parallel_schedule      |  2 ++                                                                                                                                                 
 4 files changed, 53 insertions(+), 2 deletions(-)                                                                                                                                               

I think the ./ec_filter test is missing from from this patch.


Indeed..
 
> Subject: [PATCH v1 6/6] adding some test cases for this feature and fix the existing case                                                                                                                                                                                   
The tests should be updated with the corresponding patches.  It's common for
the patches to be commited separately, like if 0001 is ready but the others are
still evolving.

Yes, I agree with this. Just that in this case, the commit split is just for easy
review/discussion. they are unlikely to be able to commit separately. so I keep
it as it was and improve each commit message.
 

I'm not sure whether you think this patch is ready to be added to a commitfest,
but do you know about the CI infrastructure ?  It allows running all the cfbot
tests for a github branch against 4 OS, which helps catch portability issues,
including memory errors and unstable explain output.  See: src/tools/ci/README.

 
There's an failure in postgres_fdw, probably the output needs to be updated.

For the postgres_fdw, I just refreshed the content. with this patch, the plan changed
from 

 Foreign Scan
   Output: ft5.*, ft5.c1, ft5.c2, ft5.c3, ft4.c1, ft4.c2
   Relations: (public.ft5) INNER JOIN (public.ft4)
   Remote SQL: SELECT CASE WHEN (r1.*)::text IS NOT NULL THEN ROW(r1.c1, r1.c2, r1.c3) END, r1.c1, r1.c2, r1.c3, r2.c1, r2.c2 FROM ("S 1"."T 4" r1 INNER JOIN "S 1"."T 3" r2 ON (((r1.c1 = r2.c1)) AND ((r2.c1 >= 10)) AND ((r2.c1 <= 30)))) ORDER BY r1.c1 ASC NULLS LAST
(4 rows)

to

 Sort  (cost=108.02..108.04 rows=7 width=62)
   Output: ft5.*, ft5.c1, ft5.c2, ft5.c3, ft4.c1, ft4.c2
   Sort Key: ft5.c1
   ->  Foreign Scan  (cost=100.00..107.92 rows=7 width=62)
         Output: ft5.*, ft5.c1, ft5.c2, ft5.c3, ft4.c1, ft4.c2
         Relations: (public.ft5) INNER JOIN (public.ft4)
         Remote SQL: SELECT CASE WHEN (r1.*)::text IS NOT NULL THEN ROW(r1.c1, r1.c2, r1.c3) END, r1.c1, r1.c2, r1.c3, r2.c1, r2.c2 FROM ("S 1"."T 4" r1 INNER JOIN "S 1"."T 3" r2 ON (((r1.c1 = r2.c1)) AND ((r2.c1 >= 10)) AND ((r2.c1 <= 30)) AND ((r1.c1 >= 10)) AND ((r1.
c1 <= 30))))

But if I set enable_sort = off,  we can still get the previous plan, which proves that
this patch doesn't make the above path unavailable, it is just not cheaper than
the new one.  Here is the new commit messages:

commit e0a7838a09e73f831eecb23b5e7884cc34d71301
Author: David Rowley <dgrowleyml@gmail.com>
Date:   Tue Feb 1 20:56:40 2022 +0800

    Introudce ec_filters in EquivalenceClass struct, the semantics is the quals can

    be applied to any EquivalenceMember in this EC. Later this information is used
    to generate new RestrictInfo and was distributed to related RelOptInfo very
    soon.

    Author: David Rowley at 2015-12 [1]
    Andy Fan rebase this patch to current latest code.

    https://www.postgresql.org/message-id/CAKJS1f9FK_X_5HKcPcSeimy16Owe3EmPmmGsGWLcKkj_rW9s6A%40mail.gmail.com

commit 73f52d0909374446cd689457f0a4ef52addb035e
Author: Andy Fan <yizhi.fzh@alibaba-inc.com>
Date:   Tue Feb 1 14:54:07 2022 +0800

    After distributing the new derived RestrictInfo into RelOptInfo, then the rows
    estimation is wrong at the joinrel part. The reason is well described at [1] and
    [2],  To fix this issue, I added a new field "EquivalenceClass *derived" in
    RestrictInfo struct to indicate how this qual is generated. we would ignore such
    qual during estimate of the rows size. All the set_xx_size should be taken care of, but
    for now, just set_plain_rel_size is taken care of for the PoC purpose.

    [1]
    https://www.postgresql.org/message-id/flat/CAKJS1f9FK_X_5HKcPcSeimy16Owe3EmPmmGsGWLcKkj_rW9s6A%40mail.gmail.com
    [2]
    https://www.postgresql.org/message-id/flat/1727507.1620948117%40sss.pgh.pa.us#52ac3f46cf614acb0bdbddb7128f5bd2

commit 8439b4818410d860a4ca4be3458b54c04c6f8648
Author: Andy Fan <yizhi.fzh@alibaba-inc.com>
Date:   Tue Feb 1 15:20:10 2022 +0800

    Introduce RelOptInfo.filtered_rows.

    Previously the Path.rows (shown in the explain output) and RelOptInfo.rows
    which would be used to calculating joinrel's estimated rows are same
    at many scan paths, like SeqScan, IndexScan, BitmapHeapScan and so on. But
    they would be different after distributing a new restrictinfo from ec_filter.
    So I developed RelOptInfo.filtered_rows to take some duty out of RelOptInfo.rows.

commit 11b3395bb5bcc4a2bcff6fed8078dbbf3cda81b1
Author: Andy Fan <yizhi.fzh@alibaba-inc.com>
Date:   Tue Feb 1 17:37:27 2022 +0800

    Remove duplicated qual executing for executor.

    Take the SELECT * FROM t1, t2 WHERE t1.a = t2.a and t2.a > 3 for example,
    we can derive t1.a > 3 with EC filter infrastructure.  However if it generate a
    plan like below, the new generated qual does not deserve to execute.

    Nest Loop
       Seq Scan (t1.a > 3)
       Index Scan t2_a
          (a = t1.a) (t2.a > 3)

    This patch removes the "t2.a > 3" for the above case.

commit 2875a76136293589b6e409cb6be4defab87ade59
Author: Andy Fan <yizhi.fzh@alibaba-inc.com>
Date:   Wed Feb 2 11:54:24 2022 +0800

    Support ScalarArrayOpExpr and perudoconstant on ef_filter.

commit a4b21ab6fd0fd57902f5471ec962a77b59085158 (HEAD -> cf_v4)
Author: Andy Fan <yizhi.fzh@alibaba-inc.com>
Date:   Wed Feb 2 11:59:53 2022 +0800

    Added the testcase for this feature and fix the previous test case

    as well. The new added test case needs outputting some runtime
    statistics, which will probably be different at each run. I can think
    of a way to make the test case stable if the patchsets are not wrong
    at the first step.


--
Best Regards
Andy Fan

pgsql-hackers by date:

Previous
From: "David G. Johnston"
Date:
Subject: Re: Design of pg_stat_subscription_workers vs pgstats
Next
From: Kyotaro Horiguchi
Date:
Subject: Re: RFC: Logging plan of the running query