Thread: Condition pushdown: why (=) is pushed down into join, but BETWEEN or >= is not?
Re: Condition pushdown: why (=) is pushed down into join, but BETWEEN or >= is not?
Dmitry Astapov <dastapov@gmail.com> writes:
> I am trying to understand the behaviour of the query planner regarding the
> push-down of the conditions "through" the join.
I think your mental model is wrong. What's actually happening here is
that the planner uses equivalence classes to deduce implied conditions.
That is, we have the join condition a.adate = b.bdate and then you've
added the where condition a.adate = '2021-05-12'. Transitivity implies
that b.bdate = '2021-05-12', so we deduce that condition and are able
to apply it at the relation scan of b. Furthermore, having restricted
both a.adate and b.bdate to the same constant value at the scan level,
we no longer need to apply the join condition a.adate = b.bdate at all.
This is important not only to avoid the (probably minor) inefficiency
of rechecking the join condition, but because if we believed that all
three conditions were independently applicable, we'd come out with a
serious underestimate of the size of the join result.
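To make this concrete, here is a minimal sketch (hypothetical tables; the
plan shape is illustrative, not actual output):

    create table a (adate date);
    create table b (bdate date);

    explain select * from a join b on a.adate = b.bdate
     where a.adate = '2021-05-12';

    -- The planner derives bdate = '2021-05-12' and drops the join qual,
    -- so both relations are scanned with the constant filter, roughly:
    --   Nested Loop
    --     ->  Seq Scan on a   Filter: (adate = '2021-05-12'::date)
    --     ->  Seq Scan on b   Filter: (bdate = '2021-05-12'::date)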
> In my experiments, I was never able to get an execution plan that "pushes
> down" any condition apart from (=) through to the right side of the join,
None of the argument sketched above works for non-equality conditions.
There are some situations where you could probably figure out how to
use transitivity to deduce some implied condition, but cleaning things
up so that you don't have redundant conditions fouling up the join
size estimates seems like a hard problem.

Another issue is that we could easily expend a lot of cycles on
deductions that lead nowhere, because once you try to open up the
mechanism to consider operators other than equality, there will be a
lot of things that it looks at and then fails to do anything with. The
equivalence class mechanism is tied into the same logic that considers
merge and hash joins, so we are expending lots of cycles anytime we
see an equality operator, and not so much for other operators.
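For contrast, the same sketch with an inequality (again the hypothetical
a and b tables from above; illustrative plan shape):

    explain select * from a join b on a.adate = b.bdate
     where a.adate >= '2021-05-12';

    -- No "bdate >= '2021-05-12'" is deduced, so b is scanned in full:
    --   Hash Join
    --     Hash Cond: (b.bdate = a.adate)
    --     ->  Seq Scan on b
    --     ->  Hash
    --           ->  Seq Scan on a   Filter: (adate >= '2021-05-12'::date)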
> Equally surprising is that I was unable to find documentation or past
> mailing list discussions of this or similar topic, which leads me to
> believe that I am just not familiar with the proper terminology and can't
> come up with the right search terms.
src/backend/optimizer/README has a discussion of equivalence classes.

regards, tom lane
Re: Condition pushdown: why (=) is pushed down into join, but BETWEEN or >= is not?
Dmitry Astapov <dastapov@gmail.com> writes:
> Am I right in thinking that elimination of the join condition is
> actually quite an important part of the process?
> Could it possibly be the main reason for =ANY/(x IN (..)) not to be
> optimized the same way?

Yup.

> Is it still hard when one thinks about =ANY or (column in (val1, val2,
> val3, ...)) as well?

Yeah. For instance, if you have WHERE a = b AND a IN (1,2,3) then yes,
you could deduce "b IN (1,2,3)", but this would not give you license
to drop the "a = b" condition. So now you have to figure out what the
selectivity of that is after the application of the partially redundant
IN clauses.

I recall somebody (David Rowley, maybe? Too lazy to check archives.)
working on this idea awhile ago, but he didn't get to the point of
a committable patch.

regards, tom lane
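A sketch of that IN-list case (a hypothetical two-table variant of the
single-letter columns above):

    explain select * from t1 join t2 on t1.a = t2.b
     where t1.a in (1, 2, 3);

    -- Deducing "t2.b IN (1,2,3)" would be sound and could speed up the
    -- scan of t2, but the join qual (t1.a = t2.b) still has to run, so
    -- its selectivity estimate overlaps with the partially redundant
    -- IN lists -- exactly the estimation problem described above.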
Re: Condition pushdown: why (=) is pushed down into join, but BETWEEN or >= is not?
> So now you have to figure out
> what the selectivity of that is after the application of the partially
> redundant IN clauses.

    /*
     * If the clause is marked redundant, always return 1.0.
     */
    if (rinfo->norm_selec > 1)
        return (Selectivity) 1.0;

Would marking the new added RestrictInfo.norm_selec > 1 be OK?
Re: Condition pushdown: why (=) is pushed down into join, but BETWEEN or >= is not?
On Mon, 17 May 2021 at 14:52, Andy Fan <zhihui.fan1213@gmail.com> wrote:
> Would marking the new added RestrictInfo.norm_selec > 1 be OK?
There would be cases you'd want to not count the additional clauses in
the selectivity estimation and there would be cases you would want to.
For example:
SELECT ... FROM t1 INNER JOIN t2 ON t1.dt = t2.dt WHERE t1.dt BETWEEN
'date1' AND 'date2';
If you derived that t2.dt is also BETWEEN 'date1' AND 'date2' then
you'd most likely want to include those quals for scans feeding merge,
hash and non-parameterized nested loop joins, so you'd also want to
count them in your selectivity estimations, else you'd feed junk
values into the join selectivity estimations.
Parameterized nested loop joins might be different: if you were looking
up t1.dt values in some index on t2.dt, then you'd likely not want to
bother also filtering on the between clause values too. They're
redundant in that case.
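A sketch of that parameterized case, assuming the hypothetical t1/t2
tables above gain an index on t2.dt:

    create index t2_dt_idx on t2 (dt);

    -- The inner side of a parameterized nested loop would be probed as
    --   ->  Index Scan using t2_dt_idx on t2
    --         Index Cond: (dt = t1.dt)
    -- and re-checking t2.dt BETWEEN 'date1' AND 'date2' there is
    -- redundant: every t1.dt value driving the loop already passed it.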
I imagined we'd have some functions in equivclass.c that allow you to
choose whether you want the additional filters or not.
In Tom's example, WHERE a = b AND a IN (1,2,3), if a and b were in the
same relation then you'd likely never want to include the additional
quals. The only reason I could think that it would be a good idea is
if "b" had an index but "a" didn't. I've not checked the code, but
the index matching code might already allow that to work anyway.
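A sketch of that same-relation case (hypothetical table, index only
on b):

    create table t (a int, b int);
    create index t_b_idx on t (b);

    -- If "b IN (1,2,3)" can be derived from the quals below, the
    -- planner could use t_b_idx even though a has no index:
    select * from t where a = b and a in (1, 2, 3);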
Re: Condition pushdown: why (=) is pushed down into join, but BETWEEN or >= is not?
On Fri, 14 May 2021 at 11:22, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I recall somebody (David Rowley, maybe? Too lazy to check archives.)
> working on this idea awhile ago, but he didn't get to the point of
> a committable patch.
Yeah. Me. The discussion is in [1].
David
[1] https://www.postgresql.org/message-id/flat/CAKJS1f9FK_X_5HKcPcSeimy16Owe3EmPmmGsGWLcKkj_rW9s6A%40mail.gmail.com
set enable_mergejoin to off;
set enable_seqscan to on;
regression=# explain analyze select * from tenk1 a, tenk1 b where a.thousand = b.thousand and a.thousand < 100;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------
Nested Loop (cost=27.14..1090.67 rows=10740 width=488) (actual time=0.404..15.006 rows=10000 loops=1)
-> Bitmap Heap Scan on tenk1 b (cost=26.84..385.26 rows=10000 width=244) (actual time=0.350..1.419 rows=1000 loops=1)
Recheck Cond: (thousand < 100)
Heap Blocks: exact=324
-> Bitmap Index Scan on tenk1_thous_tenthous (cost=0.00..24.34 rows=1074 width=0) (actual time=0.238..0.240 rows=1000 loops=1)
Index Cond: (thousand < 100)
-> Memoize (cost=0.30..0.47 rows=1 width=244) (actual time=0.002..0.006 rows=10 loops=1000)
Cache Key: b.thousand
Cache Mode: logical
Hits: 900 Misses: 100 Evictions: 0 Overflows: 0 Memory Usage: 277kB
-> Index Scan using tenk1_thous_tenthous on tenk1 a (cost=0.29..0.46 rows=1 width=244) (actual time=0.010..0.032 rows=10 loops=100)
Index Cond: ((thousand = b.thousand) AND (thousand < 100))
Planning Time: 2.459 ms
Execution Time: 15.964 ms
(14 rows)
This is because RelOptInfo->rows is not just used to calculate
joinrel.rows, but is also used to set Path.rows in many places. I can't
think of a better way than adding a new filtered_rows field.
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------
Nested Loop (cost=24.90..459.16 rows=10740 width=488) (actual time=0.440..16.966 rows=10000 loops=1)
-> Bitmap Heap Scan on tenk1 b (cost=24.61..383.03 rows=1074 width=244) (actual time=0.383..1.546 rows=1000 loops=1)
Recheck Cond: (thousand < 100)
Heap Blocks: exact=324
-> Bitmap Index Scan on tenk1_thous_tenthous (cost=0.00..24.34 rows=1074 width=0) (actual time=0.270..0.272 rows=1000 loops=1)
Index Cond: (thousand < 100)
-> Memoize (cost=0.30..0.47 rows=1 width=244) (actual time=0.002..0.008 rows=10 loops=1000)
Cache Key: b.thousand
Cache Mode: logical
Hits: 900 Misses: 100 Evictions: 0 Overflows: 0 Memory Usage: 277kB
-> Index Scan using tenk1_thous_tenthous on tenk1 a (cost=0.29..0.46 rows=1 width=244) (actual time=0.012..0.050 rows=10 loops=100)
Index Cond: ((thousand = b.thousand) AND (thousand < 100))
Planning Time: 2.578 ms
Execution Time: 17.929 ms
(14 rows)
There is something wrong below.
Index Cond: ((thousand = b.thousand) AND (thousand < 100))
it is hard for the planner to distinguish which quals deserve adding and
which do not. Instead, I just removed the execution of such quals during
the create_plan stage, to avoid the obviously duplicated qual executions.
I only handled the case where the derived qual is executed at the same
time as the RestrictInfo whose parent_ec was used to generate the
derived qual. If I understand the RestrictInfo.parent_ec correctly, the cost of
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------
Nested Loop (cost=10000000000.30..10000002799.78 rows=20020 width=488) (actual time=0.051..26.080 rows=20000 loops=1)
-> Seq Scan on tenk1 a (cost=10000000000.00..10000000470.00 rows=1001 width=244) (actual time=0.018..3.902 rows=1000 loops=1)
Filter: (thousand < 100)
Rows Removed by Filter: 9000
-> Memoize (cost=0.30..3.18 rows=20 width=244) (actual time=0.002..0.008 rows=20 loops=1000)
Cache Key: a.thousand
Cache Mode: logical
Hits: 900 Misses: 100 Evictions: 0 Overflows: 0 Memory Usage: 546kB
-> Index Scan using d_tenk2_thousand_idx on d_tenk2 b (cost=0.29..3.17 rows=20 width=244) (actual time=0.008..0.037 rows=20 loops=100)
Index Cond: (thousand = a.thousand)
Planning Time: 0.596 ms
Execution Time: 27.502 ms
(12 rows)
create table p (a int, b int) partition by range(a);
select 'create table p_' || i || ' partition of p for values from (' || (i-1) * 100000 || ') to (' || i * 100000 || ');' from generate_series(1, 50) i
\gexec
insert into p select i, i from generate_series(1, 50 * 100000 -1) i;
create index on p(a);
create table q (a int, b int) partition by range(a);
select 'create table q_' || i || ' partition of q for values from (' || (i-1) * 100000 || ') to (' || i * 100000 || ');' from generate_series(1, 50) i
\gexec
insert into q select * from p;
create index on q(a);
select * from p, q where p.a = q.a and p.a in (3, 200000);
|--------------+-------------------+------------------|
| Prepared | 0.25 ms | 0.8 ms |
| Non Prepared | 0.890 ms | 4.207 ms |
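Presumably the timings above were gathered along these lines (a sketch;
the prepared-statement name q1 is hypothetical):

    prepare q1 (int, int) as
      select * from p, q where p.a = q.a and p.a in ($1, $2);
    \timing on
    execute q1(3, 200000);                                      -- "Prepared"
    select * from p, q where p.a = q.a and p.a in (3, 200000);  -- "Non Prepared"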
Attachment
- v1-0003-introduce-RelOptInfo.filtered_rows.patch
- v1-0001-Rebaee-David-s-patch-against-the-latest-code.patch
- v1-0005-Support-ScalarArrayOpExpr-and-perudoconstant-on-e.patch
- v1-0004-remove-duplicated-qual-executing.patch
- v1-0002-support-set_xxx_size-with-derived_clauses-ignored.patch
- v1-0006-adding-some-test-cases-for-this-feature-and-fix-t.patch
Re: Condition pushdown: why (=) is pushed down into join, but BETWEEN or >= is not?
> Subject: [PATCH v1 1/6] Rebaee David's patch against the latest code.

If you use git-am, then the author/commit information is preserved.
It's probably good to include a link to the patch in any case.

> Subject: [PATCH v1 4/6] remove duplicated qual executing.
> ---
>  src/backend/optimizer/path/equivclass.c | 22 +++++++++++++++++++
>  src/backend/optimizer/plan/createplan.c | 29 +++++++++++++++++++++++--
>  src/include/optimizer/paths.h           |  2 ++
>  src/test/regress/parallel_schedule      |  2 ++
>  4 files changed, 53 insertions(+), 2 deletions(-)

I think the ./ec_filter test is missing from this patch.

> Subject: [PATCH v1 6/6] adding some test cases for this feature and fix the existing case

The tests should be updated with the corresponding patches. It's common
for the patches to be committed separately, like if 0001 is ready but
the others are still evolving.

I'm not sure whether you think this patch is ready to be added to a
commitfest, but do you know about the CI infrastructure? It allows
running all the cfbot tests for a github branch against 4 OSes, which
helps catch portability issues, including memory errors and unstable
explain output. See: src/tools/ci/README.

There's a failure in postgres_fdw; probably the output needs to be updated.

-- Justin
Re: Condition pushdown: why (=) is pushed down into join, but BETWEEN or >= is not?
> > Subject: [PATCH v1 1/6] Rebaee David's patch against the latest code.
>
> If you use git-am, then the author/commit information is preserved.
> It's probably good to include a link to the patch in any case.

(OK, I will add a commit message for this and link to the origin
discussion link.)
> > Subject: [PATCH v1 4/6] remove duplicated qual executing.
> > ---
> >  src/backend/optimizer/path/equivclass.c | 22 +++++++++++++++++++
> >  src/backend/optimizer/plan/createplan.c | 29 +++++++++++++++++++++++--
> >  src/include/optimizer/paths.h           |  2 ++
> >  src/test/regress/parallel_schedule      |  2 ++
> >  4 files changed, 53 insertions(+), 2 deletions(-)
>
> I think the ./ec_filter test is missing from this patch.
> > Subject: [PATCH v1 6/6] adding some test cases for this feature and fix the existing case
>
> The tests should be updated with the corresponding patches. It's common
> for the patches to be committed separately, like if 0001 is ready but
> the others are still evolving.

(These patches are split up mainly for ease of review/discussion; they
are unlikely to be able to be committed separately, so I keep the tests
in a separate patch.)
> I'm not sure whether you think this patch is ready to be added to a
> commitfest, but do you know about the CI infrastructure? It allows
> running all the cfbot tests for a github branch against 4 OSes, which
> helps catch portability issues, including memory errors and unstable
> explain output. See: src/tools/ci/README.
>
> There's a failure in postgres_fdw; probably the output needs to be updated.
Output: ft5.*, ft5.c1, ft5.c2, ft5.c3, ft4.c1, ft4.c2
Relations: (public.ft5) INNER JOIN (public.ft4)
Remote SQL: SELECT CASE WHEN (r1.*)::text IS NOT NULL THEN ROW(r1.c1, r1.c2, r1.c3) END, r1.c1, r1.c2, r1.c3, r2.c1, r2.c2 FROM ("S 1"."T 4" r1 INNER JOIN "S 1"."T 3" r2 ON (((r1.c1 = r2.c1)) AND ((r2.c1 >= 10)) AND ((r2.c1 <= 30)))) ORDER BY r1.c1 ASC NULLS LAST
(4 rows)
Output: ft5.*, ft5.c1, ft5.c2, ft5.c3, ft4.c1, ft4.c2
Sort Key: ft5.c1
-> Foreign Scan (cost=100.00..107.92 rows=7 width=62)
Output: ft5.*, ft5.c1, ft5.c2, ft5.c3, ft4.c1, ft4.c2
Relations: (public.ft5) INNER JOIN (public.ft4)
Remote SQL: SELECT CASE WHEN (r1.*)::text IS NOT NULL THEN ROW(r1.c1, r1.c2, r1.c3) END, r1.c1, r1.c2, r1.c3, r2.c1, r2.c2 FROM ("S 1"."T 4" r1 INNER JOIN "S 1"."T 3" r2 ON (((r1.c1 = r2.c1)) AND ((r2.c1 >= 10)) AND ((r2.c1 <= 30)) AND ((r1.c1 >= 10)) AND ((r1.c1 <= 30))))
Author: David Rowley <dgrowleyml@gmail.com>
Date: Tue Feb 1 20:56:40 2022 +0800
Introduce ec_filters in the EquivalenceClass struct; the semantics are
that the quals can be applied to any EquivalenceMember in this EC. Later
this information is used to generate new RestrictInfos, which are
distributed to the related RelOptInfos soon after.
Author: David Rowley at 2015-12 [1]
Andy Fan rebased this patch onto the current latest code.
https://www.postgresql.org/message-id/CAKJS1f9FK_X_5HKcPcSeimy16Owe3EmPmmGsGWLcKkj_rW9s6A%40mail.gmail.com
commit 73f52d0909374446cd689457f0a4ef52addb035e
Author: Andy Fan <yizhi.fzh@alibaba-inc.com>
Date: Tue Feb 1 14:54:07 2022 +0800
After distributing the new derived RestrictInfo into RelOptInfo, the row
estimation at the joinrel level becomes wrong. The reason is well
described at [1] and [2]. To fix this issue, I added a new field
"EquivalenceClass *derived" in the RestrictInfo struct to indicate how
this qual was generated; we ignore such quals when estimating the row
sizes. All the set_xx_size functions should be taken care of, but for
now only set_plain_rel_size is handled, for the PoC purpose.
[1]
https://www.postgresql.org/message-id/flat/CAKJS1f9FK_X_5HKcPcSeimy16Owe3EmPmmGsGWLcKkj_rW9s6A%40mail.gmail.com
[2]
https://www.postgresql.org/message-id/flat/1727507.1620948117%40sss.pgh.pa.us#52ac3f46cf614acb0bdbddb7128f5bd2
commit 8439b4818410d860a4ca4be3458b54c04c6f8648
Author: Andy Fan <yizhi.fzh@alibaba-inc.com>
Date: Tue Feb 1 15:20:10 2022 +0800
Introduce RelOptInfo.filtered_rows.
Previously Path.rows (shown in the explain output) and RelOptInfo.rows
(which is used to calculate the joinrel's estimated rows) are the same
for many scan paths, like SeqScan, IndexScan, BitmapHeapScan and so on.
But they become different after distributing a new restrictinfo from an
ec_filter. So I developed RelOptInfo.filtered_rows to take some of the
duty off RelOptInfo.rows.
commit 11b3395bb5bcc4a2bcff6fed8078dbbf3cda81b1
Author: Andy Fan <yizhi.fzh@alibaba-inc.com>
Date: Tue Feb 1 17:37:27 2022 +0800
Remove duplicated qual executing for executor.
Take SELECT * FROM t1, t2 WHERE t1.a = t2.a AND t2.a > 3 for example:
we can derive t1.a > 3 with the EC filter infrastructure. However, if
the planner generates a plan like the one below, the redundant qual
does not deserve to be executed:

    Nest Loop
      -> Seq Scan on t1  (t1.a > 3)
      -> Index Scan using t2_a on t2
           (a = t1.a) (t2.a > 3)

This patch removes the "t2.a > 3" for the above case.
commit 2875a76136293589b6e409cb6be4defab87ade59
Author: Andy Fan <yizhi.fzh@alibaba-inc.com>
Date: Wed Feb 2 11:54:24 2022 +0800
Support ScalarArrayOpExpr and pseudoconstant on ec_filter.
commit a4b21ab6fd0fd57902f5471ec962a77b59085158 (HEAD -> cf_v4)
Author: Andy Fan <yizhi.fzh@alibaba-inc.com>
Date: Wed Feb 2 11:59:53 2022 +0800
Added the test case for this feature and fixed the previous test cases
as well. The newly added test case needs to output some runtime
statistics, which will probably be different on each run. I will think
of a way to make the test case stable if the patchsets are not wrong
at the first step.
Re: Condition pushdown: why (=) is pushed down into join, but BETWEEN or >= is not?
Hi,

there's been an interesting case [1] of a slow query on pgsql-general,
related to the topic discussed in this thread. It causes the query to
run slower by multiple orders of magnitude, and I think it's
interesting, so let me present a simple example demonstrating it.

------------------------------------------------------------------------
create table t1 (a int);
create table t2 (a int);

insert into t1 select i from generate_series(1,100000) s(i);
insert into t2 select mod(i,100000) from generate_series(1,10000000) s(i);

create index on t1(a);
create index on t2(a);

vacuum analyze t1, t2;

-- we need to force mergejoin
set enable_nestloop = off;
------------------------------------------------------------------------

Now, let's run a simple query:

SELECT t1.a, t2.a FROM t1 JOIN t2 USING (a)
 WHERE (t1.a > 99000) ORDER BY t1.a LIMIT 100;

                               QUERY PLAN
------------------------------------------------------------------------
 Limit  (cost=4.63..224.57 rows=100 width=8) (actual time=8999.487..8999.707 rows=100 loops=1)
   ->  Merge Join  (cost=4.63..209447.97 rows=95226 width=8) (actual time=8999.485..8999.620 rows=100 loops=1)
         Merge Cond: (t1.a = t2.a)
         ->  Index Only Scan using t1_a_idx on t1  (cost=0.29..29.25 rows=969 width=4) (actual time=0.010..0.011 rows=1 loops=1)
               Index Cond: (a > 99000)
               Heap Fetches: 0
         ->  Index Only Scan using t2_a_idx on t2  (cost=0.43..183464.09 rows=9999977 width=4) (actual time=0.026..4594.757 rows=9900200 loops=1)
               Heap Fetches: 0
 Planning Time: 0.338 ms
 Execution Time: 8999.768 ms
(10 rows)

Now, let's do a simple trick and add a condition on t2.a, "implied" by
the join condition (t1.a = t2.a) and the inequality (t1.a > 99000).

SELECT t1.a, t2.a FROM t1 JOIN t2 USING (a)
 WHERE (t1.a > 99000) AND (t2.a > 99000) ORDER BY t1.a LIMIT 100;

                               QUERY PLAN
------------------------------------------------------------------------
 Limit  (cost=0.77..250.39 rows=100 width=8) (actual time=0.040..0.294 rows=100 loops=1)
   ->  Merge Join  (cost=0.77..2297.23 rows=920 width=8) (actual time=0.039..0.172 rows=100 loops=1)
         Merge Cond: (t1.a = t2.a)
         ->  Index Only Scan using t1_a_idx on t1  (cost=0.29..29.25 rows=969 width=4) (actual time=0.031..0.031 rows=1 loops=1)
               Index Cond: (a > 99000)
               Heap Fetches: 0
         ->  Index Only Scan using t2_a_idx on t2  (cost=0.43..2014.87 rows=96596 width=4) (actual time=0.005..0.052 rows=100 loops=1)
               Index Cond: (a > 99000)
               Heap Fetches: 0
 Planning Time: 0.222 ms
 Execution Time: 0.414 ms
(11 rows)

Well, that's quite a difference! From 9000ms to 1ms, pretty good.

What is happening in the first plan is that the merge join needs t2
sorted by t2.a, and the index-only scan looks like a great way to do
that, as it has low startup cost (because LIMIT likes that). But this
completely misses that (t1.a > 99000) implies (t2.a > 99000) through
the equality in the join condition. So we start scanning t2_a_idx, only
to throw the first 99% of tuples away.

In the original report this is particularly egregious, because the
index only scan looks like this:

  ->  Index Only Scan using data_class_pkey on data_class ta  (cost=0.57..4935483.78 rows=216964862 width=8) (actual time=0.018..35022.908 rows=151321889 loops=1)
        Heap Fetches: 151321889

So yeah, 151M heap fetches, that's bound to be expensive :-/

Adding the condition on t2.a allows just skipping the first chunk of
the index, eliminating the expensive part.

Of course, this breaks the estimates in the faster query, because we
now apply the condition twice - once for the index scan, once as the
join clause. So instead of ~100k rows the join is estimated as ~1000
rows.

I'm also not claiming this is 100% worth it - queries with a suitable
combination of clauses (conditions on the join keys) seem rather
uncommon. But it seems like an interesting example, because it may be
seen either as a missed execution optimization (failing to skip the
initial chunk of rows), or a costing issue due to not accounting for
having to process the rows (which would likely result in picking a
different plan).

regards

[1] https://www.postgresql.org/message-id/CA%2B1Wm9U_sP9237f7OH7O%3D-UTab71DWOO4Qc-vnC78DfsJQBCwQ%40mail.gmail.com

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: Condition pushdown: why (=) is pushed down into join, but BETWEEN or >= is not?
> I'm also not claiming this is 100% worth it - queries with a suitable
> combination of clauses (conditions on the join keys) seem rather
> uncommon.
> apply the condition twice - once for the index scan, once as the join
> clause. So instead of ~100k rows the join is estimated as ~1000 rows.
SET
postgres=# explain analyze SELECT t1.a, t2.a FROM t1 JOIN t2 USING (a)
WHERE (t1.a > 99000) and t2.a > 99000;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------
Merge Join (cost=0.73..2408.37 rows=990 width=8)
(actual time=0.032..21.350 rows=99900 loops=1)
Merge Cond: (t1.a = t2.a)
-> Index Only Scan using t1_a_idx on t1 (cost=0.29..29.64 rows=991 width=4)
(actual time=0.014..0.121 rows=1000 loops=1)
Index Cond: (a > 99000)
Heap Fetches: 0
-> Index Only Scan using t2_a_idx on t2 (cost=0.43..2113.20 rows=101301 width=4)
(actual time=0.013..9.854 rows=99900 loops=1)
Index Cond: (a > 99000)
Heap Fetches: 0
Planning Time: 0.282 ms
Execution Time: 24.823 ms
(10 rows)
postgres=# set geqo to on; -- enable this feature and let the planner derive the qual by itself; the estimation is correct now
SET
postgres=# explain analyze SELECT t1.a, t2.a FROM t1 JOIN t2 USING (a)
WHERE (t1.a > 99000) ;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------
Merge Join (cost=0.73..2408.37 rows=97680 width=8)
(actual time=0.031..21.296 rows=99900 loops=1)
Merge Cond: (t1.a = t2.a)
-> Index Only Scan using t1_a_idx on t1 (cost=0.29..29.64 rows=991 width=4)
(actual time=0.014..0.116 rows=1000 loops=1)
Index Cond: (a > 99000)
Heap Fetches: 0
-> Index Only Scan using t2_a_idx on t2 (cost=0.43..2113.20 rows=101301 width=4)
(actual time=0.012..9.751 rows=99900 loops=1)
Index Cond: (a > 99000)
Heap Fetches: 0
Planning Time: 0.269 ms
Execution Time: 24.749 ms
(10 rows)
Re: Condition pushdown: why (=) is pushed down into join, but BETWEEN or >= is not?
> Of course, this breaks the estimates in the faster query, because we now
> apply the condition twice - once for the index scan, once as the join
> clause. So instead of ~100k rows the join is estimated as ~1000 rows.

I think my patch has addressed this. Here is the example: ...
So I think knowing what is bad about having this feature is the key
point for discussion now.
Attachment
- v2-0005-Support-ScalarArrayOpExpr-and-perudoconstant-on-e.patch
- v2-0003-Introduce-RelOptInfo.filtered_rows.patch
- v2-0002-After-distributing-the-new-derived-RestrictInfo-i.patch
- v2-0004-Remove-duplicated-qual-executing-for-executor.patch
- v2-0001-Introudce-ec_filters-in-EquivalenceClass-struct-t.patch
- v2-0006-Added-the-testcase-for-this-feature-and-fix-the-p.patch
Re: Condition pushdown: why (=) is pushed down into join, but BETWEEN or >= is not?
> So I think knowing what is bad about having this feature is the key
> point for discussion now.
> is because, first of all, it's possible that enforcing the condition
> at a particular table costs more to filter out the rows that we save
> in execution time at higher levels of the plan tree. For example,
> consider A JOIN B ON A.X = B.X WHERE A.X > 1000000. It might be that
> the range of A.X is [0,1000001] but the range of B.X is
> [1000000,2000000]; so enforcing the inequality against A is very
> selective but enforcing it against B filters out basically nothing.
> where it's believed to be most selective, equivalence-class column
> selectivity is thought to be above some threshold ... but I'm not sure
> this is very principled,
>> enforcing the inequality multiple times is definitely bad: for
>> example, if we've got a nested loop where the outer side is a seq scan
>> that enforces the condition and the inner side is an index probe, it
>> is just a waste to retest it on the inner side. We already know that
>> the outer row passes the inequality, so the inner row will necessarily
>> pass also. This doesn't apply to merge or hash joins, and it also
>> doesn't apply to all nested loops: scans that aren't parameterized by
>> the equivalence-class column can still benefit from separate
>> enforcement of the inequality.
>>
> I guess that could be fixed by somehow marking these pushed quals as
> optional and having parameterised scans ignore optional quals.
This has been done by commit 0004.
> optimization. Lots of people have asked for it, and I think it would
> be worth expending some brainpower to try to figure out a way to be
> smarter than we are now, which is, in a nutshell, as dumb as possible.
Re: Condition pushdown: why (=) is pushed down into join, but BETWEEN or >= is not?
> So I think knowing what is bad about having this feature is the key
> point for discussion now.
> While I've only read your description of the patch not the patch itself,
This comment applies to me also.
Is the join selectivity properly calculated in all cases, e.g. in the n:m join case in particular, or in general when you’re not joining to a unique key? (this would be the usual situation here, since it adds a range qual to a join qual)
>> Furthermore, there are some cases involving parameterized paths where
>> enforcing the inequality multiple times is definitely bad
- This has been done by commit 0004.
What remaining cases are there where the qual is evaluated redundantly?
- Does anyone still have interest in this? Or is a better solution really possible?
Or is the current method too bad to rescue?
As you’ve shown, this can potentially be very important, though I don’t think you’ll often see equijoins with an additional range restriction on the join keys. When it happens, though, it could be especially important for joins to partitioned tables with many remote fdw partitions when the join can’t be pushed down to the remote server.
Re: Condition pushdown: why (=) is pushed down into join, but BETWEEN or >= is not?
On Tue, Feb 1, 2022 at 10:08 AM Andy Fan <zhihui.fan1213@gmail.com> wrote:
> To address the row estimation issue, The most straightforward way to fix this is to
> ignore the derived clauses when figuring out the RelOptInfo->rows on base relation.
> To note which clause is derived from this patch, I added a new field "EquivalenceClass *
> derived" in RestrictInfo. and then added a included_derived option in clauselist_selectivity_ext,
> during the set_xxx_rel_size function, we can pass the included_derived=false. This strategy
> should be used in get_parameterized_baserel_size. In all the other cases, include_derived=true
> is used. which are finished in commit 2. (Commit 1 is Daivd's patch, I just rebased it)

That doesn't sound correct to me.

Suppose that we have A.x = B.x and also A.x < 42. We can choose to
enforce A.x < 42 or we can choose to enforce B.x < 42 or we can do
both. In general, any of those could be right: if either one of those
two is highly selective while the other is not very selective at all,
it's going to be fastest to enforce only the more selective qual. But
if both are selective then it may be best to enforce both, so let's
suppose we do that. If we don't adopt the proposal above and just do
nothing, then our row count estimates for both A and B will include
the effect of checking x < 42, and so they will be correct, but the
row count estimate for join(A, B) will include the effect of checking
x < 42 twice, and so it will be too low, which can mess up the plan at
higher levels.

But discounting the effect of B.x < 42 when estimating the size of B
is also incorrect. Now, the row count estimate for join(A, B) will
include the effect of x < 42 only once, which is good. However, the
row count estimate for B will be too high, because it will not include
the effect of B.x < 42. And that means that the cost estimate for
join(A, B) will be wrong. It will be too high, because it's going to
think that it has more rows coming from the B side of the join than
what is actually the case. And that can also mess up the plan at
higher levels.

I think we could get around this problem by having multiple
RelOptInfos (or something similar that is lighter-weight) for each
relation. Today, we'd create a RelOptInfo for A, one for B, and one
for join(A, B), and the paths for the join are created by joining a
path for A to a path for B. Now imagine that we have instead 5
RelOptInfos, for {A}, {A|x<42}, {B}, {B|x<42}, and join(A, B). The
legal paths for that last one can be created by joining {A} to
{B|x<42} or {A|x<42} to {B} or {A|x<42} to {B|x<42}. Each of those 5
RelOptInfos can have its own cardinality estimate, and it seems pretty
straightforward to see how to get both the scan cardinality and the
join cardinality correct. Now I think this is decidedly non-trivial to
implement, and I also hear the voice of Tom Lane saying that's going
to be expensive in both time and memory, and he's not wrong.

On the other hand, I completely agree with David's comments on the
other thread to the effect that holding our breath is not getting us
anywhere. People don't keep asking for this feature because it's a
stupid thing that nobody really wants, and when Tom alleges that it
will rarely pay off, I think he's pretty far off the mark. The only
time we need to consider doing any extra work is when we have
something like the example discussed here, namely A.x = B.x and
A.x < 42. If there is a variable that is part of an equivalence class
and also is used in a scan qual, what are the chances that the implied
inequality is useful? There's no way to estimate that mathematically -
it's all about what you think human beings are typically going to do -
but I'd say it's probably better than 50%. I know that when I was
regularly doing application programming on top of PostgreSQL I was
VERY aware of this limitation of the optimizer and habitually thought
about which table to write the inequality against. That kept me out of
trouble most of the time, but it sure seems like we're punting the
optimizer's job to the end user.

And even then, I still sometimes couldn't stay out of trouble, because
sometimes I knew that the implied inequality really ought to be
enforced against both sides of the join to get a decent plan. In that
case, the only way to get the optimizer to do what I wanted was to
duplicate the qual. But that runs headlong into the exact problem that
we're talking about here: now the join selectivity is going to be
messed up, and then some other part of the plan would get messed up. I
still remember the frustration associated with that scenario more than
10 years later. You can't even fix it by uglifying your query with a
planner hint, because we don't support those either. Which brings me
to another point: it's incoherent to simultaneously argue that we
shouldn't have planner hints but rather focus on improving the
planner, and at the same time refuse to improve the planner because it
would make planning too expensive. I actually think we should do both,
because I neither believe that it's impossible to fix this particular
problem nor that it is possible to create a planner so good that it
always makes the right decisions without any explicit input from a
human being. But the only way you can think this problem is unfixable
and at the same time think we don't need hints is if you think this
problem is fake. It's not.

-- 
Robert Haas
EDB: http://www.enterprisedb.com
Re: Condition pushdown: why (=) is pushed down into join, but BETWEEN or >= is not?
On 2/17/22 21:15, Robert Haas wrote:
> On Tue, Feb 1, 2022 at 10:08 AM Andy Fan <zhihui.fan1213@gmail.com> wrote:
>> To address the row estimation issue, The most straightforward way to fix this is to
>> ignore the derived clauses when figuring out the RelOptInfo->rows on base relation.
>> To note which clause is derived from this patch, I added a new field "EquivalenceClass *
>> derived" in RestrictInfo. and then added a included_derived option in clauselist_selectivity_ext,
>> during the set_xxx_rel_size function, we can pass the included_derived=false. This strategy
>> should be used in get_parameterized_baserel_size. In all the other cases, include_derived=true
>> is used. which are finished in commit 2. (Commit 1 is Daivd's patch, I just rebased it)
>
> That doesn't sound correct to me.
>
> Suppose that we have A.x = B.x and also A.x < 42. We can choose to
> enforce A.x < 42 or we can choose to enforce B.x < 42 or we can do
> both. In general, any of those could be right: if either one of those
> two is highly selective while the other is not very selective at all,
> it's going to be fastest to enforce only the more selective qual. But
> if both are selective then it may be best to enforce both, so let's
> suppose we do that. If we don't adopt the proposal above and just do
> nothing, then our row count estimates for both A and B will include
> the effect of checking x < 42, and so they will be correct, but the
> row count estimate for join(A, B) will include the effect of checking
> x < 42 twice, and so it will be too low, which can mess up the plan at
> higher levels.
>
> But discounting the effect of B.x < 42 when estimating the size of B
> is also incorrect. Now, the row count estimate for join(A, B) will
> include the effect of x < 42 only once, which is good. However, the
> row count estimate for B will be too high, because it will not include
> the effect of B.x < 42. And that means that the cost estimate for
> join(A, B) will be wrong. It will be too high, because it's going to
> think that it has more rows coming from the B side of the join than
> what is actually the case. And that can also mess up the plan at
> higher levels.
>
> I think we could get around this problem by having multiple
> RelOptInfos (or something similar that is lighter-weight) for each
> relation. Today, we'd create a RelOptInfo for A, one for B, and one
> for join(A, B), and the paths for the join are created by joining a
> path for A to a path for B. Now imagine that we have instead 5
> RelOptInfos, for {A}, {A|x<42}, {B}, {B|x<42}, and join(A, B). The
> legal paths for that last one can be created by joining {A} to
> {B|x<42} or {A|x<42} to {B} or {A|x<42} to {B|x<42}. Each of those 5
> RelOptInfos can have its own cardinality estimate, and it seems pretty
> straightforward to see how to get both the scan cardinality and the
> join cardinality correct. Now I think this is decidedly non-trivial to
> implement, and I also hear the voice of Tom Lane saying that's going
> to be expensive in both time and memory, and he's not wrong.

IMHO the whole problem is we're unable to estimate the join clause as a
conditional probability, i.e.

  P(A.x = B.x | (A.x < 42) & (B.x < 42))

so maybe instead of trying to generate additional RelOptInfo items we
should think about improving that. The extra RelOptInfos don't really
solve this, because even if you decide to join A|x<42 to B|x<42 it does
nothing to improve the join clause estimate.
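To spell out the double-counting in that notation (a sketch, using the
running A.x = B.x example): with the independence assumption the
planner effectively computes

  rows(join(A,B)) ~= |A| * |B| * P(A.x = B.x) * P(A.x < 42) * P(B.x < 42)

but after joining on x, (A.x < 42) and (B.x < 42) are the same event,
so one of the last two factors is redundant and the estimate comes out
too low by roughly that factor - the ~1000 vs ~100k discrepancy shown
upthread.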
With equality clauses we don't have this issue, because if you derive
clauses at the baserel level, the join clause becomes a no-op with
selectivity 1.0. But for inequalities that does not work ...

Interestingly enough, the patch [1] tries to do something like this by
applying extended statistics to joins, and using baserestrictinfos as
"conditions" for statistics on both sides. It actually deals with a
more general form of this case, because the clauses don't need to
reference the same attribute - so for example this would work too,
assuming there is an extended stats object on the columns on each side:

  P(A.c = B.d | (A.e < 42) & (B.f < 42))

[1] https://commitfest.postgresql.org/36/3055/

> On the other hand, I completely agree with David's comments on the
> other thread to the effect that holding our breath is not getting us
> anywhere. People don't keep asking for this feature because it's a
> stupid thing that nobody really wants, and when Tom alleges that it
> will rarely pay off, I think he's pretty far off the mark. The only
> time we need to consider doing any extra work is when we have
> something like the example discussed here, namely A.x = B.x and A.x <
> 42. If there is a variable that is part of an equivalence class and
> also is used in a scan qual, what are the chances that the implied
> inequality is useful? There's no way to estimate that mathematically -
> it's all about what you think human beings are typically going to do -
> but I'd say it's probably better than 50%. I know that when I was
> regularly doing application programming on top of PostgreSQL I was
> VERY aware of this limitation of the optimizer and habitually thought
> about which table to write the inequality against. That kept me out of
> trouble most of the time, but it sure seems like we're punting the
> optimizer's job to the end user.

Not sure. In my experience queries with both a join clause and other
clauses referencing the same attribute are pretty rare. But I agree if
we can do the expensive stuff only when actually needed, with no cost
in the 99.999% other cases, I don't see why not. Of course, code
complexity is a cost too.

> And even then, I still sometimes couldn't stay out of trouble, because
> sometimes I knew that the implied inequality really ought to be
> enforced against both sides of the join to get a decent plan. In that
> case, the only way to get the optimizer to do what I wanted was to
> duplicate the qual. But that runs headlong into the exact problem that
> we're talking about here: now the join selectivity is going to be
> messed up, and then some other part of the plan would get messed up. I
> still remember the frustration associated with that scenario more than
> 10 years later. You can't even fix it by uglifying your query with a
> planner hint, because we don't support those either. Which brings me
> to another point: it's incoherent to simultaneously argue that we
> shouldn't have planner hints but rather focus on improving the
> planner, and at the same time refuse to improve the planner because it
> would make planning too expensive. I actually think we should do both,
> because I neither believe that it's impossible to fix this particular
> problem nor that it is possible to create a planner so good that it
> always makes the right decisions without any explicit input from a
> human being. But the only way you can think this problem is unfixable
> and at the same time think we don't need hints is if you think this
> problem is fake.

IMHO to deal with the estimates it'd be enough to allow calculating
conditional probabilities.

No comment regarding hints ...

regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: Condition pushdown: why (=) is pushed down into join, but BETWEEN or >= is not?
On Thu, Feb 17, 2022 at 4:17 PM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:
> IMHO the whole problem is we're unable to estimate the join clause as a
> conditional probability, i.e.
>
> P(A.x = B.x | (A.x < 42) & (B.x < 42))
>
> so maybe instead of trying to generate additional RelOptInfo items we
> should think about improving that. The extra RelOptInfos don't really
> solve this, because even if you decide to join A|x<42 to B|x<42 it does
> nothing to improve the join clause estimate.

I guess I hadn't considered that angle. I think the extra RelOptInfos
(or whatever) actually do solve a problem, because enforcing a
high-selectivity join qual against both sides is potentially quite
wasteful, and you need some way to decide whether to do it on one
side, the other, or both. But it's also true that I was wrong to
assume independence ... and if we could avoid assuming that, then the
join selectivity would work itself out without any of the machinery
that I just proposed.

> It actually deals with a more general form of this case, because the
> clauses don't need to reference the same attribute - so for example this
> would work too, assuming there is extended stats object on the columns
> on each side:
>
> P(A.c = B.d | (A.e < 42) & (B.f < 42))

That'd be cool.

> Not sure. In my experience queries with both a join clause and other
> clauses referencing the same attribute are pretty rare. But I agree if
> we can do the expensive stuff only when actually needed, with no cost in
> the 99.999% other cases, I don't see why not. Of course, code complexity
> is a cost too.

Right. I mean, we could have a planner GUC to control whether the
optimization is used even in cases where we see that it's possible.
But Tom keeps arguing that it is possible in many queries and would
benefit few queries, and I'm not seeing why that should be so. I think
it's likely to benefit many of the queries to which it applies.

-- 
Robert Haas
EDB: http://www.enterprisedb.com
Re: Condition pushdown: why (=) is pushed down into join, but BETWEEN or >= is not?
> On Tue, Feb 1, 2022 at 10:08 AM Andy Fan <zhihui.fan1213@gmail.com> wrote:
>> To address the row estimation issue, The most straightforward way to fix this is to
>> ignore the derived clauses when figuring out the RelOptInfo->rows on base relation.
>> To note which clause is derived from this patch, I added a new field "EquivalenceClass *
>> derived" in RestrictInfo. and then added a included_derived option in clauselist_selectivity_ext,
>> during the set_xxx_rel_size function, we can pass the included_derived=false. This strategy
>> should be used in get_parameterized_baserel_size. In all the other cases, include_derived=true
>> is used. which are finished in commit 2. (Commit 1 is Daivd's patch, I just rebased it)
>
> That doesn't sound correct to me.
>
> Suppose that we have A.x = B.x and also A.x < 42. We can choose to
> enforce A.x < 42 or we can choose to enforce B.x < 42 or we can do
> both. In general, any of those could be right:
>
> .., which is good. However, the
> row count estimate for B will be too high, because it will not include
> the effect of B.x < 42. And that means that the cost estimate for
> join(A, B) will be wrong. It will be too high, because it's going to
> think that it has more rows coming from the B side of the join than
> what is actually the case. And that can also mess up the plan at
> higher levels.
In commit 0003, the real rows for the scan path are adjusted by
RelOptInfo->filtered_rows, which takes the effect of B.x < 42 into
account. It is used later in the cost_xxx_path functions.
Here is an example:
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------
Nested Loop (cost=24.90..459.16 rows=10740 width=488) (actual time=0.416..17.459 rows=10000 loops=1)
-> Bitmap Heap Scan on tenk1 b (cost=24.61..383.03 rows=1074 width=244) (actual time=0.369..1.801 rows=1000 loops=1)
Recheck Cond: (thousand < 100)
Heap Blocks: exact=324
-> Bitmap Index Scan on tenk1_thous_tenthous (cost=0.00..24.34 rows=1074 width=0) (actual time=0.251..0.251 rows=1000 loops=1)
Index Cond: (thousand < 100)
-> Memoize (cost=0.30..0.47 rows=1 width=244) (actual time=0.002..0.006 rows=10 loops=1000)
Cache Key: b.thousand
Cache Mode: logical
Hits: 900 Misses: 100 Evictions: 0 Overflows: 0 Memory Usage: 277kB
-> Index Scan using tenk1_thous_tenthous on tenk1 a (cost=0.29..0.46 rows=1 width=244) (actual time=0.012..0.033 rows=10 loops=100)
Index Cond: ((thousand = b.thousand) AND (thousand < 100))
Planning Time: 0.934 ms
Execution Time: 18.496 ms
(14 rows)
Recheck Cond: (thousand < 100)
Heap Blocks: exact=324
-> Bitmap Index Scan on tenk1_thous_tenthous (cost=0.00..24.34 rows=1074 width=0) (actual time=0.251..0.251 rows=1000 loops=1)
Index Cond: (thousand < 100)
Introduce RelOptInfo.filtered_rows.
Previously Path.rows (shown in the explain output) and RelOptInfo.rows
(which is used to calculate the joinrel's estimated rows) are the same
for many scan paths, like SeqScan, IndexScan, BitmapHeapScan and so on.
But they become different after distributing a new restrictinfo from an
ec_filter. So I developed RelOptInfo.filtered_rows to take some of the
duty off RelOptInfo.rows.
> On the other hand, I completely agree with David's comments on the
> other thread to the effect that holding our breath is not getting us
> anywhere.
I agree that PostgreSQL's code quality is among the best in the world,
mostly because the authority figures in this community have set a good
example for the people who follow, and that is not easy. But if the
authority figures are too strict about code quality, that is not good
for the community either; we should encourage more people to have a
try, to some extent.
Re: Condition pushdown: why (=) is pushed down into join, but BETWEEN or >= is not?
On 2/17/22 23:16, Robert Haas wrote:
> On Thu, Feb 17, 2022 at 4:17 PM Tomas Vondra
> <tomas.vondra@enterprisedb.com> wrote:
>> IMHO the whole problem is we're unable to estimate the join clause as a
>> conditional probability, i.e.
>>
>> P(A.x = B.x | (A.x < 42) & (B.x < 42))
>>
>> so maybe instead of trying to generate additional RelOptInfo items we
>> should think about improving that. The extra RelOptInfos don't really
>> solve this, because even if you decide to join A|x<42 to B|x<42 it does
>> nothing to improve the join clause estimate.
>
> I guess I hadn't considered that angle. I think the extra RelOptInfos
> (or whatever) actually do solve a problem, because enforcing a
> high-selectivity join qual against both sides is potentially quite
> wasteful, and you need some way to decide whether to do it on one
> side, the other, or both. But it's also true that I was wrong to
> assume independence ... and if we could avoid assuming that, then the
> join selectivity would work itself out without any of the machinery
> that I just proposed.

True. We kinda already have this issue for the equality clauses, and
having paths with the condition pushed down (or not) seems like a
natural approach.

>> It actually deals with a more general form of this case, because the
>> clauses don't need to reference the same attribute - so for example this
>> would work too, assuming there is extended stats object on the columns
>> on each side:
>>
>> P(A.c = B.d | (A.e < 42) & (B.f < 42))
>
> That'd be cool.

Yeah, but the patch implementing this still needs more work.

>> Not sure. In my experience queries with both a join clause and other
>> clauses referencing the same attribute are pretty rare. But I agree if
>> we can do the expensive stuff only when actually needed, with no cost in
>> the 99.999% other cases, I don't see why not. Of course, code complexity
>> is a cost too.
>
> Right. I mean, we could have a planner GUC to control whether the
> optimization is used even in cases where we see that it's possible.
> But Tom keeps arguing that it is possible in many queries and would
> benefit few queries, and I'm not seeing why that should be so. I think
> it's likely to benefit many of the queries to which it applies.

Maybe. Although the example I linked some time ago shows a pretty
dramatic improvement, due to picking merge join + index scan, and not
realizing we'll have to skip a lot of data. But that's just one
anecdotal example.

Anyway, I think the best way to deal with these (perfectly legitimate)
concerns is to show how expensive it is for queries not having such
join/restriction clauses, with the cost being close to 0. And then for
queries with such clauses but not benefiting from the change (a bit
like a worst case).

regards

[1] https://www.postgresql.org/message-id/CA%2B1Wm9U_sP9237f7OH7O%3D-UTab71DWOO4Qc-vnC78DfsJQBCwQ%40mail.gmail.com

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: Condition pushdown: why (=) is pushed down into join, but BETWEEN or >= is not?
On Mon, Feb 21, 2022 at 2:31 AM Andy Fan <zhihui.fan1213@gmail.com> wrote:
> +1. Just to be more precise, are you also confused about why this
> should not be done at all? IIUC, I get 3 reasons from Tom's reply:
> a). Planning cost. b). Estimation error. c). Extra qual execution is bad.

This topic has been discussed a number of times, and Tom has basically
always said that he thinks this would be expensive to plan (which I think
is true) and that we wouldn't get much benefit (which I think is false).

--
Robert Haas
EDB: http://www.enterprisedb.com
Re: Condition pushdown: why (=) is pushed down into join, but BETWEEN or >= is not?
Robert Haas <robertmhaas@gmail.com> writes:
> This topic has been discussed a number of times, and Tom has basically
> always said that he thinks this would be expensive to plan (which I
> think is true) and that we wouldn't get much benefit (which I think is
> false).

I think the trick here, as in so many other places, is to not impose
significant extra planning cost on queries that don't end up getting any
benefit. I'm not in favor of complicating the EquivalenceClass mechanism
for this, because (a) I don't think that such an approach will lead to
success on that metric, and (b) what it definitely will do is make ECs
harder to understand and reason about. If we develop a separate mechanism
that can infer things from inequalities, and it only kicks in when there
are some inequalities, that might work out okay. But because of that, I
don't even like the 0001 patch in this series. I've not looked at the
subsequent ones.

regards, tom lane
Re: Condition pushdown: why (=) is pushed down into join, but BETWEEN or >= is not?
On Tue, Mar 1, 2022 at 5:53 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I think the trick here, as in so many other places, is to not impose
> significant extra planning cost on queries that don't end up getting
> any benefit.

I agree. My question is: why shouldn't every case where we can deduce an
implied inequality be reasonably likely to show a benefit? If the query
specifies that a.x = b.x and also that a.x < 42, the only reason to
suppose that evaluating a.x < 42 rather than b.x < 42, or in addition to
b.x < 42, is likely to be better is if we assume the user knows how the
query optimizer works and has employed that knowledge in crafting the
query. And admittedly, sophisticated users are probably likely to do
that, and even unsophisticated users may do it more often than chance
would dictate. But it still feels like we have a good chance of coming
out ahead pretty often unless the user really knows what they are doing.
And even then, any mechanism we add here can have an off switch.

> I'm not in favor of complicating the EquivalenceClass
> mechanism for this, because (a) I don't think that such an approach
> will lead to success on that metric, and (b) what it definitely will do
> is make ECs harder to understand and reason about. If we develop a
> separate mechanism that can infer things from inequalities, and it only
> kicks in when there are some inequalities, that might work out okay.
> But because of that, I don't even like the 0001 patch in this series.
> I've not looked at the subsequent ones.

I don't think 0001 is right either, although maybe for somewhat different
reasons. First, I think it only considers VAR OP CONST style clauses, but
that is leaving money on the table, because given a.x = b.x AND
mumble(a.x), we can decide to instead test mumble(b.x) if the equality
operator in question has is-binary-identical semantics. It does not seem
necessary for a first patch to deal with both that and the somewhat more
pleasing case where we're making deductions based on operator families
... but we shouldn't commit to a design for the VAR OP CONST case without
understanding how it could be generalized. Second, it looks to me like
the patch takes the rather naive strategy of enforcing the derived
clauses everywhere that they can legally be put, which seems certain not
to be optimal.

I don't know whether attaching something to the equivalence class data
structure is the right idea or not. Presumably, we don't want to make an
extra pass over the query tree to gather the information needed for this
kind of optimization, and it feels like we need to know which vars are
EMs before we try to derive alternate/additional quals. So I guess we'd
want to study clauses for possible use by this kind of mechanism after
we've derived ECs but before we do any costing stuff, yet without
introducing a whole new pass. Once we do derive that information, where
are we going to put it? We have to be able to tell efficiently when
looking at a baserel whether there are any implied inequalities that we
should be thinking about ... and there's nothing obvious tying all of the
relevant places together other than the EM.

But I'm kind of blathering here: I feel like there are a lot of
complexities I haven't thought hard enough about to have an intelligent
opinion.

--
Robert Haas
EDB: http://www.enterprisedb.com
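[One way to picture the mumble(a.x) point above, assuming integer columns
(so equality really is binary-identical) and made-up table names:]

  -- If a.x = b.x holds and int4 equality is exact, then any expression
  -- over a.x, e.g. (a.x % 10) = 3, must give the same answer over b.x:
  SELECT * FROM a JOIN b ON a.x = b.x WHERE (a.x % 10) = 3;
  -- ...returns, row for row, the same result as:
  SELECT * FROM a JOIN b ON a.x = b.x WHERE (b.x % 10) = 3;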
Re: Condition pushdown: why (=) is pushed down into join, but BETWEEN or >= is not?
Robert Haas <robertmhaas@gmail.com> writes:
> I agree. My question is: why shouldn't every case where we can deduce
> an implied inequality be reasonably likely to show a benefit?

Maybe it will be, if we can deal with the issue you already mentioned
about not misestimating the resulting partially-redundant conditions.

> Second, it looks to me like the patch takes the rather naive strategy
> of enforcing the derived clauses everywhere that they can legally be
> put, which seems certain not to be optimal.

I'm not sure about that ... it's basically what we do with derived
equalities. However, there's enough structure in the equivalence-class
case that we don't end up enforcing redundant quals. It's not clear to me
whether the same can be said here.

> I don't know whether attaching something to the equivalence class data
> structure is the right idea or not. Presumably, we don't want to make
> an extra pass over the query tree to gather the information needed for
> this kind of optimization, and it feels like we need to know which
> vars are EMs before we try to derive alternate/additional quals.

Yeah, we don't want to make an additional pass over the tree, and we also
would rather not add an additional set of per-operator catalog lookups.
We might be able to generalize the code that looks for equality operators
so that it looks for "any btree operator" with the same number of
lookups, and then have it feed the results down either the
EquivalenceClass path or the inequality path as appropriate. At the end,
after we've formed all the ECs, we could have a go at matching up the
inequality structures with the ECs.

But I don't agree that ECs are a necessary prerequisite. Here are a
couple of other patterns that might be worth looking for:

* "a > b AND b > c" allows deducing "a > c", whether or not any of those
values appears in an EC.

* "a > const1 AND a > const2" can be simplified to either "a > const1" or
"a > const2" depending on which constant is larger. (The predicate proof
mechanism already has a form of this, but we don't typically apply it in
a way that would result in dropping the redundant qual.)

It's entirely possible that one or both of these patterns is not worth
looking for. But I would say that it's equally unproven that deriving
"a > c" from "a = b AND b > c" is worth the cycles. I'll grant that it's
most likely going to be a win if we can use any of these patterns to
generate a restriction clause from what had been join clauses. Beyond
that it's much less clear.

regards, tom lane
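[Both patterns, stated as throwaway queries against a hypothetical table
t(a, b, c int):]

  -- Pattern 1: transitivity among plain columns, no EC involved;
  -- a > b AND b > c implies a > c:
  SELECT * FROM t WHERE a > b AND b > c;
  -- Pattern 2: redundant constant comparisons collapse to the tighter
  -- one; a > 10 AND a > 20 is just a > 20:
  SELECT * FROM t WHERE a > 10 AND a > 20;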
Re: Condition pushdown: why (=) is pushed down into join, but BETWEEN or >= is not?
On Tue, Mar 1, 2022 at 9:05 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Maybe it will be, if we can deal with the issue you already mentioned
> about not misestimating the resulting partially-redundant conditions.

OK.

> I'm not sure about that ... it's basically what we do with derived
> equalities. However, there's enough structure in the equivalence-class
> case that we don't end up enforcing redundant quals. It's not clear
> to me whether the same can be said here.

I mean, to go back to the example of a.x < 42 and a.x = b.x, there are
three possible choices as to where to enforce the qual (a, b, both).
That's a meaningful choice, independent of any estimation issue. I think
it is reasonably common to have cases where a.x < 42 is very selective
and b.x < 42 hardly filters out anything at all, or the other way around.
Certainly, that kind of situation came up a lot in PostgreSQL-based
applications that I wrote myself back in the day. If we're just talking
about btree operators, *maybe* we can say it's cheap enough that we don't
care, but color me a tad skeptical.

> Yeah, we don't want to make an additional pass over the tree, and
> we also would rather not add an additional set of per-operator
> catalog lookups. We might be able to generalize the code that looks
> for equality operators so that it looks for "any btree operator"
> with the same number of lookups, and then have it feed the results
> down either the EquivalenceClass path or the inequality path
> as appropriate. At the end, after we've formed all the ECs, we
> could have a go at matching up the inequality structures with the
> ECs.

Interesting idea.

> But I don't agree that ECs are a necessary prerequisite.
> Here are a couple of other patterns that might be worth looking for:
>
> * "a > b AND b > c" allows deducing "a > c", whether or not any
> of those values appears in an EC.
>
> * "a > const1 AND a > const2" can be simplified to either "a > const1"
> or "a > const2" depending on which constant is larger. (The predicate
> proof mechanism already has a form of this, but we don't typically
> apply it in a way that would result in dropping the redundant qual.)
>
> It's entirely possible that one or both of these patterns is not
> worth looking for. But I would say that it's equally unproven
> that deriving "a > c" from "a = b AND b > c" is worth the cycles.
> I'll grant that it's most likely going to be a win if we can use
> any of these patterns to generate a restriction clause from what
> had been join clauses. Beyond that it's much less clear.

Pretty much all of the cases that I've run across involve an equijoin
plus an inequality, so if somebody asked me which problem we ought to put
most effort into solving, I'd say that one. Cases like "a>1 and a>2" or a
same-table case like "a=b and b>3" haven't been as common in my
experience, and haven't caused as much trouble when they do happen. Part
of that is because if you have something like "a>1 and a>2" in your
query, it may be easy for you to just tweak the query generation to avoid
it, and if "a=b and b>3" is coming up a lot, you may choose to adjust
your data model (e.g. choose to store NULL in b to indicate same-as-a),
whereas if you have something like
"orders.orderno=order_lines.orderno and order_lines.orderno<10000", what
are you going to do to avoid that exactly? If you normalize your order
data and then want to find the old orders, this problem arises
ineluctably.

But having said that, I'm not *against* doing something about those cases
if it's cheap or falls out naturally. If we could detect for free that
the user had written a>1 and a>2, it'd certainly be beneficial to drop
the former qual and keep only the latter. If the user writes a>b and b>c
and all those columns are in one table, I don't see how it helps to
derive a>c, because we're still going to need to check the other two
quals anyway, so we've just created more work. But if those columns are
not all in the same table, then I'd say chances are really pretty good.
Like, suppose it's x.a>y.b and y.b>x.c. Well, like I say, I don't really
see people writing queries like that myself, but if they do, it seems
pretty obvious that deriving x.a>x.c has the potential to save a LOT of
trouble. If it's x.a>y.b and y.b>z.c I don't feel quite so optimistic,
but it may be that we would like to do the x-z join first, and if we do,
enforcing x.a>z.c at that level to shrink the join product seems like a
very strong idea. It is a slight loss if we run that qual on lots of rows
and it never fails, but it is a gigantic win if it filters out a bunch of
stuff. I bet a lot of users would be VERY happy to pay the cost of
testing x.a>z.c at the x-z join level even on queries where the
statistics suggest that it will be entirely useless, because it won't
cost that much to check it, and if by some chance the statistics are
misleading, it might prevent a really bad outcome where the query runs
for a super-long time and they get paged.

So the questions in my mind here are all about whether we can detect this
stuff cheaply and whether anybody wants to do the work to make it happen,
not whether we'd get a benefit in the cases where it kicks in.

--
Robert Haas
EDB: http://www.enterprisedb.com
Re: Condition pushdown: why (=) is pushed down into join, but BETWEEN or >= is not?
Robert Haas <robertmhaas@gmail.com> writes:
> So the questions in my mind here are all about whether we can detect
> this stuff cheaply and whether anybody wants to do the work to make it
> happen, not whether we'd get a benefit in the cases where it kicks in.

Right, my worries are mostly about the first point.

regards, tom lane
Re: Condition pushdown: why (=) is pushed down into join, but BETWEEN or >= is not?
On Wed, Mar 2, 2022 at 11:09 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Right, my worries are mostly about the first point.

OK, cool.

--
Robert Haas
EDB: http://www.enterprisedb.com
Re: Condition pushdown: why (=) is pushed down into join, but BETWEEN or >= is not?
Patch 1: A new field named btreeineqopfamilies is added in RestrictInfo,
and it is set with the same round of syscache searches as
check_mergejoinable. Because of this, check_mergejoinable is renamed to
check_btreeable. The bad part of this is that it only works for an
opclause so far.
Patch 2: An ec_filters list is introduced in the EquivalenceClass struct;
a qual in this list can be applied to any EquivalenceMember in this EC.
Later this information is used to generate new RestrictInfos, which are
then distributed to the related RelOptInfos. There are 3 major steps here
(see the sketch after this list):
a). distribute_qual_to_rels gathers the inequality qual list.
b). After deconstruct_jointree, distribute_filter_quals_to_eclass
distributes these inequality quals to the related EC's ec_filters.
c). generate_base_implied_equalities_no_const scans the ec_filters and
distributes the RestrictInfos to the related RelOptInfos.
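[A rough sketch of that flow, under the naming used above;
distribute_filter_quals_to_eclass and ec_filters are the patch's
additions, the rest are existing planner functions:]

  -- a) while quals are distributed:
  --      distribute_qual_to_rels() collects the inequality quals
  -- b) once deconstruct_jointree() has finished:
  --      distribute_filter_quals_to_eclass() attaches the collected
  --      quals to the matching EC's ec_filters
  -- c) finally:
  --      generate_base_implied_equalities_no_const() scans ec_filters
  --      and hands derived RestrictInfos to each related RelOptInfo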
Patch 3: Reduce some planning cost for the derive-qual-for-EC-filter
feature:
1. Check whether the qual is simple enough by checking rinfo->left_relids
and rinfo->right_relids, saving the pull_varnos calls on rinfo->clause.
2. Check contain_volatile_functions against the RestrictInfo, so that the
result can be shared with following calls.
3. Employ RestrictInfo->btreeineqopfamilies, which is calculated in the
same round as RestrictInfo->mergeopfamilies. In this way we save some
syscache calls.
4. Calculate the opfamily and amstrategy in
distribute_filter_quals_to_eclass and cache the results in
EquivalenceFilter; if no suitable opfamily and amstrategy are found,
bypass the qual immediately, and at the end use the cached values in
generate_base_implied_equalities_no_const.
After this change, there is a test case changed unexpectedly in
equivclass.out (compared with the Patch-2 expectation file):

create user regress_user_ectest;
grant select on ec0 to regress_user_ectest;
grant select on ec1 to regress_user_ectest;
set session authorization regress_user_ectest;
-- with RLS active, the non-leakproof a.ff = 43 clause is not treated
-- as a suitable source for an EquivalenceClass; currently, this is true
-- even though the RLS clause has nothing to do directly with the EC
explain (costs off)
  select * from ec0 a, ec1 b
  where a.ff = b.ff and a.ff = 43::bigint::int8alias1;

The b.ff = 43 qual has disappeared from the scan of ec1 b. But since it
was not shown there even before the EC filter change, I'm not sure
whether my change makes something wrong here, or maybe fixes an issue by
accident?
create table ec_t1000 (a int);
insert into ec_t1000 select i from generate_series(1, 1000) i;
create table ec_t110 (a int);
insert into ec_t110 select i from generate_series(1, 110) i;
create table ec_t200 (a int);
insert into ec_t200 select i from generate_series(1, 200) i;
analyze ec_t1000, ec_t110, ec_t200;

query 1: explain select * from ec_t1000, ec_t110 where ec_t1000.a = ec_t110.a and ec_t1000.a > 100; -- (0.9)
query 2: explain select * from ec_t1000, ec_t110 where ec_t1000.a = ec_t110.a and ec_t110.a > 100; -- (0.1)
query 3: explain select * from ec_t1000, ec_t110, ec_t200 where ec_t1000.a = ec_t110.a and ec_t110.a = ec_t200.a and ec_t1000.a > 100;
query 4: explain select * from ec_t1000, ec_t110, ec_t200 where ec_t1000.a = ec_t110.a and ec_t110.a = ec_t200.a and ec_t110.a > 100;
query 5: explain select * from ec_t1000, ec_t110, ec_t200 where ec_t1000.a = ec_t110.a and ec_t110.a = ec_t200.a and ec_t200.a > 100;

| Query Id | Real rows | Est. rows at master | Est. rows with patched |
|----------+-----------+---------------------+------------------------|
| 1        | 10        | 99                  | 10                     |
| 2        | 10        | 10                  | 10                     |
| 3        | 10        | 20                  | 11                     |
| 4        | 10        | 2                   | 11                     |
| 5        | 10        | 11                  | 11                     |
Patches 4 and 5 introduce the CorrectiveQual design; besides what happens
around generate_base_implied_equalities_no_const, no other things are
changed. PlannerInfo.corrective_quals is added to manage all the
CorrectiveQuals at the subquery level. RelOptInfo.cqual_indexes is a
List * indicating which CorrectiveQuals this relation is related to; this
is designed to make it easy to check whether both sides of a joinrel are
correlated with the same CorrectiveQual.
The overall design of handling the joinrel size estimation is:
a). At the base relation level, we just count everything with the
corrective quals.
b). During any level of joinrel size estimation, we keep only one side's
cqual (short for corrective qual) selectivity, by eliminating the other
one. So the size estimation for a mergeable join selectivity becomes:

  rows = R1.rows * R2.rows
         * (1 / max(ndistinct_of_colA, ndistinct_of_colB))
         * (1 / Selectivity(R1's CorrectiveQual))

Multiplying R1.rows by 1 / Selectivity(R1's CorrectiveQual) eliminates
the impact of the CorrectiveQual on R1. After this, the joinrel of
(R1, R2) is still impacted by this CorrectiveQual, but "just once" at
this level. Later, if JoinRel(R1, R2) needs to join with R3, and R3 is
impacted by this CorrectiveQual as well, we need to keep one and
eliminate the other one, as above, again.
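[A worked instance of that formula, using query 1 from the earlier test;
900 and 10 are the post-filter row counts of ec_t1000 and ec_t110, and
0.9 is the selectivity of a > 100 on ec_t1000:]

  -- Both scans carry the a > 100 cqual; eliminating ec_t1000's
  -- selectivity (0.9) leaves exactly one cqual counted at join level:
  SELECT 900 * 10 * (1.0 / greatest(1000, 110)) * (1.0 / 0.9);
  -- = 10, matching both the real row count and the patched estimate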
The algorithm for deciding which Selectivity should be eliminated and
which one should be kept is: when we join inner_rel and outer_rel with a
mergeable join restrictinfo, and both sides are impacted by the same
CorrectiveQual, we first choose which "side" to eliminate based on which
side of the restrictinfo has the higher distinct value. The reason for
this is, more or less, that we used Max(ndistinctValT1, ndistinctValT2)
above. After deciding which "side" to eliminate, the selectivity actually
eliminated is RelOptInfo->cqual_selectivity[n]; the remaining one still
takes effect.
Introduction of RelOptInfo->cqual_selectivity: the number of elements in
cqual_selectivity equals the length of cqual_indexes. The semantics is:
which Selectivity in the corresponding CorrectiveQual's qual list is
taking effect. At any time, only one qual's Selectivity is counted for
any level of joinrel size estimation.

In reality, it is possible to have many CorrectiveQuals, but for the
design discussion, the current implementation only takes care of a single
CorrectiveQual. This should be helpful for PoC/review/discussion.
Some flow for the key data:
1. root->corrective_quals is initialized at the
generate_base_implied_equalities_no_const stage. We create a
CorrectiveQual in this list for each ec_filter and fill in the
RestrictInfo part for it. At the same time, we note which RelOptInfos
(via cqual_indexes) are related to this cqual.
2. RelOptInfo->cqual_selectivity for a baserel is set at the end of
set_rel_size; at this time, the selectivity for every RestrictInfo has
been calculated, so we can just fetch the cached value. As for a joinrel,
it is maintained in calc_join_cqual_selectivity; this function returns
the Selectivity to eliminate and sets the above value.

Limitations in this PoC:
1. Only 1 CorrectiveQual in root->corrective_quals is supported.
2. Only tested with INNER_JOIN.
3. Inherited tables are not supported.
Attachment
- v3-0003-Reduce-some-planning-cost-for-deriving-qual-for-E.patch
- v3-0004-Prepare-the-code-for-CorrectiveQual-structure.patch
- v3-0001-expand-the-duties-of-check_mergejoinable-to-check.patch
- v3-0002-Introudce-ec_filters-in-EquivalenceClass-struct-t.patch
- v3-0005-CorrectiveQuals-is-as-simple-as-a-List-of-Restric.patch
Re: Condition pushdown: why (=) is pushed down into join, but BETWEEN or >= is not?
I just tested more cases for the estimation issue with this feature, and
we can see that **we get a more accurate/stable estimation than before**.
Here is the test:
create table ec_t110 as select i::int as a from generate_series(1, 110) i;
create table ec_t200 as select i::int as a from generate_series(1, 200) i;
create table ec_t500 as select i::int as a from generate_series(1, 500) i;
create table ec_t800 as select i::int as a from generate_series(1, 800) i;
create table ec_t1000 as select i::int as a from generate_series(1, 1000) i;
analyze;
-- 2 table joins.
explain analyze select * from ec_t1000, ec_t110 where ec_t1000.a = ec_t110.a and ec_t1000.a > 100; -- (0.9)
explain analyze select * from ec_t1000, ec_t110 where ec_t1000.a = ec_t110.a and ec_t110.a > 100; -- (0.1)
-- 3 table joins.
explain analyze select * from ec_t1000, ec_t110 , ec_t200 where ec_t1000.a = ec_t110.a and ec_t110.a = ec_t200.a and ec_t1000.a > 100;
explain analyze select * from ec_t1000, ec_t110 , ec_t200 where ec_t1000.a = ec_t110.a and ec_t110.a = ec_t200.a and ec_t110.a > 100;
explain analyze select * from ec_t1000, ec_t110 , ec_t200 where ec_t1000.a = ec_t110.a and ec_t110.a = ec_t200.a and ec_t200.a > 100;
-- 4 table joins.
explain analyze select * from ec_t1000, ec_t110 , ec_t200, ec_t500 where ec_t1000.a = ec_t110.a and ec_t110.a = ec_t200.a and ec_t500.a = ec_t200.a and ec_t1000.a > 100;
explain analyze select * from ec_t1000, ec_t110 , ec_t200, ec_t500 where ec_t1000.a = ec_t110.a and ec_t110.a = ec_t200.a and ec_t500.a = ec_t200.a and ec_t110.a > 100;
explain analyze select * from ec_t1000, ec_t110 , ec_t200, ec_t500 where ec_t1000.a = ec_t110.a and ec_t110.a = ec_t200.a and ec_t500.a = ec_t200.a and ec_t200.a > 100;
explain analyze select * from ec_t1000, ec_t110 , ec_t200, ec_t500 where ec_t1000.a = ec_t110.a and ec_t110.a = ec_t200.a and ec_t500.a = ec_t200.a and ec_t500.a > 100;
-- 5 table joins.
explain analyze select * from ec_t1000, ec_t110 , ec_t200, ec_t500, ec_t800 where ec_t1000.a = ec_t110.a and ec_t110.a = ec_t200.a and ec_t500.a = ec_t200.a and ec_t500.a = ec_t800.a and ec_t1000.a > 100;
explain analyze select * from ec_t1000, ec_t110 , ec_t200, ec_t500, ec_t800 where ec_t1000.a = ec_t110.a and ec_t110.a = ec_t200.a and ec_t500.a = ec_t200.a and ec_t500.a = ec_t800.a and ec_t110.a > 100;
explain analyze select * from ec_t1000, ec_t110 , ec_t200, ec_t500, ec_t800 where ec_t1000.a = ec_t110.a and ec_t110.a = ec_t200.a and ec_t500.a = ec_t200.a and ec_t500.a = ec_t800.a and ec_t200.a > 100;
explain analyze select * from ec_t1000, ec_t110 , ec_t200, ec_t500, ec_t800 where ec_t1000.a = ec_t110.a and ec_t110.a = ec_t200.a and ec_t500.a = ec_t200.a and ec_t500.a = ec_t800.a and ec_t500.a > 100;
explain analyze select * from ec_t1000, ec_t110 , ec_t200, ec_t500, ec_t800 where ec_t1000.a = ec_t110.a and ec_t110.a = ec_t200.a and ec_t500.a = ec_t200.a and ec_t500.a = ec_t800.a and ec_t800.a > 100;
| Query Id | Real rows | Est. Rows at master | Est. rows with patched | table # |
|----------+-----------+---------------------+------------------------+---------|
| 1 | 10 | 99 | 10 | 2 |
| 2 | 10 | 10 | 10 | 2 |
| 3 | 10 | 20 | 11 | 3 |
| 4 | 10 | 2 | 11 | 3 |
| 5 | 10 | 11 | 11 | 3 |
| 6 | 10 | 10 | 9 | 4 |
| 7 | 10 | 1 | 9 | 4 |
| 8 | 10 | 6 | 9 | 4 |
| 9 | 10 | 9 | 9 | 4 |
| 10 | 10 | 8 | 8 | 5 |
| 11 | 10 | 1 | 8 | 5 |
| 12 | 10 | 5 | 8 | 5 |
| 13 | 10 | 7 | 8 | 5 |
| 14 | 10 | 8 | 8 | 5 |
In the past, we could just use the qual the user provided to do the
estimation. As for now, since we introduce the CorrectiveQuals design, we
still keep only one qual counted, but we can choose the best one in
CorrectiveQuals no matter which one is provided by the user. We gain a
better and more stable estimation because of this.

I'm happy with the overall design, but not that confident about the
method to "choose the best one to keep". So I wrote as many test cases as
I could to find something wrong; so far so good.
I'm also happy with how we keep only one qual in CorrectiveQuals (as
opposed to how we choose the best one). Assume we have just 1 EC filter
in this query, for simplicity. At the beginning, all the baserels have
been impacted by the CorrectiveQual. When joining 2 relations, we roll
back one side and keep the other one. When we join this joinrel with
another rel, we roll back one side and keep the other one, and so forth.
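[Traced through the earlier 3-table test; which side is rolled back
follows the higher-ndistinct rule described above:]

  -- join(ec_t1000, ec_t110): both sides carry the a > 100 cqual;
  --   eliminate ec_t1000's selectivity (0.9), keep ec_t110's (0.1).
  -- join(result, ec_t200): the joinrel and ec_t200 both carry the
  --   cqual; again eliminate one side's selectivity and keep the other,
  --   so exactly one cqual selectivity is counted at every level.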
The patchset applies cleanly on top of commit
9e98583898c347e007958c8a09911be2ea4acfb9.
Re: Condition pushdown: why (=) is pushed down into join, but BETWEEN or >= is not?
Hi,

On 2022-03-08 21:44:37 +0800, Andy Fan wrote:
> I have finished the PoC for planning timing improvement and joinrel rows
> estimation.

This currently crashes on cfbot:
https://api.cirrus-ci.com/v1/task/6158455839916032/logs/cores.log
https://cirrus-ci.com/task/6158455839916032

As this is clearly not 15 material, I've set the target version as 16.
But it might be good to just move the whole entry to the next CF...

Greetings,

Andres Freund
Re: Condition pushdown: why (=) is pushed down into join, but BETWEEN or >= is not?
> Hi,
>
> On 2022-03-08 21:44:37 +0800, Andy Fan wrote:
> > I have finished the PoC for planning timing improvement and joinrel rows
> > estimation.
>
> This currently crashes on cfbot:
> https://api.cirrus-ci.com/v1/task/6158455839916032/logs/cores.log
> https://cirrus-ci.com/task/6158455839916032

The crash is because the code assumed a cached selectivity is available
for a RestrictInfo after set_rel_size; however, this is not true for a
foreign table with use_remote_estimate=true. Since we are in the design
discussion stage, I just disable this feature for foreign tables and can
fix it later. Would this be the right way to handle it for now?

> As this is clearly not 15 material, I've set the target version as 16.
> But it might be good to just move the whole entry to the next CF...
Attachment
- v4-0004-Prepare-the-code-for-CorrectiveQual-structure.patch
- v4-0005-CorrectiveQuals-is-as-simple-as-a-List-of-Restric.patch
- v4-0003-Reduce-some-planning-cost-for-deriving-qual-for-E.patch
- v4-0002-Introudce-ec_filters-in-EquivalenceClass-struct-t.patch
- v4-0001-expand-the-duties-of-check_mergejoinable-to-check.patch
- v4-0006-Disable-ec-filter-for-foregin-table-for-now.patch
Re: Condition pushdown: why (=) is pushed down into join, but BETWEEN or >= is not?
On Thu, Mar 24, 2022 at 3:22 PM Andy Fan <zhihui.fan1213@gmail.com> wrote:
> Here is the latest code. I have rebased the code with the latest master a1bc4d3590b.

FYI this is failing with an unexpected plan in the partition_join test:
https://api.cirrus-ci.com/v1/artifact/task/6090435050340352/log/src/test/regress/regression.diffs
Re: Condition pushdown: why (=) is pushed down into join, but BETWEEN or >= is not?
On 17/5/2022 05:00, Andy Fan wrote:
> Thanks. But I will wait to see if anyone will show interest with this.
> Or else moving alone is not a great experience.

To move forward I've rebased your patchset onto new master, removed
annoying trailing whitespace, and applied two regression test changes
caused by the second patch: the first of the changes is legitimate; the
second looks normal but should be checked for optimality.

As I see it, a consensus should be found on these questions:
1. The case of redundant clauses (x < 100 and x < 1000).
2. Planning time degradation for trivial OLTP queries.

--
regards, Andrey Lepikhov
Postgres Professional
Attachment
- v5-0001-Expand-the-duties-of-check_mergejoinable-to-check-no.patch
- v5-0002-Introudce-ec_filters-in-EquivalenceClass-struct-the-.patch
- v5-0003-Reduce-some-planning-cost-for-deriving-qual-for-EC-f.patch
- v5-0004-Prepare-the-code-for-CorrectiveQual-structure.patch
- v5-0005-CorrectiveQuals-is-as-simple-as-a-List-of-RestrictIn.patch
- v5-0006-Disable-ec-filter-for-foregin-table-for-now.-We-do-n.patch
Re: Condition pushdown: why (=) is pushed down into join, but BETWEEN or >= is not?
On Thu, Jul 7, 2022 at 20:11 Andrey Lepikhov <a.lepikhov@postgrespro.ru> wrote:
> On 17/5/2022 05:00, Andy Fan wrote:
> > Thanks. But I will wait to see if anyone will show interest with this.
> > Or else moving alone is not a great experience.
> To move forward I've rebased your patchset onto new master, removed
> annoying trailing whitespace, and applied two regression test changes
> caused by the second patch: the first of the changes is legitimate; the
> second looks normal but should be checked for optimality.
> As I see it, a consensus should be found on these questions:
> 1. The case of redundant clauses (x < 100 and x < 1000).
> 2. Planning time degradation for trivial OLTP queries.

Hi

cfbot reports the patch no longer applies [1]. As CommitFest 2022-11 is
currently underway, this would be an excellent time to update the patch.

[1] http://cfbot.cputube.org/patch_40_3524.log

Thanks

Ian Barwick