Thread: BUG #18442: Unnecessary Sort operator in indexScan Plan
The following bug has been logged on the website: Bug reference: 18442 Logged by: yajun Hu Email address: hu_yajun@qq.com PostgreSQL version: 14.11 Operating system: CentOS7 with kernel version 5.10 Description: I have reproduced this problem in REL_14_11 and the latest master branch (84db9a0eb10dd1dbee6db509c0e427fa237177dc). The steps to reproduce are as follows. 1. ./configure --enable-debug --enable-depend --enable-cassert CFLAGS=-O0 2. make -j; make install -j; initdb -D ./primary; pg_ctl -D ../primary -l logfile start 3. run SQL: ``` create table t( a int, b int); insert into t select null,i from generate_series(1,100)i; insert into t select i,i from generate_series(1,100000)i; analyze t; create index on t(a,b); postgres=# explain select * from t where a is null order by b; -- need sort QUERY PLAN -------------------------------------------------------------------------------- Sort (cost=9.54..9.80 rows=103 width=8) Sort Key: b -> Index Only Scan using t_a_b_idx on t (cost=0.29..6.10 rows=103 width=8) Index Cond: (a IS NULL) (4 rows) postgres=# explain select * from t where a is null order by a, b; -- no need sort QUERY PLAN -------------------------------------------------------------------------- Index Only Scan using t_a_b_idx on t (cost=0.29..6.10 rows=103 width=8) Index Cond: (a IS NULL) (2 rows) postgres=# explain select * from t where a = 1 order by b; -- no need sort QUERY PLAN ------------------------------------------------------------------------ Index Only Scan using t_a_b_idx on t (cost=0.29..4.31 rows=1 width=8) Index Cond: (a = 1) (2 rows) ``` In my understanding, in the first SELECT, because a is always NULL, the scanned data access by IndexOnlyScan is sorted according to b, which means that the upper Sort operator is unnecessary overhead.The second and third SELECT are both as expected. I tried to analyze the code and found that the EquivalenceClass of column a and NULL was missing, which caused build_index_pathkeys to return NIL. No pathkeys makes the optimizer decide that the upper layer needed Sort to ensure that the data was in order. I roughly know that it may be because NullTest in the check_mergejoinable function is not OpExpr. Is it possible here to generate special EquivalenceClass for column a and NULL to solve this problem? I’m looking forward to someone answering my confusion, thank you very much!
PG Bug reporting form <noreply@postgresql.org> writes: > create index on t(a,b); > postgres=# explain select * from t where a is null order by b; -- need > sort > QUERY PLAN > -------------------------------------------------------------------------------- > Sort (cost=9.54..9.80 rows=103 width=8) > Sort Key: b > -> Index Only Scan using t_a_b_idx on t (cost=0.29..6.10 rows=103 > width=8) > Index Cond: (a IS NULL) > (4 rows) Postgres doesn't detect that it could do this because "a IS NULL" is not an equivalence condition. You're not the first to suggest that it could be treated as one, but I believe there are semantic difficulties that would ensue. One example is that given "a IS NULL" and "a = b", the EquivalenceClass machinery would think it can discard "a = b" and instead emit "b IS NULL", which would not give the same answers. regards, tom lane
On Sat, 20 Apr 2024 at 02:03, Tom Lane <tgl@sss.pgh.pa.us> wrote: > One example is that given > "a IS NULL" and "a = b", the EquivalenceClass machinery would > think it can discard "a = b" and instead emit "b IS NULL", > which would not give the same answers. While this is relatively fresh, for the sake of the archives... Presumably, if a=b is strict then effectively nothing could match as the strict qual ensures NULLs never match and the IS NULL only allows NULLs. Couldn't strict equality conditions be handled using the same method that we use to handle an Eclass with two distinct Consts. e.g a = 1 and a=b and b=2? If the equality condition isn't strict then won't "b" be NULL if "a" is? David
David Rowley <dgrowleyml@gmail.com> writes: > Couldn't strict equality conditions be handled using the same method > that we use to handle an Eclass with two distinct Consts. e.g a = 1 > and a=b and b=2? Perhaps, but it'd be a bit odd and confusing to treat two NULLs as distinct rather than equal. On the whole I'm disinclined to add complexity around this. My gut says there are more semantic gotchas than this one, and I think the use-case is at best pretty hokey. If your data design requires using NULL as though it's a normal data value, you're in a state of sin already, and you're going to find SQL fighting you all the way. (A question closely related to this is whether IS NOT DISTINCT FROM could be optimized more like regular equality. I'm not convinced about the use-case there either, although perhaps it's worth looking into.) regards, tom lane
On Mon, Apr 22, 2024 at 9:53 AM David Rowley <dgrowleyml@gmail.com> wrote:
Presumably, if a=b is strict then effectively nothing could match as
the strict qual ensures NULLs never match and the IS NULL only allows
NULLs.
Yeah, in this case the two restrictions are self-inconsistent and that
makes the relation dummy and need not be scanned. I think the planner
would figure that out in relation_excluded_by_constraints when the GUC
constraint_exclusion is on, but in a different way than ECs.
Thanks
Richard
makes the relation dummy and need not be scanned. I think the planner
would figure that out in relation_excluded_by_constraints when the GUC
constraint_exclusion is on, but in a different way than ECs.
Thanks
Richard
On Mon, 22 Apr 2024 at 14:25, Tom Lane <tgl@sss.pgh.pa.us> wrote: > (A question closely related to this is whether IS NOT DISTINCT FROM > could be optimized more like regular equality. I'm not convinced > about the use-case there either, although perhaps it's worth looking > into.) This came up for me recently when adjusting the UNION planner to allow the use of Merge Append. I'd considered taking that work further to expand it into INTERSECT and EXCEPT to transform those to INNER and ANTI joins. I quickly realised that we'd need to have better optimisation of joins with IS NOT DISTINCT FROM to make this a worthwhile transformation. The join transformation cannot be done with simple equality conditions as that would eliminate NULLs. I was unexcited about only making it work for non-nullable Vars as it seemed like a backwater within a backwater. David