Re: POC, WIP: OR-clause support for indexes - Mailing list pgsql-hackers
From | Alena Rybakina |
---|---|
Subject | Re: POC, WIP: OR-clause support for indexes |
Date | |
Msg-id | 036d64e6-4d47-485c-b047-6664d452b3e8@postgrespro.ru Whole thread Raw |
In response to | Re: POC, WIP: OR-clause support for indexes (Alexander Korotkov <aekorotkov@gmail.com>) |
List | pgsql-hackers |
Thank you for updated version! I agree for your version of the code.
While analyzing the modified query plan from the regression test, I noticed that despite using a full seqscan for table t2 in the original plan,On Fri, Jan 31, 2025 at 4:31 PM Alena Rybakina <a.rybakina@postgrespro.ru> wrote:I started reviewing at the patch and saw some output "ERROR" in the output of the test and is it okay here? SELECT * FROM tenk1 t1 WHERE t1.thousand = 42 OR t1.thousand = (SELECT t2.tenthous FROM tenk1 t2 WHERE t2.thousand = t1.tenthous); ERROR: more than one row returned by a subquery used as an expressionThe output is correct for this query. But the query is very unfortunate for the regression test. I've revised query in the v47 revision [1]. Links. 1. https://www.postgresql.org/message-id/CAPpHfdsBZmNt9qUoJBqsQFiVDX1%3DyCKpuVAt1YnR7JCpP%3Dk8%2BA%40mail.gmail.com
its results are cached by Materialize node, and this can significantly speed up the execution of the NestedLoop algorithm.
For example, after running the query several times, I got results that show that the query execution time was twice as bad.
Original plan:
EXPLAIN ANALYZE SELECT * FROM bitmap_split_or t1, bitmap_split_or t2 WHERE t1.a=t2.b OR t1.a=1; QUERY PLAN -------------------------------------------------------------------------------------------------------------------------------- Nested Loop (cost=0.00..70067.00 rows=2502499 width=24) (actual time=0.015..1123.247 rows=2003000 loops=1) Join Filter: ((t1.a = t2.b) OR (t1.a = 1)) Rows Removed by Join Filter: 1997000 Buffers: shared hit=22 -> Seq Scan on bitmap_split_or t1 (cost=0.00..31.00 rows=2000 width=12) (actual time=0.006..0.372 rows=2000 loops=1) Buffers: shared hit=11 -> Materialize (cost=0.00..41.00 rows=2000 width=12) (actual time=0.000..0.111 rows=2000 loops=2000) Storage: Memory Maximum Storage: 110kB Buffers: shared hit=11 -> Seq Scan on bitmap_split_or t2 (cost=0.00..31.00 rows=2000 width=12) (actual time=0.003..0.188 rows=2000 loops=1) Buffers: shared hit=11 Planning Time: 0.118 ms Execution Time: 1204.874 ms (13 rows)
Query plan after the patch:
EXPLAIN ANALYZE SELECT * FROM bitmap_split_or t1, bitmap_split_or t2 WHERE t1.a=t2.b OR t1.a=1; QUERY PLAN ----------------------------------------------------------------------------------------------------------------------------------------------- Nested Loop (cost=0.28..56369.00 rows=2502499 width=24) (actual time=0.121..2126.606 rows=2003000 loops=1) Buffers: shared hit=16009 read=2 -> Seq Scan on bitmap_split_or t2 (cost=0.00..31.00 rows=2000 width=12) (actual time=0.017..0.652 rows=2000 loops=1) Buffers: shared hit=11 -> Index Scan using t_a_b_idx on bitmap_split_or t1 (cost=0.28..18.15 rows=1002 width=12) (actual time=0.044..0.627 rows=1002 loops=2000) Index Cond: (a = ANY (ARRAY[t2.b, 1])) Buffers: shared hit=15998 read=2 Planning Time: 0.282 ms Execution Time: 2344.367 ms (9 rows)
I'm afraid that we may lose this with this optimization. Maybe this can be taken into account somehow, what do you think?
-- Regards, Alena Rybakina Postgres Professional
pgsql-hackers by date: