Re: POC, WIP: OR-clause support for indexes - Mailing list pgsql-hackers

From a.rybakina
Subject Re: POC, WIP: OR-clause support for indexes
Date
Msg-id 055cb3c5-38a5-1c6e-cf97-ebc9b56dae56@postgrespro.ru
Whole thread Raw
In response to Re: POC, WIP: OR-clause support for indexes  ("a.rybakina" <a.rybakina@postgrespro.ru>)
List pgsql-hackers

Sorry for the duplicates, I received a letter that my letter did not reach the addressee, I thought the design was incorrect.

On 26.09.2023 12:21, a.rybakina wrote:

I'm sorry I didn't write for a long time, but I really had a very difficult month, now I'm fully back to work.I was able to implement the patches to the end and moved the transformation of "OR" expressions to ANY. I haven't seen a big difference between them yet, one has a transformation before calculating selectivity (v7.1-Replace-OR-clause-to-ANY.patch), the other after (v7.2-Replace-OR-clause-to-ANY.patch). Regression tests are passing, I don't see any problems with selectivity, nothing has fallen into the coredump, but I found some incorrect transformations. What is the reason for these inaccuracies, I have not found, but, to be honest, they look unusual). Gave the error below.

In the patch, I don't like that I had to drag three libraries from parsing until I found a way around it.The advantage of this approach compared to the other ([1]) is that at this stage all possible or transformations are performed, compared to the patch, where the transformation was done at the parsing stage. That is, here, for example, there are such optimizations in the transformation:
I took the common element out of the bracket and the rest is converted to ANY, while, as noted by Peter Geoghegan, we did not have several bitmapscans, but only one scan through the array.

postgres=# explain analyze SELECT p1.oid, p1.proname
FROM pg_proc as p1
WHERE prolang = 13 AND prolang=1 OR prolang = 13 AND prolang = 2 OR prolang = 13 AND prolang = 3;
                                              QUERY PLAN                                               
-------------------------------------------------------------------------------------------------------
 Seq Scan on pg_proc p1  (cost=0.00..151.66 rows=1 width=68) (actual time=1.167..1.168 rows=0 loops=1)
   Filter: ((prolang = '13'::oid) AND (prolang = ANY (ARRAY['1'::oid, '2'::oid, '3'::oid])))
   Rows Removed by Filter: 3302
 Planning Time: 0.146 ms
 Execution Time: 1.191 ms
(5 rows)

While I was testing, I found some transformations that don't work, although in my opinion, they should:

1. First case:
explain analyze SELECT p1.oid, p1.proname
FROM pg_proc as p1
WHERE prolang = 13 AND prolang=1 OR prolang = 2 AND prolang = 2 OR prolang = 13 AND prolang = 13;
                                                                              QUERY PLAN                                                                              
----------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Seq Scan on pg_proc p1  (cost=0.00..180.55 rows=2 width=68) (actual time=2.959..3.335 rows=89 loops=1)
   Filter: (((prolang = '13'::oid) AND (prolang = '1'::oid)) OR ((prolang = '2'::oid) AND (prolang = '2'::oid)) OR ((prolang = '13'::oid) AND (prolang = '13'::oid)))
   Rows Removed by Filter: 3213
 Planning Time: 1.278 ms
 Execution Time: 3.486 ms
(5 rows)

Should have left only prolang = '13'::oid:

                                              QUERY PLAN                                               
-------------------------------------------------------------------------------------------------------
 Seq Scan on pg_proc p1  (cost=0.00..139.28 rows=1 width=68) (actual time=2.034..2.034 rows=0 loops=1)
   Filter: ((prolang = '13'::oid ))
   Rows Removed by Filter: 3302
 Planning Time: 0.181 ms
 Execution Time: 2.079 ms
(5 rows)

2. Also does not work:
postgres=# explain analyze SELECT p1.oid, p1.proname
FROM pg_proc as p1
WHERE prolang = 13 OR prolang = 2 AND prolang = 2 OR prolang = 13;
                                                  QUERY PLAN                                                   
---------------------------------------------------------------------------------------------------------------
 Seq Scan on pg_proc p1  (cost=0.00..164.04 rows=176 width=68) (actual time=2.422..2.686 rows=89 loops=1)
   Filter: ((prolang = '13'::oid) OR ((prolang = '2'::oid) AND (prolang = '2'::oid)) OR (prolang = '13'::oid))
   Rows Removed by Filter: 3213
 Planning Time: 1.370 ms
 Execution Time: 2.799 ms
(5 rows)

Should have left:
Filter: ((prolang = '13'::oid) OR (prolang = '2'::oid))

3. Or another:

explain analyze SELECT p1.oid, p1.proname
FROM pg_proc as p1
WHERE prolang = 13 OR prolang=13 OR prolang = 2 AND prolang = 2;
                                                  QUERY PLAN                                                   
---------------------------------------------------------------------------------------------------------------
 Seq Scan on pg_proc p1  (cost=0.00..164.04 rows=176 width=68) (actual time=2.350..2.566 rows=89 loops=1)
   Filter: ((prolang = '13'::oid) OR (prolang = '13'::oid) OR ((prolang = '2'::oid) AND (prolang = '2'::oid)))
   Rows Removed by Filter: 3213
 Planning Time: 0.215 ms
 Execution Time: 2.624 ms
(5 rows)

Should have left:
Filter: ((prolang = '13'::oid) OR (prolang = '2'::oid))


Falls into coredump at me:
explain analyze SELECT p1.oid, p1.proname
FROM pg_proc as p1
WHERE prolang = 13 OR prolang = 2 AND prolang = 2 OR prolang = 13;

explain analyze SELECT p1.oid, p1.proname
FROM pg_proc as p1
WHERE prolang = 13 OR prolang=13 OR prolang = 2 AND prolang = 2;
                                                  QUERY PLAN                                                   
---------------------------------------------------------------------------------------------------------------
 Seq Scan on pg_proc p1  (cost=0.00..164.04 rows=176 width=68) (actual time=2.350..2.566 rows=89 loops=1)
   Filter: ((prolang = '13'::oid) OR (prolang = '13'::oid) OR ((prolang = '2'::oid) AND (prolang = '2'::oid)))
   Rows Removed by Filter: 3213
 Planning Time: 0.215 ms
 Execution Time: 2.624 ms

(5 rows)


I remind that initially the task was to find an opportunity to optimize the case of processing a large number of "or" expressions to optimize memory consumption. The FlameGraph for executing 50,000 "or" expressionshas grown 1.4Gb and remains in this state until exiting the psql session (flamegraph1.png) and it sagged a lot in execution time. If this case is converted to ANY, the query is executed much faster and memory is optimized (flamegraph2.png). It may be necessary to use this approach if there is no support for the framework to process ANY, IN expressions.

Peter Geoghegan also noticed some development of this patch in terms of preparing some transformations to optimize the query at the stage of its execution [0].

[0] https://www.postgresql.org/message-id/CAH2-Wz%3D9N_4%2BEyhtyFqYQRx4OgVbP%2B1aoYU2JQPVogCir61ZEQ%40mail.gmail.com

[1] https://www.postgresql.org/message-id/attachment/149105/v7-Replace-OR-clause-to-ANY-expressions.patch

pgsql-hackers by date:

Previous
From: "a.rybakina"
Date:
Subject: Re: POC, WIP: OR-clause support for indexes
Next
From: David Rowley
Date:
Subject: Re: Avoid a possible out-of-bounds access (src/backend/optimizer/util/relnode.c)