Re: POC, WIP: OR-clause support for indexes - Mailing list pgsql-hackers
From | Alena Rybakina |
---|---|
Subject | Re: POC, WIP: OR-clause support for indexes |
Date | |
Msg-id | 919bfbcb-f812-758d-d687-71f89f0d9a68@postgrespro.ru Whole thread Raw |
In response to | Re: POC, WIP: OR-clause support for indexes (Andrey Lepikhov <a.lepikhov@postgrespro.ru>) |
Responses |
Re: POC, WIP: OR-clause support for indexes
|
List | pgsql-hackers |
I agree with your idea and try to implement it and will soon attach a patch with a solution.
I also have a really practical example confirming that such optimization can be useful.
A query was written that consisted of 50000 conditions due to the fact that the ORM framework couldn't work with a query having an ANY operator. In summary, we got a better plan that contained 50000 Bitmap Index Scan nodes with 50000 different conditions. Since approximately 27336 Bite of memory were required to initialize one BitmapOr Index Scan node, therefore, about 1.27 GB of memory was spent at the initialization step of the plan execution and query execution time was about 55756,053 ms (00:55,756).
psql -U postgres -c "CREATE DATABASE test_db" pgbench -U postgres -d test_db -i -s 10
SELECT FORMAT('prepare x %s AS SELECT * FROM pgbench_accounts a WHERE %s', '(' || string_agg('int', ',') || ')', string_agg(FORMAT('aid = $%s', g.id), ' or ') ) AS cmd FROM generate_series(1, 50000) AS g(id) \gexec
SELECT FORMAT('execute x %s;', '(' || string_agg(g.id::text, ',') || ')') AS cmd FROM generate_series(1, 50000) AS g(id) \gexec
I got the plan of this query:
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on pgbench_accounts a (cost=44.35..83.96 rows=10 width=97)
Recheck Cond: ((aid = 1) OR (aid = 2) OR (aid = 3) OR (aid = 4) OR (aid = 5) OR (aid = 6) OR (aid = 7) OR (aid = 8) OR (aid = 9) OR (aid = 10))
-> BitmapOr (cost=44.35..44.35 rows=10 width=0)
-> Bitmap Index Scan on pgbench_accounts_pkey (cost=0.00..4.43 rows=1 width=0)
Index Cond: (aid = 1)
-> Bitmap Index Scan on pgbench_accounts_pkey (cost=0.00..4.43 rows=1 width=0)
Index Cond: (aid = 2)
-> Bitmap Index Scan on pgbench_accounts_pkey (cost=0.00..4.43 rows=1 width=0)
Index Cond: (aid = 3)
-> Bitmap Index Scan on pgbench_accounts_pkey (cost=0.00..4.43 rows=1 width=0)
Index Cond: (aid = 4)
-> Bitmap Index Scan on pgbench_accounts_pkey (cost=0.00..4.43 rows=1 width=0)
Index Cond: (aid = 5)
-> Bitmap Index Scan on pgbench_accounts_pkey (cost=0.00..4.43 rows=1 width=0)
Index Cond: (aid = 6)
-> Bitmap Index Scan on pgbench_accounts_pkey (cost=0.00..4.43 rows=1 width=0)
Index Cond: (aid = 7)
-> Bitmap Index Scan on pgbench_accounts_pkey (cost=0.00..4.43 rows=1 width=0)
Index Cond: (aid = 8)
-> Bitmap Index Scan on pgbench_accounts_pkey (cost=0.00..4.43 rows=1 width=0)
Index Cond: (aid = 9)
-> Bitmap Index Scan on pgbench_accounts_pkey (cost=0.00..4.43 rows=1 width=0)
Index Cond: (aid = 10)
If I rewrite this query using ANY operator,
SELECT FORMAT('prepare x %s AS SELECT * FROM pgbench_accounts a WHERE aid = ANY(SELECT g.id FROM generate_series(1, 50000) AS g(id))', '(' || string_agg('int', ',') || ')' ) AS cmd FROM generate_series(1, 50000) AS g(id) \gexec
I will get a plan where the array comparison operator is used through ANY operator at the index scan stage. It's execution time is significantly lower as 339,764 ms.
QUERY PLAN
--------------------------------------------------------------------------------------------------- Index Scan using pgbench_accounts_pkey on pgbench_accounts a (cost=0.42..48.43 rows=10 width=97) Index Cond: (aid = ANY ('{1,2,3,4,5,6,7,8,9,10}'::integer[]))
(2 rows)
SELECT FORMAT('prepare x %s AS SELECT * FROM pgbench_accounts a WHERE aid IN(%s)', '(' || string_agg('int', ',') || ')', string_agg(FORMAT('%s', g.id), ', ') ) AS cmd FROM generate_series(1, 50000) AS g(id) \gexec
QUERY PLAN
--------------------------------------------------------------------------------------------------- Index Scan using pgbench_accounts_pkey on pgbench_accounts a (cost=0.42..48.43 rows=10 width=97) Index Cond: (aid = ANY ('{1,2,3,4,5,6,7,8,9,10}'::integer[]))
(2 rows)
On 12/26/15 23:04, Teodor Sigaev wrote:I'd like to present OR-clause support for indexes. Although OR-clauses could be supported by bitmapOR index scan it isn't very effective and such scan lost any order existing in index. We (with Alexander Korotkov) presented results on Vienna's conference this year. In short, it provides performance improvement:I support such a cunning approach. But this specific case, you demonstrated above, could be optimized independently at an earlier stage. If to convert:
EXPLAIN ANALYZE
SELECT count(*) FROM tst WHERE id = 5 OR id = 500 OR id = 5000;
...
The problems on the way which I see for now:
1 Calculating cost. Right now it's just a simple transformation of costs computed for BitmapOr path. I'd like to hope that's possible and so index's estimation function could be non-touched. So, they could believe that all clauses are implicitly-ANDed
2 I'd like to add such support to btree but it seems that it should be a separated patch. Btree search algorithm doesn't use any kind of stack of pages and algorithm to walk over btree doesn't clear for me for now.
3 I could miss some places which still assumes implicitly-ANDed list of clauses although regression tests passes fine.
(F(A) = ConstStableExpr_1) OR (F(A) = ConstStableExpr_2)
to
F(A) IN (ConstStableExpr_1, ConstStableExpr_2)
it can be seen significant execution speedup. For example, using the demo.sql to estimate maximum positive effect we see about 40% of execution and 100% of planning speedup.
To avoid unnecessary overhead, induced by the optimization, such transformation may be made at the stage of planning (we have cardinality estimations and have pruned partitions) but before creation of a relation scan paths. So, we can avoid planning overhead and non-optimal BitmapOr in the case of many OR's possibly aggravated by many indexes on the relation.
For example, such operation can be executed in create_index_paths() before passing rel->indexlist.
-- Alena Rybakina Postgres Professional
pgsql-hackers by date: