Thread: Re: Replace IN VALUES with ANY in WHERE clauses during optimization
Hi! On 03.10.2024 22:52, Ivan Kush wrote: > > Hello, hackers! I with my friends propose the patch to replace IN > VALUES to ANY in WHERE clauses. > > # Intro > > The `VALUES` in the `IN VALUES` construct is replaced with with an > array of values when `VALUES` contains 1 column. In the end it will be > replaced with ANY by the existing function makeA_Expr > (src/backend/nodes/makefuncs.c) > > This improves performance, especially if the values are small. > > # Patch > > v1-in_values_to_array.patch > > # How realized > > `VALUES` statement corresponds to `values_clause` nonterminal symbol > in gram.y, where it's parsed to `SelectStmt` node. > > `IN` is parsed in `a_expr` symbol. When it contains `VALUES` with 1 > column, parser extracts data from `SelectStmt` and passes it > > to function call `makeSimpleA_Expr` where simple `A_Expr` is created. > > Later during optimizations of parser tree this `A_Expr` will be > transformed to `ArrayExpr` (already realized in Postgres) > > > # Authors. > Author: Ivan Kush <ivan.kush@tantorlabs.com> > Author: Vadim Yacenko <vadim.yacenko@tantorlabs.com> > Author: Alexander Simonov <alexander.simonov@tantorlabs.com> > > # Tests > Implementation contains many regression tests of varying complexity, > which check supported features. > > # Platform > This patch was checkouted from tag REL_17_STABLE. Code is developed in > Linux, doesn't contain platfrom-specific code, only Postgres internal > data structures and functions. > > # Documentation > Regression tests contain many examples > > # Performance > It increases performance > > # Example > Let's compare result. With path the execution time is significantly > lower. > > We have a table table1 with 10000 rows. > > postgres=# \d table1; > Table "public.table1" > Column | Type | Collation | Nullable | Default > --------+-----------------------------+-----------+----------+--------- > fld1 | timestamp without time zone | | not null | > fld2 | bytea | | not null | > Indexes: > "table1index" btree (fld2) > > Let's execute several commands > see commands.sql > > Plan no patch > see plan_no_patch.txt > > > Plan with patch > see plan_with_patch.txt I think you should think about putting these constants in ANY Array EXPLAIN (ANALYZE, COSTS OFF, SUMMARY OFF, TIMING OFF) select * from t where x in (VALUES(1200), (1)); QUERY PLAN --------------------------------------------------- Seq Scan on t (actual rows=1 loops=1) Filter: (x = ANY ('{1200,1}'::integer[])) (3 rows) Anlrey Lepikhov and I recently described this in an article [0] here and the implementation already exists, but for now it was posted a binary application for testing. The acceleration is significant I agree. [0] https://danolivo.substack.com/p/7456653e-9716-4e91-ad09-83737784c665 -- Regards, Alena Rybakina Postgres Professional
On Thu, 2024-10-03 at 23:10 +0300, Alena Rybakina wrote: > On 03.10.2024 22:52, Ivan Kush wrote: > > > > Hello, hackers! I with my friends propose the patch to replace IN > > VALUES to ANY in WHERE clauses. > > > > # Intro > > > > The `VALUES` in the `IN VALUES` construct is replaced with with an > > array of values when `VALUES` contains 1 column. In the end it will be > > replaced with ANY by the existing function makeA_Expr > > (src/backend/nodes/makefuncs.c) > > > > This improves performance, especially if the values are small. > > Anlrey Lepikhov and I recently described this in an article [0] here and > the implementation already exists, but for now it was posted a binary > application for testing. The acceleration is significant I agree. > > [0] https://danolivo.substack.com/p/7456653e-9716-4e91-ad09-83737784c665 I believe that the speed improvement is significant, but who writes a query like ... WHERE col IN (VALUES (1), (2), (3)) when they could write the much shorter ... WHERE col IN (1, 2, 3) which is already converted to "= ANY"? I wonder if it is worth the extra planning time to detect and improve such queries. Yours, Laurenz Albe
Laurenz Albe <laurenz.albe@cybertec.at> writes: > I wonder if it is worth the extra planning time to detect and improve > such queries. I'm skeptical too. I'm *very* skeptical of implementing it in the grammar as shown here --- I'd go so far as to say that that approach cannot be accepted. That's far too early, and it risks all sorts of problems. An example is that the code as given seems to assume that all the sublists are the same length ... but we haven't checked that yet. I also suspect that this does not behave the same as the original construct for purposes like resolving dissimilar types in the VALUES list. (In an ideal world, perhaps it'd behave the same, but that ship sailed a couple decades ago.) regards, tom lane
Do you mean, that I should try to execute such command? In this patch it gives ANY postgres=# EXPLAIN (ANALYZE, COSTS OFF, SUMMARY OFF, TIMING OFF) select * from table1 where fld2 in (VALUES('\\230\\211\\030f\\332\\261R\\333\\021\\356\\337z5\\336\\032\\372'::bytea), ('\\235\\204 \\004\\017\\353\\301\\200\\021\\355a&d}\\245\\312'::byte a)); QUERY PLAN --------------------------------------------------------------------------------------------------------------------------------------------------------------------------- -------------------------------------------------------------------------------------------- Bitmap Heap Scan on table1 (actual rows=0 loops=1) Recheck Cond: (fld2 = ANY ('{"\\x5c3233305c3231315c303330665c3333325c323631525c3333335c3032315c3335365c3333377a355c3333365c3033325c333732","\\x5c3233355c323034205c30303 45c3031375c3335335c3330315c3230305c3032315c3335356126647d5c3234355c333132"}'::bytea[])) -> Bitmap Index Scan on table1index (actual rows=0 loops=1) Index Cond: (fld2 = ANY ('{"\\x5c3233305c3231315c303330665c3333325c323631525c3333335c3032315c3335365c3333377a355c3333365c3033325c333732","\\x5c3233355c323034205c3 030345c3031375c3335335c3330315c3230305c3032315c3335356126647d5c3234355c333132"}'::bytea[])) (4 rows) Do you plan to send your implementation to the hackers? On 10/3/24 23:10, Alena Rybakina wrote: > I think you should think about putting these constants in ANY Array > > EXPLAIN (ANALYZE, COSTS OFF, SUMMARY OFF, TIMING OFF) > select * from t > where x in (VALUES(1200), (1)); > QUERY PLAN > --------------------------------------------------- > Seq Scan on t (actual rows=1 loops=1) > Filter: (x = ANY ('{1200,1}'::integer[])) > (3 rows) -- Best wishes, Ivan Kush Tantor Labs LLC
Some ORMs or proprietary software may write it mistakenly. In these cases this idea may be helpful. This patch contains GUC to enable/disable this optimization On 10/3/24 23:19, Laurenz Albe wrote: > On Thu, 2024-10-03 at 23:10 +0300, Alena Rybakina wrote: >> On 03.10.2024 22:52, Ivan Kush wrote: >>> Hello, hackers! I with my friends propose the patch to replace IN >>> VALUES to ANY in WHERE clauses. >>> >>> # Intro >>> >>> The `VALUES` in the `IN VALUES` construct is replaced with with an >>> array of values when `VALUES` contains 1 column. In the end it will be >>> replaced with ANY by the existing function makeA_Expr >>> (src/backend/nodes/makefuncs.c) >>> >>> This improves performance, especially if the values are small. >> Anlrey Lepikhov and I recently described this in an article [0] here and >> the implementation already exists, but for now it was posted a binary >> application for testing. The acceleration is significant I agree. >> >> [0] https://danolivo.substack.com/p/7456653e-9716-4e91-ad09-83737784c665 > I believe that the speed improvement is significant, but who writes a > query like > > ... WHERE col IN (VALUES (1), (2), (3)) > > when they could write the much shorter > > ... WHERE col IN (1, 2, 3) > > which is already converted to "= ANY"? > > I wonder if it is worth the extra planning time to detect and improve > such queries. > > Yours, > Laurenz Albe -- Best wishes, Ivan Kush Tantor Labs LLC
On 04.10.2024 11:43, Ivan Kush wrote: > Do you mean, that I should try to execute such command? > > In this patch it gives ANY > > postgres=# EXPLAIN (ANALYZE, COSTS OFF, SUMMARY OFF, TIMING OFF) > select * from table1 > where fld2 in > (VALUES('\\230\\211\\030f\\332\\261R\\333\\021\\356\\337z5\\336\\032\\372'::bytea), > ('\\235\\204 \\004\\017\\353\\301\\200\\021\\355a&d}\\245\\312'::byte > a)); > QUERY > PLAN > > --------------------------------------------------------------------------------------------------------------------------------------------------------------------------- > > -------------------------------------------------------------------------------------------- > > Bitmap Heap Scan on table1 (actual rows=0 loops=1) > Recheck Cond: (fld2 = ANY > ('{"\\x5c3233305c3231315c303330665c3333325c323631525c3333335c3032315c3335365c3333377a355c3333365c3033325c333732","\\x5c3233355c323034205c30303 > 45c3031375c3335335c3330315c3230305c3032315c3335356126647d5c3234355c333132"}'::bytea[])) > > -> Bitmap Index Scan on table1index (actual rows=0 loops=1) > Index Cond: (fld2 = ANY > ('{"\\x5c3233305c3231315c303330665c3333325c323631525c3333335c3032315c3335365c3333377a355c3333365c3033325c333732","\\x5c3233355c323034205c3 > 030345c3031375c3335335c3330315c3230305c3032315c3335356126647d5c3234355c333132"}'::bytea[])) > > (4 rows) Yes I meant it. > > Do you plan to send your implementation to the hackers? > It was sent here [0]. [0] https://www.postgresql.org/message-id/21d5fca5-0c02-4afd-8c98-d0930b298a8d%40gmail.com -- Regards, Alena Rybakina Postgres Professional