Thread: Re: Replace IN VALUES with ANY in WHERE clauses during optimization
Hi! On 03.10.2024 22:52, Ivan Kush wrote: > > Hello, hackers! I with my friends propose the patch to replace IN > VALUES to ANY in WHERE clauses. > > # Intro > > The `VALUES` in the `IN VALUES` construct is replaced with with an > array of values when `VALUES` contains 1 column. In the end it will be > replaced with ANY by the existing function makeA_Expr > (src/backend/nodes/makefuncs.c) > > This improves performance, especially if the values are small. > > # Patch > > v1-in_values_to_array.patch > > # How realized > > `VALUES` statement corresponds to `values_clause` nonterminal symbol > in gram.y, where it's parsed to `SelectStmt` node. > > `IN` is parsed in `a_expr` symbol. When it contains `VALUES` with 1 > column, parser extracts data from `SelectStmt` and passes it > > to function call `makeSimpleA_Expr` where simple `A_Expr` is created. > > Later during optimizations of parser tree this `A_Expr` will be > transformed to `ArrayExpr` (already realized in Postgres) > > > # Authors. > Author: Ivan Kush <ivan.kush@tantorlabs.com> > Author: Vadim Yacenko <vadim.yacenko@tantorlabs.com> > Author: Alexander Simonov <alexander.simonov@tantorlabs.com> > > # Tests > Implementation contains many regression tests of varying complexity, > which check supported features. > > # Platform > This patch was checkouted from tag REL_17_STABLE. Code is developed in > Linux, doesn't contain platfrom-specific code, only Postgres internal > data structures and functions. > > # Documentation > Regression tests contain many examples > > # Performance > It increases performance > > # Example > Let's compare result. With path the execution time is significantly > lower. > > We have a table table1 with 10000 rows. > > postgres=# \d table1; > Table "public.table1" > Column | Type | Collation | Nullable | Default > --------+-----------------------------+-----------+----------+--------- > fld1 | timestamp without time zone | | not null | > fld2 | bytea | | not null | > Indexes: > "table1index" btree (fld2) > > Let's execute several commands > see commands.sql > > Plan no patch > see plan_no_patch.txt > > > Plan with patch > see plan_with_patch.txt I think you should think about putting these constants in ANY Array EXPLAIN (ANALYZE, COSTS OFF, SUMMARY OFF, TIMING OFF) select * from t where x in (VALUES(1200), (1)); QUERY PLAN --------------------------------------------------- Seq Scan on t (actual rows=1 loops=1) Filter: (x = ANY ('{1200,1}'::integer[])) (3 rows) Anlrey Lepikhov and I recently described this in an article [0] here and the implementation already exists, but for now it was posted a binary application for testing. The acceleration is significant I agree. [0] https://danolivo.substack.com/p/7456653e-9716-4e91-ad09-83737784c665 -- Regards, Alena Rybakina Postgres Professional
On Thu, 2024-10-03 at 23:10 +0300, Alena Rybakina wrote: > On 03.10.2024 22:52, Ivan Kush wrote: > > > > Hello, hackers! I with my friends propose the patch to replace IN > > VALUES to ANY in WHERE clauses. > > > > # Intro > > > > The `VALUES` in the `IN VALUES` construct is replaced with with an > > array of values when `VALUES` contains 1 column. In the end it will be > > replaced with ANY by the existing function makeA_Expr > > (src/backend/nodes/makefuncs.c) > > > > This improves performance, especially if the values are small. > > Anlrey Lepikhov and I recently described this in an article [0] here and > the implementation already exists, but for now it was posted a binary > application for testing. The acceleration is significant I agree. > > [0] https://danolivo.substack.com/p/7456653e-9716-4e91-ad09-83737784c665 I believe that the speed improvement is significant, but who writes a query like ... WHERE col IN (VALUES (1), (2), (3)) when they could write the much shorter ... WHERE col IN (1, 2, 3) which is already converted to "= ANY"? I wonder if it is worth the extra planning time to detect and improve such queries. Yours, Laurenz Albe
Laurenz Albe <laurenz.albe@cybertec.at> writes: > I wonder if it is worth the extra planning time to detect and improve > such queries. I'm skeptical too. I'm *very* skeptical of implementing it in the grammar as shown here --- I'd go so far as to say that that approach cannot be accepted. That's far too early, and it risks all sorts of problems. An example is that the code as given seems to assume that all the sublists are the same length ... but we haven't checked that yet. I also suspect that this does not behave the same as the original construct for purposes like resolving dissimilar types in the VALUES list. (In an ideal world, perhaps it'd behave the same, but that ship sailed a couple decades ago.) regards, tom lane
Do you mean, that I should try to execute such command? In this patch it gives ANY postgres=# EXPLAIN (ANALYZE, COSTS OFF, SUMMARY OFF, TIMING OFF) select * from table1 where fld2 in (VALUES('\\230\\211\\030f\\332\\261R\\333\\021\\356\\337z5\\336\\032\\372'::bytea), ('\\235\\204 \\004\\017\\353\\301\\200\\021\\355a&d}\\245\\312'::byte a)); QUERY PLAN --------------------------------------------------------------------------------------------------------------------------------------------------------------------------- -------------------------------------------------------------------------------------------- Bitmap Heap Scan on table1 (actual rows=0 loops=1) Recheck Cond: (fld2 = ANY ('{"\\x5c3233305c3231315c303330665c3333325c323631525c3333335c3032315c3335365c3333377a355c3333365c3033325c333732","\\x5c3233355c323034205c30303 45c3031375c3335335c3330315c3230305c3032315c3335356126647d5c3234355c333132"}'::bytea[])) -> Bitmap Index Scan on table1index (actual rows=0 loops=1) Index Cond: (fld2 = ANY ('{"\\x5c3233305c3231315c303330665c3333325c323631525c3333335c3032315c3335365c3333377a355c3333365c3033325c333732","\\x5c3233355c323034205c3 030345c3031375c3335335c3330315c3230305c3032315c3335356126647d5c3234355c333132"}'::bytea[])) (4 rows) Do you plan to send your implementation to the hackers? On 10/3/24 23:10, Alena Rybakina wrote: > I think you should think about putting these constants in ANY Array > > EXPLAIN (ANALYZE, COSTS OFF, SUMMARY OFF, TIMING OFF) > select * from t > where x in (VALUES(1200), (1)); > QUERY PLAN > --------------------------------------------------- > Seq Scan on t (actual rows=1 loops=1) > Filter: (x = ANY ('{1200,1}'::integer[])) > (3 rows) -- Best wishes, Ivan Kush Tantor Labs LLC
Some ORMs or proprietary software may write it mistakenly. In these cases this idea may be helpful. This patch contains GUC to enable/disable this optimization On 10/3/24 23:19, Laurenz Albe wrote: > On Thu, 2024-10-03 at 23:10 +0300, Alena Rybakina wrote: >> On 03.10.2024 22:52, Ivan Kush wrote: >>> Hello, hackers! I with my friends propose the patch to replace IN >>> VALUES to ANY in WHERE clauses. >>> >>> # Intro >>> >>> The `VALUES` in the `IN VALUES` construct is replaced with with an >>> array of values when `VALUES` contains 1 column. In the end it will be >>> replaced with ANY by the existing function makeA_Expr >>> (src/backend/nodes/makefuncs.c) >>> >>> This improves performance, especially if the values are small. >> Anlrey Lepikhov and I recently described this in an article [0] here and >> the implementation already exists, but for now it was posted a binary >> application for testing. The acceleration is significant I agree. >> >> [0] https://danolivo.substack.com/p/7456653e-9716-4e91-ad09-83737784c665 > I believe that the speed improvement is significant, but who writes a > query like > > ... WHERE col IN (VALUES (1), (2), (3)) > > when they could write the much shorter > > ... WHERE col IN (1, 2, 3) > > which is already converted to "= ANY"? > > I wonder if it is worth the extra planning time to detect and improve > such queries. > > Yours, > Laurenz Albe -- Best wishes, Ivan Kush Tantor Labs LLC
On 04.10.2024 11:43, Ivan Kush wrote: > Do you mean, that I should try to execute such command? > > In this patch it gives ANY > > postgres=# EXPLAIN (ANALYZE, COSTS OFF, SUMMARY OFF, TIMING OFF) > select * from table1 > where fld2 in > (VALUES('\\230\\211\\030f\\332\\261R\\333\\021\\356\\337z5\\336\\032\\372'::bytea), > ('\\235\\204 \\004\\017\\353\\301\\200\\021\\355a&d}\\245\\312'::byte > a)); > QUERY > PLAN > > --------------------------------------------------------------------------------------------------------------------------------------------------------------------------- > > -------------------------------------------------------------------------------------------- > > Bitmap Heap Scan on table1 (actual rows=0 loops=1) > Recheck Cond: (fld2 = ANY > ('{"\\x5c3233305c3231315c303330665c3333325c323631525c3333335c3032315c3335365c3333377a355c3333365c3033325c333732","\\x5c3233355c323034205c30303 > 45c3031375c3335335c3330315c3230305c3032315c3335356126647d5c3234355c333132"}'::bytea[])) > > -> Bitmap Index Scan on table1index (actual rows=0 loops=1) > Index Cond: (fld2 = ANY > ('{"\\x5c3233305c3231315c303330665c3333325c323631525c3333335c3032315c3335365c3333377a355c3333365c3033325c333732","\\x5c3233355c323034205c3 > 030345c3031375c3335335c3330315c3230305c3032315c3335356126647d5c3234355c333132"}'::bytea[])) > > (4 rows) Yes I meant it. > > Do you plan to send your implementation to the hackers? > It was sent here [0]. [0] https://www.postgresql.org/message-id/21d5fca5-0c02-4afd-8c98-d0930b298a8d%40gmail.com -- Regards, Alena Rybakina Postgres Professional
I agree, your realization is better: reliability is better and debugging is simplier. I've looked at the code, looks good to me. Only style notes like VTA/VtA, SELECT/select, etc. may be corrected On 10/4/24 12:15, Alena Rybakina wrote: > > It was sent here [0]. > > [0] > https://www.postgresql.org/message-id/21d5fca5-0c02-4afd-8c98-d0930b298a8d%40gmail.com > -- Best wishes, Ivan Kush Tantor Labs LLC
Hi, Alena!
On Thu, Jan 9, 2025 at 3:11 PM Alena Rybakina <a.rybakina@postgrespro.ru> wrote:
> On 04.10.2024 12:05, Andrei Lepikhov wrote:
> > We also have an implementation of VALUES -> ARRAY transformation.
> > Because enterprises must deal with users' problems, many of these
> > users employ automatically generated queries.
> > Being informed very well of the consensus about that stuff, we've
> > designed it as a library. But, looking into the code now, I see that
> > it only needs a few cycles if no one 'x IN VALUES' expression is
> > presented in the query. Who knows? It may be OK for the core.
> > So, I've rewritten the code into the patch - see it in the attachment.
> >
> > The idea is quite simple - at the same place as
> > convert_ANY_sublink_to_join, we can test the SubLink on proper VALUES
> > RTE and perform the transformation if it's convertible.
>
> I updated the patch due to the problem with the coercion types for both
> sides of the expression.
>
> We must find a common type for both leftop of the expression and rightop
> including constants for correct transformation, and at the same time
> check that the resulting types are compatible.
>
> To do this we find an operator for the two input types if it is
> possible, and also remember the target types for the left and right
> sides, and after that make a coercion.
>
> This processing is only needed in cases where we are not working with
> parameters since the final type is not specified for the parameters.
I took a look at this patch.
+ /* TODO: remember parameters */
What was intended to do here?
On Thu, Jan 9, 2025 at 3:11 PM Alena Rybakina <a.rybakina@postgrespro.ru> wrote:
> On 04.10.2024 12:05, Andrei Lepikhov wrote:
> > We also have an implementation of VALUES -> ARRAY transformation.
> > Because enterprises must deal with users' problems, many of these
> > users employ automatically generated queries.
> > Being informed very well of the consensus about that stuff, we've
> > designed it as a library. But, looking into the code now, I see that
> > it only needs a few cycles if no one 'x IN VALUES' expression is
> > presented in the query. Who knows? It may be OK for the core.
> > So, I've rewritten the code into the patch - see it in the attachment.
> >
> > The idea is quite simple - at the same place as
> > convert_ANY_sublink_to_join, we can test the SubLink on proper VALUES
> > RTE and perform the transformation if it's convertible.
>
> I updated the patch due to the problem with the coercion types for both
> sides of the expression.
>
> We must find a common type for both leftop of the expression and rightop
> including constants for correct transformation, and at the same time
> check that the resulting types are compatible.
>
> To do this we find an operator for the two input types if it is
> possible, and also remember the target types for the left and right
> sides, and after that make a coercion.
>
> This processing is only needed in cases where we are not working with
> parameters since the final type is not specified for the parameters.
I took a look at this patch.
+ /* TODO: remember parameters */
What was intended to do here?
Also, aren't we too restrictive while requiring is_simple_values_sequence()?
For instance, I believe cases like this (containing Var) could be transformed too.
select * from t t1, lateral (select * from t t2 where t2.i in (values (t1.i), (1)));
------
Regards,
Alexander Korotkov
Supabase
------
Regards,
Alexander Korotkov
Supabase
Hi!
On 09.02.2025 18:38, Alexander Korotkov wrote:
I'm still working on it.Also, aren't we too restrictive while requiring is_simple_values_sequence()? For instance, I believe cases like this (containing Var) could be transformed too. select * from t t1, lateral (select * from t t2 where t2.i in (values (t1.i), (1)));
Also, I think there is quite a code duplication about construction of SAOP between match_orclause_to_indexcol() and convert_VALUES_to_ANY() functions. I would like to see a refactoring as a separate first patch, which extracts the common part into a function.
Done.
I have attached a patch. In addition to the transfer, I added the process of searching for a suitable operator and type for the left expression for input expressions: const and left expression, since they may differ from the declared types. Additionally, we convert the left expr to a type suitable for the found operator.
-- Regards, Alena Rybakina Postgres Professional