Thread: Re: Replace IN VALUES with ANY in WHERE clauses during optimization

Re: Replace IN VALUES with ANY in WHERE clauses during optimization

From
Alena Rybakina
Date:
Hi!

On 03.10.2024 22:52, Ivan Kush wrote:
>
> Hello, hackers! I with my friends propose the patch to replace IN 
> VALUES to ANY in WHERE clauses.
>
> # Intro
>
> The `VALUES` in the `IN VALUES` construct is replaced with with an 
> array of values when `VALUES` contains 1 column. In the end it will be 
> replaced with ANY by the existing function makeA_Expr 
> (src/backend/nodes/makefuncs.c)
>
> This improves performance, especially if the values are small.
>
> # Patch
>
> v1-in_values_to_array.patch
>
> # How realized
>
> `VALUES` statement corresponds to `values_clause` nonterminal symbol 
> in gram.y, where it's parsed to `SelectStmt` node.
>
> `IN` is parsed in `a_expr` symbol. When it contains `VALUES` with 1 
> column, parser extracts data from `SelectStmt` and passes it
>
> to function call `makeSimpleA_Expr` where simple `A_Expr` is created.
>
> Later during optimizations of parser tree this `A_Expr` will be 
> transformed to `ArrayExpr` (already realized in Postgres)
>
>
> # Authors.
> Author: Ivan Kush <ivan.kush@tantorlabs.com>
> Author: Vadim Yacenko <vadim.yacenko@tantorlabs.com>
> Author: Alexander Simonov <alexander.simonov@tantorlabs.com>
>
> # Tests
> Implementation contains many regression tests of varying complexity, 
> which check supported features.
>
> # Platform
> This patch was checkouted from tag REL_17_STABLE. Code is developed in 
> Linux, doesn't contain platfrom-specific code, only Postgres internal 
> data structures and functions.
>
> # Documentation
> Regression tests contain many examples
>
> # Performance
> It increases performance
>
> # Example
> Let's compare result. With path the execution time is significantly 
> lower.
>
> We have a table table1 with 10000 rows.
>
> postgres=# \d table1;
>                          Table "public.table1"
>  Column |            Type             | Collation | Nullable | Default
> --------+-----------------------------+-----------+----------+---------
>  fld1   | timestamp without time zone |           | not null |
>  fld2   | bytea                       |           | not null |
> Indexes:
>     "table1index" btree (fld2)
>
> Let's execute several commands
> see commands.sql
>
> Plan no patch
> see plan_no_patch.txt
>
>
> Plan with patch
> see plan_with_patch.txt

I think you should think about putting these constants in ANY Array

EXPLAIN (ANALYZE, COSTS OFF, SUMMARY OFF, TIMING OFF)
select * from t
     where x in (VALUES(1200), (1));
                     QUERY PLAN
---------------------------------------------------
  Seq Scan on t (actual rows=1 loops=1)
    Filter: (x = ANY ('{1200,1}'::integer[]))
(3 rows)

Anlrey Lepikhov and I recently described this in an article [0] here and 
the implementation already exists, but for now it was posted a binary 
application for testing. The acceleration is significant I agree.

[0] https://danolivo.substack.com/p/7456653e-9716-4e91-ad09-83737784c665

-- 
Regards,
Alena Rybakina
Postgres Professional




Re: Replace IN VALUES with ANY in WHERE clauses during optimization

From
Laurenz Albe
Date:
On Thu, 2024-10-03 at 23:10 +0300, Alena Rybakina wrote:
> On 03.10.2024 22:52, Ivan Kush wrote:
> >
> > Hello, hackers! I with my friends propose the patch to replace IN
> > VALUES to ANY in WHERE clauses.
> >
> > # Intro
> >
> > The `VALUES` in the `IN VALUES` construct is replaced with with an
> > array of values when `VALUES` contains 1 column. In the end it will be
> > replaced with ANY by the existing function makeA_Expr
> > (src/backend/nodes/makefuncs.c)
> >
> > This improves performance, especially if the values are small.
>
> Anlrey Lepikhov and I recently described this in an article [0] here and
> the implementation already exists, but for now it was posted a binary
> application for testing. The acceleration is significant I agree.
>
> [0] https://danolivo.substack.com/p/7456653e-9716-4e91-ad09-83737784c665

I believe that the speed improvement is significant, but who writes a
query like

  ... WHERE col IN (VALUES (1), (2), (3))

when they could write the much shorter

  ... WHERE col IN (1, 2, 3)

which is already converted to "= ANY"?

I wonder if it is worth the extra planning time to detect and improve
such queries.

Yours,
Laurenz Albe



Re: Replace IN VALUES with ANY in WHERE clauses during optimization

From
Tom Lane
Date:
Laurenz Albe <laurenz.albe@cybertec.at> writes:
> I wonder if it is worth the extra planning time to detect and improve
> such queries.

I'm skeptical too.  I'm *very* skeptical of implementing it in the
grammar as shown here --- I'd go so far as to say that that approach
cannot be accepted.  That's far too early, and it risks all sorts
of problems.  An example is that the code as given seems to assume
that all the sublists are the same length ... but we haven't checked
that yet.  I also suspect that this does not behave the same as the
original construct for purposes like resolving dissimilar types in
the VALUES list.  (In an ideal world, perhaps it'd behave the same,
but that ship sailed a couple decades ago.)

            regards, tom lane



Re: Replace IN VALUES with ANY in WHERE clauses during optimization

From
Ivan Kush
Date:
Do you mean, that I should try to execute such command?

In this patch it gives ANY

postgres=# EXPLAIN (ANALYZE, COSTS OFF, SUMMARY OFF, TIMING OFF)
select * from table1
    where fld2 in 
(VALUES('\\230\\211\\030f\\332\\261R\\333\\021\\356\\337z5\\336\\032\\372'::bytea), 
('\\235\\204 \\004\\017\\353\\301\\200\\021\\355a&d}\\245\\312'::byte
a));

                                                                                                                             QUERY
PLAN


---------------------------------------------------------------------------------------------------------------------------------------------------------------------------


-------------------------------------------------------------------------------------------- 

Bitmap Heap Scan on table1 (actual rows=0 loops=1)
   Recheck Cond: (fld2 = ANY 

('{"\\x5c3233305c3231315c303330665c3333325c323631525c3333335c3032315c3335365c3333377a355c3333365c3033325c333732","\\x5c3233355c323034205c30303
45c3031375c3335335c3330315c3230305c3032315c3335356126647d5c3234355c333132"}'::bytea[])) 

   ->  Bitmap Index Scan on table1index (actual rows=0 loops=1)
         Index Cond: (fld2 = ANY 

('{"\\x5c3233305c3231315c303330665c3333325c323631525c3333335c3032315c3335365c3333377a355c3333365c3033325c333732","\\x5c3233355c323034205c3
030345c3031375c3335335c3330315c3230305c3032315c3335356126647d5c3234355c333132"}'::bytea[])) 

(4 rows)

Do you plan to send your implementation to the hackers?

On 10/3/24 23:10, Alena Rybakina wrote:
> I think you should think about putting these constants in ANY Array
>
> EXPLAIN (ANALYZE, COSTS OFF, SUMMARY OFF, TIMING OFF)
> select * from t
>     where x in (VALUES(1200), (1));
>                     QUERY PLAN
> ---------------------------------------------------
>  Seq Scan on t (actual rows=1 loops=1)
>    Filter: (x = ANY ('{1200,1}'::integer[]))
> (3 rows)

-- 
Best wishes,
Ivan Kush
Tantor Labs LLC




Re: Replace IN VALUES with ANY in WHERE clauses during optimization

From
Ivan Kush
Date:
Some ORMs or proprietary software may write it mistakenly. In these 
cases this idea may be helpful.

This patch contains GUC to enable/disable this optimization

On 10/3/24 23:19, Laurenz Albe wrote:
> On Thu, 2024-10-03 at 23:10 +0300, Alena Rybakina wrote:
>> On 03.10.2024 22:52, Ivan Kush wrote:
>>> Hello, hackers! I with my friends propose the patch to replace IN
>>> VALUES to ANY in WHERE clauses.
>>>
>>> # Intro
>>>
>>> The `VALUES` in the `IN VALUES` construct is replaced with with an
>>> array of values when `VALUES` contains 1 column. In the end it will be
>>> replaced with ANY by the existing function makeA_Expr
>>> (src/backend/nodes/makefuncs.c)
>>>
>>> This improves performance, especially if the values are small.
>> Anlrey Lepikhov and I recently described this in an article [0] here and
>> the implementation already exists, but for now it was posted a binary
>> application for testing. The acceleration is significant I agree.
>>
>> [0] https://danolivo.substack.com/p/7456653e-9716-4e91-ad09-83737784c665
> I believe that the speed improvement is significant, but who writes a
> query like
>
>    ... WHERE col IN (VALUES (1), (2), (3))
>
> when they could write the much shorter
>
>    ... WHERE col IN (1, 2, 3)
>
> which is already converted to "= ANY"?
>
> I wonder if it is worth the extra planning time to detect and improve
> such queries.
>
> Yours,
> Laurenz Albe

-- 
Best wishes,
Ivan Kush
Tantor Labs LLC




Re: Replace IN VALUES with ANY in WHERE clauses during optimization

From
Alena Rybakina
Date:
On 04.10.2024 11:43, Ivan Kush wrote:
> Do you mean, that I should try to execute such command?
>
> In this patch it gives ANY
>
> postgres=# EXPLAIN (ANALYZE, COSTS OFF, SUMMARY OFF, TIMING OFF)
> select * from table1
>    where fld2 in 
> (VALUES('\\230\\211\\030f\\332\\261R\\333\\021\\356\\337z5\\336\\032\\372'::bytea), 
> ('\\235\\204 \\004\\017\\353\\301\\200\\021\\355a&d}\\245\\312'::byte
> a));
>
                                                                                                                             QUERY

> PLAN
>
>
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------

>
> -------------------------------------------------------------------------------------------- 
>
> Bitmap Heap Scan on table1 (actual rows=0 loops=1)
>   Recheck Cond: (fld2 = ANY 
>
('{"\\x5c3233305c3231315c303330665c3333325c323631525c3333335c3032315c3335365c3333377a355c3333365c3033325c333732","\\x5c3233355c323034205c30303
> 45c3031375c3335335c3330315c3230305c3032315c3335356126647d5c3234355c333132"}'::bytea[])) 
>
>   ->  Bitmap Index Scan on table1index (actual rows=0 loops=1)
>         Index Cond: (fld2 = ANY 
>
('{"\\x5c3233305c3231315c303330665c3333325c323631525c3333335c3032315c3335365c3333377a355c3333365c3033325c333732","\\x5c3233355c323034205c3
> 030345c3031375c3335335c3330315c3230305c3032315c3335356126647d5c3234355c333132"}'::bytea[])) 
>
> (4 rows)
Yes I meant it.
>
> Do you plan to send your implementation to the hackers?
>
It was sent here [0].

[0] 
https://www.postgresql.org/message-id/21d5fca5-0c02-4afd-8c98-d0930b298a8d%40gmail.com

-- 
Regards,
Alena Rybakina
Postgres Professional




Re: Replace IN VALUES with ANY in WHERE clauses during optimization

From
Ivan Kush
Date:
I agree, your realization is better: reliability is better and debugging 
is simplier.
I've looked at the code, looks good to me. Only style notes like 
VTA/VtA, SELECT/select, etc. may be corrected

On 10/4/24 12:15, Alena Rybakina wrote:
>
> It was sent here [0].
>
> [0] 
> https://www.postgresql.org/message-id/21d5fca5-0c02-4afd-8c98-d0930b298a8d%40gmail.com
>
-- 
Best wishes,
Ivan Kush
Tantor Labs LLC




Re: Replace IN VALUES with ANY in WHERE clauses during optimization

From
Alexander Korotkov
Date:
Hi, Alena!

On Thu, Jan 9, 2025 at 3:11 PM Alena Rybakina <a.rybakina@postgrespro.ru> wrote:
> On 04.10.2024 12:05, Andrei Lepikhov wrote:
> > We also have an implementation of VALUES -> ARRAY transformation.
> > Because enterprises must deal with users' problems, many of these
> > users employ automatically generated queries.
> > Being informed very well of the consensus about that stuff, we've
> > designed it as a library. But, looking into the code now, I see that
> > it only needs a few cycles if no one 'x IN VALUES' expression is
> > presented in the query. Who knows? It may be OK for the core.
> > So, I've rewritten the code into the patch - see it in the attachment.
> >
> > The idea is quite simple - at the same place as
> > convert_ANY_sublink_to_join, we can test the SubLink on proper VALUES
> > RTE and perform the transformation if it's convertible.
>
> I updated the patch due to the problem with the coercion types for both
> sides of the expression.
>
> We must find a common type for both leftop of the expression and rightop
> including constants for correct transformation, and at the same time
> check that the resulting types are compatible.
>
> To do this we find an operator for the two input types if it is
> possible, and also remember the target types for the left and right
> sides, and after that make a coercion.
>
> This processing is only needed in cases where we are not working with
> parameters since the final type is not specified for the parameters.

I took a look at this patch.

+    /* TODO: remember parameters */

What was intended to do here?

Also, aren't we too restrictive while requiring is_simple_values_sequence()?
For instance, I believe cases like this (containing Var) could be transformed too.

select * from t t1, lateral (select * from t t2 where t2.i in (values (t1.i), (1)));

------
Regards,
Alexander Korotkov
Supabase

Re: Replace IN VALUES with ANY in WHERE clauses during optimization

From
Alena Rybakina
Date:

Hi!

On 09.02.2025 18:38, Alexander Korotkov wrote:
Also, aren't we too restrictive while requiring is_simple_values_sequence()?
For instance, I believe cases like this (containing Var) could be transformed too.

select * from t t1, lateral (select * from t t2 where t2.i in (values (t1.i), (1)));
I'm still working on it.
Also, I think there is quite a code duplication about construction of
SAOP between match_orclause_to_indexcol() and convert_VALUES_to_ANY()
functions.  I would like to see a refactoring as a separate first
patch, which extracts the common part into a function.

Done.

I have attached a patch. In addition to the transfer, I added the process of searching for a suitable operator and type for the left expression for input expressions: const and left expression, since they may differ from the declared types. Additionally, we convert the left expr to a type suitable for the found operator.

-- 
Regards,
Alena Rybakina
Postgres Professional
Attachment