Thread: Re: Replace IN VALUES with ANY in WHERE clauses during optimization

Re: Replace IN VALUES with ANY in WHERE clauses during optimization

From
Alena Rybakina
Date:
Hi!

On 03.10.2024 22:52, Ivan Kush wrote:
>
> Hello, hackers! I with my friends propose the patch to replace IN 
> VALUES to ANY in WHERE clauses.
>
> # Intro
>
> The `VALUES` in the `IN VALUES` construct is replaced with with an 
> array of values when `VALUES` contains 1 column. In the end it will be 
> replaced with ANY by the existing function makeA_Expr 
> (src/backend/nodes/makefuncs.c)
>
> This improves performance, especially if the values are small.
>
> # Patch
>
> v1-in_values_to_array.patch
>
> # How realized
>
> `VALUES` statement corresponds to `values_clause` nonterminal symbol 
> in gram.y, where it's parsed to `SelectStmt` node.
>
> `IN` is parsed in `a_expr` symbol. When it contains `VALUES` with 1 
> column, parser extracts data from `SelectStmt` and passes it
>
> to function call `makeSimpleA_Expr` where simple `A_Expr` is created.
>
> Later during optimizations of parser tree this `A_Expr` will be 
> transformed to `ArrayExpr` (already realized in Postgres)
>
>
> # Authors.
> Author: Ivan Kush <ivan.kush@tantorlabs.com>
> Author: Vadim Yacenko <vadim.yacenko@tantorlabs.com>
> Author: Alexander Simonov <alexander.simonov@tantorlabs.com>
>
> # Tests
> Implementation contains many regression tests of varying complexity, 
> which check supported features.
>
> # Platform
> This patch was checkouted from tag REL_17_STABLE. Code is developed in 
> Linux, doesn't contain platfrom-specific code, only Postgres internal 
> data structures and functions.
>
> # Documentation
> Regression tests contain many examples
>
> # Performance
> It increases performance
>
> # Example
> Let's compare result. With path the execution time is significantly 
> lower.
>
> We have a table table1 with 10000 rows.
>
> postgres=# \d table1;
>                          Table "public.table1"
>  Column |            Type             | Collation | Nullable | Default
> --------+-----------------------------+-----------+----------+---------
>  fld1   | timestamp without time zone |           | not null |
>  fld2   | bytea                       |           | not null |
> Indexes:
>     "table1index" btree (fld2)
>
> Let's execute several commands
> see commands.sql
>
> Plan no patch
> see plan_no_patch.txt
>
>
> Plan with patch
> see plan_with_patch.txt

I think you should think about putting these constants in ANY Array

EXPLAIN (ANALYZE, COSTS OFF, SUMMARY OFF, TIMING OFF)
select * from t
     where x in (VALUES(1200), (1));
                     QUERY PLAN
---------------------------------------------------
  Seq Scan on t (actual rows=1 loops=1)
    Filter: (x = ANY ('{1200,1}'::integer[]))
(3 rows)

Anlrey Lepikhov and I recently described this in an article [0] here and 
the implementation already exists, but for now it was posted a binary 
application for testing. The acceleration is significant I agree.

[0] https://danolivo.substack.com/p/7456653e-9716-4e91-ad09-83737784c665

-- 
Regards,
Alena Rybakina
Postgres Professional




Re: Replace IN VALUES with ANY in WHERE clauses during optimization

From
Laurenz Albe
Date:
On Thu, 2024-10-03 at 23:10 +0300, Alena Rybakina wrote:
> On 03.10.2024 22:52, Ivan Kush wrote:
> >
> > Hello, hackers! I with my friends propose the patch to replace IN
> > VALUES to ANY in WHERE clauses.
> >
> > # Intro
> >
> > The `VALUES` in the `IN VALUES` construct is replaced with with an
> > array of values when `VALUES` contains 1 column. In the end it will be
> > replaced with ANY by the existing function makeA_Expr
> > (src/backend/nodes/makefuncs.c)
> >
> > This improves performance, especially if the values are small.
>
> Anlrey Lepikhov and I recently described this in an article [0] here and
> the implementation already exists, but for now it was posted a binary
> application for testing. The acceleration is significant I agree.
>
> [0] https://danolivo.substack.com/p/7456653e-9716-4e91-ad09-83737784c665

I believe that the speed improvement is significant, but who writes a
query like

  ... WHERE col IN (VALUES (1), (2), (3))

when they could write the much shorter

  ... WHERE col IN (1, 2, 3)

which is already converted to "= ANY"?

I wonder if it is worth the extra planning time to detect and improve
such queries.

Yours,
Laurenz Albe



Laurenz Albe <laurenz.albe@cybertec.at> writes:
> I wonder if it is worth the extra planning time to detect and improve
> such queries.

I'm skeptical too.  I'm *very* skeptical of implementing it in the
grammar as shown here --- I'd go so far as to say that that approach
cannot be accepted.  That's far too early, and it risks all sorts
of problems.  An example is that the code as given seems to assume
that all the sublists are the same length ... but we haven't checked
that yet.  I also suspect that this does not behave the same as the
original construct for purposes like resolving dissimilar types in
the VALUES list.  (In an ideal world, perhaps it'd behave the same,
but that ship sailed a couple decades ago.)

            regards, tom lane



Do you mean, that I should try to execute such command?

In this patch it gives ANY

postgres=# EXPLAIN (ANALYZE, COSTS OFF, SUMMARY OFF, TIMING OFF)
select * from table1
    where fld2 in 
(VALUES('\\230\\211\\030f\\332\\261R\\333\\021\\356\\337z5\\336\\032\\372'::bytea), 
('\\235\\204 \\004\\017\\353\\301\\200\\021\\355a&d}\\245\\312'::byte
a));

                                                                                                                             QUERY
PLAN


---------------------------------------------------------------------------------------------------------------------------------------------------------------------------


-------------------------------------------------------------------------------------------- 

Bitmap Heap Scan on table1 (actual rows=0 loops=1)
   Recheck Cond: (fld2 = ANY 

('{"\\x5c3233305c3231315c303330665c3333325c323631525c3333335c3032315c3335365c3333377a355c3333365c3033325c333732","\\x5c3233355c323034205c30303
45c3031375c3335335c3330315c3230305c3032315c3335356126647d5c3234355c333132"}'::bytea[])) 

   ->  Bitmap Index Scan on table1index (actual rows=0 loops=1)
         Index Cond: (fld2 = ANY 

('{"\\x5c3233305c3231315c303330665c3333325c323631525c3333335c3032315c3335365c3333377a355c3333365c3033325c333732","\\x5c3233355c323034205c3
030345c3031375c3335335c3330315c3230305c3032315c3335356126647d5c3234355c333132"}'::bytea[])) 

(4 rows)

Do you plan to send your implementation to the hackers?

On 10/3/24 23:10, Alena Rybakina wrote:
> I think you should think about putting these constants in ANY Array
>
> EXPLAIN (ANALYZE, COSTS OFF, SUMMARY OFF, TIMING OFF)
> select * from t
>     where x in (VALUES(1200), (1));
>                     QUERY PLAN
> ---------------------------------------------------
>  Seq Scan on t (actual rows=1 loops=1)
>    Filter: (x = ANY ('{1200,1}'::integer[]))
> (3 rows)

-- 
Best wishes,
Ivan Kush
Tantor Labs LLC




Some ORMs or proprietary software may write it mistakenly. In these 
cases this idea may be helpful.

This patch contains GUC to enable/disable this optimization

On 10/3/24 23:19, Laurenz Albe wrote:
> On Thu, 2024-10-03 at 23:10 +0300, Alena Rybakina wrote:
>> On 03.10.2024 22:52, Ivan Kush wrote:
>>> Hello, hackers! I with my friends propose the patch to replace IN
>>> VALUES to ANY in WHERE clauses.
>>>
>>> # Intro
>>>
>>> The `VALUES` in the `IN VALUES` construct is replaced with with an
>>> array of values when `VALUES` contains 1 column. In the end it will be
>>> replaced with ANY by the existing function makeA_Expr
>>> (src/backend/nodes/makefuncs.c)
>>>
>>> This improves performance, especially if the values are small.
>> Anlrey Lepikhov and I recently described this in an article [0] here and
>> the implementation already exists, but for now it was posted a binary
>> application for testing. The acceleration is significant I agree.
>>
>> [0] https://danolivo.substack.com/p/7456653e-9716-4e91-ad09-83737784c665
> I believe that the speed improvement is significant, but who writes a
> query like
>
>    ... WHERE col IN (VALUES (1), (2), (3))
>
> when they could write the much shorter
>
>    ... WHERE col IN (1, 2, 3)
>
> which is already converted to "= ANY"?
>
> I wonder if it is worth the extra planning time to detect and improve
> such queries.
>
> Yours,
> Laurenz Albe

-- 
Best wishes,
Ivan Kush
Tantor Labs LLC




Re: Replace IN VALUES with ANY in WHERE clauses during optimization

From
Alena Rybakina
Date:
On 04.10.2024 11:43, Ivan Kush wrote:
> Do you mean, that I should try to execute such command?
>
> In this patch it gives ANY
>
> postgres=# EXPLAIN (ANALYZE, COSTS OFF, SUMMARY OFF, TIMING OFF)
> select * from table1
>    where fld2 in 
> (VALUES('\\230\\211\\030f\\332\\261R\\333\\021\\356\\337z5\\336\\032\\372'::bytea), 
> ('\\235\\204 \\004\\017\\353\\301\\200\\021\\355a&d}\\245\\312'::byte
> a));
>
                                                                                                                             QUERY

> PLAN
>
>
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------

>
> -------------------------------------------------------------------------------------------- 
>
> Bitmap Heap Scan on table1 (actual rows=0 loops=1)
>   Recheck Cond: (fld2 = ANY 
>
('{"\\x5c3233305c3231315c303330665c3333325c323631525c3333335c3032315c3335365c3333377a355c3333365c3033325c333732","\\x5c3233355c323034205c30303
> 45c3031375c3335335c3330315c3230305c3032315c3335356126647d5c3234355c333132"}'::bytea[])) 
>
>   ->  Bitmap Index Scan on table1index (actual rows=0 loops=1)
>         Index Cond: (fld2 = ANY 
>
('{"\\x5c3233305c3231315c303330665c3333325c323631525c3333335c3032315c3335365c3333377a355c3333365c3033325c333732","\\x5c3233355c323034205c3
> 030345c3031375c3335335c3330315c3230305c3032315c3335356126647d5c3234355c333132"}'::bytea[])) 
>
> (4 rows)
Yes I meant it.
>
> Do you plan to send your implementation to the hackers?
>
It was sent here [0].

[0] 
https://www.postgresql.org/message-id/21d5fca5-0c02-4afd-8c98-d0930b298a8d%40gmail.com

-- 
Regards,
Alena Rybakina
Postgres Professional