Thread: pgsql: Convert 'x IN (VALUES ...)' to 'x = ANY ...' then appropriate

pgsql: Convert 'x IN (VALUES ...)' to 'x = ANY ...' then appropriate

From
Alexander Korotkov
Date:
Convert 'x IN (VALUES ...)' to 'x = ANY ...' then appropriate

This commit implements the automatic conversion of 'x IN (VALUES ...)' into
ScalarArrayOpExpr.  That simplifies the query tree, eliminating the appearance
of an unnecessary join.

Since VALUES describes a relational table, and the value of such a list is
a table row, the optimizer will likely face an underestimation problem due to
the inability to estimate cardinality through MCV statistics.  The cardinality
evaluation mechanism can work with the array inclusion check operation.
If the array is small enough (< 100 elements), it will perform a statistical
evaluation element by element.

We perform the transformation in the convert_ANY_sublink_to_join() if VALUES
RTE is proper and the transformation is convertible.  The conversion is only
possible for operations on scalar values, not rows.  Also, we currently
support the transformation only when it ends up with a constant array.
Otherwise, the evaluation of non-hashed SAOP might be slower than the
corresponding Hash Join with VALUES.

Discussion: https://postgr.es/m/0184212d-1248-4f1f-a42d-f5cb1c1976d2%40tantorlabs.com
Author: Alena Rybakina <a.rybakina@postgrespro.ru>
Author: Andrei Lepikhov <lepihov@gmail.com>
Reviewed-by: Ivan Kush <ivan.kush@tantorlabs.com>
Reviewed-by: Alexander Korotkov <aekorotkov@gmail.com>

Branch
------
master

Details
-------
https://git.postgresql.org/pg/commitdiff/c0962a113d1f2f94cb7222a7ca025a67e9ce3860

Modified Files
--------------
src/backend/optimizer/plan/subselect.c    |  80 ++++++++
src/backend/optimizer/prep/prepjointree.c |  12 ++
src/backend/optimizer/util/clauses.c      |  14 +-
src/include/optimizer/subselect.h         |   3 +
src/test/regress/expected/subselect.out   | 308 ++++++++++++++++++++++++++++++
src/test/regress/sql/subselect.sql        | 100 ++++++++++
6 files changed, 512 insertions(+), 5 deletions(-)


Re: pgsql: Convert 'x IN (VALUES ...)' to 'x = ANY ...' then appropriate

From
Melanie Plageman
Date:
On Fri, Apr 4, 2025 at 9:17 AM Alexander Korotkov
<akorotkov@postgresql.org> wrote:
>
> Convert 'x IN (VALUES ...)' to 'x = ANY ...' then appropriate
>
> This commit implements the automatic conversion of 'x IN (VALUES ...)' into
> ScalarArrayOpExpr.  That simplifies the query tree, eliminating the appearance
> of an unnecessary join.

I haven't looked at this patch, but it seems likely that it is related
to the recent failures I noticed in CI in the subselect test when the
regress suite is run by 002_pg_upgrade.pl

diff --strip-trailing-cr -U3
C:/cirrus/src/test/regress/expected/subselect.out
C:/cirrus/build/testrun/pg_upgrade/002_pg_upgrade/data/results/subselect.out
--- C:/cirrus/src/test/regress/expected/subselect.out 2025-04-04
14:44:17.637206600 +0000
+++ C:/cirrus/build/testrun/pg_upgrade/002_pg_upgrade/data/results/subselect.out
2025-04-04 14:47:20.358393500 +0000
@@ -2769,15 +2769,16 @@
 EXPLAIN (COSTS OFF)
 SELECT c.oid,c.relname FROM pg_class c JOIN pg_am a USING (oid)
 WHERE c.oid IN (VALUES (1), (2));
-                          QUERY PLAN
----------------------------------------------------------------
- Hash Join
-   Hash Cond: (a.oid = c.oid)
-   ->  Seq Scan on pg_am a
-   ->  Hash
-         ->  Index Scan using pg_class_oid_index on pg_class c
-               Index Cond: (oid = ANY ('{1,2}'::oid[]))
-(6 rows)
+                       QUERY PLAN
+---------------------------------------------------------
+ Merge Join
+   Merge Cond: (c.oid = a.oid)
+   ->  Index Scan using pg_class_oid_index on pg_class c
+         Index Cond: (oid = ANY ('{1,2}'::oid[]))
+   ->  Sort
+         Sort Key: a.oid
+         ->  Seq Scan on pg_am a
+(7 rows)



Re: pgsql: Convert 'x IN (VALUES ...)' to 'x = ANY ...' then appropriate

From
Melanie Plageman
Date:
On Fri, Apr 4, 2025 at 9:17 AM Alexander Korotkov
<akorotkov@postgresql.org> wrote:
>
> Convert 'x IN (VALUES ...)' to 'x = ANY ...' then appropriate
>
> This commit implements the automatic conversion of 'x IN (VALUES ...)' into
> ScalarArrayOpExpr.  That simplifies the query tree, eliminating the appearance
> of an unnecessary join.

I haven't looked at this patch, but it seems likely that it is related
to the recent failures I noticed in CI in the subselect test when the
regress suite is run by 002_pg_upgrade.pl

diff --strip-trailing-cr -U3
C:/cirrus/src/test/regress/expected/subselect.out
C:/cirrus/build/testrun/pg_upgrade/002_pg_upgrade/data/results/subselect.out
--- C:/cirrus/src/test/regress/expected/subselect.out 2025-04-04
14:44:17.637206600 +0000
+++ C:/cirrus/build/testrun/pg_upgrade/002_pg_upgrade/data/results/subselect.out
2025-04-04 14:47:20.358393500 +0000
@@ -2769,15 +2769,16 @@
 EXPLAIN (COSTS OFF)
 SELECT c.oid,c.relname FROM pg_class c JOIN pg_am a USING (oid)
 WHERE c.oid IN (VALUES (1), (2));
-                          QUERY PLAN
----------------------------------------------------------------
- Hash Join
-   Hash Cond: (a.oid = c.oid)
-   ->  Seq Scan on pg_am a
-   ->  Hash
-         ->  Index Scan using pg_class_oid_index on pg_class c
-               Index Cond: (oid = ANY ('{1,2}'::oid[]))
-(6 rows)
+                       QUERY PLAN
+---------------------------------------------------------
+ Merge Join
+   Merge Cond: (c.oid = a.oid)
+   ->  Index Scan using pg_class_oid_index on pg_class c
+         Index Cond: (oid = ANY ('{1,2}'::oid[]))
+   ->  Sort
+         Sort Key: a.oid
+         ->  Seq Scan on pg_am a
+(7 rows)



Re: pgsql: Convert 'x IN (VALUES ...)' to 'x = ANY ...' then appropriate

From
Alexander Korotkov
Date:
On Fri, Apr 4, 2025 at 6:47 PM Melanie Plageman
<melanieplageman@gmail.com> wrote:
>
> On Fri, Apr 4, 2025 at 9:17 AM Alexander Korotkov
> <akorotkov@postgresql.org> wrote:
> >
> > Convert 'x IN (VALUES ...)' to 'x = ANY ...' then appropriate
> >
> > This commit implements the automatic conversion of 'x IN (VALUES ...)' into
> > ScalarArrayOpExpr.  That simplifies the query tree, eliminating the appearance
> > of an unnecessary join.
>
> I haven't looked at this patch, but it seems likely that it is related
> to the recent failures I noticed in CI in the subselect test when the
> regress suite is run by 002_pg_upgrade.pl


Thank you for pointing.  I'm looking right now.

------
Regards,
Alexander Korotkov
Supabase



Re: pgsql: Convert 'x IN (VALUES ...)' to 'x = ANY ...' then appropriate

From
Alexander Korotkov
Date:
On Fri, Apr 4, 2025 at 6:47 PM Melanie Plageman
<melanieplageman@gmail.com> wrote:
>
> On Fri, Apr 4, 2025 at 9:17 AM Alexander Korotkov
> <akorotkov@postgresql.org> wrote:
> >
> > Convert 'x IN (VALUES ...)' to 'x = ANY ...' then appropriate
> >
> > This commit implements the automatic conversion of 'x IN (VALUES ...)' into
> > ScalarArrayOpExpr.  That simplifies the query tree, eliminating the appearance
> > of an unnecessary join.
>
> I haven't looked at this patch, but it seems likely that it is related
> to the recent failures I noticed in CI in the subselect test when the
> regress suite is run by 002_pg_upgrade.pl


Thank you for pointing.  I'm looking right now.

------
Regards,
Alexander Korotkov
Supabase



On Mon, 7 Apr 2025 at 19:39, Melanie Plageman <melanieplageman@gmail.com> wrote:
> +++ C:/cirrus/build/testrun/pg_upgrade/002_pg_upgrade/data/results/subselect.out
> 2025-04-04 14:47:20.358393500 +0000
> @@ -2769,15 +2769,16 @@
>  EXPLAIN (COSTS OFF)
>  SELECT c.oid,c.relname FROM pg_class c JOIN pg_am a USING (oid)
>  WHERE c.oid IN (VALUES (1), (2));
> -                          QUERY PLAN
> ----------------------------------------------------------------
> - Hash Join
> -   Hash Cond: (a.oid = c.oid)
> -   ->  Seq Scan on pg_am a
> -   ->  Hash
> -         ->  Index Scan using pg_class_oid_index on pg_class c
> -               Index Cond: (oid = ANY ('{1,2}'::oid[]))
> -(6 rows)
> +                       QUERY PLAN
> +---------------------------------------------------------
> + Merge Join
> +   Merge Cond: (c.oid = a.oid)
> +   ->  Index Scan using pg_class_oid_index on pg_class c
> +         Index Cond: (oid = ANY ('{1,2}'::oid[]))
> +   ->  Sort
> +         Sort Key: a.oid
> +         ->  Seq Scan on pg_am a
> +(7 rows)

Are these failures from patches applied to master prior to 3ba2cdaa?

David



On Mon, 7 Apr 2025 at 19:39, Melanie Plageman <melanieplageman@gmail.com> wrote:
> +++ C:/cirrus/build/testrun/pg_upgrade/002_pg_upgrade/data/results/subselect.out
> 2025-04-04 14:47:20.358393500 +0000
> @@ -2769,15 +2769,16 @@
>  EXPLAIN (COSTS OFF)
>  SELECT c.oid,c.relname FROM pg_class c JOIN pg_am a USING (oid)
>  WHERE c.oid IN (VALUES (1), (2));
> -                          QUERY PLAN
> ----------------------------------------------------------------
> - Hash Join
> -   Hash Cond: (a.oid = c.oid)
> -   ->  Seq Scan on pg_am a
> -   ->  Hash
> -         ->  Index Scan using pg_class_oid_index on pg_class c
> -               Index Cond: (oid = ANY ('{1,2}'::oid[]))
> -(6 rows)
> +                       QUERY PLAN
> +---------------------------------------------------------
> + Merge Join
> +   Merge Cond: (c.oid = a.oid)
> +   ->  Index Scan using pg_class_oid_index on pg_class c
> +         Index Cond: (oid = ANY ('{1,2}'::oid[]))
> +   ->  Sort
> +         Sort Key: a.oid
> +         ->  Seq Scan on pg_am a
> +(7 rows)

Are these failures from patches applied to master prior to 3ba2cdaa?

David



Re: pgsql: Convert 'x IN (VALUES ...)' to 'x = ANY ...' then appropriate

From
Alexander Korotkov
Date:
Hi, David!

On Mon, Apr 7, 2025 at 12:15 PM David Rowley <dgrowleyml@gmail.com> wrote:
>
> On Mon, 7 Apr 2025 at 19:39, Melanie Plageman <melanieplageman@gmail.com> wrote:
> > +++ C:/cirrus/build/testrun/pg_upgrade/002_pg_upgrade/data/results/subselect.out
> > 2025-04-04 14:47:20.358393500 +0000
> > @@ -2769,15 +2769,16 @@
> >  EXPLAIN (COSTS OFF)
> >  SELECT c.oid,c.relname FROM pg_class c JOIN pg_am a USING (oid)
> >  WHERE c.oid IN (VALUES (1), (2));
> > -                          QUERY PLAN
> > ----------------------------------------------------------------
> > - Hash Join
> > -   Hash Cond: (a.oid = c.oid)
> > -   ->  Seq Scan on pg_am a
> > -   ->  Hash
> > -         ->  Index Scan using pg_class_oid_index on pg_class c
> > -               Index Cond: (oid = ANY ('{1,2}'::oid[]))
> > -(6 rows)
> > +                       QUERY PLAN
> > +---------------------------------------------------------
> > + Merge Join
> > +   Merge Cond: (c.oid = a.oid)
> > +   ->  Index Scan using pg_class_oid_index on pg_class c
> > +         Index Cond: (oid = ANY ('{1,2}'::oid[]))
> > +   ->  Sort
> > +         Sort Key: a.oid
> > +         ->  Seq Scan on pg_am a
> > +(7 rows)
>
> Are these failures from patches applied to master prior to 3ba2cdaa?

Yes, these failures appears before 3ba2cdaa.  Tom committed 3ba2cdaa
to fix the problem before I get into it [1].

Links.
1. https://www.postgresql.org/message-id/srnuqlttuimzmvoulhsrbgvj4vnul6b65osswvua7sfkqsvmuy%40yg7apybpxp34

------
Regards,
Alexander Korotkov
Supabase



Re: pgsql: Convert 'x IN (VALUES ...)' to 'x = ANY ...' then appropriate

From
Alexander Korotkov
Date:
Hi, David!

On Mon, Apr 7, 2025 at 12:15 PM David Rowley <dgrowleyml@gmail.com> wrote:
>
> On Mon, 7 Apr 2025 at 19:39, Melanie Plageman <melanieplageman@gmail.com> wrote:
> > +++ C:/cirrus/build/testrun/pg_upgrade/002_pg_upgrade/data/results/subselect.out
> > 2025-04-04 14:47:20.358393500 +0000
> > @@ -2769,15 +2769,16 @@
> >  EXPLAIN (COSTS OFF)
> >  SELECT c.oid,c.relname FROM pg_class c JOIN pg_am a USING (oid)
> >  WHERE c.oid IN (VALUES (1), (2));
> > -                          QUERY PLAN
> > ----------------------------------------------------------------
> > - Hash Join
> > -   Hash Cond: (a.oid = c.oid)
> > -   ->  Seq Scan on pg_am a
> > -   ->  Hash
> > -         ->  Index Scan using pg_class_oid_index on pg_class c
> > -               Index Cond: (oid = ANY ('{1,2}'::oid[]))
> > -(6 rows)
> > +                       QUERY PLAN
> > +---------------------------------------------------------
> > + Merge Join
> > +   Merge Cond: (c.oid = a.oid)
> > +   ->  Index Scan using pg_class_oid_index on pg_class c
> > +         Index Cond: (oid = ANY ('{1,2}'::oid[]))
> > +   ->  Sort
> > +         Sort Key: a.oid
> > +         ->  Seq Scan on pg_am a
> > +(7 rows)
>
> Are these failures from patches applied to master prior to 3ba2cdaa?

Yes, these failures appears before 3ba2cdaa.  Tom committed 3ba2cdaa
to fix the problem before I get into it [1].

Links.
1. https://www.postgresql.org/message-id/srnuqlttuimzmvoulhsrbgvj4vnul6b65osswvua7sfkqsvmuy%40yg7apybpxp34

------
Regards,
Alexander Korotkov
Supabase



Re: pgsql: Convert 'x IN (VALUES ...)' to 'x = ANY ...' then appropriate

From
Melanie Plageman
Date:


On Mon, Apr 7, 2025 at 5:15 AM David Rowley <dgrowleyml@gmail.com> wrote:
Are these failures from patches applied to master prior to 3ba2cdaa?

Yea, my email was held in moderation for days. I guess cross-posting is flagged. I thought I saw people regularly cc pgsql-hackers when replying to pgsql-committers, but I guess not.

Re: pgsql: Convert 'x IN (VALUES ...)' to 'x = ANY ...' then appropriate

From
Melanie Plageman
Date:


On Mon, Apr 7, 2025 at 5:15 AM David Rowley <dgrowleyml@gmail.com> wrote:
Are these failures from patches applied to master prior to 3ba2cdaa?

Yea, my email was held in moderation for days. I guess cross-posting is flagged. I thought I saw people regularly cc pgsql-hackers when replying to pgsql-committers, but I guess not.