Home > mailing lists

Re: Is the plan for IN(1,2,3) always the same as for =ANY('{1,2,3}') when using PQexec with no params? - Mailing list pgsql-hackers

From	Dmitry Koterov
Subject	Re: Is the plan for IN(1,2,3) always the same as for =ANY('{1,2,3}') when using PQexec with no params?
Date	December 7, 2022 04:45:30
Msg-id	CA+CZih6iihRYNV-xZbwUkKYV21WqJBVHVMpAzSzVX=m7-6WGrQ@mail.gmail.com Whole thread Raw
In response to	Re: Is the plan for IN(1,2,3) always the same as for =ANY('{1,2,3}') when using PQexec with no params? (Tom Lane <tgl@sss.pgh.pa.us>)
List	pgsql-hackers

Tree view

Thanks Tom!

It sounds like that for multi-value IN/ANY, they act the same way as you mentioned. But I found the difference in plans for the single-value variant.

Imagine we have a btree(a, b) index. Compare two queries for one-element use case:

1. a='aaa' AND b=ANY('{bbb}')
2. a='aaa' AND b IN('bbb')

They may produce different plans: IN() always coalesces to field='aaa' in the plan, whilst =ANY() always remains =ANY(). This causes PG to choose a "post-filtering" plan sometimes:

1. For =ANY: Index Cond: (a='aaa'); Filter: b=ANY('{bbb}')
2. For IN(): Index Cond: (a='aaa') AND (b='bbb')

Do you think that this difference is significant? Or maybe something is off in the planner, should it treat them differently by design, is it intended?

Below is an example screenshot from the production database with real data. We see that IN(20) is literally the same as =20 (green marker), whilst =any('{20}') causes PG to use post-filtering. (The cardinality of data in "type" field is low, just several unique values there in the entire table, so it probably doesn't make a big difference, whether a post-filtering is used or not, but anyways, the difference between IN() and =ANY looks a little scary. The index is "btree (cred_id, external_id, type)".)

Thanks!

On Wed, Nov 16, 2022 at 7:51 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Dmitry Koterov <dmitry.koterov@gmail.com> writes:
> PG13+. Assume we have two identical queries with no arguments (as a plain
> text, e.g. passed to PQexec - NOT to PQexecParams!):

> - one with "a=X AND b IN(...)"
> - and one with "a=X and b=ANY('{...}')

> The question: is it guaranteed that the planner will always choose
> identical plans for them (or, at least, the plan for ANY will not match an
> existing index worse than the plan with IN)?

This depends greatly on what "..." represents. But if it's a list
of constants, they're probably equivalent. transformAExprIn()
offers some caveats:

* We try to generate a ScalarArrayOpExpr from IN/NOT IN, but this is only
* possible if there is a suitable array type available. If not, we fall
* back to a boolean condition tree with multiple copies of the lefthand
* expression. Also, any IN-list items that contain Vars are handled as
* separate boolean conditions, because that gives the planner more scope
* for optimization on such clauses.

regards, tom lane

Attachment

image.png

pgsql-hackers by date:

From: Masahiko Sawada
Date: 07 December 2022, 04:39:24
Subject: Re: Perform streaming logical transactions by background workers and parallel apply

From: Amit Kapila
Date: 07 December 2022, 05:06:45
Subject: Re: Time delayed LR (WAS Re: logical replication restrictions)

Re: Is the plan for IN(1,2,3) always the same as for =ANY('{1,2,3}') when using PQexec with no params? - Mailing list pgsql-hackers

Attachment

Previous

Next