Thread: Missed query planner optimization: `n in (select q)` -> `n in (q)`

Missed query planner optimization: `n in (select q)` -> `n in (q)`

From
"Josh"
Date:
Hey y'all!

So recently, I ran into an issue where a query I wrote wasn't using an index, presumably because what I was doing was
toohard for the query planner to figure out. I've distilled the problem into its essence to the best of my ability, and
foundthat it's because `select` seems to hinder it. 

The problem boils down to the planner not figuring out that these two queries should use an index:

```sql
-- Setup
create table numbers(n int);
insert into numbers (n) select generate_series(1, 1000000);
create index numbers_n_idx on numbers(n);

-- Non-indexed queries
explain analyze select numbers.n from (values (5000000)) quantities(q)
join numbers on numbers.n in (select q);

explain analyze select numbers.n from (values (5000000)) quantities(q)
join numbers on numbers.n = any(select q);
```

These examples may seem silly, so let me provide a "case study" query that should justify the need for such an
optimization.I had a query that was generating an array of items, and wanted to join it to a table given that some
columnof that table was present in the array. It looked like so: 

```sql
select numbers.n from quantities join numbers on numbers.n in (select unnest(quantities.q));
```

This turned out to be horrendously slow, because it was performing a sequential scan! I did however end up settling on
thefollowing form: 

```sql
select numbers.n from quantities join numbers on numbers.n = any(quantities.q);
```

This was only possible because I was dealing with arrays though, and an operation such as `in (select unnest...)` can
beeasily converted to `= any(...)`. However for the general case, I believe an optimization in this area may provide
benefitas there may exist a circumstance that does not have an alternative to a sub-query select (`= any()` was my
alternative),but I am just a database newbie. 

I've noticed this problem has been around since at least 11.7, and is still present as of the `postgres:15beta1` docker
image.I've attached a script which reproduces the issue. It uses docker, so I'm confident you'll be able to run it
withoutissue. 

Finally, I ask:

- Is this an issue that should be fixed? I'm a database newbie so I have no idea about the deep semantics of SQL and
whata select inside a `join_condition` could imply to the planner to prevent it from optimizing it. 

- If "yes" to the previous question, what would be the precise semantics of such an optimization? I loosely say `n in
(selectq)` -> `n in (q)` for all n and q, but of course I don't have enough knowledge to know that this is correct in
termsof whatever Postgres' internal query optimization IR is. 

- Can a database newbie like myself contribute an optimization pass in Postgres to fix this? I'm fascinated by the work
y'alldo, and submitting a patch to Postgres that makes it into production would make my week. 

Thank you for your time, and have a great day!

Attachment

Re: Missed query planner optimization: `n in (select q)` -> `n in (q)`

From
"David G. Johnston"
Date:
On Sun, Jun 12, 2022 at 2:47 PM Josh <postgres@sirjosh3917.com> wrote:

This was only possible because I was dealing with arrays though, and an operation such as `in (select unnest...)` can be easily converted to `= any(...)`. However for the general case,

In the general case you don't have subqueries inside join conditions.
 
I believe an optimization in this area may provide benefit as there may exist a circumstance that does not have an alternative to a sub-query select (`= any()` was my alternative)

I think we'd want a concrete example of a non-poorly written query (or at least a poorly written one that, say, is generated by a framework, not just inexperienced human SQL writers) before we'd want to even entertain spending time on something like this.
 
- Is this an issue that should be fixed?

Probably not worth the effort.

I'm fascinated by the work y'all do, and submitting a patch to Postgres that makes it into production would make my week.


Maybe you'll find almost as much good is done helping others get their patches committed.  There are many in need of reviewers.