Re: BUG #17964: Missed query planner optimization - Mailing list pgsql-bugs

From Mathias Kunter
Subject Re: BUG #17964: Missed query planner optimization
Date
Msg-id 94608857-707f-55ce-99a0-ac500aeb06f8@gmail.com
Whole thread Raw
In response to Re: BUG #17964: Missed query planner optimization  (David Rowley <dgrowleyml@gmail.com>)
Responses Re: BUG #17964: Missed query planner optimization  (David Rowley <dgrowleyml@gmail.com>)
List pgsql-bugs
> It's not a bug that the planner does not consider evaluating the join
> before the UNION

Yes, it's not a bug, but it's something which can be improved. If I 
simply change the original query from this:

> SELECT ... WHERE id IN (SELECT ...);

into this:

> SELECT ... WHERE id = ANY(ARRAY(SELECT ...));

then Postgres uses an index scan, and the query is orders of magnitude 
faster. Note that the planner actually correctly computes the estimated 
costs for both variants, since I get:

> cost=769.11..1227.17 when using IN
> cost=86.45..86.65    when using ANY

See https://dbfiddle.uk/iOkiiTJJ

Also note that this issue doesn't only affect UNION queries. For 
example, the following query will also execute orders of magnitude 
faster if I simply replace IN with ANY:

> SELECT * FROM t WHERE x = 'a' OR y IN (SELECT ...);

Again, estimated costs say that using ANY should be faster:

> cost=8.30..2443.31 when using IN
> cost=56.45..350.69 when using ANY

See https://dbfiddle.uk/b9piwQr4

Hence, why doesn't the planner simply test whether it's beneficial to 
replace IN with ANY? It seems that all which has to be done is to 
compare the query plans for both possible execution variants. I guess 
this should be rather simple to implement, isn't it?

Thanks

Mathias


Am 06.06.23 um 23:32 schrieb David Rowley:
> On Wed, 7 Jun 2023 at 04:44, PG Bug reporting form
> <noreply@postgresql.org> wrote:
>> In the example below, the query planner uses a sequential scan (query 1)
>> even though it could use an index scan (query 2).
>>
>> EXPLAIN ANALYZE SELECT id, name FROM (SELECT id, name FROM table1 UNION
>> SELECT id, name FROM table2) AS q
>> WHERE id IN (SELECT id FROM table3);
> 
>> EXPLAIN ANALYZE SELECT id, name FROM (SELECT id, name FROM table1 UNION
>> SELECT id, name FROM table2) AS q
>> WHERE id IN (1538,8836,5486,3464,2673);
> 
> It's not a bug that the plannSer does not consider evaluating the join
> before the UNION, it's just an optimisation opportunity we don't
> currently explore.
> 
> If you want that, then write:
> 
> EXPLAIN ANALYZE SELECT id, name FROM table1 WHERE id IN (SELECT id
> FROM table3) UNION SELECT id, name FROM table2 WHERE id IN (SELECT id
> FROM table3);
> 
> David



pgsql-bugs by date:

Previous
From: PG Bug reporting form
Date:
Subject: BUG #17967: [msvc-x86] Failed to build with error LNK2019: unresolved external symbol __BitScanReverse64 referen
Next
From: David Rowley
Date:
Subject: Re: BUG #17964: Missed query planner optimization