Re: BUG #17544: Join order for INNER JOIN ... USING with GROUP BY on join column affects selected query plan - Mailing list pgsql-bugs

From Andrey Lepikhov
Subject Re: BUG #17544: Join order for INNER JOIN ... USING with GROUP BY on join column affects selected query plan
Date
Msg-id ddd9555b-96d4-ca8d-8b04-f6fec2b90124@postgrespro.ru
Whole thread Raw
In response to BUG #17544: Join order for INNER JOIN ... USING with GROUP BY on join column affects selected query plan  (PG Bug reporting form <noreply@postgresql.org>)
List pgsql-bugs
On 7/10/22 09:52, PG Bug reporting form wrote:
> All join orders for inner joins give semantically equivalent results, and
> documentation at https://www.postgresql.org/docs/current/explicit-joins.html
> indicates that the planner should explore all join orders unless there are
> too many tables, so I would have expected the join order here to be
> immaterial to the query plan chosen.
> 
> Digging a bit deeper, this appears to be due to the USING clause causing the
> GROUP BY group_id to be rewritten to `groups.group_id` in the first query
> and `tasks.group_id` in the second query, and the former resulting in the
> simpler plan not being used. The same difference can be observed running:
> 
>      EXPLAIN (ANALYZE, TIMING) SELECT groups.group_id, BOOL_AND(finished)
> FROM tasks INNER JOIN groups ON tasks.group_id = groups.group_id GROUP BY
> groups.group_id, priority ORDER BY priority ASC LIMIT 10;
>      (the slow plan)
> 
>      EXPLAIN (ANALYZE, TIMING) SELECT tasks.group_id, BOOL_AND(finished) FROM
> tasks INNER JOIN groups ON tasks.group_id = groups.group_id GROUP BY
> tasks.group_id, priority ORDER BY priority ASC LIMIT 10;
>      (the fast plan) 
Documentation speaks the truth - optimizer checks all join permutations. 
But USING clause doesn't pull both group_id vars and implicitly chooses 
only one according to an algorithm, described in 
parse_clause.c::buildMergedJoinVar(). So, because you use group_id in 
upper GROUP BY, optimizer is limited to specific set of strategies, 
because it must use the same grouping variable, as implicitly chosen in 
the JOIN USING clause.

-- 
Regards
Andrey Lepikhov
Postgres Professional



pgsql-bugs by date:

Previous
From: David Rowley
Date:
Subject: Re: Reducing chunk header sizes on all memory context types
Next
From: Yura Sokolov
Date:
Subject: Re: can't drop table due to reference from orphaned temp function