On 7/10/22 09:52, PG Bug reporting form wrote:
> All join orders for inner joins give semantically equivalent results, and
> documentation at https://www.postgresql.org/docs/current/explicit-joins.html
> indicates that the planner should explore all join orders unless there are
> too many tables, so I would have expected the join order here to be
> immaterial to the query plan chosen.
>
> Digging a bit deeper, this appears to be due to the USING clause causing the
> GROUP BY group_id to be rewritten to `groups.group_id` in the first query
> and `tasks.group_id` in the second query, and the former resulting in the
> simpler plan not being used. The same difference can be observed running:
>
> EXPLAIN (ANALYZE, TIMING) SELECT groups.group_id, BOOL_AND(finished)
> FROM tasks INNER JOIN groups ON tasks.group_id = groups.group_id GROUP BY
> groups.group_id, priority ORDER BY priority ASC LIMIT 10;
> (the slow plan)
>
> EXPLAIN (ANALYZE, TIMING) SELECT tasks.group_id, BOOL_AND(finished) FROM
> tasks INNER JOIN groups ON tasks.group_id = groups.group_id GROUP BY
> tasks.group_id, priority ORDER BY priority ASC LIMIT 10;
> (the fast plan)
Documentation speaks the truth - optimizer checks all join permutations.
But USING clause doesn't pull both group_id vars and implicitly chooses
only one according to an algorithm, described in
parse_clause.c::buildMergedJoinVar(). So, because you use group_id in
upper GROUP BY, optimizer is limited to specific set of strategies,
because it must use the same grouping variable, as implicitly chosen in
the JOIN USING clause.
--
Regards
Andrey Lepikhov
Postgres Professional