BUG #17544: Join order for INNER JOIN ... USING with GROUP BY on join column affects selected query plan - Mailing list pgsql-bugs

From PG Bug reporting form
Subject BUG #17544: Join order for INNER JOIN ... USING with GROUP BY on join column affects selected query plan
Date
Msg-id 17544-ebd06b00b8836a04@postgresql.org
Whole thread Raw
Responses Re: BUG #17544: Join order for INNER JOIN ... USING with GROUP BY on join column affects selected query plan  (Andrey Lepikhov <a.lepikhov@postgrespro.ru>)
Re: BUG #17544: Join order for INNER JOIN ... USING with GROUP BY on join column affects selected query plan  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-bugs
The following bug has been logged on the website:

Bug reference:      17544
Logged by:          Eric Sheng
Email address:      _@ericsheng.com
PostgreSQL version: 14.4
Operating system:   Ubuntu 20.04
Description:

It seems like the join order of a simple INNER JOIN ... USING(...) affects
query plan selection even when only joining two tables, when used with GROUP
BY on the join column.

Schema: groups with priority, containing zero or more tasks with a boolean

    CREATE TABLE groups(group_id INT PRIMARY KEY NOT NULL, priority INT);
    CREATE TABLE tasks(task_id INT PRIMARY KEY NOT NULL, group_id INT NOT
NULL REFERENCES groups(group_id), finished BOOLEAN NOT NULL);
    CREATE INDEX groups_priority_index ON groups(priority, group_id);
    CREATE INDEX tasks_group_id_index ON tasks(group_id);

Data: 1000 groups (random priority) with 1000 tasks per group

    DO $$
      BEGIN
        FOR i IN 0 .. 999 LOOP
          INSERT INTO groups(group_id, priority) VALUES(i, FLOOR(RANDOM() *
2000000000)::INT);
          FOR j in 0 .. 999 LOOP
            INSERT INTO tasks(task_id, group_id, finished) VALUES (1000 * i
+ j, i, RANDOM() * 100 < 99);
          END LOOP;
        END LOOP;
      END;
    $$;

Queries: fetching the lowest priority groups and whether or not all tasks in
them have finished (same query twice, just different join order)

    EXPLAIN (ANALYZE, TIMING) SELECT group_id, BOOL_AND(finished) FROM
groups INNER JOIN tasks USING(group_id) GROUP BY group_id, priority ORDER BY
priority ASC LIMIT 10;


QUERY PLAN


-----------------------------------------------------------------------------------------------------------------------------------------------------------
     Limit  (cost=15771.99..15772.02 rows=10 width=9) (actual
time=293.054..302.919 rows=10 loops=1)
       ->  Sort  (cost=15771.99..15777.64 rows=2260 width=9) (actual
time=293.051..302.914 rows=10 loops=1)
             Sort Key: groups.priority
             Sort Method: top-N heapsort  Memory: 25kB
             ->  Finalize HashAggregate  (cost=15700.56..15723.16 rows=2260
width=9) (actual time=292.744..302.777 rows=1000 loops=1)
                   Group Key: groups.group_id
                   Batches: 1  Memory Usage: 241kB
                   ->  Gather  (cost=15203.36..15677.96 rows=4520 width=9)
(actual time=291.308..301.725 rows=2836 loops=1)
                         Workers Planned: 2
                         Workers Launched: 2
                         ->  Partial HashAggregate  (cost=14203.36..14225.96
rows=2260 width=9) (actual time=263.344..263.560 rows=945 loops=3)
                               Group Key: groups.group_id
                               Batches: 1  Memory Usage: 241kB
                               Worker 0:  Batches: 1  Memory Usage: 241kB
                               Worker 1:  Batches: 1  Memory Usage: 241kB
                               ->  Hash Join  (cost=60.85..11725.61
rows=495550 width=9) (actual time=0.719..182.977 rows=333333 loops=3)
                                     Hash Cond: (tasks.group_id =
groups.group_id)
                                     ->  Parallel Seq Scan on tasks
(cost=0.00..10361.50 rows=495550 width=5) (actual time=0.044..61.366
rows=333333 loops=3)
                                     ->  Hash  (cost=32.60..32.60 rows=2260
width=8) (actual time=0.627..0.627 rows=1000 loops=3)
                                           Buckets: 4096  Batches: 1  Memory
Usage: 72kB
                                           ->  Seq Scan on groups
(cost=0.00..32.60 rows=2260 width=8) (actual time=0.042..0.298 rows=1000
loops=3)
     Planning Time: 0.310 ms
     Execution Time: 303.294 ms

    EXPLAIN (ANALYZE, TIMING) SELECT group_id, BOOL_AND(finished) FROM tasks
INNER JOIN groups USING(group_id) GROUP BY group_id, priority ORDER BY
priority ASC LIMIT 10;


QUERY PLAN


----------------------------------------------------------------------------------------------------------------------------------------------------------
     Limit  (cost=0.70..1.35 rows=10 width=9) (actual time=1.146..10.606
rows=10 loops=1)
       ->  GroupAggregate  (cost=0.70..64932.27 rows=1000000 width=9)
(actual time=1.144..10.599 rows=10 loops=1)
             Group Key: groups.priority, tasks.group_id
             ->  Nested Loop  (cost=0.70..47432.27 rows=1000000 width=9)
(actual time=0.055..8.028 rows=10001 loops=1)
                   ->  Index Only Scan using groups_priority_index on groups
 (cost=0.28..43.27 rows=1000 width=8) (actual time=0.028..0.036 rows=11
loops=1)
                         Heap Fetches: 0
                   ->  Index Scan using tasks_group_id_index on tasks
(cost=0.42..37.39 rows=1000 width=5) (actual time=0.017..0.433 rows=909
loops=11)
                         Index Cond: (group_id = groups.group_id)
     Planning Time: 0.878 ms
     Execution Time: 10.672 ms

All join orders for inner joins give semantically equivalent results, and
documentation at https://www.postgresql.org/docs/current/explicit-joins.html
indicates that the planner should explore all join orders unless there are
too many tables, so I would have expected the join order here to be
immaterial to the query plan chosen.

Digging a bit deeper, this appears to be due to the USING clause causing the
GROUP BY group_id to be rewritten to `groups.group_id` in the first query
and `tasks.group_id` in the second query, and the former resulting in the
simpler plan not being used. The same difference can be observed running:

    EXPLAIN (ANALYZE, TIMING) SELECT groups.group_id, BOOL_AND(finished)
FROM tasks INNER JOIN groups ON tasks.group_id = groups.group_id GROUP BY
groups.group_id, priority ORDER BY priority ASC LIMIT 10;
    (the slow plan)

    EXPLAIN (ANALYZE, TIMING) SELECT tasks.group_id, BOOL_AND(finished) FROM
tasks INNER JOIN groups ON tasks.group_id = groups.group_id GROUP BY
tasks.group_id, priority ORDER BY priority ASC LIMIT 10;
    (the fast plan)


pgsql-bugs by date:

Previous
From: PG Bug reporting form
Date:
Subject: BUG #17543: CSVLOG malformed from disk space error
Next
From: Jiří Ledvinka
Date:
Subject: RE: server closed the connection unexpectedly - bug report