Partitionwise JOIN scanning all partitions, even unneeded ones - Mailing list pgsql-general
From | Dimitrios Apostolou |
---|---|
Subject | Partitionwise JOIN scanning all partitions, even unneeded ones |
Date | |
Msg-id | 704abd8e-5b3e-f946-fb7c-58d39eb1268e@gmx.net Whole thread Raw |
Responses |
Re: Partitionwise JOIN scanning all partitions, even unneeded ones
|
List | pgsql-general |
Hello list, I believe I'm facing a performance bug in PostgreSQL with partitionwise JOINs. I have reduced the issue to the minimum queries I could, please read-on for details and see attached files for EXPLAIN output with timings and database settings. I'd appreciate help on whether it's a real issue, and if it's unknown I can forward this to the psql-bugs mailing list. I'd also appreciate any critique on the clarity of my description and on my schema and queries, since I'm new to postgres. ==== Short description + Each textual "task_id" is described by a unique number "task_n". + Each task can have many workitems, and this relation is stored in "tasks_mm_workitems". A workitem has a unique number "workitem_n". + I have a massive (10G rows) table called test_runs_raw. It contains a list of test runs for each workitem_n. + TABLE test_runs_raw is PARTITIONED into 1000 partitions (many of them empty, reserved for future) according to RANGE(workitem_n). SUMMARY: I want to do stuff with all test_runs of a task. Basically I want JOIN tables in order to go from one "task_id" to one "task_n" to many "workitem_n" to manymany test runs. PROBLEM: If I JOIN the 3 tables to do that, postgres scans ALL partitions of the massive table test_runs_raw, and this takes hours. On the other hand, if I omit the 1st part of the JOIN and provide the "task_n" manually, and JOIN only the 2 latter tables, postgres only scans the relevant partitions and my queries return fast. VERSION: PostgreSQL 15.4 (Ubuntu 15.4-1.pgdg22.04+1) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0,64-bit ==== Queries Here are some seemingly meaningless queries, that are just minimal reproducers of the issue (you can ignore "DISTINCT" part, it's just there to limit the output I was getting). ===== Slow query: EXPLAIN (ANALYZE, VERBOSE,BUFFERS,SETTINGS) SELECT DISTINCT workitem_n FROM task_ids JOIN tasks_mm_workitems USING(task_n) JOIN test_runs_raw USING(workitem_n) WHERE task_id = '1698813977'; The EXPLAIN output here shows a parallel hash join doing seq scans on each and every partition. Basically the whole 10G rows table is being seq-scanned. ===== Fast query: Here I resolve the "task_n" from the "task_id" manually, and omit the first part of the JOIN. SELECT task_n FROM task_ids WHERE task_id = '1698813977'; task_n -------- 59854 (1 row) EXPLAIN (ANALYZE, VERBOSE,BUFFERS,SETTINGS) SELECT DISTINCT workitem_n FROM tasks_mm_workitems JOIN test_runs_raw USING(workitem_n) WHERE task_n = 59854; The EXPLAIN output here shows "(never executed)" on most partition seq scans. The relevant partitions are index-scanned. Basically only a few partitions are accessed, and only partially through their index on "workitem_n". This is excellent and what I would expect in all cases. ==== Schema description > \d+ task_ids Column | Type | Collation | Nullable | Default ---------+---------+-----------+----------+------------------------------------------ task_n | integer | | not null | nextval('task_ids_task_n_seq'::regclass) task_id | text | | not null | Indexes: "task_ids_pkey" PRIMARY KEY, btree (task_n) "task_ids_task_id_key" UNIQUE CONSTRAINT, btree (task_id) > \d tasks_mm_workitems Table "public.tasks_mm_workitems" Column | Type | Collation | Nullable | Default ------------+---------+-----------+----------+--------- task_n | integer | | not null | workitem_n | integer | | not null | Indexes: "tasks_mm_workitems_pkey" PRIMARY KEY, btree (task_n, workitem_n) Foreign-key constraints: "fk1_tasks_mm_workitems" FOREIGN KEY (task_n) REFERENCES task_ids(task_n) "fk2_tasks_mm_workitems" FOREIGN KEY (workitem_n) REFERENCES workitem_ids(workitem_n) > \d test_runs_raw Partitioned table "public.test_runs_raw" Column | Type | Collation | Nullable | Default -------------------+-----------------------------+-----------+----------+---------------------------------- run_n | bigint | | not null | generated by default as identity [...] workitem_n | integer | | not null | [...] Partition key: RANGE (workitem_n) Indexes: "test_runs_raw_partitioned_pkey" PRIMARY KEY, btree (workitem_n, run_n), tablespace "archival_tablespace_1" Foreign-key constraints: [...] "test_runs_raw_partitioned_workitem_n_fkey" FOREIGN KEY (workitem_n) REFERENCES workitem_ids(workitem_n) Number of partitions: 1000 (Use \d+ to list them.) Tablespace: "tablespace1" Thank you in advance, Dimitris
Attachment
pgsql-general by date: