Extremely inefficient merge-join - Mailing list pgsql-performance

From Marcin Gozdalik
Subject Extremely inefficient merge-join
Msg-id CADu1mRO28n4FifmrJYQ6TfFFdWS7tvwksE=7FVmuqc7BsL2OeQ@mail.gmail.com
Responses Re: Extremely inefficient merge-join  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-performance
Hi

I am occasionally hitting an extremely inefficient merge join. The plan is nominally a merge join, but it effectively degenerates into a nested loop, even though a suitable index is present.

PostgreSQL 9.6.17 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-39), 64-bit

Schema of dir_current (some columns left out for brevity):

starfish=# \d sf.dir_current
                                         Table "sf.dir_current"
       Column       |    Type     | Collation | Nullable |                    Default
--------------------+-------------+-----------+----------+-----------------------------------------------
 id                 | bigint      |           | not null | nextval('sf.object_id_seq'::regclass)
 volume_id          | bigint      |           | not null |
 parent_id          | bigint      |           |          |
 blocks             | sf.blkcnt_t |           |          |
 rec_aggrs          | jsonb       |           | not null |
Indexes:
    "dir_current_pk" PRIMARY KEY, btree (id), tablespace "sf_current"
    "dir_current_parentid_idx" btree (parent_id), tablespace "sf_current"
    "dir_current_volume_id_out_of_sync_time_idx" btree (volume_id, out_of_sync_time) WHERE out_of_sync_time IS NOT NULL, tablespace "sf_current"
    "dir_current_volume_id_path_unq_idx" UNIQUE, btree (volume_id, path text_pattern_ops), tablespace "sf_current"
    "dir_current_volumeid_id_unq" UNIQUE CONSTRAINT, btree (volume_id, id), tablespace "sf_current"
Foreign-key constraints:
    "dir_current_parentid_fk" FOREIGN KEY (parent_id) REFERENCES sf.dir_current(id) DEFERRABLE INITIALLY DEFERRED

dir_process is created as a temporary table:

CREATE TEMP TABLE dir_process AS (
  SELECT sf.dir_current.id, volume_id, parent_id, depth, size, blocks, atime, ctime, mtime, sync_time, local_aggrs FROM sf.dir_current
                WHERE ....
        );

CREATE INDEX dir_process_indx ON dir_process(volume_id, id);
ANALYZE dir_process;

and usually contains a few thousand rows.

Slow query:
  SELECT dir.id, dir.volume_id, dir.parent_id, dir.rec_aggrs, dir.blocks FROM sf.dir_current AS dir
  INNER JOIN dir_process ON dir.parent_id = dir_process.id AND dir.volume_id = dir_process.volume_id
  WHERE dir.volume_id = ANY(volume_ids)

dir_current contains around 750M rows in total, and there are ca. 1.75M rows with volume_id = 5.
Sometimes Postgres chooses a very inefficient plan that loops many times over the same rows, producing hundreds of millions or even billions of intermediate rows:

LOG:  duration: 1125530.496 ms  plan:
        Merge Join  (cost=909.42..40484.01 rows=1 width=456) (actual rows=1 loops=1)
          Merge Cond: (dir.volume_id = dir_process.volume_id)
          Join Filter: (dir.parent_id = dir_process.id)
          Rows Removed by Join Filter: 13583132483
          ->  Index Scan using dir_current_volumeid_id_unq on dir_current dir  (cost=0.12..884052.46 rows=601329 width=456) (actual rows=2000756 loops=1)
                Index Cond: (volume_id = ANY ('{5}'::bigint[]))
          ->  Sort  (cost=909.31..912.70 rows=6789 width=16) (actual rows=13581131729 loops=1)
                Sort Key: dir_process.volume_id
                Sort Method: quicksort  Memory: 511kB
                ->  Seq Scan on dir_process  (cost=0.00..822.89 rows=6789 width=16) (actual rows=6789 loops=1)

LOG:  duration: 3923310.224 ms  plan:
        Merge Join  (cost=0.17..4324.64 rows=1 width=456) (actual rows=529 loops=1)
          Merge Cond: (dir_process.volume_id = dir.volume_id)
          Join Filter: (dir.parent_id = dir_process.id)
          Rows Removed by Join Filter: 831113021
          ->  Index Only Scan using dir_process_indx on dir_process  (cost=0.05..245.00 rows=450 width=16) (actual rows=450 loops=1)
                Heap Fetches: 450
          ->  Index Scan using dir_current_volumeid_id_unq on dir_current dir  (cost=0.12..884052.46 rows=601329 width=456) (actual rows=831113101 loops=1)
                Index Cond: (volume_id = ANY ('{5}'::bigint[]))

LOG:  duration: 10140968.829 ms  plan:
        Merge Join  (cost=0.17..8389.13 rows=1 width=456) (actual rows=819 loops=1)
          Merge Cond: (dir_process.volume_id = dir.volume_id)
          Join Filter: (dir.parent_id = dir_process.id)
          Rows Removed by Join Filter: 2153506735
          ->  Index Only Scan using dir_process_indx on dir_process  (cost=0.06..659.76 rows=1166 width=16) (actual rows=1166 loops=1)
                Heap Fetches: 1166
          ->  Index Scan using dir_current_volumeid_id_unq on dir_current dir  (cost=0.12..885276.20 rows=602172 width=456) (actual rows=2153506389 loops=1)
                Index Cond: (volume_id = ANY ('{5}'::bigint[]))

Note that 2153506389 / 1166 ≈ 1 846 918, and similarly 831113101 / 450 ≈ 1 846 918 — i.e., the full set of ~1.85M dir_current rows for the volume is rescanned once per dir_process row.
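A quick sanity check of that arithmetic (plain division, nothing Postgres-specific) confirms the inner side is being rescanned in full once per outer row:

```python
# Rows reported by the two slower plans: total rows produced by the
# rescanned dir_current side, and the row count of the dir_process side.
plans = [
    (831_113_101, 450),     # second plan
    (2_153_506_389, 1_166), # third plan
]

for total_inner_rows, outer_rows in plans:
    per_rescan = total_inner_rows // outer_rows
    # Each rescan reads the ~1.85M dir_current rows with volume_id = 5.
    print(per_rescan)  # 1846918 in both cases
```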

How can I help the Postgres query planner choose a faster plan?
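One workaround I can think of (a sketch only — `enable_mergejoin` is a standard planner GUC, and `SET LOCAL` confines the change to the current transaction) is to disable merge joins for just this query, nudging the planner toward a hash or nested-loop join instead:

```sql
BEGIN;
-- Discourage the degenerate merge join for this query only;
-- the setting reverts automatically at COMMIT/ROLLBACK.
SET LOCAL enable_mergejoin = off;

SELECT dir.id, dir.volume_id, dir.parent_id, dir.rec_aggrs, dir.blocks
FROM sf.dir_current AS dir
INNER JOIN dir_process
        ON dir.parent_id = dir_process.id
       AND dir.volume_id = dir_process.volume_id
WHERE dir.volume_id = ANY(volume_ids);
COMMIT;
```

But I would prefer a fix that lets the planner get this right on its own.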

--
Marcin Gozdalik
