Query plan prefers hash join when nested loop is much faster - Mailing list pgsql-general
From | iulian dragos |
---|---|
Subject | Query plan prefers hash join when nested loop is much faster |
Date | |
Msg-id | CAMNsu3ne_LnDGF76ObeauE=0Ejhp800Cu+AFrs05WjUR9QVgGA@mail.gmail.com Whole thread Raw |
Responses |
Re: Query plan prefers hash join when nested loop is much faster
Re: Query plan prefers hash join when nested loop is much faster |
List | pgsql-general |
Hi,
I am trying to understand why the query planner insists on using a hash join, and how to make it choose the better option, which in this case would be a nested loop. I have two tables:
// about 200 million rows
CREATE TABLE module_result(
id bigserial PRIMARY KEY,
name_id bigint NOT NULL references result_name(id),
run_id integer NOT NULL references run (id),
logs text NOT NULL,
status result_status NOT NULL
);
id bigserial PRIMARY KEY,
name_id bigint NOT NULL references result_name(id),
run_id integer NOT NULL references run (id),
logs text NOT NULL,
status result_status NOT NULL
);
CREATE INDEX ON module_result (run_id);
// 500 million rows
CREATE TABLE test_result(
id bigserial PRIMARY KEY,
name_id bigint NOT NULL references result_name(id),
module_result_id bigint NOT NULL references module_result (id),
seconds float NOT NULL,
failure_msg text, -- Either a <failure>...</failure> or an <error message="... />
status result_status NOT NULL
);
id bigserial PRIMARY KEY,
name_id bigint NOT NULL references result_name(id),
module_result_id bigint NOT NULL references module_result (id),
seconds float NOT NULL,
failure_msg text, -- Either a <failure>...</failure> or an <error message="... />
status result_status NOT NULL
);
CREATE INDEX ON test_result (module_result_id);
I'm trying to select all test cases that belong to a given run_id, which logically involves finding all IDs in module_result that belong to a given run, and then selecting the test results for those IDs (run_id has several module_result_id, which in turn have several test_results each).
EXPLAIN ANALYZE SELECT test_result.status, count(test_result.status) as "Count" FROM test_result INNER JOIN module_result ON module_result.id = test_result.module_result_id WHERE module_resul
t.run_id=158523 GROUP BY test_result.status
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| QUERY PLAN |
|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| Finalize GroupAggregate (cost=7771702.73..7771804.08 rows=3 width=12) (actual time=32341.993..32341.994 rows=2 loops=1) |
| Group Key: test_result.status |
| -> Gather Merge (cost=7771702.73..7771804.02 rows=6 width=12) (actual time=32341.970..32343.222 rows=6 loops=1) |
| Workers Planned: 2 |
| Workers Launched: 2 |
| -> Partial GroupAggregate (cost=7770702.71..7770803.30 rows=3 width=12) (actual time=32340.278..32340.286 rows=2 loops=3) |
| Group Key: test_result.status |
| -> Sort (cost=7770702.71..7770736.23 rows=13408 width=4) (actual time=32339.698..32339.916 rows=4941 loops=3) |
| Sort Key: test_result.status |
| Sort Method: quicksort Memory: 431kB |
| Worker 0: Sort Method: quicksort Memory: 433kB |
| Worker 1: Sort Method: quicksort Memory: 409kB |
| -> Hash Join (cost=586.15..7769783.54 rows=13408 width=4) (actual time=18112.078..32339.011 rows=4941 loops=3) |
| Hash Cond: (test_result.module_result_id = module_result.id) |
| -> Parallel Seq Scan on test_result (cost=0.00..7145224.72 rows=237703872 width=12) (actual time=0.034..15957.894 rows=190207740 loops=3) |
| -> Hash (cost=438.41..438.41 rows=11819 width=8) (actual time=3.905..3.905 rows=14824 loops=3) |
| Buckets: 16384 Batches: 1 Memory Usage: 708kB |
| -> Index Scan using module_result_run_id_idx on module_result (cost=0.57..438.41 rows=11819 width=8) (actual time=0.017..2.197 rows=14824 loops=3) |
| Index Cond: (run_id = 158523) |
| Planning Time: 0.178 ms |
| Execution Time: 32343.330 ms |
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
EXPLAIN
Time: 32.572s (32 seconds), executed in: 32.551s (32 seconds)
t.run_id=158523 GROUP BY test_result.status
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| QUERY PLAN |
|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| Finalize GroupAggregate (cost=7771702.73..7771804.08 rows=3 width=12) (actual time=32341.993..32341.994 rows=2 loops=1) |
| Group Key: test_result.status |
| -> Gather Merge (cost=7771702.73..7771804.02 rows=6 width=12) (actual time=32341.970..32343.222 rows=6 loops=1) |
| Workers Planned: 2 |
| Workers Launched: 2 |
| -> Partial GroupAggregate (cost=7770702.71..7770803.30 rows=3 width=12) (actual time=32340.278..32340.286 rows=2 loops=3) |
| Group Key: test_result.status |
| -> Sort (cost=7770702.71..7770736.23 rows=13408 width=4) (actual time=32339.698..32339.916 rows=4941 loops=3) |
| Sort Key: test_result.status |
| Sort Method: quicksort Memory: 431kB |
| Worker 0: Sort Method: quicksort Memory: 433kB |
| Worker 1: Sort Method: quicksort Memory: 409kB |
| -> Hash Join (cost=586.15..7769783.54 rows=13408 width=4) (actual time=18112.078..32339.011 rows=4941 loops=3) |
| Hash Cond: (test_result.module_result_id = module_result.id) |
| -> Parallel Seq Scan on test_result (cost=0.00..7145224.72 rows=237703872 width=12) (actual time=0.034..15957.894 rows=190207740 loops=3) |
| -> Hash (cost=438.41..438.41 rows=11819 width=8) (actual time=3.905..3.905 rows=14824 loops=3) |
| Buckets: 16384 Batches: 1 Memory Usage: 708kB |
| -> Index Scan using module_result_run_id_idx on module_result (cost=0.57..438.41 rows=11819 width=8) (actual time=0.017..2.197 rows=14824 loops=3) |
| Index Cond: (run_id = 158523) |
| Planning Time: 0.178 ms |
| Execution Time: 32343.330 ms |
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
EXPLAIN
Time: 32.572s (32 seconds), executed in: 32.551s (32 seconds)
This plan takes about 30s to execute. If I turn off seqscan, I get a nested loop join that takes about 0.02s to execute:
set enable_seqscan = off
SET
Time: 0.305s
> explain analyze select test_result.status, count(test_result.status) as "Count" from test_result inner join module_result ON module_result.id = test_result.module_result_id where module_resul
t.run_id=158523 group by test_result.status
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| QUERY PLAN |
|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| Finalize GroupAggregate (cost=34297042.16..34297143.50 rows=3 width=12) (actual time=15.014..15.015 rows=2 loops=1) |
| Group Key: test_result.status |
| -> Gather Merge (cost=34297042.16..34297143.44 rows=6 width=12) (actual time=15.005..15.850 rows=6 loops=1) |
| Workers Planned: 2 |
| Workers Launched: 2 |
| -> Partial GroupAggregate (cost=34296042.13..34296142.72 rows=3 width=12) (actual time=12.937..12.940 rows=2 loops=3) |
| Group Key: test_result.status |
| -> Sort (cost=34296042.13..34296075.65 rows=13408 width=4) (actual time=12.339..12.559 rows=4941 loops=3) |
| Sort Key: test_result.status |
| Sort Method: quicksort Memory: 461kB |
| Worker 0: Sort Method: quicksort Memory: 403kB |
| Worker 1: Sort Method: quicksort Memory: 408kB |
| -> Nested Loop (cost=232.74..34295122.96 rows=13408 width=4) (actual time=0.232..11.671 rows=4941 loops=3) |
| -> Parallel Bitmap Heap Scan on module_result (cost=232.17..44321.35 rows=4925 width=8) (actual time=0.218..0.671 rows=4941 loops=3) |
| Recheck Cond: (run_id = 158523) |
| Heap Blocks: exact=50 |
| -> Bitmap Index Scan on module_result_run_id_idx (cost=0.00..229.21 rows=11819 width=0) (actual time=0.592..0.592 rows=14824 loops=1) |
| Index Cond: (run_id = 158523) |
| -> Index Scan using test_result_module_result_id_idx on test_result (cost=0.57..6911.17 rows=4331 width=12) (actual time=0.002..0.002 rows=1 loops=14824) |
| Index Cond: (module_result_id = module_result.id) |
| Planning Time: 0.214 ms |
| Execution Time: 15.932 ms |
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
EXPLAIN
Time: 0.235s
SET
Time: 0.305s
> explain analyze select test_result.status, count(test_result.status) as "Count" from test_result inner join module_result ON module_result.id = test_result.module_result_id where module_resul
t.run_id=158523 group by test_result.status
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| QUERY PLAN |
|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| Finalize GroupAggregate (cost=34297042.16..34297143.50 rows=3 width=12) (actual time=15.014..15.015 rows=2 loops=1) |
| Group Key: test_result.status |
| -> Gather Merge (cost=34297042.16..34297143.44 rows=6 width=12) (actual time=15.005..15.850 rows=6 loops=1) |
| Workers Planned: 2 |
| Workers Launched: 2 |
| -> Partial GroupAggregate (cost=34296042.13..34296142.72 rows=3 width=12) (actual time=12.937..12.940 rows=2 loops=3) |
| Group Key: test_result.status |
| -> Sort (cost=34296042.13..34296075.65 rows=13408 width=4) (actual time=12.339..12.559 rows=4941 loops=3) |
| Sort Key: test_result.status |
| Sort Method: quicksort Memory: 461kB |
| Worker 0: Sort Method: quicksort Memory: 403kB |
| Worker 1: Sort Method: quicksort Memory: 408kB |
| -> Nested Loop (cost=232.74..34295122.96 rows=13408 width=4) (actual time=0.232..11.671 rows=4941 loops=3) |
| -> Parallel Bitmap Heap Scan on module_result (cost=232.17..44321.35 rows=4925 width=8) (actual time=0.218..0.671 rows=4941 loops=3) |
| Recheck Cond: (run_id = 158523) |
| Heap Blocks: exact=50 |
| -> Bitmap Index Scan on module_result_run_id_idx (cost=0.00..229.21 rows=11819 width=0) (actual time=0.592..0.592 rows=14824 loops=1) |
| Index Cond: (run_id = 158523) |
| -> Index Scan using test_result_module_result_id_idx on test_result (cost=0.57..6911.17 rows=4331 width=12) (actual time=0.002..0.002 rows=1 loops=14824) |
| Index Cond: (module_result_id = module_result.id) |
| Planning Time: 0.214 ms |
| Execution Time: 15.932 ms |
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
EXPLAIN
Time: 0.235s
I don't think it's recommended to turn off seqscan in production, so I'm looking for a way to make the query planner choose the significantly faster plan. How can I do that? It's probably related to some statistics, but they are up to date (I run ANALYZE several times).
Any pointers would be very helpful,
thank you,
iulian
pgsql-general by date: