Problem with planner choosing nested loop - Mailing list pgsql-general
From | Alex Solovey |
---|---|
Subject | Problem with planner choosing nested loop |
Date | |
Msg-id | 47F3C43A.1010407@gmail.com Whole thread Raw |
Responses |
Re: Problem with planner choosing nested loop
("Rodrigo E. De León Plicet" <rdeleonp@gmail.com>)
|
List | pgsql-general |
Hello, I was trying to optimize a slow query in database running 8.3.1. It turned out that planner is choosing nested loop join resulting in multiple sequential scans over the long table. Here is a simplified database schema, consisting of two tables: CREATE TABLE bar ( bar_id integer PRIMARY KEY, bar_a integer, bar_b integer, bar_c integer, bar_d integer, bar_e integer, bar_f integer, bar_g integer, bar_h integer ); CREATE TABLE foo ( foo_a integer, foo_b integer, foo_c integer, bar_id integer ); Table "bar" has 16805 records and table "foo" is fairly big, having over 6 million records. default_statistics_target is set to 1000 (in fact, I tried many values from 100 to 1000 but it did not help), VACUUM ANALYZE was executed before running test queries. Running this query: EXPLAIN ANALYZE SELECT foo_b, SUM(foo_c) FROM foo JOIN bar USING (bar_id) WHERE foo_a = 1001 AND bar_h = 1821 AND bar_c = 519 GROUP BY foo_b; produces this plan: QUERY PLAN ----------------------------------------------------------------------------------------------------------------------- HashAggregate (cost=110916.41..110916.42 rows=1 width=8) (actual time=20547.433..20547.433 rows=1 loops=1) -> Nested Loop (cost=0.00..110916.40 rows=1 width=8) (actual time=17952.622..20547.175 rows=59 loops=1) Join Filter: (foo.bar_id = bar.bar_id) -> Seq Scan on bar (cost=0.00..393.07 rows=1 width=4) (actual time=0.098..3.561 rows=24 loops=1) Filter: ((bar_h = 1821) AND (bar_c = 519)) -> Seq Scan on foo (cost=0.00..110510.89 rows=995 width=12) (actual time=0.957..855.366 rows=1369 loops=24) Filter: (foo.foo_a = 1001) Total runtime: 20547.518 ms The problem is that 6+ million rows table "foo" is scanned 24 times: Seq Scan on foo (... loops=24) If I try to disable nested loops using set enable_nestloop=off, the plan is just fine: QUERY PLAN ---------------------------------------------------------------------------------------------------------------------- HashAggregate (cost=110907.72..110907.73 rows=1 width=8) (actual time=889.239..889.240 rows=1 loops=1) -> Hash Join (cost=393.09..110907.72 rows=1 width=8) (actual time=17.825..889.065 rows=59 loops=1) Hash Cond: (foo.bar_id = bar.bar_id) -> Seq Scan on foo (cost=0.00..110510.89 rows=995 width=12) (actual time=2.309..883.841 rows=1369 loops=1) Filter: (foo_a = 1001) -> Hash (cost=393.07..393.07 rows=1 width=4) (actual time=4.168..4.168 rows=24 loops=1) -> Seq Scan on bar (cost=0.00..393.07 rows=1 width=4) (actual time=0.118..4.141 rows=24 loops=1) Filter: ((bar_h = 1821) AND (bar_c = 519)) Total runtime: 889.329 ms Unfortunately, I cannot disable nested loops because if I do, some other queries degrade miserably, and disabling nested loops just for this query is not an option. I think the problem is caused by wrong estimate for the table "bar": Seq Scan on bar (cost=0.00..393.07 rows=1 width=4) (actual time=0.098..3.561 rows=24 loops=1) but so far, I have no idea how it could be fixed. As I've said, I tried increasing statistics_target to the max value (1000) but it did not help. The test database dump (6.3 Mb download) is available at http://216.159.242.194/test_dump.sql.bz2 Alex
pgsql-general by date: