Bad Planner Statistics for Uneven distribution. - Mailing list pgsql-performance
From | Kevin McArthur |
---|---|
Subject | Bad Planner Statistics for Uneven distribution. |
Date | |
Msg-id | 003b01c6acf1$a66d8910$0701a8c0@kdesktop Whole thread Raw |
Responses |
Re: Bad Planner Statistics for Uneven distribution.
|
List | pgsql-performance |
I discussed this with a few members of #postgresql freenode this morning. I'll keep it breif; [note: i have cleaned out columns not relevant]
I have two tables, brands and models_brands. The first has about 300 records, the later about 350,000 records. The number of distinct brands in the models_brands table is < 10.
=# \d models_brands
Table "public.models_brands"
Column | Type | Modifiers
--------+-----------------------+-----------
model | integer | not null
brand | integer | not null
Indexes:
"models_brands_brand" btree (brand)
Foreign-key constraints:
"models_brands_brand_fkey" FOREIGN KEY (brand) REFERENCES brands(brand_id) ON UPDATE CASCADE ON DELETE CASCADE
"models_brands_model_fkey" FOREIGN KEY (model) REFERENCES models(model_id) ON UPDATE CASCADE ON DELETE CASCADE
Table "public.models_brands"
Column | Type | Modifiers
--------+-----------------------+-----------
model | integer | not null
brand | integer | not null
Indexes:
"models_brands_brand" btree (brand)
Foreign-key constraints:
"models_brands_brand_fkey" FOREIGN KEY (brand) REFERENCES brands(brand_id) ON UPDATE CASCADE ON DELETE CASCADE
"models_brands_model_fkey" FOREIGN KEY (model) REFERENCES models(model_id) ON UPDATE CASCADE ON DELETE CASCADE
a=# \d brands;
Table "public.brands"
Column | Type | Modifiers
------------+------------------------+-----------------------------------------------------------
brand_id | integer | not null default nextval('brands_brand_id_seq'::regclass)
brand_name | character varying(255) | not null
Indexes:
"brands_pkey" PRIMARY KEY, btree (brand_id)
Table "public.brands"
Column | Type | Modifiers
------------+------------------------+-----------------------------------------------------------
brand_id | integer | not null default nextval('brands_brand_id_seq'::regclass)
brand_name | character varying(255) | not null
Indexes:
"brands_pkey" PRIMARY KEY, btree (brand_id)
Now the plans/problems..
=# set enable_seqscan to on;
SET
=# explain analyze select distinct brand from models_brands;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------
Unique (cost=46300.70..48148.15 rows=4 width=4) (actual time=3699.691..6215.216 rows=4 loops=1)
-> Sort (cost=46300.70..47224.43 rows=369489 width=4) (actual time=3699.681..5027.069 rows=369489 loops=1)
Sort Key: brand
-> Seq Scan on models_brands (cost=0.00..6411.89 rows=369489 width=4) (actual time=0.040..1352.997 rows=369489 loops=1)
Total runtime: 6223.666 ms
(5 rows)
SET
=# explain analyze select distinct brand from models_brands;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------
Unique (cost=46300.70..48148.15 rows=4 width=4) (actual time=3699.691..6215.216 rows=4 loops=1)
-> Sort (cost=46300.70..47224.43 rows=369489 width=4) (actual time=3699.681..5027.069 rows=369489 loops=1)
Sort Key: brand
-> Seq Scan on models_brands (cost=0.00..6411.89 rows=369489 width=4) (actual time=0.040..1352.997 rows=369489 loops=1)
Total runtime: 6223.666 ms
(5 rows)
=# set enable_seqscan to off;
SET
=# explain analyze select distinct brand from models_brands;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------
Unique (cost=0.00..863160.68 rows=4 width=4) (actual time=0.131..2584.779 rows=4 loops=1)
-> Index Scan using models_brands_brand on models_brands (cost=0.00..862236.96 rows=369489 width=4) (actual time=0.122..1440.809 rows=369489 loops=1)
Total runtime: 2584.871 ms
(3 rows)
SET
=# explain analyze select distinct brand from models_brands;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------
Unique (cost=0.00..863160.68 rows=4 width=4) (actual time=0.131..2584.779 rows=4 loops=1)
-> Index Scan using models_brands_brand on models_brands (cost=0.00..862236.96 rows=369489 width=4) (actual time=0.122..1440.809 rows=369489 loops=1)
Total runtime: 2584.871 ms
(3 rows)
Picks the wrong plan here. Should pick the index with seqscanning enabled.
More (as a different wording/query)... (as suggested by others on irc)
=# set enable_seqscan to on;
SET
=# explain analyze select brand_id from brands where exists (select 1 from models_brands where brand = brands.brand_id);
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------
Seq Scan on brands (cost=0.00..30.09 rows=152 width=4) (actual time=7742.460..62567.543 rows=4 loops=1)
Filter: (subplan)
SubPlan
-> Seq Scan on models_brands (cost=0.00..7335.61 rows=92372 width=0) (actual time=206.467..206.467 rows=0 loops=303)
Filter: (brand = $0)
Total runtime: 62567.626 ms
SET
=# explain analyze select brand_id from brands where exists (select 1 from models_brands where brand = brands.brand_id);
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------
Seq Scan on brands (cost=0.00..30.09 rows=152 width=4) (actual time=7742.460..62567.543 rows=4 loops=1)
Filter: (subplan)
SubPlan
-> Seq Scan on models_brands (cost=0.00..7335.61 rows=92372 width=0) (actual time=206.467..206.467 rows=0 loops=303)
Filter: (brand = $0)
Total runtime: 62567.626 ms
a=# set enable_seqscan to off;
SET
SET
=# explain analyze select brand_id from brands where exists (select 1 from models_brands where brand = brands.brand_id);
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------
Seq Scan on brands (cost=100000000.00..100000715.90 rows=152 width=4) (actual time=0.615..3.710 rows=4 loops=1)
Filter: (subplan)
SubPlan
-> Index Scan using models_brands_brand on models_brands (cost=0.00..216410.97 rows=92372 width=0) (actual time=0.008..0.008 rows=0 loops=303)
Index Cond: (brand = $0)
Total runtime: 3.790 ms
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------
Seq Scan on brands (cost=100000000.00..100000715.90 rows=152 width=4) (actual time=0.615..3.710 rows=4 loops=1)
Filter: (subplan)
SubPlan
-> Index Scan using models_brands_brand on models_brands (cost=0.00..216410.97 rows=92372 width=0) (actual time=0.008..0.008 rows=0 loops=303)
Index Cond: (brand = $0)
Total runtime: 3.790 ms
It was also tried to similar results with a LIMIT 1 in the subquery for exist.
More...
Seqscan still off..
=# explain analyze select distinct brand_id from brands inner join models_brands on (brand_id = brand);
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------
Unique (cost=0.00..867782.58 rows=303 width=4) (actual time=0.391..4898.579 rows=4 loops=1)
-> Merge Join (cost=0.00..866858.85 rows=369489 width=4) (actual time=0.383..3749.771 rows=369489 loops=1)
Merge Cond: ("outer".brand_id = "inner".brand)
-> Index Scan using brands_pkey on brands (cost=0.00..15.53 rows=303 width=4) (actual time=0.080..0.299 rows=60 loops=1)
-> Index Scan using models_brands_brand on models_brands (cost=0.00..862236.96 rows=369489 width=4) (actual time=0.013..1403.175 rows=369489 loops=1)
Total runtime: 4898.697 ms
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------
Unique (cost=0.00..867782.58 rows=303 width=4) (actual time=0.391..4898.579 rows=4 loops=1)
-> Merge Join (cost=0.00..866858.85 rows=369489 width=4) (actual time=0.383..3749.771 rows=369489 loops=1)
Merge Cond: ("outer".brand_id = "inner".brand)
-> Index Scan using brands_pkey on brands (cost=0.00..15.53 rows=303 width=4) (actual time=0.080..0.299 rows=60 loops=1)
-> Index Scan using models_brands_brand on models_brands (cost=0.00..862236.96 rows=369489 width=4) (actual time=0.013..1403.175 rows=369489 loops=1)
Total runtime: 4898.697 ms
=# set enable_seqscan to on;
SET
=# explain analyze select distinct brand_id from brands inner join models_brands on (brand_id = brand);
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------
Unique (cost=46300.70..52770.04 rows=303 width=4) (actual time=3742.046..8560.833 rows=4 loops=1)
-> Merge Join (cost=46300.70..51846.32 rows=369489 width=4) (actual time=3742.035..7406.677 rows=369489 loops=1)
Merge Cond: ("outer".brand_id = "inner".brand)
-> Index Scan using brands_pkey on brands (cost=0.00..15.53 rows=303 width=4) (actual time=0.077..0.407 rows=60 loops=1)
-> Sort (cost=46300.70..47224.43 rows=369489 width=4) (actual time=3741.584..5051.348 rows=369489 loops=1)
Sort Key: models_brands.brand
-> Seq Scan on models_brands (cost=0.00..6411.89 rows=369489 width=4) (actual time=0.027..1346.178 rows=369489 loops=1)
Total runtime: 8589.502 ms
(8 rows)
SET
=# explain analyze select distinct brand_id from brands inner join models_brands on (brand_id = brand);
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------
Unique (cost=46300.70..52770.04 rows=303 width=4) (actual time=3742.046..8560.833 rows=4 loops=1)
-> Merge Join (cost=46300.70..51846.32 rows=369489 width=4) (actual time=3742.035..7406.677 rows=369489 loops=1)
Merge Cond: ("outer".brand_id = "inner".brand)
-> Index Scan using brands_pkey on brands (cost=0.00..15.53 rows=303 width=4) (actual time=0.077..0.407 rows=60 loops=1)
-> Sort (cost=46300.70..47224.43 rows=369489 width=4) (actual time=3741.584..5051.348 rows=369489 loops=1)
Sort Key: models_brands.brand
-> Seq Scan on models_brands (cost=0.00..6411.89 rows=369489 width=4) (actual time=0.027..1346.178 rows=369489 loops=1)
Total runtime: 8589.502 ms
(8 rows)
Hope that helps
Kevin McArthur
pgsql-performance by date: