Bad Planner Statistics for Uneven distribution. - Mailing list pgsql-performance

From Kevin McArthur
Subject Bad Planner Statistics for Uneven distribution.
Date
Msg-id 003b01c6acf1$a66d8910$0701a8c0@kdesktop
Whole thread Raw
Responses Re: Bad Planner Statistics for Uneven distribution.  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-performance
I discussed this with a few members of #postgresql freenode this morning. I'll keep it breif; [note: i have cleaned out columns not relevant]
 
I have two tables, brands and models_brands. The first has about 300 records, the later about 350,000 records. The number of distinct brands in the models_brands table is < 10.
 
 
 
=# \d models_brands
        Table "public.models_brands"
 Column |         Type          | Modifiers
--------+-----------------------+-----------
 model  | integer               | not null
 brand  | integer               | not null
Indexes:
    "models_brands_brand" btree (brand)
Foreign-key constraints:
    "models_brands_brand_fkey" FOREIGN KEY (brand) REFERENCES brands(brand_id) ON UPDATE CASCADE ON DELETE CASCADE
    "models_brands_model_fkey" FOREIGN KEY (model) REFERENCES models(model_id) ON UPDATE CASCADE ON DELETE CASCADE
 
a=# \d brands;
                                      Table "public.brands"
   Column   |          Type          |                         Modifiers
------------+------------------------+-----------------------------------------------------------
 brand_id   | integer                | not null default nextval('brands_brand_id_seq'::regclass)
 brand_name | character varying(255) | not null
Indexes:
    "brands_pkey" PRIMARY KEY, btree (brand_id)
 
Now the plans/problems..
 
=# set enable_seqscan to on;
SET
=# explain analyze select distinct brand from models_brands;
                                                            QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------
 Unique  (cost=46300.70..48148.15 rows=4 width=4) (actual time=3699.691..6215.216 rows=4 loops=1)
   ->  Sort  (cost=46300.70..47224.43 rows=369489 width=4) (actual time=3699.681..5027.069 rows=369489 loops=1)
         Sort Key: brand
         ->  Seq Scan on models_brands  (cost=0.00..6411.89 rows=369489 width=4) (actual time=0.040..1352.997 rows=369489 loops=1)
 Total runtime: 6223.666 ms
(5 rows)
 
=# set enable_seqscan to off;
SET
=# explain analyze select distinct brand from models_brands;
                                                                        QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------
 Unique  (cost=0.00..863160.68 rows=4 width=4) (actual time=0.131..2584.779 rows=4 loops=1)
   ->  Index Scan using models_brands_brand on models_brands  (cost=0.00..862236.96 rows=369489 width=4) (actual time=0.122..1440.809 rows=369489 loops=1)
 Total runtime: 2584.871 ms
(3 rows)
 
 
Picks the wrong plan here. Should pick the index with seqscanning enabled.
 
 
More (as a different wording/query)... (as suggested by others on irc)
 
 
=# set enable_seqscan to on;
SET
=# explain analyze select brand_id from brands where exists (select 1 from models_brands where brand = brands.brand_id);
                                                         QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------
 Seq Scan on brands  (cost=0.00..30.09 rows=152 width=4) (actual time=7742.460..62567.543 rows=4 loops=1)
   Filter: (subplan)
   SubPlan
     ->  Seq Scan on models_brands  (cost=0.00..7335.61 rows=92372 width=0) (actual time=206.467..206.467 rows=0 loops=303)
           Filter: (brand = $0)
 Total runtime: 62567.626 ms
 
a=# set enable_seqscan to off;
SET
=# explain analyze select brand_id from brands where exists (select 1 from models_brands where brand = brands.brand_id);
                                                                      QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------
 Seq Scan on brands  (cost=100000000.00..100000715.90 rows=152 width=4) (actual time=0.615..3.710 rows=4 loops=1)
   Filter: (subplan)
   SubPlan
     ->  Index Scan using models_brands_brand on models_brands  (cost=0.00..216410.97 rows=92372 width=0) (actual time=0.008..0.008 rows=0 loops=303)
           Index Cond: (brand = $0)
 Total runtime: 3.790 ms
 
 
It was also tried to similar results with a LIMIT 1 in the subquery for exist.
 
More...
 
Seqscan still off..
 
 
=# explain analyze select distinct brand_id from brands inner join models_brands on (brand_id = brand);
                                                                           QUERY PLAN                                                                       
-----------------------------------------------------------------------------------------------------------------------------------------------------------------
 Unique  (cost=0.00..867782.58 rows=303 width=4) (actual time=0.391..4898.579 rows=4 loops=1)
   ->  Merge Join  (cost=0.00..866858.85 rows=369489 width=4) (actual time=0.383..3749.771 rows=369489 loops=1)
         Merge Cond: ("outer".brand_id = "inner".brand)
         ->  Index Scan using brands_pkey on brands  (cost=0.00..15.53 rows=303 width=4) (actual time=0.080..0.299 rows=60 loops=1)
         ->  Index Scan using models_brands_brand on models_brands  (cost=0.00..862236.96 rows=369489 width=4) (actual time=0.013..1403.175 rows=369489 loops=1)
 Total runtime: 4898.697 ms
 
=# set enable_seqscan to on;
SET
=# explain analyze select distinct brand_id from brands inner join models_brands on (brand_id = brand);
                                                               QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------
 Unique  (cost=46300.70..52770.04 rows=303 width=4) (actual time=3742.046..8560.833 rows=4 loops=1)
   ->  Merge Join  (cost=46300.70..51846.32 rows=369489 width=4) (actual time=3742.035..7406.677 rows=369489 loops=1)
         Merge Cond: ("outer".brand_id = "inner".brand)
         ->  Index Scan using brands_pkey on brands  (cost=0.00..15.53 rows=303 width=4) (actual time=0.077..0.407 rows=60 loops=1)
         ->  Sort  (cost=46300.70..47224.43 rows=369489 width=4) (actual time=3741.584..5051.348 rows=369489 loops=1)
               Sort Key: models_brands.brand
               ->  Seq Scan on models_brands  (cost=0.00..6411.89 rows=369489 width=4) (actual time=0.027..1346.178 rows=369489 loops=1)
 Total runtime: 8589.502 ms
(8 rows)
 
 
Hope that helps
 
Kevin McArthur
 
 
 
 
 

 
 

pgsql-performance by date:

Previous
From: Bruno Wolff III
Date:
Subject: Re: BUG #2543: Performance delay acrros the same day
Next
From: Kevin Keith
Date:
Subject: Partitioned tables in queries