Home > mailing lists

Bad Planner Statistics for Uneven distribution. - Mailing list pgsql-performance

From	Kevin McArthur
Subject	Bad Planner Statistics for Uneven distribution.
Date	July 21, 2006 15:46:25
Msg-id	003b01c6acf1$a66d8910$0701a8c0@kdesktop Whole thread Raw
Responses	Re: Bad Planner Statistics for Uneven distribution.
List	pgsql-performance

Tree view

I discussed this with a few members of #postgresql freenode this morning. I'll keep it breif; [note: i have cleaned out columns not relevant]

I have two tables, brands and models_brands. The first has about 300 records, the later about 350,000 records. The number of distinct brands in the models_brands table is < 10.

=# \d models_brands
        Table "public.models_brands"
Column |         Type          | Modifiers
--------+-----------------------+-----------
model | integer               | not null
brand | integer               | not null
Indexes:
    "models_brands_brand" btree (brand)
Foreign-key constraints:
    "models_brands_brand_fkey" FOREIGN KEY (brand) REFERENCES brands(brand_id) ON UPDATE CASCADE ON DELETE CASCADE
    "models_brands_model_fkey" FOREIGN KEY (model) REFERENCES models(model_id) ON UPDATE CASCADE ON DELETE CASCADE

a=# \d brands;
                                      Table "public.brands"
   Column   |          Type          |                         Modifiers
------------+------------------------+-----------------------------------------------------------
brand_id   | integer                | not null default nextval('brands_brand_id_seq'::regclass)
brand_name | character varying(255) | not null
Indexes:
    "brands_pkey" PRIMARY KEY, btree (brand_id)

Now the plans/problems..

=# set enable_seqscan to on;
SET
=# explain analyze select distinct brand from models_brands;
                                                            QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------
Unique (cost=46300.70..48148.15 rows=4 width=4) (actual time=3699.691..6215.216 rows=4 loops=1)
   -> Sort (cost=46300.70..47224.43 rows=369489 width=4) (actual time=3699.681..5027.069 rows=369489 loops=1)
         Sort Key: brand
         -> Seq Scan on models_brands (cost=0.00..6411.89 rows=369489 width=4) (actual time=0.040..1352.997 rows=369489 loops=1)
Total runtime: 6223.666 ms
(5 rows)

=# set enable_seqscan to off;
SET
=# explain analyze select distinct brand from models_brands;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------
Unique (cost=0.00..863160.68 rows=4 width=4) (actual time=0.131..2584.779 rows=4 loops=1)
-> Index Scan using models_brands_brand on models_brands (cost=0.00..862236.96 rows=369489 width=4) (actual time=0.122..1440.809 rows=369489 loops=1)
Total runtime: 2584.871 ms
(3 rows)

Picks the wrong plan here. Should pick the index with seqscanning enabled.

More (as a different wording/query)... (as suggested by others on irc)

=# set enable_seqscan to on;
SET
=# explain analyze select brand_id from brands where exists (select 1 from models_brands where brand = brands.brand_id);
                                                         QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------
Seq Scan on brands (cost=0.00..30.09 rows=152 width=4) (actual time=7742.460..62567.543 rows=4 loops=1)
   Filter: (subplan)
   SubPlan
     -> Seq Scan on models_brands (cost=0.00..7335.61 rows=92372 width=0) (actual time=206.467..206.467 rows=0 loops=303)
           Filter: (brand = $0)
Total runtime: 62567.626 ms

a=# set enable_seqscan to off;
SET

=# explain analyze select brand_id from brands where exists (select 1 from models_brands where brand = brands.brand_id);
                                                                      QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------
Seq Scan on brands (cost=100000000.00..100000715.90 rows=152 width=4) (actual time=0.615..3.710 rows=4 loops=1)
   Filter: (subplan)
   SubPlan
     -> Index Scan using models_brands_brand on models_brands (cost=0.00..216410.97 rows=92372 width=0) (actual time=0.008..0.008 rows=0 loops=303)
           Index Cond: (brand = $0)
Total runtime: 3.790 ms

It was also tried to similar results with a LIMIT 1 in the subquery for exist.

More...

Seqscan still off..

=# explain analyze select distinct brand_id from brands inner join models_brands on (brand_id = brand);
                                                                           QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------
Unique (cost=0.00..867782.58 rows=303 width=4) (actual time=0.391..4898.579 rows=4 loops=1)
   -> Merge Join (cost=0.00..866858.85 rows=369489 width=4) (actual time=0.383..3749.771 rows=369489 loops=1)
         Merge Cond: ("outer".brand_id = "inner".brand)
         -> Index Scan using brands_pkey on brands (cost=0.00..15.53 rows=303 width=4) (actual time=0.080..0.299 rows=60 loops=1)
         -> Index Scan using models_brands_brand on models_brands (cost=0.00..862236.96 rows=369489 width=4) (actual time=0.013..1403.175 rows=369489 loops=1)
Total runtime: 4898.697 ms

=# set enable_seqscan to on;
SET
=# explain analyze select distinct brand_id from brands inner join models_brands on (brand_id = brand);
                                                               QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------
Unique (cost=46300.70..52770.04 rows=303 width=4) (actual time=3742.046..8560.833 rows=4 loops=1)
   -> Merge Join (cost=46300.70..51846.32 rows=369489 width=4) (actual time=3742.035..7406.677 rows=369489 loops=1)
         Merge Cond: ("outer".brand_id = "inner".brand)
         -> Index Scan using brands_pkey on brands (cost=0.00..15.53 rows=303 width=4) (actual time=0.077..0.407 rows=60 loops=1)
         -> Sort (cost=46300.70..47224.43 rows=369489 width=4) (actual time=3741.584..5051.348 rows=369489 loops=1)
               Sort Key: models_brands.brand
               -> Seq Scan on models_brands (cost=0.00..6411.89 rows=369489 width=4) (actual time=0.027..1346.178 rows=369489 loops=1)
Total runtime: 8589.502 ms
(8 rows)

Hope that helps

Kevin McArthur

pgsql-performance by date:

From: Bruno Wolff III
Date: 21 July 2006, 15:28:16
Subject: Re: BUG #2543: Performance delay acrros the same day

From: Kevin Keith
Date: 21 July 2006, 16:17:47
Subject: Partitioned tables in queries

Bad Planner Statistics for Uneven distribution. - Mailing list pgsql-performance

Previous

Next