Planner choice & tuning - Mailing list pgsql-general

From Ed L.
Subject Planner choice & tuning
Date
Msg-id 200406221647.47507.pgsql@bluepolka.net
Whole thread Raw
Responses Re: Planner choice & tuning  ("Scott Marlowe" <smarlowe@qwest.net>)
Re: Planner choice & tuning  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-general
The planner is choosing a sequential scan for my query.  I am trying
to understand why since it is clearly not the fastest choice, and
what the proper tuning dial is to adjust here.  Here's the query:

    select count(1) from measurement where measure_id = 98;

TIA.  Here are the details:

% psql -c "\d measurement"
                                     Table "public.measurement"
   Column   |           Type           |                          Modifiers
------------+--------------------------+-------------------------------------------------------------
 id         | integer                  | not null default nextval('public.measurement_id_seq'::text)
 host_id    | integer                  | not null
 measure_id | integer                  | not null
 value      | double precision         | not null
 duration   | double precision         | not null default 0.0
 moment     | timestamp with time zone | not null default ('now'::text)::timestamp(6) with time zone
 source_id  | integer                  | not null default 1
Indexes: measurement_duration_idx btree (duration),
         measurement_host_idx btree (host_id),
         measurement_measure_idx btree (measure_id),
         measurement_moment_idx btree (moment),
         measurement_source_idx btree (source_id),
         measurement_value_idx btree (value)
Foreign Key constraints: $2 FOREIGN KEY (measure_id) REFERENCES measure(id) ON UPDATE NO ACTION ON DELETE NO ACTION,
                         $1 FOREIGN KEY (source_id) REFERENCES measurement_source(id) ON UPDATE NO ACTION ON DELETE NO
ACTION

% psql -c "analyze measurement"
ANALYZE

% psql -c "select count(1) from measurement"
  count
---------
 1800866
(1 row)


% psql -c "select count(1) from measurement where measure_id = 98"
 count
-------
 38862
(1 row)

% time psql -c "set enable_seqscan=no; explain analyze select count(1) from measurement where measure_id = 98"
                                                                       QUERY PLAN
                                

--------------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=215208.66..215208.66 rows=1 width=0) (actual time=904.58..904.58 rows=1 loops=1)
   ->  Index Scan using measurement_measure_idx on measurement  (cost=0.00..215062.64 rows=58408 width=0) (actual
time=0.17..843.78rows=38866 loops=1) 
         Index Cond: (measure_id = 98)
 Total runtime: 904.77 msec
(4 rows)


real    0m1.298s
user    0m0.010s
sys     0m0.000s

% time psql -c "explain analyze select count(1) from measurement where measure_id = 98"
                                                       QUERY PLAN
 

-------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=97857.09..97857.09 rows=1 width=0) (actual time=2493.90..2493.90 rows=1 loops=1)
   ->  Seq Scan on measurement  (cost=0.00..97711.07 rows=58408 width=0) (actual time=12.94..2430.08 rows=38866
loops=1)
         Filter: (measure_id = 98)
 Total runtime: 2494.11 msec
(4 rows)


real    0m2.885s
user    0m0.000s
sys     0m0.000s


This seems to be saying the planner thinks its less expensive to do the
sequential scan, but why?

Including pg_stats data in case it is relevant here.

% psql -c "select * from pg_stats where tablename = 'measurement' and attname = 'measure_id'"
 schemaname |  tablename  |  attname   | null_frac | avg_width | n_distinct |             most_common_vals
|                               most_common_freqs                                 |            histogram_bounds
   | correlation  

------------+-------------+------------+-----------+-----------+------------+-------------------------------------------+----------------------------------------------------------------------------------+-----------------------------------------+-------------
 public     | measurement | measure_id |         0 |         4 |         52 | {104,108,113,124,106,110,101,107,112,109}
|{0.084,0.0556667,0.052,0.05,0.048,0.0473333,0.0383333,0.0363333,0.034,0.0326667} |
{23,36,39,43,85,89,100,111,120,122,128}|    0.232087 
(1 row)

% psql -c "select name, setting from pg_settings where name like 'random%'"
       name       | setting
------------------+---------
 random_page_cost | 4
(1 row)




pgsql-general by date:

Previous
From: Doug McNaught
Date:
Subject: Re: flock user defined function
Next
From: Chris Goughnour
Date:
Subject: Re: flock user defined function