Planner choice & tuning - Mailing list pgsql-general
From | Ed L. |
---|---|
Subject | Planner choice & tuning |
Date | |
Msg-id | 200406221647.47507.pgsql@bluepolka.net Whole thread Raw |
Responses |
Re: Planner choice & tuning
Re: Planner choice & tuning |
List | pgsql-general |
The planner is choosing a sequential scan for my query. I am trying to understand why since it is clearly not the fastest choice, and what the proper tuning dial is to adjust here. Here's the query: select count(1) from measurement where measure_id = 98; TIA. Here are the details: % psql -c "\d measurement" Table "public.measurement" Column | Type | Modifiers ------------+--------------------------+------------------------------------------------------------- id | integer | not null default nextval('public.measurement_id_seq'::text) host_id | integer | not null measure_id | integer | not null value | double precision | not null duration | double precision | not null default 0.0 moment | timestamp with time zone | not null default ('now'::text)::timestamp(6) with time zone source_id | integer | not null default 1 Indexes: measurement_duration_idx btree (duration), measurement_host_idx btree (host_id), measurement_measure_idx btree (measure_id), measurement_moment_idx btree (moment), measurement_source_idx btree (source_id), measurement_value_idx btree (value) Foreign Key constraints: $2 FOREIGN KEY (measure_id) REFERENCES measure(id) ON UPDATE NO ACTION ON DELETE NO ACTION, $1 FOREIGN KEY (source_id) REFERENCES measurement_source(id) ON UPDATE NO ACTION ON DELETE NO ACTION % psql -c "analyze measurement" ANALYZE % psql -c "select count(1) from measurement" count --------- 1800866 (1 row) % psql -c "select count(1) from measurement where measure_id = 98" count ------- 38862 (1 row) % time psql -c "set enable_seqscan=no; explain analyze select count(1) from measurement where measure_id = 98" QUERY PLAN -------------------------------------------------------------------------------------------------------------------------------------------------------- Aggregate (cost=215208.66..215208.66 rows=1 width=0) (actual time=904.58..904.58 rows=1 loops=1) -> Index Scan using measurement_measure_idx on measurement (cost=0.00..215062.64 rows=58408 width=0) (actual time=0.17..843.78rows=38866 loops=1) Index Cond: (measure_id = 98) Total runtime: 904.77 msec (4 rows) real 0m1.298s user 0m0.010s sys 0m0.000s % time psql -c "explain analyze select count(1) from measurement where measure_id = 98" QUERY PLAN ------------------------------------------------------------------------------------------------------------------------- Aggregate (cost=97857.09..97857.09 rows=1 width=0) (actual time=2493.90..2493.90 rows=1 loops=1) -> Seq Scan on measurement (cost=0.00..97711.07 rows=58408 width=0) (actual time=12.94..2430.08 rows=38866 loops=1) Filter: (measure_id = 98) Total runtime: 2494.11 msec (4 rows) real 0m2.885s user 0m0.000s sys 0m0.000s This seems to be saying the planner thinks its less expensive to do the sequential scan, but why? Including pg_stats data in case it is relevant here. % psql -c "select * from pg_stats where tablename = 'measurement' and attname = 'measure_id'" schemaname | tablename | attname | null_frac | avg_width | n_distinct | most_common_vals | most_common_freqs | histogram_bounds | correlation ------------+-------------+------------+-----------+-----------+------------+-------------------------------------------+----------------------------------------------------------------------------------+-----------------------------------------+------------- public | measurement | measure_id | 0 | 4 | 52 | {104,108,113,124,106,110,101,107,112,109} |{0.084,0.0556667,0.052,0.05,0.048,0.0473333,0.0383333,0.0363333,0.034,0.0326667} | {23,36,39,43,85,89,100,111,120,122,128}| 0.232087 (1 row) % psql -c "select name, setting from pg_settings where name like 'random%'" name | setting ------------------+--------- random_page_cost | 4 (1 row)
pgsql-general by date: