estimated rows vs. actual rows - Mailing list pgsql-performance

From Jaime Casanova
Subject estimated rows vs. actual rows
Date
Msg-id c2d9e70e0502131327c95e63d@mail.gmail.com
Whole thread Raw
Responses Re: estimated rows vs. actual rows  (Josh Berkus <josh@agliodbs.com>)
List pgsql-performance
Hi,

in the #postgresql-es channel someone shows me this:

pgsql-7.4.5 + postgis

--- begin context ---

CREATE TABLE calles (
  gid int4 NOT NULL DEFAULT nextval('public.callesstgo_gid_seq'::text),
  nombre varchar,
  inicio int4,
  termino int4,
  comuna varchar,
  ciudad varchar,
  region numeric,
  pais varchar,
  the_geom geometry,
  id_comuna numeric,
  CONSTRAINT callesstgo_pkey PRIMARY KEY (gid),
  CONSTRAINT enforce_geotype_the_geom CHECK (geometrytype(the_geom) =
'MULTILINESTRING'::text OR the_geom IS NULL),
  CONSTRAINT enforce_srid_the_geom CHECK (srid(the_geom) = -1)
)
WITH OIDS;
 
CREATE INDEX idx_region_comunas ON calles USING btree
  (id_comuna, region);

select count(*) from calles;
143902

--- end context ---
 
Ok . here is the problem (BTW, the database has been analyzed just
before this query was execured)

explain analyze
select * from calles where id_comuna = 92 and region=13;

QUERY PLAN Seq Scan on calles  (cost=0.00..7876.53 rows=2610
width=279) (actual time=182.590..454.195 rows=4612 loops=1)
  Filter: ((id_comuna = 92::numeric) AND (region = 13::numeric))
Total runtime: 456.876 ms


Why is this query using a seq scan rather than a index scan? i notice
the diff between the estimated rows and actual rows (almost 2000).

Can this affect the query plan? i think this is a problem of
statistics, am i right? if so, what can be done?

regards,
Jaime Casanova

pgsql-performance by date:

Previous
From: Patrick Meylemans
Date:
Subject: Re: Benchmark
Next
From: Josh Berkus
Date:
Subject: Re: estimated rows vs. actual rows