Index usage without enable_seqscan - Mailing list pgsql-general
From | Shane |
---|---|
Subject | Index usage without enable_seqscan |
Date | |
Msg-id | 20050530160444.GA21175@cm.nu Whole thread Raw |
List | pgsql-general |
Hello list, I have a query which is running faster with set enable_seqscan=off and am wondering why this would be. The data is a web traffic database with the following relevant tables. Table "public.urlbase" Column | Type | Modifiers ----------+------------------------+--------------------------------------------------------- id | integer | not null default nextval('public.urlbase_id_seq'::text) protocol | character varying(32) | not null default 'http://'::character varying host | character varying(256) | not null Indexes: "urlbase_pkey" primary key, btree (id) "urlbase_protocol_key" unique, btree (protocol, host) Table "public.urls" Column | Type | Modifiers ------------+---------+------------------------------------------------------ id | integer | not null default nextval('public.urls_id_seq'::text) urlbase_id | integer | not null path | text | not null Indexes: "urls_pkey" primary key, btree (id) "urls_base_id_key" unique, btree (urlbase_id, "path") Foreign-key constraints: "$1" FOREIGN KEY (urlbase_id) REFERENCES urlbase(id) ON UPDATE CASCADE ON DELETE CASCADE Table "public.hits" Column | Type | Modifiers ------------+------------------------+-------------------------------------------------------- ip | inet | not null url_id | integer | not null javascript | boolean | not null default false browser_id | integer | hit_date | date | not null default ('now'::text)::date hit_time | time(0) with time zone | not null default ('now'::text)::time(6) with time zone Indexes: "hits_hit_date" btree (hit_date) "url_id" btree (url_id) Foreign-key constraints: "$3" FOREIGN KEY (browser_id) REFERENCES browsers(id) ON UPDATE CASCADE ON DELETE SET NULL "$1" FOREIGN KEY (url_id) REFERENCES urls(id) ON UPDATE CASCADE ON DELETE CASCADE The query is a basic stats ququestion, how many visits did a particular site receive grouped by the date. With enable_seqscan=yes, the query takes quite a bit longer than with enable_seqscan=off. From 84 sec to 50 sec on larger sites and from 15 sec to 5 sec on smaller sites. set enable_seqscan=yes explain select hit_date,count(*) as total from hits inner join urls on url_id=urls.id inner join urlbase on urlbase_id=urlbase.id where protocol='http://' and host='www.cm.nu' and hit_date >= '2005-01-01' group by hit_date order by hit_date QUERY PLAN --------------------------------------------------------------------------------------------------------------------------------- Sort (cost=204398.73..204399.45 rows=287 width=4) Sort Key: hits.hit_date -> HashAggregate (cost=204386.29..204387.01 rows=287 width=4) -> Hash Join (cost=26976.68..204328.54 rows=11550 width=4) Hash Cond: ("outer".url_id = "inner".id) -> Seq Scan on hits (cost=0.00..165226.88 rows=2401897 width=8) Filter: (hit_date >= '2005-01-01'::date) -> Hash (cost=26966.00..26966.00 rows=4272 width=4) -> Hash Join (cost=4.68..26966.00 rows=4272 width=4) Hash Cond: ("outer".urlbase_id = "inner".id) -> Seq Scan on urls (cost=0.00..22476.40 rows=888440 width=8) -> Hash (cost=4.68..4.68 rows=1 width=4) -> Index Scan using urlbase_protocol_key on urlbase (cost=0.00..4.68 rows=1 width=4) Index Cond: (((protocol)::text = 'http://'::text) AND ((host)::text = 'www.cm.nu'::text)) (14 rows) set enable_seqscan=off; QUERY PLAN --------------------------------------------------------------------------------------------------------------------- Sort (cost=6578335.31..6578336.03 rows=287 width=4) Sort Key: hits.hit_date -> HashAggregate (cost=6578322.87..6578323.59 rows=287 width=4) -> Nested Loop (cost=0.00..6578265.12 rows=11550 width=4) -> Nested Loop (cost=0.00..183390.60 rows=4272 width=4) -> Index Scan using urlbase_protocol_key on urlbase (cost=0.00..4.68 rows=1 width=4) Index Cond: (((protocol)::text = 'http://'::text) AND ((host)::text = 'www.cm.nu'::text)) -> Index Scan using urls_base_id_key on urls (cost=0.00..182768.94 rows=49358 width=8) Index Cond: (urls.urlbase_id = "outer".id) -> Index Scan using url_id on hits (cost=0.00..1495.23 rows=136 width=8) Index Cond: (hits.url_id = "outer".id) Filter: (hit_date >= '2005-01-01'::date) (12 rows) Any assistance would be appreicated. It's not an important query, that is it isn't executed very often but I am curious why this would be happening and what I am doing wrong. Best regards, Shane -- Shane Wegner http://www.cm.nu/~shane/
pgsql-general by date: