Hello list,
I have a query which is running faster with set
enable_seqscan=off and am wondering why this would be. The
data is a web traffic database with the following relevant
tables.
Table "public.urlbase"
Column | Type | Modifiers
----------+------------------------+---------------------------------------------------------
id | integer | not null default nextval('public.urlbase_id_seq'::text)
protocol | character varying(32) | not null default 'http://'::character varying
host | character varying(256) | not null
Indexes:
"urlbase_pkey" primary key, btree (id)
"urlbase_protocol_key" unique, btree (protocol, host)
Table "public.urls"
Column | Type | Modifiers
------------+---------+------------------------------------------------------
id | integer | not null default nextval('public.urls_id_seq'::text)
urlbase_id | integer | not null
path | text | not null
Indexes:
"urls_pkey" primary key, btree (id)
"urls_base_id_key" unique, btree (urlbase_id, "path")
Foreign-key constraints:
"$1" FOREIGN KEY (urlbase_id) REFERENCES urlbase(id) ON UPDATE CASCADE ON DELETE CASCADE
Table "public.hits"
Column | Type | Modifiers
------------+------------------------+--------------------------------------------------------
ip | inet | not null
url_id | integer | not null
javascript | boolean | not null default false
browser_id | integer |
hit_date | date | not null default ('now'::text)::date
hit_time | time(0) with time zone | not null default ('now'::text)::time(6) with time zone
Indexes:
"hits_hit_date" btree (hit_date)
"url_id" btree (url_id)
Foreign-key constraints:
"$3" FOREIGN KEY (browser_id) REFERENCES browsers(id) ON UPDATE CASCADE ON DELETE SET NULL
"$1" FOREIGN KEY (url_id) REFERENCES urls(id) ON UPDATE CASCADE ON DELETE CASCADE
The query is a basic stats ququestion, how many visits did
a particular site receive grouped by the date. With
enable_seqscan=yes, the query takes quite a bit longer than
with enable_seqscan=off. From 84 sec to 50 sec on larger
sites and from 15 sec to 5 sec on smaller sites.
set enable_seqscan=yes
explain select hit_date,count(*) as total
from hits inner join urls on url_id=urls.id
inner join urlbase on urlbase_id=urlbase.id
where protocol='http://' and host='www.cm.nu'
and hit_date >= '2005-01-01'
group by hit_date
order by hit_date
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------
Sort (cost=204398.73..204399.45 rows=287 width=4)
Sort Key: hits.hit_date
-> HashAggregate (cost=204386.29..204387.01 rows=287 width=4)
-> Hash Join (cost=26976.68..204328.54 rows=11550 width=4)
Hash Cond: ("outer".url_id = "inner".id)
-> Seq Scan on hits (cost=0.00..165226.88 rows=2401897 width=8)
Filter: (hit_date >= '2005-01-01'::date)
-> Hash (cost=26966.00..26966.00 rows=4272 width=4)
-> Hash Join (cost=4.68..26966.00 rows=4272 width=4)
Hash Cond: ("outer".urlbase_id = "inner".id)
-> Seq Scan on urls (cost=0.00..22476.40 rows=888440 width=8)
-> Hash (cost=4.68..4.68 rows=1 width=4)
-> Index Scan using urlbase_protocol_key on urlbase (cost=0.00..4.68 rows=1 width=4)
Index Cond: (((protocol)::text = 'http://'::text) AND ((host)::text =
'www.cm.nu'::text))
(14 rows)
set enable_seqscan=off;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------
Sort (cost=6578335.31..6578336.03 rows=287 width=4)
Sort Key: hits.hit_date
-> HashAggregate (cost=6578322.87..6578323.59 rows=287 width=4)
-> Nested Loop (cost=0.00..6578265.12 rows=11550 width=4)
-> Nested Loop (cost=0.00..183390.60 rows=4272 width=4)
-> Index Scan using urlbase_protocol_key on urlbase (cost=0.00..4.68 rows=1 width=4)
Index Cond: (((protocol)::text = 'http://'::text) AND ((host)::text = 'www.cm.nu'::text))
-> Index Scan using urls_base_id_key on urls (cost=0.00..182768.94 rows=49358 width=8)
Index Cond: (urls.urlbase_id = "outer".id)
-> Index Scan using url_id on hits (cost=0.00..1495.23 rows=136 width=8)
Index Cond: (hits.url_id = "outer".id)
Filter: (hit_date >= '2005-01-01'::date)
(12 rows)
Any assistance would be appreicated. It's not an important
query, that is it isn't executed very often but I am
curious why this would be happening and what I am doing
wrong.
Best regards,
Shane
--
Shane Wegner
http://www.cm.nu/~shane/