Re: Query performance question - Mailing list pgsql-novice
From | Lukas Ertl |
---|---|
Subject | Re: Query performance question |
Date | |
Msg-id | 20010329232607.O547-100000@localhost.localdomain Whole thread Raw |
In response to | Re: Query performance question ("D. Duccini" <duccini@backpack.com>) |
List | pgsql-novice |
Thanks for all of your suggestions, I found a way to speed up the queries using VIEWs: -- CREATE VIEW hits_per_referrer AS SELECT count(hits.id) AS anzahl, hits.referrer_id FROM hits GROUP BY hits.referrer_id; SELECT anzahl, url FROM hits_per_referrer, referrer WHERE referrer_id = referrer.id ORDER BY anzahl DESC LIMIT 10; -- This is really _a lot_ faster, as this EXPLAIN statement proves: -- httplog=# EXPLAIN SELECT anzahl, url FROM hits_per_referrer, referrer WHERE referrer_id = referrer.id ORDER BY anzahl DESC LIMIT 10; NOTICE: QUERY PLAN: Sort (cost=254.66..254.66 rows=1000 width=24) -> Aggregate (cost=69.83..204.83 rows=1000 width=24) -> Group (cost=69.83..179.83 rows=10000 width=24) -> Merge Join (cost=69.83..154.83 rows=10000 width=24) -> Index Scan using referrer_pkey on referrer (cost=0.00..60.00 rows=1000 width=16) -> Sort (cost=69.83..69.83 rows=1000 width=8) -> Seq Scan on hits (cost=0.00..20.00 rows=1000 width=8) EXPLAIN -- Thanks again, regards, le On Thu, 29 Mar 2001, D. Duccini wrote: > > i'd suggest either using an MD5 hash or some sort of CRC > > definitely a hash though > > > On Thu, 29 Mar 2001, Vijay Deval wrote: > > > URL is a very large field. If an extra field is created which gives a > > neumeric id to the url, run the query on the number, and then get the desired > > output > > > > Vijay > > > > Lukas Ertl wrote: > > > > > Hi, > > > > > > I'm having trouble with an obviously simple query that just doesn't > > > perform quite good IMO. > > > > > > I have two tables: > > > > > > httplog=# \d hits > > > Table "hits" > > > Attribute | Type | Modifier > > > -------------+-----------+----------------------------------------------- > > > id | integer | not null default nextval('hits_id_seq'::text) > > > page_id | integer | not null > > > referrer_id | integer | not null > > > ip_addr | inet | not null > > > time | timestamp | not null > > > domain_id | integer | not null > > > Index: hits_pkey > > > > > > httplog=# \d referrer > > > Table "referrer" > > > Attribute | Type | Modifier > > > -----------+--------------+---------- > > > id | integer | > > > url | varchar(300) | > > > Index: referrer_pkey > > > > > > These are part of an HTTP-log database. Table 'hits' has about 7000 > > > rows, table 'referrer' has about 350 rows. Now I want to know what the top > > > ten referrers are, and I issue this query: > > > > > > SELECT count(*), url FROM hits, referrer WHERE referrer.id = referrer_id > > > GROUP BY url ORDER BY count DESC LIMIT 10; -- Lukas Ertl eMail: l.ertl@univie.ac.at WWW-Redaktion Tel.: (+43 1) 4277-14073 Zentraler Informatikdienst (ZID) Fax.: (+43 1) 4277-9140 der Universität Wien
pgsql-novice by date: