Thread: Query performance question
Hi, I'm having trouble with an obviously simple query that just doesn't perform quite good IMO. I have two tables: httplog=# \d hits Table "hits" Attribute | Type | Modifier -------------+-----------+----------------------------------------------- id | integer | not null default nextval('hits_id_seq'::text) page_id | integer | not null referrer_id | integer | not null ip_addr | inet | not null time | timestamp | not null domain_id | integer | not null Index: hits_pkey httplog=# \d referrer Table "referrer" Attribute | Type | Modifier -----------+--------------+---------- id | integer | url | varchar(300) | Index: referrer_pkey These are part of an HTTP-log database. Table 'hits' has about 7000 rows, table 'referrer' has about 350 rows. Now I want to know what the top ten referrers are, and I issue this query: SELECT count(*), url FROM hits, referrer WHERE referrer.id = referrer_id GROUP BY url ORDER BY count DESC LIMIT 10; And this query takes almost five seconds to complete, and that seems to me quite slow. The database is running on a Duron 700 with 256 MB RAM and ATA-66 disks. Operating system is FreeBSD-4.2. Are there some tuning knobs I could turn? regards, le -- Lukas Ertl eMail: l.ertl@univie.ac.at WWW-Redaktion Tel.: (+43 1) 4277-14073 Zentraler Informatikdienst (ZID) Fax.: (+43 1) 4277-9140 der Universität Wien
On Wed, 28 Mar 2001, Lukas Ertl wrote: There are two things you should do first: 1) VACUUM ANALYZE hits; 2) VACUUM ANALYZE referrer; If it's still running slowly, try EXPLAINing your SELECT to us. Also, it's been my experience that LIMIT's slow down a query, though I could be dead wrong. > SELECT count(*), url FROM hits, referrer WHERE referrer.id = referrer_id > GROUP BY url ORDER BY count DESC LIMIT 10; > > And this query takes almost five seconds to complete, and that seems to me > quite slow. The database is running on a Duron 700 with 256 MB RAM and > ATA-66 disks. Operating system is FreeBSD-4.2. > > Are there some tuning knobs I could turn? -- Dave
On Wed, 28 Mar 2001, David Olbersen wrote: > There are two things you should do first: > > 1) VACUUM ANALYZE hits; > 2) VACUUM ANALYZE referrer; I did that before, too. > If it's still running slowly, try EXPLAINing your SELECT to us. Ok, VACUUMing the tables didn't increase performance. This is the EXPLAIN output: httplog=# EXPLAIN SELECT count(*), url FROM hits, referrer WHERE referrer.id = referrer_id GROUP BY url ORDER BY count DESC LIMIT 10; NOTICE: QUERY PLAN: Sort (cost=912.81..912.81 rows=678 width=20) -> Aggregate (cost=846.98..880.90 rows=678 width=20) -> Group (cost=846.98..863.94 rows=6785 width=20) -> Sort (cost=846.98..846.98 rows=6785 width=20) -> Hash Join (cost=10.46..415.17 rows=6785 width=20) -> Seq Scan on hits (cost=0.00..128.85 rows=6785 width=4) -> Hash (cost=9.57..9.57 rows=357 width=16) -> Seq Scan on referrer (cost=0.00..9.57 rows=357 width=16) EXPLAIN I hope someone can explain this to me :) Thanks for your quick answer, regards, le -- Lukas Ertl eMail: l.ertl@univie.ac.at WWW-Redaktion Tel.: (+43 1) 4277-14073 Zentraler Informatikdienst (ZID) Fax.: (+43 1) 4277-9140 der Universität Wien
On Thu, 29 Mar 2001, Vijay Deval wrote: > URL is a very large field. If an extra field is created which gives a > neumeric id to the url, run the query on the number, and then get the desired > output I have to admit, that I quite don't understand what you want me to do here... I already have that numeric id: > > httplog=# \d referrer > > Table "referrer" > > Attribute | Type | Modifier > > -----------+--------------+---------- > > id | integer | > > url | varchar(300) | > > Index: referrer_pkey If I say: SELECT count(*), referrer.id FROM hits, referrer WHERE referrer.id = referrer_id GROUP BY referrer.id ORDER BY count DESC LIMIT 10; it is really fast and I get the IDs of the referrer-urls but how would I get the URLs then? regards, le -- Lukas Ertl eMail: l.ertl@univie.ac.at WWW-Redaktion Tel.: (+43 1) 4277-14073 Zentraler Informatikdienst (ZID) Fax.: (+43 1) 4277-9140 der Universität Wien
URL is a very large field. If an extra field is created which gives a neumeric id to the url, run the query on the number, and then get the desired output Vijay Lukas Ertl wrote: > Hi, > > I'm having trouble with an obviously simple query that just doesn't > perform quite good IMO. > > I have two tables: > > httplog=# \d hits > Table "hits" > Attribute | Type | Modifier > -------------+-----------+----------------------------------------------- > id | integer | not null default nextval('hits_id_seq'::text) > page_id | integer | not null > referrer_id | integer | not null > ip_addr | inet | not null > time | timestamp | not null > domain_id | integer | not null > Index: hits_pkey > > httplog=# \d referrer > Table "referrer" > Attribute | Type | Modifier > -----------+--------------+---------- > id | integer | > url | varchar(300) | > Index: referrer_pkey > > These are part of an HTTP-log database. Table 'hits' has about 7000 > rows, table 'referrer' has about 350 rows. Now I want to know what the top > ten referrers are, and I issue this query: > > SELECT count(*), url FROM hits, referrer WHERE referrer.id = referrer_id > GROUP BY url ORDER BY count DESC LIMIT 10; > >
i'd suggest either using an MD5 hash or some sort of CRC definitely a hash though On Thu, 29 Mar 2001, Vijay Deval wrote: > URL is a very large field. If an extra field is created which gives a > neumeric id to the url, run the query on the number, and then get the desired > output > > Vijay > > Lukas Ertl wrote: > > > Hi, > > > > I'm having trouble with an obviously simple query that just doesn't > > perform quite good IMO. > > > > I have two tables: > > > > httplog=# \d hits > > Table "hits" > > Attribute | Type | Modifier > > -------------+-----------+----------------------------------------------- > > id | integer | not null default nextval('hits_id_seq'::text) > > page_id | integer | not null > > referrer_id | integer | not null > > ip_addr | inet | not null > > time | timestamp | not null > > domain_id | integer | not null > > Index: hits_pkey > > > > httplog=# \d referrer > > Table "referrer" > > Attribute | Type | Modifier > > -----------+--------------+---------- > > id | integer | > > url | varchar(300) | > > Index: referrer_pkey > > > > These are part of an HTTP-log database. Table 'hits' has about 7000 > > rows, table 'referrer' has about 350 rows. Now I want to know what the top > > ten referrers are, and I issue this query: > > > > SELECT count(*), url FROM hits, referrer WHERE referrer.id = referrer_id > > GROUP BY url ORDER BY count DESC LIMIT 10; > > > > > > > > ---------------------------(end of broadcast)--------------------------- > TIP 2: you can get off all lists at once with the unregister command > (send "unregister YourEmailAddressHere" to majordomo@postgresql.org) > ----------------------------------------------------------------------------- david@backpack.com BackPack Software, Inc. www.backpack.com +1 651.645.7550 voice "Life is an Adventure. +1 651.645.9798 fax Don't forget your BackPack!" -----------------------------------------------------------------------------
Thanks for all of your suggestions, I found a way to speed up the queries using VIEWs: -- CREATE VIEW hits_per_referrer AS SELECT count(hits.id) AS anzahl, hits.referrer_id FROM hits GROUP BY hits.referrer_id; SELECT anzahl, url FROM hits_per_referrer, referrer WHERE referrer_id = referrer.id ORDER BY anzahl DESC LIMIT 10; -- This is really _a lot_ faster, as this EXPLAIN statement proves: -- httplog=# EXPLAIN SELECT anzahl, url FROM hits_per_referrer, referrer WHERE referrer_id = referrer.id ORDER BY anzahl DESC LIMIT 10; NOTICE: QUERY PLAN: Sort (cost=254.66..254.66 rows=1000 width=24) -> Aggregate (cost=69.83..204.83 rows=1000 width=24) -> Group (cost=69.83..179.83 rows=10000 width=24) -> Merge Join (cost=69.83..154.83 rows=10000 width=24) -> Index Scan using referrer_pkey on referrer (cost=0.00..60.00 rows=1000 width=16) -> Sort (cost=69.83..69.83 rows=1000 width=8) -> Seq Scan on hits (cost=0.00..20.00 rows=1000 width=8) EXPLAIN -- Thanks again, regards, le On Thu, 29 Mar 2001, D. Duccini wrote: > > i'd suggest either using an MD5 hash or some sort of CRC > > definitely a hash though > > > On Thu, 29 Mar 2001, Vijay Deval wrote: > > > URL is a very large field. If an extra field is created which gives a > > neumeric id to the url, run the query on the number, and then get the desired > > output > > > > Vijay > > > > Lukas Ertl wrote: > > > > > Hi, > > > > > > I'm having trouble with an obviously simple query that just doesn't > > > perform quite good IMO. > > > > > > I have two tables: > > > > > > httplog=# \d hits > > > Table "hits" > > > Attribute | Type | Modifier > > > -------------+-----------+----------------------------------------------- > > > id | integer | not null default nextval('hits_id_seq'::text) > > > page_id | integer | not null > > > referrer_id | integer | not null > > > ip_addr | inet | not null > > > time | timestamp | not null > > > domain_id | integer | not null > > > Index: hits_pkey > > > > > > httplog=# \d referrer > > > Table "referrer" > > > Attribute | Type | Modifier > > > -----------+--------------+---------- > > > id | integer | > > > url | varchar(300) | > > > Index: referrer_pkey > > > > > > These are part of an HTTP-log database. Table 'hits' has about 7000 > > > rows, table 'referrer' has about 350 rows. Now I want to know what the top > > > ten referrers are, and I issue this query: > > > > > > SELECT count(*), url FROM hits, referrer WHERE referrer.id = referrer_id > > > GROUP BY url ORDER BY count DESC LIMIT 10; -- Lukas Ertl eMail: l.ertl@univie.ac.at WWW-Redaktion Tel.: (+43 1) 4277-14073 Zentraler Informatikdienst (ZID) Fax.: (+43 1) 4277-9140 der Universität Wien
As the tables are linked by WHERE hits.referrer_id =referrer.id just adding to select referrer.url should display url I did not see right away that referrer.id was unique. Hence the suggestion to add a unique ID for the tuples. Vijay Lukas Ertl wrote: > On Thu, 29 Mar 2001, Vijay Deval wrote: > > > URL is a very large field. If an extra field is created which gives a > > neumeric id to the url, run the query on the number, and then get the desired > > output > > I have to admit, that I quite don't understand what you want me to do > here... I already have that numeric id: > > > > httplog=# \d referrer > > > Table "referrer" > > > Attribute | Type | Modifier > > > -----------+--------------+---------- > > > id | integer | > > > url | varchar(300) | > > > Index: referrer_pkey > > If I say: > > SELECT count(*), referrer.id FROM hits, referrer WHERE referrer.id = > referrer_id GROUP BY referrer.id ORDER BY count DESC LIMIT 10; > > it is really fast and I get the IDs of the referrer-urls but how would I > get the URLs then? > > regards, > le > > -- > Lukas Ertl eMail: l.ertl@univie.ac.at > WWW-Redaktion Tel.: (+43 1) 4277-14073 > Zentraler Informatikdienst (ZID) Fax.: (+43 1) 4277-9140 > der Universität Wien