Thread: Query performance question

Query performance question

From
Lukas Ertl
Date:
Hi,

I'm having trouble with an obviously simple query that just doesn't
perform quite good IMO.

I have two tables:

httplog=# \d hits
                              Table "hits"
  Attribute  |   Type    |                   Modifier
-------------+-----------+-----------------------------------------------
 id          | integer   | not null default nextval('hits_id_seq'::text)
 page_id     | integer   | not null
 referrer_id | integer   | not null
 ip_addr     | inet      | not null
 time        | timestamp | not null
 domain_id   | integer   | not null
Index: hits_pkey

httplog=# \d referrer
          Table "referrer"
 Attribute |     Type     | Modifier
-----------+--------------+----------
 id        | integer      |
 url       | varchar(300) |
Index: referrer_pkey

These are part of an HTTP-log database. Table 'hits' has about 7000
rows, table 'referrer' has about 350 rows. Now I want to know what the top
ten referrers are, and I issue this query:

SELECT count(*), url FROM hits, referrer WHERE referrer.id = referrer_id
GROUP BY url ORDER BY count DESC LIMIT 10;

And this query takes almost five seconds to complete, and that seems to me
quite slow. The database is running on a Duron 700 with 256 MB RAM and
ATA-66 disks. Operating system is FreeBSD-4.2.

Are there some tuning knobs I could turn?

regards,
le

--
Lukas Ertl                          eMail: l.ertl@univie.ac.at
WWW-Redaktion                       Tel.:  (+43 1) 4277-14073
Zentraler Informatikdienst (ZID)    Fax.:  (+43 1) 4277-9140
der Universität Wien


Re: Query performance question

From
David Olbersen
Date:
On Wed, 28 Mar 2001, Lukas Ertl wrote:

There are two things you should do first:

1) VACUUM ANALYZE hits;
2) VACUUM ANALYZE referrer;

If it's still running slowly, try EXPLAINing your SELECT to us.

Also, it's been my experience that LIMIT's slow down a query, though I could be
dead wrong.


> SELECT count(*), url FROM hits, referrer WHERE referrer.id = referrer_id
> GROUP BY url ORDER BY count DESC LIMIT 10;
>
> And this query takes almost five seconds to complete, and that seems to me
> quite slow. The database is running on a Duron 700 with 256 MB RAM and
> ATA-66 disks. Operating system is FreeBSD-4.2.
>
> Are there some tuning knobs I could turn?

-- Dave


Re: Query performance question

From
Lukas Ertl
Date:
On Wed, 28 Mar 2001, David Olbersen wrote:

> There are two things you should do first:
>
> 1) VACUUM ANALYZE hits;
> 2) VACUUM ANALYZE referrer;

I did that before, too.

> If it's still running slowly, try EXPLAINing your SELECT to us.

Ok, VACUUMing the tables didn't increase performance. This is the EXPLAIN
output:

httplog=# EXPLAIN SELECT count(*), url FROM hits, referrer WHERE
referrer.id = referrer_id GROUP BY url ORDER BY count DESC LIMIT 10;
NOTICE:  QUERY PLAN:

Sort  (cost=912.81..912.81 rows=678 width=20)
  ->  Aggregate  (cost=846.98..880.90 rows=678 width=20)
        ->  Group  (cost=846.98..863.94 rows=6785 width=20)
              ->  Sort  (cost=846.98..846.98 rows=6785 width=20)
                    ->  Hash Join  (cost=10.46..415.17 rows=6785 width=20)
                          ->  Seq Scan on hits  (cost=0.00..128.85
rows=6785 width=4)
                          ->  Hash  (cost=9.57..9.57 rows=357 width=16)
                                ->  Seq Scan on referrer  (cost=0.00..9.57
rows=357 width=16)

EXPLAIN

I hope someone can explain this to me :)

Thanks for your quick answer,

regards,
le

--
Lukas Ertl                          eMail: l.ertl@univie.ac.at
WWW-Redaktion                       Tel.:  (+43 1) 4277-14073
Zentraler Informatikdienst (ZID)    Fax.:  (+43 1) 4277-9140
der Universität Wien


Re: Query performance question

From
Lukas Ertl
Date:
On Thu, 29 Mar 2001, Vijay Deval wrote:

> URL is a very large field.  If an extra field is created which gives a
> neumeric id to the url, run the query on the number, and then get the desired
> output

I have to admit, that I quite don't understand what you want me to do
here... I already have that numeric id:

> > httplog=# \d referrer
> >           Table "referrer"
> >  Attribute |     Type     | Modifier
> > -----------+--------------+----------
> >  id        | integer      |
> >  url       | varchar(300) |
> > Index: referrer_pkey

If I say:

SELECT count(*), referrer.id FROM hits, referrer WHERE referrer.id =
referrer_id GROUP BY referrer.id ORDER BY count DESC LIMIT 10;

it is really fast and I get the IDs of the referrer-urls but how would I
get the URLs then?

regards,
le

--
Lukas Ertl                          eMail: l.ertl@univie.ac.at
WWW-Redaktion                       Tel.:  (+43 1) 4277-14073
Zentraler Informatikdienst (ZID)    Fax.:  (+43 1) 4277-9140
der Universität Wien


Re: Query performance question

From
Vijay Deval
Date:
URL is a very large field.  If an extra field is created which gives a
neumeric id to the url, run the query on the number, and then get the desired
output

Vijay

Lukas Ertl wrote:

> Hi,
>
> I'm having trouble with an obviously simple query that just doesn't
> perform quite good IMO.
>
> I have two tables:
>
> httplog=# \d hits
>                               Table "hits"
>   Attribute  |   Type    |                   Modifier
> -------------+-----------+-----------------------------------------------
>  id          | integer   | not null default nextval('hits_id_seq'::text)
>  page_id     | integer   | not null
>  referrer_id | integer   | not null
>  ip_addr     | inet      | not null
>  time        | timestamp | not null
>  domain_id   | integer   | not null
> Index: hits_pkey
>
> httplog=# \d referrer
>           Table "referrer"
>  Attribute |     Type     | Modifier
> -----------+--------------+----------
>  id        | integer      |
>  url       | varchar(300) |
> Index: referrer_pkey
>
> These are part of an HTTP-log database. Table 'hits' has about 7000
> rows, table 'referrer' has about 350 rows. Now I want to know what the top
> ten referrers are, and I issue this query:
>
> SELECT count(*), url FROM hits, referrer WHERE referrer.id = referrer_id
> GROUP BY url ORDER BY count DESC LIMIT 10;
>
>



Re: Query performance question

From
"D. Duccini"
Date:
i'd suggest either using an MD5 hash or some sort of CRC

definitely a hash though


On Thu, 29 Mar 2001, Vijay Deval wrote:

> URL is a very large field.  If an extra field is created which gives a
> neumeric id to the url, run the query on the number, and then get the desired
> output
>
> Vijay
>
> Lukas Ertl wrote:
>
> > Hi,
> >
> > I'm having trouble with an obviously simple query that just doesn't
> > perform quite good IMO.
> >
> > I have two tables:
> >
> > httplog=# \d hits
> >                               Table "hits"
> >   Attribute  |   Type    |                   Modifier
> > -------------+-----------+-----------------------------------------------
> >  id          | integer   | not null default nextval('hits_id_seq'::text)
> >  page_id     | integer   | not null
> >  referrer_id | integer   | not null
> >  ip_addr     | inet      | not null
> >  time        | timestamp | not null
> >  domain_id   | integer   | not null
> > Index: hits_pkey
> >
> > httplog=# \d referrer
> >           Table "referrer"
> >  Attribute |     Type     | Modifier
> > -----------+--------------+----------
> >  id        | integer      |
> >  url       | varchar(300) |
> > Index: referrer_pkey
> >
> > These are part of an HTTP-log database. Table 'hits' has about 7000
> > rows, table 'referrer' has about 350 rows. Now I want to know what the top
> > ten referrers are, and I issue this query:
> >
> > SELECT count(*), url FROM hits, referrer WHERE referrer.id = referrer_id
> > GROUP BY url ORDER BY count DESC LIMIT 10;
> >
> >
>
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 2: you can get off all lists at once with the unregister command
>     (send "unregister YourEmailAddressHere" to majordomo@postgresql.org)
>


-----------------------------------------------------------------------------
david@backpack.com            BackPack Software, Inc.        www.backpack.com
+1 651.645.7550 voice       "Life is an Adventure.
+1 651.645.9798 fax            Don't forget your BackPack!"
-----------------------------------------------------------------------------


Re: Query performance question

From
Lukas Ertl
Date:
Thanks for all of your suggestions, I found a way to speed up the queries
using VIEWs:

--
CREATE VIEW hits_per_referrer AS SELECT count(hits.id) AS anzahl,
hits.referrer_id FROM hits GROUP BY hits.referrer_id;

SELECT anzahl, url FROM hits_per_referrer, referrer WHERE referrer_id =
referrer.id ORDER BY anzahl DESC LIMIT 10;
--

This is really _a lot_ faster, as this EXPLAIN statement proves:

--
httplog=# EXPLAIN SELECT anzahl, url FROM hits_per_referrer, referrer
WHERE referrer_id = referrer.id ORDER BY anzahl DESC LIMIT 10;
NOTICE:  QUERY PLAN:

Sort  (cost=254.66..254.66 rows=1000 width=24)
  ->  Aggregate  (cost=69.83..204.83 rows=1000 width=24)
        ->  Group  (cost=69.83..179.83 rows=10000 width=24)
              ->  Merge Join  (cost=69.83..154.83 rows=10000 width=24)
                    ->  Index Scan using referrer_pkey on referrer
(cost=0.00..60.00 rows=1000 width=16)
                    ->  Sort  (cost=69.83..69.83 rows=1000 width=8)
                          ->  Seq Scan on hits  (cost=0.00..20.00
rows=1000 width=8)

EXPLAIN
--

Thanks again,

regards,
le


On Thu, 29 Mar 2001, D. Duccini wrote:

>
> i'd suggest either using an MD5 hash or some sort of CRC
>
> definitely a hash though
>
>
> On Thu, 29 Mar 2001, Vijay Deval wrote:
>
> > URL is a very large field.  If an extra field is created which gives a
> > neumeric id to the url, run the query on the number, and then get the desired
> > output
> >
> > Vijay
> >
> > Lukas Ertl wrote:
> >
> > > Hi,
> > >
> > > I'm having trouble with an obviously simple query that just doesn't
> > > perform quite good IMO.
> > >
> > > I have two tables:
> > >
> > > httplog=# \d hits
> > >                               Table "hits"
> > >   Attribute  |   Type    |                   Modifier
> > > -------------+-----------+-----------------------------------------------
> > >  id          | integer   | not null default nextval('hits_id_seq'::text)
> > >  page_id     | integer   | not null
> > >  referrer_id | integer   | not null
> > >  ip_addr     | inet      | not null
> > >  time        | timestamp | not null
> > >  domain_id   | integer   | not null
> > > Index: hits_pkey
> > >
> > > httplog=# \d referrer
> > >           Table "referrer"
> > >  Attribute |     Type     | Modifier
> > > -----------+--------------+----------
> > >  id        | integer      |
> > >  url       | varchar(300) |
> > > Index: referrer_pkey
> > >
> > > These are part of an HTTP-log database. Table 'hits' has about 7000
> > > rows, table 'referrer' has about 350 rows. Now I want to know what the top
> > > ten referrers are, and I issue this query:
> > >
> > > SELECT count(*), url FROM hits, referrer WHERE referrer.id = referrer_id
> > > GROUP BY url ORDER BY count DESC LIMIT 10;

--
Lukas Ertl                          eMail: l.ertl@univie.ac.at
WWW-Redaktion                       Tel.:  (+43 1) 4277-14073
Zentraler Informatikdienst (ZID)    Fax.:  (+43 1) 4277-9140
der Universität Wien


Re: Query performance question

From
Vijay Deval
Date:
As the tables are linked by

WHERE hits.referrer_id =referrer.id

just adding to select referrer.url should display url
I did not see right away that referrer.id was unique. Hence the
suggestion to add a
unique ID for the tuples.

Vijay

Lukas Ertl wrote:

> On Thu, 29 Mar 2001, Vijay Deval wrote:
>
> > URL is a very large field.  If an extra field is created which gives
a
> > neumeric id to the url, run the query on the number, and then get
the desired
> > output
>
> I have to admit, that I quite don't understand what you want me to do
> here... I already have that numeric id:
>
> > > httplog=# \d referrer
> > >           Table "referrer"
> > >  Attribute |     Type     | Modifier
> > > -----------+--------------+----------
> > >  id        | integer      |
> > >  url       | varchar(300) |
> > > Index: referrer_pkey
>
> If I say:
>
> SELECT count(*), referrer.id FROM hits, referrer WHERE referrer.id =
> referrer_id GROUP BY referrer.id ORDER BY count DESC LIMIT 10;
>
> it is really fast and I get the IDs of the referrer-urls but how would
I
> get the URLs then?
>
> regards,
> le
>
> --
> Lukas Ertl                          eMail: l.ertl@univie.ac.at
> WWW-Redaktion                       Tel.:  (+43 1) 4277-14073
> Zentraler Informatikdienst (ZID)    Fax.:  (+43 1) 4277-9140
> der Universität Wien