Re: Query performance question - Mailing list pgsql-novice

From Lukas Ertl
Subject Re: Query performance question
Date
Msg-id 20010329232607.O547-100000@localhost.localdomain
Whole thread Raw
In response to Re: Query performance question  ("D. Duccini" <duccini@backpack.com>)
List pgsql-novice
Thanks for all of your suggestions, I found a way to speed up the queries
using VIEWs:

--
CREATE VIEW hits_per_referrer AS SELECT count(hits.id) AS anzahl,
hits.referrer_id FROM hits GROUP BY hits.referrer_id;

SELECT anzahl, url FROM hits_per_referrer, referrer WHERE referrer_id =
referrer.id ORDER BY anzahl DESC LIMIT 10;
--

This is really _a lot_ faster, as this EXPLAIN statement proves:

--
httplog=# EXPLAIN SELECT anzahl, url FROM hits_per_referrer, referrer
WHERE referrer_id = referrer.id ORDER BY anzahl DESC LIMIT 10;
NOTICE:  QUERY PLAN:

Sort  (cost=254.66..254.66 rows=1000 width=24)
  ->  Aggregate  (cost=69.83..204.83 rows=1000 width=24)
        ->  Group  (cost=69.83..179.83 rows=10000 width=24)
              ->  Merge Join  (cost=69.83..154.83 rows=10000 width=24)
                    ->  Index Scan using referrer_pkey on referrer
(cost=0.00..60.00 rows=1000 width=16)
                    ->  Sort  (cost=69.83..69.83 rows=1000 width=8)
                          ->  Seq Scan on hits  (cost=0.00..20.00
rows=1000 width=8)

EXPLAIN
--

Thanks again,

regards,
le


On Thu, 29 Mar 2001, D. Duccini wrote:

>
> i'd suggest either using an MD5 hash or some sort of CRC
>
> definitely a hash though
>
>
> On Thu, 29 Mar 2001, Vijay Deval wrote:
>
> > URL is a very large field.  If an extra field is created which gives a
> > neumeric id to the url, run the query on the number, and then get the desired
> > output
> >
> > Vijay
> >
> > Lukas Ertl wrote:
> >
> > > Hi,
> > >
> > > I'm having trouble with an obviously simple query that just doesn't
> > > perform quite good IMO.
> > >
> > > I have two tables:
> > >
> > > httplog=# \d hits
> > >                               Table "hits"
> > >   Attribute  |   Type    |                   Modifier
> > > -------------+-----------+-----------------------------------------------
> > >  id          | integer   | not null default nextval('hits_id_seq'::text)
> > >  page_id     | integer   | not null
> > >  referrer_id | integer   | not null
> > >  ip_addr     | inet      | not null
> > >  time        | timestamp | not null
> > >  domain_id   | integer   | not null
> > > Index: hits_pkey
> > >
> > > httplog=# \d referrer
> > >           Table "referrer"
> > >  Attribute |     Type     | Modifier
> > > -----------+--------------+----------
> > >  id        | integer      |
> > >  url       | varchar(300) |
> > > Index: referrer_pkey
> > >
> > > These are part of an HTTP-log database. Table 'hits' has about 7000
> > > rows, table 'referrer' has about 350 rows. Now I want to know what the top
> > > ten referrers are, and I issue this query:
> > >
> > > SELECT count(*), url FROM hits, referrer WHERE referrer.id = referrer_id
> > > GROUP BY url ORDER BY count DESC LIMIT 10;

--
Lukas Ertl                          eMail: l.ertl@univie.ac.at
WWW-Redaktion                       Tel.:  (+43 1) 4277-14073
Zentraler Informatikdienst (ZID)    Fax.:  (+43 1) 4277-9140
der Universität Wien


pgsql-novice by date:

Previous
From: Patrick Coulombe
Date:
Subject: 3 options
Next
From: Vijay Deval
Date:
Subject: Re: Query performance question