Re: efficient storing of urls - Mailing list pgsql-general

From Shane Wegner
Subject Re: efficient storing of urls
Date
Msg-id 20040228000009.GA3231@cm.nu
Whole thread Raw
In response to Re: efficient storing of urls  (Sean Shanny <shannyconsulting@earthlink.net>)
Responses Re: efficient storing of urls
Re: efficient storing of urls
List pgsql-general
On Fri, Feb 27, 2004 at 06:00:36PM -0500, Sean Shanny wrote:
> Shane,
>
> Can you give an example of a query that has gotten slower due to the
> increasing size of the urls table with an explain analyze?

The database is a simple traffic monitoring tool so we have
a hits table which gets a new row for every url accessed.
Very simple table

create table hits(
hit_date date not null,
hit_time time(0) without time zone not null,
url_id int references urls(id)
);

A select to display the 100 most popular pages:
explain analyze select count(*) as c,url from hits,urls where hit_date between '2004-01-01' and '2004-01-31' and
url_id=urls.idgroup by url order by c desc limit 100; 
                                                                      QUERY PLAN

------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=320189.71..320189.96 rows=100 width=68) (actual time=34156.080..34156.324 rows=100 loops=1)
   ->  Sort  (cost=320189.71..320700.06 rows=204138 width=68) (actual time=34156.068..34156.208 rows=100 loops=1)
         Sort Key: count(*)
         ->  GroupAggregate  (cost=281214.19..283255.57 rows=204138 width=68) (actual time=32457.857..33584.861
rows=53460loops=1) 
               ->  Sort  (cost=281214.19..281724.54 rows=204138 width=68) (actual time=32457.690..32873.446 rows=248888
loops=1)
                     Sort Key: urls.url
                     ->  Merge Join  (cost=239594.05..244280.05 rows=204138 width=68) (actual time=21363.547..24385.213
rows=248888loops=1) 
                           Merge Cond: ("outer".url_id = "inner".id)
                           ->  Sort  (cost=168400.38..168914.15 rows=205508 width=4) (actual time=14785.934..15156.772
rows=249350loops=1) 
                                 Sort Key: hits.url_id
                                 ->  Seq Scan on hits  (cost=0.00..148512.07 rows=205508 width=4) (actual
time=40.265..12081.506rows=249350 loops=1) 
                                       Filter: ((hit_date >= '2004-01-01'::date) AND (hit_date <= '2004-01-31'::date))
                           ->  Sort  (cost=71193.67..72005.68 rows=324805 width=72) (actual time=6577.430..7422.945
rows=519307loops=1) 
                                 Sort Key: urls.id
                                 ->  Seq Scan on urls  (cost=0.00..7389.05 rows=324805 width=72) (actual
time=0.110..1187.617rows=324805 loops=1) 
 Total runtime: 34221.250 ms
(16 rows)

Time: 34224.959 ms

S

pgsql-general by date:

Previous
From: Robert Treat
Date:
Subject: clear defunct entries from pg_stat_activity
Next
From: Bill Moran
Date:
Subject: Regarding BITs vs. INTs