Re: efficient storing of urls - Mailing list pgsql-general

From: scott.marlowe
Subject: Re: efficient storing of urls
Date:
Msg-id: Pine.LNX.4.33.0403010853300.32148-100000@css120.ihs.com
In response to: Re: efficient storing of urls (Shane Wegner <shane-keyword-pgsql.a1e0d9@cm.nu>)
Responses: Re: efficient storing of urls
List: pgsql-general
On Fri, 27 Feb 2004, Shane Wegner wrote:

> On Fri, Feb 27, 2004 at 06:00:36PM -0500, Sean Shanny wrote:
> > Shane,
> >
> > Can you give an example of a query that has gotten slower due to the
> > increasing size of the urls table with an explain analyze?
>
> The database is a simple traffic monitoring tool, so we have
> a hits table which gets a new row for every url accessed.
> Very simple table:
>
> create table hits(
> hit_date date not null,
> hit_time time(0) without time zone not null,
> url_id int references urls(id)
> );
>
> A select to display the 100 most popular pages:
> explain analyze select count(*) as c,url from hits,urls where hit_date
> between '2004-01-01' and '2004-01-31' and url_id=urls.id
> group by url order by c desc limit 100;
>                                                                       QUERY PLAN
> ------------------------------------------------------------------------------------------------------------------------------------------------------
>  Limit  (cost=320189.71..320189.96 rows=100 width=68) (actual time=34156.080..34156.324 rows=100 loops=1)
>    ->  Sort  (cost=320189.71..320700.06 rows=204138 width=68) (actual time=34156.068..34156.208 rows=100 loops=1)
>          Sort Key: count(*)
>          ->  GroupAggregate  (cost=281214.19..283255.57 rows=204138 width=68) (actual time=32457.857..33584.861 rows=53460 loops=1)
>                ->  Sort  (cost=281214.19..281724.54 rows=204138 width=68) (actual time=32457.690..32873.446 rows=248888 loops=1)
>                      Sort Key: urls.url
>                      ->  Merge Join  (cost=239594.05..244280.05 rows=204138 width=68) (actual time=21363.547..24385.213 rows=248888 loops=1)
>                            Merge Cond: ("outer".url_id = "inner".id)
>                            ->  Sort  (cost=168400.38..168914.15 rows=205508 width=4) (actual time=14785.934..15156.772 rows=249350 loops=1)
>                                  Sort Key: hits.url_id
>                                  ->  Seq Scan on hits  (cost=0.00..148512.07 rows=205508 width=4) (actual time=40.265..12081.506 rows=249350 loops=1)
>                                        Filter: ((hit_date >= '2004-01-01'::date) AND (hit_date <= '2004-01-31'::date))
>                            ->  Sort  (cost=71193.67..72005.68 rows=324805 width=72) (actual time=6577.430..7422.945 rows=519307 loops=1)
>                                  Sort Key: urls.id
>                                  ->  Seq Scan on urls  (cost=0.00..7389.05 rows=324805 width=72) (actual time=0.110..1187.617 rows=324805 loops=1)
>  Total runtime: 34221.250 ms
> (16 rows)

Your single biggest jump in actual time seems to be coming from the seq
scan on hits, which takes you from 40 ms to 12,000 ms.

Can you index hit_date and see if that helps?
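
Something along these lines, for instance (the index name is just a
suggestion, and the analyze afterwards refreshes the planner's
statistics so the new index actually gets considered):

create index hits_hit_date_idx on hits (hit_date);
analyze hits;

Then re-run the explain analyze above and see whether the seq scan on
hits turns into an index scan on hits_hit_date_idx.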

