Speeding up Aggregates - Mailing list pgsql-performance

From Dror Matalon
Subject Speeding up Aggregates
Date
Msg-id 20031003202120.GO87525@rlx11.zapatec.com
Whole thread Raw
Responses Re: Speeding up Aggregates
Re: Speeding up Aggregates
List pgsql-performance
Hi,

I have a query that ran quite well initially, but slowed down quite a
bit once I introduced an aggregate into the equation. The average
execution time went up from around 15 msec to around 300 msec.

The original query fetches a bunch of articles:

select articlenumber, channel, description, title, link, dtstamp  from
    items, my_channels where items.channel = '22222' and my_channels.id =
    '22222' and owner = 'drormata'  and dtstamp > last_viewed and
    articlenumber not in (select item from viewed_items where channel
    ='22222' and owner = 'drormata');


I then added a call to a function:

and (dtstamp = item_max_date(22222, link))


item_max_date() looks like this:
   select max(dtstamp) from items where channel = $1 and link = $2;

This should eliminate duplicate articles and only show the most recent
one.

resulting in the following query

select articlenumber, channel, description, title, link, dtstamp  from
    items, my_channels where items.channel = '22222' and my_channels.id =
    '22222' and owner = 'drormata'  and dtstamp > last_viewed and
    articlenumber not in (select item from viewed_items where channel
    ='22222' and owner = 'drormata') and (dtstamp = item_max_date(22222,
    link));



Any suggestions on optimizing the query/function? It makes sense that
it slowed down, but I wonder if I can do better.

I'm including index list as well as "explain analyze" of both versions.

Indexes:
    "item_channel_link" btree (channel, link)
    "item_created" btree (dtstamp)
    "item_signature" btree (signature)
    "items_channel_article" btree (channel, articlenumber)


explain analyze select articlenumber, channel, description, title, link, dtstamp  from items, my_channels where
items.channel= '22222' and my_channels.id = '22222' and owner = 'drormata'  and dtstamp > last_viewed and articlenumber
notin (select item from viewed_items where channel ='22222' and owner = 'drormata');
                                                          QUERY PLAN 

-------------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=8.19..6982.58 rows=302 width=259) (actual time=16.95..17.16 rows=8 loops=1)
   Join Filter: ("inner".dtstamp > "outer".last_viewed)
   ->  Seq Scan on my_channels  (cost=0.00..3.23 rows=1 width=8) (actual time=0.36..0.38 rows=1 loops=1)
         Filter: ((id = 22222) AND (("owner")::text = 'drormata'::text))
   ->  Index Scan using items_channel_article on items  (cost=8.19..6968.05 rows=904 width=259) (actual
time=0.68..13.94rows=899 loops=1) 
         Index Cond: (channel = 22222)
         Filter: (NOT (hashed subplan))
         SubPlan
           ->  Seq Scan on viewed_items  (cost=0.00..8.19 rows=2 width=4) (actual time=0.48..0.48 rows=0 loops=1)
                 Filter: ((channel = 22222) AND (("owner")::text = 'drormata'::text))
 Total runtime: 17.42 msec
(11 rows)


explain analyze select articlenumber, channel, description, title, link, dtstamp  from items, my_channels where
items.channel= '22222' and my_channels.id = '22222' and owner = 'drormata'  and dtstamp > last_viewed and articlenumber
notin (select item from viewed_items where channel ='22222' and owner = 'drormata') and (dtstamp = item_max_date(22222,
link));
                                                                QUERY PLAN

------------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=8.19..6980.33 rows=1 width=259) (actual time=262.94..265.14 rows=7 loops=1)
   Join Filter: ("outer".dtstamp > "inner".last_viewed)
   ->  Index Scan using items_channel_article on items  (cost=8.19..6977.08 rows=1 width=259) (actual time=1.94..150.55
rows=683loops=1) 
         Index Cond: (channel = 22222)
         Filter: ((dtstamp = item_max_date(22222, link)) AND (NOT (hashed subplan)))
         SubPlan
           ->  Seq Scan on viewed_items  (cost=0.00..8.19 rows=2 width=4) (actual time=0.43..0.43 rows=0 loops=1)
                 Filter: ((channel = 22222) AND (("owner")::text = 'drormata'::text))
   ->  Seq Scan on my_channels  (cost=0.00..3.23 rows=1 width=8) (actual time=0.14..0.15 rows=1 loops=683)
         Filter: ((id = 22222) AND (("owner")::text = 'drormata'::text))
 Total runtime: 265.39 msec




--
Dror Matalon
Zapatec Inc
1700 MLK Way
Berkeley, CA 94709
http://www.zapatec.com

pgsql-performance by date:

Previous
From: Richard Welty
Date:
Subject: Re: Postgres low end processing.
Next
From: apb18@cornell.edu
Date:
Subject: Re: Joins on inherited tables