Help with a seq scan on multi-million row table - Mailing list pgsql-sql

From
Subject Help with a seq scan on multi-million row table
Date
Msg-id 20060510181359.58201.qmail@web50306.mail.yahoo.com
Whole thread Raw
Responses Re: Help with a seq scan on multi-million row table
Re: Help with a seq scan on multi-million row table
List pgsql-sql
Hello,

I have a little 2-table JOIN, GROUP BY, ORDER BY query that does a sequential scan on a multi-million row table.  I
_thought_I had all the appropriate indices, but apparently I do not.  I was wondering if anyone can spot a way I can
speedup this query.
 
The query currently takes... *gulp*: 381119.201 ms :( 
There are only 2 tables in the game: user_url and user_url_tag.  The latter has FKs pointing to the former.  The
sequentialscan happens on the latter - user_url_tag:
 

EXPLAIN ANALYZE select DISTINCT userurltag0_.tag as x0_0_, COUNT(*) as x1_0_ from user_url_tag userurltag0_, user_url
userurl1_WHERE (((userurl1_.user_id=1 )) AND ((userurltag0_.user_url_id=userurl1_.id ))) GROUP BY userurltag0_.tag
ORDERBY  count(*) DESC;                                                                                        QUERY
PLAN  
 

---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Unique  (cost=140972.22..140996.28 rows=3207 width=10) (actual time=381082.868..381110.094 rows=2546 loops=1)    ->
Sort (cost=140972.22..140980.24 rows=3207 width=10) (actual time=381082.858..381091.733 rows=2546 loops=1)
SortKey: count(*), userurltag0_.tag          ->  HashAggregate  (cost=140777.45..140785.46 rows=3207 width=10) (actual
time=381032.844..381064.068rows=2546 loops=1)                ->  Hash Join  (cost=2797.65..140758.50 rows=3790
width=10)(actual time=248.530..380635.132 rows=8544 loops=1)                      Hash Cond: ("outer".user_url_id =
"inner".id)                     ->  Seq Scan on user_url_tag userurltag0_  (cost=0.00..106650.30 rows=6254530 width=14)
(actualtime=0.017..212256.630 rows=6259553 loops=1)                      ->  Hash  (cost=2795.24..2795.24 rows=962
width=4)(actual time=199.840..199.840 rows=0 loops=1)                            ->  Index Scan using
ix_user_url_user_id_url_idon user_url userurl1_  (cost=0.00..2795.24 rows=962 width=4) (actual time=0.048..193.707
rows=1666loops=1)                                  Index Cond: (user_id = 1)  Total runtime: 381119.201 ms  
 
(11 rows)     This is what the two tables look like (extra colums removed):
                                   Table "public.user_url_tag"    Column    |         Type          |
      Modifiers  
 
-------------+-----------------------+--------------------------------------------------------------  id          |
integer              | not null default nextval('public.user_url_tag_id_seq'::text)  user_url_id | integer
|  tag         | character varying(64) |  
 
Indexes:     "pk_user_url_tag_id" PRIMARY KEY, btree (id)     "ix_user_url_tag_tag" btree (tag)
"ix_user_url_tag_user_url_id"btree (user_url_id)  
 
Foreign-key constraints:     "fk_user_url_tag_user_url_id" FOREIGN KEY (user_url_id) REFERENCES user_url(id)
                               Table "public.user_url"       Column      |            Type             |
       Modifiers  
 
------------------+-----------------------------+----------------------------------------------------------  id
     | integer                     | not null default nextval('public.user_url_id_seq'::text)  user_id          |
integer                    |  url_id           | integer                     |  
 
Indexes:     "pk_user_url_id" PRIMARY KEY, btree (id)     "ix_user_url_url_id_user_id" UNIQUE, btree (url_id, user_id)
  "ix_user_url_user_id_url_id" UNIQUE, btree (user_id, url_id)     Does anyone see a way to speed up this s-l-o-w
query?
I cache DB results, but I'd love to get rid of that sequential scan.

Thanks,
Otis





pgsql-sql by date:

Previous
From: Markus Schaber
Date:
Subject: Re: Multi-column index not used, new flipped column index is
Next
From: Andreas Kretschmer
Date:
Subject: Re: Multi-column index not used, new flipped column index is