Re: Help with a seq scan on multi-million row table - Mailing list pgsql-sql
From | |
---|---|
Subject | Re: Help with a seq scan on multi-million row table |
Date | |
Msg-id | 20060511013410.83818.qmail@web50307.mail.yahoo.com Whole thread Raw |
In response to | Re: Help with a seq scan on multi-million row table (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses |
Re: Help with a seq scan on multi-million row table
|
List | pgsql-sql |
Aha! set hashjoin=off did the trick. The PG version is: 8.0.3 NB: I removed that redundant "DISTINCT" after the SELECT. EXPLAIN ANALYZE select userurltag0_.tag as x0_0_, COUNT(*) as x1_0_ from user_url_tag userurltag0_, user_url userurl1_ where(((userurl1_.user_id=1 ))AND((userurltag0_.user_url_id=userurl1_.id ))) group by userurltag0_.tag order by count(*)DESC; QUERY PLAN ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------Sort (cost=155766.79..155774.81 rows=3207 width=10) (actual time=2387.756..2396.578 rows=2546 loops=1) Sort Key: count(*) -> HashAggregate (cost=155572.02..155580.03 rows=3207 width=10) (actual time=2365.643..2376.626 rows=2546 loops=1) -> Nested Loop (cost=0.00..155552.68 rows=3867 width=10) (actual time=0.135..2222.028 rows=8544 loops=1) -> Index Scan using ix_user_url_user_id_url_id on user_url userurl1_ (cost=0.00..2798.12 rows=963 width=4) (actual time=0.067..9.744rows=1666 loops=1) Index Cond: (user_id = 1) -> Index Scan using ix_user_url_tag_user_url_idon user_url_tag userurltag0_ (cost=0.00..157.34 rows=103 width=14) (actual time=1.223..1.281rows=5 loops=1666) Index Cond: (userurltag0_.user_url_id = "outer".id)Total runtime:2405.691 ms (9 rows) Are you still interested in other "its second-choice join type"? If you are, please tell me what join types those are, thisis a bit beyond me. :( Is there a way to force PG to use the index automatically? This query is executed from something called Hibernate, and I'mnot sure if that will let me set enable_hashjoin=off through its API... Thanks, Otis ----- Original Message ---- From: Tom Lane <tgl@sss.pgh.pa.us> To: ogjunk-pgjedan@yahoo.com Cc: pgsql-sql@postgresql.org Sent: Wednesday, May 10, 2006 8:27:01 PM Subject: Re: [SQL] Help with a seq scan on multi-million row table <ogjunk-pgjedan@yahoo.com> writes: > -> Hash Join (cost=2797.65..140758.50 rows=3790 width=10) (actual time=248.530..380635.132 rows=8544 loops=1) > Hash Cond: ("outer".user_url_id = "inner".id) > -> Seq Scan on user_url_tag userurltag0_ (cost=0.00..106650.30 rows=6254530 width=14) (actual time=0.017..212256.630rows=6259553 loops=1) > -> Hash (cost=2795.24..2795.24 rows=962 width=4) (actual time=199.840..199.840 rows=0 loops=1) > -> Index Scan using ix_user_url_user_id_url_id on user_url userurl1_ (cost=0.00..2795.24 rows=962width=4) (actual time=0.048..193.707 rows=1666 loops=1) > Index Cond: (user_id = 1) Hm, I'm not sure why it's choosing that join plan. A nestloop indexscan wouldn't be terribly cheap, but just counting on my fingers it seems like it ought to come in at less than 100000 cost units. What do you get if you set enable_hashjoin off? (Then try disabling its second-choice join type too --- I'm interested to see EXPLAIN ANALYZE output for all three join types.) What PG version is this exactly? regards, tom lane ---------------------------(end of broadcast)--------------------------- TIP 6: explain analyze is your friend