Thread: join optimization problem
Hi all,

I'm trying to optimize what I had thought was a simple enough query, but I'm not having much success. It's from a forum-based application, and the query in question operates on two tables - fbof_msg (8563 rows) and fbof_thread (1537 rows) - and it looks like this:

select t.thread_id,
       t.forum_id,
       t.thread_title,
       t.thread_owner,
       t.thread_owner_id,
       t.date_created,
       t.thread_hits,
       t.poem_reference,
       t.bArchived,
       count(m.msg_id) as msgcount,
       max(m.date_created) as lastpost
from fbof_thread t
LEFT OUTER JOIN fbof_msg m ON m.thread_id = t.thread_id
where t.forum_id = 1
  and t.bArchived = 0
  and t.bSticky = 0
group by t.thread_id, t.thread_title, t.thread_owner, t.thread_owner_id,
         t.date_created, t.thread_hits, t.forum_id, t.poem_reference,
         t.bArchived, t.bSticky
order by t.date_created desc

The explain data I'm currently getting is this:

Sort  (cost=1660930.18..1660946.63 rows=6581 width=568)
  Sort Key: max(m.date_created)
  ->  Aggregate  (cost=0.00..1659452.99 rows=6581 width=568)
        ->  Group  (cost=0.00..1659123.95 rows=65807 width=568)
              ->  Nested Loop  (cost=0.00..1657643.30 rows=65807 width=568)
                    Join Filter: (("outer".thread_id)::double precision = "inner".thread_id)
                    ->  Index Scan using fbof_group_idx on fbof_thread t  (cost=0.00..642.03 rows=1537 width=548)
                          Filter: ((forum_id = 1::double precision) AND (barchived = 0) AND (bsticky = 0))
                    ->  Seq Scan on fbof_msg m  (cost=0.00..949.63 rows=8563 width=20)
(9 rows)

The index it's using is the only one I've been able to get it to use, and it looks like this (\d fbof_group_idx):

Index "public.fbof_group_idx"
     Column      |       Type
-----------------+------------------
 thread_id       | integer
 thread_title    | character(255)
 thread_owner    | character(255)
 thread_owner_id | integer
 date_created    | date
 thread_hits     | integer
 forum_id        | double precision
 poem_reference  | integer
 barchived       | smallint
btree, for table "public.fbof_thread"

Can anyone point out to me where I'm going wrong here? I can't seem to make it faster for the life of me... I've tried adding indices on all the main fields, but nada. I'm not subscribed to the list currently, so please reply to my address as well as the list.

cheers,
Toby
Toby Tremayne <toby@lyricist.com.au> writes:
> the explain data I'm currently getting is this:

"explain analyze" would be much more useful.

Also, I assume you've vacuum analyzed these tables recently?

			regards, tom lane
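For anyone following along, the two diagnostics Tom is asking about are plain PostgreSQL commands; the table names below are taken from the thread, and the count(*) is just a stand-in for the full forum query:

-- Refresh the planner's statistics for both tables
VACUUM ANALYZE fbof_thread;
VACUUM ANALYZE fbof_msg;

-- EXPLAIN ANALYZE actually runs the statement and reports real row
-- counts and timings alongside the planner's estimates
EXPLAIN ANALYZE SELECT count(*) FROM fbof_msg;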
How about this? It should spit out the same numbers, but be quite a bit quicker at it. Untested, of course...

select t.thread_id,
       t.forum_id,
       t.thread_title,
       t.thread_owner,
       t.thread_owner_id,
       t.date_created,
       t.thread_hits,
       t.poem_reference,
       t.bArchived,
       tab.msgcount,
       tab.maxdate
from fbof_thread t
LEFT OUTER JOIN
   (select thread_id,
           count(msg_id) as msgcount,
           max(date_created) as maxdate
    from fbof_msg
    group by thread_id
   ) as tab ON tab.thread_id = t.thread_id
where t.forum_id = 1
  and t.bArchived = 0
  and t.bSticky = 0
order by t.date_created desc

> can anyone point out to me where I'm going wrong here? I can't seem to make
> it faster for the life of me... I've tried adding indices on all the main
> fields, but nada. I'm not subscribed to the list currently, so please reply
> to my address as well as the list.
> thanks for that - I tried it out, fixed the minor typos and it ran at almost
> the same speed! Any other ideas?

Could you post an explain analyze for both queries?

There is always an application change: add a trigger onto fbof_msg that bumps the date and count on a summary table (msg_sum) whenever a message is added. This will support virtually unlimited msg table growth (any aggregate will eventually be slow on a large table). Just be sure to vacuum the msg_sum table regularly (once per 25% turnover) -- possibly as often as once a minute!
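For concreteness, here is a minimal sketch of the summary-table approach Rod describes. The thread never shows msg_sum, so its columns and the function and trigger names here are assumptions, and the dollar-quoting syntax postdates the 2003-era releases being discussed:

-- Hypothetical summary table: one row of pre-computed aggregates per thread
CREATE TABLE msg_sum (
    thread_id integer PRIMARY KEY,
    msgcount  integer NOT NULL DEFAULT 0,
    lastpost  date
);

CREATE FUNCTION bump_msg_sum() RETURNS trigger AS $$
BEGIN
    -- Bump the per-thread message count and last-post date
    UPDATE msg_sum
       SET msgcount = msgcount + 1,
           lastpost = NEW.date_created
     WHERE thread_id = NEW.thread_id;
    -- First message in a thread: create its summary row instead
    IF NOT FOUND THEN
        INSERT INTO msg_sum (thread_id, msgcount, lastpost)
        VALUES (NEW.thread_id, 1, NEW.date_created);
    END IF;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER msg_sum_bump
    AFTER INSERT ON fbof_msg
    FOR EACH ROW EXECUTE PROCEDURE bump_msg_sum();

The forum query would then join fbof_thread to msg_sum on thread_id and read the pre-computed count and date, instead of aggregating over fbof_msg on every page view.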
Hi Tom,

yes, I re-run vacuum analyze every time I muck about with the tables/indexes. The output of explain analyze is this:

Sort  (cost=1660930.18..1660946.63 rows=6581 width=568) (actual time=545668.58..545675.05 rows=1537 loops=1)
  Sort Key: max(m.date_created)
  ->  Aggregate  (cost=0.00..1659452.99 rows=6581 width=568) (actual time=614.83..544598.17 rows=1537 loops=1)
        ->  Group  (cost=0.00..1659123.95 rows=65807 width=568) (actual time=539.87..543988.21 rows=1537 loops=1)
              ->  Nested Loop  (cost=0.00..1657643.30 rows=65807 width=568) (actual time=539.82..543404.25 rows=1537 loops=1)
                    Join Filter: ("inner".thread_id = ("outer".thread_id)::double precision)
                    ->  Index Scan using fbof_group_idx on fbof_thread t  (cost=0.00..642.03 rows=1537 width=548) (actual time=29.27..1043.40 rows=1537 loops=1)
                          Filter: ((forum_id = 1::double precision) AND (barchived = 0) AND (bsticky = 0))
                    ->  Seq Scan on fbof_msg m  (cost=0.00..949.63 rows=8563 width=20) (actual time=0.08..294.28 rows=8563 loops=1537)
Total runtime: 545763.83 msec
(10 rows)

any input at all would be great.

cheers,
Toby

On Friday 01 August 2003 10:36, Tom Lane wrote:
> Toby Tremayne <toby@lyricist.com.au> writes:
> > the explain data I'm currently getting is this:
>
> "explain analyze" would be much more useful.
>
> Also, I assume you've vacuum analyzed these tables recently?
Hi Rod,

thanks for that - I tried it out, fixed the minor typos and it ran at almost the same speed! Any other ideas?

cheers,
Toby

On Friday 01 August 2003 10:33, Rod Taylor wrote:
> How about this? It should spit out the same numbers, but be quite a bit
> quicker at it. Untested, of course...
>
> [rewritten query snipped]
Toby Tremayne <toby@lyricist.com.au> writes:
> The output of explain analyze is this:
>               ->  Nested Loop  (cost=0.00..1657643.30 rows=65807 width=568)
>                     (actual time=539.82..543404.25 rows=1537 loops=1)
>                     Join Filter: ("inner".thread_id =
>                     ("outer".thread_id)::double precision)

Hm. It would appear that you've been inconsistent about the datatype of thread_id. Current PG releases are not capable of using anything smarter than a nested-loop join when there is a forced datatype coercion involved in the join condition. Try making your column datatypes the same.

Also, do you really need a LEFT JOIN, or would a plain inner join do? It's hard for me to imagine a zero-message thread. If you do have such things, do you care whether this query finds them? (I'm not sure that eliminating the LEFT would really help any, but it couldn't hurt, since LEFT restricts the planner's choices.)

			regards, tom lane
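A sketch of the datatype fix Tom suggests. From the join filter one can infer that fbof_msg.thread_id (and fbof_thread.forum_id) are double precision while fbof_thread.thread_id is integer, but the actual table definitions never appear in the thread, so treat the column list as an assumption. Note also that ALTER COLUMN ... TYPE arrived in PostgreSQL 8.0; on the 7.x releases current when this thread was written, you would add a new integer column, copy the data across, and drop the old one.

-- Bring both sides of the join (and the forum_id filter) to integer
ALTER TABLE fbof_msg
    ALTER COLUMN thread_id TYPE integer USING thread_id::integer;
ALTER TABLE fbof_thread
    ALTER COLUMN forum_id TYPE integer USING forum_id::integer;

-- With matching types, an index on the message table's join key gives the
-- planner alternatives to the forced nested loop over all 8563 rows
CREATE INDEX fbof_msg_thread_id_idx ON fbof_msg (thread_id);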