Thread: join optimization problem

join optimization problem

From:
Toby Tremayne
Date:

Hi all,
I'm trying to optimize what I had thought was a simple enough query, but I'm
not having much success.  It's from a forum-based application, and the query
in question operates on two tables, fbof_msg (8563 rows) and
fbof_thread (1537 rows), and looks like this:

select    t.thread_id,
          t.forum_id,
          t.thread_title,
          t.thread_owner,
          t.thread_owner_id,
          t.date_created,
          t.thread_hits,
          t.poem_reference,
          t.bArchived,
          count(m.msg_id) as msgcount,
          max(m.date_created) as lastpost
from      fbof_thread t
LEFT OUTER JOIN fbof_msg m ON m.thread_id = t.thread_id
where     t.forum_id = 1
and       t.bArchived = 0
and       t.bSticky = 0
group by  t.thread_id,
          t.thread_title,
          t.thread_owner,
          t.thread_owner_id,
          t.date_created,
          t.thread_hits,
          t.forum_id,
          t.poem_reference,
          t.bArchived,
          t.bSticky
order by  t.date_created desc

the explain data I'm currently getting is this:

Sort  (cost=1660930.18..1660946.63 rows=6581 width=568)
  Sort Key: max(m.date_created)
  ->  Aggregate  (cost=0.00..1659452.99 rows=6581 width=568)
        ->  Group  (cost=0.00..1659123.95 rows=65807 width=568)
              ->  Nested Loop  (cost=0.00..1657643.30 rows=65807 width=568)
                    Join Filter: (("outer".thread_id)::double precision = "inner".thread_id)
                    ->  Index Scan using fbof_group_idx on fbof_thread t  (cost=0.00..642.03 rows=1537 width=548)
                          Filter: ((forum_id = 1::double precision) AND (barchived = 0) AND (bsticky = 0))
                    ->  Seq Scan on fbof_msg m  (cost=0.00..949.63 rows=8563 width=20)
(9 rows)


The index it's using is the only one I've been able to get it to use; it
looks like this (\d fbof_group_idx):

Index "public.fbof_group_idx"    Column      |       Type
- -----------------+------------------thread_id       | integerthread_title    | character(255)thread_owner    |
character(255)thread_owner_id| integerdate_created    | datethread_hits     | integerforum_id        | double
precisionpoem_reference | integerbarchived       | smallint 
btree, for table "public.fbof_thread"


can anyone point out where I'm going wrong here?  I can't seem to make
it any faster for the life of me....  I've tried adding indices on all the main
fields etc. but nada.  I'm not currently subscribed to the list, so please reply
to my address as well as to the list.

cheers,
Toby
--

--------------------------------
 Life is poetry - write it in your own words

--------------------------------

Toby Tremayne
Code Poet and Zen Master of the Heavy Sleep
Senior Technical Consultant
Lyricist Software
www.lyricist.com.au
+61 416 048 090
ICQ: 13107913





Re: join optimization problem

From:
Tom Lane
Date:
Toby Tremayne <toby@lyricist.com.au> writes:
> the explain data I'm currently getting is this:

"explain analyze" would be much more useful.

Also, I assume you've vacuum analyzed these tables recently?
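
Concretely, a sketch using the table names from the original post:

    -- refresh the planner's statistics on both tables first
    vacuum analyze fbof_thread;
    vacuum analyze fbof_msg;

then re-run the query with EXPLAIN ANALYZE prefixed to it, which reports
actual row counts and timings alongside the planner's estimates.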
        regards, tom lane


Re: join optimization problem

From:
Rod Taylor
Date:
How about this? It should spit out the same number, but be quite a bit
quicker at it.  Untested of course...


select  t.thread_id,
        t.forum_id,
        t.thread_title,
        t.thread_owner,
        t.thread_owner_id,
        t.date_created,
        t.thread_hits,
        t.poem_reference,
        t.bArchived,
        count, maxdate
  from  fbof_thread t
  LEFT OUTER JOIN
        (select thread_id
              , count(msg_id) as count
              , max(date_created) as maxdate
           from msg
       group by thread_id
       ) as tab ON m.thread_id = t.thread_id
 where  t.forum_id = 1
   and  t.bArchived = 0
   and  t.bSticky = 0
 order by t.date_created desc
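
(As posted, the rewrite references the alias m where the derived table is
named tab, and a table msg where the original schema has fbof_msg; a
corrected, still-untested sketch, reusing the original query's column
aliases:)

select  t.thread_id,
        t.forum_id,
        t.thread_title,
        t.thread_owner,
        t.thread_owner_id,
        t.date_created,
        t.thread_hits,
        t.poem_reference,
        t.bArchived,
        tab.msgcount,
        tab.lastpost
  from  fbof_thread t
  LEFT OUTER JOIN
        (select thread_id,
                count(msg_id)     as msgcount,
                max(date_created) as lastpost
           from fbof_msg
          group by thread_id
        ) as tab ON tab.thread_id = t.thread_id
 where  t.forum_id = 1
   and  t.bArchived = 0
   and  t.bSticky = 0
 order by t.date_created desc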
> can anyone point out to me where I'm going wrong here?  I can't seem to make
> it faster for the life of me....  I've tried adding indices on all the main
> fields etc but nada.  I'm not subscribed the list currently so please reply
> to my address as well as the list.


Re: join optimization problem

From:
Rod Taylor
Date:
>     thanks for that - I tried it out, fixed the minor typos and it ran at almost
> the same speed!  Any other ideas?

Could you post an explain analyze for both queries?


There is always the option of an application change.  Add a trigger onto msg
that bumps the date and count on msg_sum when a msg is added.  This will
support virtually unlimited msg table growth (any aggregate will
eventually be slow on a large table).  A sketch follows below.

Just be sure to vacuum the msg_sum table regularly (once per 25%
turnover) -- possibly as often as once a minute!
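
A minimal sketch of that approach (untested; msg_sum and its columns are
hypothetical names, since no such table exists in the posted schema):

create table msg_sum (
    thread_id  integer primary key,
    msg_count  integer not null default 0,
    last_post  date
);

create function bump_msg_sum() returns trigger as '
begin
    -- keep the per-thread message count and latest-post date current
    update msg_sum
       set msg_count = msg_count + 1,
           last_post = new.date_created
     where thread_id = new.thread_id;
    return new;
end;
' language plpgsql;

create trigger msg_sum_bump
    after insert on fbof_msg
    for each row execute procedure bump_msg_sum();

Each thread's msg_sum row would need to be seeded when the thread is created
(a similar trigger on fbof_thread would do), and deletes handled symmetrically
if messages can be removed.  The forum query then joins fbof_thread to msg_sum
instead of aggregating over fbof_msg.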



Re: join optimization problem

From:
Toby Tremayne
Date:

Hi Tom,
yes, I re-run vacuum analyze every time I muck about with the tables/indexes.
The output of explain analyze is this:

Sort  (cost=1660930.18..1660946.63 rows=6581 width=568) (actual time=545668.58..545675.05 rows=1537 loops=1)
  Sort Key: max(m.date_created)
  ->  Aggregate  (cost=0.00..1659452.99 rows=6581 width=568) (actual time=614.83..544598.17 rows=1537 loops=1)
        ->  Group  (cost=0.00..1659123.95 rows=65807 width=568) (actual time=539.87..543988.21 rows=1537 loops=1)
              ->  Nested Loop  (cost=0.00..1657643.30 rows=65807 width=568) (actual time=539.82..543404.25 rows=1537 loops=1)
                    Join Filter: ("inner".thread_id = ("outer".thread_id)::double precision)
                    ->  Index Scan using fbof_group_idx on fbof_thread t  (cost=0.00..642.03 rows=1537 width=548) (actual time=29.27..1043.40 rows=1537 loops=1)
                          Filter: ((forum_id = 1::double precision) AND (barchived = 0) AND (bsticky = 0))
                    ->  Seq Scan on fbof_msg m  (cost=0.00..949.63 rows=8563 width=20) (actual time=0.08..294.28 rows=8563 loops=1537)
Total runtime: 545763.83 msec
(10 rows)

any input at all would be great.
cheers,
Toby


On Friday 01 August 2003 10:36, Tom Lane wrote:
> Toby Tremayne <toby@lyricist.com.au> writes:
> > the explain data I'm currently getting is this:
>
> "explain analyze" would be much more useful.
>
> Also, I assume you've vacuum analyzed these tables recently?
>
>             regards, tom lane






Re: join optimization problem

From:
Toby Tremayne
Date:

Hi Rod,
thanks for that - I tried it out, fixed the minor typos and it ran at almost
the same speed!  Any other ideas?

cheers,
Toby

On Friday 01 August 2003 10:33, Rod Taylor wrote:
> How about this? It should spit out the same number, but be quite a bit
> quicker at it.  Untested of course...
>
>
> select  t.thread_id,
>                 t.forum_id,
>                 t.thread_title,
>                 t.thread_owner,
>                 t.thread_owner_id,
>                 t.date_created,
>                 t.thread_hits,
>                 t.poem_reference,
>                 t.bArchived,
>                 count, maxdate
>    from  fbof_thread t
>    LEFT OUTER JOIN
>          (select thread_id
>                , count(msg_id) as count
>                , max(date_created) as maxdate
>             from msg
>         group by thread_id
>         ) as tab ON m.thread_id = t.thread_id
>       where   t.forum_id = 1
>         and             t.bArchived = 0
>         and             t.bSticky = 0
>    order by        t.date_created desc
>
> > can anyone point out to me where I'm going wrong here?  I can't seem to
> > make it faster for the life of me....  I've tried adding indices on all
> > the main fields etc but nada.  I'm not subscribed the list currently so
> > please reply to my address as well as the list.






Re: join optimization problem

From:
Tom Lane
Date:
Toby Tremayne <toby@lyricist.com.au> writes:
> The output of explain analyze is this:

>     ->  Nested Loop  (cost=0.00..1657643.30 rows=65807 width=568)
>         (actual time=539.82..543404.25 rows=1537 loops=1)
>           Join Filter: ("inner".thread_id = ("outer".thread_id)::double precision)

Hm.  It would appear that you've been inconsistent about the datatype of
thread_id.  Current PG releases are not capable of using anything smarter
than a nested-loop join when there is a forced datatype coercion
involved in the join condition.  Try making your column datatypes the
same.
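
Per the plan above, it is fbof_msg.thread_id that appears to be double
precision (the index definition shows fbof_thread.thread_id as integer), so,
as a sketch, assuming \d fbof_msg confirms that:

-- PostgreSQL 8.0 and later can convert the column in place:
alter table fbof_msg alter column thread_id type integer;

-- On 7.x releases (current when this thread was written), rebuild it by hand:
alter table fbof_msg add column thread_id_int integer;
update fbof_msg set thread_id_int = thread_id::integer;
alter table fbof_msg drop column thread_id;
alter table fbof_msg rename column thread_id_int to thread_id;

(forum_id, which the same plan shows being compared as double precision,
could be converted the same way.)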

Also, do you really need a LEFT JOIN, or would plain inner join do?
It's hard for me to imagine a zero-message thread.  If you do have such
things, do you care whether this query finds them?  (I'm not sure that
eliminating the LEFT would really help any, but it couldn't hurt, since
LEFT restricts the planner's choices.)
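
For comparison, the plain-join form of the original query, identical apart
from the join keyword, and equivalent only if every thread has at least one
message:

select  t.thread_id,
        t.forum_id,
        t.thread_title,
        t.thread_owner,
        t.thread_owner_id,
        t.date_created,
        t.thread_hits,
        t.poem_reference,
        t.bArchived,
        count(m.msg_id) as msgcount,
        max(m.date_created) as lastpost
  from  fbof_thread t
  JOIN  fbof_msg m ON m.thread_id = t.thread_id
 where  t.forum_id = 1
   and  t.bArchived = 0
   and  t.bSticky = 0
 group by t.thread_id, t.thread_title, t.thread_owner, t.thread_owner_id,
          t.date_created, t.thread_hits, t.forum_id, t.poem_reference,
          t.bArchived, t.bSticky
 order by t.date_created desc

Once the join-column datatypes match, this also leaves the planner free to
choose a merge or hash join instead of the nested loop.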
        regards, tom lane