Re: Lot'sa joins - performance tip-up, please? - Mailing list pgsql-performance

From Mario Splivalo
Subject Re: Lot'sa joins - performance tip-up, please?
Date
Msg-id 1146753957.8538.30.camel@localhost.localdomain
In response to Re: Lot'sa joins - performance tip-up, please?  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-performance
On Wed, 2006-05-03 at 13:58 -0400, Tom Lane wrote:
> Mario Splivalo <msplival@jagor.srce.hr> writes:
> > I have a quite large query that takes over a minute to run on my laptop.
>
> The EXPLAIN output you provided doesn't seem to agree with the stated
> query.  Where'd the "service_id = 1102" condition come from?

I guess I copy-pasted the additional WHERE into the EXPLAIN ANALYZE query.
This is the correct one, without the WHERE:

 Hash Left Join  (cost=198628.35..202770.61 rows=121 width=264) (actual time=998008.264..999645.322 rows=5706 loops=1)
   Hash Cond: ("outer".message_id = "inner".message_id)
   ->  Merge Left Join  (cost=21943.23..21950.96 rows=121 width=238) (actual time=4375.510..4540.772 rows=5706 loops=1)
         Merge Cond: ("outer".message_id = "inner".message_id)
         ->  Sort  (cost=21847.62..21847.92 rows=121 width=230) (actual time=3304.787..3378.515 rows=5706 loops=1)
               Sort Key: messages.id
               ->  Hash Join  (cost=20250.16..21843.43 rows=121 width=230) (actual time=1617.370..3102.470 rows=5706 loops=1)
                     Hash Cond: ("outer".message_id = "inner".id)
                     ->  Seq Scan on ticketing_messages  (cost=0.00..1212.37 rows=75937 width=14) (actual time=10.554..609.967 rows=75937 loops=1)
                     ->  Hash  (cost=20244.19..20244.19 rows=2391 width=216) (actual time=1572.889..1572.889 rows=5706 loops=1)
                           ->  Nested Loop  (cost=1519.21..20244.19 rows=2391 width=216) (actual time=385.582..1449.207 rows=5706 loops=1)
                                 ->  Seq Scan on services  (cost=0.00..4.20 rows=3 width=54) (actual time=20.829..20.859 rows=2 loops=1)
                                       Filter: (type_id = 10)
                                 ->  Bitmap Heap Scan on messages  (cost=1519.21..6726.74 rows=1594 width=162) (actual time=182.346..678.800 rows=2853 loops=2)
                                       Recheck Cond: (("outer".id = messages.service_id) AND (messages.receiving_time >= '2006-02-12 00:00:00+01'::timestamp with time zone) AND (messages.receiving_time <= '2006-03-18 23:00:00+01'::timestamp with time zone))
                                       ->  BitmapAnd  (cost=1519.21..1519.21 rows=1594 width=0) (actual time=164.311..164.311 rows=0 loops=2)
                                             ->  Bitmap Index Scan on idx_service_id  (cost=0.00..84.10 rows=14599 width=0) (actual time=66.809..66.809 rows=37968 loops=2)
                                                   Index Cond: ("outer".id = messages.service_id)
                                             ->  Bitmap Index Scan on idx_messages_receiving_time  (cost=0.00..1434.87 rows=164144 width=0) (actual time=192.633..192.633 rows=184741 loops=1)
                                                   Index Cond: ((receiving_time >= '2006-02-12 00:00:00+01'::timestamp with time zone) AND (receiving_time <= '2006-03-18 23:00:00+01'::timestamp with time zone))
         ->  Sort  (cost=95.62..99.17 rows=1421 width=8) (actual time=1070.678..1072.999 rows=482 loops=1)
               Sort Key: ticketing_winners.message_id
               ->  Seq Scan on ticketing_winners  (cost=0.00..21.21 rows=1421 width=8) (actual time=424.836..1061.834 rows=1421 loops=1)
   ->  Hash  (cost=176144.30..176144.30 rows=57925 width=26) (actual time=993592.980..993592.980 rows=57925 loops=1)
         ->  Nested Loop  (cost=0.00..176144.30 rows=57925 width=26) (actual time=1074.984..992536.243 rows=57925 loops=1)
               ->  Seq Scan on ticketing_codes_played  (cost=0.00..863.25 rows=57925 width=8) (actual time=74.479..2047.993 rows=57925 loops=1)
               ->  Index Scan using ticketing_codes_pk on ticketing_codes  (cost=0.00..3.01 rows=1 width=18) (actual time=17.044..17.052 rows=1 loops=57925)
                     Index Cond: (ticketing_codes.code_id = "outer".code_id)
 Total runtime: 999778.981 ms


> In general, I'd suggest playing around with the join order.  Existing
> releases of PG tend to throw up their hands when faced with a mixture of
> outer joins and regular joins, and just join the tables in the order
> listed.  8.2 will be smarter about this, but for now you have to do it
> by hand ...
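
I tried reordering the joins by hand, roughly along these lines (schematic
only -- the table names are the ones from the plan above; the join conditions
and column list are simplified guesses, not my actual query):

    -- Make the planner stick to the join order as written
    SET join_collapse_limit = 1;

    SELECT tm.message_id
      FROM services s
      JOIN messages m            ON m.service_id = s.id
                                AND m.receiving_time >= '2006-02-12 00:00:00+01'
                                AND m.receiving_time <= '2006-03-18 23:00:00+01'
      JOIN ticketing_messages tm ON tm.message_id = m.id
      LEFT JOIN ticketing_winners tw ON tw.message_id = tm.message_id
      LEFT JOIN (SELECT tcp.message_id, tc.code_id
                   FROM ticketing_codes_played tcp
                   JOIN ticketing_codes tc ON tc.code_id = tcp.code_id) pc
             ON pc.message_id = tm.message_id
     WHERE s.type_id = 10;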

No luck for me there. But I found out that if I first join ticketing_codes
and ticketing_codes_played, put the result into a temporary table, and then
join that temporary table with the rest of the query (the SELECT that was in
parentheses is transferred to a temporary table), the query is almost twice
as fast.
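
In outline, the temp-table version looks something like this (again only a
sketch -- the real query pulls more columns, and the code column names are
whatever ticketing_codes actually has):

    -- Materialize the ticketing_codes / ticketing_codes_played join up front,
    -- so the rest of the query only deals with a small, pre-joined table.
    CREATE TEMPORARY TABLE tmp_played_codes AS
        SELECT tcp.message_id, tc.code_id   -- plus the code columns the outer query needs
          FROM ticketing_codes_played tcp
          JOIN ticketing_codes tc ON tc.code_id = tcp.code_id;

    ANALYZE tmp_played_codes;   -- give the planner real statistics for the temp table

    -- The main query then LEFT JOINs the temp table where the parenthesized
    -- subselect used to be, e.g.:
    --     LEFT JOIN tmp_played_codes pc ON pc.message_id = tm.message_id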

As mentioned before, ticketing_codes has 11,000,000 records.

    Mario

P.S. Is it just me, or is posting to pgsql-performance lagged quite a bit?

