Re: slow joins? - Mailing list pgsql-performance

From Greg Williamson
Subject Re: slow joins?
Date
Msg-id 1365219176.2155.YahooMailNeo@web125904.mail.ne1.yahoo.com
In response to Re: slow joins?  (Joe Van Dyk <joe@tanga.com>)
List pgsql-performance

Joe --

>________________________________
> From: Joe Van Dyk <joe@tanga.com>
>To: Greg Williamson <gwilliamson39@yahoo.com>
>Cc: "pgsql-performance@postgresql.org" <pgsql-performance@postgresql.org>
>Sent: Friday, April 5, 2013 7:56 PM
>Subject: Re: [PERFORM] slow joins?
>
>
>On Fri, Apr 5, 2013 at 6:54 PM, Greg Williamson <gwilliamson39@yahoo.com> wrote:
>
>Joe --
>>
>>>________________________________
>>> From: Joe Van Dyk <joe@tanga.com>
>>>To: pgsql-performance@postgresql.org
>>>Sent: Friday, April 5, 2013 6:42 PM
>>>Subject: Re: [PERFORM] slow joins?
>>
>>>
>>>
>>>(https://gist.github.com/joevandyk/df0df703f3fda6d14ae1/raw/c15cae813913b7f8c35b24b467a0c732c0100d79/gistfile1.txt shows a non-wrapped version of the queries and plan)
>>>
>>>
>>>
>>>
>>>On Fri, Apr 5, 2013 at 6:38 PM, Joe Van Dyk <joe@tanga.com> wrote:
>>>
>>>On 9.2.4, running two identical queries except for the value of a column in the WHERE clause. Postgres is picking very different query plans; the first is much slower than the second.
>>>>
>>>>
>>>>Any ideas on how I can speed this up?  I have btree indexes for all the columns used in the query.
>>>>
>>>>explain analyze                                                                                    
>>>>SELECT COUNT(*)                                                                                    
>>>>FROM purchased_items pi                                                                            
>>>>inner join line_items li on li.id = pi.line_item_id                                                
>>>>inner join products      on products.id = li.product_id                                            
>>>>WHERE products.drop_shipper_id = 221;
>>>>
>>>> Aggregate  (cost=193356.31..193356.32 rows=1 width=0) (actual time=2425.225..2425.225 rows=1 loops=1)
>>>>   ->  Hash Join  (cost=78864.43..193160.41 rows=78360 width=0) (actual time=726.612..2424.206 rows=8413 loops=1)
>>>>         Hash Cond: (pi.line_item_id = li.id)
>>>>         ->  Seq Scan on purchased_items pi  (cost=0.00..60912.39 rows=3724639 width=4) (actual time=0.008..616.812 rows=3724639 loops=1)
>>>>         ->  Hash  (cost=77937.19..77937.19 rows=56499 width=4) (actual time=726.231..726.231 rows=8178 loops=1)
>>>>               Buckets: 4096  Batches: 4  Memory Usage: 73kB
>>>>               ->  Hash Join  (cost=1684.33..77937.19 rows=56499 width=4) (actual time=1.270..723.222 rows=8178 loops=1)
>>>>                     Hash Cond: (li.product_id = products.id)
>>>>                     ->  Seq Scan on line_items li  (cost=0.00..65617.18 rows=2685518 width=8) (actual time=0.081..392.926 rows=2685499 loops=1)
>>>>                     ->  Hash  (cost=1676.60..1676.60 rows=618 width=4) (actual time=0.835..0.835 rows=618 loops=1)
>>>>                           Buckets: 1024  Batches: 1  Memory Usage: 22kB
>>>>                           ->  Bitmap Heap Scan on products  (cost=13.07..1676.60 rows=618 width=4) (actual time=0.185..0.752 rows=618 loops=1)
>>>>                                 Recheck Cond: (drop_shipper_id = 221)
>>>>                                 ->  Bitmap Index Scan on index_products_on_drop_shipper_id  (cost=0.00..12.92 rows=618 width=0) (actual time=0.125..0.125 rows=618 loops=1)
>>>>                                       Index Cond: (drop_shipper_id = 221)
>>>> Total runtime: 2425.302 ms
>>>>
>>>>
>>>>explain analyze                                                                                    
>>>>SELECT COUNT(*)                                                                                    
>>>>FROM purchased_items pi                                                                            
>>>>inner join line_items li on li.id = pi.line_item_id                                                
>>>>inner join products      on products.id = li.product_id                                            
>>>>WHERE products.drop_shipper_id = 2;                                                                
>>>>
>>>>
>>>> Aggregate  (cost=29260.40..29260.41 rows=1 width=0) (actual time=0.906..0.906 rows=1 loops=1)
>>>>   ->  Nested Loop  (cost=0.00..29254.38 rows=2409 width=0) (actual time=0.029..0.877 rows=172 loops=1)
>>>>         ->  Nested Loop  (cost=0.00..16011.70 rows=1737 width=4) (actual time=0.021..0.383 rows=167 loops=1)
>>>>               ->  Index Scan using index_products_on_drop_shipper_id on products  (cost=0.00..80.41 rows=19 width=4) (actual time=0.010..0.074 rows=70 loops=1)
>>>>                     Index Cond: (drop_shipper_id = 2)
>>>>               ->  Index Scan using index_line_items_on_product_id on line_items li  (cost=0.00..835.70 rows=279 width=8) (actual time=0.002..0.004 rows=2 loops=70)
>>>>                     Index Cond: (product_id = products.id)
>>>>         ->  Index Only Scan using purchased_items_line_item_id_idx on purchased_items pi  (cost=0.00..7.60 rows=2 width=4) (actual time=0.002..0.003 rows=1 loops=167)
>>>>               Index Cond: (line_item_id = li.id)
>>>>               Heap Fetches: 5
>>>> Total runtime: 0.955 ms
>>>>(11 rows)
>>>>
>>>
>>
>>
>>Does drop_shipper_id have a much larger number of matching rows, which is making the planner want to avoid an index scan or otherwise prefer a sequential scan on products and on line_items?
>>
>
>
>Assuming you mean products.drop_shipper_id? There are more rows matched for the first one vs the second one:
>70 product rows match drop_shipper_id = 2, and 618 match drop_shipper_id = 221.
> 
>What are the stats settings for these tables?
>>
>
>
>Whatever the defaults are.
> 

I mis-pasted the tables -- both line_items and purchased_items are getting sequential scans for the relevant rows; it is possible that there's enough difference in the matching row counts to tip the planner toward the sequential scans.
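
(Since the defaults came up above: to see what the planner currently has to work with, something along these lines should show the per-column statistics targets. This is just an illustrative query, not something from the thread; an attstattarget of -1 means the column falls back to default_statistics_target, which is 100 unless it has been changed. Table names are taken from the plans above.)

  SHOW default_statistics_target;

  -- per-column statistics targets for the three tables in the query
  SELECT attrelid::regclass AS table_name, attname AS column_name, attstattarget
  FROM pg_attribute
  WHERE attrelid IN ('products'::regclass, 'line_items'::regclass, 'purchased_items'::regclass)
    AND attnum > 0
    AND NOT attisdropped;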

You might try increasing the stats being collected on those two tables, run ANALYZE on all the tables in the query, and try it again.
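
(A rough, untested sketch of what that could look like -- the target of 500 is just an arbitrary bump over the default of 100, and the columns are the join/filter columns from the plans above:)

  ALTER TABLE products        ALTER COLUMN drop_shipper_id SET STATISTICS 500;
  ALTER TABLE line_items      ALTER COLUMN product_id      SET STATISTICS 500;
  ALTER TABLE purchased_items ALTER COLUMN line_item_id    SET STATISTICS 500;

  -- re-gather statistics so the new targets take effect
  ANALYZE products;
  ANALYZE line_items;
  ANALYZE purchased_items;

The first plan's hash join estimated 78360 rows but produced 8413, so tighter estimates may be enough to change the plan choice.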

GW


