Re: Optimizer use of index slows down query by factor - Mailing list pgsql-performance

From Michael Ruf
Subject Re: Optimizer use of index slows down query by factor
Date
Msg-id 4B4591EC.6030709@inxmail.de
Whole thread Raw
In response to Re: Optimizer use of index slows down query by factor  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-performance
Hi,

Tom Lane wrote:
 >
 > I think you need to see about getting this rowcount estimate to be more
 > accurate:
 >
 >>                             ->  Index Scan using idx_link_1 on link
 >> (cost=0.00..680.51 rows=13477 width=26) (actual time=5.707..12.043
 >> rows=126 loops=1)
 >>                                   Index Cond: (task_id = 1556)
 >>                                   Filter: (((deletable IS NULL) OR (NOT
 >> deletable)) AND ((link_type = 8) OR (link_type = 9)))
 >
 > If it realized there'd be only 126 rows out of that scan, it'd probably
 > have gone for a nestloop join against the big table, which I think would
 > be noticeably faster than either of the plans you show here.
 >
 > You already did crank up default_statistics_target, so I'm not sure if
 > raising it further would help any.

After i've increased the statistic target for the specific column on the
link table "alter table link alter task_id set statistics 200;", the sql
runs fine ( < 1 second ):

Limit  (cost=448478.40..448492.17 rows=1 width=30) (actual
time=850.698..860.838 rows=12 loops=1)
    ->  GroupAggregate  (cost=448478.40..448492.17 rows=1 width=30)
(actual time=850.695..860.824 rows=12 loops=1)
          ->  Sort  (cost=448478.40..448481.15 rows=1100 width=30)
(actual time=850.569..853.985 rows=6445 loops=1)
                Sort Key: link.link_type, link.link_alias
                Sort Method:  quicksort  Memory: 696kB
                ->  Nested Loop Left Join  (cost=0.00..448422.84
rows=1100 width=30) (actual time=819.519..838.422 rows=6445 loops=1)
                      ->  Seq Scan on link  (cost=0.00..142722.52
rows=203 width=26) (actual time=819.486..820.016 rows=126 loops=1)
                            Filter: (((deletable IS NULL) OR (NOT
deletable)) AND (task_id = 1556) AND ((link_type = 8) OR (link_type = 9)))
                      ->  Index Scan using idx_click_1 on click
(cost=0.00..1370.01 rows=10872 width=12) (actual time=0.003..0.088
rows=51 loops=126)
                            Index Cond: (link.link_id = click.link_id)
  Total runtime: 860.929 ms


 > What I'd suggest is trying to avoid
 > using non-independent AND/OR conditions.  For instance recasting the
 > first OR as just "deletable is not true" would probably result in a
 > better estimate.  The size of the error seems to be more than that would
 > account for though, so I suspect that the deletable and link_type
 > conditions are interdependent.  Is it practical to recast your data
 > representation to avoid that?
 >

I've tried that, but with no positive/negative effects.

Thanks for your help.

Michael

pgsql-performance by date:

Previous
From: "Carlo Stonebanks"
Date:
Subject: Massive table (500M rows) update nightmare
Next
From: Scott Marlowe
Date:
Subject: Re: Massive table (500M rows) update nightmare