Re: Searching for the cause of a bad plan - Mailing list pgsql-performance

From Tom Lane
Subject Re: Searching for the cause of a bad plan
Date
Msg-id 27857.1190417434@sss.pgh.pa.us
Whole thread Raw
In response to Re: Searching for the cause of a bad plan  (Simon Riggs <simon@2ndquadrant.com>)
Responses Re: Searching for the cause of a bad plan
List pgsql-performance
Simon Riggs <simon@2ndquadrant.com> writes:
> That's not my perspective. If the LIMIT had been applied accurately to
> the cost then the hashjoin would never even have been close to the
> nested join in the first place.

[ shrug... ]  Your perspective is mistaken.  There is nothing wrong with
the way the LIMIT estimation is being done.  The plan in question was

Limit  (cost=0.00..498511.80 rows=10 width=1804) (actual time=17.729..21.672 rows=2 loops=1)
   ->  Nested Loop  (cost=0.00..1794642.48 rows=36 width=1804) (actual time=17.729..21.671 rows=2 loops=1)
         ->  Index Scan using pk_table_a on table_a ta  (cost=0.00..324880.88 rows=388638 width=16) (actual
time=0.146..0.198rows=2 loops=1) 
               Index Cond: (a = $1)
         ->  Index Scan using pk_table_b2 on table_b2 tb  (cost=0.00..3.77 rows=1 width=1788) (actual
time=10.729..10.731rows=1 loops=2) 
               Index Cond: (ta.b = tb.b)
 Total runtime: 21.876 ms

and there are two fairly serious estimation errors here, neither related
at all to the LIMIT:

* five-orders-of-magnitude overestimate of the number of table_a rows
that will match the condition on a;

* enormous underestimate of the number of join rows --- it's apparently
thinking only 0.0001 of the table_a rows will have a join partner,
whereas at least for this case they all do.

Had the latter estimate been right, the cost of pulling results this
way would indeed have been something like 50K units per joined row,
because of the large number of inner index probes per successful join.

It might be interesting to look into why those estimates are so far
off; the stats Csaba displayed don't seem to have any obvious oddity
that would justify such bizarre results.  But the LIMIT has got
nothing to do with this.

            regards, tom lane

pgsql-performance by date:

Previous
From: brauagustin-susc@yahoo.com.ar
Date:
Subject: Re: Low CPU Usage
Next
From: Denes Daniel
Date:
Subject: Re: Query planner unaware of possibly best plan