Re: order of nested loop - Mailing list pgsql-general

From Joseph Shraibman
Subject Re: order of nested loop
Date
Msg-id 3EEF5E51.4030306@selectacast.net
Whole thread Raw
In response to Re: order of nested loop  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: order of nested loop
List pgsql-general
Tom Lane wrote:
> Joseph Shraibman <jks@selectacast.net> writes:
>
>>The planner shows this for the scan on d:
>>(cost=0.00..2380577.42 rows=525568 width=49)
>>Maybe it thinks it will reach the limit of 25 before it actually does,
>>which is why it is willing to try something so expensive?
>
>
> Yeah, exactly, it's extrapolating that it will only actually have to
> process a relatively small part of that scan.  Which would be true if
> it were getting 4492 rows out of the join per estimate, and not just 1
> per reality.  This is the same estimation failure as in the other plan,
> I think, but it's a lot simpler to see in the other plan.
>
>
>>... Thus it would make sense to first get the entries in u,
>>filter them, then filter by their status in d.
>
>
> Right, but the problem is to know how many u entries will get through
> the filter.  When that estimate is off by a factor of ~5000, it's no
> surprise that the planner is likely to choose the wrong plan.  If you
> could cut that estimation error by even a factor of 2, it would have
> made the right choices here.
>
> So we're back to my previous question: why is that estimate so far off?
> You might try comparing
>     explain select * from usertable where podkey = 20;
>     select count(*) from usertable where podkey = 20;

=> explain select * from usertable where podkey = 20;
                                           QUERY PLAN

-----------------------------------------------------------------------------------------------
  Index Scan using usertable_podkey_key on usertable
(cost=0.00..16019.99 rows=5923 width=625)
    Index Cond: (podkey = 20)
(2 rows)

=> select count(*) from usertable where podkey = 20;
  count
-------
      3
(1 row)

=> select * from pg_stats where tablename = 'usertable' and  attname
in('podkey','status','banned');
  schemaname | tablename | attname | null_frac | avg_width | n_distinct
|             most_common_vals             |
    most_common_freqs                               |
histogram_bounds               | correlation

------------+-----------+---------+-----------+-----------+------------+------------------------------------------+-------------------------------------------------------------------------------+---------------------------------------------+-------------
  public     | usertable | podkey  |         0 |         4 |         66
| {<actual numbers deleted, but 20 isn't on of them>} |
{0.208,0.156,0.112,0.0696667,0.0306667,0.028,0.0273333,0.025,0.0243333,0.023}
| {10,90,137,140,197,207,246,264,267,269,300} |     0.53816
  public     | usertable | status  |         0 |         2 |          4
| {2,4,1,3}                                |
{0.938,0.0496667,0.0103333,0.002}
       |                                             |    0.840237
  public     | usertable | banned  |         0 |         1 |          2
| {f,t}                                    | {0.982,0.018}
                                                    |
                           |      0.9339
(3 rows)

=> select count(*) from usertable;
   count
---------
  1121190
(1 row)


> to see whether the estimate is failing on the basic podkey=20 part.
> If that doesn't seem too far off, add in the status = 2 and/or
> (NOT banned) parts to see what confuses it.  I'd like to see the
> pg_stats rows for these three columns, too.
>


> BTW, you have done an ANALYZE recently on usertable, I hope.

Yeah, every week by cron job.



pgsql-general by date:

Previous
From: Michael Meskes
Date:
Subject: Re: postgreSQL on NAS/SAN?
Next
From: Antti Haapala
Date:
Subject: Re: Sort memory not being released