Re: order of nested loop - Mailing list pgsql-general
From | Joseph Shraibman |
---|---|
Subject | Re: order of nested loop |
Date | |
Msg-id | 3EEF5E51.4030306@selectacast.net Whole thread Raw |
In response to | Re: order of nested loop (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses |
Re: order of nested loop
|
List | pgsql-general |
Tom Lane wrote: > Joseph Shraibman <jks@selectacast.net> writes: > >>The planner shows this for the scan on d: >>(cost=0.00..2380577.42 rows=525568 width=49) >>Maybe it thinks it will reach the limit of 25 before it actually does, >>which is why it is willing to try something so expensive? > > > Yeah, exactly, it's extrapolating that it will only actually have to > process a relatively small part of that scan. Which would be true if > it were getting 4492 rows out of the join per estimate, and not just 1 > per reality. This is the same estimation failure as in the other plan, > I think, but it's a lot simpler to see in the other plan. > > >>... Thus it would make sense to first get the entries in u, >>filter them, then filter by their status in d. > > > Right, but the problem is to know how many u entries will get through > the filter. When that estimate is off by a factor of ~5000, it's no > surprise that the planner is likely to choose the wrong plan. If you > could cut that estimation error by even a factor of 2, it would have > made the right choices here. > > So we're back to my previous question: why is that estimate so far off? > You might try comparing > explain select * from usertable where podkey = 20; > select count(*) from usertable where podkey = 20; => explain select * from usertable where podkey = 20; QUERY PLAN ----------------------------------------------------------------------------------------------- Index Scan using usertable_podkey_key on usertable (cost=0.00..16019.99 rows=5923 width=625) Index Cond: (podkey = 20) (2 rows) => select count(*) from usertable where podkey = 20; count ------- 3 (1 row) => select * from pg_stats where tablename = 'usertable' and attname in('podkey','status','banned'); schemaname | tablename | attname | null_frac | avg_width | n_distinct | most_common_vals | most_common_freqs | histogram_bounds | correlation ------------+-----------+---------+-----------+-----------+------------+------------------------------------------+-------------------------------------------------------------------------------+---------------------------------------------+------------- public | usertable | podkey | 0 | 4 | 66 | {<actual numbers deleted, but 20 isn't on of them>} | {0.208,0.156,0.112,0.0696667,0.0306667,0.028,0.0273333,0.025,0.0243333,0.023} | {10,90,137,140,197,207,246,264,267,269,300} | 0.53816 public | usertable | status | 0 | 2 | 4 | {2,4,1,3} | {0.938,0.0496667,0.0103333,0.002} | | 0.840237 public | usertable | banned | 0 | 1 | 2 | {f,t} | {0.982,0.018} | | 0.9339 (3 rows) => select count(*) from usertable; count --------- 1121190 (1 row) > to see whether the estimate is failing on the basic podkey=20 part. > If that doesn't seem too far off, add in the status = 2 and/or > (NOT banned) parts to see what confuses it. I'd like to see the > pg_stats rows for these three columns, too. > > BTW, you have done an ANALYZE recently on usertable, I hope. Yeah, every week by cron job.
pgsql-general by date: