Re: BUG #5059: Planner ignores estimates when planning an IN () subquery - Mailing list pgsql-bugs
From | Kenaniah Cerny |
---|---|
Subject | Re: BUG #5059: Planner ignores estimates when planning an IN () subquery |
Date | |
Msg-id | f647f4600909162232h27d115f4of2e299177a1d3a67@mail.gmail.com Whole thread Raw |
In response to | Re: BUG #5059: Planner ignores estimates when planning an IN () subquery (Robert Haas <robertmhaas@gmail.com>) |
List | pgsql-bugs |
My apologies, I meant user_activity_log, and not user_anime_log in the previous email. This isn't a huge issue as the query could easily be rewritten using a more pragmatic join structure, but I thought it would be worth mentioning as occasionally bugs involve more than just what meets the eye. Thank you for the help. On Wed, Sep 16, 2009 at 7:35 PM, Robert Haas <robertmhaas@gmail.com> wrote: > On Wed, Sep 16, 2009 at 6:39 PM, Kenaniah Cerny <kenaniah@gmail.com> > wrote: > > I can provide the output of statistics queries if you would like. Just > let > > me know which statistics you want and I'll pastebin them. > > > > As far as selectivity goes, the selectivity estimate for the > > user_anime_log.user_account_id was definitely miscalculated. The > > user_anime_log contains up to 15 entries per user (70,000 users, but only > > 475,811 rows). The default statistics target on that relation is set to > > 1000. But even with poor statistics, guessing 62,000 rows when there's a > > maximum of 15 per user account is still quite a miss. > > You've changed the table name on me, vs. what you pasted in the query, > which had a user_activity_log but no user_anime_log... > > > Analysis of SELECT * FROM user_activity_log WHERE user_account_id = > 17238; > > estimates 13 rows and returns 15, which is quite reasonable considering > the > > statistics targets. > > > > Please forgive my ignorance, but in the case of my subquery, is the > > estimated number of rows and cost being taken into account (or only the > > selectivity)? > > Well, selectivity is just a term that refers to the fraction of rows > that match some condition (rows themselves do not have selectivity). > Usually the initial estimating is done in terms of selectivity, which > is then multiplied by the total number of rows to find the number of > rows that will remain after the condition is applied. > > So, yes, rows and cost are taken into account. The problem here is > that the planner is mis-estimating the selectivity, therefore it > computes the wrong number of rows (way too high), therefore it makes > the wrong decision. > > > Granted I don't understand much about the planner internals, > > but it seems strange that a nested loop semi join would be chosen when > the > > inner table is estimated to return an extremely low number of rows with > low > > cost and low selectivity. Shouldn't the planner also estimate the cost of > an > > inner (er, left?) join in that scenario? > > Well... you can't replace a semi join with an inner or left join, > because it doesn't do the same thing. You could use a hash semi join > or merge join semi join, but that doesn't make sense if, as you say, > the inner table is estimated to return an extremely low number of > rows. > > It might be a bit easier to analyze this if you stripped out all the > joins that aren't necessary to reproduce the problem. Also, I would > prefer EXPLAIN ANALYZE output posted in-line to the mailing list > rather than pasted to a separate web site - it screws up the > formatting. > > But honestly I'm not sure how much time it's worth spending on this. > You have a way to rewrite the query that works... and fixing the > estimator is going to be hard... so I suggest doing it the way that > works, and moving on! > > ...Robert >
pgsql-bugs by date: