Thread: not in(subselect) in 8.4
Just as a question to Tom and team, I saw a post a bit ago, about plans for 8.4, and Tom said it is very likely that 8.4 will rewrite subselects into left joins, is it still in plans? I mean query like: select id from foo where id not in ( select id from bar); into: select f.id from foo f left join bar b on f.id=b.id where b.id is null; the latter is most often much much faster on 8.1-8.3; thanks. -- GJ
On Fri, Feb 20, 2009 at 11:14 AM, marcin mank <marcin.mank@gmail.com> wrote: >> Just as a question to Tom and team, > > maybe it`s time for asktom.postgresql.org? Oracle has it :) hehe, on the other hand - that would make my ppl here very skilfull, the only reason I started to praise them about joins, and stuff - is because subselects were slow. (no wonder, when you check two tables against each other, and each holds few M of rows). -- GJ
> Just as a question to Tom and team, maybe it`s time for asktom.postgresql.org? Oracle has it :)
=?UTF-8?Q?Grzegorz_Ja=C5=9Bkiewicz?= <gryzman@gmail.com> writes: > I mean query like: > select id from foo where id not in ( select id from bar); > into: > select f.id from foo f left join bar b on f.id=b.id where b.id is null; Postgres does not do that, because they don't mean the same thing --- the behavior for NULLs in bar.id is different. 8.4 does understand that NOT EXISTS is an antijoin, though. regards, tom lane
On Fri, Feb 20, 2009 at 3:33 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > =?UTF-8?Q?Grzegorz_Ja=C5=9Bkiewicz?= <gryzman@gmail.com> writes: >> I mean query like: >> select id from foo where id not in ( select id from bar); >> into: >> select f.id from foo f left join bar b on f.id=b.id where b.id is null; > > Postgres does not do that, because they don't mean the same thing --- > the behavior for NULLs in bar.id is different. yes, the obvious assumption here is that all columns are 'not null'; > 8.4 does understand that NOT EXISTS is an antijoin, though. Yes, I noticed that it actually assumes lesser cost. -- GJ
On Fri, Feb 20, 2009 at 6:14 AM, marcin mank <marcin.mank@gmail.com> wrote: > On Fri, Feb 20, 2009 at 4:56 AM, Grzegorz Jaśkiewicz <gryzman@gmail.com> wrote: >> Just as a question to Tom and team, > > maybe it`s time for asktom.postgresql.org? Oracle has it :) +1
after your recent commit Tom, the cost is sky-high, and also it takes ages again with subselect version. In case of two table join. I have to try the three way one.
=?UTF-8?Q?Grzegorz_Ja=C5=9Bkiewicz?= <gryzman@gmail.com> writes: > after your recent commit Tom, the cost is sky-high, and also it takes > ages again with subselect version. In case of two table join. I have > to try the three way one. Which commit, and what example are you talking about? regards, tom lane
the foo bar example above, with notion that all columns are NOT NULL behaves much different now. I noticed, that some of the 'anti join' stuff has changed in cvs recently, but I don't know if that's to blame. Basically, what I can see, is that the subselect case is no longer of lower cost, to the left join - but is quite substantially more expensive. Just an observation, I don't intend to use subselects anyhow, because these are very much slower on 8.3, which we use in production here.
=?UTF-8?Q?Grzegorz_Ja=C5=9Bkiewicz?= <gryzman@gmail.com> writes: > the foo bar example above, with notion that all columns are NOT NULL > behaves much different now. AFAIK the treatment of NOT IN subselects hasn't changed a bit since 8.3. So I still find your complaint uninformative. regards, tom lane
Are there any optimizations planned for the case where columns are defined as NOT NULL? Or other special path filteringfor cases where the planner can know that the set of values in the subselect won't contain NULLs (such as in (selecta from b where (a > 0 and a < 10000). It turns out to be a rare use case for someone to write a subselect for a NOT IN or IN clause that will have NULL values. In the common case, the subselect does not contain nulls. I would like to see Postgres optimize for the common case. ________________________________________ From: pgsql-performance-owner@postgresql.org [pgsql-performance-owner@postgresql.org] On Behalf Of Tom Lane [tgl@sss.pgh.pa.us] Sent: Friday, February 20, 2009 7:33 AM To: Grzegorz Jaśkiewicz Cc: pgsql-performance@postgresql.org Subject: Re: [PERFORM] not in(subselect) in 8.4 =?UTF-8?Q?Grzegorz_Ja=C5=9Bkiewicz?= <gryzman@gmail.com> writes: > I mean query like: > select id from foo where id not in ( select id from bar); > into: > select f.id from foo f left join bar b on f.id=b.id where b.id is null; Postgres does not do that, because they don't mean the same thing --- the behavior for NULLs in bar.id is different. 8.4 does understand that NOT EXISTS is an antijoin, though. regards, tom lane -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Scott Carey <scott@richrelevance.com> writes: > Are there any optimizations planned for the case where columns are > defined as NOT NULL? We might get around to recognizing that case as an antijoin sometime. It's nontrivial though, because you have to check for an intermediate outer join causing the column to be possibly nullable after all. > It turns out to be a rare use case for someone to write a subselect > for a NOT IN or IN clause that will have NULL values. Judging from the steady flow of "why doesn't my NOT IN query work" newbie questions, I don't think it's so rare as all that. There's surely some population of people who know enough or could be trained to be careful about using NOT NULL columns, but they could also be trained to use NOT EXISTS, and dodge the whole bullet from the start. regards, tom lane
On Sat, Feb 21, 2009 at 10:41 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Scott Carey <scott@richrelevance.com> writes: >> Are there any optimizations planned for the case where columns are >> defined as NOT NULL? > > We might get around to recognizing that case as an antijoin sometime. > It's nontrivial though, because you have to check for an intermediate > outer join causing the column to be possibly nullable after all. > >> It turns out to be a rare use case for someone to write a subselect >> for a NOT IN or IN clause that will have NULL values. > > Judging from the steady flow of "why doesn't my NOT IN query work" > newbie questions, I don't think it's so rare as all that. I think it's rare to do it on purpose, precisely because of the weird semantics we all hate. I have done it by accident, more than once, and then fixed it by adding WHERE blah IS NOT NULL to the subquery. So I think Scott is basically right. > There's surely some population of people who know enough or could be > trained to be careful about using NOT NULL columns, but they could also > be trained to use NOT EXISTS, and dodge the whole bullet from the start. There are far more important reasons to make columns NOT NULL than avoiding strange results from NOT IN. Personally, I have gotten used to the fact that the planner sucks at handling NOT IN and so always write LEFT JOIN ... WHERE pk IS NULL, so it's not important to me that we fix it. But it's certainly a foot-gun for the inexperienced, as it is both the most compact and (at least IMO) the most intuitive formulation of an anti-join. ...Robert
but then you have 10 questions a week from windows people about password, and yet you haven't remove that :P