Thread: not in(subselect) in 8.4

not in(subselect) in 8.4

From

Grzegorz Jaśkiewicz

Date:

20 February 2009, 05:56:16

Just as a question to Tom and team,
I saw a post a bit ago, about plans for 8.4, and Tom said it is very
likely that 8.4 will rewrite subselects into left joins, is it still
in plans?

I mean query like:
select id from foo where id not in ( select id from bar);
into:

select f.id from foo f left join bar b on f.id=b.id where b.id is null;

the latter is most often much much faster on 8.1-8.3;

thanks.

--
GJ

Re: not in(subselect) in 8.4

From

Grzegorz Jaśkiewicz

Date:

20 February 2009, 08:57:59

On Fri, Feb 20, 2009 at 11:14 AM, marcin mank <marcin.mank@gmail.com> wrote:
>> Just as a question to Tom and team,
>
> maybe it`s time for asktom.postgresql.org?  Oracle has it :)

hehe,
on the other hand - that would make my ppl here very skilfull, the
only reason I started to praise them about joins, and stuff - is
because subselects were slow. (no wonder, when you check two tables
against each other, and each holds few M of rows).

--
GJ

Re: not in(subselect) in 8.4

From

marcin mank

Date:

20 February 2009, 09:07:28

> Just as a question to Tom and team,

maybe it`s time for asktom.postgresql.org?  Oracle has it :)

Re: not in(subselect) in 8.4

From

Tom Lane

Date:

20 February 2009, 11:34:04

=?UTF-8?Q?Grzegorz_Ja=C5=9Bkiewicz?= <gryzman@gmail.com> writes:
> I mean query like:
> select id from foo where id not in ( select id from bar);
> into:
> select f.id from foo f left join bar b on f.id=b.id where b.id is null;

Postgres does not do that, because they don't mean the same thing ---
the behavior for NULLs in bar.id is different.

8.4 does understand that NOT EXISTS is an antijoin, though.

            regards, tom lane

Re: not in(subselect) in 8.4

From

Grzegorz Jaśkiewicz

Date:

20 February 2009, 11:40:03

On Fri, Feb 20, 2009 at 3:33 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> =?UTF-8?Q?Grzegorz_Ja=C5=9Bkiewicz?= <gryzman@gmail.com> writes:
>> I mean query like:
>> select id from foo where id not in ( select id from bar);
>> into:
>> select f.id from foo f left join bar b on f.id=b.id where b.id is null;
>
> Postgres does not do that, because they don't mean the same thing ---
> the behavior for NULLs in bar.id is different.
yes, the obvious assumption here is that all columns are 'not null';


> 8.4 does understand that NOT EXISTS is an antijoin, though.

Yes, I noticed that it actually assumes lesser cost.



--
GJ

Re: not in(subselect) in 8.4

From

Rodrigo E. De León Plicet

Date:

20 February 2009, 20:03:16

On Fri, Feb 20, 2009 at 6:14 AM, marcin mank <marcin.mank@gmail.com> wrote:
> On Fri, Feb 20, 2009 at 4:56 AM, Grzegorz Jaśkiewicz <gryzman@gmail.com> wrote:
>> Just as a question to Tom and team,
>
> maybe it`s time for asktom.postgresql.org?  Oracle has it :)

+1

Re: not in(subselect) in 8.4

From

Grzegorz Jaśkiewicz

Date:

21 February 2009, 07:49:37

after your recent commit Tom, the cost is sky-high, and also it takes
ages again with subselect version. In case of two table join. I have
to try the three way one.

Re: not in(subselect) in 8.4

From

Tom Lane

Date:

21 February 2009, 12:56:55

=?UTF-8?Q?Grzegorz_Ja=C5=9Bkiewicz?= <gryzman@gmail.com> writes:
> after your recent commit Tom, the cost is sky-high, and also it takes
> ages again with subselect version. In case of two table join. I have
> to try the three way one.

Which commit, and what example are you talking about?

            regards, tom lane

Re: not in(subselect) in 8.4

From

Grzegorz Jaśkiewicz

Date:

21 February 2009, 13:10:37

the foo bar example above, with notion that all columns are NOT NULL
behaves much different now. I noticed, that some of the 'anti join'
stuff has changed in cvs recently, but I don't know if that's to
blame.
Basically, what I can see, is that the subselect case is no longer of
lower cost, to the left join - but is quite substantially more
expensive.

Just an observation, I don't intend to use subselects anyhow, because
these are very much slower on 8.3, which we use in production here.

Re: not in(subselect) in 8.4

From

Tom Lane

Date:

21 February 2009, 13:43:17

=?UTF-8?Q?Grzegorz_Ja=C5=9Bkiewicz?= <gryzman@gmail.com> writes:
> the foo bar example above, with notion that all columns are NOT NULL
> behaves much different now.

AFAIK the treatment of NOT IN subselects hasn't changed a bit since 8.3.
So I still find your complaint uninformative.

            regards, tom lane

Re: not in(subselect) in 8.4

From

Scott Carey

Date:

21 February 2009, 15:29:20

Are there any optimizations planned for the case where columns are defined as NOT NULL?  Or other special path
filteringfor cases where the planner can know that the set of values in the subselect won't contain NULLs  (such as in
(selecta from b where (a > 0 and a < 10000). 

It turns out to be a rare use case for someone to write a subselect for a NOT IN  or IN clause that will have NULL
values. In the common case, the subselect does not contain nulls.  I would like to see Postgres optimize for the common
case.

________________________________________
From: pgsql-performance-owner@postgresql.org [pgsql-performance-owner@postgresql.org] On Behalf Of Tom Lane
[tgl@sss.pgh.pa.us]
Sent: Friday, February 20, 2009 7:33 AM
To: Grzegorz Jaśkiewicz
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] not in(subselect) in 8.4

=?UTF-8?Q?Grzegorz_Ja=C5=9Bkiewicz?= <gryzman@gmail.com> writes:
> I mean query like:
> select id from foo where id not in ( select id from bar);
> into:
> select f.id from foo f left join bar b on f.id=b.id where b.id is null;

Postgres does not do that, because they don't mean the same thing ---
the behavior for NULLs in bar.id is different.

8.4 does understand that NOT EXISTS is an antijoin, though.

                        regards, tom lane

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

Re: not in(subselect) in 8.4

From

Tom Lane

Date:

21 February 2009, 23:41:50

Scott Carey <scott@richrelevance.com> writes:
> Are there any optimizations planned for the case where columns are
> defined as NOT NULL?

We might get around to recognizing that case as an antijoin sometime.
It's nontrivial though, because you have to check for an intermediate
outer join causing the column to be possibly nullable after all.

> It turns out to be a rare use case for someone to write a subselect
> for a NOT IN  or IN clause that will have NULL values.

Judging from the steady flow of "why doesn't my NOT IN query work"
newbie questions, I don't think it's so rare as all that.

There's surely some population of people who know enough or could be
trained to be careful about using NOT NULL columns, but they could also
be trained to use NOT EXISTS, and dodge the whole bullet from the start.

            regards, tom lane

Re: not in(subselect) in 8.4

From

Robert Haas

Date:

22 February 2009, 00:39:56

On Sat, Feb 21, 2009 at 10:41 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Scott Carey <scott@richrelevance.com> writes:
>> Are there any optimizations planned for the case where columns are
>> defined as NOT NULL?
>
> We might get around to recognizing that case as an antijoin sometime.
> It's nontrivial though, because you have to check for an intermediate
> outer join causing the column to be possibly nullable after all.
>
>> It turns out to be a rare use case for someone to write a subselect
>> for a NOT IN  or IN clause that will have NULL values.
>
> Judging from the steady flow of "why doesn't my NOT IN query work"
> newbie questions, I don't think it's so rare as all that.

I think it's rare to do it on purpose, precisely because of the weird
semantics we all hate.  I have done it by accident, more than once,
and then fixed it by adding WHERE blah IS NOT NULL to the subquery.
So I think Scott is basically right.

> There's surely some population of people who know enough or could be
> trained to be careful about using NOT NULL columns, but they could also
> be trained to use NOT EXISTS, and dodge the whole bullet from the start.

There are far more important reasons to make columns NOT NULL than
avoiding strange results from NOT IN.  Personally, I have gotten used
to the fact that the planner sucks at handling NOT IN and so always
write LEFT JOIN ... WHERE pk IS NULL, so it's not important to me that
we fix it.  But it's certainly a foot-gun for the inexperienced, as it
is both the most compact and (at least IMO) the most intuitive
formulation of an anti-join.

...Robert

Re: not in(subselect) in 8.4

From

Grzegorz Jaśkiewicz

Date:

22 February 2009, 15:10:22

but then you have 10 questions a week from windows people about
password, and yet you haven't remove that :P