Re: Allowing NOT IN to use ANTI joins - Mailing list pgsql-hackers

From David Rowley
Subject Re: Allowing NOT IN to use ANTI joins
Date
Msg-id CAApHDvqX1vB6qYx6GhFNf0W4ygATDWVJFom809OPL-AGrSeiuA@mail.gmail.com
Whole thread Raw
In response to Re: Allowing NOT IN to use ANTI joins  (Marti Raudsepp <marti@juffo.org>)
Responses Re: Allowing NOT IN to use ANTI joins  (Simon Riggs <simon@2ndQuadrant.com>)
List pgsql-hackers
On Wed, Jun 11, 2014 at 9:32 PM, Marti Raudsepp <marti@juffo.org> wrote:
On Sun, Jun 8, 2014 at 3:36 PM, David Rowley <dgrowleyml@gmail.com> wrote:
> Currently pull_up_sublinks_qual_recurse only changes the plan for NOT EXISTS
> queries and leaves NOT IN alone. The reason for this is because the values
> returned by a subquery in the IN clause could have NULLs.

There's a bug in targetListIsGuaranteedNotToHaveNulls, you have to
drill deeper into the query to guarantee the nullability of a result
column. If a table is OUTER JOINed, it can return NULLs even if the
original column specification has NOT NULL.

This test case produces incorrect results with your patch:

create table a (x int not null);
create table b (x int not null, y int not null);
insert into a values(1);
select * from a where x not in (select y from a left join b using (x));

Unpatched version correctly returns 0 rows since "y" will be NULL.
Your patch returns the value 1 from a.


I'm a bit stuck on fixing this and I can't quite figure out how I should tell if the TargetEntry is coming from an outer join.

My first attempt does not work as it seems that I'm looking up the wrong RangeTblEntry with the following:

rte =  rt_fetch(tlevar->varno, query->rtable);

if (IS_OUTER_JOIN(rte->jointype))
return true; /* Var from an outer join */

The jointype returns JOIN_INNER when loooking up the RangeTblEntry from the TargetEntry's varno. It seems that the RangeTblEntry that I need is stored in query->rtable, but I've just no idea how to tell which item in the list it is. So if anyone can point me in the right direction then that would be really useful.

On a more positive or even slightly exciting note I think I've managed to devise a way that ANTI JOINS can be used for NOT IN much more often. It seems that find_nonnullable_vars will analyse a quals list to find expressions that mean that the var cannot be NULL. This means we can perform ANTI JOINS for NOT IN with queries like:

SELECT * FROM a WHERE id NOT IN(SELECT nullable_col FROM b WHERE nullable_col = 1);
or
SELECT * FROM a WHERE id NOT IN(SELECT nullable_col FROM b WHERE nullable_col IS NOT NULL);

(The attached patch implements this)

the nullable_col =1 will mean that nullable_col cannot be NULL, so the ANTI JOIN can be performed safely. I think this combined with the NOT NULL check will cover probably just about all valid uses of NOT IN with a subquery... unless of course I've assumed something wrongly about find_nonnullable_vars. I just need the correct RangeTblEntry in order to determine if the TargetEntry is from an out join.

The attached patch is a broken implemention that still needs the lookup code fixed to reference the correct RTE. The failing regression tests show where the problems lie.

Any help on this would be really appreciated.

Regards

David Rowley

Attachment

pgsql-hackers by date:

Previous
From: Kevin Grittner
Date:
Subject: Re: Add a filed to PageHeaderData
Next
From: Heikki Linnakangas
Date:
Subject: Re: A question about code in DefineRelation()