> Currently pull_up_sublinks_qual_recurse only changes the plan for NOT EXISTS > queries and leaves NOT IN alone. The reason for this is because the values > returned by a subquery in the IN clause could have NULLs.
There's a bug in targetListIsGuaranteedNotToHaveNulls, you have to drill deeper into the query to guarantee the nullability of a result column. If a table is OUTER JOINed, it can return NULLs even if the original column specification has NOT NULL.
This test case produces incorrect results with your patch:
create table a (x int not null); create table b (x int not null, y int not null); insert into a values(1); select * from a where x not in (select y from a left join b using (x));
Unpatched version correctly returns 0 rows since "y" will be NULL. Your patch returns the value 1 from a.
I'm a bit stuck on fixing this and I can't quite figure out how I should tell if the TargetEntry is coming from an outer join.
My first attempt does not work as it seems that I'm looking up the wrong RangeTblEntry with the following:
rte = rt_fetch(tlevar->varno, query->rtable);
if (IS_OUTER_JOIN(rte->jointype))
return true; /* Var from an outer join */
The jointype returns JOIN_INNER when loooking up the RangeTblEntry from the TargetEntry's varno. It seems that the RangeTblEntry that I need is stored in query->rtable, but I've just no idea how to tell which item in the list it is. So if anyone can point me in the right direction then that would be really useful.
On a more positive or even slightly exciting note I think I've managed to devise a way that ANTI JOINS can be used for NOT IN much more often. It seems that find_nonnullable_vars will analyse a quals list to find expressions that mean that the var cannot be NULL. This means we can perform ANTI JOINS for NOT IN with queries like:
SELECT * FROM a WHERE id NOT IN(SELECT nullable_col FROM b WHERE nullable_col = 1);
or
SELECT * FROM a WHERE id NOT IN(SELECT nullable_col FROM b WHERE nullable_col IS NOT NULL);
(The attached patch implements this)
the nullable_col =1 will mean that nullable_col cannot be NULL, so the ANTI JOIN can be performed safely. I think this combined with the NOT NULL check will cover probably just about all valid uses of NOT IN with a subquery... unless of course I've assumed something wrongly about find_nonnullable_vars. I just need the correct RangeTblEntry in order to determine if the TargetEntry is from an out join.
The attached patch is a broken implemention that still needs the lookup code fixed to reference the correct RTE. The failing regression tests show where the problems lie.