Re: Allowing NOT IN to use ANTI joins - Mailing list pgsql-hackers

From David Rowley
Subject Re: Allowing NOT IN to use ANTI joins
Date
Msg-id CAApHDvo2Li6ZxEvL5H-MgzJMp9wN71jK3x-133xgALYA8g-4cA@mail.gmail.com
Whole thread Raw
In response to Re: Allowing NOT IN to use ANTI joins  (Marti Raudsepp <marti@juffo.org>)
Responses Re: Allowing NOT IN to use ANTI joins  (Marti Raudsepp <marti@juffo.org>)
List pgsql-hackers
On Mon, Jun 9, 2014 at 11:20 PM, Marti Raudsepp <marti@juffo.org> wrote:
On Sun, Jun 8, 2014 at 3:36 PM, David Rowley <dgrowleyml@gmail.com> wrote:
> Currently pull_up_sublinks_qual_recurse only changes the plan for NOT EXISTS
> queries and leaves NOT IN alone. The reason for this is because the values
> returned by a subquery in the IN clause could have NULLs.

I believe the reason why this hasn't been done yet, is that the plan
becomes invalid when another backend modifies the nullability of the
column. To get it to replan, you'd have to introduce a dependency on
the "NOT NULL" constraint, but it's impossible for now because there's
no pg_constraint entry for NOT NULLs.

The only way to consistently guarantee nullability is through primary
key constraints. Fortunately that addresses most of the use cases of
NOT IN(), in my experience.


I tried to break this by putting a break point in convert_ANY_sublink_to_join in session 1. Not that it really had to be in that function, I just wanted it to stop during planning and before the plan is executed.

-- session 1
select * from n1 where id not in(select id from n1); -- hits breakpoint in convert_ANY_sublink_to_join 

-- session 2
alter table n2 alter column id drop not null;

insert into n2 values(null);

I see that session 2 blocks in the alter table until session 1 completes.

I've not really checked out the code in detail around when the snapshot is taken and the transaction ID is generated, but as long as the transaction id is taken before we start planning in session 1 then it should not matter if another session drops the constraint and inserts a NULL value as we won't see that NULL value in our transaction... I'd assume that the transaction has to start before it grabs the table defs that are required for planning. Or have I got something wrong?

 
See the comment in check_functional_grouping:

 * Currently we only check to see if the rel has a primary key that is a
 * subset of the grouping_columns.  We could also use plain unique constraints
 * if all their columns are known not null, but there's a problem: we need
 * to be able to represent the not-null-ness as part of the constraints added
 * to *constraintDeps.  FIXME whenever not-null constraints get represented
 * in pg_constraint.


I saw that, but I have to say I've not fully got my head around why that's needed just yet.
 
The behavior you want seems somewhat similar to
check_functional_grouping; maybe you could unify it with your
targetListIsGuaranteedNotToHaveNulls at some level. (PS: that's one
ugly function name :)


Agreed :)  Originally I had put the code that does that in convert_ANY_sublink_to_join, but at the last minute before posting the patch I decided that it might be useful and reusable so moved it out to that function. I'll try and think of something better, but I'm open to ideas.

Regards

David Rowley
 

pgsql-hackers by date:

Previous
From: Amit Kapila
Date:
Subject: Re: [bug fix] Memory leak in dblink
Next
From: David Rowley
Date:
Subject: Re: Allowing NOT IN to use ANTI joins