Re: NOT IN subquery optimization - Mailing list pgsql-hackers

From David Rowley
Subject Re: NOT IN subquery optimization
Date
Msg-id CAKJS1f9DfW0PFYzf1hw_PzkkbEVDhzqDg1xJkgSyBd+0T79jHg@mail.gmail.com
Whole thread Raw
In response to Re: NOT IN subquery optimization  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: NOT IN subquery optimization
List pgsql-hackers
On Sat, 2 Mar 2019 at 12:13, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> "Li, Zheng" <zhelli@amazon.com> writes:
> > Although adding "or var is NULL" to the anti join condition forces the planner to choose nested loop anti join, it
isalways faster compared to the original plan.
 
>
> TBH, I am *really* skeptical of sweeping claims like that.  The existing
> code will typically produce a hashed-subplan plan, which ought not be
> that awful as long as the subquery result doesn't blow out memory.
> It certainly is going to beat a naive nested loop.

It's pretty easy to show the claim is false using master and NOT EXISTS.

create table small(a int not null);
create table big (a int not null);
insert into small select generate_Series(1,1000);
insert into big select x%1000+1 from generate_Series(1,1000000) x;

select count(*) from big b where not exists(select 1 from small s
where s.a = b.a);
Time: 178.575 ms

select count(*) from big b where not exists(select 1 from small s
where s.a = b.a or s.a is null);
Time: 38049.969 ms (00:38.050)


-- 
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: NOT IN subquery optimization
Next
From: Andrew Dunstan
Date:
Subject: Re: [HACKERS] Incomplete startup packet errors