David Rysdam <drysdam@ll.mit.edu> writes:
> Sometimes I want to know if I have any orphans in mags, so I do a query
> like this:
> select signum from lp.Mags where signum is not null and signum not
> in (select lp.Sigs.signum from lp.Sigs)
> (I do this as a subquery because we originally had a old Sybase DB where
> outer joins were a non-standard pain and this way works the same and is
> DBMS-agnostic.)
DBMS-agnostic frequently means "sucks equally badly on all DBMSes" :-(
This query is hard to optimize because of the weird behavior of NOT IN
when nulls are involved. Since you aren't complaining that the query
fails entirely, I'm supposing that lp.Sigs.signum contains no nulls,
but the planner doesn't know that. If you can transform it to a NOT
EXISTS, you'll likely get a much better plan:
select signum from lp.Mags where signum is not null and
not exists (select 1 from lp.Sigs where lp.Sigs.signum = lp.Mags.signum)
What you want is an "anti join" plan, or at least a plan that mentions
a "hashed subplan". Plain subplans are death performance-wise, because
they amount to being nestloop joins rather than anything smarter. (In
this case it's likely not choosing a hashed subplan because work_mem is
too small to allow that.)
> I also decided to try doing the query a different way:
> select lp.mags.signum from lp.mags left join lp.sigs on
> lp.mags.signum = lp.sigs.signum where lp.mags.signum is not null
> and lp.sigs.signum is null;
That's another way to get an anti-join (at least on recent PGs, I forget
if 9.0 recognizes it).
> This one runs fast for both of us. So I guess my second question is: why
> can't the query planner tell these are the same query?
They aren't. See comment about behavior with NULLs.
regards, tom lane