On Sun, 28 Sep 2003, Tom Lane wrote:
> Stephan Szabo <sszabo@megazone.bigpanda.com> writes:
> > On Sat, 27 Sep 2003, Tom Lane wrote:
> >> I thought of what seems to be a better design for the check query: use
> >> a LEFT JOIN and check for NULL in the righthand joined column.
>
> > Hmm, my initial testing showed that it really was a little slower
> > than a more complicated one with NOT EXISTS so I'd abandoned it. How does
> > it fare for you compared to:
> > select f1, f2 from fk where not exists (select 1 from pk where pk.f1=fk.f1
> > and pk.f2=pk.f2) where fk.f1 is not null and fk.f2 is not null;
>
> Were you testing against 7.3 or 7.4? On what kinds of tables?
>
> In 7.4 I think that the JOIN would yield as good or better a plan. The
> best possible plan for the NOT EXISTS query is effectively a nestloop
> with inner indexscan, which is great if the FK table is small and the
> PK table is large, but it sucks otherwise. The planner should choose a
> plan of this form for the LEFT JOIN given that combination of table
> sizes, and so there shouldn't be any great difference in runtime in that
> case. But in other combinations, such as large FK and small PK, other
> plan types will beat the pants off nestloop.
As an update, so far I still am getting better results with NOT EXISTS
than the left join. For a 50m row fk, 10k row pk where the rows are just
the keys, I'm getting a plan like
Merge JoinIndex scan on pktableSort Seqscan on fktable
which is taking about 2-4 times longer for me than the not exists
depending on sort_mem (at 4096,64000,128000).
When I lowered random_page_cost to 1, I got an indexscan on fktable, but
that hadn't seemed to finish after about 2 hours (as opposed to about
30-35 minutes for the not exists and about 30 min - 1 1/2 hours for the
sort+seqscan version.
I want to do some more tests where there's extraneous data in both tables
and see what that does to the results.