mergejoin null handling (was Re: [PERFORM] merge join killing performance) - Mailing list pgsql-hackers

From Tom Lane
Subject mergejoin null handling (was Re: [PERFORM] merge join killing performance)
Date
Msg-id 28813.1274818009@sss.pgh.pa.us
Whole thread Raw
List pgsql-hackers
Scott Marlowe <scott.marlowe@gmail.com> writes:
> So, Tom, so you think it's possible that the planner isn't noticing
> all those nulls and thinks it'll just take a row or two to get to the
> value it needs to join on?

I dug through this and have concluded that it's really an oversight in
the patch I wrote some years ago in response to this:
http://archives.postgresql.org/pgsql-performance/2005-05/msg00219.php

That patch taught nodeMergejoin that a row containing a NULL key can't
possibly match anything on the other side.  However, its response to
observing a NULL is just to advance to the next row of that input.
What we should do, if the NULL is in the first merge column and the sort
order is nulls-high, is realize that every following row in that input
must also contain a NULL and so we can just terminate the mergejoin
immediately.  The original patch works well for cases where there are
just a few nulls in one input and the important factor is to not read
all the rest of the other input --- but it fails to cover the case where
there are many nulls and the important factor is to not read all the
rest of the nulls.  The problem can be demonstrated if you modify the
example given in the above-referenced message so that table t1 contains
lots of nulls rather than just a few: explain analyze will show that
all of t1 gets read by the mergejoin, and that's not necessary.

I'm inclined to think this is a performance bug and should be
back-patched, assuming the fix is simple (which I think it is, but
haven't coded/tested yet).  It'd probably be reasonable to go back to
8.3; before that, sorting nulls high versus nulls low was pretty poorly
defined and so there'd be risk of breaking cases that gave the right
answers before.

Comments?
        regards, tom lane


pgsql-hackers by date:

Previous
From: Jan Wieck
Date:
Subject: Re: Exposing the Xact commit order to the user
Next
From: Joseph Adams
Date:
Subject: Re: Fwd: Hiding data in postgresql