Re: Improving worst-case merge join performance with often-null foreign key - Mailing list pgsql-hackers

From Steinar Kaldager
Subject Re: Improving worst-case merge join performance with often-null foreign key
Date
Msg-id CANcDffeUuU5OezYPMPb4ypYFpxxWs5pgVwGGEAEe4hD1+1PP1g@mail.gmail.com
Whole thread Raw
In response to Re: Improving worst-case merge join performance with often-null foreign key  (Richard Guo <guofenglinux@gmail.com>)
List pgsql-hackers
On Sun, Apr 23, 2023 at 11:30 AM Richard Guo <guofenglinux@gmail.com> wrote:
> On Sat, Apr 22, 2023 at 11:21 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Hmm.  I don't entirely understand why the existing stop-at-nulls logic
>> in nodeMergejoin.c didn't fix this for you.  Maybe somebody has broken
>> that?  See the commentary for MJEvalOuterValues/MJEvalInnerValues.
>
>
> I think it's just because the MergeJoin didn't see a NULL foo_id value
> from test_bar tuples because all such tuples are removed by the filter
> 'test_bar.active', thus it does not have a chance to stop at nulls.

Agreed, this is also my understanding.

Note that this isn't just a contrived test case, it's also the
situation we ran into in prod. (We had a table with a lot of old
inactive rows with null values for Historical Reasons, essentially
kept for accounting/archival purposes. Newer, active, rows all had the
foreign key column set to non-null.)

I had initially considered whether this could be fixed in the
merge-join execution code instead of by altering the plan, but at
first glance that feels like it might be a more awkward fit. It's easy
enough to stop the merge join if a null actually appears, but because
of the filter, no null will ever appear. You'd have to somehow break
the "stream of values" abstraction and look at where the values are
actually coming from and/or which values would have appeared if they
weren't filtered out. I don't know the codebase well, but to me that
feels fairly hacky compared to altering the plan for the index scan.

Steinar



pgsql-hackers by date:

Previous
From: Daniel Gustafsson
Date:
Subject: Re: Should vacuum process config file reload more often
Next
From: "Hayato Kuroda (Fujitsu)"
Date:
Subject: RE: [PoC] pg_upgrade: allow to upgrade publisher node