Thread: Improving worst-case merge join performance with often-null foreign key

Improving worst-case merge join performance with often-null foreign key

From

Steinar Kaldager

Date:

21 April 2023, 19:13:33

Hi,

First-time potential contributor here. We recently had an incident due
to a sudden 1000x slowdown of a Postgres query (from ~10ms to ~10s)
due to a join with a foreign key that was often null. We found that it
was caused by a merge join with an index scan on one join path --
whenever the non-null data happened to be such that the merge join
couldn't be terminated early, the index would proceed to scan all of
the null rows and filter each one out individually. Since this was an
inner join, this was pointless; the nulls would never have matched the
join clause anyway.

Test script to reproduce + example explain output:
https://paste.depesz.com/s/VUj

Once you're aware of it for a given index, this is a solvable issue.
We solved it by adding a partial index. Including an IS NOT NULL
clause in the query also seems to solve it.

However, I've gotten the notion that it should be possible to solve
this on the Postgres side, with no user action required. When doing an
inner-join merge join with a normal equality operator, we should
already "know" that we don't have to look at rows where the join keys
are null, since any such comparison involving NULL will not return
true. If we can just push that knowledge down to the index scan node,
then we should be able to avoid this entire problem, leaving one less
performance trap to trip people up.

Proof-of-concept patch (please ignore style):
https://github.com/steinarvk-oda/postgres/pull/1/files

I have a few questions:

(1) Does this approach seem reasonable and worthwhile?
(2) Can I determine programmatically at query planning time whether
the binary operator in an OpExpr has the property that all comparisons
involving nulls will be non-true? Or, failing that, can I at least
somehow hardcode and identify the built-in equality operators (which
have this property)? (Integer equality and UUID equality probably
covers a lot when it comes to joins.)
(3) Is there a systematic way to check whether adding an "IS NOT NULL"
condition would be redundant? E.g. if there is such a check in the
query already, adding another is just messy.
(4) Should this sort of thing be made conditional on a high null_frac?
Or is that unnecessary complexity, and the simplicity of just always
doing it would be better?

Thanks!
Steinar Kaldager

Re: Improving worst-case merge join performance with often-null foreign key

From

Tom Lane

Date:

22 April 2023, 18:21:12

Steinar Kaldager <steinar.kaldager@oda.com> writes:
> First-time potential contributor here. We recently had an incident due
> to a sudden 1000x slowdown of a Postgres query (from ~10ms to ~10s)
> due to a join with a foreign key that was often null. We found that it
> was caused by a merge join with an index scan on one join path --
> whenever the non-null data happened to be such that the merge join
> couldn't be terminated early, the index would proceed to scan all of
> the null rows and filter each one out individually. Since this was an
> inner join, this was pointless; the nulls would never have matched the
> join clause anyway.

Hmm.  I don't entirely understand why the existing stop-at-nulls logic
in nodeMergejoin.c didn't fix this for you.  Maybe somebody has broken
that?  See the commentary for MJEvalOuterValues/MJEvalInnerValues.

Pushing down an IS NOT NULL restriction could possibly be of value
if the join is being done in the nulls-first direction, but that's
an extreme minority use-case.  I'm dubious that it'd be worth the
overhead in general.  It'd probably be more useful to make sure that
the planner's cost model is aware of this effect, so that it's prodded
to use nulls-last not nulls-first sort order when there are enough
nulls to make a difference.

            regards, tom lane

Re: Improving worst-case merge join performance with often-null foreign key

From

Richard Guo

Date:

23 April 2023, 12:29:59

On Sat, Apr 22, 2023 at 11:21 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Steinar Kaldager <steinar.kaldager@oda.com> writes:
> First-time potential contributor here. We recently had an incident due
> to a sudden 1000x slowdown of a Postgres query (from ~10ms to ~10s)
> due to a join with a foreign key that was often null. We found that it
> was caused by a merge join with an index scan on one join path --
> whenever the non-null data happened to be such that the merge join
> couldn't be terminated early, the index would proceed to scan all of
> the null rows and filter each one out individually. Since this was an
> inner join, this was pointless; the nulls would never have matched the
> join clause anyway.

Hmm. I don't entirely understand why the existing stop-at-nulls logic
in nodeMergejoin.c didn't fix this for you. Maybe somebody has broken
that? See the commentary for MJEvalOuterValues/MJEvalInnerValues.

I think it's just because the MergeJoin didn't see a NULL foo_id value
from test_bar tuples because all such tuples are removed by the filter
'test_bar.active', thus it does not have a chance to stop at nulls.

# select count(*) from test_bar where foo_id is null and active;
count
-------
0
(1 row)

Instead, the index scan on test_bar will have to scan all the tuples
with NULL foo_id because none of them satisfies the qual clause.

Thanks
Richard

Re: Improving worst-case merge join performance with often-null foreign key

From

Richard Guo

Date:

24 April 2023, 10:24:14

On Sun, Apr 23, 2023 at 5:29 PM Richard Guo <guofenglinux@gmail.com> wrote:

On Sat, Apr 22, 2023 at 11:21 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Steinar Kaldager <steinar.kaldager@oda.com> writes:
> First-time potential contributor here. We recently had an incident due
> to a sudden 1000x slowdown of a Postgres query (from ~10ms to ~10s)
> due to a join with a foreign key that was often null. We found that it
> was caused by a merge join with an index scan on one join path --
> whenever the non-null data happened to be such that the merge join
> couldn't be terminated early, the index would proceed to scan all of
> the null rows and filter each one out individually. Since this was an
> inner join, this was pointless; the nulls would never have matched the
> join clause anyway.

Hmm. I don't entirely understand why the existing stop-at-nulls logic
in nodeMergejoin.c didn't fix this for you. Maybe somebody has broken
that? See the commentary for MJEvalOuterValues/MJEvalInnerValues.

I think it's just because the MergeJoin didn't see a NULL foo_id value
from test_bar tuples because all such tuples are removed by the filter
'test_bar.active', thus it does not have a chance to stop at nulls.

# select count(*) from test_bar where foo_id is null and active;
count
-------
0
(1 row)

Instead, the index scan on test_bar will have to scan all the tuples
with NULL foo_id because none of them satisfies the qual clause.

BTW, in Steinar's case the query runs much faster with nestloop with a
parameterized inner path, since test_foo is small and test_bar is very
large, and there is an index on test_bar.foo_id. You can see that by
"set enable_mergejoin to off":

# EXPLAIN (costs off)
SELECT SUM(test_foo.id) FROM test_bar, test_foo WHERE test_bar.foo_id = test_foo.id AND test_foo.active AND test_bar.active;
QUERY PLAN
--------------------------------------------------------------
Aggregate
-> Nested Loop
-> Seq Scan on test_foo
Filter: active
-> Index Scan using test_bar_foo_id_idx on test_bar
Index Cond: (foo_id = test_foo.id)
Filter: active
(7 rows)

In my box the total cost and execution time of mergejoin vs nestloop
are:

mergejoin nestloop

Cost estimate 1458.40 12355.15
Actual (best of 3) 3644.685 ms 13.114 ms

So it seems we have holes in cost estimate for mergejoin or nestloop
with parameterized inner path, or both.

Thanks
Richard

Re: Improving worst-case merge join performance with often-null foreign key

From

Steinar Kaldager

Date:

25 April 2023, 15:51:00

On Sun, Apr 23, 2023 at 11:30 AM Richard Guo <guofenglinux@gmail.com> wrote:
> On Sat, Apr 22, 2023 at 11:21 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Hmm.  I don't entirely understand why the existing stop-at-nulls logic
>> in nodeMergejoin.c didn't fix this for you.  Maybe somebody has broken
>> that?  See the commentary for MJEvalOuterValues/MJEvalInnerValues.
>
>
> I think it's just because the MergeJoin didn't see a NULL foo_id value
> from test_bar tuples because all such tuples are removed by the filter
> 'test_bar.active', thus it does not have a chance to stop at nulls.

Agreed, this is also my understanding.

Note that this isn't just a contrived test case, it's also the
situation we ran into in prod. (We had a table with a lot of old
inactive rows with null values for Historical Reasons, essentially
kept for accounting/archival purposes. Newer, active, rows all had the
foreign key column set to non-null.)

I had initially considered whether this could be fixed in the
merge-join execution code instead of by altering the plan, but at
first glance that feels like it might be a more awkward fit. It's easy
enough to stop the merge join if a null actually appears, but because
of the filter, no null will ever appear. You'd have to somehow break
the "stream of values" abstraction and look at where the values are
actually coming from and/or which values would have appeared if they
weren't filtered out. I don't know the codebase well, but to me that
feels fairly hacky compared to altering the plan for the index scan.

Steinar