Re: An incorrect check in get_memoize_path - Mailing list pgsql-hackers
From | wenhui qiu |
---|---|
Subject | Re: An incorrect check in get_memoize_path |
Date | |
Msg-id | CAGjGUAL5cdFyrXnRO_DiRPeUuGiO9vd5riMRuGCVQjUPue0g6Q@mail.gmail.com Whole thread Raw |
In response to | An incorrect check in get_memoize_path (Richard Guo <guofenglinux@gmail.com>) |
List | pgsql-hackers |
- * - * XXX this could be enabled if the remaining join quals were made part of - * the inner scan's filter instead of the join filter. Maybe it's worth - * considering doing that? */ - if (extra->inner_unique && - (inner_path->param_info == NULL || - bms_num_members(inner_path->param_info->ppi_serials) < - list_length(extra->restrictlist))) - return NULL; + if (extra->inner_unique) + { + Bitmapset *ppi_serials; + + if (inner_path->param_info == NULL) + return NULL; + + ppi_serials = inner_path->param_info->ppi_serials; + + foreach_node(RestrictInfo, rinfo, extra->restrictlist) + { + if (!bms_is_member(rinfo->rinfo_serial, ppi_serials)) + return NULL; + } + } The refined version introduces a more precise validation by:
1. Explicitly verifying each restriction – Instead of relying on a count comparison, it checks whether every restriction's serial number is included in the inner path’s parameter info (ppi_serials).
2. Reducing false negatives – The optimizer now only discards a path if a restriction is definitively unhandled, allowing more valid plans to be considered.
While reviewing another patch [1], I noticed an incorrect check in
get_memoize_path.
if (extra->inner_unique &&
(inner_path->param_info == NULL ||
bms_num_members(inner_path->param_info->ppi_serials) <
list_length(extra->restrictlist)))
return NULL;
This check is to ensure that, for unique joins, all the restriction
clauses are parameterized to enable the use of Memoize nodes.
However, ppi_clauses includes join clauses available from all outer
rels, not just the current outer rel, while extra->restrictlist only
includes the restriction clauses for the current join. This means the
check could pass even if a restriction clause isn't parameterized, as
long as another join clause, which doesn't belong to the current join,
is included in ppi_clauses. It's not very hard to construct a query
that exposes this issue.
create table t (a int, b int);
explain (costs off)
select * from t t0 left join
(t t1 left join t t2 on t1.a = t2.a
join lateral
(select distinct on (a, b) a, b, t0.a+t1.a+t2.a as x from t t3 offset 0) t3
on t1.a = t3.a and coalesce(t2.b) = t3.b)
on t0.b = t3.b;
QUERY PLAN
---------------------------------------------------------
Nested Loop Left Join
-> Seq Scan on t t0
-> Nested Loop
Join Filter: (COALESCE(t2.b) = t3.b)
-> Hash Left Join
Hash Cond: (t1.a = t2.a)
-> Seq Scan on t t1
-> Hash
-> Seq Scan on t t2
-> Subquery Scan on t3
Filter: ((t0.b = t3.b) AND (t1.a = t3.a))
-> Unique
-> Sort
Sort Key: t3_1.a, t3_1.b
-> Seq Scan on t t3_1
(15 rows)
Consider the join to t3. It is a unique join, and not all of its
restriction clauses are parameterized. Despite this, the check still
passes.
Interestingly, although the check passes, no Memoize nodes are added
on top of t3's paths because paraminfo_get_equal_hashops insists that
all ppi_clauses must satisfy clause_sides_match_join (though I
actually question whether this is necessary, but that's a separate
issue). However, this does not justify the check being correct.
Hence, I propose the attached patch for the fix.
BTW, there is a XXX comment there saying that maybe we can make the
remaining join quals part of the inner scan's filter instead of the
join filter. I don't think this is possible in all cases. In the
above query, 'coalesce(t2.b) = t3.b' cannot be made part of t3's scan
filter, according to join_clause_is_movable_into. So I removed that
comment in the patch while we're here.
Any thoughts?
[1] https://postgr.es/m/60bf8d26-7c7e-4915-b544-afdb9020011d@gmail.com
Thanks
Richard
pgsql-hackers by date: