Re: An incorrect check in get_memoize_path - Mailing list pgsql-hackers

From wenhui qiu
Subject Re: An incorrect check in get_memoize_path
Date
Msg-id CAGjGUAL5cdFyrXnRO_DiRPeUuGiO9vd5riMRuGCVQjUPue0g6Q@mail.gmail.com
Whole thread Raw
In response to An incorrect check in get_memoize_path  (Richard Guo <guofenglinux@gmail.com>)
List pgsql-hackers
Hi Richard
Thank you work on this

- * - * XXX this could be enabled if the remaining join quals were made part of - * the inner scan's filter instead of the join filter. Maybe it's worth - * considering doing that? */ - if (extra->inner_unique && - (inner_path->param_info == NULL || - bms_num_members(inner_path->param_info->ppi_serials) < - list_length(extra->restrictlist))) - return NULL; + if (extra->inner_unique) + { + Bitmapset *ppi_serials; + + if (inner_path->param_info == NULL) + return NULL; + + ppi_serials = inner_path->param_info->ppi_serials; + + foreach_node(RestrictInfo, rinfo, extra->restrictlist) + { + if (!bms_is_member(rinfo->rinfo_serial, ppi_serials)) + return NULL; + } + } The refined version introduces a more precise validation by:
1. Explicitly verifying each restriction – Instead of relying on a count comparison, it checks whether every restriction's serial number is included in the inner path’s parameter info (ppi_serials).
2. Reducing false negatives – The optimizer now only discards a path if a restriction is definitively unhandled, allowing more valid plans to be considered.


On Mon, Apr 7, 2025 at 3:50 PM Richard Guo <guofenglinux@gmail.com> wrote:
While reviewing another patch [1], I noticed an incorrect check in
get_memoize_path.

    if (extra->inner_unique &&
        (inner_path->param_info == NULL ||
         bms_num_members(inner_path->param_info->ppi_serials) <
         list_length(extra->restrictlist)))
        return NULL;

This check is to ensure that, for unique joins, all the restriction
clauses are parameterized to enable the use of Memoize nodes.
However, ppi_clauses includes join clauses available from all outer
rels, not just the current outer rel, while extra->restrictlist only
includes the restriction clauses for the current join.  This means the
check could pass even if a restriction clause isn't parameterized, as
long as another join clause, which doesn't belong to the current join,
is included in ppi_clauses.  It's not very hard to construct a query
that exposes this issue.

create table t (a int, b int);

explain (costs off)
select * from t t0 left join
  (t t1 left join t t2 on t1.a = t2.a
   join lateral
     (select distinct on (a, b) a, b, t0.a+t1.a+t2.a as x from t t3 offset 0) t3
         on t1.a = t3.a and coalesce(t2.b) = t3.b)
  on t0.b = t3.b;
                       QUERY PLAN
---------------------------------------------------------
 Nested Loop Left Join
   ->  Seq Scan on t t0
   ->  Nested Loop
         Join Filter: (COALESCE(t2.b) = t3.b)
         ->  Hash Left Join
               Hash Cond: (t1.a = t2.a)
               ->  Seq Scan on t t1
               ->  Hash
                     ->  Seq Scan on t t2
         ->  Subquery Scan on t3
               Filter: ((t0.b = t3.b) AND (t1.a = t3.a))
               ->  Unique
                     ->  Sort
                           Sort Key: t3_1.a, t3_1.b
                           ->  Seq Scan on t t3_1
(15 rows)

Consider the join to t3.  It is a unique join, and not all of its
restriction clauses are parameterized.  Despite this, the check still
passes.

Interestingly, although the check passes, no Memoize nodes are added
on top of t3's paths because paraminfo_get_equal_hashops insists that
all ppi_clauses must satisfy clause_sides_match_join (though I
actually question whether this is necessary, but that's a separate
issue).  However, this does not justify the check being correct.

Hence, I propose the attached patch for the fix.

BTW, there is a XXX comment there saying that maybe we can make the
remaining join quals part of the inner scan's filter instead of the
join filter.  I don't think this is possible in all cases.  In the
above query, 'coalesce(t2.b) = t3.b' cannot be made part of t3's scan
filter, according to join_clause_is_movable_into.  So I removed that
comment in the patch while we're here.

Any thoughts?

[1] https://postgr.es/m/60bf8d26-7c7e-4915-b544-afdb9020011d@gmail.com

Thanks
Richard

pgsql-hackers by date:

Previous
From: Bernd Helmle
Date:
Subject: Re: Modern SHA2- based password hashes for pgcrypto
Next
From: Jim Jones
Date:
Subject: Re: [PATCH] Add CANONICAL option to xmlserialize