Re: Row pattern recognition - Mailing list pgsql-hackers

From Henson Choi
Subject Re: Row pattern recognition
Date
Msg-id CAAAe_zBHrBBM2KYKJYSvP=vr=6fv7kFDr8qZZWFd7==sb4VMxg@mail.gmail.com
Whole thread
In response to Re: Row pattern recognition  (Tatsuo Ishii <ishii@postgresql.org>)
List pgsql-hackers
 Hi Tatsuo,

Thank you for the careful review and for confirming the build on
your side.

Ok, I will not rebase current v46 and continue to apply your
incremental patches and review them, until we agree to ship v47.  I
may see conflicts while creating v47 patch sets. I will ask help if if
your assistance needed. Thanks in advance.

Understood.  Please let me know whenever a conflict comes up while
assembling v47 and I will help resolve it on my side.
 
I confirmed 0001-0008 applied cleanly and see no compile
warning. Regression test passed, no crash.

Thank you for confirming.
 
> Regarding the README.rpr suggestion from the 0008 review: the
> documentation in execRPR.c has dependencies spread across the patch
> series, so separating it mid-review would be disruptive. I plan to
> split it out as part of the final patch list once all 31 patches have
> been reviewed.

Ok.

Thank you.  I will prepare the README.rpr split as a follow-up patch
after the current 0031 once the series review is complete.

Looking forward to seeing revised incremental patch sets.

Attached is the revised v47 series, with the review comments on
0006, 0007, and 0008 incorporated.  Numbering and subjects are
unchanged from the previous send.

Since the edits in 0006-0008 propagate into the later patches --
mostly as context-line shifts, and in 0017/0023 as updates to the
rpr_integration expected output -- I have attached the full
0001-0031 set rather than only the updated patches.  Applying this
set on top of v46 should give the same final tree as before, just
with the review adjustments folded in.

Summary of changes from the previous v47:

  - 0006 (Fix DEFINE expression handling): per your comment, the
    first occurrence of "RPR" now carries the "(Row Pattern
    Recognition)" expansion and the later occurrence is the bare
    abbreviation.

  - 0007 (Add RPR planner integration tests): overall
    strengthening of the tests and their comments.  Each block
    now carries an intent comment describing what the block is
    trying to cover and what the expected plan or result is,
    the "Non-RPR / RPR" comment pair is reworded as complete
    sentences, and B8 is reworked so that the EXPLAIN plan
    actually exercises Incremental Sort.

    Because some blocks cover features that are only
    introduced later in the series, a few tests in 0007
    temporarily produce error output.  Concretely, B4
    (RPR + prepared statements) uses the 2-argument form
    PREV(val, $1), which is not yet available at the 0007
    point: it is introduced in 0017 (1-slot PREV/NEXT
    navigation), and the accompanying "Nav Mark Lookback"
    EXPLAIN property referenced in the B4 comments is added
    in 0023.  The rpr_integration expected-output file is
    updated by 0017 (to clear the PREV/prepared-statement
    errors) and then again by 0023 (to add the
    "Nav Mark Lookback" lines to the EXPLAIN output), so that
    by the end of the series the output is clean.

    XXX notes have been added in three places where a test
    records behaviour that may be revisited later:

      (1) The subquery-output test documents why the column
          guard in allpaths.c has to be conservative: a column
          referenced only by DEFINE is indistinguishable from
          an exposed column at the targetlist level, so
          remove_unused_subquery_outputs() cannot apply its
          NULL-replacement selectively without risking DEFINE
          evaluation on NULL inputs.  The XXX flags this as a
          candidate for a precise follow-up optimization.

      (2) The B7 Recursive CTE test carries an XXX noting that
          it is unclear whether this case falls under the
          ISO/IEC 9075-2 4.18.5 / 6.17.5 prohibition, and even
          if it does not, whether a query that does trigger
          the prohibition can be constructed at all.  The
          disposition is left to the community.

      (3) The B9 volatile-in-DEFINE test records current
          behaviour using "random() >= 0.0" (structurally
          equivalent to TRUE, so the expected output stays
          deterministic).  The XXX flags that if we decide to
          reject volatile functions in DEFINE, this test must
          be converted into an error-case test.  See the
          dedicated section at the end of this mail.

  - 0008 (Replace reduced frame map with single match result):
      * Comment block X-1 now uses the full field names
        rpr_match_valid / rpr_match_matched / rpr_match_length.
      * get_reduced_frame_status() sentence rephrased in the
        passive voice, as you suggested:
          "Row's status against the current match result can be
           obtained by calling get_reduced_frame_status()."
      * README.rpr split deferred per the above.

  - 0009-0031: no semantic code changes.  Most patches only
    carry context-line shifts from the edits above.  The two
    exceptions are 0017 and 0023, whose rpr_integration.out
    hunks are updated to match the revised B4 test in 0007
    (clearing the temporary errors and adding the
    "Nav Mark Lookback" lines, respectively).

On volatile functions in DEFINE (your 2026-04-13 mail): the SQL
standard itself is relatively narrow here -- the only explicit
restriction it places is that a navigation function's offset must
be a runtime constant.  Volatile expressions in the DEFINE
predicate are not forbidden by the standard.

That said, once we move into the pattern-matching machinery, the
observable semantics of a volatile DEFINE become quite murky.
Depending on the situation, the DEFINE predicate may be evaluated
once per row, or separately per active matching context, or some
combination of both as the NFA explores alternatives.  A user
writing "random() > 0.5" in DEFINE would have a hard time
predicting, let alone relying on, the evaluation count -- and I
have not been able to think of a realistic use case that actually
benefits from this flexibility.

Given the gap between "permitted by the standard" and "meaningful
to the user," my own leaning is toward prohibiting volatile
functions in DEFINE at transformation time, and -- if we go that
way -- also examining whether the new check would make the
existing runtime-constant check on the navigation-function offset
redundant, or whether the two could be folded into a single check
pass.  That said, this is ultimately a policy call, and I would rather defer
the final decision to your judgment as a senior member of the
community: does prohibition seem like the right direction, or
would you prefer to keep the current behaviour in place?

On sequencing, if we do take it up, I would suggest handling it
after the 31-patch set, alongside the README.rpr split as
follow-up work on top of 0031.  Whether it ultimately lands
inside v47 or as a separate piece on top does not need to be
decided right now -- there is still room to discuss it as the
review progresses, and I am happy to adjust either way based on
your direction.

Regards,
Henson
 
Attachment

pgsql-hackers by date:

Previous
From: Bruce Momjian
Date:
Subject: Re: First draft of PG 19 release notes
Next
From: "Greg Burd"
Date:
Subject: Re: [PATCH] Add tests for Bitmapset