Re: Row pattern recognition - Mailing list pgsql-hackers

From Henson Choi
Subject Re: Row pattern recognition
Date
Msg-id CAAAe_zBbrnx2fjK2s+Jgx6TSOdnKAPawXbHeX49WqmX9ji+Hdg@mail.gmail.com
Whole thread Raw
In response to Re: Row pattern recognition  (Tatsuo Ishii <ishii@postgresql.org>)
Responses Re: Row pattern recognition
List pgsql-hackers
Hi Tatsuo,

Sorry for sending these before your next revision of v45 --
I'll be busy with a project proposal for the next couple of
weeks, so I wanted to share them before that.

Here are the patches on top of v45, incorporating the fixes and
improvements I mentioned earlier, plus additional changes from a
thorough code review. There are quite a few patches since I went
ahead without prior discussion -- please review each one individually
and feel free to drop any that you disagree with.

Here's a summary of each patch (to be applied on top of v45):

  0001: Fix mergeGroupPrefixSuffix() max increment
        When absorbing prefix/suffix, child->max was not incremented,
        causing incorrect quantifier bounds. (reported by Zsolt)

  0002: Fix RPR error codes and GROUPS typo
        Use ERRCODE_WINDOWING_ERROR instead of ERRCODE_SYNTAX_ERROR
        for semantic window-frame violations. Fix "GROUP" typo to
        "GROUPS" in frame validation error message. (reported by Zsolt)

  0003: Add check_stack_depth() and CHECK_FOR_INTERRUPTS()
        Add stack depth protection to recursive pattern optimization
        and interrupt checks to NFA engine loops.

  0004: Fix window_last_value set_mark during RPR
        Restore set_mark=true in window_last_value (normal behavior)
        and add an RPR-specific override inside WinGetFuncArgInFrame.
        This keeps the RPR workaround localized rather than changing
        the caller's semantics unconditionally.

  0005: Fix row_is_in_reduced_frame in WINDOW_SEEK_TAIL
        Pass frameheadpos directly to row_is_in_reduced_frame
        instead of frameheadpos+relpos. Currently only last_value()
        calls this path with relpos=0 so no actual bug, but the
        old expression would be incorrect for negative relpos.
        Also add a bounds check for future callers.

  0006: Clarify ST_NONE intent
        Add comment explaining ST_NONE = 0 is the default for
        non-RPR windows.

  0007: Clarify inverse transition optimization comment
        Document why RPR disables inverse transition: the reduced
        frame changes row by row.

  0008: Reject unused DEFINE variables
        I know you preferred silently ignoring unused DEFINE
        variables [1], and I agree the standard doesn't mandate
        an error. However, if we later add qualified column
        references (e.g. B AS A.price > 100), B's expression
        depends on A being present. If A is not used in PATTERN
        and the planner silently removes it, B's reference to A
        becomes dangling. I worry that silently allowing this now
        could create forward-compatibility problems once qualified
        references are introduced. For that reason, I'm inclined
        to think rejecting them now may be safer than changing
        behavior later, which would be a user-visible compatibility
        break. This is also consistent with Oracle's behavior
        (ORA-62503), as SungJun reported, and it helps catch user
        typos in DEFINE variable names at parse time.

  0009: Clarify RPR documentation in advanced.sgml
        Improve absorption explanation and clarify non-match row
        aggregation behavior with concrete examples.

  0010: Fix typos in RPR comments and parser README

  0011: Clarify excludeLocation and empty quantifier in gram.y
        Add comments explaining the conditional location assignment
        pattern and the empty quantifier rule.

  0012: Clarify RPR_VARID_MAX definition
        Document varId range 0-250 and reserved control element
        values 252+.

  0013: Move local variables to function scope
        In row_is_in_reduced_frame, move declarations out of
        switch case blocks.

  0014: Reset reduced_frame_map pointer in release_partition
        Set reduced_frame_map = NULL and alloc_sz = 0 to prevent
        dangling pointer after partition context reset.

  0015: Remove redundant list manipulation in nfa_add_matched_state
        Simplify doubly-linked list operation that was duplicated
        by the subsequent ExecRPRFreeContext() call.

  experimental: Implement 1-slot PREV/NEXT navigation
        The slot-based PREV/NEXT approach I mentioned earlier.
        This no longer needs attno_map, so the plan cache mutation
        issue is gone. PREV/NEXT now also accepts an optional
        second argument for offset, e.g. PREV(price, 2).

        Note: all existing regression tests pass with this patch,
        but my understanding of TupleSlot internals and JIT is still
        shallow -- the implementation largely copies surrounding
        patterns, so there may be issues I haven't caught yet. I'd
        like to keep this as a separate patch on top of v46 until it
        is trustworthy enough, rather than folding it into the main
        series. I'll continue reviewing and hardening it.

One design decision regarding PREV/NEXT: the SQL standard
requires that the first argument of PREV/NEXT contain at least
one column reference [2] -- e.g. PREV(1) is a syntax error.
Beyond the standard's rationale (no starting row for offsetting),
there is a practical reason: when the target row falls outside
the frame, PREV should return NULL, but a constant expression
like PREV(42) would still evaluate to 42 since it never reads
from the slot. I think it's correct to follow the standard and
reject PREV/NEXT without a column reference. What do you think?

[1] https://www.postgresql.org/message-id/20260305.142049.1864331791480656300.ishii%40postgresql.org
[2] ISO/IEC 19075-5, Subclause 5.6.2

Best regards,
Henson
Attachment

pgsql-hackers by date:

Previous
From: Peter Eisentraut
Date:
Subject: Re: SQL Property Graph Queries (SQL/PGQ)
Next
From: Álvaro Herrera
Date:
Subject: Re: Race conditions in logical decoding