Re: Row pattern recognition - Mailing list pgsql-hackers

From Henson Choi
Subject Re: Row pattern recognition
Date
Msg-id CAAAe_zB7rAEJtT6hXgF85=_Tj8Nti45ZHbQw26gxTF2DBs3hJw@mail.gmail.com
Whole thread
In response to Re: Row pattern recognition  (Tatsuo Ishii <ishii@postgresql.org>)
Responses Re: Row pattern recognition
Re: Row pattern recognition
List pgsql-hackers
Hi Tatsuo,

When I started this work a little over three months ago, I had
no idea it would grow into what it is today. There is still
much to refine, but I would not have come this far without
your guidance and support along the way.

This series completes RPR navigation support, including
PREV/NEXT, FIRST/LAST, and compound navigation such as
PREV(FIRST(col)).

Attached are 31 incremental patches on top of v46 (replacing
the previous 15-patch set). Here is a summary of the changes
from the previous version:


  0001-0007: Unchanged from the previous set.

  0008: Replace reduced frame map with single match result
        (revised: int -> int64 type widening for
        row_is_in_reduced_frame() and related variables,
        cascading from 0025)

  0009: Add fixed-length group absorption for RPR
        (unchanged, design note [1])

  0010: Rename rpr_explain test views to descriptive
        names (unchanged)


  0011: Fix quote_identifier() for RPR pattern variable
        name deparse (new)

        Pattern variable names that match SQL keywords were
        not being quoted in pg_get_viewdef() output, causing
        the deparsed view to fail on re-parse.

  0012: Fix execRPR.o ordering in executor Makefile to
        match meson.build (new)

        Ensures consistent build file ordering between Make
        and Meson build systems.

  0013: Remove unused force_colno parameter from RPR
        deparse functions (new)

        Dead parameter cleanup in get_rpr_pattern() and
        get_rpr_define() in ruleutils.c.

  0014: Add CHECK_FOR_INTERRUPTS to RPR context cleanup
        and finalize loops (new)

        Extends interrupt checking to ExecRPRCleanupContexts()
        and ExecRPRFinalizeMatch() loops, complementing the
        earlier additions in 0002 and 0003.

  0015: Narrow variable scope in ExecInitWindowAgg DEFINE
        clause loop (new)

        Moves local variable declarations into their
        respective loop bodies for clarity.

  0016: Normalize RPR element flag macros to return bool
        (new)

        RPR_ELEM_IS_* macros now return bool instead of the
        raw bitmask value, consistent with PostgreSQL style.


  0017: Implement 1-slot PREV/NEXT navigation for RPR
        (was 0011, context changes from 0008 int64)

  0018: Add JIT compilation support for RPR PREV/NEXT
        (was 0012, unchanged)

  0019: Add tuplestore trim optimization for RPR PREV
        navigation (was 0013, context changes from 0008
        int64)

  0020: Update RPR code comments to reflect 1-slot
        navigation model (was 0014, context only)


  0021: Enable JIT compilation for PREV/NEXT navigation
        tests in RPR (new)

        Adds test coverage for JIT-compiled PREV/NEXT
        expressions under jit_above_cost=0, verifying
        that the slot reload path produces correct results.

  0022: Add 2-arg PREV/NEXT test for row pattern
        navigation with host variable (new)

        Tests the PREV(value, offset) form where offset
        is a host variable (prepared statement parameter),
        exercising the RPR_NAV_OFFSET_NEEDS_EVAL path.

  0023: Add Nav Mark Lookback to EXPLAIN and fix
        compute_nav_max_offset() (new)

        EXPLAIN now displays "Nav Mark Lookback: N" for
        RPR windows with PREV navigation, showing how
        many rows the tuplestore retains. Also fixes
        compute_nav_max_offset() to correctly handle
        non-constant offset expressions.


  0024: Implement FIRST/LAST and compound navigation
        for RPR (was 0015, completed)

        The previous WIP patch is now complete with:

        - Compound navigation: PREV(FIRST(col)),
          NEXT(LAST(col, N)), etc. per SQL standard 5.6.4.
          Flattened into a single RPRNavExpr node with
          two offset values (inner position + outer offset).

        - RPRNavOffsetKind enum replaces sentinel macros
          for offset classification (CONSTANT, NEEDS_EVAL,
          FIRST_BASED).

        - Per-context DEFINE re-evaluation for match_start-
          dependent expressions (FIRST, LAST with offset),
          tracked via defineMatchStartDependent bitmapset
          computed by the planner.

        - Tuplestore mark handling extended for FIRST-based
          navigation: mark position accounts for both
          backward reach (PREV) and forward-from-match-start
          reach (FIRST).

        - Documentation added to func-window.sgml.

        See the design note [2] for the absorption safety
        analysis and evaluation sharing rules.


  0025: Guard against int64 overflow in RPR bounded frame
        end computation (new)

        Widens frame position variables from int to int64
        in row_is_in_reduced_frame() and related functions.
        Prevents overflow when frame end position exceeds
        INT_MAX in large partitions.

  0026: Fix RPR error message style: hint format,
        terminology, capitalization (new)

        Aligns RPR error messages with PostgreSQL conventions:
        lowercase first word, consistent use of "row pattern"
        terminology, HINT format corrections.

  0027: Fix comment typos, grammar, and inaccuracies in
        RPR code (new)

        Comment-only changes across execRPR.c, parse_rpr.c,
        nodeWindowAgg.c, and ruleutils.c. No functional
        changes.

  0028: Fix RPR documentation: synopsis, grammar, and
        terminology (new)

        Updates select.sgml, advanced.sgml, and
        func-window.sgml for consistent terminology
        and corrected synopsis examples.

  0029: Fix nav_slot pass-by-ref dangling pointer in RPR
        navigation (new)

        When a DEFINE expression contains multiple navigation
        calls targeting different positions (e.g.,
        PREV(x,1) > PREV(x,2)), the second call re-fetches
        nav_slot, freeing the previous tuple via pfree. Any
        pass-by-ref datum extracted from the first navigation
        becomes a dangling pointer. Fix by copying pass-by-ref
        results into per-tuple memory in the RESTORE step.

  0030: Add inline comments for complex RPR algorithms and
        design notes (new)

        Adds explanatory comments to key algorithms in
        execRPR.c and rpr.c (planner). Covers NFA absorption
        logic, match evaluation flow, navigation mark
        computation, and DEFINE re-evaluation decisions.
        Comment-only changes, no functional changes.

  0031: Remove unused include and fix header ordering in RPR
        files (new)

        Removes an unnecessary include from execExprInterp.c,
        drops a redundant nodeWindowAgg.h include from
        nodeWindowAgg.c, and fixes header ordering in
        parse_rpr.c to match PostgreSQL convention.


Changes from the previous version:

  - 0015 (FIRST/LAST) is now complete as 0024, with compound
    navigation, RPRNavOffsetKind, defineMatchStartDependent,
    and full test coverage.

  - 16 new patches (0011-0016, 0021-0023, 0025-0031) added
    since the previous set. These are all quality improvements:
    bug fixes, safety guards, style normalization, comments,
    and test coverage.

  - New features: compound navigation in 0024 (was "not
    yet done" in previous set), Nav Mark Lookback in
    EXPLAIN output (0023). All other new patches are
    quality improvements.

I am planning to submit PREFIX pattern absorption [3]
as a separate patch after this set is committed. The
changes are confined to the Absorb phase with no
external dependencies, so it can be submitted
independently. The design itself has some complexity
(shadow path tracking), and I would like to refine it
further until I am fully confident in the approach.

With this set I have covered the features I could
foresee. Whether anything else belongs in scope is
your call. From here, I think the emphasis should
shift from adding features to raising overall
quality.
If there are directions you think should be explored,
I am happy to work on them. Otherwise, I plan to focus
on code review, test coverage, and stabilization, and
will continue to submit bug fixes and test additions
as I find them.

I also very much welcome refactoring or restructuring
suggestions from a committer's maintainability
perspective. If anything in the current code will be
painful to maintain long-term, I would much rather
address it now than leave it for after commit.

For stabilization, I am thinking of the same approach
as in early January: gcov analysis, Valgrind runs, and
expanding the test suite.

**WITH THE PLANNER**, the biggest risk is what we do
not know we are missing -- there are too many possible
feature interactions to enumerate up front. To surface
these, I plan to run RPR-only tests under gcov and
review the uncovered planner paths: any line that RPR
could plausibly reach but does not becomes a candidate
risk, and a candidate for a new test. That turns the
unknown into a concrete review list rather than a guess
about which interactions to try. Does that seem like a
good direction?

If anything comes to mind that should be added or
fixed -- at any time -- please let me know.

[1] Fixed-length group absorption design note
https://www.postgresql.org/message-id/CAAAe_zAKAGKpK9iHx3ZSuG59sP93r5dfootqv5tCfaMt%3Dw6wzA%40mail.gmail.com
[2] FIRST/LAST navigation design note
https://www.postgresql.org/message-id/CAAAe_zCUrKGBgZdaazdO_v9QWHsS_1DXuP%3DrLeNhO3iwsHwwbg%40mail.gmail.com
[3] PREFIX pattern absorption design note
https://www.postgresql.org/message-id/CAAAe_zDq7R8CaDfmh8C%2BH3_639Y5LtJD%2B2Z%3D1txDt%3DvaOr90rQ%40mail.gmail.com

Best regards,
Henson

Attachment

pgsql-hackers by date:

Previous
From: CharSyam
Date:
Subject: [PATCH] Reduce pg_class scans in GRANT/REVOKE ON ALL TABLES IN SCHEMA
Next
From: SATYANARAYANA NARLAPURAM
Date:
Subject: Re: var_is_nonnullable() fails to handle invalid NOT NULL constraints