Re: Row pattern recognition - Mailing list pgsql-hackers

From Henson Choi
Subject Re: Row pattern recognition
Date
Msg-id CAAAe_zBESHXy0zvCHvYQvFtyk98dtyJAhsp3R6HvyXdP-n4-Qw@mail.gmail.com
Whole thread Raw
In response to Re: Row pattern recognition  (Tatsuo Ishii <ishii@postgresql.org>)
List pgsql-hackers

Hi Ishii-san,

Glad the review was useful!

2026년 1월 2일 (금) AM 11:16, Tatsuo Ishii <ishii@postgresql.org>님이 작성:

Hi Henson,

Thank you for the review! I will take care the issues (it will take
sometime).


If you have other patches that need pre-analysis (coverage, style, valgrind, etc.), feel free to ask anytime. I enjoy this kind of work.

 
Best regards,
--
Tatsuo Ishii
SRA OSS K.K.
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp

> SQL/RPR Patch Review Report
> ============================
>
> Patch: Row Pattern Recognition (SQL:2016)
> Commitfest: https://commitfest.postgresql.org/patch/4460
>
> Review Methodology:
>   This review focused on quality assessment, not line-by-line code audit.
>   Key code paths and quality issues were examined with surrounding context
>   when concerns arose. Documentation files were reviewed with AI-assisted
>   grammar and typo checking. Code coverage was measured using gcov and
>   custom analysis tools.
>
> Limitations:
>   - Not a security audit or formal verification
>   - Parser and executor internals reviewed at module level, not exhaustively
>   - Performance characteristics not benchmarked
>
>
> TABLE OF CONTENTS
> -----------------
>
> 1. Executive Summary
> 2. Issues Found
>    2.1 Critical / Major
>    2.2 Minor Issues (Review Needed)
>    2.3 Minor Issues (Style)
>    2.4 Suggestions for Discussion
> 3. Test Coverage
>    3.1 Covered Areas
>    3.2 Untested Items
>    3.3 Unimplemented Features (No Test Needed)
> 4. Code Coverage Analysis
>    4.1 Overall Coverage
>    4.2 Coverage by File
>    4.3 Uncovered Code Risk Assessment
> 5. Commit Analysis
> 6. Recommendations
> 7. Conclusion
>
>
> 1. EXECUTIVE SUMMARY
> --------------------
>
> Overall assessment: GOOD
>
> The SQL/RPR patch demonstrates solid implementation quality within its
> defined scope. Code follows PostgreSQL coding standards (with minor style
> issues), test coverage is comprehensive at 96.4%, and documentation is
> thorough with only minor typos.
>
> Rating by Area:
> - Code Quality:     Good (PostgreSQL style compliant, 26 static style fixes
> recommended)
> - Test Coverage:    Good (96.4% line coverage, 28 test cases)
> - Documentation:    Good (Complete, 1 minor typo)
> - Build/Regress:    Pass (make check-world, 753 tests passed)
>
>
> 2. ISSUES FOUND
> ---------------
>
> 2.1 Critical / Major
>
> #1 [Code] Greedy pattern combinatorial explosion
>    Pattern: PATTERN (A+ B+ C+) with DEFINE A AS TRUE, B AS TRUE, C AS TRUE
>    Impact: Memory exhaustion or infinite wait
>    Recommendation: Add work_mem-based memory limit (error on exceed)
>
>    Evidence - No memory limit in current code:
>    - nodeWindowAgg.c:5718-5722 string_set_init(): palloc() unconditional
>    - nodeWindowAgg.c:5741-5750 string_set_add(): set_size *= 2; repalloc()
> unlimited
>    - nodeWindowAgg.c:5095-5174 generate_patterns(): Triple loop, no limit
>
>    Only work_mem usage in RPR (nodeWindowAgg.c:1297):
>      winstate->buffer = tuplestore_begin_heap(false, false, work_mem);
>    -> For tuple buffer, unrelated to pattern combination memory (StringSet)
>
> 2.2 Minor Issues (Review Needed)
>
> #1 [Code] nodeWindowAgg.c:5849,5909,5912
>    pos > NUM_ALPHABETS check - intent unclear, causes error at 28 PATTERN
> elements
>
>    Reproduction:
>    - PATTERN (A B C ... Z A) (27 elements) -> OK
>    - PATTERN (A B C ... Z A B) (28 elements) -> ERROR "initial is not valid
> char: b"
>
> 2.3 Minor Issues (Style)
>
> #1 [Code] nodeWindowAgg.c (25 blocks)
>    #ifdef RPR_DEBUG -> recommend elog(DEBUG2, ...) or remove
>
> #2 [Code] src/backend/executor/nodeWindowAgg.c
>    static keyword on separate line (26 functions)
>
> #3 [Code] src/backend/utils/adt/ruleutils.c
>    Whitespace typo: "regexp !=NULL" -> "regexp != NULL"
>
> #4 [Code] nodeWindowAgg.c:4619
>    Error message case: "Unrecognized" -> "unrecognized" (PostgreSQL style)
>
> #5 [Doc] doc/src/sgml/ref/select.sgml:1128
>    Typo: "maximu" -> "maximum"
>
> 2.4 Suggestions for Discussion
>
> #1 Incremental matching with streaming NFA redesign
>    Benefits:
>    - O(n) time complexity per row (current: exponential in worst case)
>    - Bounded memory via state merging and context absorption
>    - Natural extension for OR patterns, {n,m} quantifiers, nested groups
>    - Enables MEASURES clause with incremental aggregates
>    - Proven approach in CEP engines (Flink, Esper)
>
>
> 3. TEST COVERAGE
> ----------------
>
> 3.1 Covered Areas
>
> - PATTERN clause: +, *, ? quantifiers (line 41, 71, 86)
> - DEFINE clause: Variable conditions, auto-TRUE for missing (line 120-133)
> - PREV/NEXT functions: Single argument (line 44, 173)
> - AFTER MATCH SKIP: TO NEXT ROW (line 182), PAST LAST ROW (line 198)
> - Aggregate integration: sum, avg, count, max, min (line 258-277)
> - Window function integration: first_value, last_value, nth_value (line
> 34-35)
> - JOIN/CTE: JOIN (line 313), WITH (line 324)
> - View: VIEW creation, pg_get_viewdef (line 390-406)
> - Error cases: 7 error scenarios (line 409-532)
> - Large partition: 5000 row smoke test (line 360-387)
> - ROWS BETWEEN offset: CURRENT ROW AND 2 FOLLOWING (line 244)
>
> 3.2 Untested Items
>
> Feature                  Priority   Recommendation
> -------------------------------------------------------------------------------
> 26 variable limit        Medium     Test 26 variables success, 27th
> variable error
> NULL value handling      Low        Test PREV(col) where col or previous
> row is NULL
>
> Sample test for 26 variable limit:
>
>     -- Should fail with 27th variable (parser error, no table needed)
>     SELECT * FROM (SELECT 1 AS x) t
>     WINDOW w AS (
>       ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
>       PATTERN (a b c d e f g h i j k l m n o p q r s t u v w x y z aa)
>       DEFINE a AS TRUE, b AS TRUE, c AS TRUE, d AS TRUE, e AS TRUE, f AS
> TRUE,
>              g AS TRUE, h AS TRUE, i AS TRUE, j AS TRUE, k AS TRUE, l AS
> TRUE,
>              m AS TRUE, n AS TRUE, o AS TRUE, p AS TRUE, q AS TRUE, r AS
> TRUE,
>              s AS TRUE, t AS TRUE, u AS TRUE, v AS TRUE, w AS TRUE, x AS
> TRUE,
>              y AS TRUE, z AS TRUE, aa AS TRUE
>     );
>     -- ERROR: number of row pattern definition variable names exceeds 26
>
> Sample test for NULL handling:
>
>     CREATE TEMP TABLE stock_null (company TEXT, tdate DATE, price INTEGER);
>     INSERT INTO stock_null VALUES ('c1', '2023-07-01', 100);
>     INSERT INTO stock_null VALUES ('c1', '2023-07-02', NULL);  -- NULL in
> middle
>     INSERT INTO stock_null VALUES ('c1', '2023-07-03', 200);
>     INSERT INTO stock_null VALUES ('c1', '2023-07-04', 150);
>
>     SELECT company, tdate, price, count(*) OVER w AS match_count
>     FROM stock_null
>     WINDOW w AS (
>       PARTITION BY company
>       ORDER BY tdate
>       ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
>       PATTERN (START UP DOWN)
>       DEFINE START AS TRUE, UP AS price > PREV(price), DOWN AS price <
> PREV(price)
>     );
>     -- Result: all rows show match_count = 0 (NULL breaks pattern matching)
>
> 3.3 Unimplemented Features (No Test Needed)
>
> Per documentation (select.sgml:1133-1136):
> - MEASURES clause: Not implemented
> - SUBSET clause: Not implemented
> - AFTER MATCH variants: Only SKIP TO NEXT ROW, PAST LAST ROW supported
>
> Not in documentation, verified by testing:
> - {n,m} quantifier: Parser error ("syntax error at or near {")
> - (A|B) OR pattern: Not supported (parsed as single variable name "a|b")
> - (A B)+ compound repetition: Parser accepts, but execution returns 0
> matches
>
>
> 4. CODE COVERAGE ANALYSIS
> -------------------------
>
> 4.1 Overall Coverage
>
> 96.4% (732 / 759 lines)
>
> 4.2 Coverage by File (major RPR-modified files)
>
> nodeWindowAgg.c:    96.6% (560/580) - Pattern matching execution engine
> parse_clause.c:     96.7% (88/91) - PATTERN/DEFINE analysis
> ruleutils.c:        93.1% (54/58) - pg_get_viewdef output
>
> 4.3 Uncovered Code Risk Assessment
>
> Total: 27 lines uncovered
>
> Medium Risk (2 items) - Test addition recommended (see section 3.2):
> - parse_clause.c:4093: transformDefineClause - 27th variable error
> - nodeWindowAgg.c:5623: get_slots - null_slot for PREV at partition first
> row
>
> Low Risk (25 items) - Defensive code / Unreachable:
> - nodeWindowAgg.c:3074: attno_map_walker - Parser validates arg count
> - nodeWindowAgg.c:3137-3138: rpr_navigation_walker - switch default
> - nodeWindowAgg.c:3566: window_gettupleslot - Before mark position
> - nodeWindowAgg.c:4289: WinGetFuncArgInFrame - isout flag
> - nodeWindowAgg.c:4393: WinGetSlotInFrame - Boundary check
> - nodeWindowAgg.c:4618-4619: row_is_in_reduced_frame - switch default
> - nodeWindowAgg.c:4697: register_reduced_frame_map - pos < 0 check
> - nodeWindowAgg.c:5007: search_str_set - NULL return continue
> - nodeWindowAgg.c:5405: do_pattern_match - Regex compile error
> - nodeWindowAgg.c:5435,5437-5438: do_pattern_match - Regex exec error
> - nodeWindowAgg.c:5700: pattern_initial - Variable not found
> - nodeWindowAgg.c:5776: string_set_get - Index range check
> - nodeWindowAgg.c:5850: variable_pos_register - a-z range check
> - nodeWindowAgg.c:5910: variable_pos_fetch - a-z range check
> - nodeWindowAgg.c:5913: variable_pos_fetch - Index range check
> - parse_clause.c:3989: transformDefineClause - A_Expr type check
> - parse_clause.c:4145: transformPatternClause - A_Expr type check
> - ruleutils.c:6904-6908: get_rule_windowspec - SKIP TO NEXT ROW output
>
> Conclusion: Most uncovered code consists of defensive error handling or
> code unreachable due to parser pre-validation. No security or functional
> risk.
>
>
> 5. COMMIT ANALYSIS
> ------------------
>
> 8 sequential commits:
>
> Commit  Area            Files   +/-         Key Content
> -------------------------------------------------------------------------------
> 1       Raw Parser      4       +174/-16    gram.y grammar (PATTERN/DEFINE)
> 2       Parse/Analysis  4       +277/-1     parse_clause.c analysis logic
> 3       Rewriter        1       +109/-0     pg_get_viewdef extension
> 4       Planner         5       +73/-3      WindowAgg plan node extension
> 5       Executor        4       +1,942/-11  CORE: Pattern matching engine
> (+1,850)
> 6       Docs            3       +192/-7     advanced.sgml, func-window.sgml
> 7       Tests           3       +1,585/-1   rpr.sql (532), rpr.out (1,052)
> 8       typedefs        1       +6/-0       pgindent support
>
> Code Change Statistics:
> - Total files: 25
> - Lines added: 4,358
> - Lines deleted: 39
> - Net increase: +4,319 lines
>
>
> 6. RECOMMENDATIONS
> ------------------
>
> 6.1 Combinatorial Explosion Prevention (Major, Required)
>
> Add work_mem-based memory limit for StringSet allocation.
> Location: string_set_add() in nodeWindowAgg.c:5741-5750
> Consistent with existing PostgreSQL approach (Hash Join, Sort, etc.)
>
> 6.2 Code Review Required (Minor, Decision Needed)
>
> Location: nodeWindowAgg.c:5849,5909,5912
> Issue: pos > NUM_ALPHABETS check intent unclear
> Current: PATTERN with 28 elements causes error
> Question: Is 27 element limit intentional?
>
> 6.3 Code Style Fixes (Minor)
>
> - #ifdef RPR_DEBUG: Use elog(DEBUG2, ...) or remove (25 blocks)
> - static keyword on separate line: 26 functions to fix
> - Whitespace: "regexp !=NULL" -> "regexp != NULL"
> - Error message case: "Unrecognized" -> "unrecognized"
>
> 6.4 Documentation Fixes (Minor)
>
> - select.sgml: "maximu" -> "maximum"
>
> 6.5 Test Additions (Optional)
>
> Black-box Tests (Functional):
>
> Feature              Test Case                                    Priority
> -------------------------------------------------------------------------------
> Variable limit       26 variables success, 27 error               Medium
> NULL boundary        PREV at partition first row                  Medium
>
> White-box Tests (Coverage) - covered by above black-box tests:
>
> File:Line                          Target Branch
> -------------------------------------------------------------------------------
> parse_clause.c:4093                Limit error branch (Variable limit test)
> nodeWindowAgg.c:5623               null_slot usage (NULL boundary test)
>
>
> 7. CONCLUSION
> -------------
>
> Test Quality: GOOD
>
> Core functionality is thoroughly tested with comprehensive error case
> coverage.
>
> The patch is well-implemented within its defined scope. Identified issues
> include
> one major concern (combinatorial explosion) and minor style/documentation
> items.
>
> Recommended actions before commit:
> 1. Add work_mem-based memory limit for pattern combinations (required)
> 2. Clarify pos > NUM_ALPHABETS check intent (review needed)
> 3. Fix code style issues (optional but recommended)
> 4. Fix documentation typo (trivial)
> 5. Add tests for variable limit and NULL handling (optional)
>
> Points for discussion (optional):
> 6. Incremental matching with streaming NFA redesign
>
>
> Attachment:
> - coverage.tgz (gcov HTML report, RPR-modified code only)
>
>
> ---
> End of Report
>
> 2025년 9월 24일 (수) PM 7:36, Tatsuo Ishii <ishii@postgresql.org>님이 작성:
>
>> Attached are the v33 patches for Row pattern recognition.  The
>> difference from v32 is that the raw parse tree printing patch is not
>> included in v33. This is because now that we have
>> debug_print_raw_parse.
>>
>> Best regards,
>> --
>> Tatsuo Ishii
>> SRA OSS K.K.
>> English: http://www.sraoss.co.jp/index_en/
>> Japanese:http://www.sraoss.co.jp
>>

Best regards,

Henson 

pgsql-hackers by date:

Previous
From: Tatsuo Ishii
Date:
Subject: Re: Row pattern recognition
Next
From: Robert Treat
Date:
Subject: Re: DOC: fixes multiple errors in alter table doc