Re: Row pattern recognition - Mailing list pgsql-hackers
| From | Henson Choi |
|---|---|
| Subject | Re: Row pattern recognition |
| Date | |
| Msg-id | CAAAe_zBESHXy0zvCHvYQvFtyk98dtyJAhsp3R6HvyXdP-n4-Qw@mail.gmail.com Whole thread Raw |
| In response to | Re: Row pattern recognition (Tatsuo Ishii <ishii@postgresql.org>) |
| List | pgsql-hackers |
Hi Ishii-san,
Glad the review was useful!
2026년 1월 2일 (금) AM 11:16, Tatsuo Ishii <ishii@postgresql.org>님이 작성:
Hi Henson,
Thank you for the review! I will take care the issues (it will take
sometime).
If you have other patches that need pre-analysis (coverage, style, valgrind, etc.), feel free to ask anytime. I enjoy this kind of work.
Best regards,
--
Tatsuo Ishii
SRA OSS K.K.
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp
> SQL/RPR Patch Review Report
> ============================
>
> Patch: Row Pattern Recognition (SQL:2016)
> Commitfest: https://commitfest.postgresql.org/patch/4460
>
> Review Methodology:
> This review focused on quality assessment, not line-by-line code audit.
> Key code paths and quality issues were examined with surrounding context
> when concerns arose. Documentation files were reviewed with AI-assisted
> grammar and typo checking. Code coverage was measured using gcov and
> custom analysis tools.
>
> Limitations:
> - Not a security audit or formal verification
> - Parser and executor internals reviewed at module level, not exhaustively
> - Performance characteristics not benchmarked
>
>
> TABLE OF CONTENTS
> -----------------
>
> 1. Executive Summary
> 2. Issues Found
> 2.1 Critical / Major
> 2.2 Minor Issues (Review Needed)
> 2.3 Minor Issues (Style)
> 2.4 Suggestions for Discussion
> 3. Test Coverage
> 3.1 Covered Areas
> 3.2 Untested Items
> 3.3 Unimplemented Features (No Test Needed)
> 4. Code Coverage Analysis
> 4.1 Overall Coverage
> 4.2 Coverage by File
> 4.3 Uncovered Code Risk Assessment
> 5. Commit Analysis
> 6. Recommendations
> 7. Conclusion
>
>
> 1. EXECUTIVE SUMMARY
> --------------------
>
> Overall assessment: GOOD
>
> The SQL/RPR patch demonstrates solid implementation quality within its
> defined scope. Code follows PostgreSQL coding standards (with minor style
> issues), test coverage is comprehensive at 96.4%, and documentation is
> thorough with only minor typos.
>
> Rating by Area:
> - Code Quality: Good (PostgreSQL style compliant, 26 static style fixes
> recommended)
> - Test Coverage: Good (96.4% line coverage, 28 test cases)
> - Documentation: Good (Complete, 1 minor typo)
> - Build/Regress: Pass (make check-world, 753 tests passed)
>
>
> 2. ISSUES FOUND
> ---------------
>
> 2.1 Critical / Major
>
> #1 [Code] Greedy pattern combinatorial explosion
> Pattern: PATTERN (A+ B+ C+) with DEFINE A AS TRUE, B AS TRUE, C AS TRUE
> Impact: Memory exhaustion or infinite wait
> Recommendation: Add work_mem-based memory limit (error on exceed)
>
> Evidence - No memory limit in current code:
> - nodeWindowAgg.c:5718-5722 string_set_init(): palloc() unconditional
> - nodeWindowAgg.c:5741-5750 string_set_add(): set_size *= 2; repalloc()
> unlimited
> - nodeWindowAgg.c:5095-5174 generate_patterns(): Triple loop, no limit
>
> Only work_mem usage in RPR (nodeWindowAgg.c:1297):
> winstate->buffer = tuplestore_begin_heap(false, false, work_mem);
> -> For tuple buffer, unrelated to pattern combination memory (StringSet)
>
> 2.2 Minor Issues (Review Needed)
>
> #1 [Code] nodeWindowAgg.c:5849,5909,5912
> pos > NUM_ALPHABETS check - intent unclear, causes error at 28 PATTERN
> elements
>
> Reproduction:
> - PATTERN (A B C ... Z A) (27 elements) -> OK
> - PATTERN (A B C ... Z A B) (28 elements) -> ERROR "initial is not valid
> char: b"
>
> 2.3 Minor Issues (Style)
>
> #1 [Code] nodeWindowAgg.c (25 blocks)
> #ifdef RPR_DEBUG -> recommend elog(DEBUG2, ...) or remove
>
> #2 [Code] src/backend/executor/nodeWindowAgg.c
> static keyword on separate line (26 functions)
>
> #3 [Code] src/backend/utils/adt/ruleutils.c
> Whitespace typo: "regexp !=NULL" -> "regexp != NULL"
>
> #4 [Code] nodeWindowAgg.c:4619
> Error message case: "Unrecognized" -> "unrecognized" (PostgreSQL style)
>
> #5 [Doc] doc/src/sgml/ref/select.sgml:1128
> Typo: "maximu" -> "maximum"
>
> 2.4 Suggestions for Discussion
>
> #1 Incremental matching with streaming NFA redesign
> Benefits:
> - O(n) time complexity per row (current: exponential in worst case)
> - Bounded memory via state merging and context absorption
> - Natural extension for OR patterns, {n,m} quantifiers, nested groups
> - Enables MEASURES clause with incremental aggregates
> - Proven approach in CEP engines (Flink, Esper)
>
>
> 3. TEST COVERAGE
> ----------------
>
> 3.1 Covered Areas
>
> - PATTERN clause: +, *, ? quantifiers (line 41, 71, 86)
> - DEFINE clause: Variable conditions, auto-TRUE for missing (line 120-133)
> - PREV/NEXT functions: Single argument (line 44, 173)
> - AFTER MATCH SKIP: TO NEXT ROW (line 182), PAST LAST ROW (line 198)
> - Aggregate integration: sum, avg, count, max, min (line 258-277)
> - Window function integration: first_value, last_value, nth_value (line
> 34-35)
> - JOIN/CTE: JOIN (line 313), WITH (line 324)
> - View: VIEW creation, pg_get_viewdef (line 390-406)
> - Error cases: 7 error scenarios (line 409-532)
> - Large partition: 5000 row smoke test (line 360-387)
> - ROWS BETWEEN offset: CURRENT ROW AND 2 FOLLOWING (line 244)
>
> 3.2 Untested Items
>
> Feature Priority Recommendation
> -------------------------------------------------------------------------------
> 26 variable limit Medium Test 26 variables success, 27th
> variable error
> NULL value handling Low Test PREV(col) where col or previous
> row is NULL
>
> Sample test for 26 variable limit:
>
> -- Should fail with 27th variable (parser error, no table needed)
> SELECT * FROM (SELECT 1 AS x) t
> WINDOW w AS (
> ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
> PATTERN (a b c d e f g h i j k l m n o p q r s t u v w x y z aa)
> DEFINE a AS TRUE, b AS TRUE, c AS TRUE, d AS TRUE, e AS TRUE, f AS
> TRUE,
> g AS TRUE, h AS TRUE, i AS TRUE, j AS TRUE, k AS TRUE, l AS
> TRUE,
> m AS TRUE, n AS TRUE, o AS TRUE, p AS TRUE, q AS TRUE, r AS
> TRUE,
> s AS TRUE, t AS TRUE, u AS TRUE, v AS TRUE, w AS TRUE, x AS
> TRUE,
> y AS TRUE, z AS TRUE, aa AS TRUE
> );
> -- ERROR: number of row pattern definition variable names exceeds 26
>
> Sample test for NULL handling:
>
> CREATE TEMP TABLE stock_null (company TEXT, tdate DATE, price INTEGER);
> INSERT INTO stock_null VALUES ('c1', '2023-07-01', 100);
> INSERT INTO stock_null VALUES ('c1', '2023-07-02', NULL); -- NULL in
> middle
> INSERT INTO stock_null VALUES ('c1', '2023-07-03', 200);
> INSERT INTO stock_null VALUES ('c1', '2023-07-04', 150);
>
> SELECT company, tdate, price, count(*) OVER w AS match_count
> FROM stock_null
> WINDOW w AS (
> PARTITION BY company
> ORDER BY tdate
> ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
> PATTERN (START UP DOWN)
> DEFINE START AS TRUE, UP AS price > PREV(price), DOWN AS price <
> PREV(price)
> );
> -- Result: all rows show match_count = 0 (NULL breaks pattern matching)
>
> 3.3 Unimplemented Features (No Test Needed)
>
> Per documentation (select.sgml:1133-1136):
> - MEASURES clause: Not implemented
> - SUBSET clause: Not implemented
> - AFTER MATCH variants: Only SKIP TO NEXT ROW, PAST LAST ROW supported
>
> Not in documentation, verified by testing:
> - {n,m} quantifier: Parser error ("syntax error at or near {")
> - (A|B) OR pattern: Not supported (parsed as single variable name "a|b")
> - (A B)+ compound repetition: Parser accepts, but execution returns 0
> matches
>
>
> 4. CODE COVERAGE ANALYSIS
> -------------------------
>
> 4.1 Overall Coverage
>
> 96.4% (732 / 759 lines)
>
> 4.2 Coverage by File (major RPR-modified files)
>
> nodeWindowAgg.c: 96.6% (560/580) - Pattern matching execution engine
> parse_clause.c: 96.7% (88/91) - PATTERN/DEFINE analysis
> ruleutils.c: 93.1% (54/58) - pg_get_viewdef output
>
> 4.3 Uncovered Code Risk Assessment
>
> Total: 27 lines uncovered
>
> Medium Risk (2 items) - Test addition recommended (see section 3.2):
> - parse_clause.c:4093: transformDefineClause - 27th variable error
> - nodeWindowAgg.c:5623: get_slots - null_slot for PREV at partition first
> row
>
> Low Risk (25 items) - Defensive code / Unreachable:
> - nodeWindowAgg.c:3074: attno_map_walker - Parser validates arg count
> - nodeWindowAgg.c:3137-3138: rpr_navigation_walker - switch default
> - nodeWindowAgg.c:3566: window_gettupleslot - Before mark position
> - nodeWindowAgg.c:4289: WinGetFuncArgInFrame - isout flag
> - nodeWindowAgg.c:4393: WinGetSlotInFrame - Boundary check
> - nodeWindowAgg.c:4618-4619: row_is_in_reduced_frame - switch default
> - nodeWindowAgg.c:4697: register_reduced_frame_map - pos < 0 check
> - nodeWindowAgg.c:5007: search_str_set - NULL return continue
> - nodeWindowAgg.c:5405: do_pattern_match - Regex compile error
> - nodeWindowAgg.c:5435,5437-5438: do_pattern_match - Regex exec error
> - nodeWindowAgg.c:5700: pattern_initial - Variable not found
> - nodeWindowAgg.c:5776: string_set_get - Index range check
> - nodeWindowAgg.c:5850: variable_pos_register - a-z range check
> - nodeWindowAgg.c:5910: variable_pos_fetch - a-z range check
> - nodeWindowAgg.c:5913: variable_pos_fetch - Index range check
> - parse_clause.c:3989: transformDefineClause - A_Expr type check
> - parse_clause.c:4145: transformPatternClause - A_Expr type check
> - ruleutils.c:6904-6908: get_rule_windowspec - SKIP TO NEXT ROW output
>
> Conclusion: Most uncovered code consists of defensive error handling or
> code unreachable due to parser pre-validation. No security or functional
> risk.
>
>
> 5. COMMIT ANALYSIS
> ------------------
>
> 8 sequential commits:
>
> Commit Area Files +/- Key Content
> -------------------------------------------------------------------------------
> 1 Raw Parser 4 +174/-16 gram.y grammar (PATTERN/DEFINE)
> 2 Parse/Analysis 4 +277/-1 parse_clause.c analysis logic
> 3 Rewriter 1 +109/-0 pg_get_viewdef extension
> 4 Planner 5 +73/-3 WindowAgg plan node extension
> 5 Executor 4 +1,942/-11 CORE: Pattern matching engine
> (+1,850)
> 6 Docs 3 +192/-7 advanced.sgml, func-window.sgml
> 7 Tests 3 +1,585/-1 rpr.sql (532), rpr.out (1,052)
> 8 typedefs 1 +6/-0 pgindent support
>
> Code Change Statistics:
> - Total files: 25
> - Lines added: 4,358
> - Lines deleted: 39
> - Net increase: +4,319 lines
>
>
> 6. RECOMMENDATIONS
> ------------------
>
> 6.1 Combinatorial Explosion Prevention (Major, Required)
>
> Add work_mem-based memory limit for StringSet allocation.
> Location: string_set_add() in nodeWindowAgg.c:5741-5750
> Consistent with existing PostgreSQL approach (Hash Join, Sort, etc.)
>
> 6.2 Code Review Required (Minor, Decision Needed)
>
> Location: nodeWindowAgg.c:5849,5909,5912
> Issue: pos > NUM_ALPHABETS check intent unclear
> Current: PATTERN with 28 elements causes error
> Question: Is 27 element limit intentional?
>
> 6.3 Code Style Fixes (Minor)
>
> - #ifdef RPR_DEBUG: Use elog(DEBUG2, ...) or remove (25 blocks)
> - static keyword on separate line: 26 functions to fix
> - Whitespace: "regexp !=NULL" -> "regexp != NULL"
> - Error message case: "Unrecognized" -> "unrecognized"
>
> 6.4 Documentation Fixes (Minor)
>
> - select.sgml: "maximu" -> "maximum"
>
> 6.5 Test Additions (Optional)
>
> Black-box Tests (Functional):
>
> Feature Test Case Priority
> -------------------------------------------------------------------------------
> Variable limit 26 variables success, 27 error Medium
> NULL boundary PREV at partition first row Medium
>
> White-box Tests (Coverage) - covered by above black-box tests:
>
> File:Line Target Branch
> -------------------------------------------------------------------------------
> parse_clause.c:4093 Limit error branch (Variable limit test)
> nodeWindowAgg.c:5623 null_slot usage (NULL boundary test)
>
>
> 7. CONCLUSION
> -------------
>
> Test Quality: GOOD
>
> Core functionality is thoroughly tested with comprehensive error case
> coverage.
>
> The patch is well-implemented within its defined scope. Identified issues
> include
> one major concern (combinatorial explosion) and minor style/documentation
> items.
>
> Recommended actions before commit:
> 1. Add work_mem-based memory limit for pattern combinations (required)
> 2. Clarify pos > NUM_ALPHABETS check intent (review needed)
> 3. Fix code style issues (optional but recommended)
> 4. Fix documentation typo (trivial)
> 5. Add tests for variable limit and NULL handling (optional)
>
> Points for discussion (optional):
> 6. Incremental matching with streaming NFA redesign
>
>
> Attachment:
> - coverage.tgz (gcov HTML report, RPR-modified code only)
>
>
> ---
> End of Report
>
> 2025년 9월 24일 (수) PM 7:36, Tatsuo Ishii <ishii@postgresql.org>님이 작성:
>
>> Attached are the v33 patches for Row pattern recognition. The
>> difference from v32 is that the raw parse tree printing patch is not
>> included in v33. This is because now that we have
>> debug_print_raw_parse.
>>
>> Best regards,
>> --
>> Tatsuo Ishii
>> SRA OSS K.K.
>> English: http://www.sraoss.co.jp/index_en/
>> Japanese:http://www.sraoss.co.jp
>>
Best regards,
Henson
pgsql-hackers by date: