Re: Row pattern recognition - Mailing list pgsql-hackers

From Henson Choi
Subject Re: Row pattern recognition
Date
Msg-id CAAAe_zC0ZYrjGR40u8On9w1d_fZC=O0D_JF548D6n5VNr-PkSA@mail.gmail.com
Whole thread Raw
In response to Re: Row pattern recognition  (Tatsuo Ishii <ishii@postgresql.org>)
List pgsql-hackers
Hi Ishii-san,

2026년 1월 17일 (토) PM 5:38, Tatsuo Ishii <ishii@postgresql.org>님이 작성:
Hi Henson,

> Already found and cleaned up.
> Will include it in the next patch.

Excellent! Thank you so much.

 Attached is the patch with the following changes:

1. Removed 26 variables limit and fixed count overflow:
   - Deleted defineInitial field from WindowClause (parsenodes.h),
     WindowAgg (plannodes.h), and related code in createplan.c and parse_rpr.c
   - Now uses RPR_VARID_MAX (252) constant for the limit check
   - Fixed NFA count overflow issue:
     * Changed RPRNFAState.counts from int16 to int32 (execnodes.h)
     * Added RPR_COUNT_MAX (INT32_MAX) constant (rpr.h)
     * Handles pattern matching failures correctly at >32,767 repetitions
     * Added INT32_MAX boundary handling to prevent overflow
   - Updated test cases in rpr.sql:
     * Updated variable limit tests from 26 to 252
     * Added 100K row large partition test to verify int32 overflow handling

2. Simplified context absorption logic (under review):
   - Key insight: Most absorption opportunities provide no practical benefit
     because subsequent contexts are discarded immediately after a match
     succeeds. Only absorption at unbounded pattern start shows measurable impact.
   - Changes:
     * Planner (rpr.c): Added isUnboundedStart() to identify patterns where
       leading unbounded quantifiers enable safe absorption
     * Executor (nodeWindowAgg.c): Simplified to only absorb at pattern start,
       separated cleanup logic into nfa_cleanup_dead_contexts()
     * Added pattern->isAbsorbable flag to enable absorption only when safe
   - Impact: Reduces context growth from quadratic to linear.
     Example from rpr_explain.sql Test 3.1 (PATTERN A+ B, 50 rows):
     * With absorption: 51 total contexts, 3 peak, 30 absorbed
     * Without absorption: Would create O(n²) contexts (each row starts a
       context, creating 1+2+3+4 contexts per 5-row segment)
     * Absorption maintains linear O(n) growth instead of quadratic explosion
   - Note: Changes in EXPLAIN ANALYZE output suggest there may be issues in
     the matching process, particularly in absorption and NFA state transitions.
     I need to review these areas more thoroughly before I can be confident in
     their correctness.

3. Added NFA statistics to EXPLAIN ANALYZE output (under review):
   - Extended explain.c to show NFA construction and matching statistics
   - Added comprehensive statistics tracking (execnodes.h):
     * NFALengthStats structure for min/max/avg length measurements
     * State statistics: peak, total created, merged (deduplicated)
     * Context statistics: peak, total created, absorbed, skipped, pruned
     * Match statistics: succeeded/failed counts with length distributions
   - Implemented statistics collection in nodeWindowAgg.c
   - New test suite: rpr_explain.sql for testing NFA statistics output
   - Note: rpr_explain is not included in parallel_schedule (runs separately)
   - Statistics reveal pattern complexity impact:
     Example from Test 2.7 (PATTERN ((A | B)+)+, 1000 rows):
     * NFA States: 2,664 peak, 892,447 total created
     * Peak grows linearly with rows (~2.6x), likely due to state merging
     * High total shows significant state churn despite controlled peak
     * This demonstrates complex pattern behavior that users should understand
     * Question: Should we add a GUC parameter to limit NFA state peak count
       (e.g., max_rpr_nfa_states) to provide administrators with safety controls?
   - This feature is still under review and may be refined in future patches

4. Major refactoring of pattern compilation and optimization (rpr.c):
   - Split large monolithic functions into smaller, semantically meaningful
     single-purpose functions for better readability
   - Pattern compilation refactored:
     * Added collectDefineVariables() to populate variable names from DEFINE
     * Split scanRPRPattern() into recursive worker and wrapper
     * Extracted allocateRPRPattern() for memory allocation
     * Split fillRPRPattern() by node type: fillRPRPatternVar(),
       fillRPRPatternGroup(), fillRPRPatternAlt()
   - Pattern optimization converted to dispatcher pattern:
     * optimizeSeqPattern() - flattens nested SEQ, merges consecutive vars/groups
     * optimizeAltPattern() - flattens nested ALT, removes duplicates
     * optimizeGroupPattern() - multiplies quantifiers, unwraps single children
   - Each optimization step broken into separate helper functions:
     * flattenSeqChildren(), mergeConsecutiveVars(), mergeConsecutiveGroups()
     * flattenAltChildren(), removeDuplicateAlternatives()
     * tryMultiplyQuantifiers(), tryUnwrapGroup()
   - New optimization: mergeGroupPrefixSuffix()
     * A B (A B)+ -> (A B){2,}
     * (A B)+ A B -> (A B){2,}
     * A B (A B)+ A B -> (A B){3,}
   - Code structure reorganized with categorized forward declarations
   - Added test cases in rpr.sql to verify new optimizations:
     * Consecutive GROUP merge tests
     * Quantifier multiplication tests (including INF blocking)
     * Group prefix/suffix merge tests

5. Fixed quantifier validation:
   - Corrected bounds checking in gram.y (e.g., {,m} now requires m >= 1)
   - Improved error messages for invalid quantifier ranges

Additional notes:
- The NFA statistics output format in EXPLAIN ANALYZE was quickly
  implemented for debugging purposes. I would appreciate feedback from
  experienced PostgreSQL developers on:
  * Whether the output format and structure are appropriate
  * Whether these statistics would be valuable for query tuning in production,
    or if they are only useful for development/debugging
  * If production-worthy, what simplifications might make them more accessible
    to users unfamiliar with NFA internals
- Reluctant quantifier handling needs further clarification. If I cannot
  make confident fixes before the next review cycle, I may temporarily
  raise errors for reluctant cases until the logic is properly sorted out.

Next steps:
- The executor implementation (nodeWindowAgg.c) is the only major component
  remaining for detailed review.
- I will focus on thoroughly reviewing the absorption logic and NFA state
  transition mechanisms to address the concerns noted in section 2.
- After making necessary improvements, I will send the next patch with
  more confidence in the executor correctness.

Best regards,
Henson
Attachment

pgsql-hackers by date:

Previous
From: Kirill Reshke
Date:
Subject: Re: Parallel CREATE INDEX for GIN indexes