Re: Row pattern recognition - Mailing list pgsql-hackers

From Henson Choi
Subject Re: Row pattern recognition
Date
Msg-id CAAAe_zDCwjuzomxnsR5uO-d7D-1RD53M7Nb1USTm738YyNRwpw@mail.gmail.com
Whole thread
In response to Re: Row pattern recognition  (Henson Choi <assam258@gmail.com>)
List pgsql-hackers
Hi hackers,

I'd like to share an internal design document for the NFA executor
in the row pattern recognition (RPR) patch.

The attached guide covers:

- Pattern element structure (RPRPatternElement fields and flags)
- Compilation phases (AST to flat element array)
- Runtime execution model (Match / Absorb / Advance phases)
- Cycle detection for zero-consumption loops
- Empty-loop handling for nullable group bodies
- Reluctant quantifier support
- Absorption optimization for redundant context pruning

The full English text follows below.  All three language
versions are also attached:

  RPR_NFA_Algorithm_Guide.txt     (English)
  RPR_NFA_Algorithm_Guide_KO.txt  (Korean)
  RPR_NFA_Algorithm_Guide_JA.txt  (Japanese)

These are reference documents intended to help reviewers understand
the NFA executor internals.  They are not part of the patch itself.

================================================================================
  PostgreSQL Row Pattern Recognition: Flat-Array Stream NFA Guide
================================================================================

  Target audience: Developers with a basic understanding of the PostgreSQL
                   executor and planner architecture

  Scope: The entire process from PATTERN/DEFINE clause parsing to NFA
         runtime execution

  Related code:
    - src/backend/parser/parse_rpr.c          (parser phase)
    - src/backend/optimizer/plan/rpr.c        (optimizer phase)
    - src/backend/executor/nodeWindowAgg.c    (executor phase)
    - src/include/nodes/plannodes.h           (plan node definitions)
    - src/include/nodes/execnodes.h           (execution state definitions)
    - src/include/optimizer/rpr.h             (types and constants)

================================================================================


What is a Flat-Array Stream NFA?

  The NFA in this implementation is not a traditional state-transition graph
  but a flat array of fixed-size 16-byte elements. At runtime, it processes
  the row stream in a forward-only manner, expanding epsilon transitions
  eagerly without backtracking.

  - Flat-Array: Pattern compiled into a flat array,
                not a graph (Chapter IV)
  - Stream:     Rows consumed sequentially in one direction,
                never revisited (Chapter XII)
  - NFA:        Nondeterministic execution where multiple states
                coexist within a single context (Chapter VI)


Chapter I  Row Pattern Recognition Overview
================================================================================

Row Pattern Recognition (hereafter RPR) is a feature introduced in SQL:2016
that matches regex-based patterns against ordered row sets.

The SQL standard defines two forms:

  Feature R010: MATCH_RECOGNIZE (FROM clause)
    - Dedicated table operator
    - Provides dedicated functions such as MATCH_NUMBER(), CLASSIFIER()
    - Supports ONE ROW PER MATCH / ALL ROWS PER MATCH

  Feature R020: RPR in a window (WINDOW clause)
    - Integrated into the existing window function framework
    - Supports ALL ROWS PER MATCH only
    - No MATCH_NUMBER()

This implementation targets Feature R020.

The basic syntax is as follows:

  SELECT ...
  OVER (
    PARTITION BY ...
    ORDER BY ...
    ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
    [INITIAL | SEEK]   -- SEEK is defined in the standard but not implemented
    AFTER MATCH SKIP TO NEXT ROW | SKIP PAST LAST ROW
    PATTERN ( <regex> )
    DEFINE <variable> AS <condition>, ...
  )

The PATTERN clause is a regular expression over row pattern variables.
The DEFINE clause specifies boolean conditions that determine whether each
variable evaluates to true for the current row.

Example:

  PATTERN (A+ B)
  DEFINE A AS price > PREV(price),
         B AS price < PREV(price)

This pattern matches "a span where prices rise consecutively then drop."


Chapter II  Overall Processing Pipeline
================================================================================

RPR processing is divided into three phases:

  +------------------------------------------------------------+
  |  1. Parsing (Parser)                                       |
  |     SQL text -> PATTERN AST + DEFINE expression tree       |
  |                                                            |
  |  2. Compilation (Optimizer/Planner)                        |
  |     PATTERN AST -> optimization -> flat NFA element array  |
  |                                                            |
  |  3. Execution (Executor)                                   |
  |     Row-by-row matching via NFA simulation                 |
  +------------------------------------------------------------+

Each phase uses independent data structures, and the interfaces between
phases are well-defined:

  Parser -> Planner:    WindowClause.rpPattern (RPRPatternNode tree)
                        WindowClause.defineClause (TargetEntry list)

  Planner -> Executor:  WindowAgg.rpPattern (RPRPattern struct)
                        WindowAgg.defineClause (TargetEntry list)


Chapter III  Parsing Phase
================================================================================

III-1. Entry Point

  transformWindowDefinitions() (parse_clause.c)
    +-- transformRPR() (parse_rpr.c)

transformRPR() is invoked when RPCommonSyntax is present and performs the
following:

  (1) Frame option validation
      - Only ROWS is allowed (RANGE, GROUPS are not)
      - The start boundary must be CURRENT ROW
      - EXCLUDE option is not allowed

  (2) Transcription to WindowClause
      - Copies rpPattern, rpSkipTo, initial fields

  (3) DEFINE clause transformation (transformDefineClause)

III-2. PATTERN AST

The parser transforms the PATTERN clause into an RPRPatternNode tree.
Each node has one of the following four types:

  RPR_PATTERN_VAR    Variable reference. Name stored in varName field.
  RPR_PATTERN_SEQ    Concatenation. Children node list in children.
  RPR_PATTERN_ALT    Alternation. Branch node list in children.
  RPR_PATTERN_GROUP  Group (parentheses). Body node list in children.

All nodes have min/max fields to express quantifiers:

  A       -> VAR(A, min=1, max=1)
  A+      -> VAR(A, min=1, max=INF)
  A*      -> VAR(A, min=0, max=INF)
  A?      -> VAR(A, min=0, max=1)
  A{3,5}  -> VAR(A, min=3, max=5)

If the reluctant field is not -1, the quantifier is reluctant (non-greedy).

Example: PATTERN ((A+ B) | C*)

  ALT
  +-- SEQ
  |   +-- VAR(A, 1, INF)
  |   +-- VAR(B, 1, 1)
  +-- VAR(C, 0, INF)

III-3. DEFINE Clause Transformation

transformDefineClause() processes each DEFINE variable as follows:

  (1) Checks for duplicate variable names
  (2) Transforms the expression into a standard SQL expression
  (3) Coerces to Boolean type (coerce_to_boolean)
  (4) Wraps in a TargetEntry with the variable name set in resname

Variables that are used in PATTERN but not defined in DEFINE are implicitly
evaluated as TRUE (matching all rows).


Chapter IV  Compilation Phase
================================================================================

IV-1. Entry Point

  create_windowagg_plan() (createplan.c)
    +-- collectPatternVariables()   Collect variable names
    +-- filterDefineClause()        Remove unused DEFINE entries
    +-- buildRPRPattern()           NFA compilation (6 phases)

IV-2. The 6 Phases of buildRPRPattern()

  Phase 1: AST optimization (optimizeRPRPattern)
  Phase 2: Statistics collection (scanRPRPattern)
  Phase 3: Memory allocation (allocateRPRPattern)
  Phase 4: NFA element fill (fillRPRPattern)
  Phase 5: Finalization (finalizeRPRPattern)
  Phase 6: Absorbability analysis (computeAbsorbability)

IV-3. Phase 1: AST Optimization

After copying the parser-generated AST, the following optimizations are
applied:

  (a) SEQ flattening: Unwrap nested SEQ nodes
      SEQ(A, SEQ(B, C)) -> SEQ(A, B, C)

  (b) Consecutive variable merging: Merge consecutive occurrences of the
      same variable into a single quantifier
      A A -> A{2}
      A{2,3} A{1,2} -> A{3,5}

  (c) Consecutive group merging: Merge repeated identical groups
      (A B)+ (A B)+ -> (A B){2,INF}

  (d) Consecutive ALT merging: Merge repeated identical ALT nodes
      (A | B) (A | B) (A | B) -> (A | B){3}

  (e) Prefix/suffix absorption: Absorb identical sequences before/after
      a group
      A B (A B)+ -> (A B){2,INF}

  (f) ALT flattening and deduplication
      (A | (B | C)) -> (A | B | C)
      (A | B | A) -> (A | B)

  (g) Quantifier multiplication: Collapse nested quantifiers when safe
      (A+)+ -> A+
      (A{2,3}){5} -> A{10,15}

  (h) Single-child unwrap
      SEQ(A) -> A,  (A){1,1} -> A

IV-4. Phase 4: NFA Element Array Generation

Transforms the optimized AST into a flat array of RPRPatternElement.
This is the core data structure used for NFA simulation at runtime.

RPRPatternElement struct (16 bytes):

  Field      Size     Description
  ---------------------------------------------------------
  varId      1B      Variable ID (0-251) or control code (252-255)
  depth      1B      Group nesting depth
  flags      1B      Bit flags (see below)
  reserved   1B      Padding
  min        4B      Quantifier lower bound
  max        4B      Quantifier upper bound
  next       2B      Next element index (sequential flow)
  jump       2B      Branch target index (for ALT/GROUP)

Control codes:

  RPR_VARID_BEGIN (252)  Group start marker
  RPR_VARID_END   (253)  Group end marker
  RPR_VARID_ALT   (254)  Alternation start marker
  RPR_VARID_FIN   (255)  Pattern completion marker

Element flags (1 byte, bitmask):

  Bit  Constant                    Set on        Description
  --------------------------------------------------------------------------
  0x01 RPR_ELEM_RELUCTANT          VAR, BEGIN,   Non-greedy quantifier.
                                   END           Prefers shorter match: try
                                                 exit-loop first, then repeat.
                                                 Set on VAR for simple (A+?),
                                                 on BEGIN+END for group ((...)+?)

  0x02 RPR_ELEM_EMPTY_LOOP         END           Group body can produce empty
                                                 match (all children nullable).
                                                 Creates a fast-forward exit clone
                                                 alongside the normal loop-back so
                                                 cycle detection doesn't kill
                                                 legitimate matches. (IV-4b)

  0x04 RPR_ELEM_ABSORBABLE_BRANCH  VAR, BEGIN,   Element lies within an
                                   END, ALT      absorbable region. Used at
                                                 runtime to track whether the
                                                 current NFA state is in an
                                                 absorbable context.

  0x08 RPR_ELEM_ABSORBABLE         VAR, END      Absorption judgment point.
                                                 WHERE to compare consecutive
                                                 iterations for absorption.
                                                 - Simple unbounded VAR (A+):
                                                   set on the VAR itself
                                                 - Unbounded GROUP ((A B)+):
                                                   set on the END element only

  Accessor macros:
    RPRElemIsReluctant(e)        (e)->flags & 0x01
    RPRElemCanEmptyLoop(e)       (e)->flags & 0x02
    RPRElemIsAbsorbableBranch(e) (e)->flags & 0x04
    RPRElemIsAbsorbable(e)       (e)->flags & 0x08

Example: PATTERN (A+ B | C)

  AST: ALT(SEQ(VAR(A,1,INF), VAR(B,1,1)), VAR(C,1,1))

  Compilation result:

  idx  varId  depth  min  max  next  jump  Description
  ---------------------------------------------------
   0   ALT    0      1    1    1     -1    Alternation start
   1   A(0)   1      1    INF  2     3     Branch 1: A+
   2   B(1)   1      1    1    4     -1    Branch 1: B -> FIN
   3   C(2)   1      1    1    4     -1    Branch 2: C -> FIN
   4   FIN    0      1    1    -1    -1    Pattern completion

  - idx 0: ALT marker. next(=1) is the start of the first branch
  - idx 1: Variable A. next(=2) is B, jump(=3) is the start of the second branch
  - idx 2: Variable B. next(=4) is FIN
  - idx 3: Variable C. next(=4) is FIN
  - idx 4: FIN marker. Match completion signal

Roles of next and jump:

  - next: The next element to move to "after consuming" the current element.
          For VAR, the next position after a successful match.
          For BEGIN/END, the next position inside/outside the group.

  - jump: The element to "skip to."
          In ALT, a jump from one branch to the next branch.
          In BEGIN, a skip path to END+1 (for groups with min=0).
          In END, a loop-back to the start of the group body.

Example: PATTERN ((A B)+)

  idx  varId    depth  min  max  next  jump  Description
  ---------------------------------------------------
   0   BEGIN    0      1    INF  1     4     Group start
   1   A(0)     1      1    1    2     -1    A
   2   B(1)     1      1    1    3     -1    B
   3   END      0      1    INF  4     1     Group end
   4   FIN      0      1    1    -1    -1    Pattern completion

  - idx 0: BEGIN. next(=1) enters the group body.
           jump(=4) skips to after END = FIN (used when min=0).
  - idx 3: END. next(=4) exits the group.
           jump(=1) loops back to the start of the group body.

IV-4a. Reluctant Flag (RPR_ELEM_RELUCTANT)

The reluctant flag is set during Phase 4 (fillRPRPattern) when the AST node
has a non-negative reluctant field (reluctant >= 0). It reverses the priority
of quantifier expansion at runtime:

  Greedy (default):  try loop-back first, then exit  (prefer longer match)
  Reluctant:         try exit first, then loop-back   (prefer shorter match)

The flag is set on all elements that carry the quantifier:

  Simple VAR (A+?):     RPR_ELEM_RELUCTANT on the VAR element
  Group ((...)+?):      RPR_ELEM_RELUCTANT on BEGIN and END elements

At runtime (nfa_advance), the flag controls DFS exploration order:

  VAR with quantifier:
    Greedy:    primary path = next (continue), clone = jump (skip)
    Reluctant: primary path = jump (skip), clone = next (continue)

  END element:
    Greedy:    primary path = jump (loop-back), clone = next (exit)
    Reluctant: primary path = next (exit), clone = jump (loop-back)

  BEGIN with min=0:
    Greedy:    primary path = next (enter group), clone = jump (skip)
    Reluctant: primary path = jump (skip), clone = next (enter group)

The absorption optimization requires greedy quantifiers. Reluctant
quantifiers are excluded from absorbability analysis (see IV-5).

IV-4b. Empty Loop Flag (RPR_ELEM_EMPTY_LOOP)

The empty-loop flag is set during Phase 4 (fillRPRPatternGroup) on the END
element when the group body is nullable -- i.e., every path through the
body can match zero rows (all children are nullable).

Example patterns that trigger this flag:

  (A?)*    A is nullable (min=0), so group body is nullable -> END gets flag
  (A? B?)+ Both children nullable -> body nullable -> END gets flag
  (A | B*) B* is nullable, making the ALT nullable -> END gets flag

The flag works in conjunction with the zero-consumption cycle detection
(elemIdx visited bitmap). Without this flag, cycle detection alone would
cause legitimate matches to fail.

Problem example: (A*){2,3}
  - Iteration 1: A* consumes all available rows -> count=1, END reached
  - Loop-back for iteration 2: A* matches zero rows -> END reached again
  - Cycle detection sees the same elemIdx on the same row -> state killed
  - count never reaches min(2) -> match fails (incorrect)

With the RPR_ELEM_EMPTY_LOOP flag, nfa_advance_end creates two paths:
the normal loop-back (which cycle detection will eventually kill) and
a fast-forward exit clone that bypasses the loop entirely.
(See IX-4(c) for detailed runtime behavior.)
    - Zero-consumption is impossible since body is not nullable

IV-5. Absorbability Analysis (RPR_ELEM_ABSORBABLE)

Context absorption is an optimization technique that reduces O(n^2) to O(n).
(Runtime behavior is described in Chapter VIII.)

This phase determines whether the pattern has a structure suitable for the
absorption optimization and sets flags on the relevant elements:

  RPR_ELEM_ABSORBABLE         Absorption comparison point
  RPR_ELEM_ABSORBABLE_BRANCH  Element within an absorbable region

Eligibility conditions:

  (1) SKIP PAST LAST ROW (not NEXT ROW)
  (2) Frame end is UNBOUNDED FOLLOWING

Structural conditions (isUnboundedStart + computeAbsorbabilityRecursive):

  Case 1: Simple VAR+ (e.g., A+)
          -> ABSORBABLE | ABSORBABLE_BRANCH set on the VAR
  Case 2: GROUP+ whose body consists only of {1,1} VARs (e.g., (A B)+)
          -> ABSORBABLE_BRANCH on children,
            ABSORBABLE | ABSORBABLE_BRANCH on END
  Case 3: GROUP+ whose body starts with VAR+ (e.g., (A+ B)+)
          -> Recurses from BEGIN into the body, applying Case 1.
            ABSORBABLE | ABSORBABLE_BRANCH set on A.
            B and END get no flags -> absorption stops once past A.

Absorbability is determined per-element, not per-pattern.
Absorption comparison is performed only when a state resides at an
element with the RPR_ELEM_ABSORBABLE flag. Once a state leaves the
flagged region, absorption is permanently disabled for that state.

Through this mechanism, the runtime guarantees monotonicity:
"a context that started earlier always subsumes a context that
started later."


Chapter V  NFA Runtime Data Structures
================================================================================

V-1. RPRNFAState -- NFA State

A single NFA state represents "how far the pattern has progressed."

  Field         Description
  ---------------------------------------------------------
  elemIdx       Index of the current pattern element
  counts[]      Repetition count per group depth
  isAbsorbable  Whether the state is in an absorbable region
  next          Next state in the linked list

The size of the counts array is (maxDepth + 1), allocated as a flexible
array member at the end of the struct.

Example: In PATTERN ((A B)+ C), a state waiting for B in the 3rd iteration

  Element array: [0:BEGIN(d0) 1:A(d1) 2:B(d1) 3:END(d0) 4:C(d0) 5:FIN]

  elemIdx = 2 (B, depth 1)
  counts[0] = 2 (depth 0: depth of END. Group completed 2 iterations)
  counts[1] = 1 (depth 1: depth of B. A matched in current iteration)

  Counts are indexed by depth, not by elemIdx.
  counts[0] is incremented when passing through END(depth 0),
  and the group repetition count is preserved even when
  the state is at B(depth 1).

Definition of two states being "equal":

  Two states are equal if they have the same elemIdx and the same counts
  up to the depth of that element.
  nfa_states_equal() compares counts[0..elem->depth] using memcmp.
  Only counts at or below the depth of the current element are meaningful.

V-2. RPRNFAContext -- Matching Context

A single context represents "a matching attempt started from a specific
start row."

  Field                 Description
  ---------------------------------------------------------
  states                Linked list of active NFA states
  matchStartRow         Row number where matching started
  matchEndRow           Row number where matching completed (-1 if incomplete)
  lastProcessedRow      Last row processed
  matchedState          State that reached FIN (for greedy fallback)
  hasAbsorbableState    Whether this context can absorb other contexts
  allStatesAbsorbable   Whether this context can be absorbed
  next, prev            Doubly-linked list

Since the NFA is nondeterministic, multiple states can coexist
simultaneously within a single context.

Example: In PATTERN (A | B) C, if the first row matches both A and B,
two states coexist within the context:

  State 1: elemIdx=3 (waiting for C, via branch A)
  State 2: elemIdx=3 (waiting for C, via branch B)

In this case, since the (elemIdx, counts) of the two states are equal,
nfa_add_state_unique() retains only State 1 (branch A), which was
added first.
Because DFS processes the first branch of ALT first, the state via A
is registered first, and the state via B is discarded as a duplicate.
This is the preferment guarantee.

V-3. RPR Fields of WindowAggState

  nfaContext / nfaContextTail   Doubly-linked list of active contexts
  nfaContextFree                Reuse pool for contexts
  nfaStateFree                  Reuse pool for states
  nfaVarMatched                 Per-row cache: varMatched[varId]
  nfaVisitedElems               Bitmap for cycle detection
  nfaStateSize                  Precomputed size of RPRNFAState

Memory management:

  States and contexts are managed through their own free lists.
  Instead of palloc, they are obtained from the reuse pool, and
  returned to the pool upon deallocation.
  This reduces the overhead of frequent allocation/deallocation.


Chapter VI  NFA Execution: 3-Phase Model
================================================================================

VI-1. Entry Point and Overall Flow

When the window function processes each row, row_is_in_reduced_frame()
is called. This function determines whether the current row belongs to
a matched frame, and if necessary, calls update_reduced_frame() to
drive the NFA.

Flow of update_reduced_frame():

  (1) Find or create a context for the target row
  (2) Enter the row processing loop
  (3) After the loop ends, record the result in reduced_frame_map

Pseudocode of the row processing loop:

  targetCtx = nfa_get_head_context(pos)
  if targetCtx == NULL:
      targetCtx = nfa_start_context(pos)

  for currentPos = startPos ...
      if not nfa_evaluate_row(currentPos):  -- row does not exist
          nfa_finalize_all_contexts()       -- finalize all contexts
          break

      nfa_process_row(currentPos)           -- 3-phase processing
      nfa_start_context(currentPos + 1)     -- pre-create next start point
      nfa_cleanup_dead_contexts()           -- remove dead contexts

      if targetCtx->states == NULL:         -- target context exhausted
          break

Key point: Processing a single row may require processing multiple rows
ahead. Due to the nature of window functions, determining the frame for
row N requires looking at rows beyond N.

VI-2. Context Creation: nfa_start_context()

Creates a new context and performs the initial advance.

  (1) Allocate context via nfa_context_alloc()
  (2) Set matchStartRow = pos
  (3) Create initial state: elemIdx=0 (first pattern element), counts=all zero
  (4) Call nfa_advance(initialAdvance=true)

The initial advance expands epsilon transitions at the beginning of
the pattern. For example, the initial advance for PATTERN ((A | B) C):

  Start: elemIdx=0 (ALT)
    -> Expand ALT branches
      -> elemIdx=1 (A) -- VAR, so add state; stop here
      -> elemIdx=2 (B) -- VAR, so add state; stop here

  Result: Two states in the context {waiting for A, waiting for B}

During the initial advance, reaching FIN is not recorded as a match.
This is to prevent empty matches (zero-length matches).

VI-3. Row Evaluation: nfa_evaluate_row()

Evaluates all variable conditions in the DEFINE clause at once for
the current row.

  for each defineClause[i]:
      result = ExecEvalExpr(defineClause[i])
      varMatched[i] = (not null and true)

To support row navigation operators such as PREV() and NEXT(),
the previous row, current row, and next row are set in separate slots:

  ecxt_scantuple  = previous row (for PREV reference)
  ecxt_outertuple = current row  (default reference)
  ecxt_innertuple = next row     (for NEXT reference)

The varMatched array is referenced later in Phase 1 (Match).

VI-4. nfa_process_row(): 3-Phase Processing

NFA processing for a single row is divided into three phases:

  +--------------------------------------------+
  |  Phase 1: MATCH (convergence)              |
  |  Compare the current row against each VAR  |
  |  state. Remove states that fail to match.  |
  |                                            |
  |  Phase 2: ABSORB (absorption)              |
  |  Merge duplicate contexts to prevent       |
  |  state explosion.                          |
  |                                            |
  |  Phase 3: ADVANCE (expansion)              |
  |  Expand epsilon transitions to prepare     |
  |  for the next row.                         |
  +--------------------------------------------+

This ordering is important:

  - Match executes first to "consume the current row."
  - Absorb executes immediately after Match, when states have been updated.
  - Advance executes last to prepare "states waiting for the next row."


Chapter VII  Phase 1: Match
================================================================================

nfa_match() iterates through each state in the context:

  (1) Check whether the state's elemIdx is a VAR element
  (2) Compare against the current row using nfa_eval_var_match()
  (3) Match success: increment repetition count, retain state
  (4) Match failure: remove state

Match determination (nfa_eval_var_match):

  If varId is within the range of defineVariableList:
      Use the value of varMatched[varId]

  If varId exceeds the range (variable not defined in DEFINE):
      Unconditionally true (matches all rows)

Immediate advance for simple VARs:

  For a VAR with min=1, max=1 where the next element is END,
  the Match phase processes through END immediately.
  This is necessary for accurate state comparison in Phase 2 (Absorb).

  Example: In PATTERN ((A B)+), when A matches, it immediately advances
  to B, and when B matches, it immediately advances through END to
  complete the group count. This enables absorption comparison with
  other contexts.


Chapter VIII  Phase 2: Absorb (Context Absorption)
================================================================================

VIII-1. Problem

In the current implementation, a new context is started for each row processed.
Applying PATTERN (A+) to 10 rows produces 10 contexts,
each of which tracks state independently.

If there are N rows, the total number of states becomes O(N^2):

  Context 1 (started at row 1): can match A up to N times
  Context 2 (started at row 2): can match A up to N-1 times
  ...
  Context N (started at row N): can match A 1 time

VIII-2. Solution: Context Absorption

Key observation: a context started earlier contains
all matches of a later-started context (monotonicity principle).

If Context 1 started at row 1 and matched A 5 times,
the state where Context 2 (started at row 2) matched A 4 times
is already contained within Context 1.

Therefore Context 2 can be "absorbed" into Context 1.

VIII-3. Absorption Conditions

  (1) The pattern is marked as isAbsorbable (see IV-5)
  (2) allStatesAbsorbable of the target context is true
  (3) An earlier context "covers" all states of the target

Cover condition (nfa_states_covered):

  A state with the same elemIdx exists in the earlier context,
  and the count at that depth is greater than or equal -- then it is covered.

VIII-4. Dual-Flag Design

Two boolean flags make the absorption decision efficient:

  hasAbsorbableState (monotonic: only true->false transition possible)
    "Does this context have the ability to absorb other contexts?"
    true if at least one absorbable state exists.
    Transitions to false when states are removed leaving no absorbable states.
    Once false, it never becomes true again.

  allStatesAbsorbable (dynamic: can fluctuate)
    "Can this context be absorbed?"
    true if all states are in an absorbable region.
    Becomes false when a non-absorbable state is added; reverts to true when it is removed.

VIII-5. Absorption Order

nfa_absorb_contexts() traverses from tail (newest) to head (oldest).

  for ctx = tail to head:
      if ctx.allStatesAbsorbable:
          for older = ctx.prev to head:
              if older.hasAbsorbableState:
                  if nfa_states_covered(older, ctx):
                      free(ctx)  -- absorbed
                      break

Since inspection starts from the newest context, the most recently started
(= having the shortest match) context is absorbed first.


Chapter IX  Phase 3: Advance (Epsilon Transition Expansion)
================================================================================

IX-1. Overview

nfa_advance() expands epsilon transitions from each state after Match,
generating "new states waiting for the next row."

An epsilon transition is a transition that moves without consuming a row:

  - ALT: branch to each alternative
  - BEGIN: enter group (or skip if min=0)
  - END: loop-back within group (or exit when condition is met)
  - FIN: record match completion
  - VAR loop/exit: repeat/exit according to the quantifier

Expansion stops upon reaching a VAR element, and the state is added.
This is because VAR is the element that "will consume the next row."

IX-2. Processing Order: DFS and Preferment

advance processes states in lexicographic order,
performing Depth-First Search (DFS) on each state.

This DFS order is what guarantees the SQL standard's "preferment":

  The branch that appears first in the PATTERN text takes precedence.

Example: PATTERN (A | B) C

  The first branch A of the ALT takes precedence over the second branch B.
  When both A and B can match, the match via A is selected.

nfa_add_state_unique() prevents duplicate addition of the same state,
so the state added first (= from the preferred branch) is retained.

IX-3. Routing Function: nfa_route_to_elem()

All inter-element transitions in the advance phase go through nfa_route_to_elem().
This function branches its behavior based on the type of the next element:

  If the next element is VAR:
    (1) Add the state to the context (nfa_add_state_unique)
    (2) If the VAR has min=0, also add a skip path (recurse via next)
    -> Expansion stops here (VAR is the element that "will consume the next row")

  If the next element is non-VAR (ALT, BEGIN, END, FIN):
    -> Recursively call nfa_advance_state() to continue expansion

With this structure, advance recursively follows epsilon transitions
until reaching a VAR, consistently stopping only at VAR elements.

IX-4. Per-Element advance Behavior

(a) ALT (nfa_advance_alt)

  Upon encountering an ALT element, all branches are expanded in order.
  The first element of each branch is connected via a jump pointer.

  idx=0 (ALT) -> branch 1 start (next) -> branch 2 start (jump) -> ...

  nfa_advance_state() is recursively called for each branch.

(b) BEGIN (nfa_advance_begin)

  Handles group entry.
  jump points to the element after END (= first element outside the group).

  Greedy (default):
    (1) Enter the group body (move via next, reset the count at that depth)
    (2) If min=0, also add a group skip path (move via jump)

  Reluctant:
    Order reversed -- skip path first, group entry second.
    If the skip path reaches FIN, the group entry path is not generated
    (shortest match preferred).

(c) END (nfa_advance_end)

  Handles group termination. This is the core of the repetition logic.

  Let count be the count at the current depth:

  count < min:
    Loop-back (move via jump, repeat the group body)

    If the RPR_ELEM_EMPTY_LOOP flag is set:
      In addition to loop-back, also add a fast-forward exit path.
      This is because the body may produce an empty match, causing count
      to never reach min. fast-forward resets counts[depth] to 0
      and exits via next (treating the remaining required iterations
      as empty matches).

  min <= count < max:
    Greedy: loop-back first, exit second
    Reluctant: exit first, loop-back second
              If the exit path reaches FIN, loop-back is omitted.

  count >= max:
    Unconditional exit (move via next)

  On exit: reset counts[depth] = 0, and if the next element is an outer END,
  increment the count at the outer depth.

(d) VAR (nfa_advance_var)

  Handles repeat/exit for a VAR element with a quantifier.

  Let count be the count at the current depth:

  count < min:
    Unconditional loop (stay at the same elemIdx, wait for the next row)

  min <= count < max:
    Greedy: loop first, exit (next) second
    Reluctant: exit first, loop second
              If the exit path reaches FIN, loop is omitted.

  count >= max:
    Unconditional exit (move via next)

  On exit: reset counts[depth] = 0.

(e) FIN

  Match success. The current state is moved to matchedState for recording,
  and matchEndRow is set to the current row.

  Upon reaching FIN, all remaining unprocessed states are removed
  (early termination). By DFS order, the path that reached FIN first
  has the highest preferment, so the rest are inferior paths.
  This is the core mechanism that guarantees preferment.

  In SKIP PAST LAST ROW mode, upon reaching FIN, subsequent contexts
  that started within the match range are immediately pruned.

IX-5. State Deduplication: nfa_add_state_unique()

When adding a new state to a context, it is compared against existing states;
if an identical state already exists, it is not added.

Comparison criteria: elemIdx + counts[0..elem->depth] (see V-1)

This deduplication is the core mechanism that suppresses NFA state explosion.
Because DFS order causes preferred-branch states to be added first,
identical states from lower-priority branches are automatically discarded.

IX-6. Cycle Detection: nfaVisitedElems

When a group body can produce an empty match (zero consumption),
looping back from END may cause an infinite loop.

Example: PATTERN ((A?)*)

  A? has min=0, so it can pass through without matching.
  If the outer group repeats: BEGIN -> A? skip -> END -> BEGIN -> ...

To prevent this:

  (1) At compile time: set the RPR_ELEM_EMPTY_LOOP flag on the END
      of groups whose body is nullable.
      The runtime effect of this flag is described in IX-4(c):
      when count < min, a fast-forward exit path is added,
      resolving the deadlock where count cannot increase due to empty matches.

  (2) At runtime: initialize the nfaVisitedElems bitmap immediately before
      DFS expansion of each state within advance (once per state).
      During DFS, set the corresponding elemIdx bit when visiting each element.
      If a previously visited elemIdx is revisited, that path is terminated.

  Note: the bitmap tracks only elemIdx and does not consider counts.
  Therefore, legitimate revisits to the same elemIdx but with different counts
  may also be blocked. This only occurs when the group body is nullable
  (all paths can match empty), causing END -> loop-back -> skip -> END
  within a single DFS. In such cases the END element has the
  RPR_ELEM_EMPTY_LOOP flag, so the fast-forward exit (IX-4(c)) provides
  an alternative path that bypasses the cycle.


Chapter X  Match Result Processing
================================================================================

X-1. Reduced Frame Map

RPR match results are recorded in a byte array called reduced_frame_map.
One byte is allocated per row, and the value is one of the following:

  RF_NOT_DETERMINED (0)  Not yet processed
  RF_FRAME_HEAD     (1)  Start row of the match
  RF_SKIPPED        (2)  Interior row of the match (skipped in frame)
  RF_UNMATCHED      (3)  Match failure

The window function references this map to determine frame inclusion for each row.

X-2. AFTER MATCH SKIP

Determines the starting point for the next match attempt after a successful match:

  SKIP TO NEXT ROW:
    New match attempt begins from the row after the match start row.
    Overlapping matches are possible.

  SKIP PAST LAST ROW:
    New match attempt begins from the row after the match end row.
    Only non-overlapping matches are possible.

X-3. INITIAL vs SEEK

  Standard definition (section 6.12):
  INITIAL: "is used to look for a match whose first row is R."
  SEEK:    "is used to permit a search for the first match anywhere
           from R through the end of the full window frame."
  In either case, if there is no match, the reduced window frame is empty.
  The default is INITIAL.

  Current implementation:
  SEEK is not supported (the parser raises an error).
  Only INITIAL is supported, searching only for matches starting at each
  row position pos.


Chapter XI  Worked Example: Full Execution Trace
================================================================================

XI-1. Query

  SELECT company, tdate, price,
         first_value(price) OVER w AS start_price,
         last_value(price) OVER w AS end_price
  FROM stock
  WINDOW w AS (
    PARTITION BY company
    ORDER BY tdate
    ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
    AFTER MATCH SKIP PAST LAST ROW
    PATTERN (A+ B)
    DEFINE A AS price > PREV(price),
           B AS price < PREV(price)
  );

XI-2. Data

  Row#    tdate       price
  --------------------------
  0       2024-01-01  100
  1       2024-01-02  110
  2       2024-01-03  120
  3       2024-01-04  115
  4       2024-01-05  130

XI-3. Compilation Result

  PATTERN (A+ B) -> unchanged after optimization

  idx  varId  depth  min  max  next  jump
  ------------------------------------------
   0   A(0)   0      1    INF  1     -1     A+
   1   B(1)   0      1    1    2     -1     B
   2   FIN    0      1    1    -1    -1

  DEFINE: A -> "price > PREV(price)", B -> "price < PREV(price)"
  isAbsorbable = true (A+ is a simple unbounded VAR)

XI-4. Execution Trace

--- Row 0 (price=100) ---

  update_reduced_frame(0) called.

  Context C0 created (matchStartRow=0).
  Initial advance: elemIdx=0(A) -> VAR, so state is added.
  C0.states = [{elemIdx=0, counts=[0]}]

  nfa_evaluate_row(0):
    A: price(100) > PREV(price) -> no PREV -> false
    B: price(100) < PREV(price) -> no PREV -> false
    varMatched = [false, false]

  nfa_process_row(0):
    Phase 1 (Match): A(0) state vs varMatched[0]=false -> state removed
    C0.states = [] (empty)

    Phase 2 (Absorb): skipped (no states)
    Phase 3 (Advance): skipped (no states)

  C0.states is empty, so the loop terminates.
  matchEndRow < matchStartRow -> RF_UNMATCHED.
  register_reduced_frame_map(0, RF_UNMATCHED).

--- Row 1 (price=110) ---

  update_reduced_frame(1) called.

  Context C1 created (matchStartRow=1).
  Initial advance: C1.states = [{elemIdx=0, counts=[0]}]

  nfa_evaluate_row(1):
    A: 110 > PREV(100) -> true
    B: 110 < PREV(100) -> false
    varMatched = [true, false]

  nfa_process_row(1):
    Phase 1 (Match): A(0) match succeeds -> counts[0]++ -> counts=[1]
    C1.states = [{elemIdx=0, counts=[1]}]

    Phase 3 (Advance):
      State {elemIdx=0, counts=[1]}: A+ (min=1, count=1, max=INF)
        count >= min, so:
        Greedy -> loop first: keep {elemIdx=0, counts=[1]}
                  exit: reset counts[0]=0, next(=1) -> {elemIdx=1, counts=[0]}
    C1.states = [{elemIdx=0, counts=[1]}, {elemIdx=1, counts=[0]}]

--- Row 2 (price=120) ---

  Context C2 created (matchStartRow=2).
  Initial advance: C2.states = [{elemIdx=0, counts=[0]}]

  nfa_evaluate_row(2):
    A: 120 > PREV(110) -> true
    B: 120 < PREV(110) -> false
    varMatched = [true, false]

  C1 nfa_process_row(2):
    Phase 1 (Match):
      {elemIdx=0, counts=[1]}: A matches -> counts=[2]
      {elemIdx=1, counts=[0]}: B does not match -> removed
    C1.states = [{elemIdx=0, counts=[2]}]

  C2 nfa_process_row(2):
    Phase 1 (Match):
      {elemIdx=0, counts=[0]}: A matches -> counts=[1]
    C2.states = [{elemIdx=0, counts=[1]}]

    Phase 2 (Absorb):
      Does C1 (started earlier) cover C2?
        C1: {elemIdx=0, counts=[2]}, C2: {elemIdx=0, counts=[1]}
        Same elemIdx, C1.counts >= C2.counts -> covered
      C2 absorbed. -> removed.

    Phase 3 (Advance):
      {elemIdx=0, counts=[2]}: Greedy -> loop + exit
        Loop: {elemIdx=0, counts=[2]}
        Exit: reset counts[0]=0, next(=1) -> {elemIdx=1, counts=[0]}
    C1.states = [{elemIdx=0, counts=[2]}, {elemIdx=1, counts=[0]}]

  Context C3 created (matchStartRow=3).

--- Row 3 (price=115) ---

  nfa_evaluate_row(3):
    A: 115 > PREV(120) -> false
    B: 115 < PREV(120) -> true
    varMatched = [false, true]

  nfa_process_row(3):
    Phase 1 (Match):
      {elemIdx=0, counts=[2]}: A does not match -> removed
      {elemIdx=1, counts=[0]}: B matches -> counts=[1]
    C1.states = [{elemIdx=1, counts=[1]}]

    Phase 3 (Advance):
      {elemIdx=1, counts=[1]}: B (min=1, max=1)
        count(1) >= max(1) -> unconditional exit
        Reset counts[0]=0, next = 2 (FIN)
      FIN reached -> matchEndRow = 3, matchedState recorded.
      Early termination: no remaining states, so completed immediately.
    C1.states = [] (empty after reaching FIN)

  C1.states is empty and matchEndRow=3 >= matchStartRow=1 -> match succeeds.

  register_reduced_frame_map(1, RF_FRAME_HEAD)
  register_reduced_frame_map(2, RF_SKIPPED)
  register_reduced_frame_map(3, RF_SKIPPED)

--- Row 4 (price=130) ---

  update_reduced_frame(4) called.
  C3 was already created but matchStartRow=3, so it is not applicable.
  New context C4 created (matchStartRow=4).

  nfa_evaluate_row(4):
    A: 130 > PREV(115) -> true
    B: 130 < PREV(115) -> false

  ... No subsequent rows, so nfa_finalize_all_contexts() is called.
  Match incomplete -> RF_UNMATCHED.

XI-5. Final Result

  Row 0: RF_UNMATCHED  -> frame = the row itself
  Row 1: RF_FRAME_HEAD -> frame = rows 1 through 3
  Row 2: RF_SKIPPED    -> inside row 1's match
  Row 3: RF_SKIPPED    -> inside row 1's match
  Row 4: RF_UNMATCHED  -> frame = the row itself


Chapter XII  Summary of Key Design Decisions
================================================================================

XII-1. Flat Array vs Tree-Based NFA

  Choice: Flat array (RPRPatternElement[])

  Rationale:
  - Cache-friendly: 16-byte fixed size, contiguous memory
  - Index-based references: 2-byte indices instead of pointers
  - Easy to serialize: can use memcpy when passing to plan nodes

XII-2. Forward-only Execution vs Backtracking

  Choice: Forward-only (state set tracking)

  Rationale:
  - Backtracking takes exponential time in the worst case
  - NFA simulation guarantees polynomial time
  - DFS order naturally guarantees preferment.
    Greedy/reluctant per quantifier requires only reversing the DFS order
  - Window functions receive sorted rows sequentially.
    Forward-only fits directly into this pipeline,
    whereas backtracking requires re-fetching previous rows
  - DEFINE conditions are SQL expressions (PREV, RUNNING aggregates, etc.)
    with high re-evaluation cost. Forward-only requires only one evaluation
    per row

XII-3. Per-Context Management

  Choice: Independent context per start row

  Rationale:
  - Supports overlapping matches under SKIP TO NEXT ROW
  - Determines the frame for each row independently
  - Absorption optimization can eliminate redundant contexts in O(n)

XII-4. Memory Pool Management

  Choice: Custom free list

  Rationale:
  - NFA states are created and destroyed in large numbers per row
  - Avoids palloc/pfree overhead
  - State size is variable (counts[] array), but within a single query
    maxDepth is fixed, so all states have the same size

XII-5. Execution Optimization Summary

  The following optimizations make the NFA simulation practical.

  -- Compile-time --

  (1) AST Optimization (IV-3)

    Simplifies the AST before converting the pattern to an NFA.
    Reduces the number of NFA elements through consecutive variable
    merging (A A -> A{2}), SEQ flattening, quantifier multiplication,
    and other transformations.

    Significance: Reducing the element count directly shrinks the state
    space, decreasing the cost of all subsequent runtime phases (match,
    absorb, advance).

  -- Runtime: advance phase --

  (2) Group Skip (IX-4(b))

    At the BEGIN of a group with min=0, uses jump to skip the entire
    group. Moves directly to the first element outside the group without
    exploring the group body. Greedy enters then skips; Reluctant skips
    then enters.

    Significance: For optional groups (min=0), immediately generates
    a skip path without exploring the body, avoiding unnecessary DFS
    expansion.

  (3) State Deduplication (IX-5)

    During advance, DFS may generate states with the same (elemIdx,
    counts) combination through multiple paths. Additionally, unlike
    VAR repetition, group repetition cannot perform absorption
    comparison using VAR states, so inline advance is performed from
    after Phase 1 match through to END; this process can also produce
    duplicate states reaching the same END.
    nfa_add_state_unique() blocks duplicate addition of identical states
    in both cases.

    Significance: Prevents exponential growth of the state count in
    ALT branches and quantifier expansion. Since DFS order causes the
    preferred branch's state to be registered first, identical states
    from lower-priority branches are automatically discarded, thereby
    also guaranteeing preferment.

  (4) Cycle Detection and Fast-Forward (IX-6, IX-4(c))

    When a nullable group body (e.g., A?) repeats empty matches,
    the END -> BEGIN loop-back can continue indefinitely.

    Two mechanisms resolve this:
    - A visited bitmap (nfaVisitedElems) blocks revisitation of the
      same element, preventing infinite loops (safety)
    - At an END with the RPR_ELEM_EMPTY_LOOP flag set, when
      count < min, the remaining required iterations are treated as
      empty matches and a fast-forward exit path out of the group is
      added (correctness)

    Significance: Cycle detection guarantees termination, and
    fast-forward guarantees that the min condition is satisfied.
    Without these, patterns containing nullable groups would fall
    into infinite loops or fail to match.

  (5) Match Pruning (IX-4(e))

    When a state reaches FIN during advance, all remaining unprocessed
    states of that context are removed. Because of DFS order, the path
    that reaches FIN first has the highest preferment, so the remaining
    paths are inferior.

    Significance: Once the best match is determined, exploration of
    inferior paths is immediately terminated. This mechanism achieves
    both preferment guarantees and performance optimization.

  -- Runtime: inter-context --

  (6) Early Termination (SKIP PAST LAST ROW)

    In SKIP PAST LAST ROW mode, when a match is found, subsequent
    contexts whose start rows fall within the match range are pruned
    immediately without further processing.
    In SKIP TO NEXT ROW mode, overlapping contexts are preserved
    because each row requires its own independent match.

    Significance: Prunes subsequent contexts whose start rows overlap
    with a prior match range, avoiding unnecessary processing.

  (7) Context Absorption (Chapter VIII)

    If an independent context is created for each row, O(n^2) states
    accumulate. By exploiting the monotonicity that an earlier-started
    context subsumes the states of a later-started context, redundant
    contexts are eliminated early.

    Absorbability is determined per-element; comparison is performed
    only at elements with the RPR_ELEM_ABSORBABLE flag (see IV-5).

    Significance: Keeps the number of active contexts at a constant
    level, achieving O(n^2) -> O(n) time complexity. Without this,
    performance degrades sharply on long partitions.


Appendix A. Key Function Index
================================================================================

  Function                       File                  Role
  ----------------------------------------------------------------------
  transformRPR                   parse_rpr.c           Parser entry point
  transformDefineClause          parse_rpr.c           DEFINE transformation
  collectPatternVariables        rpr.c                 Variable collection
  filterDefineClause             rpr.c                 DEFINE filtering
  buildRPRPattern                rpr.c                 NFA compilation main
  optimizeRPRPattern             rpr.c                 AST optimization
  fillRPRPattern                 rpr.c                 NFA element generation
  finalizeRPRPattern             rpr.c                 Finalization
  computeAbsorbability           rpr.c                 Absorption analysis
  update_reduced_frame           nodeWindowAgg.c       Execution main loop
  nfa_start_context              nodeWindowAgg.c       Context creation
  nfa_process_row                nodeWindowAgg.c       3-phase processing
  nfa_evaluate_row               nodeWindowAgg.c       DEFINE evaluation
  nfa_match                      nodeWindowAgg.c       Phase 1
  nfa_absorb_contexts            nodeWindowAgg.c       Phase 2
  nfa_advance                    nodeWindowAgg.c       Phase 3
  nfa_advance_state              nodeWindowAgg.c       Per-state branching
  nfa_route_to_elem              nodeWindowAgg.c       Element routing
  nfa_advance_alt                nodeWindowAgg.c       ALT handling
  nfa_advance_begin              nodeWindowAgg.c       BEGIN handling
  nfa_advance_end                nodeWindowAgg.c       END handling
  nfa_advance_var                nodeWindowAgg.c       VAR handling
  nfa_add_state_unique           nodeWindowAgg.c       Deduplication
  nfa_states_covered             nodeWindowAgg.c       Absorption coverage check


Appendix B. Data Structure Relationship Diagram
================================================================================

  Parser Layer
  --------
  RPCommonSyntax
    |--- rpSkipTo: RPSkipTo
    |--- initial: bool
    +--- rpPattern: RPRPatternNode* (tree)
         |--- nodeType: VAR | SEQ | ALT | GROUP
         |--- min, max: quantifier
         |--- varName: variable name (VAR only)
         +--- children: List* (SEQ/ALT/GROUP only)

  Planner Layer
  ----------
  WindowAgg (plan node)
    |--- rpSkipTo: RPSkipTo
    |--- defineClause: List<TargetEntry>
    +--- rpPattern: RPRPattern*
         |--- numVars: int
         |--- varNames: char**
         |--- maxDepth: RPRDepth
         |--- isAbsorbable: bool
         |--- numElements: int
         +--- elements: RPRPatternElement[]  (flat array)
              |--- varId      (1B)
              |--- depth      (1B)
              |--- flags      (1B)
              |--- reserved   (1B)
              |--- min, max   (4B + 4B)
              +--- next, jump (2B + 2B)

  Executor Layer
  ----------
  WindowAggState
    |--- rpPattern: RPRPattern* (copied from plan)
    |--- defineClauseList: List<ExprState>
    |--- nfaVarMatched: bool[] (per-row cache)
    |--- nfaVisitedElems: bitmapword* (cycle detection)
    |--- nfaContext <-> nfaContextTail (doubly-linked list)
    |   +--- RPRNFAContext
    |       |--- states: RPRNFAState* (linked list)
    |       |   |--- elemIdx
    |       |   |--- counts[]
    |       |   +--- isAbsorbable
    |       |--- matchStartRow, matchEndRow
    |       |--- matchedState (cloned on FIN arrival)
    |       |--- hasAbsorbableState
    |       +--- allStatesAbsorbable
    |--- nfaContextFree (recycling pool)
    +--- nfaStateFree (recycling pool)


Appendix C. NFA Element Array Examples
================================================================================

C-1. PATTERN (A B C)

  idx  varId  depth  min  max  next  jump
  ------------------------------------------
   0   A      0      1    1    1     -1
   1   B      0      1    1    2     -1
   2   C      0      1    1    3     -1
   3   FIN    0      1    1    -1    -1

C-2. PATTERN (A+ B*)

  idx  varId  depth  min  max  next  jump  flags
  -------------------------------------------------
   0   A      0      1    INF  1     -1    ABSORBABLE | ABSORBABLE_BRANCH
   1   B      0      0    INF  2     -1
   2   FIN    0      1    1    -1    -1

  Only A+ is the absorption point (Case 1). Once past A,
  absorption is permanently disabled for that state.

C-3. PATTERN (A | B | C)

  idx  varId  depth  min  max  next  jump
  ------------------------------------------
   0   ALT    0      1    1    1     -1     alternation start
   1   A      1      1    1    4     2      branch 1 -> FIN, jump -> branch 2
   2   B      1      1    1    4     3      branch 2 -> FIN, jump -> branch 3
   3   C      1      1    1    4     -1     branch 3 -> FIN
   4   FIN    0      1    1    -1    -1

C-4. PATTERN ((A B)+ C)

  idx  varId    depth  min  max  next  jump  flags
  -----------------------------------------------------------
   0   BEGIN    0      1    INF  1     4     ABSORBABLE_BRANCH
   1   A        1      1    1    2     -1    ABSORBABLE_BRANCH
   2   B        1      1    1    3     -1    ABSORBABLE_BRANCH
   3   END      0      1    INF  4     1     ABSORBABLE | ABSORBABLE_BRANCH
   4   C        0      1    1    5     -1
   5   FIN      0      1    1    -1    -1

  Case 2: GROUP+ with {1,1} body VARs. A, B are branches;
  END is the absorption point. Compare with C-6 (Case 3).

C-5. PATTERN ((A | B)+? C)

  idx  varId    depth  min  max   next  jump  flags
  ---------------------------------------------------
   0   BEGIN    0      1    INF   1     5     RELUCTANT, group start
   1   ALT      1      1    1     2     -1    alternation start
   2   A        2      1    1     4     3     branch 1
   3   B        2      1    1     4     -1    branch 2
   4   END      0      1    INF   5     1     RELUCTANT, group end
   5   C        0      1    1     6     -1
   6   FIN      0      1    1     -1    -1

C-6. PATTERN ((A+ B)+ C)  -- Absorbability flag example

  idx  varId    depth  min  max   next  jump  flags
  ----------------------------------------------------------
   0   BEGIN    0      1    INF   1     4     ABSORBABLE_BRANCH, group start
   1   A        1      1    INF   2     -1    ABSORBABLE | ABSORBABLE_BRANCH
   2   B        1      1    1     3     -1
   3   END      0      1    INF   4     1     group end
   4   C        0      1    1     5     -1
   5   FIN      0      1    1     -1    -1

  Recurses from BEGIN into the body -> A matches Case 1 (simple VAR+).
  A gets ABSORBABLE | ABSORBABLE_BRANCH, BEGIN gets ABSORBABLE_BRANCH.
  B and END get no flags -> absorption stops once the state advances to B.
  (See IV-5 Case 3)

C-7. PATTERN ((A+ B | C*)+ D)  -- Per-branch absorption in ALT

  idx  varId    depth  min  max   next  jump  flags
  ----------------------------------------------------------
   0   BEGIN    0      1    INF   1     6     ABSORBABLE_BRANCH
   1   ALT      1      1    1     2     -1    ABSORBABLE_BRANCH
   2   A        2      1    INF   3     4     ABSORBABLE | ABSORBABLE_BRANCH
   3   B        2      1    1     5     -1
   4   C        2      0    INF   5     -1    ABSORBABLE | ABSORBABLE_BRANCH
   5   END      0      1    INF   6     1     EMPTY_LOOP
   6   D        0      1    1     7     -1
   7   FIN      0      1    1     -1    -1

  ALT branches are checked independently for absorbability.
  Branch 1: A+ matches Case 1 -> A gets ABSORBABLE. B has no flag.
  Branch 2: C* matches Case 1 -> C gets ABSORBABLE.
  Both A and C get ABSORBABLE_BRANCH as part of their respective branch paths.
  END has EMPTY_LOOP: branch 2 (C*) is nullable, making the group body nullable.
  BEGIN and ALT get ABSORBABLE_BRANCH (on the path to absorbable elements).


================================================================================
  End of document
================================================================================

Best regards,
Henson
Attachment

pgsql-hackers by date:

Previous
From: Jim Jones
Date:
Subject: Re: COMMENTS are not being copied in CREATE TABLE LIKE
Next
From: Alvaro Herrera
Date:
Subject: Re: pgstat include expansion