Re: Row pattern recognition - Mailing list pgsql-hackers

From Henson Choi
Subject Re: Row pattern recognition
Date
Msg-id CAAAe_zAKAGKpK9iHx3ZSuG59sP93r5dfootqv5tCfaMt=w6wzA@mail.gmail.com
Whole thread
In response to Re: Row pattern recognition  (Henson Choi <assam258@gmail.com>)
List pgsql-hackers
Hi hackers,


I'd like to propose an extension to the context absorption
optimization for Row Pattern Recognition in the window clause:
fixed-length iteration group absorption. This builds on the
two-flag absorption design [1] and three-phase execution
framework [2], and is a sibling optimization to PREFIX pattern
absorption [3].

The existing framework requires group children to have exactly
{1,1} quantifiers. This relaxes the constraint to accept any
fixed-length quantifier (min == max), allowing patterns like
(A B{2})+ and (A (B{2} C){4})+ to benefit from absorption
with no runtime changes.

PREFIX absorption [3] handles fixed-length elements before
the unbounded repetition; this handles fixed-length elements
inside the unbounded group.


[1] Context absorption design - two-flag approach
https://www.postgresql.org/message-id/CAAAe_zCmjDRf9u19xRRbtzq%2B-7ujqadZk0izz0mkef2j%2BWpBBA%40mail.gmail.com
[2] Three-phase absorption framework (Match -> Absorb -> Advance)
https://www.postgresql.org/message-id/CAAAe_zAGLOF_qporKahnL8ajSu_eXZYsJN2M1cEeF5n5GWSNdQ%40mail.gmail.com
[3] PREFIX pattern absorption
https://www.postgresql.org/message-id/CAAAe_zDq7R8CaDfmh8C%2BH3_639Y5LtJD%2B2Z%3D1txDt%3DvaOr90rQ%40mail.gmail.com
[4] FIRST/LAST navigation design note
https://www.postgresql.org/message-id/CAAAe_zCUrKGBgZdaazdO_v9QWHsS_1DXuP%3DrLeNhO3iwsHwwbg%40mail.gmail.com


Design Note: Fixed-Length Iteration Group Absorption
=====================================================


1. Problem Definition
=====================


1.1 Current Absorption Scope

Context absorption currently requires that children of an unbounded
group have exactly {1,1} quantifiers. This means only simple patterns
like (A B)+ qualify for absorption.

Patterns like (A B{2})+ or (A (B{2} C){4})+ are excluded even though
each iteration consumes a fixed number of rows.


1.2 The Fixed-Length Constraint

The existing {1,1} check is overly restrictive. The actual requirement
for absorption is that each iteration of the group consumes a fixed
number of rows, so that the count-dominance property holds: an older
context always has a repeat count >= any newer context at the same
element.

Any group where all children have min == max (recursively) satisfies
this property, regardless of nesting depth or quantifier values.


2. Relationship to the Existing Framework
==========================================


2.1 Position in the Three-Phase Execution Model

The Match -> Absorb -> Advance model [2] is preserved as-is. This
optimization only changes compile-time eligibility analysis in the
two-flag framework [1]; the Absorb phase logic is unchanged.


2.2 Prerequisites

The same four prerequisites as existing absorption apply:

1. Greedy quantifier -- the outer group must be greedy.
2. SKIP TO PAST LAST ROW -- prior matches suppress subsequent
   start points.
3. UNBOUNDED FOLLOWING -- all contexts see the same future rows.
4. Shared DEFINE evaluation -- all contexts produce the same
   DEFINE result for a given row [4].


2.3 Extension of the Dominance Order

The existing dominance condition is unchanged:

    C_old and C_new are in the absorbable region of the same
    element, and C_old.counts >= C_new.counts.

The only change is which patterns are recognized as having an
absorbable region. The runtime count comparison works identically
regardless of the step size per iteration.


3. Correctness Argument
========================


A fixed-length quantifier (min == max) is semantically equivalent
to repeating the element that many times. Therefore any pattern
with fixed-length children can be conceptually unrolled to {1,1}
elements:

  (A B{2})+           -> (A B B)+
  (A (B{2} C){4})+    -> (A B B C B B C B B C B B C)+
  ((A{2} B{3}){2})+   -> (A A B B B A A B B B)+


The unrolled form contains only {1,1} VARs inside the unbounded
group, which is exactly the existing Case 2 that is already proven
correct for absorption.

This optimization recognizes fixed-length groups at compile time
without actually unrolling them. The runtime behavior is identical
to the unrolled form: each iteration consumes a fixed number of
rows, count-dominance holds, and absorption proceeds unchanged.


4. Compile-Time Analysis
=========================


4.1 Change to isUnboundedStart()

Currently, Case 2 in isUnboundedStart() requires all children of
an unbounded group to be simple {1,1} VARs. The relaxation:

- VAR children: accept min == max (any value)
- BEGIN children (nested subgroups): recursively verify that all
  descendants have min == max, including the subgroup's own END
  quantifier
- ALT children: reject (unchanged)

The check follows the existing depth-based traversal of the flat
element array.


4.2 Flat Array Example

Pattern: (A (B{2} C){4})+

  [0] BEGIN depth=0  min=1 max=INF   <- outer group
  [1] A     depth=1  min=1 max=1     <- 1==1 OK
  [2] BEGIN depth=1  min=4 max=4     <- subgroup
  [3] B     depth=2  min=2 max=2     <- 2==2 OK
  [4] C     depth=2  min=1 max=1     <- 1==1 OK
  [5] END   depth=1  min=4 max=4     <- 4==4 OK
  [6] END   depth=0  min=1 max=INF   <- unbounded, greedy


Verification traverses [1]-[5] confirming min==max at every
level. [6] is the outer END: unbounded and greedy, so absorption
applies.


4.3 Runtime: No Changes

The runtime absorption logic (nfa_try_absorb_context,
nfa_states_covered) and flag assignment (RPR_ELEM_ABSORBABLE,
RPR_ELEM_ABSORBABLE_BRANCH) require no changes.


5. Worked Example
==================


Pattern: (A B{2})+
Step size = 1 + 2 = 3.

Data: rows match in repeating A, B, B cycle.
Contexts that fail A at B-rows die immediately (omitted).

Row  Data  C_old             C_new            Active
r0   A     A                 -                1
r1   B     B(cnt=1)          -                1
r2   B     B(cnt=2)->END     -                1
           count=1, loops
r3   A     A                 A                2
r4   B     B(cnt=1)          B(cnt=1)         2
r5   B     B(cnt=2)->END     B(cnt=2)->END    2
           count=2            count=1
           absorb: 2 >= 1 -> C_new absorbed
           C_old loops                        1
r6   A     A                 A                2
r7   B     B(cnt=1)          B(cnt=1)         2
r8   B     B(cnt=2)->END     B(cnt=2)->END    2
           count=3            count=1
           absorb: 3 >= 1 -> C_new absorbed
           C_old loops                        1

Absorption occurs at END (the ABSORBABLE judgment point),
where both contexts synchronize after completing one full
iteration. The older context has a strictly higher group
count and absorbs the newer one.

Between END points, both contexts progress in lockstep
at the same pattern positions. Maximum concurrent contexts
during this phase = 2. Overall complexity remains O(n).


6. Non-Qualifying Patterns
===========================


Patterns where any child has min != max do NOT qualify:

  (A B+ C)+    -- B+ has min=1, max=INF -> NOT fixed
  (A B{2,5})+  -- B has min=2, max=5 -> NOT fixed
  (A B?)+      -- B? has min=0, max=1 -> NOT fixed


These patterns have variable step sizes per iteration, so
count-dominance is not guaranteed. They remain non-absorbable
under this optimization.

pgsql-hackers by date:

Previous
From: "Hayato Kuroda (Fujitsu)"
Date:
Subject: RE: Adding REPACK [concurrently]
Next
From: Lukas Fittl
Date:
Subject: Re: pg_plan_advice