Home > mailing lists

Re: Row pattern recognition - Mailing list pgsql-hackers

From	Henson Choi
Subject	Re: Row pattern recognition
Date	April 4 10:31:13
Msg-id	CAAAe_zAKAGKpK9iHx3ZSuG59sP93r5dfootqv5tCfaMt=w6wzA@mail.gmail.com Whole thread
In response to	Re: Row pattern recognition (Henson Choi <assam258@gmail.com>)
List	pgsql-hackers

Tree view

Hi hackers,

I'd like to propose an extension to the context absorption
optimization for Row Pattern Recognition in the window clause:
fixed-length iteration group absorption. This builds on the
two-flag absorption design [1] and three-phase execution
framework [2], and is a sibling optimization to PREFIX pattern
absorption [3].

The existing framework requires group children to have exactly
{1,1} quantifiers. This relaxes the constraint to accept any
fixed-length quantifier (min == max), allowing patterns like
(A B{2})+ and (A (B{2} C){4})+ to benefit from absorption
with no runtime changes.

PREFIX absorption [3] handles fixed-length elements before
the unbounded repetition; this handles fixed-length elements
inside the unbounded group.

[1] Context absorption design - two-flag approach
https://www.postgresql.org/message-id/CAAAe_zCmjDRf9u19xRRbtzq%2B-7ujqadZk0izz0mkef2j%2BWpBBA%40mail.gmail.com
[2] Three-phase absorption framework (Match -> Absorb -> Advance)
https://www.postgresql.org/message-id/CAAAe_zAGLOF_qporKahnL8ajSu_eXZYsJN2M1cEeF5n5GWSNdQ%40mail.gmail.com
[3] PREFIX pattern absorption
https://www.postgresql.org/message-id/CAAAe_zDq7R8CaDfmh8C%2BH3_639Y5LtJD%2B2Z%3D1txDt%3DvaOr90rQ%40mail.gmail.com
[4] FIRST/LAST navigation design note
https://www.postgresql.org/message-id/CAAAe_zCUrKGBgZdaazdO_v9QWHsS_1DXuP%3DrLeNhO3iwsHwwbg%40mail.gmail.com

Design Note: Fixed-Length Iteration Group Absorption
=====================================================

1. Problem Definition
=====================

1.1 Current Absorption Scope

Context absorption currently requires that children of an unbounded
group have exactly {1,1} quantifiers. This means only simple patterns
like (A B)+ qualify for absorption.

Patterns like (A B{2})+ or (A (B{2} C){4})+ are excluded even though
each iteration consumes a fixed number of rows.

1.2 The Fixed-Length Constraint

The existing {1,1} check is overly restrictive. The actual requirement
for absorption is that each iteration of the group consumes a fixed
number of rows, so that the count-dominance property holds: an older
context always has a repeat count >= any newer context at the same
element.

Any group where all children have min == max (recursively) satisfies
this property, regardless of nesting depth or quantifier values.

2. Relationship to the Existing Framework
==========================================

2.1 Position in the Three-Phase Execution Model

The Match -> Absorb -> Advance model [2] is preserved as-is. This
optimization only changes compile-time eligibility analysis in the
two-flag framework [1]; the Absorb phase logic is unchanged.

2.2 Prerequisites

The same four prerequisites as existing absorption apply:

1. Greedy quantifier -- the outer group must be greedy.
2. SKIP TO PAST LAST ROW -- prior matches suppress subsequent
start points.
3. UNBOUNDED FOLLOWING -- all contexts see the same future rows.
4. Shared DEFINE evaluation -- all contexts produce the same
DEFINE result for a given row [4].

2.3 Extension of the Dominance Order

The existing dominance condition is unchanged:

C_old and C_new are in the absorbable region of the same
element, and C_old.counts >= C_new.counts.

The only change is which patterns are recognized as having an
absorbable region. The runtime count comparison works identically
regardless of the step size per iteration.

3. Correctness Argument
========================

A fixed-length quantifier (min == max) is semantically equivalent
to repeating the element that many times. Therefore any pattern
with fixed-length children can be conceptually unrolled to {1,1}
elements:

(A B{2})+ -> (A B B)+
(A (B{2} C){4})+ -> (A B B C B B C B B C B B C)+
((A{2} B{3}){2})+ -> (A A B B B A A B B B)+

The unrolled form contains only {1,1} VARs inside the unbounded
group, which is exactly the existing Case 2 that is already proven
correct for absorption.

This optimization recognizes fixed-length groups at compile time
without actually unrolling them. The runtime behavior is identical
to the unrolled form: each iteration consumes a fixed number of
rows, count-dominance holds, and absorption proceeds unchanged.

4. Compile-Time Analysis
=========================

4.1 Change to isUnboundedStart()

Currently, Case 2 in isUnboundedStart() requires all children of
an unbounded group to be simple {1,1} VARs. The relaxation:

- VAR children: accept min == max (any value)
- BEGIN children (nested subgroups): recursively verify that all
descendants have min == max, including the subgroup's own END
quantifier
- ALT children: reject (unchanged)

The check follows the existing depth-based traversal of the flat
element array.

4.2 Flat Array Example

Pattern: (A (B{2} C){4})+

[0] BEGIN depth=0 min=1 max=INF <- outer group
[1] A depth=1 min=1 max=1 <- 1==1 OK
[2] BEGIN depth=1 min=4 max=4 <- subgroup
[3] B depth=2 min=2 max=2 <- 2==2 OK
[4] C depth=2 min=1 max=1 <- 1==1 OK
[5] END depth=1 min=4 max=4 <- 4==4 OK
[6] END depth=0 min=1 max=INF <- unbounded, greedy

Verification traverses [1]-[5] confirming min==max at every
level. [6] is the outer END: unbounded and greedy, so absorption
applies.

4.3 Runtime: No Changes

The runtime absorption logic (nfa_try_absorb_context,
nfa_states_covered) and flag assignment (RPR_ELEM_ABSORBABLE,
RPR_ELEM_ABSORBABLE_BRANCH) require no changes.

5. Worked Example
==================

Pattern: (A B{2})+
Step size = 1 + 2 = 3.

Data: rows match in repeating A, B, B cycle.
Contexts that fail A at B-rows die immediately (omitted).

Row Data C_old C_new Active
r0 A A - 1
r1 B B(cnt=1) - 1
r2 B B(cnt=2)->END - 1
count=1, loops
r3 A A A 2
r4 B B(cnt=1) B(cnt=1) 2
r5 B B(cnt=2)->END B(cnt=2)->END 2
count=2 count=1
absorb: 2 >= 1 -> C_new absorbed
C_old loops 1
r6 A A A 2
r7 B B(cnt=1) B(cnt=1) 2
r8 B B(cnt=2)->END B(cnt=2)->END 2
count=3 count=1
absorb: 3 >= 1 -> C_new absorbed
C_old loops 1

Absorption occurs at END (the ABSORBABLE judgment point),
where both contexts synchronize after completing one full
iteration. The older context has a strictly higher group
count and absorbs the newer one.

Between END points, both contexts progress in lockstep
at the same pattern positions. Maximum concurrent contexts
during this phase = 2. Overall complexity remains O(n).

6. Non-Qualifying Patterns
===========================

Patterns where any child has min != max do NOT qualify:

(A B+ C)+ -- B+ has min=1, max=INF -> NOT fixed
(A B{2,5})+ -- B has min=2, max=5 -> NOT fixed
(A B?)+ -- B? has min=0, max=1 -> NOT fixed

These patterns have variable step sizes per iteration, so
count-dominance is not guaranteed. They remain non-absorbable
under this optimization.

pgsql-hackers by date:

From: "Hayato Kuroda (Fujitsu)"
Date: 04 April, 09:20:35
Subject: RE: Adding REPACK [concurrently]

From: Lukas Fittl
Date: 04 April, 11:11:25
Subject: Re: pg_plan_advice

Re: Row pattern recognition - Mailing list pgsql-hackers

Previous

Next