Hi Hackers,
I’m writing to leave a brief note on State Pruning, a potential optimization for the RPR executor.
To be clear, this is a separate concern from both context absorption and state merging. While state merging handles identical states within a context, and context absorption handles dominance between different contexts, State Pruning focuses on discarding active NFA states that have become irrelevant due to lexical priority rules.
The core idea is to prune individual states by comparing them with the matchedState already secured. If a high-priority branch has already recorded a match, any active states in lower-priority branches may become "dead wood"—even if they are currently matching, they can never supersede the existing winner.
Detailed Example: A+ (prio 0) | B+ (prio 1) Consider an input where both A and B are TRUE for consecutive rows:
Row 1 (A=TRUE, B=TRUE):
A+ matches and records a matchedState (length 1, priority 0).
B+ also matches (length 1, priority 1) and remains in ctx->states.
Pruning Point: Since a priority 0 match is already secured, B+ (priority 1) can never win, regardless of how much longer it matches. We can prune the B+ state here.
Row 2 (A=TRUE, B=TRUE):
Without pruning: The engine evaluates both A+ and B+.
With pruning: Only A+ is evaluated. matchedState is updated to length 2.
Row 3 (A=FALSE, B=FALSE):
A+ fails. The final result is the matchedState from Row 2 (A+ with length 2).
As shown, State Pruning can significantly reduce the NFA workload by stopping redundant branches early. However, implementing this correctly will require substantial research and careful analysis, especially when dealing with the complex interplay between greedy/reluctant quantifiers and branch convergence. We must ensure that pruning a state won't lead to incorrect results in more complex scenarios.
I am documenting this as a future research item. For now, my focus remains on stabilizing the core RPR implementation and context absorption.
Regards,
Henson
>> Shall I fix them (and post v40 patches), or would you like to fix and
>> post another patch?
>>
>
> Yes, please fix them and post v40. Thanks!
Got it.
BTW, I found IGNORE NULLS module (ignorenulls_getfuncarginframe) needs
to be fixed. The module is used when a window function is called with
IGNORE NULLS option. Currently PostgreSQL ignores the option if RPR is
used because ignorenulls_getfuncarginframe does not call
row_is_in_reduced_frame(). I will bring in the fix into v40.
Best regards,
--
Tatsuo Ishii
SRA OSS K.K.
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp