> BTW, I have tested RPR with a large partition/reduced frame (up to
> 100k rows). The following query took over 3 minutes on my laptop.
The profiling results are very insightful and help us understand where
the time is spent in extreme cases.
> I'm not trying to say we should enhance the NFA modules for the case
> where there is a large reduced frame, because I don't think this is a
> realistic use case, and I doubt it's worth fighting against an
> unrealistic scenario.
I agree with your assessment. Let me clarify:
The pattern itself (START UP+) is realistic - it's a common pattern for
detecting upward trends. However, 100k+ consecutive matches without
interruption is not realistic. Your UP{,3} example (94ms for 100k partition)
demonstrates that the current implementation performs very well for realistic
use cases.
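To make the cost difference concrete, here is a small Python sketch. It is not
the patch's C code; it assumes a simplified model in which the pattern is
re-run from every starting row of the partition (a rough proxy for evaluating
each reduced frame), and counts how many rows START UP+ touches on a strictly
increasing 100-row partition, with and without an upper bound on UP:

```python
def match_len(prices, start, max_ups=None):
    """Rows consumed by START UP+ from `start`: START always matches one
    row, then UP matches while the price keeps rising, up to `max_ups` UPs."""
    n, i = 1, start + 1
    while i < len(prices) and prices[i] > prices[i - 1]:
        n += 1
        if max_ups is not None and n - 1 >= max_ups:
            break
        i += 1
    return n

def total_rows_scanned(prices, max_ups=None):
    """Sum of match lengths over every starting row: a rough proxy for the
    work of re-running the pattern for each reduced frame."""
    return sum(match_len(prices, s, max_ups) for s in range(len(prices)))

prices = list(range(100))            # strictly increasing: worst case for UP+
print(total_rows_scanned(prices))    # unbounded UP+: quadratic growth, 5050
print(total_rows_scanned(prices, 3)) # UP{,3}: linear growth, 394
```

At 100k rows the unbounded case grows to roughly 5 billion row visits under
this model, while the bounded UP{,3} case stays near 400k, which is consistent
with the minutes-vs-94ms gap you observed.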
> I think this is a more realistic use case than the former. If I am
> correct, we don't need to work on the current NFA code to squeeze
> better performance out of the unrealistic 100k reduced-frame case. I
> may be wrong, and I would love to hear from others, especially Henson,
> Vik, and Jacob.
I agree that we should prioritize realistic use cases. That said, there are
potential optimizations that could help:
1. Anchored pattern absorption (see my earlier message:
https://www.postgresql.org/message-id/CAAAe_zAEg7sVM%3DWDwXMyE-odGmQyXSVi5ZzWgye6SupSjdMKpg%40mail.gmail.com)
2. Alt-pruning: In patterns like "A* | B*", once the higher-priority A branch
has a confirmed match, the B branch can be pruned immediately. Even if B
could continue extending to a longer match, it can never be selected under
lexical-order semantics: A always wins. This proactive pruning respects
SQL standard semantics while reducing unnecessary state expansion.
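To illustrate the alt-pruning idea, here is a toy Python model. It deliberately
abstracts away the real NFA (where branches advance row by row in parallel);
the matcher functions and names below are hypothetical, and the point is only
the semantic argument: once the highest-priority branch confirms a match,
lower-priority branches need not be evaluated at all.

```python
def first_alt_match(branches, rows):
    """Toy alt-pruning for a pattern like "A* | B*".

    `branches` is a list of (name, matcher) pairs in lexical (priority)
    order; each matcher returns a match length or None.  Under SQL's
    lexical-order semantics the first branch with a confirmed match always
    wins, so the remaining branches can be pruned without being run, even
    if one of them could extend to a longer match."""
    tried = []                      # record which branches we evaluate
    for name, matcher in branches:
        tried.append(name)
        m = matcher(rows)
        if m is not None:
            return m, tried         # prune everything after this branch
    return None, tried

def star(sym):
    """Matcher for a leading run of `sym` (requires >= 1 row to count as
    a confirmed match, to keep the demo simple)."""
    def matcher(rows):
        n = 0
        while n < len(rows) and rows[n] == sym:
            n += 1
        return n if n > 0 else None
    return matcher

length, tried = first_alt_match([("A*", star("A")), ("B*", star("B"))],
                                list("AABBBB"))
print(length, tried)   # 2 ['A*'] -- B* pruned despite its longer run
```

The B run here is longer (4 rows vs 2), yet it is never even evaluated,
which is exactly the state expansion the pruning would avoid.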
However, given the complexity of NFA internals, I believe we should take a
step-by-step approach:
1. First, stabilize the current RPR patch and prepare it for review
2. Then, consider optimizations as separate follow-up patches
3. Each optimization should be well-tested and reviewed independently
This approach reduces risk and makes review more manageable. The fact that
the current implementation handles even unrealistic 100k-row cases without
crashing (just slowly) shows it's already robust.
What do you think about this phased approach?
Best regards,
Henson
P.S. I discovered a crash bug that was introduced in the latest patch
refactoring. The issue occurs with nested alternation patterns like
(A+ | (A | B)+)*, where infinite recursion happens in nfa_advance_alt when
the inner BEGIN(+)'s skip jump is followed as an ALT branch pointer. I will
include the fix in the next patch update.
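For readers following along: the actual fix isn't shown here, but a standard
guard against this class of bug is a visited set while chasing ALT branch
pointers. A minimal Python sketch on a toy state graph (`expand_alts` is a
hypothetical stand-in for nfa_advance_alt; the real fix in the patch may work
differently):

```python
def expand_alts(state, visited=None):
    """Collect the non-ALT states reachable from `state` via ALT branch
    pointers.  The visited set breaks cycles such as a nested BEGIN(+)
    skip jump that points back into the alternation."""
    if visited is None:
        visited = set()
    if id(state) in visited:
        return []                   # cycle (or duplicate): stop here
    visited.add(id(state))
    if state["kind"] != "ALT":
        return [state]
    out = []
    for branch in state["branches"]:
        out.extend(expand_alts(branch, visited))
    return out

# A cyclic toy graph loosely shaped like (A+ | (A | B)+)*: the inner ALT's
# last branch points back at the outer ALT, the shape that caused the
# infinite recursion.
a = {"kind": "ATOM", "sym": "A"}
b = {"kind": "ATOM", "sym": "B"}
inner = {"kind": "ALT", "branches": [a, b]}
outer = {"kind": "ALT", "branches": [a, inner]}
inner["branches"].append(outer)     # create the back edge

print([s["sym"] for s in expand_alts(outer)])   # ['A', 'B'] -- terminates
```

Without the visited check, the outer/inner back edge makes this recursion
unbounded, which matches the crash symptom described above.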