> BTW, I have tested RPR with a large partition/reduced frame (up to
> 100k rows). The following query took over 3 minutes on my laptop.
The profiling results are very insightful and help us understand where
the time is spent in extreme cases.
> I'm not trying to say we should enhance the NFA modules for the case
> where there is a large reduced frame, because I don't think this is a
> realistic use case, and I doubt it's worth fighting against an
> unrealistic scenario.
I agree with your assessment. Let me clarify:
The pattern itself (START UP+) is realistic - it's a common pattern for
detecting upward trends. However, 100k+ consecutive matches without
interruption is not realistic. Your UP{,3} example (94ms for 100k partition)
demonstrates that the current implementation performs very well for realistic
use cases.
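To make the cost difference concrete, here is a small Python sketch. It is not
the patch's C code; it assumes a simplified model in which the pattern is
re-run from every starting row of the partition (a rough proxy for evaluating
each reduced frame), and counts how many rows START UP+ touches on a strictly
increasing 100-row partition, with and without an upper bound on UP:

```python
def match_len(prices, start, max_ups=None):
    """Rows consumed by START UP+ from `start`: START always matches one
    row, then UP matches while the price keeps rising, up to `max_ups` UPs."""
    n, i = 1, start + 1
    while i < len(prices) and prices[i] > prices[i - 1]:
        n += 1
        if max_ups is not None and n - 1 >= max_ups:
            break
        i += 1
    return n

def total_rows_scanned(prices, max_ups=None):
    """Sum of match lengths over every starting row: a rough proxy for the
    work of re-running the pattern for each reduced frame."""
    return sum(match_len(prices, s, max_ups) for s in range(len(prices)))

prices = list(range(100))            # strictly increasing: worst case for UP+
print(total_rows_scanned(prices))    # unbounded UP+: quadratic growth, 5050
print(total_rows_scanned(prices, 3)) # UP{,3}: linear growth, 394
```

At 100k rows the unbounded case grows to roughly 5 billion row visits under
this model, while the bounded UP{,3} case stays near 400k, which is consistent
with the minutes-vs-94ms gap you observed.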
> I think this is a more realistic use case than the former. If I am
> correct, we don't need to work on the current NFA code to squeeze
> better performance out of the unrealistic 100k reduced-frame case. I
> may be wrong, and I would love to hear from others, especially Henson,
> Vik, and Jacob.
I agree that we should prioritize realistic use cases. That said, there are
potential optimizations that could help:
1. Anchored pattern absorption (see my earlier message:
https://www.postgresql.org/message-id/CAAAe_zAEg7sVM%3DWDwXMyE-odGmQyXSVi5ZzWgye6SupSjdMKpg%40mail.gmail.com)
2. Alt-pruning: In patterns like "A* | B*", once the higher-priority A branch
has a confirmed match, the B branch can be pruned immediately. Even if B
could continue extending to a longer match, it can never be selected under
lexical-order semantics: A always wins. This proactive pruning respects
SQL standard semantics while reducing unnecessary state expansion.
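To illustrate the alt-pruning idea, here is a toy Python model. It deliberately
abstracts away the real NFA (where branches advance row by row in parallel);
the matcher functions and names below are hypothetical, and the point is only
the semantic argument: once the highest-priority branch confirms a match,
lower-priority branches need not be evaluated at all.

```python
def first_alt_match(branches, rows):
    """Toy alt-pruning for a pattern like "A* | B*".

    `branches` is a list of (name, matcher) pairs in lexical (priority)
    order; each matcher returns a match length or None.  Under SQL's
    lexical-order semantics the first branch with a confirmed match always
    wins, so the remaining branches can be pruned without being run, even
    if one of them could extend to a longer match."""
    tried = []                      # record which branches we evaluate
    for name, matcher in branches:
        tried.append(name)
        m = matcher(rows)
        if m is not None:
            return m, tried         # prune everything after this branch
    return None, tried

def star(sym):
    """Matcher for a leading run of `sym` (requires >= 1 row to count as
    a confirmed match, to keep the demo simple)."""
    def matcher(rows):
        n = 0
        while n < len(rows) and rows[n] == sym:
            n += 1
        return n if n > 0 else None
    return matcher

length, tried = first_alt_match([("A*", star("A")), ("B*", star("B"))],
                                list("AABBBB"))
print(length, tried)   # 2 ['A*'] -- B* pruned despite its longer run
```

The B run here is longer (4 rows vs 2), yet it is never even evaluated,
which is exactly the state expansion the pruning would avoid.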
However, given the complexity of NFA internals, I believe we should take a
step-by-step approach:
1. First, stabilize the current RPR patch and prepare it for review
2. Then, consider optimizations as separate follow-up patches
3. Each optimization should be well-tested and reviewed independently
This approach reduces risk and makes review more manageable. The fact that
the current implementation handles even unrealistic 100k-row cases without
crashing (just slowly) shows it's already robust.
What do you think about this phased approach?
Best regards,
Henson
P.S. I discovered a crash bug that was introduced in the latest patch
refactoring. The issue occurs with nested alternation patterns like
(A+ | (A | B)+)*, where infinite recursion happens in nfa_advance_alt when
the inner BEGIN(+)'s skip jump is followed as an ALT branch pointer. I will
include the fix in the next patch update.
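For readers following along: the actual fix isn't shown here, but a standard
guard against this class of bug is a visited set while chasing ALT branch
pointers. A minimal Python sketch on a toy state graph (`expand_alts` is a
hypothetical stand-in for nfa_advance_alt; the real fix in the patch may work
differently):

```python
def expand_alts(state, visited=None):
    """Collect the non-ALT states reachable from `state` via ALT branch
    pointers.  The visited set breaks cycles such as a nested BEGIN(+)
    skip jump that points back into the alternation."""
    if visited is None:
        visited = set()
    if id(state) in visited:
        return []                   # cycle (or duplicate): stop here
    visited.add(id(state))
    if state["kind"] != "ALT":
        return [state]
    out = []
    for branch in state["branches"]:
        out.extend(expand_alts(branch, visited))
    return out

# A cyclic toy graph loosely shaped like (A+ | (A | B)+)*: the inner ALT's
# last branch points back at the outer ALT, the shape that caused the
# infinite recursion.
a = {"kind": "ATOM", "sym": "A"}
b = {"kind": "ATOM", "sym": "B"}
inner = {"kind": "ALT", "branches": [a, b]}
outer = {"kind": "ALT", "branches": [a, inner]}
inner["branches"].append(outer)     # create the back edge

print([s["sym"] for s in expand_alts(outer)])   # ['A', 'B'] -- terminates
```

Without the visited check, the outer/inner back edge makes this recursion
unbounded, which matches the crash symptom described above.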