Re: Row pattern recognition - Mailing list pgsql-hackers
| From | Tatsuo Ishii |
|---|---|
| Subject | Re: Row pattern recognition |
| Date | |
| Msg-id | 20260112.194507.364121646583109399.ishii@postgresql.org Whole thread Raw |
| In response to | Re: Row pattern recognition (Henson Choi <assam258@gmail.com>) |
| Responses |
Re: Row pattern recognition
|
| List | pgsql-hackers |
Hi Henson,
> Hi Ishii-san,
>
> I've been working on the pattern grammar and AST for row pattern
> recognition.
> The attached patch extends the PATTERN clause to support more regex-like
> syntax.
>
> The main additions are:
>
> - Alternation: (A | B)
> - Grouping with quantifiers: (A B)+, (A | B)*
> - Full quantifier support: +, *, ?, {n}, {n,}, {n,m}
> - Reluctant quantifiers: +?, *?, ?? (parsed but not executed yet)
Excellent!
> I've introduced RPRPatternNode to represent the pattern as a Thompson-style
> AST with four node types: VAR, SEQ, ALT, and GROUP. Each node stores min/max
> bounds for quantifiers and a reluctant flag. This should make the eventual
> NFA implementation more straightforward.
>
> The parser also does some basic pattern optimizations. For example:
> - (A (B C)) gets flattened to (A B C)
> - ((A)) unwraps to just A
> - (A{2}){3} becomes A{6}
> - (A | B | A) simplifies to (A | B)
> - A A A merges into A{3,3}
>
> I also updated ruleutils.c to deparse patterns properly.
>
>
> Current state:
>
> The executor still uses regex matching, which works but has some
> limitations.
> Complex patterns with lots of alternations can be slow.
>
> The 26-variable limit is still there since we're mapping to [a-z] for the
> regex.
>
> Reluctant quantifiers (+?, *?, ??) are parsed but will raise an error if you
> try to use them. They'll need proper NFA support to work correctly.
>
> Some optimization code was removed in the process, but it should work better
> once NFA matching is in place.
Yeah. Now the rpr regression takes more than 7 seconds. (with current
v37 patch it takes 86ms). I expect NFA matching works better.
> Next steps:
>
> The plan is to replace regex matching with NFA-based execution. The AST
> structure is already set up for that.
Sounds promising.
> The first step is converting the Thompson AST into a flat representation -
> flattening pattern elements and building actual NFA states with transitions,
> then replacing the regex engine with direct NFA matching. This should
> handle all
> the pattern syntax properly, including reluctant quantifiers.
>
> Once that's working, the next step is handling multiple contexts during
> matching -
> tracking which states are active and absorbing context as we go through
> rows.
> This is essential for proper pattern matching behavior.
Ok.
> After that, I'll implement PATH and CLASSIFIER functions, which depend on
> having
> the match context available.
What is PATH function? It's not in R010 or R020 as far as I know.
> Then there's room for various optimizations - PATH space optimization,
> incremental aggregate computation over matched rows, better state merging,
> etc.
Sounds good.
> Longer-term, the goal is to get to MATCH_RECOGNIZE in the FROM clause for
> full SQL standard compliance.
What about MEASURES clause? Without it, many things in the standard
cannot be implemented.
> Let me know if you have any concerns or suggestions about the approach.
Here are some comments on your patches.
- It is based on my v36 patches. But the latest one is v37 patch. It
would be nice if you create patches based on the my latest patches.
- You are doing some optimization (like (A (B C)) gets flattened to (A
B C) etc.) in the parse analysis phase. I think doing that kind of
optimization should be done in the optimizer/planner. Because with
your patch a deparsed query (as shown by pg_get_viewdef()) looks
different from what user inputs.
- If you add files to src/backend/parser, it should be noted in
src/backend/parser/README.
- It would be noce to update typedefs patch if you add new typedefs.
Best regards,
--
Tatsuo Ishii
SRA OSS K.K.
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp
pgsql-hackers by date: