Re: Row pattern recognition - Mailing list pgsql-hackers
From | Tatsuo Ishii |
---|---|
Subject | Re: Row pattern recognition |
Date | |
Msg-id | 20240930.090751.1732676018799092095.ishii@postgresql.org Whole thread Raw |
In response to | Re: Row pattern recognition (Tatsuo Ishii <ishii@postgresql.org>) |
Responses |
Re: Row pattern recognition
|
List | pgsql-hackers |
>> While playing with the feature, I've been trying to identify runs of >> matched rows by eye. But it's pretty difficult -- the best I can do is >> manually count rows using a `COUNT(*) OVER ...`. So I'd like to >> suggest that MEASURES be part of the eventual v1 feature, if there's >> no other way to determine whether a row was skipped by a previous >> match. (That was less obvious to me before the fix in v17.) > > I think implementing MEASURES is challenging. Especially we need to > find how our parser accepts "colname OVER > window_definition". Currently PostgreSQL's parser only accepts "func() > OVER window_definition" Even it is technically possible, I think the > v1 patch size will become much larger than now due to this. > > How about inventing new window function that returns row state instead? > > - match found (yes/no) > - skipped due to AFTER MATCH SKIP PAST LAST ROW (no match) Please disregard my proposal. Even if we make such a function, it would always return NULL for unmatched rows or skipped rows, and I think the function does not solve your problem. However, I wonder if supporting MEASURES solves the problem either because any columns defined by MEASURES will return NULL except the first row in a reduced frame. Can you please show an example how to identify runs of matched rows using MEASURES? > For the rest of the mail I need more time to understand. I will reply > back after studying it. For now, I just want to thank you for the > valuable information! > >> -- >> >> I've been working on an implementation [1] of SQL/RPR's "parenthesized >> language" and preferment order. (These are defined in SQL/Foundation >> 2023, section 9.41.) The tool gives you a way to figure out, for a >> given pattern, what matches are supposed to be attempted and in what >> order: >> >> $ ./src/test/modules/rpr/rpr_prefer "a b? a" >> ( ( a ( b ) ) a ) >> ( ( a ( ) ) a ) >> >> Many simple patterns result in an infinite set of possible matches. So >> if you use an unbounded quantifiers, you have to also use --max-rows >> to limit the size of the hypothetical window frame: >> >> $ ./src/test/modules/rpr/rpr_prefer --max-rows 2 "^ PERMUTE(a*, b+)? $" >> ( ( ^ ( ( ( ( ( ( a ) ( b ) ) ) - ) ) ) ) $ ) >> ( ( ^ ( ( ( ( ( ( ) ( b b ) ) ) - ) ) ) ) $ ) >> ( ( ^ ( ( ( ( ( ( ) ( b ) ) ) - ) ) ) ) $ ) >> ( ( ^ ( ( ( - ( ( ( b b ) ( ) ) ) ) ) ) ) $ ) >> ( ( ^ ( ( ( - ( ( ( b ) ( a ) ) ) ) ) ) ) $ ) >> ( ( ^ ( ( ( - ( ( ( b ) ( ) ) ) ) ) ) ) $ ) >> ( ( ^ ( ) ) $ ) I wonder how Oracle solves the problem (an infinite set of possible matches) without using "--max-rows" or something like that because in my understanding Oracle supports the regular expressions and PERMUTE. >> I've found this useful to check my personal understanding of the spec >> and the match behavior, but it could also potentially be used to >> generate test cases, or to help users debug their own patterns. For >> example, a pattern that has a bunch of duplicate sequences in its PL >> is probably not very well optimized: >> >> $ ./src/test/modules/rpr/rpr_prefer --max-rows 4 "a+ a+" >> ( ( a a a ) ( a ) ) >> ( ( a a ) ( a a ) ) >> ( ( a a ) ( a ) ) >> ( ( a ) ( a a a ) ) >> ( ( a ) ( a a ) ) >> ( ( a ) ( a ) ) >> >> And patterns with catastrophic backtracking behavior tend to show a >> "sawtooth" pattern in the output, with a huge number of potential >> matches being generated relative to the number of rows in the frame. >> >> My implementation is really messy -- it leaks memory like a sieve, and >> I cannibalized the parser from ECPG, which just ended up as an >> exercise in teaching myself flex/bison. But if there's interest in >> having this kind of tool in the tree, I can work on making it >> reviewable. Either way, I should be able to use it to double-check >> more complicated test cases. I definitely am interested in the tool! >> A while back [2], you were wondering whether our Bison implementation >> would be able to parse the PATTERN grammar directly. I think this tool >> proves that the answer is "yes", but PERMUTE in particular causes a >> shift/reduce conflict. To fix it, I applied the same precedence >> workaround that we use for CUBE and ROLLUP. That's a good news! Best reagards, -- Tatsuo Ishii SRA OSS K.K. English: http://www.sraoss.co.jp/index_en/ Japanese:http://www.sraoss.co.jp
pgsql-hackers by date: