From 597ccd24e405949e5d320eee0cb1f7262ad3068e Mon Sep 17 00:00:00 2001 From: Tatsuo Ishii Date: Sat, 2 May 2026 13:40:29 +0900 Subject: [PATCH v47 6/9] Row pattern recognition patch (docs). --- doc/src/sgml/advanced.sgml | 145 ++++++++++++++++++++++++++++- doc/src/sgml/func/func-window.sgml | 121 ++++++++++++++++++++++++ doc/src/sgml/ref/select.sgml | 91 +++++++++++++++++- 3 files changed, 350 insertions(+), 7 deletions(-) diff --git a/doc/src/sgml/advanced.sgml b/doc/src/sgml/advanced.sgml index 3286c2cf0b2..11c2416df51 100644 --- a/doc/src/sgml/advanced.sgml +++ b/doc/src/sgml/advanced.sgml @@ -552,13 +552,150 @@ WHERE pos < 3; two rows for each department). + + Row Pattern Common Syntax can be used to perform Row Pattern Recognition + in a query. The Row Pattern Common Syntax includes two + subclauses: DEFINE + and PATTERN. DEFINE defines + row pattern variables along with an expression. The expression must be a + logical expression, which means it must + return TRUE, FALSE + or NULL. The expression may comprise column references + and functions. Window functions, aggregate functions and subqueries are + not allowed. An example of DEFINE is as follows. + + +DEFINE + LOWPRICE AS price <= 100, + UP AS price > PREV(price), + DOWN AS price < PREV(price) + + + Note that PREV returns the price + column in the previous row when it is called in the context of row pattern + recognition. Thus in the second line the row pattern variable "UP" + is TRUE when the price column in the current row is + greater than the price column in the previous row. Likewise, "DOWN" + is TRUE when the + price column in the current row is less than + the price column in the previous row. + + + Once DEFINE exists, PATTERN can be + used. PATTERN defines a sequence of rows that satisfies + conditions defined in the DEFINE clause.
For example, + the pattern (LOWPRICE UP+ DOWN+) defines a sequence of rows starting + with a row satisfying "LOWPRICE", then one or more rows satisfying + "UP", and finally one or more rows satisfying "DOWN". Pattern variables can + be followed by quantifiers: "+" means one or more matches, "*" means zero + or more matches, "?" means zero or one match, "{n}" (n > 0) means exactly + n matches, "{n,}" (n >= 0) means at least n matches, "{,m}" (m > 0) means + at most m matches, and "{n,m}" (0 <= n <= m, 0 < m) means between n and m + matches. Patterns can be grouped using parentheses and combined using + alternation (the vertical bar "|" for OR). For example, "(UP DOWN)+" + matches one or more repetitions of UP followed by DOWN. If a sequence of + rows that satisfies the PATTERN is found, the starting row of the match + shows all columns and window function results in the target list. Note + that aggregates consider only the matched rows, rather than the whole + frame. On the second and subsequent rows of a match, window functions + return NULL, except that aggregates return their initial value: for + example, count() returns 0 and sum() + returns NULL. For rows that do not match the PATTERN, window function + results are likewise NULL (or the aggregate's initial value). An example + of a SELECT using the DEFINE + and PATTERN clauses is as follows.
+ + +SELECT company, tdate, price, + first_value(price) OVER w, + max(price) OVER w, + count(price) OVER w +FROM stock + WINDOW w AS ( + PARTITION BY company + ORDER BY tdate + ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING + AFTER MATCH SKIP PAST LAST ROW + INITIAL + PATTERN (LOWPRICE UP+ DOWN+) + DEFINE + LOWPRICE AS price <= 100, + UP AS price > PREV(price), + DOWN AS price < PREV(price) +); + + + company | tdate | price | first_value | max | count +----------+------------+-------+-------------+-----+------- + company1 | 2023-07-01 | 100 | 100 | 200 | 4 + company1 | 2023-07-02 | 200 | | | 0 + company1 | 2023-07-03 | 150 | | | 0 + company1 | 2023-07-04 | 140 | | | 0 + company1 | 2023-07-05 | 150 | | | 0 + company1 | 2023-07-06 | 90 | 90 | 130 | 4 + company1 | 2023-07-07 | 110 | | | 0 + company1 | 2023-07-08 | 130 | | | 0 + company1 | 2023-07-09 | 120 | | | 0 + company1 | 2023-07-10 | 130 | | | 0 +(10 rows) + + + + + Row Pattern Recognition internally uses a nondeterministic finite + automaton (NFA) to match patterns. For patterns with unbounded + quantifiers (e.g., A+ or (A B)+), + the NFA may need to track many active matching contexts simultaneously, + which can lead to O(n^2) + complexity as the number of rows increases. + + + + Before execution, PostgreSQL automatically + optimizes patterns to simplify their structure. This includes flattening + nested sequences and alternations, merging consecutive identical variables + (e.g., A{2,3} A{1,2} becomes A{3,5}), + removing duplicate alternatives + (e.g., (A | B | A) becomes (A | B)), + and simplifying nested quantifiers + (e.g., (A*)* becomes A*). + These optimizations reduce pattern complexity and also decrease + nesting depth, so the 253-level nesting depth limit is rarely reached. + They are applied transparently and can be observed + in EXPLAIN output. + + + + To mitigate the O(n^2) complexity described + above, PostgreSQL also employs + a context absorption optimization.
When a pattern starts with a greedy + unbounded element, newer matching contexts cannot produce longer matches + than older contexts. By detecting and eliminating these redundant + contexts, the matching complexity is reduced from + O(n^2) to O(n) for many common patterns. + + + + When examining query plans for Row Pattern Recognition with + EXPLAIN, the pattern output may include special + markers that indicate optimization opportunities. A double quote + " marks where pattern absorption can occur, + and a single quote ' marks absorbable elements + within a branch. For example, a+" indicates that + repeated matches of a can be absorbed, while + (a' b')+" shows that both a + and b within the group are absorbable. + These markers are primarily useful for understanding internal + optimization behavior. + + When a query involves multiple window functions, it is possible to write out each one with a separate OVER clause, but this is - duplicative and error-prone if the same windowing behavior is wanted - for several functions. Instead, each windowing behavior can be named - in a WINDOW clause and then referenced in OVER. - For example: + duplicative and error-prone if the same windowing behavior is wanted for + several functions. Instead, each windowing behavior can be named in + a WINDOW clause and then referenced + in OVER. For example: SELECT sum(salary) OVER w, avg(salary) OVER w diff --git a/doc/src/sgml/func/func-window.sgml b/doc/src/sgml/func/func-window.sgml index bcf755c9ebc..d1da105ad0f 100644 --- a/doc/src/sgml/func/func-window.sgml +++ b/doc/src/sgml/func/func-window.sgml @@ -278,6 +278,127 @@ nth_value. + + Row Pattern Recognition navigation functions are listed in + . These functions + can be used in the DEFINE clause of Row Pattern Recognition.
+ + + + Row Pattern Navigation Functions + + + + + Function + + + Description + + + + + + + + + prev + + prev ( value anyelement [, offset bigint ] ) + anyelement + + + Returns value evaluated at the row that is + offset rows before the current row within + the partition; + returns NULL if the target row is outside the partition. + offset defaults to 1 if omitted. + offset must be a non-negative integer; + an offset of 0 refers to the current row itself. + offset must not be NULL. + Can only be used in a DEFINE clause. + + + + + + + next + + next ( value anyelement [, offset bigint ] ) + anyelement + + + Returns value evaluated at the row that is + offset rows after the current row within + the partition; + returns NULL if the target row is outside the partition. + offset defaults to 1 if omitted. + offset must be a non-negative integer; + an offset of 0 refers to the current row itself. + offset must not be NULL. + Can only be used in a DEFINE clause. + + + + + + + first + + first ( value anyelement [, offset bigint ] ) + anyelement + + + Returns value evaluated at the row that is + offset rows after the match start row; + returns NULL if the target row is beyond the current row. + offset defaults to 0 if omitted, referring to the + match start row itself. + offset must be a non-negative integer. + offset must not be NULL. + Can only be used in a DEFINE clause. + + + + + + + last + + last ( value anyelement [, offset bigint ] ) + anyelement + + + Returns value evaluated at the row that is + offset rows before the current row within + the match; + returns NULL if the target row is before the match start row. + offset defaults to 0 if omitted, referring to the + current row itself. + offset must be a non-negative integer. + offset must not be NULL. + Can only be used in a DEFINE clause. + + + + + +
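To make the offset rules in the table concrete, here is a minimal Python model of the four navigation functions over a single partition. It is a sketch under stated assumptions, not the executor's implementation: the names nav_prev, nav_next, nav_first and nav_last are hypothetical, a partition is modeled as a Python list, rows are 0-based indexes, and None stands in for SQL NULL. Compound nesting such as PREV(FIRST(...)) is deliberately not modeled.

```python
from typing import Optional, Sequence, TypeVar

T = TypeVar("T")

# Hypothetical model: one partition is a Python list, rows are 0-based
# indexes, and None stands in for SQL NULL.


def _check_offset(offset: Optional[int]) -> int:
    # offset must not be NULL and must be a non-negative integer
    if offset is None or offset < 0:
        raise ValueError("offset must be a non-negative, non-NULL integer")
    return offset


def nav_prev(col: Sequence[T], cur: int, offset: Optional[int] = 1) -> Optional[T]:
    # PREV: `offset` rows before the current row; NULL outside the partition
    pos = cur - _check_offset(offset)
    return col[pos] if pos >= 0 else None


def nav_next(col: Sequence[T], cur: int, offset: Optional[int] = 1) -> Optional[T]:
    # NEXT: `offset` rows after the current row; NULL outside the partition
    pos = cur + _check_offset(offset)
    return col[pos] if pos < len(col) else None


def nav_first(col: Sequence[T], match_start: int, cur: int,
              offset: Optional[int] = 0) -> Optional[T]:
    # FIRST: `offset` rows after the match start row; NULL if beyond the current row
    pos = match_start + _check_offset(offset)
    return col[pos] if pos <= cur else None


def nav_last(col: Sequence[T], match_start: int, cur: int,
             offset: Optional[int] = 0) -> Optional[T]:
    # LAST: `offset` rows before the current row; NULL if before the match start row
    pos = cur - _check_offset(offset)
    return col[pos] if pos >= match_start else None


# Prices from the stock example (one partition, ordered by tdate).
prices = [100, 200, 150, 140, 150, 90, 110, 130, 120, 130]

# DEFINE UP AS price > PREV(price), evaluated at row 1 (200 > 100).
# Python raises TypeError when comparing with None, whereas SQL would
# yield NULL, so this only evaluates a row whose PREV row exists.
up_at_1 = prices[1] > nav_prev(prices, 1)
```

With these definitions, nav_first(prices, 5, 7, 3) returns None because the target row (index 8) lies beyond the current row (index 7), which mirrors the FIRST entry in the table.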
+ + PREV and NEXT may wrap + FIRST or LAST for compound + navigation. For example, + PREV(FIRST(val, 2), 3) fetches the value from the row + 3 rows before the row that is 2 rows after the match start. + The reverse nesting (FIRST/LAST + wrapping PREV/NEXT) is not + permitted. Same-category nesting (e.g., + PREV inside PREV) is also + prohibited. + + The SQL standard defines a FROM FIRST or FROM LAST diff --git a/doc/src/sgml/ref/select.sgml b/doc/src/sgml/ref/select.sgml index 09b6ce809bb..5272d6c0bfa 100644 --- a/doc/src/sgml/ref/select.sgml +++ b/doc/src/sgml/ref/select.sgml @@ -1022,8 +1022,8 @@ WINDOW window_name AS ( frame_clause can be one of -{ RANGE | ROWS | GROUPS } frame_start [ frame_exclusion ] -{ RANGE | ROWS | GROUPS } BETWEEN frame_start AND frame_end [ frame_exclusion ] +{ RANGE | ROWS | GROUPS } frame_start [ frame_exclusion ] [ row_pattern_common_syntax ] +{ RANGE | ROWS | GROUPS } BETWEEN frame_start AND frame_end [ frame_exclusion ] [ row_pattern_common_syntax ] where frame_start @@ -1130,9 +1130,94 @@ EXCLUDE NO OTHERS a given peer group will be in the frame or excluded from it. + + The + optional row_pattern_common_syntax + defines the Row Pattern Recognition condition for + this + window. row_pattern_common_syntax + includes the following subclauses. + + +[ { AFTER MATCH SKIP PAST LAST ROW | AFTER MATCH SKIP TO NEXT ROW } ] +[ INITIAL | SEEK ] +PATTERN ( pattern_variable_name [ quantifier ] [ ... ] ) +DEFINE definition_variable_name AS expression [, ...] + + AFTER MATCH SKIP PAST LAST ROW or AFTER MATCH + SKIP TO NEXT ROW controls where matching resumes + after a match is found. With AFTER MATCH SKIP PAST LAST + ROW (the default), the next match attempt starts at the row following + the last row of the previous match. With AFTER MATCH SKIP TO NEXT + ROW, it starts at the row following the first row of the previous + match. INITIAL or SEEK specifies the + row in the frame at which pattern matching begins.
+ If INITIAL is specified, the match must start + from the first row in the frame. If SEEK is specified, + the match need not start at the first row in the frame. The + default is INITIAL. Currently + only INITIAL is supported. DEFINE + defines definition variables, each with a boolean + expression. PATTERN defines a sequence of rows that + satisfies conditions expressed with the variables defined + in the DEFINE clause (an empty PATTERN() + is not supported). Each pattern variable can be followed by a quantifier + to specify how many times it should match: + * (zero or more), + + (one or more), + ? (zero or one), + {n} (exactly n times, n > 0), + {n,} (at least n times, n >= 0), + {,m} (at most m times, m > 0), or + {n,m} + (between n and m times, 0 <= n <= m, 0 < m). + Reluctant quantifiers (e.g., *?, +?, + ??, {n,m}?) + are supported. + The exclusion syntax ({- and -}) + is not supported. + Patterns can be grouped using parentheses, and alternation (OR) can be + expressed using the vertical bar |. + For example, (A B)+ matches one or more repetitions + of the sequence A followed by B, and A | B matches + either A or B. + If a pattern variable used in PATTERN is not defined in + the DEFINE clause, it is not added to the DEFINE + clause automatically. Instead, the executor evaluates + the variable as TRUE for every row, as if + the following definition existed: + + +variable_name AS TRUE + + + Conversely, variables defined in the DEFINE clause + but not used in the PATTERN clause are filtered out + during query planning. + + + + Note that the maximum number of distinct pattern variables + used in the PATTERN clause is 251; + exceeding this limit raises an error. + Additionally, the maximum nesting depth of pattern groups + (parentheses) is 253 levels. + However, pattern optimizations such as flattening nested sequences + and simplifying nested quantifiers may reduce the effective depth, + so this limit is rarely reached in practice.
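The quantifier forms and bounds listed above can be summarized as (min, max) repetition counts. The following Python sketch is a hypothetical helper, not code from the patch: parse_quantifier maps a quantifier string to (min, max, reluctant), uses None for an unbounded maximum, treats a missing quantifier as exactly one match, and rejects every form outside the documented rules, including the unsupported exclusion syntax.

```python
import re
from typing import Optional, Tuple

# {n}, {n,}, {,m}, {n,m}, each optionally followed by '?' (reluctant)
_BRACES = re.compile(r"\{(\d*)(,?)(\d*)\}(\??)")

Bounds = Tuple[int, Optional[int], bool]  # (min, max or None if unbounded, reluctant)


def parse_quantifier(q: str) -> Bounds:
    """Map a PATTERN quantifier to (min, max, reluctant) per the rules in the text."""
    if q == "":
        return (1, 1, False)          # no quantifier: exactly one match
    simple = {"*": (0, None), "+": (1, None), "?": (0, 1)}
    base, reluctant = q, False
    if len(q) == 2 and q[0] in simple and q[1] == "?":
        base, reluctant = q[0], True  # *?, +?, ??
    if base in simple:
        lo, hi = simple[base]
        return (lo, hi, reluctant)

    m = _BRACES.fullmatch(q)
    if m is None:                     # includes the unsupported {- -} exclusion
        raise ValueError(f"unsupported quantifier: {q!r}")
    n_txt, comma, m_txt, rel = m.groups()
    reluctant = rel == "?"
    if not comma:                     # {n}: exactly n, n > 0
        if not n_txt or int(n_txt) == 0:
            raise ValueError("{n} requires n > 0")
        n = int(n_txt)
        return (n, n, reluctant)
    if not m_txt:                     # {n,}: at least n, n >= 0
        if not n_txt:
            raise ValueError("{,} is not a valid quantifier")
        return (int(n_txt), None, reluctant)
    hi = int(m_txt)
    lo = int(n_txt) if n_txt else 0   # {,m} means {0,m}
    if hi == 0 or lo > hi:            # need 0 <= n <= m and 0 < m
        raise ValueError("{n,m} requires 0 <= n <= m and m > 0")
    return (lo, hi, reluctant)
```

For example, parse_quantifier("+") gives (1, None, False), the bounds of UP+ in the earlier stock example, and parse_quantifier("{,4}") gives (0, 4, False). Returning plain bounds keeps the validity rules separate from any matching strategy.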
+ + + + The SQL standard defines additional subclauses: MEASURES + and SUBSET. They are not currently supported + in PostgreSQL. The standard also defines + more variants of the AFTER MATCH clause. + + The purpose of a WINDOW clause is to specify the - behavior of window functions appearing in the query's + behavior of window functions appearing in the + query's SELECT list or ORDER BY clause. These functions -- 2.43.0