Re: Row pattern recognition - Mailing list pgsql-hackers
From | Tatsuo Ishii |
---|---|
Subject | Re: Row pattern recognition |
Date | |
Msg-id | 20240524.113919.173511174304034017.t-ishii@sranhm.sra.co.jp Whole thread Raw |
In response to | Re: Row pattern recognition (Tatsuo Ishii <ishii@sraoss.co.jp>) |
Responses |
Re: Row pattern recognition
|
List | pgsql-hackers |
Attached are the v20 patches. Just rebased. (The conflict was in 0001 patch.) Best reagards, -- Tatsuo Ishii SRA OSS LLC English: http://www.sraoss.co.jp/index_en/ Japanese:http://www.sraoss.co.jp From 9caddbaee100149874c9818a326e704fa7586557 Mon Sep 17 00:00:00 2001 From: Tatsuo Ishii <ishii@postgresql.org> Date: Fri, 24 May 2024 11:26:02 +0900 Subject: [PATCH v20 1/8] Row pattern recognition patch for raw parser. --- src/backend/parser/gram.y | 222 ++++++++++++++++++++++++++++++-- src/include/nodes/parsenodes.h | 67 ++++++++++ src/include/parser/kwlist.h | 9 ++ src/include/parser/parse_node.h | 1 + 4 files changed, 288 insertions(+), 11 deletions(-) diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y index 4d582950b7..ef6f25b2f8 100644 --- a/src/backend/parser/gram.y +++ b/src/backend/parser/gram.y @@ -679,6 +679,21 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query); json_object_constructor_null_clause_opt json_array_constructor_null_clause_opt +%type <target> row_pattern_measure_item + row_pattern_definition +%type <node> opt_row_pattern_common_syntax + opt_row_pattern_skip_to + row_pattern_subset_item + row_pattern_term +%type <list> opt_row_pattern_measures + row_pattern_measure_list + row_pattern_definition_list + opt_row_pattern_subset_clause + row_pattern_subset_list + row_pattern_subset_rhs + row_pattern +%type <boolean> opt_row_pattern_initial_or_seek + first_or_last /* * Non-keyword token types. These are hard-wired into the "flex" lexer. @@ -722,7 +737,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query); CURRENT_TIME CURRENT_TIMESTAMP CURRENT_USER CURSOR CYCLE DATA_P DATABASE DAY_P DEALLOCATE DEC DECIMAL_P DECLARE DEFAULT DEFAULTS - DEFERRABLE DEFERRED DEFINER DELETE_P DELIMITER DELIMITERS DEPENDS DEPTH DESC + DEFERRABLE DEFERRED DEFINE DEFINER DELETE_P DELIMITER DELIMITERS DEPENDS DEPTH DESC DETACH DICTIONARY DISABLE_P DISCARD DISTINCT DO DOCUMENT_P DOMAIN_P DOUBLE_P DROP @@ -738,7 +753,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query); HANDLER HAVING HEADER_P HOLD HOUR_P IDENTITY_P IF_P ILIKE IMMEDIATE IMMUTABLE IMPLICIT_P IMPORT_P IN_P INCLUDE - INCLUDING INCREMENT INDENT INDEX INDEXES INHERIT INHERITS INITIALLY INLINE_P + INCLUDING INCREMENT INDENT INDEX INDEXES INHERIT INHERITS INITIAL INITIALLY INLINE_P INNER_P INOUT INPUT_P INSENSITIVE INSERT INSTEAD INT_P INTEGER INTERSECT INTERVAL INTO INVOKER IS ISNULL ISOLATION @@ -751,7 +766,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query); LEADING LEAKPROOF LEAST LEFT LEVEL LIKE LIMIT LISTEN LOAD LOCAL LOCALTIME LOCALTIMESTAMP LOCATION LOCK_P LOCKED LOGGED - MAPPING MATCH MATCHED MATERIALIZED MAXVALUE MERGE MERGE_ACTION METHOD + MAPPING MATCH MATCHED MATERIALIZED MAXVALUE MEASURES MERGE MERGE_ACTION METHOD MINUTE_P MINVALUE MODE MONTH_P MOVE NAME_P NAMES NATIONAL NATURAL NCHAR NESTED NEW NEXT NFC NFD NFKC NFKD NO @@ -763,8 +778,8 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query); ORDER ORDINALITY OTHERS OUT_P OUTER_P OVER OVERLAPS OVERLAY OVERRIDING OWNED OWNER - PARALLEL PARAMETER PARSER PARTIAL PARTITION PARTITIONS PASSING PASSWORD PATH - PLACING PLAN PLANS POLICY + PARALLEL PARAMETER PARSER PARTIAL PARTITION PARTITIONS PASSING PASSWORD PAST + PATH PATTERN_P PERIOD PERMUTE PLACING PLAN PLANS POLICY POSITION PRECEDING PRECISION PRESERVE PREPARE PREPARED PRIMARY PRIOR PRIVILEGES PROCEDURAL PROCEDURE PROCEDURES PROGRAM PUBLICATION @@ -775,12 +790,13 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query); RESET RESTART RESTRICT RETURN RETURNING RETURNS REVOKE RIGHT ROLE ROLLBACK ROLLUP ROUTINE ROUTINES ROW ROWS RULE - SAVEPOINT SCALAR SCHEMA SCHEMAS SCROLL SEARCH SECOND_P SECURITY SELECT + SAVEPOINT SCALAR SCHEMA SCHEMAS SCROLL SEARCH SECOND_P SECURITY SEEK SELECT SEQUENCE SEQUENCES + SERIALIZABLE SERVER SESSION SESSION_USER SET SETS SETOF SHARE SHOW SIMILAR SIMPLE SKIP SMALLINT SNAPSHOT SOME SPLIT SOURCE SQL_P STABLE STANDALONE_P START STATEMENT STATISTICS STDIN STDOUT STORAGE STORED STRICT_P STRING_P STRIP_P - SUBSCRIPTION SUBSTRING SUPPORT SYMMETRIC SYSID SYSTEM_P SYSTEM_USER + SUBSCRIPTION SUBSET SUBSTRING SUPPORT SYMMETRIC SYSID SYSTEM_P SYSTEM_USER TABLE TABLES TABLESAMPLE TABLESPACE TARGET TEMP TEMPLATE TEMPORARY TEXT_P THEN TIES TIME TIMESTAMP TO TRAILING TRANSACTION TRANSFORM @@ -889,6 +905,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query); %nonassoc UNBOUNDED NESTED /* ideally would have same precedence as IDENT */ %nonassoc IDENT PARTITION RANGE ROWS GROUPS PRECEDING FOLLOWING CUBE ROLLUP SET KEYS OBJECT_P SCALAR VALUE_P WITH WITHOUT PATH + MEASURES AFTER INITIAL SEEK PATTERN_P %left Op OPERATOR /* multi-character ops and user-defined operators */ %left '+' '-' %left '*' '/' '%' @@ -16287,7 +16304,8 @@ over_clause: OVER window_specification ; window_specification: '(' opt_existing_window_name opt_partition_clause - opt_sort_clause opt_frame_clause ')' + opt_sort_clause opt_row_pattern_measures opt_frame_clause + opt_row_pattern_common_syntax ')' { WindowDef *n = makeNode(WindowDef); @@ -16295,10 +16313,12 @@ window_specification: '(' opt_existing_window_name opt_partition_clause n->refname = $2; n->partitionClause = $3; n->orderClause = $4; + n->rowPatternMeasures = $5; /* copy relevant fields of opt_frame_clause */ - n->frameOptions = $5->frameOptions; - n->startOffset = $5->startOffset; - n->endOffset = $5->endOffset; + n->frameOptions = $6->frameOptions; + n->startOffset = $6->startOffset; + n->endOffset = $6->endOffset; + n->rpCommonSyntax = (RPCommonSyntax *)$7; n->location = @1; $$ = n; } @@ -16322,6 +16342,31 @@ opt_partition_clause: PARTITION BY expr_list { $$ = $3; } | /*EMPTY*/ { $$ = NIL; } ; +/* + * ROW PATTERN_P MEASURES + */ +opt_row_pattern_measures: MEASURES row_pattern_measure_list { $$ = $2; } + | /*EMPTY*/ { $$ = NIL; } + ; + +row_pattern_measure_list: + row_pattern_measure_item + { $$ = list_make1($1); } + | row_pattern_measure_list ',' row_pattern_measure_item + { $$ = lappend($1, $3); } + ; + +row_pattern_measure_item: + a_expr AS ColLabel + { + $$ = makeNode(ResTarget); + $$->name = $3; + $$->indirection = NIL; + $$->val = (Node *) $1; + $$->location = @1; + } + ; + /* * For frame clauses, we return a WindowDef, but only some fields are used: * frameOptions, startOffset, and endOffset. @@ -16481,6 +16526,143 @@ opt_window_exclusion_clause: | /*EMPTY*/ { $$ = 0; } ; +opt_row_pattern_common_syntax: +opt_row_pattern_skip_to opt_row_pattern_initial_or_seek + PATTERN_P '(' row_pattern ')' + opt_row_pattern_subset_clause + DEFINE row_pattern_definition_list + { + RPCommonSyntax *n = makeNode(RPCommonSyntax); + n->rpSkipTo = ((RPCommonSyntax *)$1)->rpSkipTo; + n->rpSkipVariable = ((RPCommonSyntax *)$1)->rpSkipVariable; + n->initial = $2; + n->rpPatterns = $5; + n->rpSubsetClause = $7; + n->rpDefs = $9; + $$ = (Node *) n; + } + | /*EMPTY*/ { $$ = NULL; } + ; + +opt_row_pattern_skip_to: + AFTER MATCH SKIP TO NEXT ROW + { + RPCommonSyntax *n = makeNode(RPCommonSyntax); + n->rpSkipTo = ST_NEXT_ROW; + n->rpSkipVariable = NULL; + $$ = (Node *) n; + } + | AFTER MATCH SKIP PAST LAST_P ROW + { + RPCommonSyntax *n = makeNode(RPCommonSyntax); + n->rpSkipTo = ST_PAST_LAST_ROW; + n->rpSkipVariable = NULL; + $$ = (Node *) n; + } + | AFTER MATCH SKIP TO first_or_last ColId + { + RPCommonSyntax *n = makeNode(RPCommonSyntax); + n->rpSkipTo = $5? ST_FIRST_VARIABLE : ST_LAST_VARIABLE; + n->rpSkipVariable = $6; + $$ = (Node *) n; + } +/* + | AFTER MATCH SKIP TO LAST_P ColId %prec LAST_P + { + RPCommonSyntax *n = makeNode(RPCommonSyntax); + n->rpSkipTo = ST_LAST_VARIABLE; + n->rpSkipVariable = $6; + $$ = n; + } + | AFTER MATCH SKIP TO ColId + { + RPCommonSyntax *n = makeNode(RPCommonSyntax); + n->rpSkipTo = ST_VARIABLE; + n->rpSkipVariable = $5; + $$ = n; + } +*/ + | /*EMPTY*/ + { + RPCommonSyntax *n = makeNode(RPCommonSyntax); + /* temporary set default to ST_NEXT_ROW */ + n->rpSkipTo = ST_PAST_LAST_ROW; + n->rpSkipVariable = NULL; + $$ = (Node *) n; + } + ; + +first_or_last: + FIRST_P { $$ = true; } + | LAST_P { $$ = false; } + ; + +opt_row_pattern_initial_or_seek: + INITIAL { $$ = true; } + | SEEK + { + ereport(ERROR, + (errcode(ERRCODE_SYNTAX_ERROR), + errmsg("SEEK is not supported"), + errhint("Use INITIAL."), + parser_errposition(@1))); + } + | /*EMPTY*/ { $$ = true; } + ; + +row_pattern: + row_pattern_term { $$ = list_make1($1); } + | row_pattern row_pattern_term { $$ = lappend($1, $2); } + ; + +row_pattern_term: + ColId { $$ = (Node *) makeSimpleA_Expr(AEXPR_OP, "", (Node *)makeString($1), NULL, @1); } + | ColId '*' { $$ = (Node *) makeSimpleA_Expr(AEXPR_OP, "*", (Node *)makeString($1), NULL, @1); } + | ColId '+' { $$ = (Node *) makeSimpleA_Expr(AEXPR_OP, "+", (Node *)makeString($1), NULL, @1); } + | ColId '?' { $$ = (Node *) makeSimpleA_Expr(AEXPR_OP, "?", (Node *)makeString($1), NULL, @1); } + ; + +opt_row_pattern_subset_clause: + SUBSET row_pattern_subset_list { $$ = $2; } + | /*EMPTY*/ { $$ = NIL; } + ; + +row_pattern_subset_list: + row_pattern_subset_item { $$ = list_make1($1); } + | row_pattern_subset_list ',' row_pattern_subset_item { $$ = lappend($1, $3); } + | /*EMPTY*/ { $$ = NIL; } + ; + +row_pattern_subset_item: ColId '=' '(' row_pattern_subset_rhs ')' + { + RPSubsetItem *n = makeNode(RPSubsetItem); + n->name = $1; + n->rhsVariable = $4; + $$ = (Node *) n; + } + ; + +row_pattern_subset_rhs: + ColId { $$ = list_make1(makeStringConst($1, @1)); } + | row_pattern_subset_rhs ',' ColId { $$ = lappend($1, makeStringConst($3, @1)); } + | /*EMPTY*/ { $$ = NIL; } + ; + +row_pattern_definition_list: + row_pattern_definition { $$ = list_make1($1); } + | row_pattern_definition_list ',' row_pattern_definition { $$ = lappend($1, $3); } + ; + +row_pattern_definition: + ColId AS a_expr + { + $$ = makeNode(ResTarget); + $$->name = $1; + $$->indirection = NIL; + $$->val = (Node *) $3; + $$->location = @1; + } + ; /* * Supporting nonterminals for expressions. @@ -17691,6 +17873,7 @@ unreserved_keyword: | INDEXES | INHERIT | INHERITS + | INITIAL | INLINE_P | INPUT_P | INSENSITIVE @@ -17719,6 +17902,7 @@ unreserved_keyword: | MATCHED | MATERIALIZED | MAXVALUE + | MEASURES | MERGE | METHOD | MINUTE_P @@ -17764,7 +17948,11 @@ unreserved_keyword: | PARTITIONS | PASSING | PASSWORD + | PAST | PATH + | PATTERN_P + | PERIOD + | PERMUTE | PLAN | PLANS | POLICY @@ -17817,6 +18005,7 @@ unreserved_keyword: | SEARCH | SECOND_P | SECURITY + | SEEK | SEQUENCE | SEQUENCES | SERIALIZABLE @@ -17845,6 +18034,7 @@ unreserved_keyword: | STRING_P | STRIP_P | SUBSCRIPTION + | SUBSET | SUPPORT | SYSID | SYSTEM_P @@ -18039,6 +18229,7 @@ reserved_keyword: | CURRENT_USER | DEFAULT | DEFERRABLE + | DEFINE | DESC | DISTINCT | DO @@ -18202,6 +18393,7 @@ bare_label_keyword: | DEFAULTS | DEFERRABLE | DEFERRED + | DEFINE | DEFINER | DELETE_P | DELIMITER @@ -18279,6 +18471,7 @@ bare_label_keyword: | INDEXES | INHERIT | INHERITS + | INITIAL | INITIALLY | INLINE_P | INNER_P @@ -18333,6 +18526,7 @@ bare_label_keyword: | MATCHED | MATERIALIZED | MAXVALUE + | MEASURES | MERGE | MERGE_ACTION | METHOD @@ -18390,7 +18584,11 @@ bare_label_keyword: | PARTITIONS | PASSING | PASSWORD + | PAST | PATH + | PATTERN_P + | PERIOD + | PERMUTE | PLACING | PLAN | PLANS @@ -18449,6 +18647,7 @@ bare_label_keyword: | SCROLL | SEARCH | SECURITY + | SEEK | SELECT | SEQUENCE | SEQUENCES @@ -18483,6 +18682,7 @@ bare_label_keyword: | STRING_P | STRIP_P | SUBSCRIPTION + | SUBSET | SUBSTRING | SUPPORT | SYMMETRIC diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h index ddfed02db2..c8a2eed08f 100644 --- a/src/include/nodes/parsenodes.h +++ b/src/include/nodes/parsenodes.h @@ -550,6 +550,48 @@ typedef struct SortBy ParseLoc location; /* operator location, or -1 if none/unknown */ } SortBy; +/* + * AFTER MATCH row pattern skip to types in row pattern common syntax + */ +typedef enum RPSkipTo +{ + ST_NONE, /* AFTER MATCH omitted */ + ST_NEXT_ROW, /* SKIP TO NEXT ROW */ + ST_PAST_LAST_ROW, /* SKIP TO PAST LAST ROW */ + ST_FIRST_VARIABLE, /* SKIP TO FIRST variable name */ + ST_LAST_VARIABLE, /* SKIP TO LAST variable name */ + ST_VARIABLE /* SKIP TO variable name */ +} RPSkipTo; + +/* + * Row Pattern SUBSET clause item + */ +typedef struct RPSubsetItem +{ + NodeTag type; + char *name; /* Row Pattern SUBSET clause variable name */ + List *rhsVariable; /* Row Pattern SUBSET rhs variables (list of + * char *string) */ +} RPSubsetItem; + +/* + * RowPatternCommonSyntax - raw representation of row pattern common syntax + * + */ +typedef struct RPCommonSyntax +{ + NodeTag type; + RPSkipTo rpSkipTo; /* Row Pattern AFTER MATCH SKIP type */ + char *rpSkipVariable; /* Row Pattern Skip To variable name, if any */ + bool initial; /* true if <row pattern initial or seek> is + * initial */ + List *rpPatterns; /* PATTERN variables (list of A_Expr) */ + List *rpSubsetClause; /* row pattern subset clause (list of + * RPSubsetItem), if any */ + List *rpDefs; /* row pattern definitions clause (list of + * ResTarget) */ +} RPCommonSyntax; + /* * WindowDef - raw representation of WINDOW and OVER clauses * @@ -565,6 +607,9 @@ typedef struct WindowDef char *refname; /* referenced window name, if any */ List *partitionClause; /* PARTITION BY expression list */ List *orderClause; /* ORDER BY (list of SortBy) */ + List *rowPatternMeasures; /* row pattern measures (list of + * ResTarget) */ + RPCommonSyntax *rpCommonSyntax; /* row pattern common syntax */ int frameOptions; /* frame_clause options, see below */ Node *startOffset; /* expression for starting bound, if any */ Node *endOffset; /* expression for ending bound, if any */ @@ -1533,6 +1578,11 @@ typedef struct GroupingSet * the orderClause might or might not be copied (see copiedOrder); the framing * options are never copied, per spec. * + * "defineClause" is Row Pattern Recognition DEFINE clause (list of + * TargetEntry). TargetEntry.resname represents row pattern definition + * variable name. "patternVariable" and "patternRegexp" represents PATTERN + * clause. + * * The information relevant for the query jumbling is the partition clause * type and its bounds. */ @@ -1562,6 +1612,23 @@ typedef struct WindowClause Index winref; /* ID referenced by window functions */ /* did we copy orderClause from refname? */ bool copiedOrder pg_node_attr(query_jumble_ignore); + /* Row Pattern AFTER MACH SKIP clause */ + RPSkipTo rpSkipTo; /* Row Pattern Skip To type */ + char *rpSkipVariable; /* Row Pattern Skip To variable */ + bool initial; /* true if <row pattern initial or seek> is + * initial */ + /* Row Pattern DEFINE clause (list of TargetEntry) */ + List *defineClause; + /* Row Pattern DEFINE variable initial names (list of String) */ + List *defineInitial; + /* Row Pattern PATTERN variable name (list of String) */ + List *patternVariable; + + /* + * Row Pattern PATTERN regular expression quantifier ('+' or ''. list of + * String) + */ + List *patternRegexp; } WindowClause; /* diff --git a/src/include/parser/kwlist.h b/src/include/parser/kwlist.h index f7fe834cf4..df49492629 100644 --- a/src/include/parser/kwlist.h +++ b/src/include/parser/kwlist.h @@ -129,6 +129,7 @@ PG_KEYWORD("default", DEFAULT, RESERVED_KEYWORD, BARE_LABEL) PG_KEYWORD("defaults", DEFAULTS, UNRESERVED_KEYWORD, BARE_LABEL) PG_KEYWORD("deferrable", DEFERRABLE, RESERVED_KEYWORD, BARE_LABEL) PG_KEYWORD("deferred", DEFERRED, UNRESERVED_KEYWORD, BARE_LABEL) +PG_KEYWORD("define", DEFINE, RESERVED_KEYWORD, BARE_LABEL) PG_KEYWORD("definer", DEFINER, UNRESERVED_KEYWORD, BARE_LABEL) PG_KEYWORD("delete", DELETE_P, UNRESERVED_KEYWORD, BARE_LABEL) PG_KEYWORD("delimiter", DELIMITER, UNRESERVED_KEYWORD, BARE_LABEL) @@ -215,6 +216,7 @@ PG_KEYWORD("index", INDEX, UNRESERVED_KEYWORD, BARE_LABEL) PG_KEYWORD("indexes", INDEXES, UNRESERVED_KEYWORD, BARE_LABEL) PG_KEYWORD("inherit", INHERIT, UNRESERVED_KEYWORD, BARE_LABEL) PG_KEYWORD("inherits", INHERITS, UNRESERVED_KEYWORD, BARE_LABEL) +PG_KEYWORD("initial", INITIAL, UNRESERVED_KEYWORD, BARE_LABEL) PG_KEYWORD("initially", INITIALLY, RESERVED_KEYWORD, BARE_LABEL) PG_KEYWORD("inline", INLINE_P, UNRESERVED_KEYWORD, BARE_LABEL) PG_KEYWORD("inner", INNER_P, TYPE_FUNC_NAME_KEYWORD, BARE_LABEL) @@ -273,6 +275,7 @@ PG_KEYWORD("match", MATCH, UNRESERVED_KEYWORD, BARE_LABEL) PG_KEYWORD("matched", MATCHED, UNRESERVED_KEYWORD, BARE_LABEL) PG_KEYWORD("materialized", MATERIALIZED, UNRESERVED_KEYWORD, BARE_LABEL) PG_KEYWORD("maxvalue", MAXVALUE, UNRESERVED_KEYWORD, BARE_LABEL) +PG_KEYWORD("measures", MEASURES, UNRESERVED_KEYWORD, BARE_LABEL) PG_KEYWORD("merge", MERGE, UNRESERVED_KEYWORD, BARE_LABEL) PG_KEYWORD("merge_action", MERGE_ACTION, COL_NAME_KEYWORD, BARE_LABEL) PG_KEYWORD("method", METHOD, UNRESERVED_KEYWORD, BARE_LABEL) @@ -338,7 +341,11 @@ PG_KEYWORD("partition", PARTITION, UNRESERVED_KEYWORD, BARE_LABEL) PG_KEYWORD("partitions", PARTITIONS, UNRESERVED_KEYWORD, BARE_LABEL) PG_KEYWORD("passing", PASSING, UNRESERVED_KEYWORD, BARE_LABEL) PG_KEYWORD("password", PASSWORD, UNRESERVED_KEYWORD, BARE_LABEL) +PG_KEYWORD("past", PAST, UNRESERVED_KEYWORD, BARE_LABEL) PG_KEYWORD("path", PATH, UNRESERVED_KEYWORD, BARE_LABEL) +PG_KEYWORD("pattern", PATTERN_P, UNRESERVED_KEYWORD, BARE_LABEL) +PG_KEYWORD("period", PERIOD, UNRESERVED_KEYWORD, BARE_LABEL) +PG_KEYWORD("permute", PERMUTE, UNRESERVED_KEYWORD, BARE_LABEL) PG_KEYWORD("placing", PLACING, RESERVED_KEYWORD, BARE_LABEL) PG_KEYWORD("plan", PLAN, UNRESERVED_KEYWORD, BARE_LABEL) PG_KEYWORD("plans", PLANS, UNRESERVED_KEYWORD, BARE_LABEL) @@ -400,6 +407,7 @@ PG_KEYWORD("scroll", SCROLL, UNRESERVED_KEYWORD, BARE_LABEL) PG_KEYWORD("search", SEARCH, UNRESERVED_KEYWORD, BARE_LABEL) PG_KEYWORD("second", SECOND_P, UNRESERVED_KEYWORD, AS_LABEL) PG_KEYWORD("security", SECURITY, UNRESERVED_KEYWORD, BARE_LABEL) +PG_KEYWORD("seek", SEEK, UNRESERVED_KEYWORD, BARE_LABEL) PG_KEYWORD("select", SELECT, RESERVED_KEYWORD, BARE_LABEL) PG_KEYWORD("sequence", SEQUENCE, UNRESERVED_KEYWORD, BARE_LABEL) PG_KEYWORD("sequences", SEQUENCES, UNRESERVED_KEYWORD, BARE_LABEL) @@ -434,6 +442,7 @@ PG_KEYWORD("strict", STRICT_P, UNRESERVED_KEYWORD, BARE_LABEL) PG_KEYWORD("string", STRING_P, UNRESERVED_KEYWORD, BARE_LABEL) PG_KEYWORD("strip", STRIP_P, UNRESERVED_KEYWORD, BARE_LABEL) PG_KEYWORD("subscription", SUBSCRIPTION, UNRESERVED_KEYWORD, BARE_LABEL) +PG_KEYWORD("subset", SUBSET, UNRESERVED_KEYWORD, BARE_LABEL) PG_KEYWORD("substring", SUBSTRING, COL_NAME_KEYWORD, BARE_LABEL) PG_KEYWORD("support", SUPPORT, UNRESERVED_KEYWORD, BARE_LABEL) PG_KEYWORD("symmetric", SYMMETRIC, RESERVED_KEYWORD, BARE_LABEL) diff --git a/src/include/parser/parse_node.h b/src/include/parser/parse_node.h index 5b781d87a9..66935afff8 100644 --- a/src/include/parser/parse_node.h +++ b/src/include/parser/parse_node.h @@ -51,6 +51,7 @@ typedef enum ParseExprKind EXPR_KIND_WINDOW_FRAME_RANGE, /* window frame clause with RANGE */ EXPR_KIND_WINDOW_FRAME_ROWS, /* window frame clause with ROWS */ EXPR_KIND_WINDOW_FRAME_GROUPS, /* window frame clause with GROUPS */ + EXPR_KIND_RPR_DEFINE, /* DEFINE */ EXPR_KIND_SELECT_TARGET, /* SELECT target list item */ EXPR_KIND_INSERT_TARGET, /* INSERT target list item */ EXPR_KIND_UPDATE_SOURCE, /* UPDATE assignment source item */ -- 2.25.1 From 3a2ac3cd98e8e8a47e4214e5caef82829b8b0df3 Mon Sep 17 00:00:00 2001 From: Tatsuo Ishii <ishii@postgresql.org> Date: Fri, 24 May 2024 11:26:02 +0900 Subject: [PATCH v20 2/8] Row pattern recognition patch (parse/analysis). --- src/backend/parser/parse_agg.c | 7 + src/backend/parser/parse_clause.c | 296 +++++++++++++++++++++++++++++- src/backend/parser/parse_expr.c | 6 + src/backend/parser/parse_func.c | 3 + 4 files changed, 311 insertions(+), 1 deletion(-) diff --git a/src/backend/parser/parse_agg.c b/src/backend/parser/parse_agg.c index bee7d8346a..9bc22a836a 100644 --- a/src/backend/parser/parse_agg.c +++ b/src/backend/parser/parse_agg.c @@ -577,6 +577,10 @@ check_agglevels_and_constraints(ParseState *pstate, Node *expr) errkind = true; break; + case EXPR_KIND_RPR_DEFINE: + errkind = true; + break; + /* * There is intentionally no default: case here, so that the * compiler will warn if we add a new ParseExprKind without @@ -967,6 +971,9 @@ transformWindowFuncCall(ParseState *pstate, WindowFunc *wfunc, case EXPR_KIND_CYCLE_MARK: errkind = true; break; + case EXPR_KIND_RPR_DEFINE: + errkind = true; + break; /* * There is intentionally no default: case here, so that the diff --git a/src/backend/parser/parse_clause.c b/src/backend/parser/parse_clause.c index 8118036495..9762dce81f 100644 --- a/src/backend/parser/parse_clause.c +++ b/src/backend/parser/parse_clause.c @@ -98,7 +98,14 @@ static WindowClause *findWindowClause(List *wclist, const char *name); static Node *transformFrameOffset(ParseState *pstate, int frameOptions, Oid rangeopfamily, Oid rangeopcintype, Oid *inRangeFunc, Node *clause); - +static void transformRPR(ParseState *pstate, WindowClause *wc, WindowDef *windef, + List **targetlist); +static List *transformDefineClause(ParseState *pstate, WindowClause *wc, WindowDef *windef, + List **targetlist); +static void transformPatternClause(ParseState *pstate, WindowClause *wc, + WindowDef *windef); +static List *transformMeasureClause(ParseState *pstate, WindowClause *wc, + WindowDef *windef); /* * transformFromClause - @@ -2956,6 +2963,10 @@ transformWindowDefinitions(ParseState *pstate, rangeopfamily, rangeopcintype, &wc->endInRangeFunc, windef->endOffset); + + /* Process Row Pattern Recognition related clauses */ + transformRPR(pstate, wc, windef, targetlist); + wc->winref = winref; result = lappend(result, wc); @@ -3820,3 +3831,286 @@ transformFrameOffset(ParseState *pstate, int frameOptions, return node; } + +/* + * transformRPR + * Process Row Pattern Recognition related clauses + */ +static void +transformRPR(ParseState *pstate, WindowClause *wc, WindowDef *windef, + List **targetlist) +{ + /* + * Window definition exists? + */ + if (windef == NULL) + return; + + /* + * Row Pattern Common Syntax clause exists? + */ + if (windef->rpCommonSyntax == NULL) + return; + + /* Check Frame option. Frame must start at current row */ + if ((wc->frameOptions & FRAMEOPTION_START_CURRENT_ROW) == 0) + ereport(ERROR, + (errcode(ERRCODE_SYNTAX_ERROR), + errmsg("FRAME must start at current row when row patttern recognition is used"))); + + /* Transform AFTER MACH SKIP TO clause */ + wc->rpSkipTo = windef->rpCommonSyntax->rpSkipTo; + + /* Transform AFTER MACH SKIP TO variable */ + wc->rpSkipVariable = windef->rpCommonSyntax->rpSkipVariable; + + /* Transform SEEK or INITIAL clause */ + wc->initial = windef->rpCommonSyntax->initial; + + /* Transform DEFINE clause into list of TargetEntry's */ + wc->defineClause = transformDefineClause(pstate, wc, windef, targetlist); + + /* Check PATTERN clause and copy to patternClause */ + transformPatternClause(pstate, wc, windef); + + /* Transform MEASURE clause */ + transformMeasureClause(pstate, wc, windef); +} + +/* + * transformDefineClause Process DEFINE clause and transform ResTarget into + * list of TargetEntry. + * + * XXX we only support column reference in row pattern definition search + * condition, e.g. "price". <row pattern definition variable name>.<column + * reference> is not supported, e.g. "A.price". + */ +static List * +transformDefineClause(ParseState *pstate, WindowClause *wc, WindowDef *windef, + List **targetlist) +{ + /* DEFINE variable name initials */ + static char *defineVariableInitials = "abcdefghijklmnopqrstuvwxyz"; + + ListCell *lc, + *l; + ResTarget *restarget, + *r; + List *restargets; + List *defineClause; + char *name; + int initialLen; + int i; + + /* + * If Row Definition Common Syntax exists, DEFINE clause must exist. (the + * raw parser should have already checked it.) + */ + Assert(windef->rpCommonSyntax->rpDefs != NULL); + + /* + * Check and add "A AS A IS TRUE" if pattern variable is missing in DEFINE + * per the SQL standard. + */ + restargets = NIL; + foreach(lc, windef->rpCommonSyntax->rpPatterns) + { + A_Expr *a; + bool found = false; + + if (!IsA(lfirst(lc), A_Expr)) + ereport(ERROR, + errmsg("node type is not A_Expr")); + + a = (A_Expr *) lfirst(lc); + name = strVal(a->lexpr); + + foreach(l, windef->rpCommonSyntax->rpDefs) + { + restarget = (ResTarget *) lfirst(l); + + if (!strcmp(restarget->name, name)) + { + found = true; + break; + } + } + + if (!found) + { + /* + * "name" is missing. So create "name AS name IS TRUE" ResTarget + * node and add it to the temporary list. + */ + A_Const *n; + + restarget = makeNode(ResTarget); + n = makeNode(A_Const); + n->val.boolval.type = T_Boolean; + n->val.boolval.boolval = true; + n->location = -1; + restarget->name = pstrdup(name); + restarget->indirection = NIL; + restarget->val = (Node *) n; + restarget->location = -1; + restargets = lappend((List *) restargets, restarget); + } + } + + if (list_length(restargets) >= 1) + { + /* add missing DEFINEs */ + windef->rpCommonSyntax->rpDefs = + list_concat(windef->rpCommonSyntax->rpDefs, restargets); + list_free(restargets); + } + + /* + * Check for duplicate row pattern definition variables. The standard + * requires that no two row pattern definition variable names shall be + * equivalent. + */ + restargets = NIL; + foreach(lc, windef->rpCommonSyntax->rpDefs) + { + restarget = (ResTarget *) lfirst(lc); + name = restarget->name; + + /* + * Add DEFINE expression (Restarget->val) to the targetlist as a + * TargetEntry if it does not exist yet. Planner will add the column + * ref var node to the outer plan's target list later on. This makes + * DEFINE expression could access the outer tuple while evaluating + * PATTERN. + * + * XXX: adding whole expressions of DEFINE to the plan.targetlist is + * not so good, because it's not necessary to evalute the expression + * in the target list while running the plan. We should extract the + * var nodes only then add them to the plan.targetlist. + */ + findTargetlistEntrySQL99(pstate, (Node *) restarget->val, + targetlist, EXPR_KIND_RPR_DEFINE); + + /* + * Make sure that the row pattern definition search condition is a + * boolean expression. + */ + transformWhereClause(pstate, restarget->val, + EXPR_KIND_RPR_DEFINE, "DEFINE"); + + foreach(l, restargets) + { + char *n; + + r = (ResTarget *) lfirst(l); + n = r->name; + + if (!strcmp(n, name)) + ereport(ERROR, + (errcode(ERRCODE_SYNTAX_ERROR), + errmsg("row pattern definition variable name \"%s\" appears more than once in DEFINE clause", + name), + parser_errposition(pstate, exprLocation((Node *) r)))); + } + restargets = lappend(restargets, restarget); + } + list_free(restargets); + + /* + * Create list of row pattern DEFINE variable name's initial. We assign + * [a-z] to them (up to 26 variable names are allowed). + */ + restargets = NIL; + i = 0; + initialLen = strlen(defineVariableInitials); + + foreach(lc, windef->rpCommonSyntax->rpDefs) + { + char initial[2]; + + restarget = (ResTarget *) lfirst(lc); + name = restarget->name; + + if (i >= initialLen) + { + ereport(ERROR, + (errcode(ERRCODE_SYNTAX_ERROR), + errmsg("number of row pattern definition variable names exceeds %d", + initialLen), + parser_errposition(pstate, + exprLocation((Node *) restarget)))); + } + initial[0] = defineVariableInitials[i++]; + initial[1] = '\0'; + wc->defineInitial = lappend(wc->defineInitial, + makeString(pstrdup(initial))); + } + + defineClause = transformTargetList(pstate, windef->rpCommonSyntax->rpDefs, + EXPR_KIND_RPR_DEFINE); + + /* mark column origins */ + markTargetListOrigins(pstate, defineClause); + + /* mark all nodes in the DEFINE clause tree with collation information */ + assign_expr_collations(pstate, (Node *) defineClause); + + return defineClause; +} + +/* + * transformPatternClause + * Process PATTERN clause and return PATTERN clause in the raw parse tree + */ +static void +transformPatternClause(ParseState *pstate, WindowClause *wc, + WindowDef *windef) +{ + ListCell *lc; + + /* + * Row Pattern Common Syntax clause exists? + */ + if (windef->rpCommonSyntax == NULL) + return; + + wc->patternVariable = NIL; + wc->patternRegexp = NIL; + foreach(lc, windef->rpCommonSyntax->rpPatterns) + { + A_Expr *a; + char *name; + char *regexp; + + if (!IsA(lfirst(lc), A_Expr)) + ereport(ERROR, + errmsg("node type is not A_Expr")); + + a = (A_Expr *) lfirst(lc); + name = strVal(a->lexpr); + + wc->patternVariable = lappend(wc->patternVariable, makeString(pstrdup(name))); + regexp = strVal(lfirst(list_head(a->name))); + + wc->patternRegexp = lappend(wc->patternRegexp, makeString(pstrdup(regexp))); + } +} + +/* + * transformMeasureClause + * Process MEASURE clause + * XXX MEASURE clause is not supported yet + */ +static List * +transformMeasureClause(ParseState *pstate, WindowClause *wc, + WindowDef *windef) +{ + if (windef->rowPatternMeasures == NIL) + return NIL; + + ereport(ERROR, + (errcode(ERRCODE_SYNTAX_ERROR), + errmsg("%s", "MEASURE clause is not supported yet"), + parser_errposition(pstate, exprLocation((Node *) windef->rowPatternMeasures)))); + return NIL; +} diff --git a/src/backend/parser/parse_expr.c b/src/backend/parser/parse_expr.c index aba3546ed1..e98b45e06e 100644 --- a/src/backend/parser/parse_expr.c +++ b/src/backend/parser/parse_expr.c @@ -578,6 +578,7 @@ transformColumnRef(ParseState *pstate, ColumnRef *cref) case EXPR_KIND_COPY_WHERE: case EXPR_KIND_GENERATED_COLUMN: case EXPR_KIND_CYCLE_MARK: + case EXPR_KIND_RPR_DEFINE: /* okay */ break; @@ -1860,6 +1861,9 @@ transformSubLink(ParseState *pstate, SubLink *sublink) case EXPR_KIND_GENERATED_COLUMN: err = _("cannot use subquery in column generation expression"); break; + case EXPR_KIND_RPR_DEFINE: + err = _("cannot use subquery in DEFINE expression"); + break; /* * There is intentionally no default: case here, so that the @@ -3197,6 +3201,8 @@ ParseExprKindName(ParseExprKind exprKind) return "GENERATED AS"; case EXPR_KIND_CYCLE_MARK: return "CYCLE"; + case EXPR_KIND_RPR_DEFINE: + return "DEFINE"; /* * There is intentionally no default: case here, so that the diff --git a/src/backend/parser/parse_func.c b/src/backend/parser/parse_func.c index 9b23344a3b..4c482abb30 100644 --- a/src/backend/parser/parse_func.c +++ b/src/backend/parser/parse_func.c @@ -2658,6 +2658,9 @@ check_srf_call_placement(ParseState *pstate, Node *last_srf, int location) case EXPR_KIND_CYCLE_MARK: errkind = true; break; + case EXPR_KIND_RPR_DEFINE: + errkind = true; + break; /* * There is intentionally no default: case here, so that the -- 2.25.1 From 13d116fd6cd7b1ff5dd00ef29fca67b3afedaa31 Mon Sep 17 00:00:00 2001 From: Tatsuo Ishii <ishii@postgresql.org> Date: Fri, 24 May 2024 11:26:02 +0900 Subject: [PATCH v20 3/8] Row pattern recognition patch (rewriter). --- src/backend/utils/adt/ruleutils.c | 103 ++++++++++++++++++++++++++++++ 1 file changed, 103 insertions(+) diff --git a/src/backend/utils/adt/ruleutils.c b/src/backend/utils/adt/ruleutils.c index 9618619762..0863fad59d 100644 --- a/src/backend/utils/adt/ruleutils.c +++ b/src/backend/utils/adt/ruleutils.c @@ -426,6 +426,10 @@ static void get_rule_groupingset(GroupingSet *gset, List *targetlist, bool omit_parens, deparse_context *context); static void get_rule_orderby(List *orderList, List *targetList, bool force_colno, deparse_context *context); +static void get_rule_pattern(List *patternVariable, List *patternRegexp, + bool force_colno, deparse_context *context); +static void get_rule_define(List *defineClause, List *patternVariables, + bool force_colno, deparse_context *context); static void get_rule_windowclause(Query *query, deparse_context *context); static void get_rule_windowspec(WindowClause *wc, List *targetList, deparse_context *context); @@ -6460,6 +6464,67 @@ get_rule_orderby(List *orderList, List *targetList, } } +/* + * Display a PATTERN clause. + */ +static void +get_rule_pattern(List *patternVariable, List *patternRegexp, + bool force_colno, deparse_context *context) +{ + StringInfo buf = context->buf; + const char *sep; + ListCell *lc_var, + *lc_reg = list_head(patternRegexp); + + sep = ""; + appendStringInfoChar(buf, '('); + foreach(lc_var, patternVariable) + { + char *variable = strVal((String *) lfirst(lc_var)); + char *regexp = NULL; + + if (lc_reg != NULL) + { + regexp = strVal((String *) lfirst(lc_reg)); + + lc_reg = lnext(patternRegexp, lc_reg); + } + + appendStringInfo(buf, "%s%s", sep, variable); + if (regexp !=NULL) + appendStringInfoString(buf, regexp); + + sep = " "; + } + appendStringInfoChar(buf, ')'); +} + +/* + * Display a DEFINE clause. + */ +static void +get_rule_define(List *defineClause, List *patternVariables, + bool force_colno, deparse_context *context) +{ + StringInfo buf = context->buf; + const char *sep; + ListCell *lc_var, + *lc_def; + + sep = " "; + Assert(list_length(patternVariables) == list_length(defineClause)); + + forboth(lc_var, patternVariables, lc_def, defineClause) + { + char *varName = strVal(lfirst(lc_var)); + TargetEntry *te = (TargetEntry *) lfirst(lc_def); + + appendStringInfo(buf, "%s%s AS ", sep, varName); + get_rule_expr((Node *) te->expr, context, false); + sep = ",\n "; + } +} + /* * Display a WINDOW clause. * @@ -6597,6 +6662,44 @@ get_rule_windowspec(WindowClause *wc, List *targetList, appendStringInfoString(buf, "EXCLUDE GROUP "); else if (wc->frameOptions & FRAMEOPTION_EXCLUDE_TIES) appendStringInfoString(buf, "EXCLUDE TIES "); + /* RPR */ + if (wc->rpSkipTo == ST_NEXT_ROW) + appendStringInfoString(buf, + "\n AFTER MATCH SKIP TO NEXT ROW "); + else if (wc->rpSkipTo == ST_PAST_LAST_ROW) + appendStringInfoString(buf, + "\n AFTER MATCH SKIP PAST LAST ROW "); + else if (wc->rpSkipTo == ST_FIRST_VARIABLE) + appendStringInfo(buf, + "\n AFTER MATCH SKIP TO FIRST %s ", + wc->rpSkipVariable); + else if (wc->rpSkipTo == ST_LAST_VARIABLE) + appendStringInfo(buf, + "\n AFTER MATCH SKIP TO LAST %s ", + wc->rpSkipVariable); + else if (wc->rpSkipTo == ST_VARIABLE) + appendStringInfo(buf, + "\n AFTER MATCH SKIP TO %s ", + wc->rpSkipVariable); + + if (wc->initial) + appendStringInfoString(buf, "\n INITIAL"); + + if (wc->patternVariable) + { + appendStringInfoString(buf, "\n PATTERN "); + get_rule_pattern(wc->patternVariable, wc->patternRegexp, + false, context); + } + + if (wc->defineClause) + { + appendStringInfoString(buf, "\n DEFINE\n"); + get_rule_define(wc->defineClause, wc->patternVariable, + false, context); + appendStringInfoChar(buf, ' '); + } + /* we will now have a trailing space; remove it */ buf->len--; } -- 2.25.1 From 3a3372e95be36f0c5bc00f03b6139e3dad9f3968 Mon Sep 17 00:00:00 2001 From: Tatsuo Ishii <ishii@postgresql.org> Date: Fri, 24 May 2024 11:26:02 +0900 Subject: [PATCH v20 4/8] Row pattern recognition patch (planner). --- src/backend/optimizer/plan/createplan.c | 24 +++++++++++++++----- src/backend/optimizer/plan/planner.c | 3 +++ src/backend/optimizer/plan/setrefs.c | 27 ++++++++++++++++++++++- src/backend/optimizer/prep/prepjointree.c | 9 ++++++++ src/include/nodes/plannodes.h | 19 ++++++++++++++++ 5 files changed, 76 insertions(+), 6 deletions(-) diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c index 6b64c4a362..ef2101e216 100644 --- a/src/backend/optimizer/plan/createplan.c +++ b/src/backend/optimizer/plan/createplan.c @@ -287,9 +287,11 @@ static WindowAgg *make_windowagg(List *tlist, Index winref, int ordNumCols, AttrNumber *ordColIdx, Oid *ordOperators, Oid *ordCollations, int frameOptions, Node *startOffset, Node *endOffset, Oid startInRangeFunc, Oid endInRangeFunc, - Oid inRangeColl, bool inRangeAsc, bool inRangeNullsFirst, - List *runCondition, List *qual, bool topWindow, - Plan *lefttree); + Oid inRangeColl, bool inRangeAsc, bool inRangeNullsFirst, List *runCondition, + RPSkipTo rpSkipTo, List *patternVariable, List *patternRegexp, + List *defineClause, + List *defineInitial, + List *qual, bool topWindow, Plan *lefttree); static Group *make_group(List *tlist, List *qual, int numGroupCols, AttrNumber *grpColIdx, Oid *grpOperators, Oid *grpCollations, Plan *lefttree); @@ -2700,6 +2702,11 @@ create_windowagg_plan(PlannerInfo *root, WindowAggPath *best_path) wc->inRangeAsc, wc->inRangeNullsFirst, best_path->runCondition, + wc->rpSkipTo, + wc->patternVariable, + wc->patternRegexp, + wc->defineClause, + wc->defineInitial, best_path->qual, best_path->topwindow, subplan); @@ -6629,8 +6636,10 @@ make_windowagg(List *tlist, Index winref, int ordNumCols, AttrNumber *ordColIdx, Oid *ordOperators, Oid *ordCollations, int frameOptions, Node *startOffset, Node *endOffset, Oid startInRangeFunc, Oid endInRangeFunc, - Oid inRangeColl, bool inRangeAsc, bool inRangeNullsFirst, - List *runCondition, List *qual, bool topWindow, Plan *lefttree) + Oid inRangeColl, bool inRangeAsc, bool inRangeNullsFirst, List *runCondition, + RPSkipTo rpSkipTo, List *patternVariable, List *patternRegexp, List *defineClause, + List *defineInitial, + List *qual, bool topWindow, Plan *lefttree) { WindowAgg *node = makeNode(WindowAgg); Plan *plan = &node->plan; @@ -6656,6 +6665,11 @@ make_windowagg(List *tlist, Index winref, node->inRangeAsc = inRangeAsc; node->inRangeNullsFirst = inRangeNullsFirst; node->topWindow = topWindow; + node->rpSkipTo = rpSkipTo, + node->patternVariable = patternVariable; + node->patternRegexp = patternRegexp; + node->defineClause = defineClause; + node->defineInitial = defineInitial; plan->targetlist = tlist; plan->lefttree = lefttree; diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c index 032818423f..a5d17e3fda 100644 --- a/src/backend/optimizer/plan/planner.c +++ b/src/backend/optimizer/plan/planner.c @@ -870,6 +870,9 @@ subquery_planner(PlannerGlobal *glob, Query *parse, PlannerInfo *parent_root, EXPRKIND_LIMIT); wc->endOffset = preprocess_expression(root, wc->endOffset, EXPRKIND_LIMIT); + wc->defineClause = (List *) preprocess_expression(root, + (Node *) wc->defineClause, + EXPRKIND_TARGET); } parse->limitOffset = preprocess_expression(root, parse->limitOffset, diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c index 37abcb4701..cd0e7c57d8 100644 --- a/src/backend/optimizer/plan/setrefs.c +++ b/src/backend/optimizer/plan/setrefs.c @@ -210,7 +210,6 @@ static List *set_windowagg_runcondition_references(PlannerInfo *root, List *runcondition, Plan *plan); - /***************************************************************************** * * SUBPLAN REFERENCES @@ -2471,6 +2470,32 @@ set_upper_references(PlannerInfo *root, Plan *plan, int rtoffset) NRM_EQUAL, NUM_EXEC_QUAL(plan)); + /* + * Modifies an expression tree in each DEFINE clause so that all Var + * nodes's varno refers to OUTER_VAR. + */ + if (IsA(plan, WindowAgg)) + { + WindowAgg *wplan = (WindowAgg *) plan; + + if (wplan->defineClause != NIL) + { + foreach(l, wplan->defineClause) + { + TargetEntry *tle = (TargetEntry *) lfirst(l); + + tle->expr = (Expr *) + fix_upper_expr(root, + (Node *) tle->expr, + subplan_itlist, + OUTER_VAR, + rtoffset, + NRM_EQUAL, + NUM_EXEC_QUAL(plan)); + } + } + } + pfree(subplan_itlist); } diff --git a/src/backend/optimizer/prep/prepjointree.c b/src/backend/optimizer/prep/prepjointree.c index 5482ab85a7..f54db3f044 100644 --- a/src/backend/optimizer/prep/prepjointree.c +++ b/src/backend/optimizer/prep/prepjointree.c @@ -2175,6 +2175,15 @@ perform_pullup_replace_vars(PlannerInfo *root, parse->returningList = (List *) pullup_replace_vars((Node *) parse->returningList, rvcontext); + foreach(lc, parse->windowClause) + { + WindowClause *wc = lfirst_node(WindowClause, lc); + + if (wc->defineClause != NIL) + wc->defineClause = (List *) + pullup_replace_vars((Node *) wc->defineClause, rvcontext); + } + if (parse->onConflict) { parse->onConflict->onConflictSet = (List *) diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h index 1aeeaec95e..5b8d8a6dac 100644 --- a/src/include/nodes/plannodes.h +++ b/src/include/nodes/plannodes.h @@ -20,6 +20,7 @@ #include "lib/stringinfo.h" #include "nodes/bitmapset.h" #include "nodes/lockoptions.h" +#include "nodes/parsenodes.h" #include "nodes/primnodes.h" @@ -1098,6 +1099,24 @@ typedef struct WindowAgg /* nulls sort first for in_range tests? */ bool inRangeNullsFirst; + /* Row Pattern Recognition AFTER MACH SKIP clause */ + RPSkipTo rpSkipTo; /* Row Pattern Skip To type */ + + /* Row Pattern PATTERN variable name (list of String) */ + List *patternVariable; + + /* + * Row Pattern RPATTERN regular expression quantifier ('+' or ''. list of + * String) + */ + List *patternRegexp; + + /* Row Pattern DEFINE clause (list of TargetEntry) */ + List *defineClause; + + /* Row Pattern DEFINE variable initial names (list of String) */ + List *defineInitial; + /* * false for all apart from the WindowAgg that's closest to the root of * the plan -- 2.25.1 From cdc0eefe6f322e2e1fe23c009354e282321bcf37 Mon Sep 17 00:00:00 2001 From: Tatsuo Ishii <ishii@postgresql.org> Date: Fri, 24 May 2024 11:26:02 +0900 Subject: [PATCH v20 5/8] Row pattern recognition patch (executor). --- src/backend/executor/nodeWindowAgg.c | 1610 +++++++++++++++++++++++++- src/backend/utils/adt/windowfuncs.c | 37 +- src/include/catalog/pg_proc.dat | 6 + src/include/nodes/execnodes.h | 30 + 4 files changed, 1671 insertions(+), 12 deletions(-) diff --git a/src/backend/executor/nodeWindowAgg.c b/src/backend/executor/nodeWindowAgg.c index 3221fa1522..140bb3941e 100644 --- a/src/backend/executor/nodeWindowAgg.c +++ b/src/backend/executor/nodeWindowAgg.c @@ -36,6 +36,7 @@ #include "access/htup_details.h" #include "catalog/objectaccess.h" #include "catalog/pg_aggregate.h" +#include "catalog/pg_collation_d.h" #include "catalog/pg_proc.h" #include "executor/executor.h" #include "executor/nodeWindowAgg.h" @@ -48,6 +49,7 @@ #include "utils/acl.h" #include "utils/builtins.h" #include "utils/datum.h" +#include "utils/fmgroids.h" #include "utils/expandeddatum.h" #include "utils/lsyscache.h" #include "utils/memutils.h" @@ -159,6 +161,43 @@ typedef struct WindowStatePerAggData bool restart; /* need to restart this agg in this cycle? */ } WindowStatePerAggData; +/* + * Set of StringInfo. Used in RPR. + */ +typedef struct StringSet +{ + StringInfo *str_set; + Size set_size; /* current array allocation size in number of + * items */ + int set_index; /* current used size */ +} StringSet; + +/* + * Allowed subsequent PATTERN variables positions. + * Used in RPR. + * + * pos represents the pattern variable defined order in DEFINE caluase. For + * example. "DEFINE START..., UP..., DOWN ..." and "PATTERN START UP DOWN UP" + * will create: + * VariablePos[0].pos[0] = 0; START + * VariablePos[1].pos[0] = 1; UP + * VariablePos[1].pos[1] = 3; UP + * VariablePos[2].pos[0] = 2; DOWN + * + * Note that UP has two pos because UP appears in PATTERN twice. + * + * By using this strucrture, we can know which pattern variable can be followed + * by which pattern variable(s). For example, START can be followed by UP and + * DOWN since START's pos is 0, and UP's pos is 1 or 3, DOWN's pos is 2. + * DOWN can be followed by UP since UP's pos is either 1 or 3. + * + */ +#define NUM_ALPHABETS 26 /* we allow [a-z] variable initials */ +typedef struct VariablePos +{ + int pos[NUM_ALPHABETS]; /* postion(s) in PATTERN */ +} VariablePos; + static void initialize_windowaggregate(WindowAggState *winstate, WindowStatePerFunc perfuncstate, WindowStatePerAgg peraggstate); @@ -184,6 +223,7 @@ static void release_partition(WindowAggState *winstate); static int row_is_in_frame(WindowAggState *winstate, int64 pos, TupleTableSlot *slot); + static void update_frameheadpos(WindowAggState *winstate); static void update_frametailpos(WindowAggState *winstate); static void update_grouptailpos(WindowAggState *winstate); @@ -195,9 +235,48 @@ static Datum GetAggInitVal(Datum textInitVal, Oid transtype); static bool are_peers(WindowAggState *winstate, TupleTableSlot *slot1, TupleTableSlot *slot2); + +static int WinGetSlotInFrame(WindowObject winobj, TupleTableSlot *slot, + int relpos, int seektype, bool set_mark, + bool *isnull, bool *isout); static bool window_gettupleslot(WindowObject winobj, int64 pos, TupleTableSlot *slot); +static void attno_map(Node *node); +static bool attno_map_walker(Node *node, void *context); +static int row_is_in_reduced_frame(WindowObject winobj, int64 pos); +static bool rpr_is_defined(WindowAggState *winstate); + +static void create_reduced_frame_map(WindowAggState *winstate); +static int get_reduced_frame_map(WindowAggState *winstate, int64 pos); +static void register_reduced_frame_map(WindowAggState *winstate, int64 pos, + int val); +static void clear_reduced_frame_map(WindowAggState *winstate); +static void update_reduced_frame(WindowObject winobj, int64 pos); + +static int64 evaluate_pattern(WindowObject winobj, int64 current_pos, + char *vname, StringInfo encoded_str, bool *result); + +static bool get_slots(WindowObject winobj, int64 current_pos); + +static int search_str_set(char *pattern, StringSet * str_set, + VariablePos * variable_pos); +static char pattern_initial(WindowAggState *winstate, char *vname); +static int do_pattern_match(char *pattern, char *encoded_str); + +static StringSet * string_set_init(void); +static void string_set_add(StringSet * string_set, StringInfo str); +static StringInfo string_set_get(StringSet * string_set, int index); +static int string_set_get_size(StringSet * string_set); +static void string_set_discard(StringSet * string_set); +static VariablePos * variable_pos_init(void); +static void variable_pos_register(VariablePos * variable_pos, char initial, + int pos); +static bool variable_pos_compare(VariablePos * variable_pos, + char initial1, char initial2); +static int variable_pos_fetch(VariablePos * variable_pos, char initial, + int index); +static void variable_pos_discard(VariablePos * variable_pos); /* * initialize_windowaggregate @@ -774,10 +853,12 @@ eval_windowaggregates(WindowAggState *winstate) * transition function, or * - we have an EXCLUSION clause, or * - if the new frame doesn't overlap the old one + * - if RPR is enabled * * Note that we don't strictly need to restart in the last case, but if * we're going to remove all rows from the aggregation anyway, a restart * surely is faster. + * we restart aggregation too. *---------- */ numaggs_restart = 0; @@ -788,7 +869,8 @@ eval_windowaggregates(WindowAggState *winstate) (winstate->aggregatedbase != winstate->frameheadpos && !OidIsValid(peraggstate->invtransfn_oid)) || (winstate->frameOptions & FRAMEOPTION_EXCLUSION) || - winstate->aggregatedupto <= winstate->frameheadpos) + winstate->aggregatedupto <= winstate->frameheadpos || + rpr_is_defined(winstate)) { peraggstate->restart = true; numaggs_restart++; @@ -862,7 +944,22 @@ eval_windowaggregates(WindowAggState *winstate) * head, so that tuplestore can discard unnecessary rows. */ if (agg_winobj->markptr >= 0) - WinSetMarkPosition(agg_winobj, winstate->frameheadpos); + { + int64 markpos = winstate->frameheadpos; + + if (rpr_is_defined(winstate)) + { + /* + * If RPR is used, it is possible PREV wants to look at the + * previous row. So the mark pos should be frameheadpos - 1 + * unless it is below 0. + */ + markpos -= 1; + if (markpos < 0) + markpos = 0; + } + WinSetMarkPosition(agg_winobj, markpos); + } /* * Now restart the aggregates that require it. @@ -917,6 +1014,14 @@ eval_windowaggregates(WindowAggState *winstate) { winstate->aggregatedupto = winstate->frameheadpos; ExecClearTuple(agg_row_slot); + + /* + * If RPR is defined, we do not use aggregatedupto_nonrestarted. To + * avoid assertion failure below, we reset aggregatedupto_nonrestarted + * to frameheadpos. + */ + if (rpr_is_defined(winstate)) + aggregatedupto_nonrestarted = winstate->frameheadpos; } /* @@ -930,6 +1035,12 @@ eval_windowaggregates(WindowAggState *winstate) { int ret; +#ifdef RPR_DEBUG + elog(DEBUG1, "===== loop in frame starts: aggregatedupto: " INT64_FORMAT " aggregatedbase: " INT64_FORMAT, + winstate->aggregatedupto, + winstate->aggregatedbase); +#endif + /* Fetch next row if we didn't already */ if (TupIsNull(agg_row_slot)) { @@ -945,9 +1056,52 @@ eval_windowaggregates(WindowAggState *winstate) ret = row_is_in_frame(winstate, winstate->aggregatedupto, agg_row_slot); if (ret < 0) break; + if (ret == 0) goto next_tuple; + if (rpr_is_defined(winstate)) + { +#ifdef RPR_DEBUG + elog(DEBUG1, "reduced_frame_map: %d aggregatedupto: " INT64_FORMAT " aggregatedbase: " INT64_FORMAT, + get_reduced_frame_map(winstate, + winstate->aggregatedupto), + winstate->aggregatedupto, + winstate->aggregatedbase); +#endif + /* + * If the row status at currentpos is already decided and current + * row status is not decided yet, it means we passed the last + * reduced frame. Time to break the loop. + */ + if (get_reduced_frame_map(winstate, + winstate->currentpos) != RF_NOT_DETERMINED && + get_reduced_frame_map(winstate, + winstate->aggregatedupto) == RF_NOT_DETERMINED) + break; + + /* + * Otherwise we need to calculate the reduced frame. + */ + ret = row_is_in_reduced_frame(winstate->agg_winobj, + winstate->aggregatedupto); + if (ret == -1) /* unmatched row */ + break; + + /* + * Check if current row needs to be skipped due to no match. + */ + if (get_reduced_frame_map(winstate, + winstate->aggregatedupto) == RF_SKIPPED && + winstate->aggregatedupto == winstate->aggregatedbase) + { +#ifdef RPR_DEBUG + elog(DEBUG1, "skip current row for aggregation"); +#endif + break; + } + } + /* Set tuple context for evaluation of aggregate arguments */ winstate->tmpcontext->ecxt_outertuple = agg_row_slot; @@ -976,6 +1130,7 @@ next_tuple: ExecClearTuple(agg_row_slot); } + /* The frame's end is not supposed to move backwards, ever */ Assert(aggregatedupto_nonrestarted <= winstate->aggregatedupto); @@ -995,7 +1150,6 @@ next_tuple: &winstate->perfunc[wfuncno], peraggstate, result, isnull); - /* * save the result in case next row shares the same frame. * @@ -1090,6 +1244,7 @@ begin_partition(WindowAggState *winstate) winstate->framehead_valid = false; winstate->frametail_valid = false; winstate->grouptail_valid = false; + create_reduced_frame_map(winstate); winstate->spooled_rows = 0; winstate->currentpos = 0; winstate->frameheadpos = 0; @@ -2053,6 +2208,11 @@ ExecWindowAgg(PlanState *pstate) CHECK_FOR_INTERRUPTS(); +#ifdef RPR_DEBUG + elog(DEBUG1, "ExecWindowAgg called. pos: " INT64_FORMAT, + winstate->currentpos); +#endif + if (winstate->status == WINDOWAGG_DONE) return NULL; @@ -2221,6 +2381,17 @@ ExecWindowAgg(PlanState *pstate) /* don't evaluate the window functions when we're in pass-through mode */ if (winstate->status == WINDOWAGG_RUN) { + /* + * If RPR is defined and skip mode is next row, we need to clear + * existing reduced frame info so that we newly calculate the info + * starting from current row. + */ + if (rpr_is_defined(winstate)) + { + if (winstate->rpSkipTo == ST_NEXT_ROW) + clear_reduced_frame_map(winstate); + } + /* * Evaluate true window functions */ @@ -2388,6 +2559,9 @@ ExecInitWindowAgg(WindowAgg *node, EState *estate, int eflags) TupleDesc scanDesc; ListCell *l; + TargetEntry *te; + Expr *expr; + /* check for unsupported flags */ Assert(!(eflags & (EXEC_FLAG_BACKWARD | EXEC_FLAG_MARK))); @@ -2486,6 +2660,16 @@ ExecInitWindowAgg(WindowAgg *node, EState *estate, int eflags) winstate->temp_slot_2 = ExecInitExtraTupleSlot(estate, scanDesc, &TTSOpsMinimalTuple); + winstate->prev_slot = ExecInitExtraTupleSlot(estate, scanDesc, + &TTSOpsMinimalTuple); + + winstate->next_slot = ExecInitExtraTupleSlot(estate, scanDesc, + &TTSOpsMinimalTuple); + + winstate->null_slot = ExecInitExtraTupleSlot(estate, scanDesc, + &TTSOpsMinimalTuple); + winstate->null_slot = ExecStoreAllNullTuple(winstate->null_slot); + /* * create frame head and tail slots only if needed (must create slots in * exactly the same cases that update_frameheadpos and update_frametailpos @@ -2667,6 +2851,43 @@ ExecInitWindowAgg(WindowAgg *node, EState *estate, int eflags) winstate->inRangeAsc = node->inRangeAsc; winstate->inRangeNullsFirst = node->inRangeNullsFirst; + /* Set up SKIP TO type */ + winstate->rpSkipTo = node->rpSkipTo; + /* Set up row pattern recognition PATTERN clause */ + winstate->patternVariableList = node->patternVariable; + winstate->patternRegexpList = node->patternRegexp; + + /* Set up row pattern recognition DEFINE clause */ + winstate->defineInitial = node->defineInitial; + winstate->defineVariableList = NIL; + winstate->defineClauseList = NIL; + if (node->defineClause != NIL) + { + /* + * Tweak arg var of PREV/NEXT so that it refers to scan/inner slot. + */ + foreach(l, node->defineClause) + { + char *name; + ExprState *exps; + + te = lfirst(l); + name = te->resname; + expr = te->expr; + +#ifdef RPR_DEBUG + elog(DEBUG1, "defineVariable name: %s", name); +#endif + winstate->defineVariableList = + lappend(winstate->defineVariableList, + makeString(pstrdup(name))); + attno_map((Node *) expr); + exps = ExecInitExpr(expr, (PlanState *) winstate); + winstate->defineClauseList = + lappend(winstate->defineClauseList, exps); + } + } + winstate->all_first = true; winstate->partition_spooled = false; winstate->more_partitions = false; @@ -2674,6 +2895,64 @@ ExecInitWindowAgg(WindowAgg *node, EState *estate, int eflags) return winstate; } +/* + * Rewrite varno of Var node that is the argument of PREV/NET so that it sees + * scan tuple (PREV) or inner tuple (NEXT). + */ +static void +attno_map(Node *node) +{ + (void) expression_tree_walker(node, attno_map_walker, NULL); +} + +static bool +attno_map_walker(Node *node, void *context) +{ + FuncExpr *func; + int nargs; + Expr *expr; + Var *var; + + if (node == NULL) + return false; + + if (IsA(node, FuncExpr)) + { + func = (FuncExpr *) node; + + if (func->funcid == F_PREV || func->funcid == F_NEXT) + { + /* sanity check */ + nargs = list_length(func->args); + if (list_length(func->args) != 1) + elog(ERROR, "PREV/NEXT must have 1 argument but function %d has %d args", + func->funcid, nargs); + + expr = (Expr *) lfirst(list_head(func->args)); + if (!IsA(expr, Var)) + elog(ERROR, "PREV/NEXT's arg is not Var"); /* XXX: is it possible + * that arg type is + * Const? */ + var = (Var *) expr; + + if (func->funcid == F_PREV) + + /* + * Rewrite varno from OUTER_VAR to regular var no so that the + * var references scan tuple. + */ + var->varno = var->varnosyn; + else + var->varno = INNER_VAR; + +#ifdef RPR_DEBUG + elog(DEBUG1, "PREV/NEXT's varno is rewritten to: %d", var->varno); +#endif + } + } + return expression_tree_walker(node, attno_map_walker, NULL); +} + /* ----------------- * ExecEndWindowAgg * ----------------- @@ -2723,6 +3002,8 @@ ExecReScanWindowAgg(WindowAggState *node) ExecClearTuple(node->agg_row_slot); ExecClearTuple(node->temp_slot_1); ExecClearTuple(node->temp_slot_2); + ExecClearTuple(node->prev_slot); + ExecClearTuple(node->next_slot); if (node->framehead_slot) ExecClearTuple(node->framehead_slot); if (node->frametail_slot) @@ -3083,7 +3364,8 @@ window_gettupleslot(WindowObject winobj, int64 pos, TupleTableSlot *slot) return false; if (pos < winobj->markpos) - elog(ERROR, "cannot fetch row before WindowObject's mark position"); + elog(ERROR, "cannot fetch row: " INT64_FORMAT " before WindowObject's mark position: " INT64_FORMAT, + pos, winobj->markpos); oldcontext = MemoryContextSwitchTo(winstate->ss.ps.ps_ExprContext->ecxt_per_query_memory); @@ -3403,14 +3685,54 @@ WinGetFuncArgInFrame(WindowObject winobj, int argno, WindowAggState *winstate; ExprContext *econtext; TupleTableSlot *slot; - int64 abs_pos; - int64 mark_pos; Assert(WindowObjectIsValid(winobj)); winstate = winobj->winstate; econtext = winstate->ss.ps.ps_ExprContext; slot = winstate->temp_slot_1; + if (WinGetSlotInFrame(winobj, slot, + relpos, seektype, set_mark, + isnull, isout) == 0) + { + econtext->ecxt_outertuple = slot; + return ExecEvalExpr((ExprState *) list_nth(winobj->argstates, argno), + econtext, isnull); + } + + if (isout) + *isout = true; + *isnull = true; + return (Datum) 0; +} + +/* + * WinGetSlotInFrame + * slot: TupleTableSlot to store the result + * relpos: signed rowcount offset from the seek position + * seektype: WINDOW_SEEK_HEAD or WINDOW_SEEK_TAIL + * set_mark: If the row is found/in frame and set_mark is true, the mark is + * moved to the row as a side-effect. + * isnull: output argument, receives isnull status of result + * isout: output argument, set to indicate whether target row position + * is out of frame (can pass NULL if caller doesn't care about this) + * + * Returns 0 if we successfullt got the slot. false if out of frame. + * (also isout is set) + */ +static int +WinGetSlotInFrame(WindowObject winobj, TupleTableSlot *slot, + int relpos, int seektype, bool set_mark, + bool *isnull, bool *isout) +{ + WindowAggState *winstate; + int64 abs_pos; + int64 mark_pos; + int num_reduced_frame; + + Assert(WindowObjectIsValid(winobj)); + winstate = winobj->winstate; + switch (seektype) { case WINDOW_SEEK_CURRENT: @@ -3477,11 +3799,25 @@ WinGetFuncArgInFrame(WindowObject winobj, int argno, winstate->frameOptions); break; } + num_reduced_frame = row_is_in_reduced_frame(winobj, + winstate->frameheadpos); + if (num_reduced_frame < 0) + goto out_of_frame; + else if (num_reduced_frame > 0) + if (relpos >= num_reduced_frame) + goto out_of_frame; break; case WINDOW_SEEK_TAIL: /* rejecting relpos > 0 is easy and simplifies code below */ if (relpos > 0) goto out_of_frame; + + /* + * RPR cares about frame head pos. Need to call + * update_frameheadpos + */ + update_frameheadpos(winstate); + update_frametailpos(winstate); abs_pos = winstate->frametailpos - 1 + relpos; @@ -3548,6 +3884,14 @@ WinGetFuncArgInFrame(WindowObject winobj, int argno, mark_pos = 0; /* keep compiler quiet */ break; } + + num_reduced_frame = row_is_in_reduced_frame(winobj, + winstate->frameheadpos + relpos); + if (num_reduced_frame < 0) + goto out_of_frame; + else if (num_reduced_frame > 0) + abs_pos = winstate->frameheadpos + relpos + + num_reduced_frame - 1; break; default: elog(ERROR, "unrecognized window seek type: %d", seektype); @@ -3566,15 +3910,13 @@ WinGetFuncArgInFrame(WindowObject winobj, int argno, *isout = false; if (set_mark) WinSetMarkPosition(winobj, mark_pos); - econtext->ecxt_outertuple = slot; - return ExecEvalExpr((ExprState *) list_nth(winobj->argstates, argno), - econtext, isnull); + return 0; out_of_frame: if (isout) *isout = true; *isnull = true; - return (Datum) 0; + return -1; } /* @@ -3605,3 +3947,1251 @@ WinGetFuncArgCurrent(WindowObject winobj, int argno, bool *isnull) return ExecEvalExpr((ExprState *) list_nth(winobj->argstates, argno), econtext, isnull); } + +/* + * rpr_is_defined + * return true if Row pattern recognition is defined. + */ +static +bool +rpr_is_defined(WindowAggState *winstate) +{ + return winstate->patternVariableList != NIL; +} + +/* + * ----------------- + * row_is_in_reduced_frame + * Determine whether a row is in the current row's reduced window frame + * according to row pattern matching + * + * The row must has been already determined that it is in a full window frame + * and fetched it into slot. + * + * Returns: + * = 0, RPR is not defined. + * >0, if the row is the first in the reduced frame. Return the number of rows + * in the reduced frame. + * -1, if the row is unmatched row + * -2, if the row is in the reduced frame but needed to be skipped because of + * AFTER MATCH SKIP PAST LAST ROW + * ----------------- + */ +static +int +row_is_in_reduced_frame(WindowObject winobj, int64 pos) +{ + WindowAggState *winstate = winobj->winstate; + int state; + int rtn; + + if (!rpr_is_defined(winstate)) + { + /* + * RPR is not defined. Assume that we are always in the the reduced + * window frame. + */ + rtn = 0; +#ifdef RPR_DEBUG + elog(DEBUG1, "row_is_in_reduced_frame returns %d: pos: " INT64_FORMAT, + rtn, pos); +#endif + return rtn; + } + + state = get_reduced_frame_map(winstate, pos); + + if (state == RF_NOT_DETERMINED) + { + update_frameheadpos(winstate); + update_reduced_frame(winobj, pos); + } + + state = get_reduced_frame_map(winstate, pos); + + switch (state) + { + int64 i; + int num_reduced_rows; + + case RF_FRAME_HEAD: + num_reduced_rows = 1; + for (i = pos + 1; + get_reduced_frame_map(winstate, i) == RF_SKIPPED; i++) + num_reduced_rows++; + rtn = num_reduced_rows; + break; + + case RF_SKIPPED: + rtn = -2; + break; + + case RF_UNMATCHED: + rtn = -1; + break; + + default: + elog(ERROR, "Unrecognized state: %d at: " INT64_FORMAT, + state, pos); + break; + } + +#ifdef RPR_DEBUG + elog(DEBUG1, "row_is_in_reduced_frame returns %d: pos: " INT64_FORMAT, + rtn, pos); +#endif + return rtn; +} + +#define REDUCED_FRAME_MAP_INIT_SIZE 1024L + +/* + * create_reduced_frame_map + * Create reduced frame map + */ +static +void +create_reduced_frame_map(WindowAggState *winstate) +{ + winstate->reduced_frame_map = + MemoryContextAlloc(winstate->partcontext, + REDUCED_FRAME_MAP_INIT_SIZE); + winstate->alloc_sz = REDUCED_FRAME_MAP_INIT_SIZE; + clear_reduced_frame_map(winstate); +} + +/* + * clear_reduced_frame_map + * Clear reduced frame map + */ +static +void +clear_reduced_frame_map(WindowAggState *winstate) +{ + Assert(winstate->reduced_frame_map != NULL); + MemSet(winstate->reduced_frame_map, RF_NOT_DETERMINED, + winstate->alloc_sz); +} + +/* + * get_reduced_frame_map + * Get reduced frame map specified by pos + */ +static +int +get_reduced_frame_map(WindowAggState *winstate, int64 pos) +{ + Assert(winstate->reduced_frame_map != NULL); + + if (pos < 0 || pos >= winstate->alloc_sz) + elog(ERROR, "wrong pos: " INT64_FORMAT, pos); + + return winstate->reduced_frame_map[pos]; +} + +/* + * register_reduced_frame_map + * Add/replace reduced frame map member at pos. + * If there's no enough space, expand the map. + */ +static +void +register_reduced_frame_map(WindowAggState *winstate, int64 pos, int val) +{ + int64 realloc_sz; + + Assert(winstate->reduced_frame_map != NULL); + + if (pos < 0) + elog(ERROR, "wrong pos: " INT64_FORMAT, pos); + + if (pos > winstate->alloc_sz - 1) + { + realloc_sz = winstate->alloc_sz * 2; + + winstate->reduced_frame_map = + repalloc(winstate->reduced_frame_map, realloc_sz); + + MemSet(winstate->reduced_frame_map + winstate->alloc_sz, + RF_NOT_DETERMINED, realloc_sz - winstate->alloc_sz); + + winstate->alloc_sz = realloc_sz; + } + + winstate->reduced_frame_map[pos] = val; +} + +/* + * update_reduced_frame + * Update reduced frame info. + */ +static +void +update_reduced_frame(WindowObject winobj, int64 pos) +{ + WindowAggState *winstate = winobj->winstate; + ListCell *lc1, + *lc2; + bool expression_result; + int num_matched_rows; + int64 original_pos; + bool anymatch; + StringInfo encoded_str; + StringInfo pattern_str = makeStringInfo(); + StringSet *str_set; + int initial_index; + VariablePos *variable_pos; + bool greedy = false; + int64 result_pos, + i; + + /* + * Set of pattern variables evaluated to true. Each character corresponds + * to pattern variable. Example: str_set[0] = "AB"; str_set[1] = "AC"; In + * this case at row 0 A and B are true, and A and C are true in row 1. + */ + + /* initialize pattern variables set */ + str_set = string_set_init(); + + /* save original pos */ + original_pos = pos; + + /* + * Check if the pattern does not include any greedy quantifier. If it does + * not, we can just apply the pattern to each row. If it succeeds, we are + * done. + */ + foreach(lc1, winstate->patternRegexpList) + { + char *quantifier = strVal(lfirst(lc1)); + + if (*quantifier == '+' || *quantifier == '*') + { + greedy = true; + break; + } + } + + /* + * Non greedy case + */ + if (!greedy) + { + num_matched_rows = 0; + + foreach(lc1, winstate->patternVariableList) + { + char *vname = strVal(lfirst(lc1)); + + encoded_str = makeStringInfo(); + +#ifdef RPR_DEBUG + elog(DEBUG1, "pos: " INT64_FORMAT " pattern vname: %s", + pos, vname); +#endif + expression_result = false; + + /* evaluate row pattern against current row */ + result_pos = evaluate_pattern(winobj, pos, vname, + encoded_str, &expression_result); + if (!expression_result || result_pos < 0) + { +#ifdef RPR_DEBUG + elog(DEBUG1, "expression result is false or out of frame"); +#endif + register_reduced_frame_map(winstate, original_pos, + RF_UNMATCHED); + return; + } + /* move to next row */ + pos++; + num_matched_rows++; + } +#ifdef RPR_DEBUG + elog(DEBUG1, "pattern matched"); +#endif + register_reduced_frame_map(winstate, original_pos, RF_FRAME_HEAD); + + for (i = original_pos + 1; i < original_pos + num_matched_rows; i++) + { + register_reduced_frame_map(winstate, i, RF_SKIPPED); + } + return; + } + + /* + * Greedy quantifiers included. Loop over until none of pattern matches or + * encounters end of frame. + */ + for (;;) + { + result_pos = -1; + + /* + * Loop over each PATTERN variable. + */ + anymatch = false; + encoded_str = makeStringInfo(); + + forboth(lc1, winstate->patternVariableList, lc2, + winstate->patternRegexpList) + { + char *vname = strVal(lfirst(lc1)); +#ifdef RPR_DEBUG + char *quantifier = strVal(lfirst(lc2)); + + elog(DEBUG1, "pos: " INT64_FORMAT " pattern vname: %s quantifier: %s", + pos, vname, quantifier); +#endif + expression_result = false; + + /* evaluate row pattern against current row */ + result_pos = evaluate_pattern(winobj, pos, vname, + encoded_str, &expression_result); + if (expression_result) + { +#ifdef RPR_DEBUG + elog(DEBUG1, "expression result is true"); +#endif + anymatch = true; + } + + /* + * If out of frame, we are done. + */ + if (result_pos < 0) + break; + } + + if (!anymatch) + { + /* none of patterns matched. */ + break; + } + + string_set_add(str_set, encoded_str); + +#ifdef RPR_DEBUG + elog(DEBUG1, "pos: " INT64_FORMAT " encoded_str: %s", + encoded_str->data); +#endif + + /* move to next row */ + pos++; + + if (result_pos < 0) + { + /* out of frame */ + break; + } + } + + if (string_set_get_size(str_set) == 0) + { + /* no match found in the first row */ + register_reduced_frame_map(winstate, original_pos, RF_UNMATCHED); + return; + } + +#ifdef RPR_DEBUG + elog(DEBUG2, "pos: " INT64_FORMAT " encoded_str: %s", + pos, encoded_str->data); +#endif + + /* build regular expression */ + pattern_str = makeStringInfo(); + appendStringInfoChar(pattern_str, '^'); + initial_index = 0; + + variable_pos = variable_pos_init(); + + forboth(lc1, winstate->patternVariableList, + lc2, winstate->patternRegexpList) + { + char *vname = strVal(lfirst(lc1)); + char *quantifier = strVal(lfirst(lc2)); + char initial; + + initial = pattern_initial(winstate, vname); + Assert(initial != 0); + appendStringInfoChar(pattern_str, initial); + if (quantifier[0]) + appendStringInfoChar(pattern_str, quantifier[0]); + + /* + * Register the initial at initial_index. If the initial appears more + * than once, all of it's initial_index will be recorded. This could + * happen if a pattern variable appears in the PATTERN clause more + * than once like "UP DOWN UP" "UP UP UP". + */ + variable_pos_register(variable_pos, initial, initial_index); + + initial_index++; + } + +#ifdef RPR_DEBUG + elog(DEBUG2, "pos: " INT64_FORMAT " pattern: %s", + pos, pattern_str->data); +#endif + + /* look for matching pattern variable sequence */ +#ifdef RPR_DEBUG + elog(DEBUG1, "search_str_set started"); +#endif + num_matched_rows = search_str_set(pattern_str->data, + str_set, variable_pos); +#ifdef RPR_DEBUG + elog(DEBUG1, "search_str_set returns: %d", num_matched_rows); +#endif + variable_pos_discard(variable_pos); + string_set_discard(str_set); + + /* + * We are at the first row in the reduced frame. Save the number of + * matched rows as the number of rows in the reduced frame. + */ + if (num_matched_rows <= 0) + { + /* no match */ + register_reduced_frame_map(winstate, original_pos, RF_UNMATCHED); + } + else + { + register_reduced_frame_map(winstate, original_pos, RF_FRAME_HEAD); + + for (i = original_pos + 1; i < original_pos + num_matched_rows; i++) + { + register_reduced_frame_map(winstate, i, RF_SKIPPED); + } + } + + return; +} + +/* + * search_str_set + * Perform pattern matching using "pattern" against str_set. pattern is a + * regular expression derived from PATTERN clause. Note that the regular + * expression string is prefixed by '^' and followed by initials represented + * in a same way as str_set. str_set is a set of StringInfo. Each StringInfo + * has a string comprising initials of pattern variable strings being true in + * a row. The initials are one of [a-y], parallel to the order of variable + * names in DEFINE clause. Suppose DEFINE has variables START, UP and DOWN. If + * PATTERN has START, UP+ and DOWN, then the initials in PATTERN will be 'a', + * 'b' and 'c'. The "pattern" will be "^ab+c". + * + * variable_pos is an array representing the order of pattern variable string + * initials in PATTERN clause. For example initial 'a' potion is in + * variable_pos[0].pos[0] = 0. Note that if the pattern is "START UP DOWN UP" + * (UP appears twice), then "UP" (initial is 'b') has two position 1 and + * 3. Thus variable_pos for b is variable_pos[1].pos[0] = 1 and + * variable_pos[1].pos[1] = 3. + * + * Returns the longest number of the matching rows (greedy matching) if + * quatifier '+' or '*' is included in "pattern". + */ +static +int +search_str_set(char *pattern, StringSet * str_set, VariablePos * variable_pos) +{ +#define MAX_CANDIDATE_NUM 10000 /* max pattern match candidate size */ +#define FREEZED_CHAR 'Z' /* a pattern is freezed if it ends with the + * char */ +#define DISCARD_CHAR 'z' /* a pattern is not need to keep */ + + int set_size; /* number of rows in the set */ + int resultlen; + int index; + StringSet *old_str_set, + *new_str_set; + int new_str_size; + int len; + + set_size = string_set_get_size(str_set); + new_str_set = string_set_init(); + len = 0; + resultlen = 0; + + /* + * Generate all possible pattern variable name initials as a set of + * StringInfo named "new_str_set". For example, if we have two rows + * having "ab" (row 0) and "ac" (row 1) in the input str_set, new_str_set + * will have set of StringInfo "aa", "ac", "ba" and "bc" in the end. + */ +#ifdef RPR_DEBUG + elog(DEBUG1, "pattern: %s set_size: %d", pattern, set_size); +#endif + for (index = 0; index < set_size; index++) + { + StringInfo str; /* search target row */ + char *p; + int old_set_size; + int i; + +#ifdef RPR_DEBUG + elog(DEBUG1, "index: %d", index); +#endif + if (index == 0) + { + /* copy variables in row 0 */ + str = string_set_get(str_set, index); + p = str->data; + + /* + * Loop over each new pattern variable char. + */ + while (*p) + { + StringInfo new = makeStringInfo(); + + /* add pattern variable char */ + appendStringInfoChar(new, *p); + /* add new one to string set */ + string_set_add(new_str_set, new); +#ifdef RPR_DEBUG + elog(DEBUG1, "old_str: NULL new_str: %s", new->data); +#endif + p++; /* next pattern variable */ + } + } + else /* index != 0 */ + { + old_str_set = new_str_set; + new_str_set = string_set_init(); + str = string_set_get(str_set, index); + old_set_size = string_set_get_size(old_str_set); + + /* + * Loop over each rows in the previous result set. + */ + for (i = 0; i < old_set_size; i++) + { + StringInfo new; + char last_old_char; + int old_str_len; + StringInfo old = string_set_get(old_str_set, i); + + p = old->data; + old_str_len = strlen(p); + if (old_str_len > 0) + last_old_char = p[old_str_len - 1]; + else + last_old_char = '\0'; + + /* Is this old set freezed? */ + if (last_old_char == FREEZED_CHAR) + { + /* if shorter match. we can discard it */ + if ((old_str_len - 1) < resultlen) + { +#ifdef RPR_DEBUG + elog(DEBUG1, "discard this old set because shorter match: %s", + old->data); +#endif + continue; + } + +#ifdef RPR_DEBUG + elog(DEBUG1, "keep this old set: %s", old->data); +#endif + + /* move the old set to new_str_set */ + string_set_add(new_str_set, old); + old_str_set->str_set[i] = NULL; + continue; + } + /* Can this old set be discarded? */ + else if (last_old_char == DISCARD_CHAR) + { +#ifdef RPR_DEBUG + elog(DEBUG1, "discard this old set: %s", old->data); +#endif + continue; + } + +#ifdef RPR_DEBUG + elog(DEBUG1, "str->data: %s", str->data); +#endif + + /* + * loop over each pattern variable initial char in the input + * set. + */ + for (p = str->data; *p; p++) + { + /* + * Optimization. Check if the row's pattern variable + * initial character position is greater than or equal to + * the old set's last pattern variable initial character + * position. For example, if the old set's last pattern + * variable initials are "ab", then the new pattern + * variable initial can be "b" or "c" but can not be "a", + * if the initials in PATTERN is something like "a b c" or + * "a b+ c+" etc. This optimization is possible when we + * only allow "+" quantifier. + */ + if (variable_pos_compare(variable_pos, last_old_char, *p)) + { + /* copy source string */ + new = makeStringInfo(); + enlargeStringInfo(new, old->len + 1); + appendStringInfoString(new, old->data); + /* add pattern variable char */ + appendStringInfoChar(new, *p); +#ifdef RPR_DEBUG + elog(DEBUG1, "old_str: %s new_str: %s", + old->data, new->data); +#endif + + /* + * Adhoc optimization. If the first letter in the + * input string is the first and second position one + * and there's no associated quatifier '+', then we + * can dicard the input because there's no chace to + * expand the string further. + * + * For example, pattern "abc" cannot match "aa". + */ +#ifdef RPR_DEBUG + elog(DEBUG1, "pattern[1]:%c pattern[2]:%c new[0]:%c new[1]:%c", + pattern[1], pattern[2], new->data[0], new->data[1]); +#endif + if (pattern[1] == new->data[0] && + pattern[1] == new->data[1] && + pattern[2] != '+' && + pattern[1] != pattern[2]) + { +#ifdef RPR_DEBUG + elog(DEBUG1, "discard this new data: %s", + new->data); +#endif + pfree(new->data); + pfree(new); + continue; + } + + /* add new one to string set */ + string_set_add(new_str_set, new); + } + else + { + /* + * We are freezing this pattern string. Since there's + * no chance to expand the string further, we perform + * pattern matching against the string. If it does not + * match, we can discard it. + */ + len = do_pattern_match(pattern, old->data); + + if (len <= 0) + { + /* no match. we can discard it */ + continue; + } + + else if (len <= resultlen) + { + /* shorter match. we can discard it */ + continue; + } + else + { + /* match length is the longest so far */ + + int new_index; + + /* remember the longest match */ + resultlen = len; + + /* freeze the pattern string */ + new = makeStringInfo(); + enlargeStringInfo(new, old->len + 1); + appendStringInfoString(new, old->data); + /* add freezed mark */ + appendStringInfoChar(new, FREEZED_CHAR); +#ifdef RPR_DEBUG + elog(DEBUG1, "old_str: %s new_str: %s", old->data, new->data); +#endif + string_set_add(new_str_set, new); + + /* + * Search new_str_set to find out freezed entries + * that have shorter match length. Mark them as + * "discard" so that they are discarded in the + * next round. + */ + + /* new_index_size should be the one before */ + new_str_size = + string_set_get_size(new_str_set) - 1; + + /* loop over new_str_set */ + for (new_index = 0; new_index < new_str_size; + new_index++) + { + char new_last_char; + int new_str_len; + + new = string_set_get(new_str_set, new_index); + new_str_len = strlen(new->data); + if (new_str_len > 0) + { + new_last_char = + new->data[new_str_len - 1]; + if (new_last_char == FREEZED_CHAR && + (new_str_len - 1) <= len) + { + /* + * mark this set to discard in the + * next round + */ + appendStringInfoChar(new, DISCARD_CHAR); +#ifdef RPR_DEBUG + elog(DEBUG1, "add discard char: %s", new->data); +#endif + } + } + } + } + } + } + } + /* we no longer need old string set */ + string_set_discard(old_str_set); + } + } + + /* + * Perform pattern matching to find out the longest match. + */ + new_str_size = string_set_get_size(new_str_set); +#ifdef RPR_DEBUG + elog(DEBUG1, "new_str_size: %d", new_str_size); +#endif + len = 0; + resultlen = 0; + + for (index = 0; index < new_str_size; index++) + { + StringInfo s; + + s = string_set_get(new_str_set, index); + if (s == NULL) + continue; /* no data */ + +#ifdef RPR_DEBUG + elog(DEBUG1, "target string: %s", s->data); +#endif + len = do_pattern_match(pattern, s->data); + if (len > resultlen) + { + /* remember the longest match */ + resultlen = len; + + /* + * If the size of result set is equal to the number of rows in the + * set, we are done because it's not possible that the number of + * matching rows exceeds the number of rows in the set. + */ + if (resultlen >= set_size) + break; + } + } + + /* we no longer need new string set */ + string_set_discard(new_str_set); + + return resultlen; +} + +/* + * do_pattern_match + * perform pattern match using pattern against encoded_str. + * returns matching number of rows if matching is succeeded. + * Otherwise returns 0. + */ +static +int +do_pattern_match(char *pattern, char *encoded_str) +{ + Datum d; + text *res; + char *substr; + int len = 0; + text *pattern_text, + *encoded_str_text; + + pattern_text = cstring_to_text(pattern); + encoded_str_text = cstring_to_text(encoded_str); + + /* + * We first perform pattern matching using regexp_instr, then call + * textregexsubstr to get matched substring to know how long the matched + * string is. That is the number of rows in the reduced window frame. The + * reason why we can't call textregexsubstr in the first place is, it + * errors out if pattern does not match. + */ + if (DatumGetInt32(DirectFunctionCall2Coll( + regexp_instr, DEFAULT_COLLATION_OID, + PointerGetDatum(encoded_str_text), + PointerGetDatum(pattern_text)))) + { + d = DirectFunctionCall2Coll(textregexsubstr, + DEFAULT_COLLATION_OID, + PointerGetDatum(encoded_str_text), + PointerGetDatum(pattern_text)); + if (d != 0) + { + res = DatumGetTextPP(d); + substr = text_to_cstring(res); + len = strlen(substr); + pfree(substr); + } + } + pfree(encoded_str_text); + pfree(pattern_text); + + return len; +} + +/* + * evaluate_pattern + * Evaluate expression associated with PATTERN variable vname. current_pos is + * relative row position in a frame (starting from 0). If vname is evaluated + * to true, initial letters associated with vname is appended to + * encode_str. result is out paramater representing the expression evaluation + * result is true of false. + *--------- + * Return values are: + * >=0: the last match absolute row position + * otherwise out of frame. + *--------- + */ +static +int64 +evaluate_pattern(WindowObject winobj, int64 current_pos, + char *vname, StringInfo encoded_str, bool *result) +{ + WindowAggState *winstate = winobj->winstate; + ExprContext *econtext = winstate->ss.ps.ps_ExprContext; + ListCell *lc1, + *lc2, + *lc3; + ExprState *pat; + Datum eval_result; + bool out_of_frame = false; + bool isnull; + TupleTableSlot *slot; + + forthree(lc1, winstate->defineVariableList, + lc2, winstate->defineClauseList, + lc3, winstate->defineInitial) + { + char initial; /* initial letter associated with vname */ + char *name = strVal(lfirst(lc1)); + + if (strcmp(vname, name)) + continue; + + initial = *(strVal(lfirst(lc3))); + + /* set expression to evaluate */ + pat = lfirst(lc2); + + /* get current, previous and next tuples */ + if (!get_slots(winobj, current_pos)) + { + out_of_frame = true; + } + else + { + /* evaluate the expression */ + eval_result = ExecEvalExpr(pat, econtext, &isnull); + if (isnull) + { + /* expression is NULL */ +#ifdef RPR_DEBUG + elog(DEBUG1, "expression for %s is NULL at row: " INT64_FORMAT, + vname, current_pos); +#endif + *result = false; + } + else + { + if (!DatumGetBool(eval_result)) + { + /* expression is false */ +#ifdef RPR_DEBUG + elog(DEBUG1, "expression for %s is false at row: " INT64_FORMAT, + vname, current_pos); +#endif + *result = false; + } + else + { + /* expression is true */ +#ifdef RPR_DEBUG + elog(DEBUG1, "expression for %s is true at row: " INT64_FORMAT, + vname, current_pos); +#endif + appendStringInfoChar(encoded_str, initial); + *result = true; + } + } + + slot = winstate->temp_slot_1; + if (slot != winstate->null_slot) + ExecClearTuple(slot); + slot = winstate->prev_slot; + if (slot != winstate->null_slot) + ExecClearTuple(slot); + slot = winstate->next_slot; + if (slot != winstate->null_slot) + ExecClearTuple(slot); + + break; + } + + if (out_of_frame) + { + *result = false; + return -1; + } + } + return current_pos; +} + +/* + * get_slots + * Get current, previous and next tuples. + * Returns false if current row is out of partition/full frame. + */ +static +bool +get_slots(WindowObject winobj, int64 current_pos) +{ + WindowAggState *winstate = winobj->winstate; + TupleTableSlot *slot; + int ret; + ExprContext *econtext; + + econtext = winstate->ss.ps.ps_ExprContext; + + /* set up current row tuple slot */ + slot = winstate->temp_slot_1; + if (!window_gettupleslot(winobj, current_pos, slot)) + { +#ifdef RPR_DEBUG + elog(DEBUG1, "current row is out of partition at:" INT64_FORMAT, + current_pos); +#endif + return false; + } + ret = row_is_in_frame(winstate, current_pos, slot); + if (ret <= 0) + { +#ifdef RPR_DEBUG + elog(DEBUG1, "current row is out of frame at: " INT64_FORMAT, + current_pos); +#endif + ExecClearTuple(slot); + return false; + } + econtext->ecxt_outertuple = slot; + + /* for PREV */ + if (current_pos > 0) + { + slot = winstate->prev_slot; + if (!window_gettupleslot(winobj, current_pos - 1, slot)) + { +#ifdef RPR_DEBUG + elog(DEBUG1, "previous row is out of partition at: " INT64_FORMAT, + current_pos - 1); +#endif + econtext->ecxt_scantuple = winstate->null_slot; + } + else + { + ret = row_is_in_frame(winstate, current_pos - 1, slot); + if (ret <= 0) + { +#ifdef RPR_DEBUG + elog(DEBUG1, "previous row is out of frame at: " INT64_FORMAT, + current_pos - 1); +#endif + ExecClearTuple(slot); + econtext->ecxt_scantuple = winstate->null_slot; + } + else + { + econtext->ecxt_scantuple = slot; + } + } + } + else + econtext->ecxt_scantuple = winstate->null_slot; + + /* for NEXT */ + slot = winstate->next_slot; + if (!window_gettupleslot(winobj, current_pos + 1, slot)) + { +#ifdef RPR_DEBUG + elog(DEBUG1, "next row is out of partiton at: " INT64_FORMAT, + current_pos + 1); +#endif + econtext->ecxt_innertuple = winstate->null_slot; + } + else + { + ret = row_is_in_frame(winstate, current_pos + 1, slot); + if (ret <= 0) + { +#ifdef RPR_DEBUG + elog(DEBUG1, "next row is out of frame at: " INT64_FORMAT, + current_pos + 1); +#endif + ExecClearTuple(slot); + econtext->ecxt_innertuple = winstate->null_slot; + } + else + econtext->ecxt_innertuple = slot; + } + return true; +} + +/* + * pattern_initial + * Return pattern variable initial character + * matching with pattern variable name vname. + * If not found, return 0. + */ +static +char +pattern_initial(WindowAggState *winstate, char *vname) +{ + char initial; + char *name; + ListCell *lc1, + *lc2; + + forboth(lc1, winstate->defineVariableList, + lc2, winstate->defineInitial) + { + name = strVal(lfirst(lc1)); /* DEFINE variable name */ + initial = *(strVal(lfirst(lc2))); /* DEFINE variable initial */ + + + if (!strcmp(name, vname)) + return initial; /* found */ + } + return 0; +} + +/* + * string_set_init + * Create dynamic set of StringInfo. + */ +static +StringSet * string_set_init(void) +{ +/* Initial allocation size of str_set */ +#define STRING_SET_ALLOC_SIZE 1024 + + StringSet *string_set; + Size set_size; + + string_set = palloc0(sizeof(StringSet)); + string_set->set_index = 0; + set_size = STRING_SET_ALLOC_SIZE; + string_set->str_set = palloc(set_size * sizeof(StringInfo)); + string_set->set_size = set_size; + + return string_set; +} + +/* + * string_set_add + * Add StringInfo str to StringSet string_set. + */ +static +void +string_set_add(StringSet * string_set, StringInfo str) +{ + Size set_size; + + set_size = string_set->set_size; + if (string_set->set_index >= set_size) + { + set_size *= 2; + string_set->str_set = repalloc(string_set->str_set, + set_size * sizeof(StringInfo)); + string_set->set_size = set_size; + } + + string_set->str_set[string_set->set_index++] = str; + + return; +} + +/* + * string_set_get + * Returns StringInfo specified by index. + * If there's no data yet, returns NULL. + */ +static +StringInfo +string_set_get(StringSet * string_set, int index) +{ + /* no data? */ + if (index == 0 && string_set->set_index == 0) + return NULL; + + if (index < 0 || index >= string_set->set_index) + elog(ERROR, "invalid index: %d", index); + + return string_set->str_set[index]; +} + +/* + * string_set_get_size + * Returns the size of StringSet. + */ +static +int +string_set_get_size(StringSet * string_set) +{ + return string_set->set_index; +} + +/* + * string_set_discard + * Discard StringSet. + * All memory including StringSet itself is freed. + */ +static +void +string_set_discard(StringSet * string_set) +{ + int i; + + for (i = 0; i < string_set->set_index; i++) + { + StringInfo str = string_set->str_set[i]; + + if (str) + { + pfree(str->data); + pfree(str); + } + } + pfree(string_set->str_set); + pfree(string_set); +} + +/* + * variable_pos_init + * Create and initialize variable postion structure + */ +static +VariablePos * variable_pos_init(void) +{ + VariablePos *variable_pos; + + variable_pos = palloc(sizeof(VariablePos) * NUM_ALPHABETS); + MemSet(variable_pos, -1, sizeof(VariablePos) * NUM_ALPHABETS); + return variable_pos; +} + +/* + * variable_pos_register + * Register pattern variable whose initial is initial into postion index. + * pos is position of initial. + * If pos is already registered, register it at next empty slot. + */ +static +void +variable_pos_register(VariablePos * variable_pos, char initial, int pos) +{ + int index = initial - 'a'; + int slot; + int i; + + if (pos < 0 || pos > NUM_ALPHABETS) + elog(ERROR, "initial is not valid char: %c", initial); + + for (i = 0; i < NUM_ALPHABETS; i++) + { + slot = variable_pos[index].pos[i]; + if (slot < 0) + { + /* empty slot found */ + variable_pos[index].pos[i] = pos; + return; + } + } + elog(ERROR, "no empty slot for initial: %c", initial); +} + +/* + * variable_pos_compare + * Returns true if initial1 can be followed by initial2 + */ +static +bool +variable_pos_compare(VariablePos * variable_pos, char initial1, char initial2) +{ + int index1, + index2; + int pos1, + pos2; + + for (index1 = 0;; index1++) + { + pos1 = variable_pos_fetch(variable_pos, initial1, index1); + if (pos1 < 0) + break; + + for (index2 = 0;; index2++) + { + pos2 = variable_pos_fetch(variable_pos, initial2, index2); + if (pos2 < 0) + break; + if (pos1 <= pos2) + return true; + } + } + return false; +} + +/* + * variable_pos_fetch + * Fetch position of pattern variable whose initial is initial, and whose index + * is index. If no postion was registered by initial, index, returns -1. + */ +static +int +variable_pos_fetch(VariablePos * variable_pos, char initial, int index) +{ + int pos = initial - 'a'; + + if (pos < 0 || pos > NUM_ALPHABETS) + elog(ERROR, "initial is not valid char: %c", initial); + + if (index < 0 || index > NUM_ALPHABETS) + elog(ERROR, "index is not valid: %d", index); + + return variable_pos[pos].pos[index]; +} + +/* + * variable_pos_discard + * Discard VariablePos + */ +static +void +variable_pos_discard(VariablePos * variable_pos) +{ + pfree(variable_pos); +} diff --git a/src/backend/utils/adt/windowfuncs.c b/src/backend/utils/adt/windowfuncs.c index 473c61569f..92c528d38c 100644 --- a/src/backend/utils/adt/windowfuncs.c +++ b/src/backend/utils/adt/windowfuncs.c @@ -13,6 +13,9 @@ */ #include "postgres.h" +#include "catalog/pg_collation_d.h" +#include "executor/executor.h" +#include "nodes/execnodes.h" #include "nodes/parsenodes.h" #include "nodes/supportnodes.h" #include "utils/fmgrprotos.h" @@ -37,11 +40,19 @@ typedef struct int64 remainder; /* (total rows) % (bucket num) */ } ntile_context; +/* + * rpr process information. + * Used for AFTER MATCH SKIP PAST LAST ROW + */ +typedef struct SkipContext +{ + int64 pos; /* last row absolute position */ +} SkipContext; + static bool rank_up(WindowObject winobj); static Datum leadlag_common(FunctionCallInfo fcinfo, bool forward, bool withoffset, bool withdefault); - /* * utility routine for *_rank functions. */ @@ -674,7 +685,7 @@ window_last_value(PG_FUNCTION_ARGS) bool isnull; result = WinGetFuncArgInFrame(winobj, 0, - 0, WINDOW_SEEK_TAIL, true, + 0, WINDOW_SEEK_TAIL, false, &isnull, NULL); if (isnull) PG_RETURN_NULL(); @@ -714,3 +725,25 @@ window_nth_value(PG_FUNCTION_ARGS) PG_RETURN_DATUM(result); } + +/* + * prev + * Dummy function to invoke RPR's navigation operator "PREV". + * This is *not* a window function. + */ +Datum +window_prev(PG_FUNCTION_ARGS) +{ + PG_RETURN_DATUM(PG_GETARG_DATUM(0)); +} + +/* + * next + * Dummy function to invoke RPR's navigation operation "NEXT". + * This is *not* a window function. + */ +Datum +window_next(PG_FUNCTION_ARGS) +{ + PG_RETURN_DATUM(PG_GETARG_DATUM(0)); +} diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat index 6a5476d3c4..5e7506dabb 100644 --- a/src/include/catalog/pg_proc.dat +++ b/src/include/catalog/pg_proc.dat @@ -10479,6 +10479,12 @@ { oid => '3114', descr => 'fetch the Nth row value', proname => 'nth_value', prokind => 'w', prorettype => 'anyelement', proargtypes => 'anyelement int4', prosrc => 'window_nth_value' }, +{ oid => '6122', descr => 'previous value', + proname => 'prev', provolatile => 's', prorettype => 'anyelement', + proargtypes => 'anyelement', prosrc => 'window_prev' }, +{ oid => '6123', descr => 'next value', + proname => 'next', provolatile => 's', prorettype => 'anyelement', + proargtypes => 'anyelement', prosrc => 'window_next' }, # functions for range types { oid => '3832', descr => 'I/O', diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h index 8bc421e7c0..4dd9a17eca 100644 --- a/src/include/nodes/execnodes.h +++ b/src/include/nodes/execnodes.h @@ -2554,6 +2554,11 @@ typedef enum WindowAggStatus * tuples during spool */ } WindowAggStatus; +#define RF_NOT_DETERMINED 0 +#define RF_FRAME_HEAD 1 +#define RF_SKIPPED 2 +#define RF_UNMATCHED 3 + typedef struct WindowAggState { ScanState ss; /* its first field is NodeTag */ @@ -2602,6 +2607,19 @@ typedef struct WindowAggState int64 groupheadpos; /* current row's peer group head position */ int64 grouptailpos; /* " " " " tail position (group end+1) */ + /* these fields are used in Row pattern recognition: */ + RPSkipTo rpSkipTo; /* Row Pattern Skip To type */ + List *patternVariableList; /* list of row pattern variables names + * (list of String) */ + List *patternRegexpList; /* list of row pattern regular expressions + * ('+' or ''. list of String) */ + List *defineVariableList; /* list of row pattern definition + * variables (list of String) */ + List *defineClauseList; /* expression for row pattern definition + * search conditions ExprState list */ + List *defineInitial; /* list of row pattern definition variable + * initials (list of String) */ + MemoryContext partcontext; /* context for partition-lifespan data */ MemoryContext aggcontext; /* shared context for aggregate working data */ MemoryContext curaggcontext; /* current aggregate's working data */ @@ -2638,6 +2656,18 @@ typedef struct WindowAggState TupleTableSlot *agg_row_slot; TupleTableSlot *temp_slot_1; TupleTableSlot *temp_slot_2; + + /* temporary slots for RPR */ + TupleTableSlot *prev_slot; /* PREV row navigation operator */ + TupleTableSlot *next_slot; /* NEXT row navigation operator */ + TupleTableSlot *null_slot; /* all NULL slot */ + + /* + * Each byte corresponds to a row positioned at absolute its pos in + * partition. See above definition for RF_* + */ + char *reduced_frame_map; + int64 alloc_sz; /* size of the map */ } WindowAggState; /* ---------------- -- 2.25.1 From 35ca0b9e84432fd98091d4d26ea7232f5aed75cf Mon Sep 17 00:00:00 2001 From: Tatsuo Ishii <ishii@postgresql.org> Date: Fri, 24 May 2024 11:26:02 +0900 Subject: [PATCH v20 6/8] Row pattern recognition patch (docs). --- doc/src/sgml/advanced.sgml | 82 ++++++++++++++++++++++++++++++++++++ doc/src/sgml/func.sgml | 54 ++++++++++++++++++++++++ doc/src/sgml/ref/select.sgml | 38 ++++++++++++++++- 3 files changed, 172 insertions(+), 2 deletions(-) diff --git a/doc/src/sgml/advanced.sgml b/doc/src/sgml/advanced.sgml index 755c9f1485..b0b1d1c51e 100644 --- a/doc/src/sgml/advanced.sgml +++ b/doc/src/sgml/advanced.sgml @@ -537,6 +537,88 @@ WHERE pos < 3; <literal>rank</literal> less than 3. </para> + <para> + Row pattern common syntax can be used to perform row pattern recognition + in a query. The row pattern common syntax includes two sub + clauses: <literal>DEFINE</literal> + and <literal>PATTERN</literal>. <literal>DEFINE</literal> defines + definition variables along with an expression. The expression must be a + logical expression, which means it must + return <literal>TRUE</literal>, <literal>FALSE</literal> + or <literal>NULL</literal>. The expression may comprise column references + and functions. Window functions, aggregate functions and subqueries are + not allowed. An example of <literal>DEFINE</literal> is as follows. + +<programlisting> +DEFINE + LOWPRICE AS price <= 100, + UP AS price > PREV(price), + DOWN AS price < PREV(price) +</programlisting> + + Note that <function>PREV</function> returns the price column in the + previous row if it's called in a context of row pattern recognition. Thus in + the second line the definition variable "UP" is <literal>TRUE</literal> + when the price column in the current row is greater than the price column + in the previous row. Likewise, "DOWN" is <literal>TRUE</literal> when when + the price column in the current row is lower than the price column in the + previous row. + </para> + <para> + Once <literal>DEFINE</literal> exists, <literal>PATTERN</literal> can be + used. <literal>PATTERN</literal> defines a sequence of rows that satisfies + certain conditions. For example following <literal>PATTERN</literal> + defines that a row starts with the condition "LOWPRICE", then one or more + rows satisfy "UP" and finally one or more rows satisfy "DOWN". Note that + "+" means one or more matches. Also you can use "*", which means zero or + more matches. If a sequence of rows which satisfies the PATTERN is found, + in the starting row of the sequence of rows all window functions and + aggregates are shown in the target list. Note that aggregations only look + into the matched rows, rather than whole frame. On the second or + subsequent rows all window functions are NULL. Aggregates are NULL or 0 + (count case) depending on its aggregation definition. For rows that do not + match on the PATTERN, all window functions and aggregates are shown AS + NULL too, except count showing 0. This is because the rows do not match, + thus they are in an empty frame. Example of a <literal>SELECT</literal> + using the <literal>DEFINE</literal> and <literal>PATTERN</literal> clause + is as follows. + +<programlisting> +SELECT company, tdate, price, + first_value(price) OVER w, + max(price) OVER w, + count(price) OVER w +FROM stock + WINDOW w AS ( + PARTITION BY company + ORDER BY tdate + ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING + AFTER MATCH SKIP PAST LAST ROW + INITIAL + PATTERN (LOWPRICE UP+ DOWN+) + DEFINE + LOWPRICE AS price <= 100, + UP AS price > PREV(price), + DOWN AS price < PREV(price) +); +</programlisting> +<screen> + company | tdate | price | first_value | max | count +----------+------------+-------+-------------+-----+------- + company1 | 2023-07-01 | 100 | 100 | 200 | 4 + company1 | 2023-07-02 | 200 | | | 0 + company1 | 2023-07-03 | 150 | | | 0 + company1 | 2023-07-04 | 140 | | | 0 + company1 | 2023-07-05 | 150 | | | 0 + company1 | 2023-07-06 | 90 | 90 | 130 | 4 + company1 | 2023-07-07 | 110 | | | 0 + company1 | 2023-07-08 | 130 | | | 0 + company1 | 2023-07-09 | 120 | | | 0 + company1 | 2023-07-10 | 130 | | | 0 +(10 rows) +</screen> + </para> + <para> When a query involves multiple window functions, it is possible to write out each one with a separate <literal>OVER</literal> clause, but this is diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml index 17c44bc338..8dbab31300 100644 --- a/doc/src/sgml/func.sgml +++ b/doc/src/sgml/func.sgml @@ -23124,6 +23124,7 @@ SELECT count(*) FROM sometable; returns <literal>NULL</literal> if there is no such row. </para></entry> </row> + </tbody> </tgroup> </table> @@ -23163,6 +23164,59 @@ SELECT count(*) FROM sometable; Other frame specifications can be used to obtain other effects. </para> + <para> + Row pattern recognition navigation functions are listed in + <xref linkend="functions-rpr-navigation-table"/>. These functions + can be used to describe DEFINE clause of Row pattern recognition. + </para> + + <table id="functions-rpr-navigation-table"> + <title>Row Pattern Navigation Functions</title> + <tgroup cols="1"> + <thead> + <row> + <entry role="func_table_entry"><para role="func_signature"> + Function + </para> + <para> + Description + </para></entry> + </row> + </thead> + + <tbody> + <row> + <entry role="func_table_entry"><para role="func_signature"> + <indexterm> + <primary>prev</primary> + </indexterm> + <function>prev</function> ( <parameter>value</parameter> <type>anyelement</type> ) + <returnvalue>anyelement</returnvalue> + </para> + <para> + Returns the column value at the previous row; + returns NULL if there is no previous row in the window frame. + </para></entry> + </row> + + <row> + <entry role="func_table_entry"><para role="func_signature"> + <indexterm> + <primary>next</primary> + </indexterm> + <function>next</function> ( <parameter>value</parameter> <type>anyelement</type> ) + <returnvalue>anyelement</returnvalue> + </para> + <para> + Returns the column value at the next row; + returns NULL if there is no next row in the window frame. + </para></entry> + </row> + + </tbody> + </tgroup> + </table> + <note> <para> The SQL standard defines a <literal>RESPECT NULLS</literal> or diff --git a/doc/src/sgml/ref/select.sgml b/doc/src/sgml/ref/select.sgml index 066aed44e6..8f18718d58 100644 --- a/doc/src/sgml/ref/select.sgml +++ b/doc/src/sgml/ref/select.sgml @@ -969,8 +969,8 @@ WINDOW <replaceable class="parameter">window_name</replaceable> AS ( <replaceabl The <replaceable class="parameter">frame_clause</replaceable> can be one of <synopsis> -{ RANGE | ROWS | GROUPS } <replaceable>frame_start</replaceable> [ <replaceable>frame_exclusion</replaceable> ] -{ RANGE | ROWS | GROUPS } BETWEEN <replaceable>frame_start</replaceable> AND <replaceable>frame_end</replaceable> [ <replaceable>frame_exclusion</replaceable>] +{ RANGE | ROWS | GROUPS } <replaceable>frame_start</replaceable> [ <replaceable>frame_exclusion</replaceable> ] [row_pattern_common_syntax] +{ RANGE | ROWS | GROUPS } BETWEEN <replaceable>frame_start</replaceable> AND <replaceable>frame_end</replaceable> [ <replaceable>frame_exclusion</replaceable>] [row_pattern_common_syntax] </synopsis> where <replaceable>frame_start</replaceable> @@ -1077,6 +1077,40 @@ EXCLUDE NO OTHERS a given peer group will be in the frame or excluded from it. </para> + <para> + The + optional <replaceable class="parameter">row_pattern_common_syntax</replaceable> + defines the <firstterm>row pattern recognition condition</firstterm> for + this + window. <replaceable class="parameter">row_pattern_common_syntax</replaceable> + includes following subclauses. <literal>AFTER MATCH SKIP PAST LAST + ROW</literal> or <literal>AFTER MATCH SKIP TO NEXT ROW</literal> controls + how to proceed to next row position after a match + found. With <literal>AFTER MATCH SKIP PAST LAST ROW</literal> (the + default) next row position is next to the last row of previous match. On + the other hand, with <literal>AFTER MATCH SKIP TO NEXT ROW</literal> next + row position is always next to the last row of previous + match. <literal>DEFINE</literal> defines definition variables along with a + boolean expression. <literal>PATTERN</literal> defines a sequence of rows + that satisfies certain conditions using variables defined + in <literal>DEFINE</literal> clause. If the variable is not defined in + the <literal>DEFINE</literal> clause, it is implicitly assumed + following is defined in the <literal>DEFINE</literal> clause. + +<synopsis> +<literal>variable_name</literal> AS TRUE +</synopsis> + + Note that the maximu number of variables defined + in <literal>DEFINE</literal> clause is 26. + +<synopsis> +[ AFTER MATCH SKIP PAST LAST ROW | AFTER MATCH SKIP TO NEXT ROW ] +PATTERN <replaceable class="parameter">pattern_variable_name</replaceable>[+] [, ...] +DEFINE <replaceable class="parameter">definition_varible_name</replaceable> AS <replaceable class="parameter">expression</replaceable>[, ...] +</synopsis> + </para> + <para> The purpose of a <literal>WINDOW</literal> clause is to specify the behavior of <firstterm>window functions</firstterm> appearing in the query's -- 2.25.1 From 2092d80f01b924a86bedd40845c5919c96643377 Mon Sep 17 00:00:00 2001 From: Tatsuo Ishii <ishii@postgresql.org> Date: Fri, 24 May 2024 11:26:02 +0900 Subject: [PATCH v20 7/8] Row pattern recognition patch (tests). --- src/test/regress/expected/rpr.out | 836 +++++++++++++++++++++++++++++ src/test/regress/parallel_schedule | 2 +- src/test/regress/sql/rpr.sql | 405 ++++++++++++++ 3 files changed, 1242 insertions(+), 1 deletion(-) create mode 100644 src/test/regress/expected/rpr.out create mode 100644 src/test/regress/sql/rpr.sql diff --git a/src/test/regress/expected/rpr.out b/src/test/regress/expected/rpr.out new file mode 100644 index 0000000000..0789e09435 --- /dev/null +++ b/src/test/regress/expected/rpr.out @@ -0,0 +1,836 @@ +-- +-- Test for row pattern definition clause +-- +CREATE TEMP TABLE stock ( + company TEXT, + tdate DATE, + price INTEGER +); +INSERT INTO stock VALUES ('company1', '2023-07-01', 100); +INSERT INTO stock VALUES ('company1', '2023-07-02', 200); +INSERT INTO stock VALUES ('company1', '2023-07-03', 150); +INSERT INTO stock VALUES ('company1', '2023-07-04', 140); +INSERT INTO stock VALUES ('company1', '2023-07-05', 150); +INSERT INTO stock VALUES ('company1', '2023-07-06', 90); +INSERT INTO stock VALUES ('company1', '2023-07-07', 110); +INSERT INTO stock VALUES ('company1', '2023-07-08', 130); +INSERT INTO stock VALUES ('company1', '2023-07-09', 120); +INSERT INTO stock VALUES ('company1', '2023-07-10', 130); +INSERT INTO stock VALUES ('company2', '2023-07-01', 50); +INSERT INTO stock VALUES ('company2', '2023-07-02', 2000); +INSERT INTO stock VALUES ('company2', '2023-07-03', 1500); +INSERT INTO stock VALUES ('company2', '2023-07-04', 1400); +INSERT INTO stock VALUES ('company2', '2023-07-05', 1500); +INSERT INTO stock VALUES ('company2', '2023-07-06', 60); +INSERT INTO stock VALUES ('company2', '2023-07-07', 1100); +INSERT INTO stock VALUES ('company2', '2023-07-08', 1300); +INSERT INTO stock VALUES ('company2', '2023-07-09', 1200); +INSERT INTO stock VALUES ('company2', '2023-07-10', 1300); +SELECT * FROM stock; + company | tdate | price +----------+------------+------- + company1 | 07-01-2023 | 100 + company1 | 07-02-2023 | 200 + company1 | 07-03-2023 | 150 + company1 | 07-04-2023 | 140 + company1 | 07-05-2023 | 150 + company1 | 07-06-2023 | 90 + company1 | 07-07-2023 | 110 + company1 | 07-08-2023 | 130 + company1 | 07-09-2023 | 120 + company1 | 07-10-2023 | 130 + company2 | 07-01-2023 | 50 + company2 | 07-02-2023 | 2000 + company2 | 07-03-2023 | 1500 + company2 | 07-04-2023 | 1400 + company2 | 07-05-2023 | 1500 + company2 | 07-06-2023 | 60 + company2 | 07-07-2023 | 1100 + company2 | 07-08-2023 | 1300 + company2 | 07-09-2023 | 1200 + company2 | 07-10-2023 | 1300 +(20 rows) + +-- basic test using PREV +SELECT company, tdate, price, first_value(price) OVER w, last_value(price) OVER w, + nth_value(tdate, 2) OVER w AS nth_second + FROM stock + WINDOW w AS ( + PARTITION BY company + ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING + INITIAL + PATTERN (START UP+ DOWN+) + DEFINE + START AS TRUE, + UP AS price > PREV(price), + DOWN AS price < PREV(price) +); + company | tdate | price | first_value | last_value | nth_second +----------+------------+-------+-------------+------------+------------ + company1 | 07-01-2023 | 100 | 100 | 140 | 07-02-2023 + company1 | 07-02-2023 | 200 | | | + company1 | 07-03-2023 | 150 | | | + company1 | 07-04-2023 | 140 | | | + company1 | 07-05-2023 | 150 | | | + company1 | 07-06-2023 | 90 | 90 | 120 | 07-07-2023 + company1 | 07-07-2023 | 110 | | | + company1 | 07-08-2023 | 130 | | | + company1 | 07-09-2023 | 120 | | | + company1 | 07-10-2023 | 130 | | | + company2 | 07-01-2023 | 50 | 50 | 1400 | 07-02-2023 + company2 | 07-02-2023 | 2000 | | | + company2 | 07-03-2023 | 1500 | | | + company2 | 07-04-2023 | 1400 | | | + company2 | 07-05-2023 | 1500 | | | + company2 | 07-06-2023 | 60 | 60 | 1200 | 07-07-2023 + company2 | 07-07-2023 | 1100 | | | + company2 | 07-08-2023 | 1300 | | | + company2 | 07-09-2023 | 1200 | | | + company2 | 07-10-2023 | 1300 | | | +(20 rows) + +-- basic test using PREV. UP appears twice +SELECT company, tdate, price, first_value(price) OVER w, last_value(price) OVER w, + nth_value(tdate, 2) OVER w AS nth_second + FROM stock + WINDOW w AS ( + PARTITION BY company + ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING + INITIAL + PATTERN (START UP+ DOWN+ UP+) + DEFINE + START AS TRUE, + UP AS price > PREV(price), + DOWN AS price < PREV(price) +); + company | tdate | price | first_value | last_value | nth_second +----------+------------+-------+-------------+------------+------------ + company1 | 07-01-2023 | 100 | 100 | 150 | 07-02-2023 + company1 | 07-02-2023 | 200 | | | + company1 | 07-03-2023 | 150 | | | + company1 | 07-04-2023 | 140 | | | + company1 | 07-05-2023 | 150 | | | + company1 | 07-06-2023 | 90 | 90 | 130 | 07-07-2023 + company1 | 07-07-2023 | 110 | | | + company1 | 07-08-2023 | 130 | | | + company1 | 07-09-2023 | 120 | | | + company1 | 07-10-2023 | 130 | | | + company2 | 07-01-2023 | 50 | 50 | 1500 | 07-02-2023 + company2 | 07-02-2023 | 2000 | | | + company2 | 07-03-2023 | 1500 | | | + company2 | 07-04-2023 | 1400 | | | + company2 | 07-05-2023 | 1500 | | | + company2 | 07-06-2023 | 60 | 60 | 1300 | 07-07-2023 + company2 | 07-07-2023 | 1100 | | | + company2 | 07-08-2023 | 1300 | | | + company2 | 07-09-2023 | 1200 | | | + company2 | 07-10-2023 | 1300 | | | +(20 rows) + +-- basic test using PREV. Use '*' +SELECT company, tdate, price, first_value(price) OVER w, last_value(price) OVER w, + nth_value(tdate, 2) OVER w AS nth_second + FROM stock + WINDOW w AS ( + PARTITION BY company + ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING + INITIAL + PATTERN (START UP* DOWN+) + DEFINE + START AS TRUE, + UP AS price > PREV(price), + DOWN AS price < PREV(price) +); + company | tdate | price | first_value | last_value | nth_second +----------+------------+-------+-------------+------------+------------ + company1 | 07-01-2023 | 100 | 100 | 140 | 07-02-2023 + company1 | 07-02-2023 | 200 | | | + company1 | 07-03-2023 | 150 | | | + company1 | 07-04-2023 | 140 | | | + company1 | 07-05-2023 | 150 | 150 | 90 | 07-06-2023 + company1 | 07-06-2023 | 90 | | | + company1 | 07-07-2023 | 110 | 110 | 120 | 07-08-2023 + company1 | 07-08-2023 | 130 | | | + company1 | 07-09-2023 | 120 | | | + company1 | 07-10-2023 | 130 | | | + company2 | 07-01-2023 | 50 | 50 | 1400 | 07-02-2023 + company2 | 07-02-2023 | 2000 | | | + company2 | 07-03-2023 | 1500 | | | + company2 | 07-04-2023 | 1400 | | | + company2 | 07-05-2023 | 1500 | 1500 | 60 | 07-06-2023 + company2 | 07-06-2023 | 60 | | | + company2 | 07-07-2023 | 1100 | 1100 | 1200 | 07-08-2023 + company2 | 07-08-2023 | 1300 | | | + company2 | 07-09-2023 | 1200 | | | + company2 | 07-10-2023 | 1300 | | | +(20 rows) + +-- basic test with none greedy pattern +SELECT company, tdate, price, count(*) OVER w + FROM stock + WINDOW w AS ( + PARTITION BY company + ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING + INITIAL + PATTERN (A A A) + DEFINE + A AS price >= 140 AND price <= 150 +); + company | tdate | price | count +----------+------------+-------+------- + company1 | 07-01-2023 | 100 | 0 + company1 | 07-02-2023 | 200 | 0 + company1 | 07-03-2023 | 150 | 3 + company1 | 07-04-2023 | 140 | 0 + company1 | 07-05-2023 | 150 | 0 + company1 | 07-06-2023 | 90 | 0 + company1 | 07-07-2023 | 110 | 0 + company1 | 07-08-2023 | 130 | 0 + company1 | 07-09-2023 | 120 | 0 + company1 | 07-10-2023 | 130 | 0 + company2 | 07-01-2023 | 50 | 0 + company2 | 07-02-2023 | 2000 | 0 + company2 | 07-03-2023 | 1500 | 0 + company2 | 07-04-2023 | 1400 | 0 + company2 | 07-05-2023 | 1500 | 0 + company2 | 07-06-2023 | 60 | 0 + company2 | 07-07-2023 | 1100 | 0 + company2 | 07-08-2023 | 1300 | 0 + company2 | 07-09-2023 | 1200 | 0 + company2 | 07-10-2023 | 1300 | 0 +(20 rows) + +-- last_value() should remain consistent +SELECT company, tdate, price, last_value(price) OVER w + FROM stock + WINDOW w AS ( + PARTITION BY company + ORDER BY tdate + ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING + INITIAL + PATTERN (START UP+ DOWN+) + DEFINE + START AS TRUE, + UP AS price > PREV(price), + DOWN AS price < PREV(price) +); + company | tdate | price | last_value +----------+------------+-------+------------ + company1 | 07-01-2023 | 100 | 140 + company1 | 07-02-2023 | 200 | + company1 | 07-03-2023 | 150 | + company1 | 07-04-2023 | 140 | + company1 | 07-05-2023 | 150 | + company1 | 07-06-2023 | 90 | 120 + company1 | 07-07-2023 | 110 | + company1 | 07-08-2023 | 130 | + company1 | 07-09-2023 | 120 | + company1 | 07-10-2023 | 130 | + company2 | 07-01-2023 | 50 | 1400 + company2 | 07-02-2023 | 2000 | + company2 | 07-03-2023 | 1500 | + company2 | 07-04-2023 | 1400 | + company2 | 07-05-2023 | 1500 | + company2 | 07-06-2023 | 60 | 1200 + company2 | 07-07-2023 | 1100 | + company2 | 07-08-2023 | 1300 | + company2 | 07-09-2023 | 1200 | + company2 | 07-10-2023 | 1300 | +(20 rows) + +-- omit "START" in DEFINE but it is ok because "START AS TRUE" is +-- implicitly defined. per spec. +SELECT company, tdate, price, first_value(price) OVER w, last_value(price) OVER w, + nth_value(tdate, 2) OVER w AS nth_second + FROM stock + WINDOW w AS ( + PARTITION BY company + ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING + INITIAL + PATTERN (START UP+ DOWN+) + DEFINE + UP AS price > PREV(price), + DOWN AS price < PREV(price) +); + company | tdate | price | first_value | last_value | nth_second +----------+------------+-------+-------------+------------+------------ + company1 | 07-01-2023 | 100 | 100 | 140 | 07-02-2023 + company1 | 07-02-2023 | 200 | | | + company1 | 07-03-2023 | 150 | | | + company1 | 07-04-2023 | 140 | | | + company1 | 07-05-2023 | 150 | | | + company1 | 07-06-2023 | 90 | 90 | 120 | 07-07-2023 + company1 | 07-07-2023 | 110 | | | + company1 | 07-08-2023 | 130 | | | + company1 | 07-09-2023 | 120 | | | + company1 | 07-10-2023 | 130 | | | + company2 | 07-01-2023 | 50 | 50 | 1400 | 07-02-2023 + company2 | 07-02-2023 | 2000 | | | + company2 | 07-03-2023 | 1500 | | | + company2 | 07-04-2023 | 1400 | | | + company2 | 07-05-2023 | 1500 | | | + company2 | 07-06-2023 | 60 | 60 | 1200 | 07-07-2023 + company2 | 07-07-2023 | 1100 | | | + company2 | 07-08-2023 | 1300 | | | + company2 | 07-09-2023 | 1200 | | | + company2 | 07-10-2023 | 1300 | | | +(20 rows) + +-- the first row start with less than or equal to 100 +SELECT company, tdate, price, first_value(price) OVER w, last_value(price) OVER w + FROM stock + WINDOW w AS ( + PARTITION BY company + ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING + INITIAL + PATTERN (LOWPRICE UP+ DOWN+) + DEFINE + LOWPRICE AS price <= 100, + UP AS price > PREV(price), + DOWN AS price < PREV(price) +); + company | tdate | price | first_value | last_value +----------+------------+-------+-------------+------------ + company1 | 07-01-2023 | 100 | 100 | 140 + company1 | 07-02-2023 | 200 | | + company1 | 07-03-2023 | 150 | | + company1 | 07-04-2023 | 140 | | + company1 | 07-05-2023 | 150 | | + company1 | 07-06-2023 | 90 | 90 | 120 + company1 | 07-07-2023 | 110 | | + company1 | 07-08-2023 | 130 | | + company1 | 07-09-2023 | 120 | | + company1 | 07-10-2023 | 130 | | + company2 | 07-01-2023 | 50 | 50 | 1400 + company2 | 07-02-2023 | 2000 | | + company2 | 07-03-2023 | 1500 | | + company2 | 07-04-2023 | 1400 | | + company2 | 07-05-2023 | 1500 | | + company2 | 07-06-2023 | 60 | 60 | 1200 + company2 | 07-07-2023 | 1100 | | + company2 | 07-08-2023 | 1300 | | + company2 | 07-09-2023 | 1200 | | + company2 | 07-10-2023 | 1300 | | +(20 rows) + +-- second row raises 120% +SELECT company, tdate, price, first_value(price) OVER w, last_value(price) OVER w + FROM stock + WINDOW w AS ( + PARTITION BY company + ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING + INITIAL + PATTERN (LOWPRICE UP+ DOWN+) + DEFINE + LOWPRICE AS price <= 100, + UP AS price > PREV(price) * 1.2, + DOWN AS price < PREV(price) +); + company | tdate | price | first_value | last_value +----------+------------+-------+-------------+------------ + company1 | 07-01-2023 | 100 | 100 | 140 + company1 | 07-02-2023 | 200 | | + company1 | 07-03-2023 | 150 | | + company1 | 07-04-2023 | 140 | | + company1 | 07-05-2023 | 150 | | + company1 | 07-06-2023 | 90 | | + company1 | 07-07-2023 | 110 | | + company1 | 07-08-2023 | 130 | | + company1 | 07-09-2023 | 120 | | + company1 | 07-10-2023 | 130 | | + company2 | 07-01-2023 | 50 | 50 | 1400 + company2 | 07-02-2023 | 2000 | | + company2 | 07-03-2023 | 1500 | | + company2 | 07-04-2023 | 1400 | | + company2 | 07-05-2023 | 1500 | | + company2 | 07-06-2023 | 60 | | + company2 | 07-07-2023 | 1100 | | + company2 | 07-08-2023 | 1300 | | + company2 | 07-09-2023 | 1200 | | + company2 | 07-10-2023 | 1300 | | +(20 rows) + +-- using NEXT +SELECT company, tdate, price, first_value(price) OVER w, last_value(price) OVER w + FROM stock + WINDOW w AS ( + PARTITION BY company + ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING + INITIAL + PATTERN (START UPDOWN) + DEFINE + START AS TRUE, + UPDOWN AS price > PREV(price) AND price > NEXT(price) +); + company | tdate | price | first_value | last_value +----------+------------+-------+-------------+------------ + company1 | 07-01-2023 | 100 | 100 | 200 + company1 | 07-02-2023 | 200 | | + company1 | 07-03-2023 | 150 | | + company1 | 07-04-2023 | 140 | 140 | 150 + company1 | 07-05-2023 | 150 | | + company1 | 07-06-2023 | 90 | | + company1 | 07-07-2023 | 110 | 110 | 130 + company1 | 07-08-2023 | 130 | | + company1 | 07-09-2023 | 120 | | + company1 | 07-10-2023 | 130 | | + company2 | 07-01-2023 | 50 | 50 | 2000 + company2 | 07-02-2023 | 2000 | | + company2 | 07-03-2023 | 1500 | | + company2 | 07-04-2023 | 1400 | 1400 | 1500 + company2 | 07-05-2023 | 1500 | | + company2 | 07-06-2023 | 60 | | + company2 | 07-07-2023 | 1100 | 1100 | 1300 + company2 | 07-08-2023 | 1300 | | + company2 | 07-09-2023 | 1200 | | + company2 | 07-10-2023 | 1300 | | +(20 rows) + +-- using AFTER MATCH SKIP TO NEXT ROW +SELECT company, tdate, price, first_value(price) OVER w, last_value(price) OVER w + FROM stock + WINDOW w AS ( + PARTITION BY company + ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING + AFTER MATCH SKIP TO NEXT ROW + INITIAL + PATTERN (START UPDOWN) + DEFINE + START AS TRUE, + UPDOWN AS price > PREV(price) AND price > NEXT(price) +); + company | tdate | price | first_value | last_value +----------+------------+-------+-------------+------------ + company1 | 07-01-2023 | 100 | 100 | 200 + company1 | 07-02-2023 | 200 | | + company1 | 07-03-2023 | 150 | | + company1 | 07-04-2023 | 140 | 140 | 150 + company1 | 07-05-2023 | 150 | | + company1 | 07-06-2023 | 90 | | + company1 | 07-07-2023 | 110 | 110 | 130 + company1 | 07-08-2023 | 130 | | + company1 | 07-09-2023 | 120 | | + company1 | 07-10-2023 | 130 | | + company2 | 07-01-2023 | 50 | 50 | 2000 + company2 | 07-02-2023 | 2000 | | + company2 | 07-03-2023 | 1500 | | + company2 | 07-04-2023 | 1400 | 1400 | 1500 + company2 | 07-05-2023 | 1500 | | + company2 | 07-06-2023 | 60 | | + company2 | 07-07-2023 | 1100 | 1100 | 1300 + company2 | 07-08-2023 | 1300 | | + company2 | 07-09-2023 | 1200 | | + company2 | 07-10-2023 | 1300 | | +(20 rows) + +-- match everything +SELECT company, tdate, price, first_value(price) OVER w, last_value(price) OVER w + FROM stock + WINDOW w AS ( + PARTITION BY company + ORDER BY tdate + ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING + AFTER MATCH SKIP PAST LAST ROW + INITIAL + PATTERN (A+) + DEFINE + A AS TRUE +); + company | tdate | price | first_value | last_value +----------+------------+-------+-------------+------------ + company1 | 07-01-2023 | 100 | 100 | 130 + company1 | 07-02-2023 | 200 | | + company1 | 07-03-2023 | 150 | | + company1 | 07-04-2023 | 140 | | + company1 | 07-05-2023 | 150 | | + company1 | 07-06-2023 | 90 | | + company1 | 07-07-2023 | 110 | | + company1 | 07-08-2023 | 130 | | + company1 | 07-09-2023 | 120 | | + company1 | 07-10-2023 | 130 | | + company2 | 07-01-2023 | 50 | 50 | 1300 + company2 | 07-02-2023 | 2000 | | + company2 | 07-03-2023 | 1500 | | + company2 | 07-04-2023 | 1400 | | + company2 | 07-05-2023 | 1500 | | + company2 | 07-06-2023 | 60 | | + company2 | 07-07-2023 | 1100 | | + company2 | 07-08-2023 | 1300 | | + company2 | 07-09-2023 | 1200 | | + company2 | 07-10-2023 | 1300 | | +(20 rows) + +-- backtracking with reclassification of rows +-- using AFTER MATCH SKIP PAST LAST ROW +SELECT company, tdate, price, first_value(tdate) OVER w, last_value(tdate) OVER w + FROM stock + WINDOW w AS ( + PARTITION BY company + ORDER BY tdate + ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING + AFTER MATCH SKIP PAST LAST ROW + INITIAL + PATTERN (A+ B+) + DEFINE + A AS price > 100, + B AS price > 100 +); + company | tdate | price | first_value | last_value +----------+------------+-------+-------------+------------ + company1 | 07-01-2023 | 100 | | + company1 | 07-02-2023 | 200 | 07-02-2023 | 07-05-2023 + company1 | 07-03-2023 | 150 | | + company1 | 07-04-2023 | 140 | | + company1 | 07-05-2023 | 150 | | + company1 | 07-06-2023 | 90 | | + company1 | 07-07-2023 | 110 | 07-07-2023 | 07-10-2023 + company1 | 07-08-2023 | 130 | | + company1 | 07-09-2023 | 120 | | + company1 | 07-10-2023 | 130 | | + company2 | 07-01-2023 | 50 | | + company2 | 07-02-2023 | 2000 | 07-02-2023 | 07-05-2023 + company2 | 07-03-2023 | 1500 | | + company2 | 07-04-2023 | 1400 | | + company2 | 07-05-2023 | 1500 | | + company2 | 07-06-2023 | 60 | | + company2 | 07-07-2023 | 1100 | 07-07-2023 | 07-10-2023 + company2 | 07-08-2023 | 1300 | | + company2 | 07-09-2023 | 1200 | | + company2 | 07-10-2023 | 1300 | | +(20 rows) + +-- backtracking with reclassification of rows +-- using AFTER MATCH SKIP TO NEXT ROW +SELECT company, tdate, price, first_value(tdate) OVER w, last_value(tdate) OVER w + FROM stock + WINDOW w AS ( + PARTITION BY company + ORDER BY tdate + ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING + AFTER MATCH SKIP TO NEXT ROW + INITIAL + PATTERN (A+ B+) + DEFINE + A AS price > 100, + B AS price > 100 +); + company | tdate | price | first_value | last_value +----------+------------+-------+-------------+------------ + company1 | 07-01-2023 | 100 | | + company1 | 07-02-2023 | 200 | 07-02-2023 | 07-05-2023 + company1 | 07-03-2023 | 150 | 07-03-2023 | 07-05-2023 + company1 | 07-04-2023 | 140 | 07-04-2023 | 07-05-2023 + company1 | 07-05-2023 | 150 | | + company1 | 07-06-2023 | 90 | | + company1 | 07-07-2023 | 110 | 07-07-2023 | 07-10-2023 + company1 | 07-08-2023 | 130 | 07-08-2023 | 07-10-2023 + company1 | 07-09-2023 | 120 | 07-09-2023 | 07-10-2023 + company1 | 07-10-2023 | 130 | | + company2 | 07-01-2023 | 50 | | + company2 | 07-02-2023 | 2000 | 07-02-2023 | 07-05-2023 + company2 | 07-03-2023 | 1500 | 07-03-2023 | 07-05-2023 + company2 | 07-04-2023 | 1400 | 07-04-2023 | 07-05-2023 + company2 | 07-05-2023 | 1500 | | + company2 | 07-06-2023 | 60 | | + company2 | 07-07-2023 | 1100 | 07-07-2023 | 07-10-2023 + company2 | 07-08-2023 | 1300 | 07-08-2023 | 07-10-2023 + company2 | 07-09-2023 | 1200 | 07-09-2023 | 07-10-2023 + company2 | 07-10-2023 | 1300 | | +(20 rows) + +-- ROWS BETWEEN CURRENT ROW AND offset FOLLOWING +SELECT company, tdate, price, first_value(tdate) OVER w, last_value(tdate) OVER w, + count(*) OVER w + FROM stock + WINDOW w AS ( + PARTITION BY company + ORDER BY tdate + ROWS BETWEEN CURRENT ROW AND 2 FOLLOWING + AFTER MATCH SKIP PAST LAST ROW + PATTERN (START UP+ DOWN+) + DEFINE + START AS TRUE, + UP AS price > PREV(price), + DOWN AS price < PREV(price) +); + company | tdate | price | first_value | last_value | count +----------+------------+-------+-------------+------------+------- + company1 | 07-01-2023 | 100 | 07-01-2023 | 07-03-2023 | 3 + company1 | 07-02-2023 | 200 | | | 0 + company1 | 07-03-2023 | 150 | | | 0 + company1 | 07-04-2023 | 140 | 07-04-2023 | 07-06-2023 | 3 + company1 | 07-05-2023 | 150 | | | 0 + company1 | 07-06-2023 | 90 | | | 0 + company1 | 07-07-2023 | 110 | 07-07-2023 | 07-09-2023 | 3 + company1 | 07-08-2023 | 130 | | | 0 + company1 | 07-09-2023 | 120 | | | 0 + company1 | 07-10-2023 | 130 | | | 0 + company2 | 07-01-2023 | 50 | 07-01-2023 | 07-03-2023 | 3 + company2 | 07-02-2023 | 2000 | | | 0 + company2 | 07-03-2023 | 1500 | | | 0 + company2 | 07-04-2023 | 1400 | 07-04-2023 | 07-06-2023 | 3 + company2 | 07-05-2023 | 1500 | | | 0 + company2 | 07-06-2023 | 60 | | | 0 + company2 | 07-07-2023 | 1100 | 07-07-2023 | 07-09-2023 | 3 + company2 | 07-08-2023 | 1300 | | | 0 + company2 | 07-09-2023 | 1200 | | | 0 + company2 | 07-10-2023 | 1300 | | | 0 +(20 rows) + +-- +-- Aggregates +-- +-- using AFTER MATCH SKIP PAST LAST ROW +SELECT company, tdate, price, + first_value(price) OVER w, + last_value(price) OVER w, + max(price) OVER w, + min(price) OVER w, + sum(price) OVER w, + avg(price) OVER w, + count(price) OVER w +FROM stock +WINDOW w AS ( +PARTITION BY company +ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING +AFTER MATCH SKIP PAST LAST ROW +INITIAL +PATTERN (START UP+ DOWN+) +DEFINE +START AS TRUE, +UP AS price > PREV(price), +DOWN AS price < PREV(price) +); + company | tdate | price | first_value | last_value | max | min | sum | avg | count +----------+------------+-------+-------------+------------+------+-----+------+-----------------------+------- + company1 | 07-01-2023 | 100 | 100 | 140 | 200 | 100 | 590 | 147.5000000000000000 | 4 + company1 | 07-02-2023 | 200 | | | | | | | 0 + company1 | 07-03-2023 | 150 | | | | | | | 0 + company1 | 07-04-2023 | 140 | | | | | | | 0 + company1 | 07-05-2023 | 150 | | | | | | | 0 + company1 | 07-06-2023 | 90 | 90 | 120 | 130 | 90 | 450 | 112.5000000000000000 | 4 + company1 | 07-07-2023 | 110 | | | | | | | 0 + company1 | 07-08-2023 | 130 | | | | | | | 0 + company1 | 07-09-2023 | 120 | | | | | | | 0 + company1 | 07-10-2023 | 130 | | | | | | | 0 + company2 | 07-01-2023 | 50 | 50 | 1400 | 2000 | 50 | 4950 | 1237.5000000000000000 | 4 + company2 | 07-02-2023 | 2000 | | | | | | | 0 + company2 | 07-03-2023 | 1500 | | | | | | | 0 + company2 | 07-04-2023 | 1400 | | | | | | | 0 + company2 | 07-05-2023 | 1500 | | | | | | | 0 + company2 | 07-06-2023 | 60 | 60 | 1200 | 1300 | 60 | 3660 | 915.0000000000000000 | 4 + company2 | 07-07-2023 | 1100 | | | | | | | 0 + company2 | 07-08-2023 | 1300 | | | | | | | 0 + company2 | 07-09-2023 | 1200 | | | | | | | 0 + company2 | 07-10-2023 | 1300 | | | | | | | 0 +(20 rows) + +-- using AFTER MATCH SKIP TO NEXT ROW +SELECT company, tdate, price, + first_value(price) OVER w, + last_value(price) OVER w, + max(price) OVER w, + min(price) OVER w, + sum(price) OVER w, + avg(price) OVER w, + count(price) OVER w +FROM stock +WINDOW w AS ( +PARTITION BY company +ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING +AFTER MATCH SKIP TO NEXT ROW +INITIAL +PATTERN (START UP+ DOWN+) +DEFINE +START AS TRUE, +UP AS price > PREV(price), +DOWN AS price < PREV(price) +); + company | tdate | price | first_value | last_value | max | min | sum | avg | count +----------+------------+-------+-------------+------------+------+------+------+-----------------------+------- + company1 | 07-01-2023 | 100 | 100 | 140 | 200 | 100 | 590 | 147.5000000000000000 | 4 + company1 | 07-02-2023 | 200 | | | | | | | 0 + company1 | 07-03-2023 | 150 | | | | | | | 0 + company1 | 07-04-2023 | 140 | 140 | 90 | 150 | 90 | 380 | 126.6666666666666667 | 3 + company1 | 07-05-2023 | 150 | | | | | | | 0 + company1 | 07-06-2023 | 90 | 90 | 120 | 130 | 90 | 450 | 112.5000000000000000 | 4 + company1 | 07-07-2023 | 110 | 110 | 120 | 130 | 110 | 360 | 120.0000000000000000 | 3 + company1 | 07-08-2023 | 130 | | | | | | | 0 + company1 | 07-09-2023 | 120 | | | | | | | 0 + company1 | 07-10-2023 | 130 | | | | | | | 0 + company2 | 07-01-2023 | 50 | 50 | 1400 | 2000 | 50 | 4950 | 1237.5000000000000000 | 4 + company2 | 07-02-2023 | 2000 | | | | | | | 0 + company2 | 07-03-2023 | 1500 | | | | | | | 0 + company2 | 07-04-2023 | 1400 | 1400 | 60 | 1500 | 60 | 2960 | 986.6666666666666667 | 3 + company2 | 07-05-2023 | 1500 | | | | | | | 0 + company2 | 07-06-2023 | 60 | 60 | 1200 | 1300 | 60 | 3660 | 915.0000000000000000 | 4 + company2 | 07-07-2023 | 1100 | 1100 | 1200 | 1300 | 1100 | 3600 | 1200.0000000000000000 | 3 + company2 | 07-08-2023 | 1300 | | | | | | | 0 + company2 | 07-09-2023 | 1200 | | | | | | | 0 + company2 | 07-10-2023 | 1300 | | | | | | | 0 +(20 rows) + +-- JOIN case +CREATE TEMP TABLE t1 (i int, v1 int); +CREATE TEMP TABLE t2 (j int, v2 int); +INSERT INTO t1 VALUES(1,10); +INSERT INTO t1 VALUES(1,11); +INSERT INTO t1 VALUES(1,12); +INSERT INTO t2 VALUES(2,10); +INSERT INTO t2 VALUES(2,11); +INSERT INTO t2 VALUES(2,12); +SELECT * FROM t1, t2 WHERE t1.v1 <= 11 AND t2.v2 <= 11; + i | v1 | j | v2 +---+----+---+---- + 1 | 10 | 2 | 10 + 1 | 10 | 2 | 11 + 1 | 11 | 2 | 10 + 1 | 11 | 2 | 11 +(4 rows) + +SELECT *, count(*) OVER w FROM t1, t2 +WINDOW w AS ( + PARTITION BY t1.i + ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING + INITIAL + PATTERN (A) + DEFINE + A AS v1 <= 11 AND v2 <= 11 +); + i | v1 | j | v2 | count +---+----+---+----+------- + 1 | 10 | 2 | 10 | 1 + 1 | 10 | 2 | 11 | 1 + 1 | 10 | 2 | 12 | 0 + 1 | 11 | 2 | 10 | 1 + 1 | 11 | 2 | 11 | 1 + 1 | 11 | 2 | 12 | 0 + 1 | 12 | 2 | 10 | 0 + 1 | 12 | 2 | 11 | 0 + 1 | 12 | 2 | 12 | 0 +(9 rows) + +-- WITH case +WITH wstock AS ( + SELECT * FROM stock WHERE tdate < '2023-07-08' +) +SELECT tdate, price, +first_value(tdate) OVER w, +count(*) OVER w + FROM wstock + WINDOW w AS ( + PARTITION BY company + ORDER BY tdate + ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING + INITIAL + PATTERN (START UP+ DOWN+) + DEFINE + START AS TRUE, + UP AS price > PREV(price), + DOWN AS price < PREV(price) +); + tdate | price | first_value | count +------------+-------+-------------+------- + 07-01-2023 | 100 | 07-01-2023 | 4 + 07-02-2023 | 200 | | 0 + 07-03-2023 | 150 | | 0 + 07-04-2023 | 140 | | 0 + 07-05-2023 | 150 | | 0 + 07-06-2023 | 90 | | 0 + 07-07-2023 | 110 | | 0 + 07-01-2023 | 50 | 07-01-2023 | 4 + 07-02-2023 | 2000 | | 0 + 07-03-2023 | 1500 | | 0 + 07-04-2023 | 1400 | | 0 + 07-05-2023 | 1500 | | 0 + 07-06-2023 | 60 | | 0 + 07-07-2023 | 1100 | | 0 +(14 rows) + +-- +-- Error cases +-- +-- row pattern definition variable name must not appear more than once +SELECT company, tdate, price, first_value(price) OVER w, last_value(price) OVER w + FROM stock + WINDOW w AS ( + PARTITION BY company + ORDER BY tdate + ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING + INITIAL + PATTERN (START UP+ DOWN+) + DEFINE + START AS TRUE, + UP AS price > PREV(price), + DOWN AS price < PREV(price), + UP AS price > PREV(price) +); +ERROR: row pattern definition variable name "up" appears more than once in DEFINE clause +LINE 11: UP AS price > PREV(price), + ^ +-- subqueries in DEFINE clause are not supported +SELECT company, tdate, price, first_value(price) OVER w, last_value(price) OVER w + FROM stock + WINDOW w AS ( + PARTITION BY company + ORDER BY tdate + ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING + INITIAL + PATTERN (START LOWPRICE) + DEFINE + START AS TRUE, + LOWPRICE AS price < (SELECT 100) +); +ERROR: cannot use subquery in DEFINE expression +LINE 11: LOWPRICE AS price < (SELECT 100) + ^ +-- aggregates in DEFINE clause are not supported +SELECT company, tdate, price, first_value(price) OVER w, last_value(price) OVER w + FROM stock + WINDOW w AS ( + PARTITION BY company + ORDER BY tdate + ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING + INITIAL + PATTERN (START LOWPRICE) + DEFINE + START AS TRUE, + LOWPRICE AS price < count(*) +); +ERROR: aggregate functions are not allowed in DEFINE +LINE 11: LOWPRICE AS price < count(*) + ^ +-- FRAME must start at current row when row patttern recognition is used +SELECT company, tdate, price, first_value(price) OVER w, last_value(price) OVER w + FROM stock + WINDOW w AS ( + PARTITION BY company + ORDER BY tdate + ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING + INITIAL + PATTERN (START UP+ DOWN+) + DEFINE + START AS TRUE, + UP AS price > PREV(price), + DOWN AS price < PREV(price) +); +ERROR: FRAME must start at current row when row patttern recognition is used +-- SEEK is not supported +SELECT company, tdate, price, first_value(price) OVER w, last_value(price) OVER w + FROM stock + WINDOW w AS ( + PARTITION BY company + ORDER BY tdate + ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING + AFTER MATCH SKIP TO NEXT ROW + SEEK + PATTERN (START UP+ DOWN+) + DEFINE + START AS TRUE, + UP AS price > PREV(price), + DOWN AS price < PREV(price) +); +ERROR: SEEK is not supported +LINE 8: SEEK + ^ +HINT: Use INITIAL. diff --git a/src/test/regress/parallel_schedule b/src/test/regress/parallel_schedule index 969ced994f..3eaa36dca7 100644 --- a/src/test/regress/parallel_schedule +++ b/src/test/regress/parallel_schedule @@ -98,7 +98,7 @@ test: publication subscription # Another group of parallel tests # select_views depends on create_view # ---------- -test: select_views portals_p2 foreign_key cluster dependency guc bitmapops combocid tsearch tsdicts foreign_data windowxmlmap functional_deps advisory_lock indirect_toast equivclass +test: select_views portals_p2 foreign_key cluster dependency guc bitmapops combocid tsearch tsdicts foreign_data windowxmlmap functional_deps advisory_lock indirect_toast equivclass rpr # ---------- # Another group of parallel tests (JSON related) diff --git a/src/test/regress/sql/rpr.sql b/src/test/regress/sql/rpr.sql new file mode 100644 index 0000000000..302e2b86a5 --- /dev/null +++ b/src/test/regress/sql/rpr.sql @@ -0,0 +1,405 @@ +-- +-- Test for row pattern definition clause +-- + +CREATE TEMP TABLE stock ( + company TEXT, + tdate DATE, + price INTEGER +); +INSERT INTO stock VALUES ('company1', '2023-07-01', 100); +INSERT INTO stock VALUES ('company1', '2023-07-02', 200); +INSERT INTO stock VALUES ('company1', '2023-07-03', 150); +INSERT INTO stock VALUES ('company1', '2023-07-04', 140); +INSERT INTO stock VALUES ('company1', '2023-07-05', 150); +INSERT INTO stock VALUES ('company1', '2023-07-06', 90); +INSERT INTO stock VALUES ('company1', '2023-07-07', 110); +INSERT INTO stock VALUES ('company1', '2023-07-08', 130); +INSERT INTO stock VALUES ('company1', '2023-07-09', 120); +INSERT INTO stock VALUES ('company1', '2023-07-10', 130); +INSERT INTO stock VALUES ('company2', '2023-07-01', 50); +INSERT INTO stock VALUES ('company2', '2023-07-02', 2000); +INSERT INTO stock VALUES ('company2', '2023-07-03', 1500); +INSERT INTO stock VALUES ('company2', '2023-07-04', 1400); +INSERT INTO stock VALUES ('company2', '2023-07-05', 1500); +INSERT INTO stock VALUES ('company2', '2023-07-06', 60); +INSERT INTO stock VALUES ('company2', '2023-07-07', 1100); +INSERT INTO stock VALUES ('company2', '2023-07-08', 1300); +INSERT INTO stock VALUES ('company2', '2023-07-09', 1200); +INSERT INTO stock VALUES ('company2', '2023-07-10', 1300); + +SELECT * FROM stock; + +-- basic test using PREV +SELECT company, tdate, price, first_value(price) OVER w, last_value(price) OVER w, + nth_value(tdate, 2) OVER w AS nth_second + FROM stock + WINDOW w AS ( + PARTITION BY company + ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING + INITIAL + PATTERN (START UP+ DOWN+) + DEFINE + START AS TRUE, + UP AS price > PREV(price), + DOWN AS price < PREV(price) +); + +-- basic test using PREV. UP appears twice +SELECT company, tdate, price, first_value(price) OVER w, last_value(price) OVER w, + nth_value(tdate, 2) OVER w AS nth_second + FROM stock + WINDOW w AS ( + PARTITION BY company + ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING + INITIAL + PATTERN (START UP+ DOWN+ UP+) + DEFINE + START AS TRUE, + UP AS price > PREV(price), + DOWN AS price < PREV(price) +); + +-- basic test using PREV. Use '*' +SELECT company, tdate, price, first_value(price) OVER w, last_value(price) OVER w, + nth_value(tdate, 2) OVER w AS nth_second + FROM stock + WINDOW w AS ( + PARTITION BY company + ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING + INITIAL + PATTERN (START UP* DOWN+) + DEFINE + START AS TRUE, + UP AS price > PREV(price), + DOWN AS price < PREV(price) +); + +-- basic test with none greedy pattern +SELECT company, tdate, price, count(*) OVER w + FROM stock + WINDOW w AS ( + PARTITION BY company + ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING + INITIAL + PATTERN (A A A) + DEFINE + A AS price >= 140 AND price <= 150 +); + +-- last_value() should remain consistent +SELECT company, tdate, price, last_value(price) OVER w + FROM stock + WINDOW w AS ( + PARTITION BY company + ORDER BY tdate + ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING + INITIAL + PATTERN (START UP+ DOWN+) + DEFINE + START AS TRUE, + UP AS price > PREV(price), + DOWN AS price < PREV(price) +); + +-- omit "START" in DEFINE but it is ok because "START AS TRUE" is +-- implicitly defined. per spec. +SELECT company, tdate, price, first_value(price) OVER w, last_value(price) OVER w, + nth_value(tdate, 2) OVER w AS nth_second + FROM stock + WINDOW w AS ( + PARTITION BY company + ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING + INITIAL + PATTERN (START UP+ DOWN+) + DEFINE + UP AS price > PREV(price), + DOWN AS price < PREV(price) +); + +-- the first row start with less than or equal to 100 +SELECT company, tdate, price, first_value(price) OVER w, last_value(price) OVER w + FROM stock + WINDOW w AS ( + PARTITION BY company + ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING + INITIAL + PATTERN (LOWPRICE UP+ DOWN+) + DEFINE + LOWPRICE AS price <= 100, + UP AS price > PREV(price), + DOWN AS price < PREV(price) +); + +-- second row raises 120% +SELECT company, tdate, price, first_value(price) OVER w, last_value(price) OVER w + FROM stock + WINDOW w AS ( + PARTITION BY company + ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING + INITIAL + PATTERN (LOWPRICE UP+ DOWN+) + DEFINE + LOWPRICE AS price <= 100, + UP AS price > PREV(price) * 1.2, + DOWN AS price < PREV(price) +); + +-- using NEXT +SELECT company, tdate, price, first_value(price) OVER w, last_value(price) OVER w + FROM stock + WINDOW w AS ( + PARTITION BY company + ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING + INITIAL + PATTERN (START UPDOWN) + DEFINE + START AS TRUE, + UPDOWN AS price > PREV(price) AND price > NEXT(price) +); + +-- using AFTER MATCH SKIP TO NEXT ROW +SELECT company, tdate, price, first_value(price) OVER w, last_value(price) OVER w + FROM stock + WINDOW w AS ( + PARTITION BY company + ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING + AFTER MATCH SKIP TO NEXT ROW + INITIAL + PATTERN (START UPDOWN) + DEFINE + START AS TRUE, + UPDOWN AS price > PREV(price) AND price > NEXT(price) +); + +-- match everything + +SELECT company, tdate, price, first_value(price) OVER w, last_value(price) OVER w + FROM stock + WINDOW w AS ( + PARTITION BY company + ORDER BY tdate + ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING + AFTER MATCH SKIP PAST LAST ROW + INITIAL + PATTERN (A+) + DEFINE + A AS TRUE +); + +-- backtracking with reclassification of rows +-- using AFTER MATCH SKIP PAST LAST ROW +SELECT company, tdate, price, first_value(tdate) OVER w, last_value(tdate) OVER w + FROM stock + WINDOW w AS ( + PARTITION BY company + ORDER BY tdate + ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING + AFTER MATCH SKIP PAST LAST ROW + INITIAL + PATTERN (A+ B+) + DEFINE + A AS price > 100, + B AS price > 100 +); + +-- backtracking with reclassification of rows +-- using AFTER MATCH SKIP TO NEXT ROW +SELECT company, tdate, price, first_value(tdate) OVER w, last_value(tdate) OVER w + FROM stock + WINDOW w AS ( + PARTITION BY company + ORDER BY tdate + ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING + AFTER MATCH SKIP TO NEXT ROW + INITIAL + PATTERN (A+ B+) + DEFINE + A AS price > 100, + B AS price > 100 +); + +-- ROWS BETWEEN CURRENT ROW AND offset FOLLOWING +SELECT company, tdate, price, first_value(tdate) OVER w, last_value(tdate) OVER w, + count(*) OVER w + FROM stock + WINDOW w AS ( + PARTITION BY company + ORDER BY tdate + ROWS BETWEEN CURRENT ROW AND 2 FOLLOWING + AFTER MATCH SKIP PAST LAST ROW + PATTERN (START UP+ DOWN+) + DEFINE + START AS TRUE, + UP AS price > PREV(price), + DOWN AS price < PREV(price) +); + +-- +-- Aggregates +-- + +-- using AFTER MATCH SKIP PAST LAST ROW +SELECT company, tdate, price, + first_value(price) OVER w, + last_value(price) OVER w, + max(price) OVER w, + min(price) OVER w, + sum(price) OVER w, + avg(price) OVER w, + count(price) OVER w +FROM stock +WINDOW w AS ( +PARTITION BY company +ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING +AFTER MATCH SKIP PAST LAST ROW +INITIAL +PATTERN (START UP+ DOWN+) +DEFINE +START AS TRUE, +UP AS price > PREV(price), +DOWN AS price < PREV(price) +); + +-- using AFTER MATCH SKIP TO NEXT ROW +SELECT company, tdate, price, + first_value(price) OVER w, + last_value(price) OVER w, + max(price) OVER w, + min(price) OVER w, + sum(price) OVER w, + avg(price) OVER w, + count(price) OVER w +FROM stock +WINDOW w AS ( +PARTITION BY company +ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING +AFTER MATCH SKIP TO NEXT ROW +INITIAL +PATTERN (START UP+ DOWN+) +DEFINE +START AS TRUE, +UP AS price > PREV(price), +DOWN AS price < PREV(price) +); + +-- JOIN case +CREATE TEMP TABLE t1 (i int, v1 int); +CREATE TEMP TABLE t2 (j int, v2 int); +INSERT INTO t1 VALUES(1,10); +INSERT INTO t1 VALUES(1,11); +INSERT INTO t1 VALUES(1,12); +INSERT INTO t2 VALUES(2,10); +INSERT INTO t2 VALUES(2,11); +INSERT INTO t2 VALUES(2,12); + +SELECT * FROM t1, t2 WHERE t1.v1 <= 11 AND t2.v2 <= 11; + +SELECT *, count(*) OVER w FROM t1, t2 +WINDOW w AS ( + PARTITION BY t1.i + ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING + INITIAL + PATTERN (A) + DEFINE + A AS v1 <= 11 AND v2 <= 11 +); + +-- WITH case +WITH wstock AS ( + SELECT * FROM stock WHERE tdate < '2023-07-08' +) +SELECT tdate, price, +first_value(tdate) OVER w, +count(*) OVER w + FROM wstock + WINDOW w AS ( + PARTITION BY company + ORDER BY tdate + ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING + INITIAL + PATTERN (START UP+ DOWN+) + DEFINE + START AS TRUE, + UP AS price > PREV(price), + DOWN AS price < PREV(price) +); + +-- +-- Error cases +-- + +-- row pattern definition variable name must not appear more than once +SELECT company, tdate, price, first_value(price) OVER w, last_value(price) OVER w + FROM stock + WINDOW w AS ( + PARTITION BY company + ORDER BY tdate + ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING + INITIAL + PATTERN (START UP+ DOWN+) + DEFINE + START AS TRUE, + UP AS price > PREV(price), + DOWN AS price < PREV(price), + UP AS price > PREV(price) +); + +-- subqueries in DEFINE clause are not supported +SELECT company, tdate, price, first_value(price) OVER w, last_value(price) OVER w + FROM stock + WINDOW w AS ( + PARTITION BY company + ORDER BY tdate + ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING + INITIAL + PATTERN (START LOWPRICE) + DEFINE + START AS TRUE, + LOWPRICE AS price < (SELECT 100) +); + +-- aggregates in DEFINE clause are not supported +SELECT company, tdate, price, first_value(price) OVER w, last_value(price) OVER w + FROM stock + WINDOW w AS ( + PARTITION BY company + ORDER BY tdate + ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING + INITIAL + PATTERN (START LOWPRICE) + DEFINE + START AS TRUE, + LOWPRICE AS price < count(*) +); + +-- FRAME must start at current row when row patttern recognition is used +SELECT company, tdate, price, first_value(price) OVER w, last_value(price) OVER w + FROM stock + WINDOW w AS ( + PARTITION BY company + ORDER BY tdate + ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING + INITIAL + PATTERN (START UP+ DOWN+) + DEFINE + START AS TRUE, + UP AS price > PREV(price), + DOWN AS price < PREV(price) +); + +-- SEEK is not supported +SELECT company, tdate, price, first_value(price) OVER w, last_value(price) OVER w + FROM stock + WINDOW w AS ( + PARTITION BY company + ORDER BY tdate + ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING + AFTER MATCH SKIP TO NEXT ROW + SEEK + PATTERN (START UP+ DOWN+) + DEFINE + START AS TRUE, + UP AS price > PREV(price), + DOWN AS price < PREV(price) +); -- 2.25.1 From a53ac3fc04833b30c6fe7b386883e5e3479e477c Mon Sep 17 00:00:00 2001 From: Tatsuo Ishii <ishii@postgresql.org> Date: Fri, 24 May 2024 11:26:02 +0900 Subject: [PATCH v20 8/8] Allow to print raw parse tree. --- src/backend/tcop/postgres.c | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c index 45a3794b8e..ecbf8e7999 100644 --- a/src/backend/tcop/postgres.c +++ b/src/backend/tcop/postgres.c @@ -653,6 +653,10 @@ pg_parse_query(const char *query_string) } #endif + if (Debug_print_parse) + elog_node_display(LOG, "raw parse tree", raw_parsetree_list, + Debug_pretty_print); + TRACE_POSTGRESQL_QUERY_PARSE_DONE(query_string); return raw_parsetree_list; -- 2.25.1
pgsql-hackers by date: