Re: Row pattern recognition - Mailing list pgsql-hackers
From | Tatsuo Ishii |
---|---|
Subject | Re: Row pattern recognition |
Date | |
Msg-id | 20230902.155235.1032214204265330340.t-ishii@sranhm.sra.co.jp Whole thread Raw |
In response to | Re: Row pattern recognition (Tatsuo Ishii <ishii@sraoss.co.jp>) |
Responses |
Re: Row pattern recognition
|
List | pgsql-hackers |
Attached is the v5 patch. Differences from previous patch include: * Resolve complaint from "PostgreSQL Patch Tester" https://commitfest.postgresql.org/44/4460/ - Change gram.y to use PATTERN_P instead of PATTERN. Using PATTERN seems to make trouble with Visual Studio build. : : [10:07:57.853] FAILED: src/backend/parser/parser.a.p/meson-generated_.._gram.c.obj [10:07:57.853] "cl" "-Isrc\backend\parser\parser.a.p" "-Isrc\backend\parser" "-I..\src\backend\parser" "-Isrc\include" "-I..\src\include""-Ic:\openssl\1.1\include" "-I..\src\include\port\win32" "-I..\src\include\port\win32_msvc" "/MDd" "/nologo""/showIncludes" "/utf-8" "/W2" "/Od" "/Zi" "/DWIN32" "/DWINDOWS" "/D__WINDOWS__" "/D__WIN32__" "/D_CRT_SECURE_NO_DEPRECATE""/D_CRT_NONSTDC_NO_DEPRECATE" "/wd4018" "/wd4244" "/wd4273" "/wd4101" "/wd4102" "/wd4090" "/wd4267""-DBUILDING_DLL" "/Fdsrc\backend\parser\parser.a.p\meson-generated_.._gram.c.pdb" /Fosrc/backend/parser/parser.a.p/meson-generated_.._gram.c.obj"/c" src/backend/parser/gram.c [10:07:57.860] c:\cirrus\build\src\backend\parser\gram.h(379): error C2365: 'PATTERN': redefinition; previous definitionwas 'typedef' [10:07:57.860] C:\Program Files (x86)\Windows Kits\10\include\10.0.20348.0\um\wingdi.h(1375): note: see declaration of 'PATTERN' [10:07:57.860] c:\cirrus\build\src\backend\parser\gram.h(379): error C2086: 'yytokentype PATTERN': redefinition [10:07:57.860] c:\cirrus\build\src\backend\parser\gram.h(379): note: see declaration of 'PATTERN' [10:07:57.860] ninja: build stopped: subcommand failed. * Resolve complaint from "make headerscheck" - Change Windowapi.h and nodeWindowAgg.c to remove unecessary extern and public functions. Best reagards, -- Tatsuo Ishii SRA OSS LLC English: http://www.sraoss.co.jp/index_en/ Japanese:http://www.sraoss.co.jp From 3e02bccbd3dc02304d6dc34f5ab6cc6dd2ee26d1 Mon Sep 17 00:00:00 2001 From: Tatsuo Ishii <ishii@postgresql.org> Date: Sat, 2 Sep 2023 15:32:49 +0900 Subject: [PATCH v5 1/7] Row pattern recognition patch for raw parser. --- src/backend/parser/gram.y | 216 +++++++++++++++++++++++++++++--- src/include/nodes/parsenodes.h | 56 +++++++++ src/include/parser/kwlist.h | 8 ++ src/include/parser/parse_node.h | 1 + 4 files changed, 267 insertions(+), 14 deletions(-) diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y index 7d2032885e..70409cdc9a 100644 --- a/src/backend/parser/gram.y +++ b/src/backend/parser/gram.y @@ -251,6 +251,8 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query); DefElem *defelt; SortBy *sortby; WindowDef *windef; + RPCommonSyntax *rpcom; + RPSubsetItem *rpsubset; JoinExpr *jexpr; IndexElem *ielem; StatsElem *selem; @@ -453,8 +455,12 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query); TriggerTransitions TriggerReferencing vacuum_relation_list opt_vacuum_relation_list drop_option_list pub_obj_list - -%type <node> opt_routine_body + row_pattern_measure_list row_pattern_definition_list + opt_row_pattern_subset_clause + row_pattern_subset_list row_pattern_subset_rhs + row_pattern +%type <rpsubset> row_pattern_subset_item +%type <node> opt_routine_body row_pattern_term %type <groupclause> group_clause %type <list> group_by_list %type <node> group_by_item empty_grouping_set rollup_clause cube_clause @@ -551,6 +557,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query); %type <range> relation_expr_opt_alias %type <node> tablesample_clause opt_repeatable_clause %type <target> target_el set_target insert_column_item + row_pattern_measure_item row_pattern_definition %type <str> generic_option_name %type <node> generic_option_arg @@ -633,6 +640,9 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query); %type <list> window_clause window_definition_list opt_partition_clause %type <windef> window_definition over_clause window_specification opt_frame_clause frame_extent frame_bound +%type <rpcom> opt_row_pattern_common_syntax opt_row_pattern_skip_to +%type <boolean> opt_row_pattern_initial_or_seek +%type <list> opt_row_pattern_measures %type <ival> opt_window_exclusion_clause %type <str> opt_existing_window_name %type <boolean> opt_if_not_exists @@ -659,7 +669,6 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query); json_object_constructor_null_clause_opt json_array_constructor_null_clause_opt - /* * Non-keyword token types. These are hard-wired into the "flex" lexer. * They must be listed first so that their numeric codes do not depend on @@ -702,7 +711,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query); CURRENT_TIME CURRENT_TIMESTAMP CURRENT_USER CURSOR CYCLE DATA_P DATABASE DAY_P DEALLOCATE DEC DECIMAL_P DECLARE DEFAULT DEFAULTS - DEFERRABLE DEFERRED DEFINER DELETE_P DELIMITER DELIMITERS DEPENDS DEPTH DESC + DEFERRABLE DEFERRED DEFINE DEFINER DELETE_P DELIMITER DELIMITERS DEPENDS DEPTH DESC DETACH DICTIONARY DISABLE_P DISCARD DISTINCT DO DOCUMENT_P DOMAIN_P DOUBLE_P DROP @@ -718,7 +727,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query); HANDLER HAVING HEADER_P HOLD HOUR_P IDENTITY_P IF_P ILIKE IMMEDIATE IMMUTABLE IMPLICIT_P IMPORT_P IN_P INCLUDE - INCLUDING INCREMENT INDENT INDEX INDEXES INHERIT INHERITS INITIALLY INLINE_P + INCLUDING INCREMENT INDENT INDEX INDEXES INHERIT INHERITS INITIAL INITIALLY INLINE_P INNER_P INOUT INPUT_P INSENSITIVE INSERT INSTEAD INT_P INTEGER INTERSECT INTERVAL INTO INVOKER IS ISNULL ISOLATION @@ -731,7 +740,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query); LEADING LEAKPROOF LEAST LEFT LEVEL LIKE LIMIT LISTEN LOAD LOCAL LOCALTIME LOCALTIMESTAMP LOCATION LOCK_P LOCKED LOGGED - MAPPING MATCH MATCHED MATERIALIZED MAXVALUE MERGE METHOD + MAPPING MATCH MATCHED MATERIALIZED MAXVALUE MEASURES MERGE METHOD MINUTE_P MINVALUE MODE MONTH_P MOVE NAME_P NAMES NATIONAL NATURAL NCHAR NEW NEXT NFC NFD NFKC NFKD NO NONE @@ -743,8 +752,8 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query); ORDER ORDINALITY OTHERS OUT_P OUTER_P OVER OVERLAPS OVERLAY OVERRIDING OWNED OWNER - PARALLEL PARAMETER PARSER PARTIAL PARTITION PASSING PASSWORD - PLACING PLANS POLICY + PARALLEL PARAMETER PARSER PARTIAL PARTITION PASSING PASSWORD PAST + PATTERN_P PERMUTE PLACING PLANS POLICY POSITION PRECEDING PRECISION PRESERVE PREPARE PREPARED PRIMARY PRIOR PRIVILEGES PROCEDURAL PROCEDURE PROCEDURES PROGRAM PUBLICATION @@ -755,12 +764,13 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query); RESET RESTART RESTRICT RETURN RETURNING RETURNS REVOKE RIGHT ROLE ROLLBACK ROLLUP ROUTINE ROUTINES ROW ROWS RULE - SAVEPOINT SCALAR SCHEMA SCHEMAS SCROLL SEARCH SECOND_P SECURITY SELECT + SAVEPOINT SCALAR SCHEMA SCHEMAS SCROLL SEARCH SECOND_P SECURITY SEEK SELECT SEQUENCE SEQUENCES + SERIALIZABLE SERVER SESSION SESSION_USER SET SETS SETOF SHARE SHOW SIMILAR SIMPLE SKIP SMALLINT SNAPSHOT SOME SQL_P STABLE STANDALONE_P START STATEMENT STATISTICS STDIN STDOUT STORAGE STORED STRICT_P STRIP_P - SUBSCRIPTION SUBSTRING SUPPORT SYMMETRIC SYSID SYSTEM_P SYSTEM_USER + SUBSCRIPTION SUBSET SUBSTRING SUPPORT SYMMETRIC SYSID SYSTEM_P SYSTEM_USER TABLE TABLES TABLESAMPLE TABLESPACE TEMP TEMPLATE TEMPORARY TEXT_P THEN TIES TIME TIMESTAMP TO TRAILING TRANSACTION TRANSFORM @@ -853,6 +863,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query); */ %nonassoc UNBOUNDED /* ideally would have same precedence as IDENT */ %nonassoc IDENT PARTITION RANGE ROWS GROUPS PRECEDING FOLLOWING CUBE ROLLUP +%nonassoc MEASURES AFTER INITIAL SEEK PATTERN_P %left Op OPERATOR /* multi-character ops and user-defined operators */ %left '+' '-' %left '*' '/' '%' @@ -15857,7 +15868,8 @@ over_clause: OVER window_specification ; window_specification: '(' opt_existing_window_name opt_partition_clause - opt_sort_clause opt_frame_clause ')' + opt_sort_clause opt_row_pattern_measures opt_frame_clause + opt_row_pattern_common_syntax ')' { WindowDef *n = makeNode(WindowDef); @@ -15865,10 +15877,12 @@ window_specification: '(' opt_existing_window_name opt_partition_clause n->refname = $2; n->partitionClause = $3; n->orderClause = $4; + n->rowPatternMeasures = $5; /* copy relevant fields of opt_frame_clause */ - n->frameOptions = $5->frameOptions; - n->startOffset = $5->startOffset; - n->endOffset = $5->endOffset; + n->frameOptions = $6->frameOptions; + n->startOffset = $6->startOffset; + n->endOffset = $6->endOffset; + n->rpCommonSyntax = $7; n->location = @1; $$ = n; } @@ -15892,6 +15906,31 @@ opt_partition_clause: PARTITION BY expr_list { $$ = $3; } | /*EMPTY*/ { $$ = NIL; } ; +/* + * ROW PATTERN_P MEASURES + */ +opt_row_pattern_measures: MEASURES row_pattern_measure_list { $$ = $2; } + | /*EMPTY*/ { $$ = NIL; } + ; + +row_pattern_measure_list: + row_pattern_measure_item + { $$ = list_make1($1); } + | row_pattern_measure_list ',' row_pattern_measure_item + { $$ = lappend($1, $3); } + ; + +row_pattern_measure_item: + a_expr AS ColLabel + { + $$ = makeNode(ResTarget); + $$->name = $3; + $$->indirection = NIL; + $$->val = (Node *) $1; + $$->location = @1; + } + ; + /* * For frame clauses, we return a WindowDef, but only some fields are used: * frameOptions, startOffset, and endOffset. @@ -16051,6 +16090,139 @@ opt_window_exclusion_clause: | /*EMPTY*/ { $$ = 0; } ; +opt_row_pattern_common_syntax: +opt_row_pattern_skip_to opt_row_pattern_initial_or_seek + PATTERN_P '(' row_pattern ')' + opt_row_pattern_subset_clause + DEFINE row_pattern_definition_list + { + RPCommonSyntax *n = makeNode(RPCommonSyntax); + n->rpSkipTo = $1->rpSkipTo; + n->rpSkipVariable = $1->rpSkipVariable; + n->initial = $2; + n->rpPatterns = $5; + n->rpSubsetClause = $7; + n->rpDefs = $9; + $$ = n; + } + | /*EMPTY*/ { $$ = NULL; } + ; + +opt_row_pattern_skip_to: + AFTER MATCH SKIP TO NEXT ROW + { + RPCommonSyntax *n = makeNode(RPCommonSyntax); + n->rpSkipTo = ST_NEXT_ROW; + n->rpSkipVariable = NULL; + $$ = n; + } + | AFTER MATCH SKIP PAST LAST_P ROW + { + RPCommonSyntax *n = makeNode(RPCommonSyntax); + n->rpSkipTo = ST_PAST_LAST_ROW; + n->rpSkipVariable = NULL; + $$ = n; + } +/* + | AFTER MATCH SKIP TO FIRST_P ColId %prec FIRST_P + { + RPCommonSyntax *n = makeNode(RPCommonSyntax); + n->rpSkipTo = ST_FIRST_VARIABLE; + n->rpSkipVariable = $6; + $$ = n; + } + | AFTER MATCH SKIP TO LAST_P ColId %prec LAST_P + { + RPCommonSyntax *n = makeNode(RPCommonSyntax); + n->rpSkipTo = ST_LAST_VARIABLE; + n->rpSkipVariable = $6; + $$ = n; + } + * Shift/reduce + | AFTER MATCH SKIP TO ColId + { + RPCommonSyntax *n = makeNode(RPCommonSyntax); + n->rpSkipTo = ST_VARIABLE; + n->rpSkipVariable = $5; + $$ = n; + } +*/ + | /*EMPTY*/ + { + RPCommonSyntax *n = makeNode(RPCommonSyntax); + /* temporary set default to ST_NEXT_ROW */ + n->rpSkipTo = ST_PAST_LAST_ROW; + n->rpSkipVariable = NULL; + $$ = n; + } + ; + +opt_row_pattern_initial_or_seek: + INITIAL { $$ = true; } + | SEEK + { + ereport(ERROR, + (errcode(ERRCODE_SYNTAX_ERROR), + errmsg("SEEK is not supported"), + errhint("Use INITIAL."), + parser_errposition(@1))); + } + | /*EMPTY*/ { $$ = true; } + ; + +row_pattern: + row_pattern_term { $$ = list_make1($1); } + | row_pattern row_pattern_term { $$ = lappend($1, $2); } + ; + +row_pattern_term: + ColId { $$ = (Node *) makeSimpleA_Expr(AEXPR_OP, "", (Node *)makeString($1), NULL, @1); } + | ColId '*' { $$ = (Node *) makeSimpleA_Expr(AEXPR_OP, "*", (Node *)makeString($1), NULL, @1); } + | ColId '+' { $$ = (Node *) makeSimpleA_Expr(AEXPR_OP, "+", (Node *)makeString($1), NULL, @1); } + | ColId '?' { $$ = (Node *) makeSimpleA_Expr(AEXPR_OP, "?", (Node *)makeString($1), NULL, @1); } + ; + +opt_row_pattern_subset_clause: + SUBSET row_pattern_subset_list { $$ = $2; } + | /*EMPTY*/ { $$ = NIL; } + ; + +row_pattern_subset_list: + row_pattern_subset_item { $$ = list_make1($1); } + | row_pattern_subset_list ',' row_pattern_subset_item { $$ = lappend($1, $3); } + | /*EMPTY*/ { $$ = NIL; } + ; + +row_pattern_subset_item: ColId '=' '(' row_pattern_subset_rhs ')' + { + RPSubsetItem *n = makeNode(RPSubsetItem); + n->name = $1; + n->rhsVariable = $4; + $$ = n; + } + ; + +row_pattern_subset_rhs: + ColId { $$ = list_make1(makeStringConst($1, @1)); } + | row_pattern_subset_rhs ',' ColId { $$ = lappend($1, makeStringConst($3, @1)); } + | /*EMPTY*/ { $$ = NIL; } + ; + +row_pattern_definition_list: + row_pattern_definition { $$ = list_make1($1); } + | row_pattern_definition_list ',' row_pattern_definition { $$ = lappend($1, $3); } + ; + +row_pattern_definition: + ColId AS a_expr + { + $$ = makeNode(ResTarget); + $$->name = $1; + $$->indirection = NIL; + $$->val = (Node *) $3; + $$->location = @1; + } + ; /* * Supporting nonterminals for expressions. @@ -17146,6 +17318,7 @@ unreserved_keyword: | INDEXES | INHERIT | INHERITS + | INITIAL | INLINE_P | INPUT_P | INSENSITIVE @@ -17173,6 +17346,7 @@ unreserved_keyword: | MATCHED | MATERIALIZED | MAXVALUE + | MEASURES | MERGE | METHOD | MINUTE_P @@ -17215,6 +17389,9 @@ unreserved_keyword: | PARTITION | PASSING | PASSWORD + | PAST + | PATTERN_P + | PERMUTE | PLANS | POLICY | PRECEDING @@ -17265,6 +17442,7 @@ unreserved_keyword: | SEARCH | SECOND_P | SECURITY + | SEEK | SEQUENCE | SEQUENCES | SERIALIZABLE @@ -17290,6 +17468,7 @@ unreserved_keyword: | STRICT_P | STRIP_P | SUBSCRIPTION + | SUBSET | SUPPORT | SYSID | SYSTEM_P @@ -17477,6 +17656,7 @@ reserved_keyword: | CURRENT_USER | DEFAULT | DEFERRABLE + | DEFINE | DESC | DISTINCT | DO @@ -17639,6 +17819,7 @@ bare_label_keyword: | DEFAULTS | DEFERRABLE | DEFERRED + | DEFINE | DEFINER | DELETE_P | DELIMITER @@ -17714,6 +17895,7 @@ bare_label_keyword: | INDEXES | INHERIT | INHERITS + | INITIAL | INITIALLY | INLINE_P | INNER_P @@ -17763,6 +17945,7 @@ bare_label_keyword: | MATCHED | MATERIALIZED | MAXVALUE + | MEASURES | MERGE | METHOD | MINVALUE @@ -17816,6 +17999,9 @@ bare_label_keyword: | PARTITION | PASSING | PASSWORD + | PAST + | PATTERN_P + | PERMUTE | PLACING | PLANS | POLICY @@ -17872,6 +18058,7 @@ bare_label_keyword: | SCROLL | SEARCH | SECURITY + | SEEK | SELECT | SEQUENCE | SEQUENCES @@ -17903,6 +18090,7 @@ bare_label_keyword: | STRICT_P | STRIP_P | SUBSCRIPTION + | SUBSET | SUBSTRING | SUPPORT | SYMMETRIC diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h index fef4c714b8..4cc1fb417b 100644 --- a/src/include/nodes/parsenodes.h +++ b/src/include/nodes/parsenodes.h @@ -547,6 +547,44 @@ typedef struct SortBy int location; /* operator location, or -1 if none/unknown */ } SortBy; +/* + * AFTER MATCH row pattern skip to types in row pattern common syntax + */ +typedef enum RPSkipTo +{ + ST_NONE, /* AFTER MATCH omitted */ + ST_NEXT_ROW, /* SKIP TO NEXT ROW */ + ST_PAST_LAST_ROW, /* SKIP TO PAST LAST ROW */ + ST_FIRST_VARIABLE, /* SKIP TO FIRST variable name */ + ST_LAST_VARIABLE, /* SKIP TO LAST variable name */ + ST_VARIABLE /* SKIP TO variable name */ +} RPSkipTo; + +/* + * Row Pattern SUBSET clause item + */ +typedef struct RPSubsetItem +{ + NodeTag type; + char *name; /* Row Pattern SUBSET clause variable name */ + List *rhsVariable; /* Row Pattern SUBSET rhs variables (list of char *string) */ +} RPSubsetItem; + +/* + * RowPatternCommonSyntax - raw representation of row pattern common syntax + * + */ +typedef struct RPCommonSyntax +{ + NodeTag type; + RPSkipTo rpSkipTo; /* Row Pattern AFTER MATCH SKIP type */ + char *rpSkipVariable; /* Row Pattern Skip To variable name, if any */ + bool initial; /* true if <row pattern initial or seek> is initial */ + List *rpPatterns; /* PATTERN variables (list of A_Expr) */ + List *rpSubsetClause; /* row pattern subset clause (list of RPSubsetItem), if any */ + List *rpDefs; /* row pattern definitions clause (list of ResTarget) */ +} RPCommonSyntax; + /* * WindowDef - raw representation of WINDOW and OVER clauses * @@ -562,6 +600,8 @@ typedef struct WindowDef char *refname; /* referenced window name, if any */ List *partitionClause; /* PARTITION BY expression list */ List *orderClause; /* ORDER BY (list of SortBy) */ + List *rowPatternMeasures; /* row pattern measures (list of ResTarget) */ + RPCommonSyntax *rpCommonSyntax; /* row pattern common syntax */ int frameOptions; /* frame_clause options, see below */ Node *startOffset; /* expression for starting bound, if any */ Node *endOffset; /* expression for ending bound, if any */ @@ -1483,6 +1523,11 @@ typedef struct GroupingSet * the orderClause might or might not be copied (see copiedOrder); the framing * options are never copied, per spec. * + * "defineClause" is Row Pattern Recognition DEFINE clause (list of + * TargetEntry). TargetEntry.resname represents row pattern definition + * variable name. "patternVariable" and "patternRegexp" represents PATTERN + * clause. + * * The information relevant for the query jumbling is the partition clause * type and its bounds. */ @@ -1514,6 +1559,17 @@ typedef struct WindowClause Index winref; /* ID referenced by window functions */ /* did we copy orderClause from refname? */ bool copiedOrder pg_node_attr(query_jumble_ignore); + /* Row Pattern AFTER MACH SKIP clause */ + RPSkipTo rpSkipTo; /* Row Pattern Skip To type */ + bool initial; /* true if <row pattern initial or seek> is initial */ + /* Row Pattern DEFINE clause (list of TargetEntry) */ + List *defineClause; + /* Row Pattern DEFINE variable initial names (list of String) */ + List *defineInitial; + /* Row Pattern PATTERN variable name (list of String) */ + List *patternVariable; + /* Row Pattern PATTERN regular expression quantifier ('+' or ''. list of String) */ + List *patternRegexp; } WindowClause; /* diff --git a/src/include/parser/kwlist.h b/src/include/parser/kwlist.h index 5984dcfa4b..2804333b53 100644 --- a/src/include/parser/kwlist.h +++ b/src/include/parser/kwlist.h @@ -128,6 +128,7 @@ PG_KEYWORD("default", DEFAULT, RESERVED_KEYWORD, BARE_LABEL) PG_KEYWORD("defaults", DEFAULTS, UNRESERVED_KEYWORD, BARE_LABEL) PG_KEYWORD("deferrable", DEFERRABLE, RESERVED_KEYWORD, BARE_LABEL) PG_KEYWORD("deferred", DEFERRED, UNRESERVED_KEYWORD, BARE_LABEL) +PG_KEYWORD("define", DEFINE, RESERVED_KEYWORD, BARE_LABEL) PG_KEYWORD("definer", DEFINER, UNRESERVED_KEYWORD, BARE_LABEL) PG_KEYWORD("delete", DELETE_P, UNRESERVED_KEYWORD, BARE_LABEL) PG_KEYWORD("delimiter", DELIMITER, UNRESERVED_KEYWORD, BARE_LABEL) @@ -212,6 +213,7 @@ PG_KEYWORD("index", INDEX, UNRESERVED_KEYWORD, BARE_LABEL) PG_KEYWORD("indexes", INDEXES, UNRESERVED_KEYWORD, BARE_LABEL) PG_KEYWORD("inherit", INHERIT, UNRESERVED_KEYWORD, BARE_LABEL) PG_KEYWORD("inherits", INHERITS, UNRESERVED_KEYWORD, BARE_LABEL) +PG_KEYWORD("initial", INITIAL, UNRESERVED_KEYWORD, BARE_LABEL) PG_KEYWORD("initially", INITIALLY, RESERVED_KEYWORD, BARE_LABEL) PG_KEYWORD("inline", INLINE_P, UNRESERVED_KEYWORD, BARE_LABEL) PG_KEYWORD("inner", INNER_P, TYPE_FUNC_NAME_KEYWORD, BARE_LABEL) @@ -265,6 +267,7 @@ PG_KEYWORD("match", MATCH, UNRESERVED_KEYWORD, BARE_LABEL) PG_KEYWORD("matched", MATCHED, UNRESERVED_KEYWORD, BARE_LABEL) PG_KEYWORD("materialized", MATERIALIZED, UNRESERVED_KEYWORD, BARE_LABEL) PG_KEYWORD("maxvalue", MAXVALUE, UNRESERVED_KEYWORD, BARE_LABEL) +PG_KEYWORD("measures", MEASURES, UNRESERVED_KEYWORD, BARE_LABEL) PG_KEYWORD("merge", MERGE, UNRESERVED_KEYWORD, BARE_LABEL) PG_KEYWORD("method", METHOD, UNRESERVED_KEYWORD, BARE_LABEL) PG_KEYWORD("minute", MINUTE_P, UNRESERVED_KEYWORD, AS_LABEL) @@ -326,6 +329,9 @@ PG_KEYWORD("partial", PARTIAL, UNRESERVED_KEYWORD, BARE_LABEL) PG_KEYWORD("partition", PARTITION, UNRESERVED_KEYWORD, BARE_LABEL) PG_KEYWORD("passing", PASSING, UNRESERVED_KEYWORD, BARE_LABEL) PG_KEYWORD("password", PASSWORD, UNRESERVED_KEYWORD, BARE_LABEL) +PG_KEYWORD("past", PAST, UNRESERVED_KEYWORD, BARE_LABEL) +PG_KEYWORD("pattern", PATTERN_P, UNRESERVED_KEYWORD, BARE_LABEL) +PG_KEYWORD("permute", PERMUTE, UNRESERVED_KEYWORD, BARE_LABEL) PG_KEYWORD("placing", PLACING, RESERVED_KEYWORD, BARE_LABEL) PG_KEYWORD("plans", PLANS, UNRESERVED_KEYWORD, BARE_LABEL) PG_KEYWORD("policy", POLICY, UNRESERVED_KEYWORD, BARE_LABEL) @@ -385,6 +391,7 @@ PG_KEYWORD("scroll", SCROLL, UNRESERVED_KEYWORD, BARE_LABEL) PG_KEYWORD("search", SEARCH, UNRESERVED_KEYWORD, BARE_LABEL) PG_KEYWORD("second", SECOND_P, UNRESERVED_KEYWORD, AS_LABEL) PG_KEYWORD("security", SECURITY, UNRESERVED_KEYWORD, BARE_LABEL) +PG_KEYWORD("seek", SEEK, UNRESERVED_KEYWORD, BARE_LABEL) PG_KEYWORD("select", SELECT, RESERVED_KEYWORD, BARE_LABEL) PG_KEYWORD("sequence", SEQUENCE, UNRESERVED_KEYWORD, BARE_LABEL) PG_KEYWORD("sequences", SEQUENCES, UNRESERVED_KEYWORD, BARE_LABEL) @@ -416,6 +423,7 @@ PG_KEYWORD("stored", STORED, UNRESERVED_KEYWORD, BARE_LABEL) PG_KEYWORD("strict", STRICT_P, UNRESERVED_KEYWORD, BARE_LABEL) PG_KEYWORD("strip", STRIP_P, UNRESERVED_KEYWORD, BARE_LABEL) PG_KEYWORD("subscription", SUBSCRIPTION, UNRESERVED_KEYWORD, BARE_LABEL) +PG_KEYWORD("subset", SUBSET, UNRESERVED_KEYWORD, BARE_LABEL) PG_KEYWORD("substring", SUBSTRING, COL_NAME_KEYWORD, BARE_LABEL) PG_KEYWORD("support", SUPPORT, UNRESERVED_KEYWORD, BARE_LABEL) PG_KEYWORD("symmetric", SYMMETRIC, RESERVED_KEYWORD, BARE_LABEL) diff --git a/src/include/parser/parse_node.h b/src/include/parser/parse_node.h index f589112d5e..6640090910 100644 --- a/src/include/parser/parse_node.h +++ b/src/include/parser/parse_node.h @@ -51,6 +51,7 @@ typedef enum ParseExprKind EXPR_KIND_WINDOW_FRAME_RANGE, /* window frame clause with RANGE */ EXPR_KIND_WINDOW_FRAME_ROWS, /* window frame clause with ROWS */ EXPR_KIND_WINDOW_FRAME_GROUPS, /* window frame clause with GROUPS */ + EXPR_KIND_RPR_DEFINE, /* DEFINE */ EXPR_KIND_SELECT_TARGET, /* SELECT target list item */ EXPR_KIND_INSERT_TARGET, /* INSERT target list item */ EXPR_KIND_UPDATE_SOURCE, /* UPDATE assignment source item */ -- 2.25.1 From b6381fb376dad9b1f0ad19ff38b67d420932a707 Mon Sep 17 00:00:00 2001 From: Tatsuo Ishii <ishii@postgresql.org> Date: Sat, 2 Sep 2023 15:32:49 +0900 Subject: [PATCH v5 2/7] Row pattern recognition patch (parse/analysis). --- src/backend/parser/parse_agg.c | 7 + src/backend/parser/parse_clause.c | 292 +++++++++++++++++++++++++++++- src/backend/parser/parse_expr.c | 4 + src/backend/parser/parse_func.c | 3 + 4 files changed, 305 insertions(+), 1 deletion(-) diff --git a/src/backend/parser/parse_agg.c b/src/backend/parser/parse_agg.c index 85cd47b7ae..aa7a1cee80 100644 --- a/src/backend/parser/parse_agg.c +++ b/src/backend/parser/parse_agg.c @@ -564,6 +564,10 @@ check_agglevels_and_constraints(ParseState *pstate, Node *expr) errkind = true; break; + case EXPR_KIND_RPR_DEFINE: + errkind = true; + break; + /* * There is intentionally no default: case here, so that the * compiler will warn if we add a new ParseExprKind without @@ -953,6 +957,9 @@ transformWindowFuncCall(ParseState *pstate, WindowFunc *wfunc, case EXPR_KIND_CYCLE_MARK: errkind = true; break; + case EXPR_KIND_RPR_DEFINE: + errkind = true; + break; /* * There is intentionally no default: case here, so that the diff --git a/src/backend/parser/parse_clause.c b/src/backend/parser/parse_clause.c index 334b9b42bd..60020a7025 100644 --- a/src/backend/parser/parse_clause.c +++ b/src/backend/parser/parse_clause.c @@ -100,7 +100,10 @@ static WindowClause *findWindowClause(List *wclist, const char *name); static Node *transformFrameOffset(ParseState *pstate, int frameOptions, Oid rangeopfamily, Oid rangeopcintype, Oid *inRangeFunc, Node *clause); - +static void transformRPR(ParseState *pstate, WindowClause *wc, WindowDef *windef, List **targetlist); +static List *transformDefineClause(ParseState *pstate, WindowClause *wc, WindowDef *windef, List **targetlist); +static void transformPatternClause(ParseState *pstate, WindowClause *wc, WindowDef *windef); +static List *transformMeasureClause(ParseState *pstate, WindowClause *wc, WindowDef *windef); /* * transformFromClause - @@ -2950,6 +2953,10 @@ transformWindowDefinitions(ParseState *pstate, rangeopfamily, rangeopcintype, &wc->endInRangeFunc, windef->endOffset); + + /* Process Row Pattern Recognition related clauses */ + transformRPR(pstate, wc, windef, targetlist); + wc->runCondition = NIL; wc->winref = winref; @@ -3815,3 +3822,286 @@ transformFrameOffset(ParseState *pstate, int frameOptions, return node; } + +/* + * transformRPR + * Process Row Pattern Recognition related clauses + */ +static void +transformRPR(ParseState *pstate, WindowClause *wc, WindowDef *windef, List **targetlist) +{ + /* + * Window definition exists? + */ + if (windef == NULL) + return; + + /* + * Row Pattern Common Syntax clause exists? + */ + if (windef->rpCommonSyntax == NULL) + return; + + /* Check Frame option. Frame must start at current row */ + if ((wc->frameOptions & FRAMEOPTION_START_CURRENT_ROW) == 0) + ereport(ERROR, + (errcode(ERRCODE_SYNTAX_ERROR), + errmsg("FRAME must start at current row when row patttern recognition is used"))); + + /* Transform AFTER MACH SKIP TO clause */ + wc->rpSkipTo = windef->rpCommonSyntax->rpSkipTo; + + /* Transform SEEK or INITIAL clause */ + wc->initial = windef->rpCommonSyntax->initial; + + /* Transform DEFINE clause into list of TargetEntry's */ + wc->defineClause = transformDefineClause(pstate, wc, windef, targetlist); + + /* Check PATTERN clause and copy to patternClause */ + transformPatternClause(pstate, wc, windef); + + /* Transform MEASURE clause */ + transformMeasureClause(pstate, wc, windef); +} + +/* + * transformDefineClause Process DEFINE clause and transform ResTarget into + * list of TargetEntry. + * + * XXX we only support column reference in row pattern definition search + * condition, e.g. "price". <row pattern definition variable name>.<column + * reference> is not supported, e.g. "A.price". + */ +static List * +transformDefineClause(ParseState *pstate, WindowClause *wc, WindowDef *windef, List **targetlist) +{ + /* DEFINE variable name initials */ + static char *defineVariableInitials = "abcdefghijklmnopqrstuvwxyz"; + + ListCell *lc, *l; + ResTarget *restarget, *r; + List *restargets; + char *name; + int initialLen; + int i; + + /* + * If Row Definition Common Syntax exists, DEFINE clause must exist. + * (the raw parser should have already checked it.) + */ + Assert(windef->rpCommonSyntax->rpDefs != NULL); + + /* + * Check and add "A AS A IS TRUE" if pattern variable is missing in DEFINE + * per the SQL standard. + */ + restargets = NIL; + foreach(lc, windef->rpCommonSyntax->rpPatterns) + { + A_Expr *a; + bool found = false; + + if (!IsA(lfirst(lc), A_Expr)) + ereport(ERROR, + errmsg("node type is not A_Expr")); + + a = (A_Expr *)lfirst(lc); + name = strVal(a->lexpr); + + foreach(l, windef->rpCommonSyntax->rpDefs) + { + restarget = (ResTarget *)lfirst(l); + + if (!strcmp(restarget->name, name)) + { + found = true; + break; + } + } + + if (!found) + { + /* + * "name" is missing. So create "name AS name IS TRUE" ResTarget + * node and add it to the temporary list. + */ + A_Const *n; + + restarget = makeNode(ResTarget); + n = makeNode(A_Const); + n->val.boolval.type = T_Boolean; + n->val.boolval.boolval = true; + n->location = -1; + restarget->name = pstrdup(name); + restarget->indirection = NIL; + restarget->val = (Node *)n; + restarget->location = -1; + restargets = lappend((List *)restargets, restarget); + } + } + + if (list_length(restargets) >= 1) + { + /* add missing DEFINEs */ + windef->rpCommonSyntax->rpDefs = list_concat(windef->rpCommonSyntax->rpDefs, + restargets); + list_free(restargets); + } + + /* + * Check for duplicate row pattern definition variables. The standard + * requires that no two row pattern definition variable names shall be + * equivalent. + */ + restargets = NIL; + foreach(lc, windef->rpCommonSyntax->rpDefs) + { + restarget = (ResTarget *)lfirst(lc); + name = restarget->name; + + /* + * Add DEFINE expression (Restarget->val) to the targetlist as a + * TargetEntry if it does not exist yet. Planner will add the column + * ref var node to the outer plan's target list later on. This makes + * DEFINE expression could access the outer tuple while evaluating + * PATTERN. + * + * XXX: adding whole expressions of DEFINE to the plan.targetlist is + * not so good, because it's not necessary to evalute the expression + * in the target list while running the plan. We should extract the + * var nodes only then add them to the plan.targetlist. + */ + findTargetlistEntrySQL99(pstate, (Node *)restarget->val, targetlist, EXPR_KIND_RPR_DEFINE); + + /* + * Make sure that the row pattern definition search condition is a + * boolean expression. + */ + transformWhereClause(pstate, restarget->val, + EXPR_KIND_RPR_DEFINE, "DEFINE"); + + foreach(l, restargets) + { + char *n; + + r = (ResTarget *) lfirst(l); + n = r->name; + + if (!strcmp(n, name)) + ereport(ERROR, + (errcode(ERRCODE_SYNTAX_ERROR), + errmsg("row pattern definition variable name \"%s\" appears more than once in DEFINE clause", + name), + parser_errposition(pstate, exprLocation((Node *)r)))); + } + restargets = lappend(restargets, restarget); + } + list_free(restargets); + + /* + * Create list of row pattern DEFINE variable name's initial. + * We assign [a-z] to them (up to 26 variable names are allowed). + */ + restargets = NIL; + i = 0; + initialLen = strlen(defineVariableInitials); + + foreach(lc, windef->rpCommonSyntax->rpDefs) + { + char initial[2]; + + restarget = (ResTarget *)lfirst(lc); + name = restarget->name; + + if (i >= initialLen) + { + ereport(ERROR, + (errcode(ERRCODE_SYNTAX_ERROR), + errmsg("number of row pattern definition variable names exceeds %d", initialLen), + parser_errposition(pstate, exprLocation((Node *)restarget)))); + } + initial[0] = defineVariableInitials[i++]; + initial[1] = '\0'; + wc->defineInitial = lappend(wc->defineInitial, makeString(pstrdup(initial))); + } + + return transformTargetList(pstate, windef->rpCommonSyntax->rpDefs, + EXPR_KIND_RPR_DEFINE); +} + +/* + * transformPatternClause + * Process PATTERN clause and return PATTERN clause in the raw parse tree + */ +static void +transformPatternClause(ParseState *pstate, WindowClause *wc, WindowDef *windef) +{ + ListCell *lc, *l; + + /* + * Row Pattern Common Syntax clause exists? + */ + if (windef->rpCommonSyntax == NULL) + return; + + /* + * Primary row pattern variable names in PATTERN clause must appear in + * DEFINE clause as row pattern definition variable names. + */ + wc->patternVariable = NIL; + wc->patternRegexp = NIL; + foreach(lc, windef->rpCommonSyntax->rpPatterns) + { + A_Expr *a; + char *name; + char *regexp; + bool found = false; + + if (!IsA(lfirst(lc), A_Expr)) + ereport(ERROR, + errmsg("node type is not A_Expr")); + + a = (A_Expr *)lfirst(lc); + name = strVal(a->lexpr); + + foreach(l, windef->rpCommonSyntax->rpDefs) + { + ResTarget *restarget = (ResTarget *)lfirst(l); + + if (!strcmp(restarget->name, name)) + { + found = true; + break; + } + } + + if (!found) + { + ereport(ERROR, + (errcode(ERRCODE_SYNTAX_ERROR), + errmsg("primary row pattern variable name \"%s\" does not appear in DEFINE clause", + name), + parser_errposition(pstate, exprLocation((Node *)a)))); + } + wc->patternVariable = lappend(wc->patternVariable, makeString(pstrdup(name))); + regexp = strVal(lfirst(list_head(a->name))); + wc->patternRegexp = lappend(wc->patternRegexp, makeString(pstrdup(regexp))); + } +} + +/* + * transformMeasureClause + * Process MEASURE clause + * XXX MEASURE clause is not supported yet + */ +static List * +transformMeasureClause(ParseState *pstate, WindowClause *wc, WindowDef *windef) +{ + if (windef->rowPatternMeasures == NIL) + return NIL; + + ereport(ERROR, + (errcode(ERRCODE_SYNTAX_ERROR), + errmsg("%s","MEASURE clause is not supported yet"), + parser_errposition(pstate, exprLocation((Node *)windef->rowPatternMeasures)))); +} diff --git a/src/backend/parser/parse_expr.c b/src/backend/parser/parse_expr.c index 64c582c344..18b58ac263 100644 --- a/src/backend/parser/parse_expr.c +++ b/src/backend/parser/parse_expr.c @@ -557,6 +557,7 @@ transformColumnRef(ParseState *pstate, ColumnRef *cref) case EXPR_KIND_COPY_WHERE: case EXPR_KIND_GENERATED_COLUMN: case EXPR_KIND_CYCLE_MARK: + case EXPR_KIND_RPR_DEFINE: /* okay */ break; @@ -1770,6 +1771,7 @@ transformSubLink(ParseState *pstate, SubLink *sublink) case EXPR_KIND_VALUES: case EXPR_KIND_VALUES_SINGLE: case EXPR_KIND_CYCLE_MARK: + case EXPR_KIND_RPR_DEFINE: /* okay */ break; case EXPR_KIND_CHECK_CONSTRAINT: @@ -3149,6 +3151,8 @@ ParseExprKindName(ParseExprKind exprKind) return "GENERATED AS"; case EXPR_KIND_CYCLE_MARK: return "CYCLE"; + case EXPR_KIND_RPR_DEFINE: + return "DEFINE"; /* * There is intentionally no default: case here, so that the diff --git a/src/backend/parser/parse_func.c b/src/backend/parser/parse_func.c index b3f0b6a137..2ff3699538 100644 --- a/src/backend/parser/parse_func.c +++ b/src/backend/parser/parse_func.c @@ -2656,6 +2656,9 @@ check_srf_call_placement(ParseState *pstate, Node *last_srf, int location) case EXPR_KIND_CYCLE_MARK: errkind = true; break; + case EXPR_KIND_RPR_DEFINE: + errkind = true; + break; /* * There is intentionally no default: case here, so that the -- 2.25.1 From 504b57b929954b421f32fe9d9e5152c235ea5a33 Mon Sep 17 00:00:00 2001 From: Tatsuo Ishii <ishii@postgresql.org> Date: Sat, 2 Sep 2023 15:32:49 +0900 Subject: [PATCH v5 3/7] Row pattern recognition patch (planner). --- src/backend/optimizer/plan/createplan.c | 23 ++++++++++++++++++----- src/backend/optimizer/plan/setrefs.c | 23 +++++++++++++++++++++++ src/include/nodes/plannodes.h | 15 +++++++++++++++ 3 files changed, 56 insertions(+), 5 deletions(-) diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c index 34ca6d4ac2..e3c07ded65 100644 --- a/src/backend/optimizer/plan/createplan.c +++ b/src/backend/optimizer/plan/createplan.c @@ -286,9 +286,10 @@ static WindowAgg *make_windowagg(List *tlist, Index winref, int ordNumCols, AttrNumber *ordColIdx, Oid *ordOperators, Oid *ordCollations, int frameOptions, Node *startOffset, Node *endOffset, Oid startInRangeFunc, Oid endInRangeFunc, - Oid inRangeColl, bool inRangeAsc, bool inRangeNullsFirst, - List *runCondition, List *qual, bool topWindow, - Plan *lefttree); + Oid inRangeColl, bool inRangeAsc, bool inRangeNullsFirst, List *runCondition, + RPSkipTo rpSkipTo, List *patternVariable, List *patternRegexp, List *defineClause, + List *defineInitial, + List *qual, bool topWindow, Plan *lefttree); static Group *make_group(List *tlist, List *qual, int numGroupCols, AttrNumber *grpColIdx, Oid *grpOperators, Oid *grpCollations, Plan *lefttree); @@ -2698,6 +2699,11 @@ create_windowagg_plan(PlannerInfo *root, WindowAggPath *best_path) wc->inRangeAsc, wc->inRangeNullsFirst, wc->runCondition, + wc->rpSkipTo, + wc->patternVariable, + wc->patternRegexp, + wc->defineClause, + wc->defineInitial, best_path->qual, best_path->topwindow, subplan); @@ -6601,8 +6607,10 @@ make_windowagg(List *tlist, Index winref, int ordNumCols, AttrNumber *ordColIdx, Oid *ordOperators, Oid *ordCollations, int frameOptions, Node *startOffset, Node *endOffset, Oid startInRangeFunc, Oid endInRangeFunc, - Oid inRangeColl, bool inRangeAsc, bool inRangeNullsFirst, - List *runCondition, List *qual, bool topWindow, Plan *lefttree) + Oid inRangeColl, bool inRangeAsc, bool inRangeNullsFirst, List *runCondition, + RPSkipTo rpSkipTo, List *patternVariable, List *patternRegexp, List *defineClause, + List *defineInitial, + List *qual, bool topWindow, Plan *lefttree) { WindowAgg *node = makeNode(WindowAgg); Plan *plan = &node->plan; @@ -6628,6 +6636,11 @@ make_windowagg(List *tlist, Index winref, node->inRangeAsc = inRangeAsc; node->inRangeNullsFirst = inRangeNullsFirst; node->topWindow = topWindow; + node->rpSkipTo = rpSkipTo, + node->patternVariable = patternVariable; + node->patternRegexp = patternRegexp; + node->defineClause = defineClause; + node->defineInitial = defineInitial; plan->targetlist = tlist; plan->lefttree = lefttree; diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c index 97fa561e4e..2ed00b5d41 100644 --- a/src/backend/optimizer/plan/setrefs.c +++ b/src/backend/optimizer/plan/setrefs.c @@ -2456,6 +2456,29 @@ set_upper_references(PlannerInfo *root, Plan *plan, int rtoffset) NRM_EQUAL, NUM_EXEC_QUAL(plan)); + /* + * Modifies an expression tree in each DEFINE clause so that all Var + * nodes reference outputs of a subplan. + */ + if (IsA(plan, WindowAgg)) + { + WindowAgg *wplan = (WindowAgg *) plan; + + foreach(l, wplan->defineClause) + { + TargetEntry *tle = (TargetEntry *) lfirst(l); + + tle->expr = (Expr *) + fix_upper_expr(root, + (Node *) tle->expr, + subplan_itlist, + OUTER_VAR, + rtoffset, + NRM_EQUAL, + NUM_EXEC_QUAL(plan)); + } + } + pfree(subplan_itlist); } diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h index 1b787fe031..19815a98bb 100644 --- a/src/include/nodes/plannodes.h +++ b/src/include/nodes/plannodes.h @@ -1096,6 +1096,21 @@ typedef struct WindowAgg /* nulls sort first for in_range tests? */ bool inRangeNullsFirst; + /* Row Pattern Recognition AFTER MACH SKIP clause */ + RPSkipTo rpSkipTo; /* Row Pattern Skip To type */ + + /* Row Pattern PATTERN variable name (list of String) */ + List *patternVariable; + + /* Row Pattern RPATTERN regular expression quantifier ('+' or ''. list of String) */ + List *patternRegexp; + + /* Row Pattern DEFINE clause (list of TargetEntry) */ + List *defineClause; + + /* Row Pattern DEFINE variable initial names (list of String) */ + List *defineInitial; + /* * false for all apart from the WindowAgg that's closest to the root of * the plan -- 2.25.1 From 5248e0065b82b4d66cefc654d22c9ca0c7a1bafb Mon Sep 17 00:00:00 2001 From: Tatsuo Ishii <ishii@postgresql.org> Date: Sat, 2 Sep 2023 15:32:49 +0900 Subject: [PATCH v5 4/7] Row pattern recognition patch (executor). --- src/backend/executor/nodeWindowAgg.c | 671 ++++++++++++++++++++++++++- src/backend/utils/adt/windowfuncs.c | 38 +- src/include/catalog/pg_proc.dat | 6 + src/include/nodes/execnodes.h | 19 + 4 files changed, 722 insertions(+), 12 deletions(-) diff --git a/src/backend/executor/nodeWindowAgg.c b/src/backend/executor/nodeWindowAgg.c index 310ac23e3a..2e59369a71 100644 --- a/src/backend/executor/nodeWindowAgg.c +++ b/src/backend/executor/nodeWindowAgg.c @@ -36,6 +36,7 @@ #include "access/htup_details.h" #include "catalog/objectaccess.h" #include "catalog/pg_aggregate.h" +#include "catalog/pg_collation_d.h" #include "catalog/pg_proc.h" #include "executor/executor.h" #include "executor/nodeWindowAgg.h" @@ -48,6 +49,7 @@ #include "utils/acl.h" #include "utils/builtins.h" #include "utils/datum.h" +#include "utils/fmgroids.h" #include "utils/expandeddatum.h" #include "utils/lsyscache.h" #include "utils/memutils.h" @@ -182,8 +184,9 @@ static void begin_partition(WindowAggState *winstate); static void spool_tuples(WindowAggState *winstate, int64 pos); static void release_partition(WindowAggState *winstate); -static int row_is_in_frame(WindowAggState *winstate, int64 pos, +static int row_is_in_frame(WindowAggState *winstate, int64 pos, TupleTableSlot *slot); + static void update_frameheadpos(WindowAggState *winstate); static void update_frametailpos(WindowAggState *winstate); static void update_grouptailpos(WindowAggState *winstate); @@ -195,9 +198,25 @@ static Datum GetAggInitVal(Datum textInitVal, Oid transtype); static bool are_peers(WindowAggState *winstate, TupleTableSlot *slot1, TupleTableSlot *slot2); -static bool window_gettupleslot(WindowObject winobj, int64 pos, - TupleTableSlot *slot); +static int WinGetSlotInFrame(WindowObject winobj, TupleTableSlot *slot, + int relpos, int seektype, bool set_mark, + bool *isnull, bool *isout); +static bool window_gettupleslot(WindowObject winobj, int64 pos, TupleTableSlot *slot); + +static void attno_map(Node *node); +static bool attno_map_walker(Node *node, void *context); +static int row_is_in_reduced_frame(WindowObject winobj, int64 pos); + +static int64 evaluate_pattern(WindowObject winobj, int64 current_pos, + char *vname, StringInfo encoded_str, bool *result); + +static bool get_slots(WindowObject winobj, int64 current_pos); + +static int search_str_set(char *pattern, StringInfo *str_set, int set_size); +static void search_str_set_recurse(char *pattern, StringInfo *str_set, int set_size, int set_index, + char *encoded_str, int *resultlen); +static char pattern_initial(WindowAggState *winstate, char *vname); /* * initialize_windowaggregate @@ -673,6 +692,9 @@ eval_windowaggregates(WindowAggState *winstate) WindowObject agg_winobj; TupleTableSlot *agg_row_slot; TupleTableSlot *temp_slot; + bool reduced_frame_set; + bool check_reduced_frame; + int num_rows_in_reduced_frame; numaggs = winstate->numaggs; if (numaggs == 0) @@ -790,6 +812,7 @@ eval_windowaggregates(WindowAggState *winstate) (winstate->frameOptions & FRAMEOPTION_EXCLUSION) || winstate->aggregatedupto <= winstate->frameheadpos) { + elog(DEBUG1, "peraggstate->restart is set"); peraggstate->restart = true; numaggs_restart++; } @@ -861,8 +884,10 @@ eval_windowaggregates(WindowAggState *winstate) * If we created a mark pointer for aggregates, keep it pushed up to frame * head, so that tuplestore can discard unnecessary rows. */ +#ifdef NOT_USED if (agg_winobj->markptr >= 0) WinSetMarkPosition(agg_winobj, winstate->frameheadpos); +#endif /* * Now restart the aggregates that require it. @@ -919,6 +944,10 @@ eval_windowaggregates(WindowAggState *winstate) ExecClearTuple(agg_row_slot); } + reduced_frame_set = false; + check_reduced_frame = false; + num_rows_in_reduced_frame = 0; + /* * Advance until we reach a row not in frame (or end of partition). * @@ -930,12 +959,18 @@ eval_windowaggregates(WindowAggState *winstate) { int ret; + elog(DEBUG1, "===== loop in frame starts: " INT64_FORMAT, winstate->aggregatedupto); + /* Fetch next row if we didn't already */ if (TupIsNull(agg_row_slot)) { if (!window_gettupleslot(agg_winobj, winstate->aggregatedupto, agg_row_slot)) + { + if (check_reduced_frame) + winstate->aggregatedupto--; break; /* must be end of partition */ + } } /* @@ -944,10 +979,47 @@ eval_windowaggregates(WindowAggState *winstate) */ ret = row_is_in_frame(winstate, winstate->aggregatedupto, agg_row_slot); if (ret < 0) + { + if (winstate->patternVariableList != NIL && check_reduced_frame) + winstate->aggregatedupto--; break; + } if (ret == 0) goto next_tuple; + if (winstate->patternVariableList != NIL) + { + if (!reduced_frame_set) + { + num_rows_in_reduced_frame = row_is_in_reduced_frame(winstate->agg_winobj, winstate->aggregatedupto); + reduced_frame_set = true; + elog(DEBUG1, "set num_rows_in_reduced_frame: %d pos: " INT64_FORMAT, + num_rows_in_reduced_frame, winstate->aggregatedupto); + + if (num_rows_in_reduced_frame <= 0) + break; + + else if (num_rows_in_reduced_frame > 0) + check_reduced_frame = true; + } + + if (check_reduced_frame) + { + elog(DEBUG1, "decrease num_rows_in_reduced_frame: %d pos: " INT64_FORMAT, + num_rows_in_reduced_frame, winstate->aggregatedupto); + num_rows_in_reduced_frame--; + if (num_rows_in_reduced_frame < 0) + { + /* + * No more rows remain in the reduced frame. Finish + * accumulating row into the aggregates. + */ + winstate->aggregatedupto--; + break; + } + } + } + /* Set tuple context for evaluation of aggregate arguments */ winstate->tmpcontext->ecxt_outertuple = agg_row_slot; @@ -976,6 +1048,8 @@ next_tuple: ExecClearTuple(agg_row_slot); } + elog(DEBUG1, "===== break loop in frame starts: " INT64_FORMAT, winstate->aggregatedupto); + /* The frame's end is not supposed to move backwards, ever */ Assert(aggregatedupto_nonrestarted <= winstate->aggregatedupto); @@ -2053,6 +2127,8 @@ ExecWindowAgg(PlanState *pstate) CHECK_FOR_INTERRUPTS(); + elog(DEBUG1, "ExecWindowAgg called. pos: " INT64_FORMAT , winstate->currentpos); + if (winstate->status == WINDOWAGG_DONE) return NULL; @@ -2388,6 +2464,9 @@ ExecInitWindowAgg(WindowAgg *node, EState *estate, int eflags) TupleDesc scanDesc; ListCell *l; + TargetEntry *te; + Expr *expr; + /* check for unsupported flags */ Assert(!(eflags & (EXEC_FLAG_BACKWARD | EXEC_FLAG_MARK))); @@ -2483,6 +2562,16 @@ ExecInitWindowAgg(WindowAgg *node, EState *estate, int eflags) winstate->temp_slot_2 = ExecInitExtraTupleSlot(estate, scanDesc, &TTSOpsMinimalTuple); + winstate->prev_slot = ExecInitExtraTupleSlot(estate, scanDesc, + &TTSOpsMinimalTuple); + + winstate->next_slot = ExecInitExtraTupleSlot(estate, scanDesc, + &TTSOpsMinimalTuple); + + winstate->null_slot = ExecInitExtraTupleSlot(estate, scanDesc, + &TTSOpsMinimalTuple); + winstate->null_slot = ExecStoreAllNullTuple(winstate->null_slot); + /* * create frame head and tail slots only if needed (must create slots in * exactly the same cases that update_frameheadpos and update_frametailpos @@ -2667,6 +2756,39 @@ ExecInitWindowAgg(WindowAgg *node, EState *estate, int eflags) winstate->inRangeAsc = node->inRangeAsc; winstate->inRangeNullsFirst = node->inRangeNullsFirst; + /* Set up SKIP TO type */ + winstate->rpSkipTo = node->rpSkipTo; + /* Set up row pattern recognition PATTERN clause */ + winstate->patternVariableList = node->patternVariable; + winstate->patternRegexpList = node->patternRegexp; + + /* Set up row pattern recognition DEFINE clause */ + winstate->defineInitial = node->defineInitial; + winstate->defineVariableList = NIL; + winstate->defineClauseList = NIL; + if (node->defineClause != NIL) + { + /* + * Tweak arg var of PREV/NEXT so that it refers to scan/inner slot. + */ + foreach(l, node->defineClause) + { + char *name; + ExprState *exps; + + te = lfirst(l); + name = te->resname; + expr = te->expr; + + elog(DEBUG1, "defineVariable name: %s", name); + winstate->defineVariableList = lappend(winstate->defineVariableList, + makeString(pstrdup(name))); + attno_map((Node *)expr); + exps = ExecInitExpr(expr, (PlanState *) winstate); + winstate->defineClauseList = lappend(winstate->defineClauseList, exps); + } + } + winstate->all_first = true; winstate->partition_spooled = false; winstate->more_partitions = false; @@ -2674,6 +2796,57 @@ ExecInitWindowAgg(WindowAgg *node, EState *estate, int eflags) return winstate; } +/* + * Rewrite varno of Var node that is the argument of PREV/NET so that it sees + * scan tuple (PREV) or inner tuple (NEXT). + */ +static void +attno_map(Node *node) +{ + (void) expression_tree_walker(node, attno_map_walker, NULL); +} + +static bool +attno_map_walker(Node *node, void *context) +{ + FuncExpr *func; + int nargs; + Expr *expr; + Var *var; + + if (node == NULL) + return false; + + if (IsA(node, FuncExpr)) + { + func = (FuncExpr *)node; + + if (func->funcid == F_PREV || func->funcid == F_NEXT) + { + /* sanity check */ + nargs = list_length(func->args); + if (list_length(func->args) != 1) + elog(ERROR, "PREV/NEXT must have 1 argument but function %d has %d args", func->funcid, nargs); + + expr = (Expr *) lfirst(list_head(func->args)); + if (!IsA(expr, Var)) + elog(ERROR, "PREV/NEXT's arg is not Var"); /* XXX: is it possible that arg type is Const? */ + var = (Var *)expr; + + if (func->funcid == F_PREV) + /* + * Rewrite varno from OUTER_VAR to regular var no so that the + * var references scan tuple. + */ + var->varno = var->varnosyn; + else + var->varno = INNER_VAR; + elog(DEBUG1, "PREV/NEXT's varno is rewritten to: %d", var->varno); + } + } + return expression_tree_walker(node, attno_map_walker, NULL); +} + /* ----------------- * ExecEndWindowAgg * ----------------- @@ -2691,6 +2864,8 @@ ExecEndWindowAgg(WindowAggState *node) ExecClearTuple(node->agg_row_slot); ExecClearTuple(node->temp_slot_1); ExecClearTuple(node->temp_slot_2); + ExecClearTuple(node->prev_slot); + ExecClearTuple(node->next_slot); if (node->framehead_slot) ExecClearTuple(node->framehead_slot); if (node->frametail_slot) @@ -2740,6 +2915,8 @@ ExecReScanWindowAgg(WindowAggState *node) ExecClearTuple(node->agg_row_slot); ExecClearTuple(node->temp_slot_1); ExecClearTuple(node->temp_slot_2); + ExecClearTuple(node->prev_slot); + ExecClearTuple(node->next_slot); if (node->framehead_slot) ExecClearTuple(node->framehead_slot); if (node->frametail_slot) @@ -3100,7 +3277,7 @@ window_gettupleslot(WindowObject winobj, int64 pos, TupleTableSlot *slot) return false; if (pos < winobj->markpos) - elog(ERROR, "cannot fetch row before WindowObject's mark position"); + elog(ERROR, "cannot fetch row: " INT64_FORMAT " before WindowObject's mark position: " INT64_FORMAT, pos, winobj->markpos); oldcontext = MemoryContextSwitchTo(winstate->ss.ps.ps_ExprContext->ecxt_per_query_memory); @@ -3420,14 +3597,54 @@ WinGetFuncArgInFrame(WindowObject winobj, int argno, WindowAggState *winstate; ExprContext *econtext; TupleTableSlot *slot; - int64 abs_pos; - int64 mark_pos; Assert(WindowObjectIsValid(winobj)); winstate = winobj->winstate; econtext = winstate->ss.ps.ps_ExprContext; slot = winstate->temp_slot_1; + if (WinGetSlotInFrame(winobj, slot, + relpos, seektype, set_mark, + isnull, isout) == 0) + { + econtext->ecxt_outertuple = slot; + return ExecEvalExpr((ExprState *) list_nth(winobj->argstates, argno), + econtext, isnull); + } + + if (isout) + *isout = true; + *isnull = true; + return (Datum) 0; +} + +/* + * WinGetSlotInFrame + * slot: TupleTableSlot to store the result + * relpos: signed rowcount offset from the seek position + * seektype: WINDOW_SEEK_HEAD or WINDOW_SEEK_TAIL + * set_mark: If the row is found/in frame and set_mark is true, the mark is + * moved to the row as a side-effect. + * isnull: output argument, receives isnull status of result + * isout: output argument, set to indicate whether target row position + * is out of frame (can pass NULL if caller doesn't care about this) + * + * Returns 0 if we successfullt got the slot. false if out of frame. + * (also isout is set) + */ +static int +WinGetSlotInFrame(WindowObject winobj, TupleTableSlot *slot, + int relpos, int seektype, bool set_mark, + bool *isnull, bool *isout) +{ + WindowAggState *winstate; + int64 abs_pos; + int64 mark_pos; + int num_reduced_frame; + + Assert(WindowObjectIsValid(winobj)); + winstate = winobj->winstate; + switch (seektype) { case WINDOW_SEEK_CURRENT: @@ -3494,6 +3711,12 @@ WinGetFuncArgInFrame(WindowObject winobj, int argno, winstate->frameOptions); break; } + num_reduced_frame = row_is_in_reduced_frame(winobj, winstate->frameheadpos); + if (num_reduced_frame < 0) + goto out_of_frame; + else if (num_reduced_frame > 0) + if (relpos >= num_reduced_frame) + goto out_of_frame; break; case WINDOW_SEEK_TAIL: /* rejecting relpos > 0 is easy and simplifies code below */ @@ -3565,6 +3788,12 @@ WinGetFuncArgInFrame(WindowObject winobj, int argno, mark_pos = 0; /* keep compiler quiet */ break; } + + num_reduced_frame = row_is_in_reduced_frame(winobj, winstate->frameheadpos + relpos); + if (num_reduced_frame < 0) + goto out_of_frame; + else if (num_reduced_frame > 0) + abs_pos = winstate->frameheadpos + relpos + num_reduced_frame - 1; break; default: elog(ERROR, "unrecognized window seek type: %d", seektype); @@ -3583,15 +3812,13 @@ WinGetFuncArgInFrame(WindowObject winobj, int argno, *isout = false; if (set_mark) WinSetMarkPosition(winobj, mark_pos); - econtext->ecxt_outertuple = slot; - return ExecEvalExpr((ExprState *) list_nth(winobj->argstates, argno), - econtext, isnull); + return 0; out_of_frame: if (isout) *isout = true; *isnull = true; - return (Datum) 0; + return -1; } /* @@ -3622,3 +3849,427 @@ WinGetFuncArgCurrent(WindowObject winobj, int argno, bool *isnull) return ExecEvalExpr((ExprState *) list_nth(winobj->argstates, argno), econtext, isnull); } + +/* + * row_is_in_reduced_frame + * Determine whether a row is in the current row's reduced window frame according + * to row pattern matching + * + * The row must has been already determined that it is in a full window frame + * and fetched it into slot. + * + * Returns: + * = 0, RPR is not defined. + * >0, if the row is the first in the reduced frame. Return the number of rows in the reduced frame. + * -1, if the row is unmatched row + * -2, if the row is in the reduced frame but needed to be skipped because of + * AFTER MATCH SKIP PAST LAST ROW + */ +static +int row_is_in_reduced_frame(WindowObject winobj, int64 pos) +{ + WindowAggState *winstate = winobj->winstate; + ListCell *lc1, *lc2; + bool expression_result; + int num_matched_rows; + int64 original_pos; + bool anymatch; + StringInfo encoded_str; + StringInfo pattern_str = makeStringInfo(); + + /* + * Array of pattern variables evaluted to true. + * Each character corresponds to pattern variable. + * Example: + * str_set[0] = "AB"; + * str_set[1] = "AC"; + * In this case at row 0 A and B are true, and A and C are true in row 1. + */ + #define ENCODED_STR_ARRAY_ALLOC_SIZE 128 + StringInfo *str_set = NULL; + int str_set_index; + int str_set_size; + + if (winstate->patternVariableList == NIL) + { + /* + * RPR is not defined. Assume that we are always in the the reduced + * window frame. + */ + return 0; + } + + /* save original pos */ + original_pos = pos; + + /* + * Check whether the row speicied by pos is in the reduced frame. The + * second and subsequent rows need to be recognized as "unmatched" rows if + * AFTER MATCH SKIP PAST LAST ROW is defined. + */ + if (winstate->rpSkipTo == ST_PAST_LAST_ROW && + pos > winstate->headpos_in_reduced_frame && + pos < (winstate->headpos_in_reduced_frame + winstate->num_rows_in_reduced_frame)) + return -2; + + /* + * Loop over until none of pattern matches or encounters end of frame. + */ + for (;;) + { + int64 result_pos = -1; + + /* + * Loop over each PATTERN variable. + */ + anymatch = false; + encoded_str = makeStringInfo(); + + forboth(lc1, winstate->patternVariableList, lc2, winstate->patternRegexpList) + { + char *vname = strVal(lfirst(lc1)); + char *quantifier = strVal(lfirst(lc2)); + + elog(DEBUG1, "pos: " INT64_FORMAT " pattern vname: %s quantifier: %s", pos, vname, quantifier); + + expression_result = false; + + /* evaluate row pattern against current row */ + result_pos = evaluate_pattern(winobj, pos, vname, encoded_str, &expression_result); + if (expression_result) + { + elog(DEBUG1, "expression result is true"); + anymatch = true; + } + + /* + * If out of frame, we are done. + */ + if (result_pos < 0) + break; + } + + if (!anymatch) + { + /* none of patterns matched. */ + break; + } + + /* build encoded string array */ + if (str_set == NULL) + { + str_set_index = 0; + str_set_size = ENCODED_STR_ARRAY_ALLOC_SIZE * sizeof(StringInfo); + str_set = palloc(str_set_size); + } + + str_set[str_set_index++] = encoded_str; + + elog(DEBUG1, "pos: " INT64_FORMAT " str_set_index: %d encoded_str: %s", pos, str_set_index, encoded_str->data); + + if (str_set_index >= str_set_size) + { + str_set_size *= 2; + str_set = repalloc(str_set, str_set_size); + } + + /* move to next row */ + pos++; + + if (result_pos < 0) + { + /* out of frame */ + break; + } + } + + if (str_set == NULL) + { + /* no matches found in the first row */ + return -1; + } + + elog(DEBUG1, "pos: " INT64_FORMAT " encoded_str: %s", pos, encoded_str->data); + + /* build regular expression */ + pattern_str = makeStringInfo(); + appendStringInfoChar(pattern_str, '^'); + forboth (lc1, winstate->patternVariableList, lc2, winstate->patternRegexpList) + { + char *vname = strVal(lfirst(lc1)); + char *quantifier = strVal(lfirst(lc2)); + char initial; + + initial = pattern_initial(winstate, vname); + Assert(initial != 0); + appendStringInfoChar(pattern_str, initial); + if (quantifier[0]) + appendStringInfoChar(pattern_str, quantifier[0]); + elog(DEBUG1, "vname: %s initial: %c quantifier: %s", vname, initial, quantifier); + } + + elog(DEBUG1, "pos: " INT64_FORMAT " pattern: %s", pos, pattern_str->data); + + /* look for matching pattern variable sequence */ + num_matched_rows = search_str_set(pattern_str->data, str_set, str_set_index); + if (num_matched_rows <= 0) + return -1; + + /* + * We are at the first row in the reduced frame. Save the number of + * matched rows as the number of rows in the reduced frame. + */ + winstate->headpos_in_reduced_frame = original_pos; + winstate->num_rows_in_reduced_frame = num_matched_rows; + + return num_matched_rows; +} + +/* + * search set of encode_str. + * set_size: size of set_str array. + */ +static +int search_str_set(char *pattern, StringInfo *str_set, int set_size) +{ + char *encoded_str = palloc0(set_size+1); + int resultlen = 0; + + search_str_set_recurse(pattern, str_set, set_size, 0, encoded_str, &resultlen); + elog(DEBUG1, "search_str_set returns %d", resultlen); + return resultlen; +} + +static +void search_str_set_recurse(char *pattern, StringInfo *str_set, + int set_size, int set_index, char *encoded_str, int *resultlen) +{ + char *p; + + if (set_index >= set_size) + { + Datum d; + text *res; + char *substr; + + /* + * We first perform pattern matching using regexp_instr, then call + * textregexsubstr to get matched substring to know how log the + * matched string is. That is the number of rows in the reduced window + * frame. The reason why we can't call textregexsubstr is, it error + * out if pattern is not match. + */ + if (DatumGetInt32(DirectFunctionCall2Coll(regexp_instr, DEFAULT_COLLATION_OID, + PointerGetDatum(cstring_to_text(encoded_str)), + PointerGetDatum(cstring_to_text(pattern)))) > 0) + { + d = DirectFunctionCall2Coll(textregexsubstr, + DEFAULT_COLLATION_OID, + PointerGetDatum(cstring_to_text(encoded_str)), + PointerGetDatum(cstring_to_text(pattern))); + if (d != 0) + { + int len; + + res = DatumGetTextPP(d); + substr = text_to_cstring(res); + len = strlen(substr); + if (len > *resultlen) + /* remember the longest match */ + *resultlen = len; + } + } + return; + } + + p = str_set[set_index]->data; + while (*p) + { + encoded_str[set_index] = *p; + p++; + search_str_set_recurse(pattern, str_set, set_size, set_index + 1, encoded_str, resultlen); + } +} + + +/* + * Evaluate expression associated with PATTERN variable vname. + * relpos is relative row position in a frame (starting from 0). + * "quantifier" is the quatifier part of the PATTERN regular expression. + * Currently only '+' is allowed. + * result is out paramater representing the expression evaluation result + * is true of false. + * Return values are: + * >=0: the last match absolute row position + * other wise out of frame. + */ +static +int64 evaluate_pattern(WindowObject winobj, int64 current_pos, + char *vname, StringInfo encoded_str, bool *result) +{ + WindowAggState *winstate = winobj->winstate; + ExprContext *econtext = winstate->ss.ps.ps_ExprContext; + ListCell *lc1, *lc2, *lc3; + ExprState *pat; + Datum eval_result; + bool out_of_frame = false; + bool isnull; + + forthree (lc1, winstate->defineVariableList, lc2, winstate->defineClauseList, lc3, winstate->defineInitial) + { + char initial; + char *name = strVal(lfirst(lc1)); + + elog(DEBUG1, "evaluate_pattern: define variable: %s, pattern variable: %s", name, vname); + + if (strcmp(vname, name)) + continue; + + initial = *(strVal(lfirst(lc3))); + + /* set expression to evaluate */ + pat = lfirst(lc2); + + /* get current, previous and next tuples */ + if (!get_slots(winobj, current_pos)) + { + out_of_frame = true; + } + else + { + /* evaluate the expression */ + eval_result = ExecEvalExpr(pat, econtext, &isnull); + if (isnull) + { + /* expression is NULL */ + elog(DEBUG1, "expression for %s is NULL at row: " INT64_FORMAT, vname, current_pos); + *result = false; + } + else + { + if (!DatumGetBool(eval_result)) + { + /* expression is false */ + elog(DEBUG1, "expression for %s is false at row: " INT64_FORMAT, vname, current_pos); + *result = false; + } + else + { + /* expression is true */ + elog(DEBUG1, "expression for %s is true at row: " INT64_FORMAT, vname, current_pos); + appendStringInfoChar(encoded_str, initial); + *result = true; + } + } + break; + } + + if (out_of_frame) + { + *result = false; + return -1; + } + } + return current_pos; +} + +/* + * Get current, previous and next tuples. + * Returns false if current row is out of partition/full frame. + */ +static +bool get_slots(WindowObject winobj, int64 current_pos) +{ + WindowAggState *winstate = winobj->winstate; + TupleTableSlot *slot; + int ret; + ExprContext *econtext; + + econtext = winstate->ss.ps.ps_ExprContext; + + /* set up current row tuple slot */ + slot = winstate->temp_slot_1; + if (!window_gettupleslot(winobj, current_pos, slot)) + { + elog(DEBUG1, "current row is out of partition at:" INT64_FORMAT, current_pos); + return false; + + ret = row_is_in_frame(winstate, current_pos, slot); + if (ret <= 0) + { + elog(DEBUG1, "current row is out of frame at: " INT64_FORMAT, current_pos); + return false; + } + } + econtext->ecxt_outertuple = slot; + + /* for PREV */ + if (current_pos > 0) + { + slot = winstate->prev_slot; + if (!window_gettupleslot(winobj, current_pos - 1, slot)) + { + elog(DEBUG1, "previous row is out of partition at: " INT64_FORMAT, current_pos - 1); + econtext->ecxt_scantuple = winstate->null_slot; + } + else + { + ret = row_is_in_frame(winstate, current_pos - 1, slot); + if (ret <= 0) + { + elog(DEBUG1, "previous row is out of frame at: " INT64_FORMAT, current_pos - 1); + econtext->ecxt_scantuple = winstate->null_slot; + } + else + { + econtext->ecxt_scantuple = slot; + } + } + } + else + econtext->ecxt_scantuple = winstate->null_slot; + + /* for NEXT */ + slot = winstate->next_slot; + if (!window_gettupleslot(winobj, current_pos + 1, slot)) + { + elog(DEBUG1, "next row is out of partiton at: " INT64_FORMAT, current_pos + 1); + econtext->ecxt_innertuple = winstate->null_slot; + } + else + { + ret = row_is_in_frame(winstate, current_pos + 1, slot); + if (ret <= 0) + { + elog(DEBUG1, "next row is out of frame at: " INT64_FORMAT, current_pos + 1); + econtext->ecxt_innertuple = winstate->null_slot; + } + else + econtext->ecxt_innertuple = slot; + } + return true; +} + +/* + * Return pattern variable initial character + * matching with pattern variable name vname. + * If not found, return 0. + */ +static +char pattern_initial(WindowAggState *winstate, char *vname) +{ + char initial; + char *name; + ListCell *lc1, *lc2; + + forboth (lc1, winstate->defineVariableList, lc2, winstate->defineInitial) + { + name = strVal(lfirst(lc1)); /* DEFINE variable name */ + initial = *(strVal(lfirst(lc2))); /* DEFINE variable initial */ + + + if (!strcmp(name, vname)) + return initial; /* found */ + } + return 0; +} diff --git a/src/backend/utils/adt/windowfuncs.c b/src/backend/utils/adt/windowfuncs.c index b87a624fb2..e4cab36ec9 100644 --- a/src/backend/utils/adt/windowfuncs.c +++ b/src/backend/utils/adt/windowfuncs.c @@ -13,6 +13,9 @@ */ #include "postgres.h" +#include "catalog/pg_collation_d.h" +#include "executor/executor.h" +#include "nodes/execnodes.h" #include "nodes/supportnodes.h" #include "utils/builtins.h" #include "windowapi.h" @@ -36,11 +39,19 @@ typedef struct int64 remainder; /* (total rows) % (bucket num) */ } ntile_context; +/* + * rpr process information. + * Used for AFTER MATCH SKIP PAST LAST ROW + */ +typedef struct SkipContext +{ + int64 pos; /* last row absolute position */ +} SkipContext; + static bool rank_up(WindowObject winobj); static Datum leadlag_common(FunctionCallInfo fcinfo, bool forward, bool withoffset, bool withdefault); - /* * utility routine for *_rank functions. */ @@ -673,7 +684,7 @@ window_last_value(PG_FUNCTION_ARGS) bool isnull; result = WinGetFuncArgInFrame(winobj, 0, - 0, WINDOW_SEEK_TAIL, true, + 0, WINDOW_SEEK_TAIL, false, &isnull, NULL); if (isnull) PG_RETURN_NULL(); @@ -713,3 +724,26 @@ window_nth_value(PG_FUNCTION_ARGS) PG_RETURN_DATUM(result); } + +/* + * prev + * Dummy function to invoke RPR's navigation operator "PREV". + * This is *not* a window function. + */ +Datum +window_prev(PG_FUNCTION_ARGS) +{ + PG_RETURN_DATUM(PG_GETARG_DATUM(0)); +} + +/* + * next + * Dummy function to invoke RPR's navigation operation "NEXT". + * This is *not* a window function. + */ +Datum +window_next(PG_FUNCTION_ARGS) +{ + PG_RETURN_DATUM(PG_GETARG_DATUM(0)); +} + diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat index 9805bc6118..d20f803cf5 100644 --- a/src/include/catalog/pg_proc.dat +++ b/src/include/catalog/pg_proc.dat @@ -10416,6 +10416,12 @@ { oid => '3114', descr => 'fetch the Nth row value', proname => 'nth_value', prokind => 'w', prorettype => 'anyelement', proargtypes => 'anyelement int4', prosrc => 'window_nth_value' }, +{ oid => '6122', descr => 'previous value', + proname => 'prev', provolatile => 's', prorettype => 'anyelement', + proargtypes => 'anyelement', prosrc => 'window_prev' }, +{ oid => '6123', descr => 'next value', + proname => 'next', provolatile => 's', prorettype => 'anyelement', + proargtypes => 'anyelement', prosrc => 'window_next' }, # functions for range types { oid => '3832', descr => 'I/O', diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h index cb714f4a19..2bd6fcb5e1 100644 --- a/src/include/nodes/execnodes.h +++ b/src/include/nodes/execnodes.h @@ -2519,6 +2519,15 @@ typedef struct WindowAggState int64 groupheadpos; /* current row's peer group head position */ int64 grouptailpos; /* " " " " tail position (group end+1) */ + /* these fields are used in Row pattern recognition: */ + RPSkipTo rpSkipTo; /* Row Pattern Skip To type */ + List *patternVariableList; /* list of row pattern variables names (list of String) */ + List *patternRegexpList; /* list of row pattern regular expressions ('+' or ''. list of String) */ + List *defineVariableList; /* list of row pattern definition variables (list of String) */ + List *defineClauseList; /* expression for row pattern definition + * search conditions ExprState list */ + List *defineInitial; /* list of row pattern definition variable initials (list of String) */ + MemoryContext partcontext; /* context for partition-lifespan data */ MemoryContext aggcontext; /* shared context for aggregate working data */ MemoryContext curaggcontext; /* current aggregate's working data */ @@ -2555,6 +2564,16 @@ typedef struct WindowAggState TupleTableSlot *agg_row_slot; TupleTableSlot *temp_slot_1; TupleTableSlot *temp_slot_2; + + /* temporary slots for RPR */ + TupleTableSlot *prev_slot; /* PREV row navigation operator */ + TupleTableSlot *next_slot; /* NEXT row navigation operator */ + TupleTableSlot *null_slot; /* all NULL slot */ + + /* head of the reduced window frame */ + int64 headpos_in_reduced_frame; + /* number of rows in the reduced window frame */ + int64 num_rows_in_reduced_frame; } WindowAggState; /* ---------------- -- 2.25.1 From fa12c3521351e4200d5476506f963b07ae70cec9 Mon Sep 17 00:00:00 2001 From: Tatsuo Ishii <ishii@postgresql.org> Date: Sat, 2 Sep 2023 15:32:49 +0900 Subject: [PATCH v5 5/7] Row pattern recognition patch (docs). --- doc/src/sgml/advanced.sgml | 52 ++++++++++++++++++++++++++++++++++ doc/src/sgml/func.sgml | 54 ++++++++++++++++++++++++++++++++++++ doc/src/sgml/ref/select.sgml | 38 +++++++++++++++++++++++-- 3 files changed, 142 insertions(+), 2 deletions(-) diff --git a/doc/src/sgml/advanced.sgml b/doc/src/sgml/advanced.sgml index 755c9f1485..eda3612822 100644 --- a/doc/src/sgml/advanced.sgml +++ b/doc/src/sgml/advanced.sgml @@ -537,6 +537,58 @@ WHERE pos < 3; <literal>rank</literal> less than 3. </para> + <para> + Row pattern common syntax can be used with row pattern common syntax to + perform row pattern recognition in a query. Row pattern common syntax + includes two sub clauses. <literal>DEFINE</literal> defines definition + variables along with an expression. The expression must be a logical + expression, which means it must + return <literal>TRUE</literal>, <literal>FALSE</literal> + or <literal>NULL</literal>. Moreover if the expression comprises a column + reference, it must be the argument of <function>rpr</function>. An example + of <literal>DEFINE</literal> is as follows. + +<programlisting> +DEFINE + LOWPRICE AS price <= 100, + UP AS price > PREV(price), + DOWN AS price < PREV(price) +</programlisting> + + Note that <function>PREV</function> returns price column in the previous + row if it's called in a context of row pattern recognition. So in the + second line means the definition variable "UP" is <literal>TRUE</literal> + when price column in the current row is greater than the price column in + the previous row. Likewise, "DOWN" is <literal>TRUE</literal> when when + price column in the current row is lower than the price column in the + previous row. + </para> + <para> + Once <literal>DEFINE</literal> exists, <literal>PATTERN</literal> can be + used. <literal>PATTERN</literal> defines a sequence of rows that satisfies + certain conditions. For example following <literal>PATTERN</literal> + defines that a row starts with the condition "LOWPRICE", then one or more + rows satisfy "UP" and finally one or more rows satisfy "DOWN". If a + sequence of rows found, rpr returns the column at the starting row. + Example of a <literal>SELECT</literal> using the <literal>DEFINE</literal> + and <literal>PATTERN</literal> clause is as follows. + +<programlisting> +SELECT company, tdate, price, max(price) OVER w FROM stock + WINDOW w AS ( + PARTITION BY company + ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING + AFTER MATCH SKIP PAST LAST ROW + INITIAL + PATTERN (LOWPRICE UP+ DOWN+) + DEFINE + LOWPRICE AS price <= 100, + UP AS price > PREV(price), + DOWN AS price < PREV(price) +); +</programlisting> + </para> + <para> When a query involves multiple window functions, it is possible to write out each one with a separate <literal>OVER</literal> clause, but this is diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml index 7a0d4b9134..b7bfc9271e 100644 --- a/doc/src/sgml/func.sgml +++ b/doc/src/sgml/func.sgml @@ -21772,6 +21772,7 @@ SELECT count(*) FROM sometable; returns <literal>NULL</literal> if there is no such row. </para></entry> </row> + </tbody> </tgroup> </table> @@ -21811,6 +21812,59 @@ SELECT count(*) FROM sometable; Other frame specifications can be used to obtain other effects. </para> + <para> + Row pattern recognition navigation functions are listed in + <xref linkend="functions-rpr-navigation-table"/>. These functions + can be used to describe DEFINE clause of Row pattern recognition. + </para> + + <table id="functions-rpr-navigation-table"> + <title>Row Pattern Navigation Functions</title> + <tgroup cols="1"> + <thead> + <row> + <entry role="func_table_entry"><para role="func_signature"> + Function + </para> + <para> + Description + </para></entry> + </row> + </thead> + + <tbody> + <row> + <entry role="func_table_entry"><para role="func_signature"> + <indexterm> + <primary>prev</primary> + </indexterm> + <function>prev</function> ( <parameter>value</parameter> <type>anyelement</type> ) + <returnvalue>anyelement</returnvalue> + </para> + <para> + Returns the column value at the previous row; + returns NULL if there is no previous row in the window frame. + </para></entry> + </row> + + <row> + <entry role="func_table_entry"><para role="func_signature"> + <indexterm> + <primary>next</primary> + </indexterm> + <function>next</function> ( <parameter>value</parameter> <type>anyelement</type> ) + <returnvalue>anyelement</returnvalue> + </para> + <para> + Returns the column value at the next row; + returns NULL if there is no next row in the window frame. + </para></entry> + </row> + + </tbody> + </tgroup> + </table> + <note> <para> The SQL standard defines a <literal>RESPECT NULLS</literal> or diff --git a/doc/src/sgml/ref/select.sgml b/doc/src/sgml/ref/select.sgml index 0ee0cc7e64..8d3becd57a 100644 --- a/doc/src/sgml/ref/select.sgml +++ b/doc/src/sgml/ref/select.sgml @@ -966,8 +966,8 @@ WINDOW <replaceable class="parameter">window_name</replaceable> AS ( <replaceabl The <replaceable class="parameter">frame_clause</replaceable> can be one of <synopsis> -{ RANGE | ROWS | GROUPS } <replaceable>frame_start</replaceable> [ <replaceable>frame_exclusion</replaceable> ] -{ RANGE | ROWS | GROUPS } BETWEEN <replaceable>frame_start</replaceable> AND <replaceable>frame_end</replaceable> [ <replaceable>frame_exclusion</replaceable>] +{ RANGE | ROWS | GROUPS } <replaceable>frame_start</replaceable> [ <replaceable>frame_exclusion</replaceable> ] [row_pattern_common_syntax] +{ RANGE | ROWS | GROUPS } BETWEEN <replaceable>frame_start</replaceable> AND <replaceable>frame_end</replaceable> [ <replaceable>frame_exclusion</replaceable>] [row_pattern_common_syntax] </synopsis> where <replaceable>frame_start</replaceable> @@ -1074,6 +1074,40 @@ EXCLUDE NO OTHERS a given peer group will be in the frame or excluded from it. </para> + <para> + The + optional <replaceable class="parameter">row_pattern_common_syntax</replaceable> + defines the <firstterm>row pattern recognition condition</firstterm> for + this + window. <replaceable class="parameter">row_pattern_common_syntax</replaceable> + includes following subclauses. <literal>AFTER MATCH SKIP PAST LAST + ROW</literal> or <literal>AFTER MATCH SKIP TO NEXT ROW</literal> controls + how to proceed to next row position after a match + found. With <literal>AFTER MATCH SKIP PAST LAST ROW</literal> (the + default) next row position is next to the last row of previous match. On + the other hand, with <literal>AFTER MATCH SKIP TO NEXT ROW</literal> next + row position is always next to the last row of previous + match. <literal>DEFINE</literal> defines definition variables along with a + boolean expression. <literal>PATTERN</literal> defines a sequence of rows + that satisfies certain conditions using variables defined + in <literal>DEFINE</literal> clause. If the variable is not defined in + the <literal>DEFINE</literal> clause, it is implicitly assumed + following is defined in the <literal>DEFINE</literal> clause. + +<synopsis> +<literal>variable_name</literal> AS TRUE +</synopsis> + + Note that the maximu number of variables defined + in <literal>DEFINE</literal> clause is 26. + +<synopsis> +[ AFTER MATCH SKIP PAST LAST ROW | AFTER MATCH SKIP TO NEXT ROW ] +PATTERN <replaceable class="parameter">pattern_variable_name</replaceable>[+] [, ...] +DEFINE <replaceable class="parameter">definition_varible_name</replaceable> AS <replaceable class="parameter">expression</replaceable>[, ...] +</synopsis> + </para> + <para> The purpose of a <literal>WINDOW</literal> clause is to specify the behavior of <firstterm>window functions</firstterm> appearing in the query's -- 2.25.1 From fe7ea2e80f1deb0c255e795b9c340c66e1f19356 Mon Sep 17 00:00:00 2001 From: Tatsuo Ishii <ishii@postgresql.org> Date: Sat, 2 Sep 2023 15:32:49 +0900 Subject: [PATCH v5 6/7] Row pattern recognition patch (tests). --- src/test/regress/expected/rpr.out | 463 +++++++++++++++++++++++++++++ src/test/regress/parallel_schedule | 2 +- src/test/regress/sql/rpr.sql | 229 ++++++++++++++ 3 files changed, 693 insertions(+), 1 deletion(-) create mode 100644 src/test/regress/expected/rpr.out create mode 100644 src/test/regress/sql/rpr.sql diff --git a/src/test/regress/expected/rpr.out b/src/test/regress/expected/rpr.out new file mode 100644 index 0000000000..340ccf242c --- /dev/null +++ b/src/test/regress/expected/rpr.out @@ -0,0 +1,463 @@ +-- +-- Test for row pattern definition clause +-- +CREATE TEMP TABLE stock ( + company TEXT, + tdate DATE, + price INTEGER + ); +INSERT INTO stock VALUES ('company1', '2023-07-01', 100); +INSERT INTO stock VALUES ('company1', '2023-07-02', 200); +INSERT INTO stock VALUES ('company1', '2023-07-03', 150); +INSERT INTO stock VALUES ('company1', '2023-07-04', 140); +INSERT INTO stock VALUES ('company1', '2023-07-05', 150); +INSERT INTO stock VALUES ('company1', '2023-07-06', 90); +INSERT INTO stock VALUES ('company1', '2023-07-07', 110); +INSERT INTO stock VALUES ('company1', '2023-07-08', 130); +INSERT INTO stock VALUES ('company1', '2023-07-09', 120); +INSERT INTO stock VALUES ('company1', '2023-07-10', 130); +INSERT INTO stock VALUES ('company2', '2023-07-01', 50); +INSERT INTO stock VALUES ('company2', '2023-07-02', 2000); +INSERT INTO stock VALUES ('company2', '2023-07-03', 1500); +INSERT INTO stock VALUES ('company2', '2023-07-04', 1400); +INSERT INTO stock VALUES ('company2', '2023-07-05', 1500); +INSERT INTO stock VALUES ('company2', '2023-07-06', 60); +INSERT INTO stock VALUES ('company2', '2023-07-07', 1100); +INSERT INTO stock VALUES ('company2', '2023-07-08', 1300); +INSERT INTO stock VALUES ('company2', '2023-07-09', 1200); +INSERT INTO stock VALUES ('company2', '2023-07-10', 1300); +SELECT * FROM stock; + company | tdate | price +----------+------------+------- + company1 | 07-01-2023 | 100 + company1 | 07-02-2023 | 200 + company1 | 07-03-2023 | 150 + company1 | 07-04-2023 | 140 + company1 | 07-05-2023 | 150 + company1 | 07-06-2023 | 90 + company1 | 07-07-2023 | 110 + company1 | 07-08-2023 | 130 + company1 | 07-09-2023 | 120 + company1 | 07-10-2023 | 130 + company2 | 07-01-2023 | 50 + company2 | 07-02-2023 | 2000 + company2 | 07-03-2023 | 1500 + company2 | 07-04-2023 | 1400 + company2 | 07-05-2023 | 1500 + company2 | 07-06-2023 | 60 + company2 | 07-07-2023 | 1100 + company2 | 07-08-2023 | 1300 + company2 | 07-09-2023 | 1200 + company2 | 07-10-2023 | 1300 +(20 rows) + +-- basic test using PREV +SELECT company, tdate, price, first_value(price) OVER w, last_value(price) OVER w, + nth_value(tdate, 2) OVER w AS nth_second + FROM stock + WINDOW w AS ( + PARTITION BY company + ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING + INITIAL + PATTERN (START UP+ DOWN+) + DEFINE + START AS TRUE, + UP AS price > PREV(price), + DOWN AS price < PREV(price) +); + company | tdate | price | first_value | last_value | nth_second +----------+------------+-------+-------------+------------+------------ + company1 | 07-01-2023 | 100 | 100 | 140 | 07-02-2023 + company1 | 07-02-2023 | 200 | | | + company1 | 07-03-2023 | 150 | | | + company1 | 07-04-2023 | 140 | | | + company1 | 07-05-2023 | 150 | | | + company1 | 07-06-2023 | 90 | 90 | 120 | 07-07-2023 + company1 | 07-07-2023 | 110 | | | + company1 | 07-08-2023 | 130 | | | + company1 | 07-09-2023 | 120 | | | + company1 | 07-10-2023 | 130 | | | + company2 | 07-01-2023 | 50 | 50 | 1400 | 07-02-2023 + company2 | 07-02-2023 | 2000 | | | + company2 | 07-03-2023 | 1500 | | | + company2 | 07-04-2023 | 1400 | | | + company2 | 07-05-2023 | 1500 | | | + company2 | 07-06-2023 | 60 | 60 | 1200 | 07-07-2023 + company2 | 07-07-2023 | 1100 | | | + company2 | 07-08-2023 | 1300 | | | + company2 | 07-09-2023 | 1200 | | | + company2 | 07-10-2023 | 1300 | | | +(20 rows) + +-- omit "START" in DEFINE but it is ok because "START AS TRUE" is +-- implicitly defined. per spec. +SELECT company, tdate, price, first_value(price) OVER w, last_value(price) OVER w, + nth_value(tdate, 2) OVER w AS nth_second + FROM stock + WINDOW w AS ( + PARTITION BY company + ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING + INITIAL + PATTERN (START UP+ DOWN+) + DEFINE + UP AS price > PREV(price), + DOWN AS price < PREV(price) +); + company | tdate | price | first_value | last_value | nth_second +----------+------------+-------+-------------+------------+------------ + company1 | 07-01-2023 | 100 | 100 | 140 | 07-02-2023 + company1 | 07-02-2023 | 200 | | | + company1 | 07-03-2023 | 150 | | | + company1 | 07-04-2023 | 140 | | | + company1 | 07-05-2023 | 150 | | | + company1 | 07-06-2023 | 90 | 90 | 120 | 07-07-2023 + company1 | 07-07-2023 | 110 | | | + company1 | 07-08-2023 | 130 | | | + company1 | 07-09-2023 | 120 | | | + company1 | 07-10-2023 | 130 | | | + company2 | 07-01-2023 | 50 | 50 | 1400 | 07-02-2023 + company2 | 07-02-2023 | 2000 | | | + company2 | 07-03-2023 | 1500 | | | + company2 | 07-04-2023 | 1400 | | | + company2 | 07-05-2023 | 1500 | | | + company2 | 07-06-2023 | 60 | 60 | 1200 | 07-07-2023 + company2 | 07-07-2023 | 1100 | | | + company2 | 07-08-2023 | 1300 | | | + company2 | 07-09-2023 | 1200 | | | + company2 | 07-10-2023 | 1300 | | | +(20 rows) + +-- the first row start with less than or equal to 100 +SELECT company, tdate, price, first_value(price) OVER w, last_value(price) OVER w + FROM stock + WINDOW w AS ( + PARTITION BY company + ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING + INITIAL + PATTERN (LOWPRICE UP+ DOWN+) + DEFINE + LOWPRICE AS price <= 100, + UP AS price > PREV(price), + DOWN AS price < PREV(price) +); + company | tdate | price | first_value | last_value +----------+------------+-------+-------------+------------ + company1 | 07-01-2023 | 100 | 100 | 140 + company1 | 07-02-2023 | 200 | | + company1 | 07-03-2023 | 150 | | + company1 | 07-04-2023 | 140 | | + company1 | 07-05-2023 | 150 | | + company1 | 07-06-2023 | 90 | 90 | 120 + company1 | 07-07-2023 | 110 | | + company1 | 07-08-2023 | 130 | | + company1 | 07-09-2023 | 120 | | + company1 | 07-10-2023 | 130 | | + company2 | 07-01-2023 | 50 | 50 | 1400 + company2 | 07-02-2023 | 2000 | | + company2 | 07-03-2023 | 1500 | | + company2 | 07-04-2023 | 1400 | | + company2 | 07-05-2023 | 1500 | | + company2 | 07-06-2023 | 60 | 60 | 1200 + company2 | 07-07-2023 | 1100 | | + company2 | 07-08-2023 | 1300 | | + company2 | 07-09-2023 | 1200 | | + company2 | 07-10-2023 | 1300 | | +(20 rows) + +-- second row raises 120% +SELECT company, tdate, price, first_value(price) OVER w, last_value(price) OVER w + FROM stock + WINDOW w AS ( + PARTITION BY company + ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING + INITIAL + PATTERN (LOWPRICE UP+ DOWN+) + DEFINE + LOWPRICE AS price <= 100, + UP AS price > PREV(price) * 1.2, + DOWN AS price < PREV(price) +); + company | tdate | price | first_value | last_value +----------+------------+-------+-------------+------------ + company1 | 07-01-2023 | 100 | 100 | 140 + company1 | 07-02-2023 | 200 | | + company1 | 07-03-2023 | 150 | | + company1 | 07-04-2023 | 140 | | + company1 | 07-05-2023 | 150 | | + company1 | 07-06-2023 | 90 | | + company1 | 07-07-2023 | 110 | | + company1 | 07-08-2023 | 130 | | + company1 | 07-09-2023 | 120 | | + company1 | 07-10-2023 | 130 | | + company2 | 07-01-2023 | 50 | 50 | 1400 + company2 | 07-02-2023 | 2000 | | + company2 | 07-03-2023 | 1500 | | + company2 | 07-04-2023 | 1400 | | + company2 | 07-05-2023 | 1500 | | + company2 | 07-06-2023 | 60 | | + company2 | 07-07-2023 | 1100 | | + company2 | 07-08-2023 | 1300 | | + company2 | 07-09-2023 | 1200 | | + company2 | 07-10-2023 | 1300 | | +(20 rows) + +-- using NEXT +SELECT company, tdate, price, first_value(price) OVER w, last_value(price) OVER w + FROM stock + WINDOW w AS ( + PARTITION BY company + ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING + INITIAL + PATTERN (START UPDOWN) + DEFINE + START AS TRUE, + UPDOWN AS price > PREV(price) AND price > NEXT(price) +); + company | tdate | price | first_value | last_value +----------+------------+-------+-------------+------------ + company1 | 07-01-2023 | 100 | 100 | 200 + company1 | 07-02-2023 | 200 | | + company1 | 07-03-2023 | 150 | | + company1 | 07-04-2023 | 140 | 140 | 150 + company1 | 07-05-2023 | 150 | | + company1 | 07-06-2023 | 90 | | + company1 | 07-07-2023 | 110 | 110 | 130 + company1 | 07-08-2023 | 130 | | + company1 | 07-09-2023 | 120 | | + company1 | 07-10-2023 | 130 | | + company2 | 07-01-2023 | 50 | 50 | 2000 + company2 | 07-02-2023 | 2000 | | + company2 | 07-03-2023 | 1500 | | + company2 | 07-04-2023 | 1400 | 1400 | 1500 + company2 | 07-05-2023 | 1500 | | + company2 | 07-06-2023 | 60 | | + company2 | 07-07-2023 | 1100 | 1100 | 1300 + company2 | 07-08-2023 | 1300 | | + company2 | 07-09-2023 | 1200 | | + company2 | 07-10-2023 | 1300 | | +(20 rows) + +-- using AFTER MATCH SKIP TO NEXT ROW +SELECT company, tdate, price, first_value(price) OVER w, last_value(price) OVER w + FROM stock + WINDOW w AS ( + PARTITION BY company + ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING + AFTER MATCH SKIP TO NEXT ROW + INITIAL + PATTERN (START UPDOWN) + DEFINE + START AS TRUE, + UPDOWN AS price > PREV(price) AND price > NEXT(price) +); + company | tdate | price | first_value | last_value +----------+------------+-------+-------------+------------ + company1 | 07-01-2023 | 100 | 100 | 200 + company1 | 07-02-2023 | 200 | | + company1 | 07-03-2023 | 150 | | + company1 | 07-04-2023 | 140 | 140 | 150 + company1 | 07-05-2023 | 150 | | + company1 | 07-06-2023 | 90 | | + company1 | 07-07-2023 | 110 | 110 | 130 + company1 | 07-08-2023 | 130 | | + company1 | 07-09-2023 | 120 | | + company1 | 07-10-2023 | 130 | | + company2 | 07-01-2023 | 50 | 50 | 2000 + company2 | 07-02-2023 | 2000 | | + company2 | 07-03-2023 | 1500 | | + company2 | 07-04-2023 | 1400 | 1400 | 1500 + company2 | 07-05-2023 | 1500 | | + company2 | 07-06-2023 | 60 | | + company2 | 07-07-2023 | 1100 | 1100 | 1300 + company2 | 07-08-2023 | 1300 | | + company2 | 07-09-2023 | 1200 | | + company2 | 07-10-2023 | 1300 | | +(20 rows) + +-- match everything +SELECT company, tdate, price, first_value(price) OVER w, last_value(price) OVER w + FROM stock + WINDOW w AS ( + PARTITION BY company + ORDER BY tdate + ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING + AFTER MATCH SKIP PAST LAST ROW + INITIAL + PATTERN (A+) + DEFINE + A AS TRUE +); + company | tdate | price | first_value | last_value +----------+------------+-------+-------------+------------ + company1 | 07-01-2023 | 100 | 100 | 130 + company1 | 07-02-2023 | 200 | | + company1 | 07-03-2023 | 150 | | + company1 | 07-04-2023 | 140 | | + company1 | 07-05-2023 | 150 | | + company1 | 07-06-2023 | 90 | | + company1 | 07-07-2023 | 110 | | + company1 | 07-08-2023 | 130 | | + company1 | 07-09-2023 | 120 | | + company1 | 07-10-2023 | 130 | | + company2 | 07-01-2023 | 50 | 50 | 1300 + company2 | 07-02-2023 | 2000 | | + company2 | 07-03-2023 | 1500 | | + company2 | 07-04-2023 | 1400 | | + company2 | 07-05-2023 | 1500 | | + company2 | 07-06-2023 | 60 | | + company2 | 07-07-2023 | 1100 | | + company2 | 07-08-2023 | 1300 | | + company2 | 07-09-2023 | 1200 | | + company2 | 07-10-2023 | 1300 | | +(20 rows) + +-- backtracking with reclassification of rows +-- using AFTER MATCH SKIP PAST LAST ROW +SELECT company, tdate, price, first_value(tdate) OVER w, last_value(tdate) OVER w + FROM stock + WINDOW w AS ( + PARTITION BY company + ORDER BY tdate + ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING + AFTER MATCH SKIP PAST LAST ROW + INITIAL + PATTERN (A+ B+) + DEFINE + A AS price > 100, + B AS price > 100 +); + company | tdate | price | first_value | last_value +----------+------------+-------+-------------+------------ + company1 | 07-01-2023 | 100 | | + company1 | 07-02-2023 | 200 | 07-02-2023 | 07-05-2023 + company1 | 07-03-2023 | 150 | | + company1 | 07-04-2023 | 140 | | + company1 | 07-05-2023 | 150 | | + company1 | 07-06-2023 | 90 | | + company1 | 07-07-2023 | 110 | 07-07-2023 | 07-10-2023 + company1 | 07-08-2023 | 130 | | + company1 | 07-09-2023 | 120 | | + company1 | 07-10-2023 | 130 | | + company2 | 07-01-2023 | 50 | | + company2 | 07-02-2023 | 2000 | 07-02-2023 | 07-05-2023 + company2 | 07-03-2023 | 1500 | | + company2 | 07-04-2023 | 1400 | | + company2 | 07-05-2023 | 1500 | | + company2 | 07-06-2023 | 60 | | + company2 | 07-07-2023 | 1100 | 07-07-2023 | 07-10-2023 + company2 | 07-08-2023 | 1300 | | + company2 | 07-09-2023 | 1200 | | + company2 | 07-10-2023 | 1300 | | +(20 rows) + +-- backtracking with reclassification of rows +-- using AFTER MATCH SKIP TO NEXT ROW +SELECT company, tdate, price, first_value(tdate) OVER w, last_value(tdate) OVER w + FROM stock + WINDOW w AS ( + PARTITION BY company + ORDER BY tdate + ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING + AFTER MATCH SKIP TO NEXT ROW + INITIAL + PATTERN (A+ B+) + DEFINE + A AS price > 100, + B AS price > 100 +); + company | tdate | price | first_value | last_value +----------+------------+-------+-------------+------------ + company1 | 07-01-2023 | 100 | | + company1 | 07-02-2023 | 200 | 07-02-2023 | 07-05-2023 + company1 | 07-03-2023 | 150 | 07-03-2023 | 07-05-2023 + company1 | 07-04-2023 | 140 | 07-04-2023 | 07-05-2023 + company1 | 07-05-2023 | 150 | | + company1 | 07-06-2023 | 90 | | + company1 | 07-07-2023 | 110 | 07-07-2023 | 07-10-2023 + company1 | 07-08-2023 | 130 | 07-08-2023 | 07-10-2023 + company1 | 07-09-2023 | 120 | 07-09-2023 | 07-10-2023 + company1 | 07-10-2023 | 130 | | + company2 | 07-01-2023 | 50 | | + company2 | 07-02-2023 | 2000 | 07-02-2023 | 07-05-2023 + company2 | 07-03-2023 | 1500 | 07-03-2023 | 07-05-2023 + company2 | 07-04-2023 | 1400 | 07-04-2023 | 07-05-2023 + company2 | 07-05-2023 | 1500 | | + company2 | 07-06-2023 | 60 | | + company2 | 07-07-2023 | 1100 | 07-07-2023 | 07-10-2023 + company2 | 07-08-2023 | 1300 | 07-08-2023 | 07-10-2023 + company2 | 07-09-2023 | 1200 | 07-09-2023 | 07-10-2023 + company2 | 07-10-2023 | 1300 | | +(20 rows) + +-- +-- Error cases +-- +-- row pattern definition variable name must not appear more than once +SELECT company, tdate, price, first_value(price) OVER w, last_value(price) OVER w + FROM stock + WINDOW w AS ( + PARTITION BY company + ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING + ORDER BY tdate + INITIAL + PATTERN (START UP+ DOWN+) + DEFINE + START AS TRUE, + UP AS price > PREV(price), + DOWN AS price < PREV(price), + UP AS price > PREV(price) +); +ERROR: syntax error at or near "ORDER" +LINE 6: ORDER BY tdate + ^ +-- pattern variable name must appear in DEFINE +SELECT company, tdate, price, first_value(price) OVER w, last_value(price) OVER w + FROM stock + WINDOW w AS ( + PARTITION BY company + ORDER BY tdate + ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING + INITIAL + PATTERN (START UP+ DOWN+ END) + DEFINE + START AS TRUE, + UP AS price > PREV(price), + DOWN AS price < PREV(price) +); +ERROR: syntax error at or near "END" +LINE 8: PATTERN (START UP+ DOWN+ END) + ^ +-- FRAME must start at current row when row patttern recognition is used +SELECT company, tdate, price, first_value(price) OVER w, last_value(price) OVER w + FROM stock + WINDOW w AS ( + PARTITION BY company + ORDER BY tdate + ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING + INITIAL + PATTERN (START UP+ DOWN+) + DEFINE + START AS TRUE, + UP AS price > PREV(price), + DOWN AS price < PREV(price) +); +ERROR: FRAME must start at current row when row patttern recognition is used +-- SEEK is not supported +SELECT company, tdate, price, first_value(price) OVER w, last_value(price) OVER w + FROM stock + WINDOW w AS ( + PARTITION BY company + ORDER BY tdate + ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING + AFTER MATCH SKIP TO NEXT ROW + SEEK + PATTERN (START UP+ DOWN+) + DEFINE + START AS TRUE, + UP AS price > PREV(price), + DOWN AS price < PREV(price) +); +ERROR: SEEK is not supported +LINE 8: SEEK + ^ +HINT: Use INITIAL. diff --git a/src/test/regress/parallel_schedule b/src/test/regress/parallel_schedule index 4df9d8503b..896531002b 100644 --- a/src/test/regress/parallel_schedule +++ b/src/test/regress/parallel_schedule @@ -98,7 +98,7 @@ test: publication subscription # Another group of parallel tests # select_views depends on create_view # ---------- -test: select_views portals_p2 foreign_key cluster dependency guc bitmapops combocid tsearch tsdicts foreign_data windowxmlmap functional_deps advisory_lock indirect_toast equivclass +test: select_views portals_p2 foreign_key cluster dependency guc bitmapops combocid tsearch tsdicts foreign_data windowxmlmap functional_deps advisory_lock indirect_toast equivclass rpr # ---------- # Another group of parallel tests (JSON related) diff --git a/src/test/regress/sql/rpr.sql b/src/test/regress/sql/rpr.sql new file mode 100644 index 0000000000..c4f704cdc4 --- /dev/null +++ b/src/test/regress/sql/rpr.sql @@ -0,0 +1,229 @@ +-- +-- Test for row pattern definition clause +-- + +CREATE TEMP TABLE stock ( + company TEXT, + tdate DATE, + price INTEGER + ); +INSERT INTO stock VALUES ('company1', '2023-07-01', 100); +INSERT INTO stock VALUES ('company1', '2023-07-02', 200); +INSERT INTO stock VALUES ('company1', '2023-07-03', 150); +INSERT INTO stock VALUES ('company1', '2023-07-04', 140); +INSERT INTO stock VALUES ('company1', '2023-07-05', 150); +INSERT INTO stock VALUES ('company1', '2023-07-06', 90); +INSERT INTO stock VALUES ('company1', '2023-07-07', 110); +INSERT INTO stock VALUES ('company1', '2023-07-08', 130); +INSERT INTO stock VALUES ('company1', '2023-07-09', 120); +INSERT INTO stock VALUES ('company1', '2023-07-10', 130); +INSERT INTO stock VALUES ('company2', '2023-07-01', 50); +INSERT INTO stock VALUES ('company2', '2023-07-02', 2000); +INSERT INTO stock VALUES ('company2', '2023-07-03', 1500); +INSERT INTO stock VALUES ('company2', '2023-07-04', 1400); +INSERT INTO stock VALUES ('company2', '2023-07-05', 1500); +INSERT INTO stock VALUES ('company2', '2023-07-06', 60); +INSERT INTO stock VALUES ('company2', '2023-07-07', 1100); +INSERT INTO stock VALUES ('company2', '2023-07-08', 1300); +INSERT INTO stock VALUES ('company2', '2023-07-09', 1200); +INSERT INTO stock VALUES ('company2', '2023-07-10', 1300); + +SELECT * FROM stock; + +-- basic test using PREV +SELECT company, tdate, price, first_value(price) OVER w, last_value(price) OVER w, + nth_value(tdate, 2) OVER w AS nth_second + FROM stock + WINDOW w AS ( + PARTITION BY company + ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING + INITIAL + PATTERN (START UP+ DOWN+) + DEFINE + START AS TRUE, + UP AS price > PREV(price), + DOWN AS price < PREV(price) +); + +-- omit "START" in DEFINE but it is ok because "START AS TRUE" is +-- implicitly defined. per spec. +SELECT company, tdate, price, first_value(price) OVER w, last_value(price) OVER w, + nth_value(tdate, 2) OVER w AS nth_second + FROM stock + WINDOW w AS ( + PARTITION BY company + ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING + INITIAL + PATTERN (START UP+ DOWN+) + DEFINE + UP AS price > PREV(price), + DOWN AS price < PREV(price) +); + +-- the first row start with less than or equal to 100 +SELECT company, tdate, price, first_value(price) OVER w, last_value(price) OVER w + FROM stock + WINDOW w AS ( + PARTITION BY company + ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING + INITIAL + PATTERN (LOWPRICE UP+ DOWN+) + DEFINE + LOWPRICE AS price <= 100, + UP AS price > PREV(price), + DOWN AS price < PREV(price) +); + +-- second row raises 120% +SELECT company, tdate, price, first_value(price) OVER w, last_value(price) OVER w + FROM stock + WINDOW w AS ( + PARTITION BY company + ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING + INITIAL + PATTERN (LOWPRICE UP+ DOWN+) + DEFINE + LOWPRICE AS price <= 100, + UP AS price > PREV(price) * 1.2, + DOWN AS price < PREV(price) +); + +-- using NEXT +SELECT company, tdate, price, first_value(price) OVER w, last_value(price) OVER w + FROM stock + WINDOW w AS ( + PARTITION BY company + ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING + INITIAL + PATTERN (START UPDOWN) + DEFINE + START AS TRUE, + UPDOWN AS price > PREV(price) AND price > NEXT(price) +); + +-- using AFTER MATCH SKIP TO NEXT ROW +SELECT company, tdate, price, first_value(price) OVER w, last_value(price) OVER w + FROM stock + WINDOW w AS ( + PARTITION BY company + ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING + AFTER MATCH SKIP TO NEXT ROW + INITIAL + PATTERN (START UPDOWN) + DEFINE + START AS TRUE, + UPDOWN AS price > PREV(price) AND price > NEXT(price) +); + +-- match everything + +SELECT company, tdate, price, first_value(price) OVER w, last_value(price) OVER w + FROM stock + WINDOW w AS ( + PARTITION BY company + ORDER BY tdate + ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING + AFTER MATCH SKIP PAST LAST ROW + INITIAL + PATTERN (A+) + DEFINE + A AS TRUE +); + +-- backtracking with reclassification of rows +-- using AFTER MATCH SKIP PAST LAST ROW +SELECT company, tdate, price, first_value(tdate) OVER w, last_value(tdate) OVER w + FROM stock + WINDOW w AS ( + PARTITION BY company + ORDER BY tdate + ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING + AFTER MATCH SKIP PAST LAST ROW + INITIAL + PATTERN (A+ B+) + DEFINE + A AS price > 100, + B AS price > 100 +); + +-- backtracking with reclassification of rows +-- using AFTER MATCH SKIP TO NEXT ROW +SELECT company, tdate, price, first_value(tdate) OVER w, last_value(tdate) OVER w + FROM stock + WINDOW w AS ( + PARTITION BY company + ORDER BY tdate + ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING + AFTER MATCH SKIP TO NEXT ROW + INITIAL + PATTERN (A+ B+) + DEFINE + A AS price > 100, + B AS price > 100 +); + +-- +-- Error cases +-- + +-- row pattern definition variable name must not appear more than once +SELECT company, tdate, price, first_value(price) OVER w, last_value(price) OVER w + FROM stock + WINDOW w AS ( + PARTITION BY company + ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING + ORDER BY tdate + INITIAL + PATTERN (START UP+ DOWN+) + DEFINE + START AS TRUE, + UP AS price > PREV(price), + DOWN AS price < PREV(price), + UP AS price > PREV(price) +); + +-- pattern variable name must appear in DEFINE +SELECT company, tdate, price, first_value(price) OVER w, last_value(price) OVER w + FROM stock + WINDOW w AS ( + PARTITION BY company + ORDER BY tdate + ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING + INITIAL + PATTERN (START UP+ DOWN+ END) + DEFINE + START AS TRUE, + UP AS price > PREV(price), + DOWN AS price < PREV(price) +); + +-- FRAME must start at current row when row patttern recognition is used +SELECT company, tdate, price, first_value(price) OVER w, last_value(price) OVER w + FROM stock + WINDOW w AS ( + PARTITION BY company + ORDER BY tdate + ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING + INITIAL + PATTERN (START UP+ DOWN+) + DEFINE + START AS TRUE, + UP AS price > PREV(price), + DOWN AS price < PREV(price) +); + +-- SEEK is not supported +SELECT company, tdate, price, first_value(price) OVER w, last_value(price) OVER w + FROM stock + WINDOW w AS ( + PARTITION BY company + ORDER BY tdate + ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING + AFTER MATCH SKIP TO NEXT ROW + SEEK + PATTERN (START UP+ DOWN+) + DEFINE + START AS TRUE, + UP AS price > PREV(price), + DOWN AS price < PREV(price) +); -- 2.25.1 From f6f5e8594ae388d7873b91568f3b8fcc5d8db07c Mon Sep 17 00:00:00 2001 From: Tatsuo Ishii <ishii@postgresql.org> Date: Sat, 2 Sep 2023 15:32:49 +0900 Subject: [PATCH v5 7/7] Allow to print raw parse tree. --- src/backend/tcop/postgres.c | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c index e4756f8be2..fc8efa915b 100644 --- a/src/backend/tcop/postgres.c +++ b/src/backend/tcop/postgres.c @@ -653,6 +653,10 @@ pg_parse_query(const char *query_string) } #endif + if (Debug_print_parse) + elog_node_display(LOG, "raw parse tree", raw_parsetree_list, + Debug_pretty_print); + TRACE_POSTGRESQL_QUERY_PARSE_DONE(query_string); return raw_parsetree_list; -- 2.25.1
pgsql-hackers by date: