Thread: Possible bug: SQL function parameter in window frame definition
Hi all,
I noticed this strange behaviour whilst trying to write a function for Postgres 11.5 (PostgreSQL 11.5 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-36), 64-bit) and reduced it to this minimal example. Using a function parameter in the window frame definition seems to be the cause of the error.
create or replace function f(group_size bigint) returns setof int[] as
$$
select array_agg(s) over w
from generate_series(1,10) s
window w as (order by s rows between current row and group_size following)
$$ language sql immutable;
Calling the function without a column list succeeds:
postgres=# select f(3);
f
------------
{1,2,3,4}
{2,3,4,5}
{3,4,5,6}
{4,5,6,7}
{5,6,7,8}
{6,7,8,9}
{7,8,9,10}
{8,9,10}
{9,10}
{10}
(10 rows)
Calling the function with select * fails:
postgres=# select * from f(3);
ERROR: 42704: no value found for parameter 1
LOCATION: ExecEvalParamExtern, execExprInterp.c:2296
Using a plpgsql function with a stringified query works, which is my current workaround:
create or replace function f1(group_size bigint) returns setof int[] as
$$
begin
return query execute format($q$
select array_agg(s) over w as t
from generate_series(1,10) s
window w as (order by s rows between current row and %1$s following)
$q$,group_size);
end;
$$ language plpgsql immutable;
This appears to be a bug to me. If confirmed that this is not some expected behaviour unknown to me I will report this.
Alastair
>>>>> "Alastair" == Alastair McKinley <a.mckinley@analyticsengines.com> writes: Alastair> Hi all, Alastair> I noticed this strange behaviour whilst trying to write a Alastair> function for Postgres 11.5 (PostgreSQL 11.5 on Alastair> x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 Alastair> (Red Hat 4.8.5-36), 64-bit) and reduced it to this minimal Alastair> example. Using a function parameter in the window frame Alastair> definition seems to be the cause of the error. [...] Alastair> This appears to be a bug to me. Yes, it's a bug, related to function inlining (the select f(3); is not inlined and therefore works, but the select * from f(3); is being inlined, but the original Param is somehow making it into the final plan rather than being substituted with its value). Looking into why. -- Andrew (irc:RhodiumToad)
Andrew Gierth <andrew@tao11.riddles.org.uk> writes: > "Alastair" == Alastair McKinley <a.mckinley@analyticsengines.com> writes: > Alastair> This appears to be a bug to me. > Yes, it's a bug, related to function inlining (the select f(3); is not > inlined and therefore works, but the select * from f(3); is being > inlined, but the original Param is somehow making it into the final plan > rather than being substituted with its value). Looking into why. It looks to me that the reason is that query_tree_mutator (likewise query_tree_walker) fails to visit query->windowClause, which is a bug of the first magnitude if we allow those to contain expressions. Not sure how we've missed that up to now. Looking at struct Query, it seems like that's not the only questionable omission. We're also not descending into Node *utilityStmt; /* non-null if commandType == CMD_UTILITY */ List *groupClause; /* a list of SortGroupClause's */ List *groupingSets; /* a list of GroupingSet's if present */ List *distinctClause; /* a list of SortGroupClause's */ List *sortClause; /* a list of SortGroupClause's */ List *rowMarks; /* a list of RowMarkClause's */ Now probably this is never called on utility statements, and maybe there is never a reason for anyone to examine or mutate SortGroupClauses, GroupingSets, or RowMarkClauses, but I'm not sure it's any business of this module to assume that. regards, tom lane
>>>>> "Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes: Tom> It looks to me that the reason is that query_tree_mutator Tom> (likewise query_tree_walker) fails to visit query->windowClause, I noticed this too. I spent some time looking at what might break if that was changed (found two places so far, see attached draft patch). Tom> which is a bug of the first magnitude if we allow those to contain Tom> expressions. Not sure how we've missed that up to now. I suspect because the partition/order by expressions are actually in the targetlist instead (with only SortGroupClause nodes in the windowClause), so only window framing expressions are being missed. Tom> Looking at struct Query, it seems like that's not the only Tom> questionable omission. We're also not descending into Tom> Node *utilityStmt; /* non-null if commandType == CMD_UTILITY */ I assume that utility statements are doing any necessary expression processing themselves... Tom> List *groupClause; /* a list of SortGroupClause's */ There's at least one place that walks this (and the distinct and sort clauses) explicitly (find_expr_references_walker) but most places just aren't interested in SortGroupClause nodes given that the actual expressions are elsewhere. Tom> List *groupingSets; /* a list of GroupingSet's if present */ Likewise, GroupingSet nodes are not any form of expression, they only reference the groupClause entries. Tom> List *distinctClause; /* a list of SortGroupClause's */ Tom> List *sortClause; /* a list of SortGroupClause's */ Same goes as for groupClause. Tom> List *rowMarks; /* a list of RowMarkClause's */ Tom> Now probably this is never called on utility statements, and maybe Tom> there is never a reason for anyone to examine or mutate Tom> SortGroupClauses, GroupingSets, or RowMarkClauses, but I'm not Tom> sure it's any business of this module to assume that. I think the logic that query_tree_walker is specifically there to walk places that might contain _expressions_ is reasonably valid. That said, the fact that we do have one caller that finds it necessary to explicitly walk some of the places that query_tree_walker omits suggests that this decision may have been a mistake. -- Andrew (irc:RhodiumToad) diff --git a/src/backend/catalog/dependency.c b/src/backend/catalog/dependency.c index dd0a7d8dac..2862c47314 100644 --- a/src/backend/catalog/dependency.c +++ b/src/backend/catalog/dependency.c @@ -2217,7 +2217,6 @@ find_expr_references_walker(Node *node, /* query_tree_walker ignores ORDER BY etc, but we need those opers */ find_expr_references_walker((Node *) query->sortClause, context); find_expr_references_walker((Node *) query->groupClause, context); - find_expr_references_walker((Node *) query->windowClause, context); find_expr_references_walker((Node *) query->distinctClause, context); /* Examine substructure of query */ diff --git a/src/backend/nodes/nodeFuncs.c b/src/backend/nodes/nodeFuncs.c index 18bd5ac903..7f485ae29a 100644 --- a/src/backend/nodes/nodeFuncs.c +++ b/src/backend/nodes/nodeFuncs.c @@ -2292,6 +2292,8 @@ query_tree_walker(Query *query, return true; if (walker(query->havingQual, context)) return true; + if (walker(query->windowClause, context)) + return true; if (walker(query->limitOffset, context)) return true; if (walker(query->limitCount, context)) @@ -3151,6 +3153,7 @@ query_tree_mutator(Query *query, MUTATE(query->jointree, query->jointree, FromExpr *); MUTATE(query->setOperations, query->setOperations, Node *); MUTATE(query->havingQual, query->havingQual, Node *); + MUTATE(query->windowClause, query->windowClause, List *); MUTATE(query->limitOffset, query->limitOffset, Node *); MUTATE(query->limitCount, query->limitCount, Node *); if (!(flags & QTW_IGNORE_CTE_SUBQUERIES)) diff --git a/src/backend/parser/parse_collate.c b/src/backend/parser/parse_collate.c index 31a5f702a9..dabd904999 100644 --- a/src/backend/parser/parse_collate.c +++ b/src/backend/parser/parse_collate.c @@ -485,6 +485,7 @@ assign_collations_walker(Node *node, assign_collations_context *context) case T_FromExpr: case T_OnConflictExpr: case T_SortGroupClause: + case T_WindowClause: (void) expression_tree_walker(node, assign_collations_walker, (void *) &loccontext);
Andrew Gierth <andrew@tao11.riddles.org.uk> writes: > "Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes: > Tom> Now probably this is never called on utility statements, and maybe > Tom> there is never a reason for anyone to examine or mutate > Tom> SortGroupClauses, GroupingSets, or RowMarkClauses, but I'm not > Tom> sure it's any business of this module to assume that. > I think the logic that query_tree_walker is specifically there to walk > places that might contain _expressions_ is reasonably valid. That said, > the fact that we do have one caller that finds it necessary to > explicitly walk some of the places that query_tree_walker omits suggests > that this decision may have been a mistake. I'm okay with assuming that these functions aren't used on utility statements (but maybe we should add Assert(query->utilityStmt == NULL)?). I'm a bit uncomfortable with skipping the other lists. Admittedly, there's probably not huge value in examining SortGroupClauses in a vacuum (that is, without knowing which list they appear in). The only application I can think of offhand is extracting dependencies, which is already covered by that one caller you mention. However, we need to fix this in all active branches, and I definitely agree with minimizing the amount of change to back branches. The fact that the minimal change breaks (or exposes an oversight in) assign_collations_walker makes it very plausible that it will also break somebody's third-party code. If we push the API change further we increase the risk of breaking stuff. That seems OK in HEAD but not in back branches. regards, tom lane
>>>>> "Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes: Tom> However, we need to fix this in all active branches, and I Tom> definitely agree with minimizing the amount of change to back Tom> branches. The fact that the minimal change breaks (or exposes an Tom> oversight in) assign_collations_walker makes it very plausible Tom> that it will also break somebody's third-party code. If we push Tom> the API change further we increase the risk of breaking stuff. Tom> That seems OK in HEAD but not in back branches. We could minimize the chance of breakage in a back-patched fix by having query_tree_walker/mutator iterate the windowClause list itself and invoke the walker only on offset expressions; is it worth it? Walkers that follow the recommended code structure should be unaffected; it only shows up in the collations walker because that treats expressions as the "default" case and tries to explicitly handle all non-expression nodes. -- Andrew (irc:RhodiumToad)
>>>>> "Andrew" == Andrew Gierth <andrew@tao11.riddles.org.uk> writes: Andrew> We could minimize the chance of breakage in a back-patched fix Andrew> by having query_tree_walker/mutator iterate the windowClause Andrew> list itself Here is a draft patch along those lines; the intent of this one is that no existing walker or mutator should need to change (the change to the dependency code is basically cosmetic I believe, just avoids walking some things twice). Also added some tests. -- Andrew (irc:RhodiumToad) diff --git a/src/backend/catalog/dependency.c b/src/backend/catalog/dependency.c index dd0a7d8dac..03582781f6 100644 --- a/src/backend/catalog/dependency.c +++ b/src/backend/catalog/dependency.c @@ -2214,18 +2214,13 @@ find_expr_references_walker(Node *node, context->addrs); } - /* query_tree_walker ignores ORDER BY etc, but we need those opers */ - find_expr_references_walker((Node *) query->sortClause, context); - find_expr_references_walker((Node *) query->groupClause, context); - find_expr_references_walker((Node *) query->windowClause, context); - find_expr_references_walker((Node *) query->distinctClause, context); - /* Examine substructure of query */ context->rtables = lcons(query->rtable, context->rtables); result = query_tree_walker(query, find_expr_references_walker, (void *) context, - QTW_IGNORE_JOINALIASES); + QTW_IGNORE_JOINALIASES | + QTW_EXAMINE_SORTGROUP); context->rtables = list_delete_first(context->rtables); return result; } diff --git a/src/backend/nodes/nodeFuncs.c b/src/backend/nodes/nodeFuncs.c index 18bd5ac903..d063bee271 100644 --- a/src/backend/nodes/nodeFuncs.c +++ b/src/backend/nodes/nodeFuncs.c @@ -2296,6 +2296,33 @@ query_tree_walker(Query *query, return true; if (walker(query->limitCount, context)) return true; + if ((flags & QTW_EXAMINE_SORTGROUP)) + { + if (walker((Node *) query->groupClause, context)) + return true; + if (walker((Node *) query->windowClause, context)) + return true; + if (walker((Node *) query->sortClause, context)) + return true; + if (walker((Node *) query->distinctClause, context)) + return true; + } + else + { + /* + * We need to walk the expressions in WindowClause nodes even if we're + * not interested in SortGroupClause nodes. + */ + ListCell *lc; + foreach(lc, query->windowClause) + { + WindowClause *wc = lfirst_node(WindowClause, lc); + if (walker(wc->startOffset, context)) + return true; + if (walker(wc->endOffset, context)) + return true; + } + } if (!(flags & QTW_IGNORE_CTE_SUBQUERIES)) { if (walker((Node *) query->cteList, context)) @@ -3153,6 +3180,38 @@ query_tree_mutator(Query *query, MUTATE(query->havingQual, query->havingQual, Node *); MUTATE(query->limitOffset, query->limitOffset, Node *); MUTATE(query->limitCount, query->limitCount, Node *); + + if ((flags & QTW_EXAMINE_SORTGROUP)) + { + MUTATE(query->groupClause, query->groupClause, List *); + MUTATE(query->windowClause, query->windowClause, List *); + MUTATE(query->sortClause, query->sortClause, List *); + MUTATE(query->distinctClause, query->distinctClause, List *); + } + else + { + /* + * We need to mutate the expressions in WindowClause nodes even if + * we're not interested in SortGroupClause nodes. + */ + List *resultlist; + ListCell *temp; + + resultlist = NIL; + foreach(temp, query->windowClause) + { + WindowClause *wc = lfirst_node(WindowClause, temp); + WindowClause *newnode; + + FLATCOPY(newnode, wc, WindowClause); + MUTATE(newnode->startOffset, wc->startOffset, Node *); + MUTATE(newnode->endOffset, wc->endOffset, Node *); + + resultlist = lappend(resultlist, (Node *) newnode); + } + query->windowClause = resultlist; + } + if (!(flags & QTW_IGNORE_CTE_SUBQUERIES)) MUTATE(query->cteList, query->cteList, List *); else /* else copy CTE list as-is */ diff --git a/src/include/nodes/nodeFuncs.h b/src/include/nodes/nodeFuncs.h index 0cb931c82c..4b5408fa9b 100644 --- a/src/include/nodes/nodeFuncs.h +++ b/src/include/nodes/nodeFuncs.h @@ -27,6 +27,7 @@ #define QTW_EXAMINE_RTES_AFTER 0x20 /* examine RTE nodes after their * contents */ #define QTW_DONT_COPY_QUERY 0x40 /* do not copy top Query */ +#define QTW_EXAMINE_SORTGROUP 0x80 /* include SortGroupNode lists */ /* callback function for check_functions_in_node */ typedef bool (*check_function_callback) (Oid func_id, void *context); diff --git a/src/test/regress/expected/window.out b/src/test/regress/expected/window.out index edc93d5729..d5fd4045f9 100644 --- a/src/test/regress/expected/window.out +++ b/src/test/regress/expected/window.out @@ -3821,3 +3821,45 @@ SELECT i, b, bool_and(b) OVER w, bool_or(b) OVER w 5 | t | t | t (5 rows) +-- Tests for problems with failure to walk or mutate expressions +-- within window frame clauses. +-- test walker (fails with collation error if expressions are not walked) +SELECT array_agg(i) OVER w + FROM generate_series(1,5) i +WINDOW w AS (ORDER BY i ROWS BETWEEN (('foo' < 'foobar')::integer) PRECEDING AND CURRENT ROW); + array_agg +----------- + {1} + {1,2} + {2,3} + {3,4} + {4,5} +(5 rows) + +-- test mutator (fails when inlined if expressions are not mutated) +CREATE FUNCTION pg_temp.f(group_size BIGINT) RETURNS SETOF integer[] +AS $$ + SELECT array_agg(s) OVER w + FROM generate_series(1,5) s + WINDOW w AS (ORDER BY s ROWS BETWEEN CURRENT ROW AND GROUP_SIZE FOLLOWING) +$$ LANGUAGE SQL STABLE; +EXPLAIN (costs off) SELECT * FROM pg_temp.f(2); + QUERY PLAN +------------------------------------------------------ + Subquery Scan on f + -> WindowAgg + -> Sort + Sort Key: s.s + -> Function Scan on generate_series s +(5 rows) + +SELECT * FROM pg_temp.f(2); + f +--------- + {1,2,3} + {2,3,4} + {3,4,5} + {4,5} + {5} +(5 rows) + diff --git a/src/test/regress/sql/window.sql b/src/test/regress/sql/window.sql index fc6d4cc903..fe273aa31e 100644 --- a/src/test/regress/sql/window.sql +++ b/src/test/regress/sql/window.sql @@ -1257,3 +1257,22 @@ SELECT to_char(SUM(n::float8) OVER (ORDER BY i ROWS BETWEEN CURRENT ROW AND 1 FO SELECT i, b, bool_and(b) OVER w, bool_or(b) OVER w FROM (VALUES (1,true), (2,true), (3,false), (4,false), (5,true)) v(i,b) WINDOW w AS (ORDER BY i ROWS BETWEEN CURRENT ROW AND 1 FOLLOWING); + +-- Tests for problems with failure to walk or mutate expressions +-- within window frame clauses. + +-- test walker (fails with collation error if expressions are not walked) +SELECT array_agg(i) OVER w + FROM generate_series(1,5) i +WINDOW w AS (ORDER BY i ROWS BETWEEN (('foo' < 'foobar')::integer) PRECEDING AND CURRENT ROW); + +-- test mutator (fails when inlined if expressions are not mutated) +CREATE FUNCTION pg_temp.f(group_size BIGINT) RETURNS SETOF integer[] +AS $$ + SELECT array_agg(s) OVER w + FROM generate_series(1,5) s + WINDOW w AS (ORDER BY s ROWS BETWEEN CURRENT ROW AND GROUP_SIZE FOLLOWING) +$$ LANGUAGE SQL STABLE; + +EXPLAIN (costs off) SELECT * FROM pg_temp.f(2); +SELECT * FROM pg_temp.f(2);
Andrew Gierth <andrew@tao11.riddles.org.uk> writes: > "Andrew" == Andrew Gierth <andrew@tao11.riddles.org.uk> writes: > Andrew> We could minimize the chance of breakage in a back-patched fix > Andrew> by having query_tree_walker/mutator iterate the windowClause > Andrew> list itself > Here is a draft patch along those lines; the intent of this one is that > no existing walker or mutator should need to change (the change to the > dependency code is basically cosmetic I believe, just avoids walking > some things twice). Hmm. I think this is a reasonable direction to go in, but what about groupingSets and rowMarks? Also, in HEAD I'd be inclined to add assertions about utilityStmt being NULL. regards, tom lane
>>>>> "Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes: >> Here is a draft patch along those lines; the intent of this one is >> that no existing walker or mutator should need to change (the change >> to the dependency code is basically cosmetic I believe, just avoids >> walking some things twice). Tom> Hmm. I think this is a reasonable direction to go in, but Tom> what about groupingSets and rowMarks? groupingSets ultimately contains nothing but numbers which are meaningless without reference to the matching groupClause list. So anything that cares about those is really going to have to process them in its Query case in the walker function in order to get at both clauses. Similarly, rowMarks contains indexes into the rangetable (and no recursive substructure at all), so it's likewise better processed at the Query level. Tom> Also, in HEAD I'd be inclined to add assertions about utilityStmt Tom> being NULL. Yup. -- Andrew (irc:RhodiumToad)
Andrew Gierth <andrew@tao11.riddles.org.uk> writes: > "Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes: > Tom> Hmm. I think this is a reasonable direction to go in, but > Tom> what about groupingSets and rowMarks? > groupingSets ultimately contains nothing but numbers which are > meaningless without reference to the matching groupClause list. So > anything that cares about those is really going to have to process them > in its Query case in the walker function in order to get at both > clauses. Ah. I was thinking there were SortGroupClauses under them, but that was based on an overly hasty reading of the parsenodes.h comments. No further complaints. regards, tom lane