*** a/doc/src/sgml/func.sgml --- b/doc/src/sgml/func.sgml *************** *** 10061,10066 **** SELECT count(*) FROM sometable; --- 10061,10297 ---- + + Window Functions + + + Window functions provide facilities of + windowed table calculation, including window aggregate. + + The built-in window functions are listed in + . + All of aggregate functions in and + can be also used in + the windowed table. + + + + General-Purpose Window Functions + + + + + Function + Argument Type + Return Type + Description + + + + + + + + rank + + rank() + + + + bigint + + number of the row from 1 ordered by sort_expressions in the partition, with gaps + + + + + + dense_rank + + dense_rank() + + + + bigint + + number of the row from 1 ordered by sort_expressions in the partition, without gaps + + + + + + row_number() + + row_number() + + + + bigint + + number of the row from 1 ordered by sort_expressions in the partition, incrementing always + + + + + + percent_rank() + + percent_rank() + + + + double precision + + relative rank of the row between 0 and 1 ordered by sort_expressions in the partition, with gap + + + + + + cume_dist() + + cume_dist() + + + + double precision + + relative rank of the row between 0 and 1 ordered by sort_expressions in the partition, without gap + + + + + + ntile() + + ntile(integer) + + + integer + + + integer + + integer value ranging from 1 to the argument integer, equally dividing the partition + + + + + + lag + + + lag(value, offset, [default]) + + + + any, integer[, any] + + + same type as value + + + returns value evaluated on + the row that is offset number of + rows before the current row within the partition. If default is specified and the target row is out + of the partition, default is returned. + + + + + + + lead + + + lead(value, offset, [default]) + + + + any, integer[, any] + + + same type as value + + + returns value evaluated on + the row that is offset number of + rows after the current row within the partition. If default is specified and the target row is out + of the partition, default is returned. + + + + + + + first_value + + first_value(value) + + + any + + + same type as value + + + returns value evaluated on the row + that is the first row of the frame + + + + + + + last_value + + last_value(value) + + + any + + + same type as value + + + returns value evaluated on the row + that is the last row of the frame + + + + + + + nth_value + + + nth_value(value, nth) + + + + any, integer + + + same type as value + + + returns value evaluated on the row + that is the nth row of the + frame + + + + +
+ +
Subquery Expressions *** a/doc/src/sgml/keywords.sgml --- b/doc/src/sgml/keywords.sgml *************** *** 1510,1516 **** EXCLUDE ! non-reserved --- 1510,1516 ---- EXCLUDE ! non-reserved non-reserved *************** *** 2868,2874 **** OTHERS ! non-reserved --- 2868,2874 ---- OTHERS ! non-reserved non-reserved *************** *** 2896,2902 **** OVER ! reserved --- 2896,2902 ---- OVER ! reserved reserved *************** *** 3015,3021 **** PARTITION ! reserved --- 3015,3021 ---- PARTITION ! non-reserved reserved *************** *** 3106,3112 **** PRECEDING ! non-reserved --- 3106,3112 ---- PRECEDING ! reserved non-reserved *************** *** 3204,3210 **** RANGE ! reserved --- 3204,3210 ---- RANGE ! reserved reserved *************** *** 4128,4134 **** TIES ! non-reserved --- 4128,4134 ---- TIES ! non-reserved non-reserved *************** *** 4317,4323 **** UNBOUNDED ! non-reserved --- 4317,4323 ---- UNBOUNDED ! non-reserved non-reserved *************** *** 4590,4596 **** WINDOW ! reserved --- 4590,4596 ---- WINDOW ! reserved reserved *** a/doc/src/sgml/queries.sgml --- b/doc/src/sgml/queries.sgml *************** *** 950,955 **** SELECT product_id, p.name, (sum(s.units) * (p.price - p.cost)) AS profit --- 950,1017 ---- to be the same in all parts of the query. + + + The <literal>WINDOW</literal> Clauses + + + WINDOW + + + + windowing + + + + After WHERE and GROUP BY process, + rows might be windowed table, using the WINDOW + clause. + + + + SELECT function_call OVER + (window_specification) + FROM ... + + SELECT function_call OVER window_name + FROM ... + WHERE ... + GROUP BY ... + HAVING ... + WINDOW window_name AS (window_specification) + + + + where window_specification is given as: + + + + PARTITION BY partition_expression ORDER BY sort_expression + + + + The WINDOW clause specifies what kind of window can be used in the + . Since window + functions can specify its own window with OVER clause, no WINDOW + clause may be needed even if the SELECT list has window functions. + But when more than one function specify the same window specification, + the WINDOW clause saves complexed SQL command. + + + + Windowing operations are sometimes done with sort operations. If you + need exact ordered result as the final output, you must specify + ORDER BY clause. Also, it is better that you don't + assume any ordered result based on the subquery or physical table + placement. + + + + If there is window_name that isn't referred by + any window functions in the WINDOW clause, the + window_specification is only ignored. + + *** a/doc/src/sgml/query.sgml --- b/doc/src/sgml/query.sgml *************** *** 805,810 **** SELECT city, max(temp_lo) --- 805,970 ---- + + Window Functions + + + A window function is the operation across a set of rows in + a windowed table. This may sound similar to aggregate functions, + but contrast to that window functions don't reduce rows. + Additionally, window functions can compute different results + row by row. + + + + Here is an example that shows how to compare each employee's salary + with the department average salary. + + + SELECT depname, empno, salary, avg(salary) OVER (PARTITION BY depname) FROM empsalary; + + + + depname | empno | salary | avg + -----------+-------+--------+----------------------- + develop | 11 | 5200 | 5020.0000000000000000 + develop | 7 | 4200 | 5020.0000000000000000 + develop | 9 | 4500 | 5020.0000000000000000 + develop | 8 | 6000 | 5020.0000000000000000 + develop | 10 | 5200 | 5020.0000000000000000 + personnel | 5 | 3500 | 3700.0000000000000000 + personnel | 2 | 3900 | 3700.0000000000000000 + sales | 3 | 4800 | 4866.6666666666666667 + sales | 1 | 5000 | 4866.6666666666666667 + sales | 4 | 4800 | 4866.6666666666666667 + (10 rows) + + + avg works exact same as the aggregate functions + except it doesn't reduce rows and returns same result within the + same depname. Without reducing rows, + it is possible to compare the original salary + with each department's average salary. + + + + Another expample shows different capability of window functions + from above. + + + SELECT depname, empno, salary, rank() OVER (PARTITION BY depname ORDER BY salary DESC) FROM empsalary; + + + + depname | empno | salary | rank + -----------+-------+--------+------ + develop | 8 | 6000 | 1 + develop | 10 | 5200 | 2 + develop | 11 | 5200 | 2 + develop | 9 | 4500 | 4 + develop | 7 | 4200 | 5 + personnel | 2 | 3900 | 1 + personnel | 5 | 3500 | 2 + sales | 1 | 5000 | 1 + sales | 4 | 4800 | 2 + sales | 3 | 4800 | 2 + (10 rows) + + + rank returns offset position of the row when + the rows in the partition are ordered by the specified value. In this case, + in each department each employee's salary rank with gap is shown. + + + + In addition to partition there is another concept called + frame as a set of rows. Currently frame can + only be specified by ORDER BY clause. + + + + SELECT salary, sum(salary) OVER() FROM empsalary; + + + + salary | sum + --------+------- + 5200 | 47100 + 5000 | 47100 + 3500 | 47100 + 4800 | 47100 + 3900 | 47100 + 4200 | 47100 + 4500 | 47100 + 4800 | 47100 + 6000 | 47100 + 5200 | 47100 + (10 rows) + + + + The example above has a frame that contains all rows of + the partition. Since window aggregate functions collects + all rows contained in the frame, the results of each + row is sum of the all salary. + + + + SELECT salary, sum(salary) OVER(ORDER BY salary) FROM empsalary; + + + + salary | sum + --------+------- + 3500 | 3500 + 3900 | 7400 + 4200 | 11600 + 4500 | 16100 + 4800 | 20900 + 4800 | 25700 + 5000 | 30700 + 5200 | 35900 + 5200 | 41100 + 6000 | 47100 + (10 rows) + + + + Contrast to that example, this one specifies ORDER BY explicitly, + and there is a frame including rows that is returned between + the first row and the current row. So the results of the sum + is cumulative total from the first row. This is because the frame + contains not all the rows in the partition. + + + + Window functions are put in the SELECT list. + It is forbidden anywhere else such as GROUP BY, + HAVING and WHERE clauses. + The arguments passed to window functions and the expressions in + PARTITION BY and ORDER BY + of a window definition can be the results of aggregate functions. + But window functions may not be placed as aggregate functions' + arguments. All of window functions are evaluated after aggregate. + + + + In a query, windows can be defined as many as needed. The order + of the evaluation for each window is implicitly determined by + backend, which means there is no way to predict its order. + + + + The same window definitions can be named and put togather into one + definition using . + + + SELECT sum(salary) OVER w, avg(salary) OVER w FROM empsalary WINDOW w AS (PARTITION BY depname); + + + The two of functions are evaluated in the same window. + + Updates *** a/doc/src/sgml/ref/select.sgml --- b/doc/src/sgml/ref/select.sgml *************** *** 27,32 **** SELECT [ ALL | DISTINCT [ ON ( expressioncondition ] [ GROUP BY expression [, ...] ] [ HAVING condition [, ...] ] + [ WINDOW window_name AS ( window_definition ) [, ...] ] [ { UNION | INTERSECT | EXCEPT } [ ALL ] select ] [ ORDER BY expression [ ASC | DESC | USING operator ] [ NULLS { FIRST | LAST } ] [, ...] ] [ LIMIT { count | ALL } ] *************** *** 552,557 **** HAVING condition --- 553,590 ---- + + <literal>WINDOW</literal> Clause + + + The optional WINDOW clause has the general form + + WINDOW window_name AS (window_definition) [, ...] + + where window_name is + the window name that is referred from windowed functions, and + window_definition is described as follows: + + + [ PARTITION BY expression [, ...] ] + [ ORDER BY expression [ ASC | DESC | USING operator ] [ NULLS { FIRST | LAST } ] [, ...] ] + + where the first expression is the same as + one in GROUP BY and the second is the same as + one in ORDER BY of the SELECT. + Both of these can be omitted. + + + + When the window functions with OVER clause in the + SELECT list refers to a window name, WINDOW + clause must describe what the windowed table is like. If some of the + window names are not referred from any of the window function calls, + they will be just ignored. In the window definition the + expressions are allowed to be aggregated. + + + <command>SELECT</command> List *** a/doc/src/sgml/syntax.sgml --- b/doc/src/sgml/syntax.sgml *************** *** 1539,1544 **** sqrt(2) --- 1539,1596 ---- + + Windowed Tables + + + A windowed table is a table with one or more windows. + A window is a transient set of rows split within a table + described by a window definition. The windowed table allows + to process values across multiple rows within the window. + + + + The windowed table comes with window function calls. + + + function_call ([arg]) OVER (PARTITION BY partition_column [ , ... ] ORDER BY order_column [ , ... ]) + function_call ([arg]) OVER window_name + + + where arg, + partition_column and + order_column are normal target + list such as table column and subquery result as well as + the result of GROUP BY process. Either + PARTITION clause or + ORDER clause must be specified in a + window definition. The window_name + in the second form indicates use of a window that is defined + later WINDOW clause. In the windowed tables + any of aggregate function can be called. Additionally predefined + window functions may be used to calculate rank values. + + + + The predefined ranking window functions are described in + . Currently, window functions + cannot be defined by the user. + + + + Windowed functions doesn't accept DISTINCT and ALL syntax, even though + the function is an aggregate function. + + + + Windowed functions are not placed in any of GROUP BY, HAVING and + WHERE clauses, which process values before any of the windows. If + there is need to qualify rows by the result of windowed functions, + whole of the query must be nested and append WHERE clause outer of + the current query. + + + Type Casts *** a/src/backend/catalog/pg_aggregate.c --- b/src/backend/catalog/pg_aggregate.c *************** *** 294,299 **** lookup_agg_function(List *fnName, --- 294,301 ---- Oid fnOid; bool retset; int nvargs; + bool isagg; + bool iswfunc; Oid *true_oid_array; FuncDetailCode fdresult; AclResult aclresult; *************** *** 308,313 **** lookup_agg_function(List *fnName, --- 310,316 ---- */ fdresult = func_get_detail(fnName, NIL, nargs, input_types, false, &fnOid, rettype, &retset, &nvargs, + &isagg, &iswfunc, &true_oid_array); /* only valid case is a normal function not returning a set */ *** a/src/backend/catalog/pg_proc.c --- b/src/backend/catalog/pg_proc.c *************** *** 290,295 **** ProcedureCreate(const char *procedureName, --- 290,296 ---- values[Anum_pg_proc_prorows - 1] = Float4GetDatum(prorows); values[Anum_pg_proc_provariadic - 1] = ObjectIdGetDatum(variadicType); values[Anum_pg_proc_proisagg - 1] = BoolGetDatum(isAgg); + values[Anum_pg_proc_proiswfunc - 1] = BoolGetDatum(false); /* temporarily */ values[Anum_pg_proc_prosecdef - 1] = BoolGetDatum(security_definer); values[Anum_pg_proc_proisstrict - 1] = BoolGetDatum(isStrict); values[Anum_pg_proc_proretset - 1] = BoolGetDatum(returnsSet); *** a/src/backend/commands/explain.c --- b/src/backend/commands/explain.c *************** *** 626,631 **** explain_outNode(StringInfo str, --- 626,634 ---- case T_Limit: pname = "Limit"; break; + case T_Window: + pname = "Window"; + break; case T_Hash: pname = "Hash"; break; *************** *** 912,917 **** explain_outNode(StringInfo str, --- 915,922 ---- show_sort_info((SortState *) planstate, str, indent, es); break; + case T_Window: + break; case T_Result: show_upper_qual((List *) ((Result *) plan)->resconstantqual, "One-Time Filter", plan, *** a/src/backend/executor/Makefile --- b/src/backend/executor/Makefile *************** *** 20,26 **** OBJS = execAmi.o execCurrent.o execGrouping.o execJunk.o execMain.o \ nodeHashjoin.o nodeIndexscan.o nodeMaterial.o nodeMergejoin.o \ nodeNestloop.o nodeFunctionscan.o nodeRecursiveunion.o nodeResult.o \ nodeSeqscan.o nodeSetOp.o nodeSort.o nodeUnique.o \ ! nodeValuesscan.o nodeCtescan.o nodeWorktablescan.o \ nodeLimit.o nodeGroup.o nodeSubplan.o nodeSubqueryscan.o nodeTidscan.o \ tstoreReceiver.o spi.o --- 20,26 ---- nodeHashjoin.o nodeIndexscan.o nodeMaterial.o nodeMergejoin.o \ nodeNestloop.o nodeFunctionscan.o nodeRecursiveunion.o nodeResult.o \ nodeSeqscan.o nodeSetOp.o nodeSort.o nodeUnique.o \ ! nodeValuesscan.o nodeCtescan.o nodeWindow.o nodeWorktablescan.o \ nodeLimit.o nodeGroup.o nodeSubplan.o nodeSubqueryscan.o nodeTidscan.o \ tstoreReceiver.o spi.o *** a/src/backend/executor/execAmi.c --- b/src/backend/executor/execAmi.c *************** *** 42,47 **** --- 42,48 ---- #include "executor/nodeValuesscan.h" #include "executor/nodeCtescan.h" #include "executor/nodeWorktablescan.h" + #include "executor/nodeWindow.h" #include "nodes/nodeFuncs.h" #include "utils/syscache.h" *************** *** 226,231 **** ExecReScan(PlanState *node, ExprContext *exprCtxt) --- 227,236 ---- ExecReScanLimit((LimitState *) node, exprCtxt); break; + case T_WindowState: + ExecReScanWindow((WindowState *) node, exprCtxt); + break; + default: elog(ERROR, "unrecognized node type: %d", (int) nodeTag(node)); break; *** a/src/backend/executor/execGrouping.c --- b/src/backend/executor/execGrouping.c *************** *** 16,21 **** --- 16,22 ---- #include "executor/executor.h" #include "parser/parse_oper.h" + #include "utils/builtins.h" #include "utils/lsyscache.h" #include "utils/memutils.h" #include "utils/syscache.h" *************** *** 565,567 **** TupleHashTableMatch(const void *key1, const void *key2, Size keysize) --- 566,589 ---- else return 1; } + + /* + * GetAggInitVal + * return datum represented by textInitVal. + */ + Datum + GetAggInitVal(Datum textInitVal, Oid transtype) + { + Oid typinput, + typioparam; + char *strInitVal; + Datum initVal; + + getTypeInputInfo(transtype, &typinput, &typioparam); + strInitVal = TextDatumGetCString(textInitVal); + initVal = OidInputFunctionCall(typinput, strInitVal, + typioparam, -1); + pfree(strInitVal); + return initVal; + } + *** a/src/backend/executor/execProcnode.c --- b/src/backend/executor/execProcnode.c *************** *** 106,111 **** --- 106,112 ---- #include "executor/nodeValuesscan.h" #include "executor/nodeCtescan.h" #include "executor/nodeWorktablescan.h" + #include "executor/nodeWindow.h" #include "miscadmin.h" *************** *** 280,285 **** ExecInitNode(Plan *node, EState *estate, int eflags) --- 281,291 ---- estate, eflags); break; + case T_Window: + result = (PlanState *) ExecInitWindow((Window *) node, + estate, eflags); + break; + default: elog(ERROR, "unrecognized node type: %d", (int) nodeTag(node)); result = NULL; /* keep compiler quiet */ *************** *** 441,446 **** ExecProcNode(PlanState *node) --- 447,456 ---- result = ExecLimit((LimitState *) node); break; + case T_WindowState: + result = ExecWindow((WindowState *) node); + break; + default: elog(ERROR, "unrecognized node type: %d", (int) nodeTag(node)); result = NULL; *************** *** 613,618 **** ExecCountSlotsNode(Plan *node) --- 623,632 ---- case T_Limit: return ExecCountSlotsLimit((Limit *) node); + case T_Window: + return ExecCountSlotsWindow((Window *) node); + break; + default: elog(ERROR, "unrecognized node type: %d", (int) nodeTag(node)); break; *************** *** 765,770 **** ExecEndNode(PlanState *node) --- 779,788 ---- ExecEndLimit((LimitState *) node); break; + case T_WindowState: + ExecEndWindow((WindowState *) node); + break; + default: elog(ERROR, "unrecognized node type: %d", (int) nodeTag(node)); break; *** a/src/backend/executor/execQual.c --- b/src/backend/executor/execQual.c *************** *** 62,67 **** static Datum ExecEvalArrayRef(ArrayRefExprState *astate, --- 62,70 ---- static Datum ExecEvalAggref(AggrefExprState *aggref, ExprContext *econtext, bool *isNull, ExprDoneCond *isDone); + static Datum ExecEvalWFunc(WFuncExprState *wfunc, + ExprContext *econtext, + bool *isNull, ExprDoneCond *isDone); static Datum ExecEvalVar(ExprState *exprstate, ExprContext *econtext, bool *isNull, ExprDoneCond *isDone); static Datum ExecEvalScalarVar(ExprState *exprstate, ExprContext *econtext, *************** *** 444,449 **** ExecEvalAggref(AggrefExprState *aggref, ExprContext *econtext, --- 447,476 ---- } /* ---------------------------------------------------------------- + * ExecEvalWFunc + * + * Returns a Datum whose value is the value of the precomputed + * window function found in the given expression context. + * + * Note: WFunc uses ecxt_aggvalues for stored result + * because Window node never contains Aggref node. + * ---------------------------------------------------------------- + */ + static Datum + ExecEvalWFunc(WFuncExprState *wfunc, ExprContext *econtext, + bool *isNull, ExprDoneCond *isDone) + { + if (isDone) + *isDone = ExprSingleResult; + + if (econtext->ecxt_aggvalues == NULL) /* safety check */ + elog(ERROR, "no aggregates in this expression context"); + + *isNull = econtext->ecxt_aggnulls[wfunc->funcno]; + return econtext->ecxt_aggvalues[wfunc->funcno]; + } + + /* ---------------------------------------------------------------- * ExecEvalVar * * Returns a Datum whose value is the value of a range *************** *** 4140,4145 **** ExecInitExpr(Expr *node, PlanState *parent) --- 4167,4214 ---- state = (ExprState *) astate; } break; + case T_WFunc: + { + WFunc *wfunc = (WFunc *) node; + WFuncExprState *wfstate = makeNode(WFuncExprState); + + wfstate->xprstate.evalfunc = (ExprStateEvalFunc) ExecEvalWFunc; + if (parent && IsA(parent, WindowState)) + { + WindowState *winstate = (WindowState *) parent; + int nfuncs = winstate->numfuncs, + naggs = winstate->numaggs; + + winstate->funcs = lcons(wfstate, winstate->funcs); + nfuncs = ++winstate->numfuncs; + if(wfunc->pure_agg) + naggs = ++winstate->numaggs; + + wfstate->args = (List *) ExecInitExpr((Expr *) wfunc->args, + parent); + /* + * Complain if the wfunc's arguments contain any + * wfuncs; nested agg functions are semantically + * nonsensical. (This should have been caught earlier, + * but we defend against it here anyway.) + */ + if (nfuncs != winstate->numfuncs) + ereport(ERROR, + (errcode(ERRCODE_WINDOWING_ERROR), + errmsg("window function calls cannot be nested"))); + if (naggs != winstate->numaggs) + ereport(ERROR, + (errcode(ERRCODE_WINDOWING_ERROR), + errmsg("window function calls cannot be nested"))); + } + else + { + /* planner messed up */ + elog(ERROR, "wfunc found in non-Window plan node"); + } + state = (ExprState *) wfstate; + } + break; case T_ArrayRef: { ArrayRef *aref = (ArrayRef *) node; *** a/src/backend/executor/nodeAgg.c --- b/src/backend/executor/nodeAgg.c *************** *** 233,239 **** static AggHashEntry lookup_hash_entry(AggState *aggstate, static TupleTableSlot *agg_retrieve_direct(AggState *aggstate); static void agg_fill_hash_table(AggState *aggstate); static TupleTableSlot *agg_retrieve_hash_table(AggState *aggstate); - static Datum GetAggInitVal(Datum textInitVal, Oid transtype); /* --- 233,238 ---- *************** *** 1586,1607 **** ExecInitAgg(Agg *node, EState *estate, int eflags) return aggstate; } - static Datum - GetAggInitVal(Datum textInitVal, Oid transtype) - { - Oid typinput, - typioparam; - char *strInitVal; - Datum initVal; - - getTypeInputInfo(transtype, &typinput, &typioparam); - strInitVal = TextDatumGetCString(textInitVal); - initVal = OidInputFunctionCall(typinput, strInitVal, - typioparam, -1); - pfree(strInitVal); - return initVal; - } - int ExecCountSlotsAgg(Agg *node) { --- 1585,1590 ---- *** /dev/null --- b/src/backend/executor/nodeWindow.c *************** *** 0 **** --- 1,2396 ---- + /*------------------------------------------------------------------------- + * + * nodeWindow.c + * + * Portions Copyright (c) 1996-2008, PostgreSQL Global Development Group + * Portions Copyright (c) 1994, Regents of the University of California + * + * Window node evaluates only WFunc expression. Any other expressions are + * retrieved from the outer plan as Var or Const node. This is based on + * the fact window functions may evaluate those nodes more than once. + * The node collects rows into its WindowObject based on its row buffering + * strategy, which is classified into three; Row, Frame and Partition. + * Strategy requirement depends on types of desparate window functions. + * Each function declares somehow (currently macro in .c source code) which + * buffering is needed, decided by the usage of window function APIs. + * If a function chooses Row buffering, it is allowed to use WinRowXXX APIs + * and if Frame buffering, allowed to use WinRowXXX and WinFrameXXX, and if + * Partition buffering allowed to use WinRowXXX, WinFrameXXX and WinPartXXX. + * This mechanism avoids to store unused rows, which would be very expensive + * if the number of buffered rows got huge. + * + * The Row and Partition bufferings is relatively easy to operate, whereas + * Frame is hard because it may move within the partition. If the frame + * cuts off some preceding rows we call it "to shrink", and if the frame + * feeds in some following rows we call it "to extend". A moving frame + * moves shrinking and extending rows when the current row advances. + * + * A window function is defined as a function maked as wfunc in pg_proc. By this + * mark, it means the function can handle window function APIs that allow + * it to access arbitrary random rows within the window. + * + * Window node can aggregate function as well, treating it as special case. + * The aggregated result is cached in a WindowStatePerAgg struct and is + * recycled if the frame wasn't either shrinked nor extended. If not shrinked + * but extended, the unprocessed rows are passed to trans function and the result + * would be finalized again. If not shrinked and not extended, the result is + * reused without additional trans/final function calls. If shrinked, the cached + * result values cannot be used so the node initializes state and + * aggregate values from the head of the frame again. This is not efficient + * so the aggregate function can be a window function, which can subtract + * values from the shrinking rows for the next execution, in order to avoid + * whole the aggregate process is processed again. fcinfo->context will be + * a WindowState instead of AggState if the aggregate function is called as + * a window function. + * + * Currently Window node assume its input is sorted appropriately so it doesn't + * care sort operation. It may be optimized by HashTable or something, which + * is regarded as advanced challenges so we don't touch it for the first version. + * + * Note that the solid concept of the window functions is "they can access + * arbitrary rows within the frame as they want". As far as we keep the rule + * of thumb, any kind of optimization is allowed. + * + * IDENTIFICATION + * $Id$ + * + *------------------------------------------------------------------------- + */ + + #include "postgres.h" + + #include "catalog/pg_aggregate.h" + #include "catalog/pg_proc.h" + #include "catalog/pg_type.h" + #include "executor/executor.h" + #include "executor/nodeWindow.h" + #include "miscadmin.h" + #include "nodes/nodeFuncs.h" + #include "optimizer/clauses.h" + #include "parser/parse_agg.h" + #include "parser/parse_coerce.h" + #include "parser/parse_expr.h" + #include "parser/parse_oper.h" + #include "utils/acl.h" + #include "utils/builtins.h" + #include "utils/lsyscache.h" + #include "utils/memutils.h" + #include "utils/syscache.h" + #include "utils/tuplestore.h" + #include "utils/datum.h" + + typedef struct WindowStatePerAggData + { + /* number of input arguments for aggregate */ + int numArguments; + + /* Oids of transfer functions */ + Oid transfn_oid; + Oid finalfn_oid; /* may be InvalidOid */ + + /* + * fmgr lookup data for transfer functions --- only valid when + * corresponding oid is not InvalidOid. Note in particular that fn_strict + * flags are kept here. + */ + FmgrInfo transfn; + FmgrInfo finalfn; + + /* + * initial value from pg_aggregate entry + */ + Datum initValue; + bool initValueIsNull; + + /* + * cached value for non-moving frame + */ + Datum resultValue; + bool resultValueIsNull; + bool hasResult; + + /* + * We need the len and byval info for the agg's input, result, and + * transition data types in order to know how to copy/delete values. + */ + int16 inputtypeLen, + resulttypeLen, + transtypeLen; + bool inputtypeByVal, + resulttypeByVal, + transtypeByVal; + + /* point to perfuncstate */ + int funcno; + } WindowStatePerAggData; + + typedef struct WindowStatePerGroupData + { + Datum transValue; /* current transition value */ + bool transValueIsNull; + + bool noTransValue; /* true if transValue not set yet */ + bool doneResult; /* true if cached result has been returned */ + } WindowStatePerGroupData; + + typedef struct WindowStatePerFuncData + { + /* Links to WFunc expr and state nodes this working state is for */ + WFuncExprState *wfuncstate; + WFunc *wfunc; + + /* number of input arguments for aggregate */ + int numArguments; + + /* + * fmgr lookup data for transfer functions --- only valid when + * corresponding oid is not InvalidOid. Note in particular that fn_strict + * flags are kept here. + */ + FmgrInfo flinfo; + + /* + * We need the len and byval info for the result of each function + * in order to know how to copy/delete values. + */ + int16 resulttypeLen; + bool resulttypeByVal; + + /* not window supported (normal aggregate) function? */ + bool pure_agg; + int aggno; + } WindowStatePerFuncData; + + /* + * WindowObject is the core data that represents the window. + * The window may have frame and partition but not always, it depends + * on the Buffering Strategy. All the window function APIs are + * called with this object. + */ + typedef struct WindowObjectData + { + WindowState *winstate; /* parent WindowState */ + ExprContext *econtext; /* window expression context as like window node */ + Tuplestorestate *buffer; /* rows storage */ + int strategy; /* buffering strategy */ + HeapTuple currentrow; /* actual data of the current row */ + + /* + * currentptr is actually pointing the next row to the current. + */ + int currentptr; + + /* + * while iterator is executing, any seeking operation is forbidden, since + * iterator omits its seeking by seq scanning buffer without resetting + * buffer read pointers. + */ + bool iterating; + + /* + * frame informations + */ + int64 f_currentpos; /* relative position of the current row in the frame */ + int f_headptr; /* read pointer of the head row */ + int f_tailptr; /* read pointer of the tail row */ + int64 f_rownum; /* total row number of the frame */ + int f_shrinking; /* frame's shrinking row number until the next frame */ + bool f_shrinked; /* frame was shrinked since the last time */ + int f_extended; /* frame's extended row number since the last time */ + + int64 p_currentpos; /* relative position of the current row in the partition */ + int p_headptr; /* read pointer of the head row */ + int p_tailptr; /* read pointer of the tail row */ + int64 p_rownum; /* total row number of the frame */ + } WindowObjectData; + + /* + * WindowIter is an iterator to scan rows in the frame or + * the partition. Since WinFrameGetArg/WinPartGetArg or etc. + * is too heavy to call many times, iterator makes it easy. + * During the iterator scans rows, WindowObject related to + * this iterator is locked and cannot seek by itself. First + * finish the iterator scan then WindowObject can seek rows again. + */ + typedef struct WindowIterData + { + WindowObject winobj; /* the frame which this itrator came from */ + int type; /* FRAME or PARTITION */ + int position; /* current position within the window */ + bool finished; /* has finished? */ + } WindowIterData; + + #define WINDOW_CHECK_STRATEGY(winobj, req) do{\ + if ((winobj)->strategy < (req))\ + elog(ERROR, "window function API violation");\ + } while(0) + + #define WINDOW_TYPE_ROW 1 + #define WINDOW_TYPE_FRAME 2 + #define WINDOW_TYPE_PARTITION 3 + + static void initialize_windowaggregate(WindowState *winstate, + WindowStatePerFunc perfuncstate, WindowStatePerAgg peraggstate, + WindowStatePerGroup pergroupstate); + static void advance_windowaggregate(WindowState *winstate, + WindowIterData *iter, WindowStatePerFunc perfuncstate, + WindowStatePerAgg peraggstate, WindowStatePerGroup pergroupstate); + static void finalize_windowaggregate(WindowState *winstate, + WindowStatePerFunc perfuncstate, WindowStatePerAgg peraggstate, + WindowStatePerGroup pergroupstate, Datum *result, bool *isnull); + + static void eval_windowaggregate(WindowState *winstate, WindowObject winobj); + static void eval_windowfunction(WindowState *winstate, WindowObject winobj, + WindowStatePerFunc perfuncstate, Datum *result, bool *isnull); + + static WindowObject init_window_object(WindowState *winstate); + static void advance_row(WindowState *winstate); + static void start_frame(WindowState *winstate); + static void finish_frame(WindowState *winstate); + static void store_partition(WindowState *winstate); + static void release_partition(WindowState *winstate); + + static WindowStatePerAggData *initialize_peragg(WindowState *winstate, WFunc *wfunc, + WindowStatePerAgg peraggstate); + static bool is_peer(WindowState *winstate, TupleTableSlot *slot1, TupleTableSlot *slot2); + static bool window_seek(WindowObject winobj, int seek_pos, int seek_type, int window_type); + + /* + * initialize_windowaggregate + * parallel to initialize_aggregate in nodeAgg.c + */ + static void + initialize_windowaggregate(WindowState *winstate, + WindowStatePerFunc perfuncstate, + WindowStatePerAgg peraggstate, + WindowStatePerGroup pergroupstate) + { + MemoryContext oldContext; + + if (peraggstate->initValueIsNull) + pergroupstate->transValue = peraggstate->initValue; + else + { + oldContext = MemoryContextSwitchTo(winstate->wincontext); + pergroupstate->transValue = datumCopy(peraggstate->initValue, + peraggstate->transtypeByVal, + peraggstate->transtypeLen); + MemoryContextSwitchTo(oldContext); + } + pergroupstate->transValueIsNull = peraggstate->initValueIsNull; + pergroupstate->noTransValue = peraggstate->initValueIsNull; + } + + /* + * advance_windowaggregate + * parallel to advance_aggregate in nodeAgg.c + * + * Contrast to nodeAgg.c, arguments of the function are all ExprState + * instead of Datum itself. We convert it to Datum with the current row + * to match trans function compatibility. + */ + static void + advance_windowaggregate(WindowState *winstate, + WindowIterData *iter, + WindowStatePerFunc perfuncstate, + WindowStatePerAgg peraggstate, + WindowStatePerGroup pergroupstate) + { + WFuncExprState *wfuncstate = perfuncstate->wfuncstate; + int numArguments = perfuncstate->numArguments; + FunctionCallInfoData fcinfodata; + FunctionCallInfo fcinfo = &fcinfodata; + Datum newVal; + ListCell *arg; + int i; + MemoryContext oldContext; + + /* We start from 1, since the 0th arg will be the transition value */ + i = 1; + foreach(arg, wfuncstate->args) + { + ExprState *argstate = (ExprState *) lfirst(arg); + bool isnull; + + /* + * we assume the argument expressions are always static nodes. + */ + Assert(IsA(argstate->expr, Var) || IsA(argstate->expr, Const)); + fcinfo->arg[i] = WinIterGetArg(iter, argstate, &isnull); + fcinfo->argnull[i] = isnull; + i++; + } + + if (peraggstate->transfn.fn_strict) + { + /* + * For a strict transfn, nothing happens when there's a NULL input; we + * just keep the prior transValue. + */ + for(i = 1; i <= numArguments; i++) + { + if (fcinfo->argnull[i]) + return; + } + if (pergroupstate->noTransValue) + { + /* + * transValue has not been initialized. This is the first non-NULL + * input value. We use it as the initial value for transValue. (We + * already checked that the agg's input type is binary-compatible + * with its transtype, so straight copy here is OK.) + * + * We must copy the datum into aggcontext if it is pass-by-ref. We + * do not need to pfree the old transValue, since it's NULL. + */ + oldContext = MemoryContextSwitchTo(winstate->wincontext); + pergroupstate->transValue = datumCopy(fcinfo->arg[1], + peraggstate->transtypeByVal, + peraggstate->transtypeLen); + pergroupstate->transValueIsNull = false; + pergroupstate->noTransValue = false; + MemoryContextSwitchTo(oldContext); + return; + } + if (pergroupstate->transValueIsNull) + { + /* + * Don't call a strict function with NULL inputs. Note it is + * possible to get here despite the above tests, if the transfn is + * strict *and* returned a NULL on a prior cycle. If that happens + * we will propagate the NULL all the way to the end. + */ + return; + } + } + + oldContext = MemoryContextSwitchTo(winstate->ss.ps.ps_ExprContext->ecxt_per_tuple_memory); + + /* + * OK to call the transition function + */ + InitFunctionCallInfoData(*fcinfo, &(peraggstate->transfn), + numArguments + 1, + (void *) winstate, NULL); + fcinfo->arg[0] = pergroupstate->transValue; + fcinfo->argnull[0] = pergroupstate->transValueIsNull; + newVal = FunctionCallInvoke(fcinfo); + + /* + * If pass-by-ref datatype, must copy the new value into aggcontext and + * pfree the prior transValue. But if transfn returned a pointer to its + * first input, we don't need to do anything. + */ + if (!peraggstate->transtypeByVal && + DatumGetPointer(newVal) != DatumGetPointer(pergroupstate->transValue)) + { + if (!fcinfo->isnull) + { + MemoryContextSwitchTo(winstate->wincontext); + newVal = datumCopy(newVal, + peraggstate->transtypeByVal, + peraggstate->transtypeLen); + } + if (!pergroupstate->transValueIsNull) + pfree(DatumGetPointer(pergroupstate->transValue)); + } + + MemoryContextSwitchTo(oldContext); + pergroupstate->transValue = newVal; + pergroupstate->transValueIsNull = fcinfo->isnull; + } + + /* + * finalize_windowaggregate + * parallel to finalize_aggregate in nodeAgg.c + */ + static void + finalize_windowaggregate(WindowState *winstate, + WindowStatePerFunc perfuncstate, + WindowStatePerAgg peraggstate, + WindowStatePerGroup pergroupstate, + Datum *result, bool *isnull) + { + MemoryContext oldContext; + + oldContext = MemoryContextSwitchTo(winstate->ss.ps.ps_ExprContext->ecxt_per_tuple_memory); + + /* + * Apply the agg's finalfn if one is provided, else return transValue. + */ + if (OidIsValid(peraggstate->finalfn_oid)) + { + FunctionCallInfoData fcinfo; + + InitFunctionCallInfoData(fcinfo, &(peraggstate->finalfn), 1, + (void *) winstate, NULL); + fcinfo.arg[0] = pergroupstate->transValue; + fcinfo.argnull[0] = pergroupstate->transValueIsNull; + if (fcinfo.flinfo->fn_strict && pergroupstate->transValueIsNull) + { + /* don't call a strict function with NULL inputs */ + *result = (Datum) 0; + *isnull = true; + } + else + { + *result = FunctionCallInvoke(&fcinfo); + *isnull = fcinfo.isnull; + } + } + else + { + *result = pergroupstate->transValue; + *isnull = pergroupstate->transValueIsNull; + } + + /* + * If result is pass-by-ref, make sure it is in the right context. + */ + if (!peraggstate->resulttypeByVal && !*isnull && + !MemoryContextContains(CurrentMemoryContext, + DatumGetPointer(*result))) + *result = datumCopy(*result, + peraggstate->resulttypeByVal, + peraggstate->resulttypeLen); + MemoryContextSwitchTo(oldContext); + + } + + /* + * eval_windowaggregate + * evaluate normal aggregates (proiswfunc is false but proisagg is true). + * + * Many flows and ideas are ported from nodeAgg.c. + * + * If a window frame is moving whole the calculation of trans/final + * functions are executed again, whereas if not moving the only result + * is cached and returned while the frame is valid. + */ + static void + eval_windowaggregate(WindowState *winstate, WindowObject winobj) + { + WindowStatePerAgg peraggstate; + WindowStatePerGroup pergroupstate; + WindowIter iter; + int funcno, numaggs; + int i; + MemoryContext oldContext; + ExprContext *econtext; + bool need_scan; + Datum *result; + bool *isnull; + + numaggs = winstate->numaggs; + /* + * nothing to do. + */ + if(numaggs == 0) + return; + + /* final output execution is on ps_ExprContext */ + econtext = winstate->ss.ps.ps_ExprContext; + + need_scan = true; + for(i = 0; i < numaggs; i++) + { + peraggstate = &winstate->peragg[i]; + pergroupstate = &winstate->pergroup[i]; + pergroupstate->doneResult = false; + + funcno = peraggstate->funcno; + result = &econtext->ecxt_aggvalues[funcno]; + isnull = &econtext->ecxt_aggnulls[funcno]; + + if (!WinFrameShrinked(winobj) && !WinFrameExtended(winobj) && + peraggstate->hasResult) + { + pergroupstate->doneResult = true; + /* return the same value as the previous result */ + *isnull = peraggstate->resultValueIsNull; + if (*isnull) + continue; + + if (!peraggstate->resulttypeByVal && + !peraggstate->resultValueIsNull && + DatumGetPointer(peraggstate->resultValue) != NULL) + { + oldContext = MemoryContextSwitchTo(winstate->wincontext); + *result = datumCopy(peraggstate->resultValue, + peraggstate->resulttypeByVal, + peraggstate->resulttypeLen); + MemoryContextSwitchTo(oldContext); + } + else + *result = peraggstate->resultValue; + + continue; + } + else if (WinFrameShrinked(winobj) || !peraggstate->hasResult) + { + initialize_windowaggregate(winstate, + &winstate->perfunc[funcno], + &winstate->peragg[i], + &winstate->pergroup[i]); + } + /* + * If at least one aggregate needs scan, we go through scan process. + */ + need_scan = true; + } + + /* + * return if all aggs return cached value. + */ + if (!need_scan) + return; + + /* + * Scan frame from the head if shrinked. + * Otherwise, scan only extended rows. + */ + iter = WinFrameStartIter(winobj, + WinFrameShrinked(winobj) ? 0 : + WinFrameGetRowNum(winobj) - WinFrameExtendedNum(winobj)); + while(WinIterNext(iter)) + { + for(i = 0; i < numaggs; i++) + { + if (winstate->pergroup[i].doneResult) + continue; + funcno = winstate->peragg[i].funcno; + advance_windowaggregate(winstate, + iter, + &winstate->perfunc[funcno], + &winstate->peragg[i], + &winstate->pergroup[i]); + } + } + WinIterFinish(iter); + + /* + * finalize aggregates and fill result/isnull fields. + */ + for(i = 0; i < numaggs; i++) + { + peraggstate = &winstate->peragg[i]; + pergroupstate = &winstate->pergroup[i]; + + if (pergroupstate->doneResult) + continue; + funcno = peraggstate->funcno; + result = &econtext->ecxt_aggvalues[funcno]; + isnull = &econtext->ecxt_aggnulls[funcno]; + finalize_windowaggregate(winstate, + &winstate->perfunc[funcno], + peraggstate, pergroupstate, + result, isnull); + + /* + * If the frame is shrinking, we must re-compute + * from the start so cannot reuse the result. + */ + if (WinFrameShrinkingNum(winobj) > 0) + { + peraggstate->hasResult = false; + continue; + } + + /* + * save the result for the next (non-shrinking frame) call. + */ + if (!peraggstate->resulttypeByVal && !*isnull) + { + /* + * clear old resultValue in order not to leak memory. + */ + if (peraggstate->hasResult && + (DatumGetPointer(peraggstate->resultValue) != + DatumGetPointer(*result)) && + !peraggstate->resultValueIsNull) + pfree(DatumGetPointer(peraggstate->resultValue)); + + /* + * If pass-by-ref, copy it into our global context. + */ + oldContext = MemoryContextSwitchTo(winstate->wincontext); + peraggstate->resultValue = datumCopy(*result, + peraggstate->resulttypeByVal, + peraggstate->resulttypeLen); + MemoryContextSwitchTo(oldContext); + } + else + { + peraggstate->resultValue = *result; + } + peraggstate->resultValueIsNull = *isnull; + peraggstate->hasResult = true; + } + } + + /* + * eval_windowfunction + * + * Arguments of window functions are not actual datum but expression node. + * This is based on the fact they can access arbitrary row column specified + * by their arguments. The window API can handle separate row evaluations. + * Since window functions' arguments must be Var or Const by the planner, + * fetching arbitrary separate row is always successful. + */ + static void + eval_windowfunction(WindowState *winstate, WindowObject winobj, + WindowStatePerFunc perfuncstate, Datum *result, bool *isnull) + { + WFuncExprState *wfuncstate = perfuncstate->wfuncstate; + FunctionCallInfoData fcinfo; + int i; + ListCell *arg; + MemoryContext oldContext; + + oldContext = MemoryContextSwitchTo(winstate->ss.ps.ps_ExprContext->ecxt_per_tuple_memory); + InitFunctionCallInfoData(fcinfo, &(perfuncstate->flinfo), + perfuncstate->numArguments, + (void *) winstate, NULL); + + i = 0; + foreach(arg, wfuncstate->args) + { + ExprState *argstate = (ExprState *) lfirst(arg); + + /* + * Window functions' arguments should be Var or Const. + * Any other types operation must have done at the outer + * nodes. + */ + Assert(IsA(argstate->expr, Var) || IsA(argstate->expr, Const)); + fcinfo.arg[i] = PointerGetDatum(argstate); + i++; + } + + *result = FunctionCallInvoke(&fcinfo); + *isnull = fcinfo.isnull; + + /* + * make sure the pass-by-ref datum is allocated in the appropriate context. + */ + if (!perfuncstate->resulttypeByVal && !fcinfo.isnull && + !MemoryContextContains(CurrentMemoryContext, + DatumGetPointer(*result))) + *result = datumCopy(*result, + perfuncstate->resulttypeByVal, + perfuncstate->resulttypeLen); + + MemoryContextSwitchTo(oldContext); + } + + /* + * init_window_object + * initialize WindowObject for each partition. This handles + * only core preparation that doesn't depend on the different + * buffering strategies. + */ + static WindowObject + init_window_object(WindowState *winstate) + { + WindowObject winobj = winstate->winobj; + + MemSet(winobj, 0, sizeof(WindowObjectData)); + + winobj->winstate = winstate; + /* + * window's econtext is tmpcontext. + */ + winobj->econtext = winstate->tmpcontext; + + winobj->strategy = winstate->strategy; + + /* + * set up positions + */ + winobj->f_currentpos = -1; + winobj->p_currentpos = -1; + winobj->iterating = false; + + return winobj; + } + + /* + * advance_row + * used with WINDOW_BUFFER_ROW. + */ + static void + advance_row(WindowState *winstate) + { + WindowObject winobj = winstate->winobj; + Window *node = (Window *) winstate->ss.ps.plan; + PlanState *outerPlan; + ExprContext *econtext; + TupleTableSlot *outerslot; + TupleTableSlot *firstSlot; + + outerPlan = outerPlanState(winstate); + econtext = winstate->ss.ps.ps_ExprContext; + firstSlot = winstate->ss.ss_ScanTupleSlot; + + outerslot = ExecProcNode(outerPlan); + if (!TupIsNull(outerslot)) + { + if (winstate->prt_firstTuple == NULL) + { + /* + * the first row of the first partition + */ + winstate->prt_firstTuple = ExecCopySlotTuple(outerslot); + winobj = init_window_object(winstate); + } + else + { + /* + * shouldFree is false, since + * prt_firstTuple is reused again and again. + */ + ExecStoreTuple(winstate->prt_firstTuple, + firstSlot, + InvalidBuffer, + false); + if (!execTuplesMatch(firstSlot, + outerslot, + node->prtNumCols, node->prtColIdx, + winstate->prtEqfunctions, + econtext->ecxt_per_tuple_memory)) + { + /* + * We free prt_firstTuple here. And it means + * end of the partition, we clear partition-based + * information such like perfunc->flinfo.fn_extra + */ + heap_freetuple(winstate->prt_firstTuple); + winstate->prt_firstTuple = ExecCopySlotTuple(outerslot); + release_partition(winstate); + winobj = init_window_object(winstate); + } + } + /* + * We must have currentrow, for process_current needs it. + * also, p_currentpos is used in WinRowXXX APIs. + */ + winobj->currentrow = ExecCopySlotTuple(outerslot); + winobj->p_currentpos++; + } + else + { + /* + * all rows are processed. let's finish it. + */ + release_partition(winstate); + winstate->win_done = true; + return; + } + } + + /* + * start_frame + */ + static void + start_frame(WindowState *winstate) + { + WindowObject winobj = winstate->winobj; + Window *node = (Window *) winstate->ss.ps.plan; + Tuplestorestate *buffer = winobj->buffer; + TupleTableSlot *slot = winobj->econtext->ecxt_outertuple; + TupleTableSlot *tmpslot = winstate->ss.ss_ScanTupleSlot; + int f_shrinking, f_extended; + + if (!winobj->f_headptr) + { + /* + * first row + */ + winobj->f_headptr = tuplestore_alloc_read_pointer(buffer, winstate->eflags); + switch(node->preceding_type) + { + case FRAME_UNBOUNDED: + f_shrinking = 0; + break; + case FRAME_CURRENT_ROWS: + f_shrinking = 1; + break; + case FRAME_CURRENT_RANGE: + /* UNSUPPORTED */ + elog(ERROR, "unknown preceding type %d", node->preceding_type); + return; /* keep compiler quiet */ + case FRAME_VALUE_ROWS: + if (node->preceding_rows > 0) + f_shrinking = 0; + else + f_shrinking = 1; + break; + case FRAME_VALUE_RANGE: + /* UNSUPPORTED */ + default: + elog(ERROR, "unknown preceding type %d", node->preceding_type); + return; /* keep compiler quiet */ + } + + Assert(!winobj->f_tailptr); + winobj->f_tailptr = tuplestore_alloc_read_pointer(buffer, winstate->eflags); + switch(node->following_type) + { + case FRAME_UNBOUNDED: + f_extended = winobj->p_rownum; + tuplestore_copy_read_pointer(buffer, winobj->p_tailptr, winobj->f_tailptr); + break; + case FRAME_CURRENT_ROWS: + f_extended = 1; + /* f_tailptr == currentptr */ + break; + case FRAME_CURRENT_RANGE: + f_extended = 1; + /* + * read the current row but do not save the pointer, + * for the actual reading is at the end of the function. + */ + tuplestore_copy_read_pointer(buffer, winobj->currentptr, 0); + tuplestore_gettupleslot(buffer, true, slot); + while(true) + { + if (!tuplestore_gettupleslot(buffer, true, tmpslot)) + { + /* + * no more rows. + * remove the eof flag. + */ + if (tuplestore_ateof(buffer)) + tuplestore_advance(buffer, false); + break; + } + + if (!is_peer(winstate, slot, tmpslot)) + { + /* + * tried the next row but failed, + * so go back to the previous. + */ + tuplestore_advance(buffer, false); + break; + } + f_extended++; + } + tuplestore_advance(buffer, false); + tuplestore_copy_read_pointer(buffer, 0, winobj->f_tailptr); + break; + case FRAME_VALUE_ROWS: + { + int i; + + if (winobj->p_rownum < node->following_rows) + f_extended = winobj->p_rownum; + else + f_extended = node->following_rows; + for(i = 0; i < f_extended; i++) + tuplestore_advance(buffer, true); + tuplestore_copy_read_pointer(buffer, 0, winobj->f_tailptr); + } + break; + case FRAME_VALUE_RANGE: + /* UNSUPPORTED */ + default: + elog(ERROR, "unknown following type %d", node->following_type); + return; /* keep compiler quiet */ + } + winobj->f_shrinking = f_shrinking; + winobj->f_extended = f_extended; + /* + * In the first row, rownum of frame = extended rownum of frame. + */ + winobj->f_rownum = f_extended; + } + else + { + /* not first */ + switch(node->preceding_type) + { + case FRAME_UNBOUNDED: + /* never shinks */ + f_shrinking = 0; + break; + case FRAME_CURRENT_ROWS: + /* advance one row */ + f_shrinking = 1; + /* f_headptr will be updated in finish_frame() */ + break; + case FRAME_CURRENT_RANGE: + /* UNSUPPORTED */ + elog(ERROR, "unknown preceding type %d", node->preceding_type); + return; /* keep compiler quiet */ + case FRAME_VALUE_ROWS: + if (node->preceding_rows <= winobj->p_currentpos + 1) + f_shrinking = 1; + else + f_shrinking = 0; + /* f_headptr will be updated in finish_frame() */ + break; + case FRAME_VALUE_RANGE: + /* UNSUPPORTED */ + default: + elog(ERROR, "unknown preceding type %d", node->preceding_type); + return; /* keep compiler quiet */ + } + winobj->f_shrinking = f_shrinking; + + switch(node->following_type) + { + case FRAME_UNBOUNDED: + /* never extends */ + f_extended = 0; + break; + case FRAME_CURRENT_ROWS: + /* advance one row */ + f_extended = 1; + tuplestore_copy_read_pointer(buffer, winobj->currentptr, winobj->f_tailptr); + break; + case FRAME_CURRENT_RANGE: + ExecStoreTuple(winobj->currentrow, + tmpslot, InvalidBuffer, false); + tuplestore_copy_read_pointer(buffer, winobj->currentptr, 0); + tuplestore_gettupleslot(buffer, true, slot); + if (!is_peer(winstate, tmpslot, slot)) + { + f_extended = 1; + while(true) + { + if (!tuplestore_gettupleslot(buffer, true, tmpslot)) + { + /* + * no more rows. + * remove the eof flag. + */ + if (tuplestore_ateof(buffer)) + tuplestore_advance(buffer, false); + break; + } + + if (!is_peer(winstate, slot, tmpslot)) + { + /* + * tried the next row but failed, + * so go back to the previous. + */ + tuplestore_advance(buffer, false); + break; + } + f_extended++; + } + tuplestore_advance(buffer, false); + tuplestore_copy_read_pointer(buffer, 0, winobj->f_tailptr); + } + else + { + /* + * the frame doesn't extend, which means + * the tail pointer stays where it is. + */ + f_extended = 0; + } + break; + case FRAME_VALUE_ROWS: + if (node->following_rows + winobj->p_currentpos < winobj->p_rownum) + f_extended = 1; + else + f_extended = 0; + + if (f_extended > 0) + { + /* update f_tailptr */ + tuplestore_copy_read_pointer(buffer, winobj->f_tailptr, 0); + tuplestore_advance(buffer, true); + tuplestore_copy_read_pointer(buffer, 0, winobj->f_tailptr); + } + break; + case FRAME_VALUE_RANGE: + /* UNSUPPORTED */ + default: + elog(ERROR, "unknown following type %d", node->following_type); + return; /* keep compiler quiet */ + } + winobj->f_extended = f_extended; + winobj->f_rownum += f_extended; + + heap_freetuple(winobj->currentrow); + } + + /* + * advance current row + */ + tuplestore_copy_read_pointer(buffer, winobj->currentptr, 0); + tuplestore_gettupleslot(buffer, true, slot); + tuplestore_copy_read_pointer(buffer, 0, winobj->currentptr); + winobj->f_currentpos++; + winobj->p_currentpos++; + winobj->currentrow = ExecCopySlotTuple(slot); + } + + /* + * finish_frame + */ + static void + finish_frame(WindowState *winstate) + { + WindowObject winobj = winstate->winobj; + Tuplestorestate *buffer = winobj->buffer; + int i, f_shrinking = winobj->f_shrinking; + + if (f_shrinking > 0) + { + tuplestore_copy_read_pointer(buffer, winobj->f_headptr, 0); + for(i = 0; i < f_shrinking; i++) + tuplestore_advance(buffer, true); + tuplestore_copy_read_pointer(buffer, 0, winobj->f_headptr); + /* + * In the next frame, WinFrameShrinked(winobj) = true + */ + winobj->f_shrinked = true; + winobj->f_rownum -= f_shrinking; + } + else + { + winobj->f_shrinked = false; + } + } + + /* + * store_partition + * buffer all rows contained in the current partition. + */ + static void + store_partition(WindowState *winstate) + { + Window *node = (Window *) winstate->ss.ps.plan; + PlanState *outerPlan; + ExprContext *econtext; + TupleTableSlot *outerslot; + TupleTableSlot *firstSlot; + Tuplestorestate *buffer; + WindowObject winobj; + + outerPlan = outerPlanState(winstate); + econtext = winstate->ss.ps.ps_ExprContext; + firstSlot = winstate->ss.ss_ScanTupleSlot; + winobj = init_window_object(winstate); + + /* + * tuplestore needs randomAccess + */ + buffer = tuplestore_begin_heap(true, false, work_mem); + + /* + * allocate CURRENT ROW, head and tail pointers of the partition + */ + winobj->currentptr = tuplestore_alloc_read_pointer(buffer, winstate->eflags); + winobj->p_headptr = tuplestore_alloc_read_pointer(buffer, winstate->eflags); + winobj->p_tailptr = tuplestore_alloc_read_pointer(buffer, winstate->eflags); + winobj->buffer = buffer; + + /* + * see if the partition is the first one? + */ + if (winstate->prt_firstTuple == NULL) + { + outerslot = ExecProcNode(outerPlan); + if (!TupIsNull(outerslot)) + { + winstate->prt_firstTuple = ExecCopySlotTuple(outerslot); + } + else + { + /* nothing will be returned */ + winstate->win_done = true; + finish_frame(winstate); + return; + } + } + + if (winstate->prt_firstTuple != NULL) + { + ExecStoreTuple(winstate->prt_firstTuple, + firstSlot, + InvalidBuffer, + true); + winstate->prt_firstTuple = NULL; + outerslot = firstSlot; + + /* + * store partition rows into window buffer until + * the bottom of the partition. + */ + for(;;) + { + tuplestore_puttupleslot(buffer, outerslot); + winobj->p_rownum++; + + outerslot = ExecProcNode(outerPlan); + if (TupIsNull(outerslot)) + { + /* + * hit the bottom of the last partition. + */ + winstate->win_done = true; + break; + } + + if (!execTuplesMatch(firstSlot, + outerslot, + node->prtNumCols, node->prtColIdx, + winstate->prtEqfunctions, + econtext->ecxt_per_tuple_memory)) + { + /* + * at the end of each partition + * copy the tuple for the next partition cycle. + */ + winstate->prt_firstTuple = ExecCopySlotTuple(outerslot); + break; + } + } + } + + /* + * write pointer - 1 row is the last row of the partition. + */ + tuplestore_write_to_read_pointer(buffer, winobj->p_tailptr); + tuplestore_select_read_pointer(buffer, winobj->p_tailptr); + tuplestore_advance(buffer, false); + tuplestore_select_read_pointer(buffer, 0); + + winstate->prt_processing = true; + } + + /* + * release_partition + * clear information kept within a partition, including + * funcstate/fn_extra, tuplestore and aggregate result. + */ + static void + release_partition(WindowState *winstate) + { + WindowObject winobj = winstate->winobj; + int i; + int numfuncs = winstate->numfuncs; + + for(i = 0; i < numfuncs; i++) + { + WindowStatePerFunc perfuncstate = &(winstate->perfunc[i]); + + if (perfuncstate->flinfo.fn_extra != NULL) + { + pfree(perfuncstate->flinfo.fn_extra); + /* + * Be sure to set NULL, which is one of the signs + * that the partition is brand new. + */ + perfuncstate->flinfo.fn_extra = NULL; + } + + /* + * reset agg result cache + */ + if (perfuncstate->pure_agg) + { + int aggno = perfuncstate->aggno; + WindowStatePerAggData *peraggstate = &winstate->peragg[aggno]; + + if (!peraggstate->resulttypeByVal && + peraggstate->hasResult && + !peraggstate->resultValueIsNull) + { + pfree(DatumGetPointer(peraggstate->resultValue)); + peraggstate->resultValueIsNull = true; + } + peraggstate->hasResult = false; + } + } + + if(winobj->buffer) + tuplestore_end(winobj->buffer); + winstate->prt_processing = false; + } + + /* + * process_current + * given frame, evaluate each window function with it + * and return the result tuple. + */ + static TupleTableSlot * + process_current(WindowState *winstate, WindowObject winobj) + { + ExprContext *econtext; + ProjectionInfo *projInfo; + int i; + int numfuncs; + + /* final output execution is on ps_ExprContext */ + econtext = winstate->ss.ps.ps_ExprContext; + numfuncs = winstate->numfuncs; + + /* + * Clear the per-output-tuple context for current row + */ + ResetExprContext(econtext); + + for(i = 0; i < numfuncs; i++) + { + WindowStatePerFunc perfuncstate; + Datum *result; + bool *isnull; + + perfuncstate = &(winstate->perfunc[i]); + /* + * normal aggregates are called from specialized environment + * rather than as window function. + * aggregates are called in a sequence later for performance, + * so we skip them here. + */ + if (perfuncstate->pure_agg) + continue; + result = &(econtext->ecxt_aggvalues[i]); + isnull = &(econtext->ecxt_aggnulls[i]); + eval_windowfunction(winstate, winobj, perfuncstate, result, isnull); + } + + /* + * process aggregate with trans and final functions, if any. + */ + if (winstate->numaggs > 0) + eval_windowaggregate(winstate, winobj); + + /* + * read back the current row from the frame to econtext + * as an outer tuple. econtext->ecxt_outertuple will be used + * in the projection (evaluation of the current row after + * window function executions). + */ + projInfo = winstate->ss.ps.ps_ProjInfo; + econtext->ecxt_outertuple = winstate->ss.ss_ScanTupleSlot; + ExecStoreTuple(winobj->currentrow, + econtext->ecxt_outertuple, + InvalidBuffer, + false); + return ExecProject(projInfo, NULL); + } + + /* ----------------- + * ExecWindow + * + * ExecWindow receives tuples from its outer subplan and + * creates or feeds the WindowObject then process window functions + * with the WindowObject. This node doesn't reduce nor qualify any row + * so the number of returned rows are exactly same as its outer + * subplan's result. + * The WindowObject works as if some temporary table that stores row values + * and provides evaluation functionality, which needs an econtext so that + * window functions use it as the core executor does. Thus, we need + * two econtext for both of this purpose and the result tuple evaluation purpose. + * + * Each Window node has their own buffering strategy. We currently 2 types, + * BUFFER_ROW and BUFFER_PARTITION. In BUFFER_ROW strategy, no additional rows + * after the current row are stored, whereas in BUFFER_PARTITION all the + * rows in the partition are stored at the first row in the partition. + * ----------------- + */ + TupleTableSlot * + ExecWindow(WindowState *winstate) + { + TupleTableSlot *slot; + + if (winstate->win_done && !winstate->prt_processing) + return NULL; + + if (winstate->strategy == WINDOW_BUFFER_PARTITION) + { + if (!winstate->prt_processing) + store_partition(winstate); + + /* + * we double-check if we can go processing frame. + * It means the first partition has no rows + * when this second check is false. + */ + if (winstate->win_done && !winstate->prt_processing) + return NULL; + + start_frame(winstate); + } + else + { + Assert(winstate->strategy == WINDOW_BUFFER_ROW); + advance_row(winstate); + + /* + * BUFFER_ROW will detect the bottom by calling + * advance_row, so if it is stop it here. + */ + if (winstate->win_done) + return NULL; + } + + /* + * This is the actual process to evaluate expressions + * and compute the result, then projection is done. + * It is the common code among the different strategies. + */ + slot = process_current(winstate, winstate->winobj); + + if (winstate->strategy == WINDOW_BUFFER_PARTITION) + { + finish_frame(winstate); + + /* + * make sure to clear the frame on the end of the partition. + */ + if (winstate->prt_processing && + winstate->winobj->p_currentpos == winstate->winobj->p_rownum - 1) + release_partition(winstate); + } + + return slot; + } + + /* ----------------- + * ExecInitWindow + * + * Creates the run-time information for the Window node produced by the + * planner and initializes its outer subtree + * ----------------- + */ + WindowState * + ExecInitWindow(Window *node, EState *estate, int eflags) + { + WindowState *winstate; + Plan *outerPlan; + ExprContext *econtext; + ExprContext *tmpcontext; + WindowObject winobj; + WindowStatePerFunc perfunc; + WindowStatePerAgg peragg; + WindowStatePerGroup pergroup; + int numfuncs = 0, funcno, + numaggs = 0, aggno; + int strategy; + ListCell *l; + + /* check for unsupported flags */ + Assert(!(eflags & (EXEC_FLAG_BACKWARD | EXEC_FLAG_MARK))); + + winstate = makeNode(WindowState); + winstate->ss.ps.plan = (Plan *) node; + winstate->ss.ps.state = estate; + winstate->eflags = (EXEC_FLAG_REWIND | EXEC_FLAG_BACKWARD | EXEC_FLAG_MARK); + + /* + * Create expression contexts. We need two, one for per-input-tuple + * processing and one for per-output-tuple processing. We cheat a little + * by using ExecAssignExprContext() to build both. + */ + ExecAssignExprContext(estate, &winstate->ss.ps); + tmpcontext = winstate->ss.ps.ps_ExprContext; + winstate->tmpcontext = tmpcontext; + ExecAssignExprContext(estate, &winstate->ss.ps); + + winstate->wincontext = + AllocSetContextCreate(CurrentMemoryContext, + "WinContext", + ALLOCSET_DEFAULT_MINSIZE, + ALLOCSET_DEFAULT_INITSIZE, + ALLOCSET_DEFAULT_MAXSIZE); + + #define WINDOW_NSLOTS 3 + + /* + * tuple table initialization + */ + ExecInitScanTupleSlot(estate, &winstate->ss); + ExecInitResultTupleSlot(estate, &winstate->ss.ps); + + /* + * This slot is used as the frame current row, and + * some temporary row in window functions. + */ + tmpcontext->ecxt_outertuple = ExecInitExtraTupleSlot(estate); + + winstate->ss.ps.targetlist = (List *) + ExecInitExpr((Expr *) node->plan.targetlist, + (PlanState *) winstate); + winstate->ss.ps.qual = NIL; + + /* + * initialize child nodes + * + * We shield the child node from the need to support REWIND, BACKWARD, or + * MARK/RESTORE. + */ + eflags &= ~(EXEC_FLAG_REWIND | EXEC_FLAG_BACKWARD | EXEC_FLAG_MARK); + outerPlan = outerPlan(node); + outerPlanState(winstate) = ExecInitNode(outerPlan, estate, eflags); + + /* + * initialize result tuple type and projection info. + */ + ExecAssignScanTypeFromOuterPlan(&winstate->ss); + + /* + * initialize result tuple type and projection info. + */ + ExecAssignResultTypeFromTL(&winstate->ss.ps); + ExecAssignProjectionInfo(&winstate->ss.ps, NULL); + ExecSetSlotDescriptor(tmpcontext->ecxt_outertuple, + winstate->ss.ss_ScanTupleSlot->tts_tupleDescriptor); + ExecStoreAllNullTuple(tmpcontext->ecxt_outertuple); + + winobj = (WindowObject) palloc0(sizeof(WindowObjectData)); + winstate->winobj = winobj; + + numfuncs = winstate->numfuncs; + numaggs = winstate->numaggs; + if (node->prtNumCols > 0) + winstate->prtEqfunctions = execTuplesMatchPrepare(node->prtNumCols, + node->prtOperators); + if (node->ordNumCols > 0) + winstate->ordEqfunctions = execTuplesMatchPrepare(node->ordNumCols, + node->ordOperators); + + /* + * Window functions use aggvalues and aggnulls as well as Agg node. + */ + econtext = winstate->ss.ps.ps_ExprContext; + econtext->ecxt_aggvalues = (Datum *) palloc0(sizeof(Datum) * numfuncs); + econtext->ecxt_aggnulls = (bool *) palloc0(sizeof(bool) * numfuncs); + + /* + * allocate per-wfunc/per-agg state information. + */ + perfunc = (WindowStatePerFunc) palloc0(sizeof(WindowStatePerFuncData) * numfuncs); + peragg = (WindowStatePerAgg) palloc0(sizeof(WindowStatePerAggData) * numaggs); + pergroup = (WindowStatePerGroup) palloc0(sizeof(WindowStatePerGroupData) * numaggs); + winstate->perfunc = perfunc; + winstate->peragg = peragg; + winstate->pergroup = pergroup; + + winstate->prt_processing = false; + winstate->win_done = false; + + strategy = WINDOW_BUFFER_ROW; + + funcno = -1; + aggno = -1; + foreach(l, winstate->funcs) + { + WFuncExprState *wfuncstate = (WFuncExprState *) lfirst(l); + WFunc *wfunc = (WFunc *) wfuncstate->xprstate.expr; + WindowStatePerFunc perfuncstate; + int numArguments; + int i; + + Assert(wfunc->winlevelsup == 0); + + /* Look for a previous duplicate window function */ + for (i = 0; i <= funcno; i++) + { + if (equal(wfunc, perfunc[i].wfunc) && + !contain_volatile_functions((Node *) wfunc)) + break; + } + if (i <= funcno) + { + /* Found a match to an existing entry, so just mark it */ + wfuncstate->funcno = i; + continue; + } + + /* Nope, so assign a new PerAgg record */ + perfuncstate = &perfunc[++funcno]; + + /* Mark WFunc state node with assigned index in the result array */ + wfuncstate->funcno = funcno; + + /* Fill in the perfuncstate data */ + perfuncstate->wfuncstate = wfuncstate; + perfuncstate->wfunc = wfunc; + numArguments = list_length(wfunc->args); + perfuncstate->numArguments = numArguments; + + fmgr_info_cxt(wfunc->winfnoid, &(perfuncstate->flinfo), + tmpcontext->ecxt_per_query_memory); + get_typlenbyval(wfunc->wintype, + &perfuncstate->resulttypeLen, + &perfuncstate->resulttypeByVal); + + /* + * if the function doesn't have capability of window function + * but of aggregate, we emulate Agg environment for it. + */ + perfuncstate->pure_agg = wfunc->pure_agg; + if (wfunc->pure_agg) + { + WindowStatePerAgg peraggstate; + + perfuncstate->aggno = ++aggno; + peraggstate = &winstate->peragg[aggno]; + initialize_peragg(winstate, wfunc, peraggstate); + peraggstate->funcno = funcno; + + /* window agg -- > frame */ + /* + * currently without supporting FRAME clauses, + * we use the partition buffering for these. + */ + strategy = WINDOW_BUFFER_PARTITION; + } + else + { + /* + * Window functions should declare to use type of + * APIs for best performance in nodeWindow. + * Until the APIs and user-define functions are + * public we have fixed these function mappings. + */ + switch(wfunc->winfnoid) + { + case 3898: /* row_number --> row */ + case 3899: /* rank --> row */ + case 3900: /* dense_rank --> row */ + break; + case 3901: /* percent_rank --> partition */ + case 3902: /* cume_dist --> partition */ + case 3903: /* ntile --> partition */ + case 3904: /* lag --> partition */ + case 3905: /* lag --> partition */ + case 3906: /* lead --> partition */ + case 3907: /* lead --> partition */ + strategy = WINDOW_BUFFER_PARTITION; + break; + case 3908: /* first_value --> frame */ + case 3909: /* last_value --> frame */ + case 3910: /* nth_value --> frame */ + /* + * currently without supporting FRAME clauses, + * we use the partition buffering for these. + */ + strategy = WINDOW_BUFFER_PARTITION; + break; + default: + elog(ERROR, "unknown function oid(%d)", wfunc->winfnoid); + } + } + } + + /* Update numfuncs to match number of unique function found */ + winstate->numfuncs = funcno + 1; + winstate->numaggs = aggno + 1; + + winstate->strategy = strategy; + + return winstate; + } + + /* ----------------- + * ExecCountSlotsWindow + * ----------------- + */ + int + ExecCountSlotsWindow(Window *node) + { + return ExecCountSlotsNode(outerPlan(node)) + + ExecCountSlotsNode(innerPlan(node)) + + WINDOW_NSLOTS; + } + + /* ----------------- + * ExecEndWindow + * ----------------- + */ + void + ExecEndWindow(WindowState *node) + { + PlanState *outerPlan; + + pfree(node->perfunc); + pfree(node->peragg); + pfree(node->pergroup); + + /* + * clear tuple before freeing expr context because + * tmpcontext is invalid after freeing. + */ + ExecClearTuple(node->ss.ss_ScanTupleSlot); + ExecClearTuple(node->tmpcontext->ecxt_outertuple); + + /* + * Free both the expr contexts. + */ + ExecFreeExprContext(&node->ss.ps); + node->ss.ps.ps_ExprContext = node->tmpcontext; + ExecFreeExprContext(&node->ss.ps); + + /* + * Free winobj after freeing its econtext (ie winstate->tmpcontext). + */ + pfree(node->winobj); + + if (node->prt_firstTuple != NULL) + heap_freetuple(node->prt_firstTuple); + + MemoryContextDelete(node->wincontext); + + outerPlan = outerPlanState(node); + ExecEndNode(outerPlan); + } + + /* ----------------- + * ExecRescanWindow + * ----------------- + */ + void + ExecReScanWindow(WindowState *node, ExprContext *exprCtxt) + { + ExprContext *econtext = node->ss.ps.ps_ExprContext; + int aggno; + + node->win_done = false; + + if (node->prt_processing) + release_partition(node); + + if (node->prt_firstTuple != NULL) + { + heap_freetuple(node->prt_firstTuple); + node->prt_firstTuple = NULL; + } + + for(aggno = 0; aggno < node->numaggs; aggno++) + { + WindowStatePerAgg peraggstate = &node->peragg[aggno]; + + if (peraggstate->hasResult && + !peraggstate->resulttypeByVal && + !peraggstate->resultValueIsNull) + pfree(DatumGetPointer(peraggstate->resultValue)); + peraggstate->hasResult = false; + } + + MemSet(econtext->ecxt_aggvalues, 0, sizeof(Datum) * node->numfuncs); + MemSet(econtext->ecxt_aggnulls, 0, sizeof(bool) * node->numfuncs); + } + + /* + * initialize_peragg + * + * Almost same as the nodeAgg.c. We don't care DISTINCT of the aggregate + * argument currenlty. + */ + static WindowStatePerAggData * + initialize_peragg(WindowState *winstate, WFunc *wfunc, WindowStatePerAgg peraggstate) + { + Oid inputTypes[FUNC_MAX_ARGS]; + int numArguments; + HeapTuple aggTuple; + Form_pg_aggregate aggform; + Oid aggtranstype; + AclResult aclresult; + Oid transfn_oid, + finalfn_oid; + Expr *transfnexpr, + *finalfnexpr; + Datum textInitVal; + int i; + ListCell *lc; + + numArguments = list_length(wfunc->args); + peraggstate->numArguments = numArguments; + + i = 0; + foreach(lc, wfunc->args) + { + inputTypes[i++] = exprType((Node *) lfirst(lc)); + } + + aggTuple = SearchSysCache(AGGFNOID, + ObjectIdGetDatum(wfunc->winfnoid), + 0, 0, 0); + if (!HeapTupleIsValid(aggTuple)) + elog(ERROR, "cache lookup failed for aggregate %u", + wfunc->winfnoid); + aggform = (Form_pg_aggregate) GETSTRUCT(aggTuple); + + /* Check permission to call aggregate function */ + aclresult = pg_proc_aclcheck(wfunc->winfnoid, GetUserId(), + ACL_EXECUTE); + if (aclresult != ACLCHECK_OK) + aclcheck_error(aclresult, ACL_KIND_PROC, + get_func_name(wfunc->winfnoid)); + + peraggstate->transfn_oid = transfn_oid = aggform->aggtransfn; + peraggstate->finalfn_oid = finalfn_oid = aggform->aggfinalfn; + + /* Check that aggregate owner has permission to call component fns */ + { + HeapTuple procTuple; + Oid aggOwner; + + procTuple = SearchSysCache(PROCOID, + ObjectIdGetDatum(wfunc->winfnoid), + 0, 0, 0); + if (!HeapTupleIsValid(procTuple)) + elog(ERROR, "cache lookup failed for function %u", + wfunc->winfnoid); + aggOwner = ((Form_pg_proc) GETSTRUCT(procTuple))->proowner; + ReleaseSysCache(procTuple); + + aclresult = pg_proc_aclcheck(transfn_oid, aggOwner, + ACL_EXECUTE); + if (aclresult != ACLCHECK_OK) + aclcheck_error(aclresult, ACL_KIND_PROC, + get_func_name(transfn_oid)); + if (OidIsValid(finalfn_oid)) + { + aclresult = pg_proc_aclcheck(finalfn_oid, aggOwner, + ACL_EXECUTE); + if (aclresult != ACLCHECK_OK) + aclcheck_error(aclresult, ACL_KIND_PROC, + get_func_name(finalfn_oid)); + } + } + + /* resolve actual type of transition state, if polymorphic */ + aggtranstype = aggform->aggtranstype; + if (IsPolymorphicType(aggtranstype)) + { + /* have to fetch the agg's declared input types... */ + Oid *declaredArgTypes; + int agg_nargs; + + get_func_signature(wfunc->winfnoid, + &declaredArgTypes, &agg_nargs); + Assert(agg_nargs == numArguments); + aggtranstype = enforce_generic_type_consistency(inputTypes, + declaredArgTypes, + agg_nargs, + aggtranstype, + false); + pfree(declaredArgTypes); + } + + /* build expression trees using actual argument & result types */ + build_aggregate_fnexprs(inputTypes, + numArguments, + aggtranstype, + wfunc->wintype, + transfn_oid, + finalfn_oid, + &transfnexpr, + &finalfnexpr); + + fmgr_info(transfn_oid, &peraggstate->transfn); + peraggstate->transfn.fn_expr = (Node *) transfnexpr; + + if (OidIsValid(finalfn_oid)) + { + fmgr_info(finalfn_oid, &peraggstate->finalfn); + peraggstate->finalfn.fn_expr = (Node *) finalfnexpr; + } + + get_typlenbyval(wfunc->wintype, + &peraggstate->resulttypeLen, + &peraggstate->resulttypeByVal); + get_typlenbyval(aggtranstype, + &peraggstate->transtypeLen, + &peraggstate->transtypeByVal); + + /* + * initval is potentially null, so don't try to access it as a struct + * field. Must do it the hard way with SysCacheGetAttr. + */ + textInitVal = SysCacheGetAttr(AGGFNOID, aggTuple, + Anum_pg_aggregate_agginitval, + &peraggstate->initValueIsNull); + + if (peraggstate->initValueIsNull) + peraggstate->initValue = (Datum) 0; + else + peraggstate->initValue = GetAggInitVal(textInitVal, + aggtranstype); + + /* + * If the transfn is strict and the initval is NULL, make sure input + * type and transtype are the same (or at least binary-compatible), so + * that it's OK to use the first input value as the initial + * transValue. This should have been checked at agg definition time, + * but just in case... + */ + if (peraggstate->transfn.fn_strict && peraggstate->initValueIsNull) + { + if (numArguments < 1 || + !IsBinaryCoercible(inputTypes[0], aggtranstype)) + ereport(ERROR, + (errcode(ERRCODE_INVALID_FUNCTION_DEFINITION), + errmsg("aggregate %u needs to have compatible input type and transition type", + wfunc->winfnoid))); + } + + ReleaseSysCache(aggTuple); + + return peraggstate; + } + + /* + * is_peer + * compare two rows in the ORDER BY clause + */ + static bool + is_peer(WindowState *winstate, TupleTableSlot *slot1, TupleTableSlot *slot2) + { + Window *node = (Window *) winstate->ss.ps.plan; + + Assert(node->ordNumCols > 0); + return execTuplesMatch(slot1, slot2, + node->ordNumCols, node->ordColIdx, + winstate->ordEqfunctions, + winstate->tmpcontext->ecxt_per_tuple_memory); + } + + /* window frame APIs and window iteration APIs */ + + /* + * window_seek + * Because tuplestore doesn't capability to seek nth row, + * we hold three marking points and from there advance rows as needed. + * + * The current row when seek_pos = 0 on WINDOW_SEEK_CURRENT, + * the first row when seek_pos = 0 on WINDOW_SEEK_HEAD, + * and the last row when seek_pos = 0 on WINDOW_SEEK_TAIL + * is fetched by the subsequent WinXXXGetArg/WinXXXGetTuple call. + * + * Since buffer may contain all the partition, window_seek is in charge + * of frame range boundary check. + */ + static bool + window_seek(WindowObject winobj, int seek_pos, int seek_type, int window_type) + { + Tuplestorestate *buffer = winobj->buffer; + int read_pointer; + + Assert(window_type == WINDOW_TYPE_FRAME || window_type == WINDOW_TYPE_PARTITION); + if (winobj->iterating) + elog(ERROR, "another iteration is going on"); + + switch(seek_type) + { + case WINDOW_SEEK_CURRENT: + read_pointer = winobj->currentptr; + if (window_type == WINDOW_TYPE_FRAME) + { + if (winobj->f_currentpos + seek_pos >= winobj->f_rownum) + return false; + if (winobj->f_currentpos + seek_pos < 0) + return false; + } + else + { + if (winobj->p_currentpos + seek_pos > winobj->p_rownum) + return false; + if (winobj->p_currentpos + seek_pos < 0) + return false; + } + /* + * Since frame->currentptr marks the next CURRENT ROW already, + * seek_pos is actually subtracted 1. + */ + seek_pos -= 1; + break; + + case WINDOW_SEEK_HEAD: + if (window_type == WINDOW_TYPE_FRAME) + read_pointer = winobj->f_headptr; + else + read_pointer = winobj->p_headptr; + + /* validation check */ + if (seek_pos < 0) + return false; + if (window_type == WINDOW_TYPE_FRAME) + { + if (seek_pos >= winobj->f_rownum) + return false; + } + break; + + case WINDOW_SEEK_TAIL: + if (window_type == WINDOW_TYPE_FRAME) + read_pointer = winobj->f_tailptr; + else + read_pointer = winobj->p_tailptr; + + /* validation check */ + if (seek_pos > 0) + return false; + if (window_type == WINDOW_TYPE_FRAME) + { + if (-seek_pos >= winobj->f_rownum) + return false; + } + break; + + default: + elog(ERROR, "unknown window seek type: %d", seek_type); + return false; + } + + tuplestore_copy_read_pointer(buffer, read_pointer, 0); + if (seek_pos != 0) + { + bool forward = (seek_pos > 0); + int num_ahead = (seek_pos < 0 ? (-seek_pos) : seek_pos); + int i; + + for(i = 0; i < num_ahead; i++) + { + if (!tuplestore_advance(buffer, forward)) + { + /* + * tuplestore_advance returns false when backward scan + * hit the head, which is valid position. + * But we consider window_seek going more than head, + * which means the seeking position is invalid. + */ + if (i < num_ahead - 1 || tuplestore_ateof(buffer)) + return false; + break; + } + } + } + + return true; + } + + /* + * WinRowCurrentPos + * return the position of CURRENT ROW in the partition. + */ + int64 + WinRowCurrentPos(WindowObject winobj) + { + return winobj->p_currentpos; + } + + /* + * WinRowGetArg + * return datum of the current row. + */ + Datum + WinRowGetArg(WindowObject winobj, ExprState *argstate, bool *isnull) + { + ExprContext *econtext = winobj->econtext; + + ExecStoreTuple(winobj->currentrow, + econtext->ecxt_outertuple, + InvalidBuffer, + false); + return ExecEvalExpr(argstate, econtext, isnull, NULL); + } + + /* + * WinRowGetTuple + * fetch tuple slot of the current row. returns true if succeeded. + */ + bool + WinRowGetTuple(WindowObject winobj, TupleTableSlot *slot) + { + ExecStoreTuple(winobj->currentrow, + slot, + InvalidBuffer, + false); + return true; + } + + /* + * WinRowIsPeer + * compare two rows in the ORDER BY clause + */ + bool + WinRowIsPeer(WindowObject winobj, TupleTableSlot *slot1, TupleTableSlot *slot2) + { + WindowState *winstate = winobj->winstate; + + return is_peer(winstate, slot1, slot2); + } + + /* + * WinFrameShrinked + * return true if the frame was shrinked since the last execution. + */ + bool + WinFrameShrinked(WindowObject winobj) + { + WINDOW_CHECK_STRATEGY(winobj, WINDOW_BUFFER_FRAME); + + return winobj->f_shrinked; + } + + /* + * WinFrameExtended + * return true if the frame was extended since the last execution. + */ + bool + WinFrameExtended(WindowObject winobj) + { + WINDOW_CHECK_STRATEGY(winobj, WINDOW_BUFFER_FRAME); + + return winobj->f_extended > 0; + } + + /* + * WinFrameGetRowNum + * return total row number of the frame. + */ + int64 + WinFrameGetRowNum(WindowObject winobj) + { + WINDOW_CHECK_STRATEGY(winobj, WINDOW_BUFFER_FRAME); + + return winobj->f_rownum; + } + + /* + * WinFrameGetArg + * return the value of the specific argument of the specific row. + * relpos can be positive or negative. + * If the target row is out of the frame, isout will be true as + * well as isnull. + */ + Datum + WinFrameGetArg(WindowObject winobj, ExprState *argstate, + int relpos, int seektype, bool *isnull, bool *isout) + { + ExprContext *econtext = winobj->econtext; + TupleTableSlot *slot = econtext->ecxt_outertuple; + Tuplestorestate *buffer = winobj->buffer; + + WINDOW_CHECK_STRATEGY(winobj, WINDOW_BUFFER_FRAME); + + *isout = false; + + if (!window_seek(winobj, relpos, seektype, WINDOW_TYPE_FRAME)) + { + *isout = true; + *isnull = true; + return (Datum) 0; + } + + if (!tuplestore_gettupleslot(buffer, true, slot)) + { + *isout = true; + *isnull = true; + return (Datum) 0; + } + + return ExecEvalExpr(argstate, econtext, isnull, NULL); + } + + /* + * WinFrameGetTuple + * fetch tuple slot of the specified row. + * If the target row is out of the frame, isout will be true + * and returns false. + */ + bool + WinFrameGetTuple(WindowObject winobj, TupleTableSlot *slot, + int relpos, int seektype, bool *isout) + { + Tuplestorestate *buffer = winobj->buffer; + + WINDOW_CHECK_STRATEGY(winobj, WINDOW_BUFFER_FRAME); + + *isout = false; + + if (!window_seek(winobj, relpos, seektype, WINDOW_TYPE_FRAME)) + { + *isout = true; + ExecClearTuple(slot); + return false; + } + + if (!tuplestore_gettupleslot(buffer, true, slot)) + { + *isout = true; + return false; + } + + return true; + } + + /* + * WinFrameShrinkingNum + * returns the shrinking number of the frame rows when + * the current frame is finishing. + */ + int + WinFrameShrinkingNum(WindowObject winobj) + { + WINDOW_CHECK_STRATEGY(winobj, WINDOW_BUFFER_FRAME); + + return winobj->f_shrinking; + } + + /* + * WinFrameExtendedNum + * returns the extended number of the frame row when + * the current frame was created. + */ + int + WinFrameExtendedNum(WindowObject winobj) + { + WINDOW_CHECK_STRATEGY(winobj, WINDOW_BUFFER_FRAME); + + return winobj->f_extended; + } + + /* + * WinPartGetRowNum + * return number of rows contained in the partition. + */ + int64 + WinPartGetRowNum(WindowObject winobj) + { + WINDOW_CHECK_STRATEGY(winobj, WINDOW_BUFFER_PARTITION); + + return winobj->p_rownum; + } + + /* + * WinPartGetArg + * return datum from the specified row in the partition. + * If the target row is out of the partition, isout will be true as + * well as isnull. + */ + Datum + WinPartGetArg(WindowObject winobj, ExprState *argstate, + int relpos, int seektype, bool *isnull, bool *isout) + { + ExprContext *econtext = winobj->econtext; + TupleTableSlot *slot = econtext->ecxt_outertuple; + Tuplestorestate *buffer = winobj->buffer; + + WINDOW_CHECK_STRATEGY(winobj, WINDOW_BUFFER_PARTITION); + + *isout = false; + + if (!window_seek(winobj, relpos, seektype, WINDOW_TYPE_PARTITION)) + { + *isout = true; + *isnull = true; + return (Datum) 0; + } + + /* + * Even if window_seek succeeded, it may be the last row because + * tuplestore doesn't report it. + */ + if (!tuplestore_gettupleslot(buffer, true, slot)) + { + *isout = true; + *isnull = true; + return (Datum) 0; + } + + return ExecEvalExpr(argstate, econtext, isnull, NULL); + } + + /* + * WinPartGetTuple + * fetch tuple slot from the specified row in the partition. + * If the target row is out of the frame, isout will be true + * and returns false. + */ + bool + WinPartGetTuple(WindowObject winobj, TupleTableSlot *slot, + int relpos, int seektype, bool *isout) + { + Tuplestorestate *buffer = winobj->buffer; + + WINDOW_CHECK_STRATEGY(winobj, WINDOW_BUFFER_PARTITION); + + *isout = false; + + if (!window_seek(winobj, relpos, seektype, WINDOW_TYPE_PARTITION)) + { + *isout = true; + ExecClearTuple(slot); + return false; + } + + if (!tuplestore_gettupleslot(buffer, true, slot)) + { + *isout = true; + return false; + } + + return true; + } + + /* + * WinFrameStartIter + * initialize the frame iterator. + * pos is relative position from head of the frame. + */ + WindowIter + WinFrameStartIter(WindowObject winobj, int pos) + { + WindowIterData *iter; + + WINDOW_CHECK_STRATEGY(winobj, WINDOW_BUFFER_FRAME); + + /* + * We could use Assert() here but this assumption might be + * broken by external API call, so use elog(). + */ + if (winobj->iterating) + elog(ERROR, "another frame iteration is not done yet."); + + iter = (WindowIterData *) palloc0(sizeof(WindowIterData)); + iter->winobj = winobj; + iter->type = WINDOW_TYPE_FRAME; + iter->position = pos; + + /* + * seek must be done before iterating flag is set + */ + window_seek(winobj, pos, WINDOW_SEEK_HEAD, iter->type); + iter->finished = false; + winobj->iterating = true; + + return iter; + } + + /* + * WinPartStartIter + * initialize the partition iterator. + * pos is relative position from head of the partition. + */ + WindowIter + WinPartStartIter(WindowObject winobj, int pos) + { + WindowIterData *iter; + + WINDOW_CHECK_STRATEGY(winobj, WINDOW_BUFFER_PARTITION); + /* + * We could use Assert() here but this assumption might be + * broken by external API call, so use elog(). + */ + if (winobj->iterating) + elog(ERROR, "another frame iteration is not done yet."); + + iter = (WindowIterData *) palloc0(sizeof(WindowIterData)); + iter->winobj = winobj; + iter->type = WINDOW_TYPE_PARTITION; + iter->position = pos; + + /* + * seek must be done before iterating flag is set + */ + window_seek(winobj, pos, WINDOW_SEEK_HEAD, iter->type); + iter->finished = false; + winobj->iterating = true; + + return iter; + } + + + /* + * WinIterNext + * advance the iterator to the next row. Any tuple fetching is allowed + * after this method. + * return false if there is no more row. + */ + bool + WinIterNext(WindowIter iter) + { + WindowObject winobj = iter->winobj; + ExprContext *econtext = winobj->econtext; + TupleTableSlot *slot = econtext->ecxt_outertuple; + bool finished = false; + + if (iter->finished) + return false; + + iter->position++; + if(!tuplestore_gettupleslot(winobj->buffer, true, slot)) + { + finished = true; + } + else + { + /* + * Even though gettupeslot returns true, it could be + * the last row of the buffer. We track it by + * counting row numbers. + */ + if (iter->type == WINDOW_TYPE_FRAME) + { + finished = (iter->position > winobj->f_rownum); + } + else + { + finished = (iter->position > winobj->p_rownum); + } + } + + return !(iter->finished = finished); + } + + /* + * WinIterFinish + * clean up iterator. It cannot be touched after this call. + */ + void + WinIterFinish(WindowIter iter) + { + WindowObject winobj; + + if (!iter) + return; + + winobj = iter->winobj; + winobj->iterating = false; + pfree(iter); + } + + /* + * WinIterGetArg + * return datum from the current iterator row. + */ + Datum + WinIterGetArg(WindowIter iter, ExprState *argstate, bool *isnull) + { + WindowObject winobj = iter->winobj; + ExprContext *econtext = winobj->econtext; + + return ExecEvalExpr(argstate, econtext, isnull, NULL); + } + + /* + * WinIterGetTuple + * fetch tuple slot from the current iterator row. + */ + bool + WinIterGetTuple(WindowIter iter, TupleTableSlot *slot) + { + TupleTableSlot *orig = iter->winobj->econtext->ecxt_outertuple; + + /* + * winobj's outertuple has fetched the row by WinIterNext() + */ + if (orig == slot) + return true; + + ExecStoreTuple(ExecCopySlotTuple(orig), slot, InvalidBuffer, true); + return true; + } *** a/src/backend/nodes/copyfuncs.c --- b/src/backend/nodes/copyfuncs.c *************** *** 762,767 **** _copyLimit(Limit *from) --- 762,789 ---- } /* + * _copyWindow + */ + static Window * + _copyWindow(Window *from) + { + Window *newnode = makeNode(Window); + + CopyPlanFields((Plan *) from, (Plan *) newnode); + + COPY_SCALAR_FIELD(prtNumCols); + COPY_POINTER_FIELD(prtColIdx, from->prtNumCols * sizeof(AttrNumber)); + COPY_POINTER_FIELD(prtOperators, from->prtNumCols * sizeof(Oid)); + COPY_SCALAR_FIELD(ordNumCols); + COPY_POINTER_FIELD(ordColIdx, from->ordNumCols * sizeof(AttrNumber)); + COPY_NODE_FIELD(preceding); + COPY_NODE_FIELD(following); + COPY_SCALAR_FIELD(winref); + + return newnode; + } + + /* * _copyPlanInvalItem */ static PlanInvalItem * *************** *** 932,937 **** _copyAggref(Aggref *from) --- 954,979 ---- } /* + * _copyWFunc + */ + static WFunc * + _copyWFunc(WFunc *from) + { + WFunc *newnode = makeNode(WFunc); + + COPY_SCALAR_FIELD(winfnoid); + COPY_SCALAR_FIELD(wintype); + COPY_NODE_FIELD(args); + COPY_SCALAR_FIELD(winlevelsup); + COPY_SCALAR_FIELD(winstar); + COPY_SCALAR_FIELD(winref); + COPY_SCALAR_FIELD(pure_agg); + COPY_LOCATION_FIELD(location); + + return newnode; + } + + /* * _copyArrayRef */ static ArrayRef * *************** *** 1730,1735 **** _copySortGroupClause(SortGroupClause *from) --- 1772,1831 ---- return newnode; } + static OrderClause * + _copyOrderClause(OrderClause *from) + { + OrderClause *newnode = makeNode(OrderClause); + + COPY_SCALAR_FIELD(tleSortGroupRef); + COPY_SCALAR_FIELD(eqop); + COPY_SCALAR_FIELD(sortop); + COPY_SCALAR_FIELD(nulls_first); + + return newnode; + } + + static PartitionClause * + _copyPartitionClause(PartitionClause *from) + { + PartitionClause *newnode = makeNode(PartitionClause); + + COPY_SCALAR_FIELD(tleSortGroupRef); + COPY_SCALAR_FIELD(eqop); + COPY_SCALAR_FIELD(sortop); + COPY_SCALAR_FIELD(nulls_first); + + return newnode; + } + + static WinDef * + _copyWinDef(WinDef *from) + { + WinDef *newnode = makeNode(WinDef); + + COPY_NODE_FIELD(partitionClause); + COPY_NODE_FIELD(orderClause); + COPY_NODE_FIELD(wfunc); + COPY_STRING_FIELD(name); + COPY_LOCATION_FIELD(location); + + return newnode; + } + + static WindowClause * + _copyWindowClause(WindowClause *from) + { + WindowClause *newnode = makeNode(WindowClause); + + COPY_NODE_FIELD(partitionClause); + COPY_NODE_FIELD(orderClause); + COPY_STRING_FIELD(name); + COPY_SCALAR_FIELD(has_wfunc); + COPY_SCALAR_FIELD(winref); + + return newnode; + } + static RowMarkClause * _copyRowMarkClause(RowMarkClause *from) { *************** *** 1849,1854 **** _copyFuncCall(FuncCall *from) --- 1945,1951 ---- COPY_SCALAR_FIELD(agg_star); COPY_SCALAR_FIELD(agg_distinct); COPY_SCALAR_FIELD(func_variadic); + COPY_NODE_FIELD(win_definition); COPY_LOCATION_FIELD(location); return newnode; *************** *** 2073,2078 **** _copyQuery(Query *from) --- 2170,2176 ---- COPY_SCALAR_FIELD(hasDistinctOn); COPY_SCALAR_FIELD(hasRecursive); COPY_NODE_FIELD(cteList); + COPY_SCALAR_FIELD(hasWindow); COPY_NODE_FIELD(rtable); COPY_NODE_FIELD(jointree); COPY_NODE_FIELD(targetList); *************** *** 2081,2086 **** _copyQuery(Query *from) --- 2179,2185 ---- COPY_NODE_FIELD(havingQual); COPY_NODE_FIELD(distinctClause); COPY_NODE_FIELD(sortClause); + COPY_NODE_FIELD(windowList); COPY_NODE_FIELD(limitOffset); COPY_NODE_FIELD(limitCount); COPY_NODE_FIELD(rowMarks); *************** *** 2141,2146 **** _copySelectStmt(SelectStmt *from) --- 2240,2246 ---- COPY_NODE_FIELD(whereClause); COPY_NODE_FIELD(groupClause); COPY_NODE_FIELD(havingClause); + COPY_NODE_FIELD(windowClause); COPY_NODE_FIELD(withClause); COPY_NODE_FIELD(valuesLists); COPY_NODE_FIELD(sortClause); *************** *** 3326,3331 **** copyObject(void *from) --- 3426,3433 ---- case T_Limit: retval = _copyLimit(from); break; + case T_Window: + retval = _copyWindow(from); case T_PlanInvalItem: retval = _copyPlanInvalItem(from); break; *************** *** 3354,3359 **** copyObject(void *from) --- 3456,3464 ---- case T_Aggref: retval = _copyAggref(from); break; + case T_WFunc: + retval = _copyWFunc(from); + break; case T_ArrayRef: retval = _copyArrayRef(from); break; *************** *** 3828,3833 **** copyObject(void *from) --- 3933,3950 ---- case T_SortGroupClause: retval = _copySortGroupClause(from); break; + case T_OrderClause: + retval = _copyOrderClause(from); + break; + case T_PartitionClause: + retval = _copyPartitionClause(from); + break; + case T_WinDef: + retval = _copyWinDef(from); + break; + case T_WindowClause: + retval = _copyWindowClause(from); + break; case T_RowMarkClause: retval = _copyRowMarkClause(from); break; *** a/src/backend/nodes/equalfuncs.c --- b/src/backend/nodes/equalfuncs.c *************** *** 192,197 **** _equalAggref(Aggref *a, Aggref *b) --- 192,212 ---- } static bool + _equalWFunc(WFunc *a, WFunc *b) + { + COMPARE_SCALAR_FIELD(winfnoid); + COMPARE_SCALAR_FIELD(wintype); + COMPARE_NODE_FIELD(args); + COMPARE_SCALAR_FIELD(winlevelsup); + COMPARE_SCALAR_FIELD(winstar); + COMPARE_SCALAR_FIELD(winref); + COMPARE_SCALAR_FIELD(pure_agg); + COMPARE_LOCATION_FIELD(location); + + return true; + } + + static bool _equalArrayRef(ArrayRef *a, ArrayRef *b) { COMPARE_SCALAR_FIELD(refarraytype); *************** *** 904,909 **** _equalSelectStmt(SelectStmt *a, SelectStmt *b) --- 919,925 ---- COMPARE_NODE_FIELD(whereClause); COMPARE_NODE_FIELD(groupClause); COMPARE_NODE_FIELD(havingClause); + COMPARE_NODE_FIELD(windowClause); COMPARE_NODE_FIELD(withClause); COMPARE_NODE_FIELD(valuesLists); COMPARE_NODE_FIELD(sortClause); *************** *** 1799,1804 **** _equalFuncCall(FuncCall *a, FuncCall *b) --- 1815,1821 ---- COMPARE_SCALAR_FIELD(agg_star); COMPARE_SCALAR_FIELD(agg_distinct); COMPARE_SCALAR_FIELD(func_variadic); + COMPARE_NODE_FIELD(win_definition); COMPARE_LOCATION_FIELD(location); return true; *************** *** 2003,2008 **** _equalSortGroupClause(SortGroupClause *a, SortGroupClause *b) --- 2020,2049 ---- } static bool + _equalWinDef(WinDef *a, WinDef *b) + { + COMPARE_NODE_FIELD(partitionClause); + COMPARE_NODE_FIELD(orderClause); + COMPARE_NODE_FIELD(wfunc); + COMPARE_STRING_FIELD(name); + COMPARE_LOCATION_FIELD(location); + + return true; + } + + static bool + _equalWindowClause(WindowClause *a, WindowClause *b) + { + COMPARE_NODE_FIELD(partitionClause); + COMPARE_NODE_FIELD(orderClause); + COMPARE_STRING_FIELD(name); + COMPARE_SCALAR_FIELD(has_wfunc); + COMPARE_SCALAR_FIELD(winref); + + return true; + } + + static bool _equalRowMarkClause(RowMarkClause *a, RowMarkClause *b) { COMPARE_SCALAR_FIELD(rti); *************** *** 2204,2209 **** equal(void *a, void *b) --- 2245,2252 ---- break; case T_Aggref: retval = _equalAggref(a, b); + case T_WFunc: + retval = _equalWFunc(a, b); break; case T_ArrayRef: retval = _equalArrayRef(a, b); *************** *** 2310,2315 **** equal(void *a, void *b) --- 2353,2361 ---- case T_JoinExpr: retval = _equalJoinExpr(a, b); break; + case T_WindowClause: + retval = _equalWindowClause(a, b); + break; /* * RELATION NODES *************** *** 2664,2671 **** equal(void *a, void *b) --- 2710,2722 ---- retval = _equalRangeTblEntry(a, b); break; case T_SortGroupClause: + case T_OrderClause: + case T_PartitionClause: retval = _equalSortGroupClause(a, b); break; + case T_WinDef: + retval = _equalWinDef(a, b); + break; case T_RowMarkClause: retval = _equalRowMarkClause(a, b); break; *** a/src/backend/nodes/nodeFuncs.c --- b/src/backend/nodes/nodeFuncs.c *************** *** 52,57 **** exprType(Node *expr) --- 52,60 ---- case T_Aggref: type = ((Aggref *) expr)->aggtype; break; + case T_WFunc: + type = ((WFunc *) expr)->wintype; + break; case T_ArrayRef: { ArrayRef *arrayref = (ArrayRef *) expr; *************** *** 1045,1050 **** expression_tree_walker(Node *node, --- 1048,1061 ---- return true; } break; + case T_WFunc: + { + WFunc *expr = (WFunc *) node; + if(expression_tree_walker((Node *) expr->args, + walker, context)) + return true; + } + break; case T_ArrayRef: { ArrayRef *aref = (ArrayRef *) node; *************** *** 1539,1544 **** expression_tree_mutator(Node *node, --- 1550,1565 ---- return (Node *) newnode; } break; + case T_WFunc: + { + WFunc *wfunc = (WFunc *) node; + WFunc *newnode; + + FLATCOPY(newnode, wfunc, WFunc); + MUTATE(newnode->args, wfunc->args, List *); + return (Node *) newnode; + } + break; case T_ArrayRef: { ArrayRef *arrayref = (ArrayRef *) node; *** a/src/backend/nodes/outfuncs.c --- b/src/backend/nodes/outfuncs.c *************** *** 684,689 **** _outLimit(StringInfo str, Limit *node) --- 684,711 ---- } static void + _outWindow(StringInfo str, Window *node) + { + int i; + + WRITE_NODE_TYPE("WINDOW"); + + _outPlanInfo(str, (Plan *) node); + + appendStringInfo(str, " :prtColIdx"); + for (i = 0; i < node->prtNumCols; i++) + appendStringInfo(str, " %d", node->prtColIdx[i]); + + appendStringInfo(str, " :prtOperations"); + for (i = 0; i < node->prtNumCols; i++) + appendStringInfo(str, " %d", node->prtOperators[i]); + + appendStringInfo(str, " :ordColIdx"); + for (i = 0; i< node->ordNumCols; i++) + appendStringInfo(str, " %d", node->ordColIdx[i]); + } + + static void _outPlanInvalItem(StringInfo str, PlanInvalItem *node) { WRITE_NODE_TYPE("PLANINVALITEM"); *************** *** 799,804 **** _outAggref(StringInfo str, Aggref *node) --- 821,841 ---- } static void + _outWFunc(StringInfo str, WFunc *node) + { + WRITE_NODE_TYPE("WFUNC"); + + WRITE_OID_FIELD(winfnoid); + WRITE_OID_FIELD(wintype); + WRITE_NODE_FIELD(args); + WRITE_UINT_FIELD(winlevelsup); + WRITE_BOOL_FIELD(winstar); + WRITE_UINT_FIELD(winref); + WRITE_BOOL_FIELD(pure_agg); + WRITE_LOCATION_FIELD(location); + } + + static void _outArrayRef(StringInfo str, ArrayRef *node) { WRITE_NODE_TYPE("ARRAYREF"); *************** *** 1442,1447 **** _outPlannerInfo(StringInfo str, PlannerInfo *node) --- 1479,1485 ---- WRITE_NODE_FIELD(group_pathkeys); WRITE_NODE_FIELD(distinct_pathkeys); WRITE_NODE_FIELD(sort_pathkeys); + WRITE_NODE_FIELD(window_pathkeys); WRITE_FLOAT_FIELD(total_table_pages, "%.0f"); WRITE_FLOAT_FIELD(tuple_fraction, "%.4f"); WRITE_BOOL_FIELD(hasJoinRTEs); *************** *** 1723,1728 **** _outSelectStmt(StringInfo str, SelectStmt *node) --- 1761,1767 ---- WRITE_NODE_FIELD(groupClause); WRITE_NODE_FIELD(havingClause); WRITE_NODE_FIELD(withClause); + WRITE_NODE_FIELD(windowClause); WRITE_NODE_FIELD(valuesLists); WRITE_NODE_FIELD(sortClause); WRITE_NODE_FIELD(limitOffset); *************** *** 1744,1749 **** _outFuncCall(StringInfo str, FuncCall *node) --- 1783,1789 ---- WRITE_BOOL_FIELD(agg_star); WRITE_BOOL_FIELD(agg_distinct); WRITE_BOOL_FIELD(func_variadic); + WRITE_NODE_FIELD(win_definition); WRITE_LOCATION_FIELD(location); } *************** *** 1868,1873 **** _outQuery(StringInfo str, Query *node) --- 1908,1914 ---- WRITE_BOOL_FIELD(hasAggs); WRITE_BOOL_FIELD(hasSubLinks); WRITE_BOOL_FIELD(hasDistinctOn); + WRITE_BOOL_FIELD(hasWindow); WRITE_BOOL_FIELD(hasRecursive); WRITE_NODE_FIELD(cteList); WRITE_NODE_FIELD(rtable); *************** *** 1878,1883 **** _outQuery(StringInfo str, Query *node) --- 1919,1925 ---- WRITE_NODE_FIELD(havingQual); WRITE_NODE_FIELD(distinctClause); WRITE_NODE_FIELD(sortClause); + WRITE_NODE_FIELD(windowList); WRITE_NODE_FIELD(limitOffset); WRITE_NODE_FIELD(limitCount); WRITE_NODE_FIELD(rowMarks); *************** *** 1896,1901 **** _outSortGroupClause(StringInfo str, SortGroupClause *node) --- 1938,1990 ---- } static void + _outOrderClause(StringInfo str, OrderClause *node) + { + WRITE_NODE_TYPE("ORDERCLAUSE"); + + WRITE_UINT_FIELD(tleSortGroupRef); + WRITE_OID_FIELD(eqop); + WRITE_OID_FIELD(sortop); + WRITE_BOOL_FIELD(nulls_first); + } + + static void + _outPartitionClause(StringInfo str, PartitionClause *node) + { + WRITE_NODE_TYPE("PARTITIONCLAUSE"); + + WRITE_UINT_FIELD(tleSortGroupRef); + WRITE_OID_FIELD(eqop); + WRITE_OID_FIELD(sortop); + WRITE_BOOL_FIELD(nulls_first); + } + + static void + _outWinDef(StringInfo str, WinDef *node) + { + WRITE_NODE_TYPE("WINDEF"); + + WRITE_NODE_FIELD(partitionClause); + WRITE_NODE_FIELD(orderClause); + WRITE_NODE_FIELD(wfunc); + WRITE_STRING_FIELD(name); + WRITE_LOCATION_FIELD(location); + } + + static void + _outWindowClause(StringInfo str, WindowClause *node) + { + WRITE_NODE_TYPE("WINDOWCLAUSE"); + + WRITE_NODE_FIELD(partitionClause); + WRITE_NODE_FIELD(orderClause); + WRITE_STRING_FIELD(name); + WRITE_BOOL_FIELD(has_wfunc); + WRITE_UINT_FIELD(winref); + } + + + static void _outRowMarkClause(StringInfo str, RowMarkClause *node) { WRITE_NODE_TYPE("ROWMARKCLAUSE"); *************** *** 2366,2371 **** _outNode(StringInfo str, void *obj) --- 2455,2463 ---- case T_Limit: _outLimit(str, obj); break; + case T_Window: + _outWindow(str, obj); + break; case T_PlanInvalItem: _outPlanInvalItem(str, obj); break; *************** *** 2390,2395 **** _outNode(StringInfo str, void *obj) --- 2482,2490 ---- case T_Aggref: _outAggref(str, obj); break; + case T_WFunc: + _outWFunc(str, obj); + break; case T_ArrayRef: _outArrayRef(str, obj); break; *************** *** 2614,2619 **** _outNode(StringInfo str, void *obj) --- 2709,2726 ---- case T_SortGroupClause: _outSortGroupClause(str, obj); break; + case T_OrderClause: + _outOrderClause(str, obj); + break; + case T_PartitionClause: + _outPartitionClause(str, obj); + break; + case T_WinDef: + _outWinDef(str, obj); + break; + case T_WindowClause: + _outWindowClause(str, obj); + break; case T_RowMarkClause: _outRowMarkClause(str, obj); break; *** a/src/backend/nodes/readfuncs.c --- b/src/backend/nodes/readfuncs.c *************** *** 155,160 **** _readQuery(void) --- 155,161 ---- READ_BOOL_FIELD(hasAggs); READ_BOOL_FIELD(hasSubLinks); READ_BOOL_FIELD(hasDistinctOn); + READ_BOOL_FIELD(hasWindow); READ_BOOL_FIELD(hasRecursive); READ_NODE_FIELD(cteList); READ_NODE_FIELD(rtable); *************** *** 165,170 **** _readQuery(void) --- 166,172 ---- READ_NODE_FIELD(havingQual); READ_NODE_FIELD(distinctClause); READ_NODE_FIELD(sortClause); + READ_NODE_FIELD(windowList); READ_NODE_FIELD(limitOffset); READ_NODE_FIELD(limitCount); READ_NODE_FIELD(rowMarks); *************** *** 218,223 **** _readSortGroupClause(void) --- 220,291 ---- } /* + * _readOrderClause + */ + static OrderClause * + _readOrderClause(void) + { + READ_LOCALS(OrderClause); + + READ_UINT_FIELD(tleSortGroupRef); + READ_OID_FIELD(eqop); + READ_OID_FIELD(sortop); + READ_BOOL_FIELD(nulls_first); + + READ_DONE(); + } + + /* + * _readPartitionClause + */ + static PartitionClause * + _readPartitionClause(void) + { + READ_LOCALS(PartitionClause); + + READ_UINT_FIELD(tleSortGroupRef); + READ_OID_FIELD(eqop); + READ_OID_FIELD(sortop); + READ_BOOL_FIELD(nulls_first); + + READ_DONE(); + } + + /* + * _readWinDef + */ + static WinDef * + _readWinDef(void) + { + READ_LOCALS(WinDef); + + READ_NODE_FIELD(partitionClause); + READ_NODE_FIELD(orderClause); + READ_NODE_FIELD(wfunc); + READ_STRING_FIELD(name); + READ_LOCATION_FIELD(location); + + READ_DONE(); + } + + /* + * _readWindowClause + */ + static WindowClause * + _readWindowClause(void) + { + READ_LOCALS(WindowClause); + + READ_NODE_FIELD(partitionClause); + READ_NODE_FIELD(orderClause); + READ_STRING_FIELD(name); + READ_BOOL_FIELD(has_wfunc); + READ_UINT_FIELD(winref); + + READ_DONE(); + } + + /* * _readRowMarkClause */ static RowMarkClause * *************** *** 401,406 **** _readAggref(void) --- 469,494 ---- } /* + * _readWFunc + */ + static WFunc * + _readWFunc(void) + { + READ_LOCALS(WFunc); + + READ_OID_FIELD(winfnoid); + READ_OID_FIELD(wintype); + READ_NODE_FIELD(args); + READ_UINT_FIELD(winlevelsup); + READ_BOOL_FIELD(winstar); + READ_UINT_FIELD(winref); + READ_BOOL_FIELD(pure_agg); + READ_LOCATION_FIELD(location); + + READ_DONE(); + } + + /* * _readArrayRef */ static ArrayRef * *************** *** 1089,1094 **** parseNodeString(void) --- 1177,1190 ---- return_value = _readQuery(); else if (MATCH("SORTGROUPCLAUSE", 15)) return_value = _readSortGroupClause(); + else if (MATCH("ORDERCLAUSE", 11)) + return_value = _readOrderClause(); + else if (MATCH("PARTITIONCLAUSE", 15)) + return_value = _readPartitionClause(); + else if (MATCH("WINDEF", 6)) + return_value = _readWinDef(); + else if (MATCH("WINDOWCLAUSE", 12)) + return_value = _readWindowClause(); else if (MATCH("ROWMARKCLAUSE", 13)) return_value = _readRowMarkClause(); else if (MATCH("COMMONTABLEEXPR", 15)) *************** *** 1109,1114 **** parseNodeString(void) --- 1205,1212 ---- return_value = _readParam(); else if (MATCH("AGGREF", 6)) return_value = _readAggref(); + else if (MATCH("WFUNC", 5)) + return_value = _readWFunc(); else if (MATCH("ARRAYREF", 8)) return_value = _readArrayRef(); else if (MATCH("FUNCEXPR", 8)) *** a/src/backend/optimizer/path/allpaths.c --- b/src/backend/optimizer/path/allpaths.c *************** *** 939,945 **** standard_join_search(PlannerInfo *root, int levels_needed, List *initial_rels) * * Conditions checked here: * ! * 1. If the subquery has a LIMIT clause, we must not push down any quals, * since that could change the set of rows returned. * * 2. If the subquery contains EXCEPT or EXCEPT ALL set ops we cannot push --- 939,945 ---- * * Conditions checked here: * ! * 1. If the subquery has a LIMIT or Window clause, we must not push down any quals, * since that could change the set of rows returned. * * 2. If the subquery contains EXCEPT or EXCEPT ALL set ops we cannot push *************** *** 960,966 **** subquery_is_pushdown_safe(Query *subquery, Query *topquery, SetOperationStmt *topop; /* Check point 1 */ ! if (subquery->limitOffset != NULL || subquery->limitCount != NULL) return false; /* Are we at top level, or looking at a setop component? */ --- 960,967 ---- SetOperationStmt *topop; /* Check point 1 */ ! if (subquery->limitOffset != NULL || subquery->limitCount != NULL ! || subquery->hasWindow) return false; /* Are we at top level, or looking at a setop component? */ *** a/src/backend/optimizer/path/equivclass.c --- b/src/backend/optimizer/path/equivclass.c *************** *** 438,451 **** get_eclass_for_sort_expr(PlannerInfo *root, /* * add_eq_member doesn't check for volatile functions, set-returning ! * functions, or aggregates, but such could appear in sort expressions; so ! * we have to check whether its const-marking was correct. */ if (newec->ec_has_const) { if (newec->ec_has_volatile || expression_returns_set((Node *) expr) || ! contain_agg_clause((Node *) expr)) { newec->ec_has_const = false; newem->em_is_const = false; --- 438,454 ---- /* * add_eq_member doesn't check for volatile functions, set-returning ! * functions, aggregates, or window functions, but such could appear ! * in sort expressions; so we have to check whether its const-marking ! * was correct. ! * XXX: need consider stable function? */ if (newec->ec_has_const) { if (newec->ec_has_volatile || expression_returns_set((Node *) expr) || ! contain_agg_clause((Node *) expr) || ! find_wfunc((Node *) expr) != NULL) { newec->ec_has_const = false; newem->em_is_const = false; *** a/src/backend/optimizer/plan/createplan.c --- b/src/backend/optimizer/plan/createplan.c *************** *** 3496,3501 **** make_limit(Plan *lefttree, Node *limitOffset, Node *limitCount, --- 3496,3627 ---- return node; } + /* + * make_window + * Almost same as make_agg in the meaning of partition columns + */ + Window * + make_window(PlannerInfo *root, + List *tlist, + WindowClause *parse, + Oid *prtOperators, + Oid *ordOperators, + Plan *lefttree) + { + Window *node = makeNode(Window); + Plan *plan = &node->plan; + List *sub_tlist = lefttree->targetlist; + ListCell *lc; + int numCols; + int numWfuncs; + + /* + * count up window functions + */ + numWfuncs = 0; + foreach(lc, tlist) + { + TargetEntry *tle = (TargetEntry *) lfirst(lc); + Node *wfunc; + + wfunc = find_wfunc((Node *) tle->expr); + if(wfunc) + numWfuncs++; + } + + copy_plan_costsize(plan, lefttree); + + /* + * Charge one cpu_operator_cost per comparison per input tuple. We assume + * all columns get compared at most of the tuples. + */ + numCols = list_length(tlist); + plan->total_cost += cpu_operator_cost * lefttree->plan_rows * numCols; + plan->total_cost += cpu_operator_cost * numWfuncs * lefttree->plan_rows; + + plan->lefttree = lefttree; + plan->targetlist = tlist; + plan->qual = NIL; + + numCols = list_length(parse->partitionClause); + node->prtNumCols = numCols; + if (parse->partitionClause) + { + int keyno = 0; + AttrNumber *prtColIdx = NULL; + ListCell *pl; + + prtColIdx = (AttrNumber *) palloc(sizeof(AttrNumber) * numCols); + + foreach(pl, parse->partitionClause) + { + PartitionClause *prtcl = (PartitionClause *) lfirst(pl); + Node *prtexpr = get_sortgroupclause_expr(prtcl, sub_tlist); + TargetEntry *te = NULL; + ListCell *l; + + foreach(l, sub_tlist) + { + te = (TargetEntry *) lfirst(l); + if (equal(prtexpr, te->expr)) + break; + } + + Assert(te); + prtColIdx[keyno++] = te->resno; + } + node->prtColIdx = prtColIdx; + node->prtOperators = prtOperators; + } + + numCols = list_length(parse->orderClause); + node->ordNumCols = numCols; + if (parse->orderClause) + { + int keyno = 0; + AttrNumber *ordColIdx = NULL; + ListCell *ol; + + ordColIdx = (AttrNumber *) palloc(sizeof(AttrNumber) * numCols); + foreach(ol, parse->orderClause) + { + OrderClause *ordcl = (OrderClause *) lfirst(ol); + Node *ordexpr = get_sortgroupclause_expr(ordcl, sub_tlist); + TargetEntry *te = NULL; + ListCell *l; + + foreach(l, sub_tlist) + { + te = (TargetEntry *) lfirst(l); + if (equal(ordexpr, te->expr)) + break; + } + + Assert(te); + ordColIdx[keyno++] = te->resno; + } + /* orderClause columns will be used as "key columns". */ + node->ordColIdx = ordColIdx; + node->ordOperators = ordOperators; + } + + /* + * Currently, the parser doesn't accept frame clause. + * This is for future implementation. + */ + node->preceding_type = FRAME_UNBOUNDED; + node->following_type = (node->ordNumCols > 0 ? + FRAME_CURRENT_RANGE : FRAME_UNBOUNDED); + + node->preceding_rows = 0; + node->following_rows = 0; + node->preceding = NULL; + node->following = NULL; + + node->winref = parse->winref; + + return node; + } /* * make_result *** a/src/backend/optimizer/plan/planagg.c --- b/src/backend/optimizer/plan/planagg.c *************** *** 52,57 **** static ScanDirection match_agg_to_index_col(MinMaxAggInfo *info, --- 52,58 ---- static void make_agg_subplan(PlannerInfo *root, MinMaxAggInfo *info); static Node *replace_aggs_with_params_mutator(Node *node, List **context); static Oid fetch_agg_sort_op(Oid aggfnoid); + static bool find_aggref_walker(Node *node, Aggref **context); /* *************** *** 625,627 **** fetch_agg_sort_op(Oid aggfnoid) --- 626,652 ---- return aggsortop; } + + Aggref * + find_aggref(Node *node) + { + Aggref *context = NULL; + + find_aggref_walker(node, &context); + return context; + } + + static bool + find_aggref_walker(Node *node, Aggref **context) + { + if (node == NULL) + return false; + + if (IsA(node, Aggref)) + { + *context = (Aggref *) node; + return true; + } + + return expression_tree_walker(node, find_aggref_walker, (void *) context); + } *** a/src/backend/optimizer/plan/planmain.c --- b/src/backend/optimizer/plan/planmain.c *************** *** 233,238 **** query_planner(PlannerInfo *root, List *tlist, --- 233,239 ---- */ root->query_pathkeys = canonicalize_pathkeys(root, root->query_pathkeys); root->group_pathkeys = canonicalize_pathkeys(root, root->group_pathkeys); + root->window_pathkeys = canonicalize_pathkeys(root, root->window_pathkeys); root->distinct_pathkeys = canonicalize_pathkeys(root, root->distinct_pathkeys); root->sort_pathkeys = canonicalize_pathkeys(root, root->sort_pathkeys); *** a/src/backend/optimizer/plan/planner.c --- b/src/backend/optimizer/plan/planner.c *************** *** 22,27 **** --- 22,28 ---- #include "executor/nodeAgg.h" #include "miscadmin.h" #include "nodes/makefuncs.h" + #include "nodes/nodeFuncs.h" #include "optimizer/clauses.h" #include "optimizer/cost.h" #include "optimizer/pathnode.h" *************** *** 82,87 **** static void locate_grouping_columns(PlannerInfo *root, --- 83,90 ---- List *sub_tlist, AttrNumber *groupColIdx); static List *postprocess_setop_tlist(List *new_tlist, List *orig_tlist); + static List *preprocess_window(List *tlist, Plan *subplan); + static List *window_tlist(List *tlist, Index winref, bool *has_win); /***************************************************************************** *************** *** 899,904 **** grouping_planner(PlannerInfo *root, double tuple_fraction) --- 902,948 ---- else root->group_pathkeys = NIL; + /* + * Currently we don't relocate each Window node based on + * cost estimation; it'd be better to think about the order + * of each node execution. But for now we only think about + * the bottom node pathkeys. This should be fixed. + */ + if (parse->windowList) + { + ListCell *l; + + foreach(l, parse->windowList) + { + List *partition_pathkeys = NIL; + List *order_pathkeys = NIL; + WindowClause *wc = (WindowClause *) lfirst(l); + + if (wc->partitionClause && + grouping_is_sortable(wc->partitionClause)) + partition_pathkeys = + make_pathkeys_for_sortclauses(root, + wc->partitionClause, + tlist, + false); + + if (wc->orderClause && + grouping_is_sortable(wc->orderClause)) + order_pathkeys = + make_pathkeys_for_sortclauses(root, + wc->orderClause, + tlist, + false); + + root->window_pathkeys = list_concat(partition_pathkeys, order_pathkeys); + /* + * Window node may be stacked more than one, but + * what is effective to query_planner() is only the bottom pathkeys. + */ + break; + } + } + if (parse->distinctClause && grouping_is_sortable(parse->distinctClause)) root->distinct_pathkeys = *************** *** 951,956 **** grouping_planner(PlannerInfo *root, double tuple_fraction) --- 995,1002 ---- */ if (root->group_pathkeys) root->query_pathkeys = root->group_pathkeys; + else if (root->window_pathkeys) + root->query_pathkeys = root->window_pathkeys; else if (list_length(root->distinct_pathkeys) > list_length(root->sort_pathkeys)) root->query_pathkeys = root->distinct_pathkeys; *************** *** 1237,1242 **** grouping_planner(PlannerInfo *root, double tuple_fraction) --- 1283,1369 ---- } /* end of if (setOperations) */ /* + * Window nodes are stacked one by one for each window because Window + * functions are evaluated in the appropriate window. Hence, in a window + * level, upper window expressions are replaced by nulls so as to be + * evaluated in the upper Window node. For lower expressions, setrefs + * will replace them to Var nodes. + */ + if (parse->windowList) + { + ListCell *l; + List *window_pathkeys = NIL; + + /* + * If the top-level plan node is one that cannot do expression + * evaluation, we must insert a Result node to project the + * desired tlist. + */ + if (!is_projection_capable_plan(result_plan)) + { + result_plan = (Plan *) make_result(root, + tlist, + NULL, + result_plan); + } + result_plan->targetlist = preprocess_window(tlist, result_plan); + foreach(l, parse->windowList) + { + List *current_tlist; + List *partition_pathkeys = NIL; + List *order_pathkeys = NIL; + WindowClause *wc = (WindowClause *) lfirst(l); + bool has_win = false; + + current_tlist = window_tlist(tlist, wc->winref, &has_win); + if (!has_win) + continue; + + /* + * Currently, Window partitioning is only by Sort. + * So just join partitionClause and orderClause + * to match Grouping. Hashing algorithm will be considered later. + */ + if (wc->partitionClause) + { + partition_pathkeys = make_pathkeys_for_sortclauses(root, + wc->partitionClause, + result_plan->targetlist, + false); + } + + if (wc->orderClause) + { + order_pathkeys = make_pathkeys_for_sortclauses(root, + wc->orderClause, + result_plan->targetlist, + false); + } + + /* + * create Sort node under Window, so PARTITION BY works + */ + window_pathkeys = list_concat(partition_pathkeys, order_pathkeys); + window_pathkeys = canonicalize_pathkeys(root, window_pathkeys); + if (!pathkeys_contained_in(window_pathkeys, current_pathkeys)) + { + result_plan = (Plan *) make_sort_from_pathkeys(root, + result_plan, + window_pathkeys, + -1); + current_pathkeys = window_pathkeys; + } + + result_plan = (Plan *) make_window(root, + current_tlist, + wc, + extract_grouping_ops(wc->partitionClause), + extract_grouping_ops(wc->orderClause), + result_plan); + } + } + + /* * If there is a DISTINCT clause, add the necessary node(s). */ if (parse->distinctClause) *************** *** 2039,2045 **** make_subplanTargetList(PlannerInfo *root, * If we're not grouping or aggregating, there's nothing to do here; * query_planner should receive the unmodified target list. */ ! if (!parse->hasAggs && !parse->groupClause && !root->hasHavingQual) { *need_tlist_eval = true; return tlist; --- 2166,2172 ---- * If we're not grouping or aggregating, there's nothing to do here; * query_planner should receive the unmodified target list. */ ! if (!parse->hasAggs && !parse->groupClause && !root->hasHavingQual && !parse->hasWindow) { *need_tlist_eval = true; return tlist; *************** *** 2199,2201 **** postprocess_setop_tlist(List *new_tlist, List *orig_tlist) --- 2326,2513 ---- elog(ERROR, "resjunk output columns are not implemented"); return new_tlist; } + + /* + * preprocess_window - + * given parser tlist, returns recomposed tlist for current top plan. + * + * Before create Window nodes, window expressions are removed from current + * tlist because current plan must not be a Window node. These expressions + * are evaluated in appropriate window later. + * + * There are two main cases to be considered. + * 1. Agg/Group + * Vars from scan node required by any of expression should have been pulled + * up to now. So there's no need to consider it. + * + * 2. Other Scan + * The situation resembles to the one in Agg/Group. Var expressions are pulled + * (tlist is flattened), and other evaluation expressions but window expression + * are as well, since in Window nodes we take care of only window expression. + * + * common in both + * WFunc args are also pulled and appended to the subplan. The window functions + * assume their arguments must be Var or Const, retrieved from outer plan. + */ + static List * + preprocess_window(List *tlist, Plan *subplan) + { + ListCell *l; + List *pulled_exprs = NIL; + List *output_targetlist = NIL; + AttrNumber resno, + tlist_resno; + + /* + * Agg/Group has already flatten its tlist. Only other nodes + * must consider pulling vars. + */ + if (!(IsA(subplan, Agg) || IsA(subplan, Group))) + { + /* + * copyObject() is required, as in tlist = lappend(tlist, tle); + * Without this, it may fall into an infinte loop. + */ + output_targetlist = copyObject(flatten_tlist(tlist)); + tlist_resno = list_length(tlist); + + foreach(l, output_targetlist) + { + TargetEntry *tle = (TargetEntry *) lfirst(l); + TargetEntry *member; + Var *var = (Var *) tle->expr; + + /* + * tlist must have full set of vars required + * by any window functions up to the top Window node. + */ + member = tlist_member((Node *) var, tlist); + if (!member) + { + tlist_resno++; + tle = makeTargetEntry(copyObject(var), + tlist_resno, + NULL, + true); + tlist = lappend(tlist, tle); + } + else + { + /* + * It is necessary since flatten_tlist() doesn't pull + * their resosortgroupref. + */ + tle->ressortgroupref = member->ressortgroupref; + } + } + } + + resno = list_length(output_targetlist); + foreach(l, tlist) + { + TargetEntry *tle = (TargetEntry *) lfirst(l); + TargetEntry *newtle; + WFunc *wfunc; + + /* + * WFuncs doesn't work in the current top plan. So + * we replace these nodes to NullConst but save their + * args in order that WFuncs argument evaluations contain + * only Var and Const. + */ + wfunc = (WFunc *) find_wfunc((Node *) tle->expr); + if (wfunc) + { + pulled_exprs = list_concat(pulled_exprs, wfunc->args); + tle = flatCopyTargetEntry(tle); + tle->expr = (Expr *) makeNullConst(exprType((Node *) tle->expr), + exprTypmod((Node *) tle->expr)); + } + + if (tlist_member((Node *) tle->expr, output_targetlist)) + continue; + + /* + * If an entry isn't in flatten tlist nor window expression, + * it must be evaluated here before any of Window nodes. + */ + newtle = flatCopyTargetEntry(tle); + resno++; + newtle->resno = resno; + output_targetlist = lappend(output_targetlist, newtle); + } + + tlist_resno = list_length(tlist); + /* + * finally pulled arguments are appended to the current tlist so that + * each window function can take Var or Const arguments. + */ + foreach(l, pulled_exprs) + { + TargetEntry *tle; + Expr *expr = (Expr *) lfirst(l); + + resno++; + tle = makeTargetEntry(expr, + resno, + NULL, + true); + output_targetlist = lappend(output_targetlist, tle); + + /* + * We also need the entry in the middle windows, + * so that upper window can take them as arguments. + */ + if (!tlist_member((Node *) tle->expr, tlist)) + { + tle = flatCopyTargetEntry(tle); + tlist_resno++; + tle->resno = tlist_resno; + tlist = lappend(tlist, tle); + } + } + + return output_targetlist; + } + + /* + * window_tlist - + * + * creates tlist suitable for the current window, indicated by winref. + * For the upper window expressions than current, they are relpaced + * by NullConst, so that setrefs can understand where the references + * may stop. + * With window clause syntax, there may be a window in which none of + * window evaluations is executed. In this case, we can pass by the window. + */ + static List * + window_tlist(List *tlist, Index winref, bool *has_win) + { + List *output_targetlist = NIL; + ListCell *l; + + foreach(l, tlist) + { + TargetEntry *tle = (TargetEntry *) lfirst(l); + WFunc *wfunc; + + tle = flatCopyTargetEntry(tle); + wfunc = (WFunc *) find_wfunc_greater((Node *) tle->expr); + if (wfunc && winref == wfunc->winref) + *has_win = true; + + /* + * window that contains evaluation on upper than current window is set null. + * for the lower ones, setrefs will fix them to Vars pointing to OUTER. + */ + if (wfunc && winref < wfunc->winref) + { + tle->expr = (Expr *) makeNullConst(exprType((Node *) tle->expr), + exprTypmod((Node *) tle->expr)); + } + + output_targetlist = lappend(output_targetlist, tle); + } + + return output_targetlist; + } *** a/src/backend/optimizer/plan/setrefs.c --- b/src/backend/optimizer/plan/setrefs.c *************** *** 416,421 **** set_plan_refs(PlannerGlobal *glob, Plan *plan, int rtoffset) --- 416,422 ---- break; case T_Agg: case T_Group: + case T_Window: set_upper_references(glob, plan, rtoffset); break; case T_Result: *** a/src/backend/optimizer/plan/subselect.c --- b/src/backend/optimizer/plan/subselect.c *************** *** 1954,1959 **** finalize_plan(PlannerInfo *root, Plan *plan, Bitmapset *valid_params) --- 1954,1960 ---- case T_Unique: case T_SetOp: case T_Group: + case T_Window: break; default: *** a/src/backend/optimizer/prep/prepjointree.c --- b/src/backend/optimizer/prep/prepjointree.c *************** *** 937,942 **** is_simple_subquery(Query *subquery) --- 937,943 ---- * limiting, or WITH. (XXX WITH could possibly be allowed later) */ if (subquery->hasAggs || + subquery->hasWindow || subquery->groupClause || subquery->havingQual || subquery->sortClause || *** a/src/backend/optimizer/util/clauses.c --- b/src/backend/optimizer/util/clauses.c *************** *** 780,785 **** contain_volatile_functions_walker(Node *node, void *context) --- 780,793 ---- return true; /* else fall through to check args */ } + else if (IsA(node, WFunc)) + { + WFunc *winagg = (WFunc *) node; + + if (func_volatile(winagg->winfnoid) == PROVOLATILE_VOLATILE) + return true; + /* else fall through to check args */ + } else if (IsA(node, OpExpr)) { OpExpr *expr = (OpExpr *) node; *** a/src/backend/optimizer/util/tlist.c --- b/src/backend/optimizer/util/tlist.c *************** *** 366,368 **** grouping_is_hashable(List *groupClause) --- 366,369 ---- } return true; } + *** a/src/backend/optimizer/util/var.c --- b/src/backend/optimizer/util/var.c *************** *** 65,70 **** typedef struct --- 65,76 ---- int sublevels_up; } flatten_join_alias_vars_context; + typedef struct + { + Node *node; + int prefer; + } find_wfunc_context; + static bool pull_varnos_walker(Node *node, pull_varnos_context *context); static bool pull_varattnos_walker(Node *node, Bitmapset **varattnos); *************** *** 81,86 **** static bool pull_var_clause_walker(Node *node, --- 87,94 ---- static Node *flatten_join_alias_vars_mutator(Node *node, flatten_join_alias_vars_context *context); static Relids alias_relid_set(PlannerInfo *root, Relids relids); + static Node *find_wfunc_inner(Node *node, int prefer); + static bool find_wfunc_walker(Node *node, find_wfunc_context *context); /* *************** *** 848,850 **** alias_relid_set(PlannerInfo *root, Relids relids) --- 856,937 ---- bms_free(tmprelids); return result; } + + /* + * Since a target entry may contain more than one window function, + * caller can specify which window function is demanded. + */ + Node * + find_wfunc_greater(Node *node) + { + return find_wfunc_inner(node, 1); + } + + Node * + find_wfunc_lesser(Node *node) + { + return find_wfunc_inner(node, -1); + } + + Node * + find_wfunc(Node *node) + { + return find_wfunc_inner(node, 0); + } + + /* + * find_wfunc - + * find window function node in the given TargetEntry. + * parameter prefer means caller prefers greater winref in > 0 and + * lesser winref in < 0. prefer 0 means no care. + */ + static Node * + find_wfunc_inner(Node *node, int prefer) + { + find_wfunc_context context; + + context.node = NULL; + context.prefer = prefer; + find_wfunc_walker(node, &context); + + return context.node; + } + + /* + * find_wfunc_walker - + */ + static bool + find_wfunc_walker(Node *node, find_wfunc_context *context) + { + if (node == NULL) + return false; + + if (IsA(node, WFunc)) + { + if (context->node) + { + if (context->prefer > 0) + { + if (((WFunc *) context->node)->winref < ((WFunc *) node)->winref) + context->node = node; + } + else + { + /* context->node != NULL && context->prefer == 0 doesn't make sense */ + if (((WFunc *) context->node)->winref > ((WFunc *) node)->winref) + context->node = node; + } + } + else + context->node = node; + + /* + * no prefer, which means no need to search others. + * let's return the first found node. + */ + if (context->prefer == 0) + return true; + } + + return expression_tree_walker(node, find_wfunc_walker, (void *) context); + } *** a/src/backend/parser/analyze.c --- b/src/backend/parser/analyze.c *************** *** 779,784 **** transformSelectStmt(ParseState *pstate, SelectStmt *stmt) --- 779,789 ---- qry->hasDistinctOn = true; } + pstate->p_windef_list = list_concat(stmt->windowClause, pstate->p_windef_list); + qry->windowList = transformWinDef(pstate, + pstate->p_windef_list, + &qry->targetList); + /* transform LIMIT */ qry->limitOffset = transformLimitClause(pstate, stmt->limitOffset, "OFFSET"); *************** *** 798,806 **** transformSelectStmt(ParseState *pstate, SelectStmt *stmt) --- 803,815 ---- qry->hasSubLinks = pstate->p_hasSubLinks; qry->hasAggs = pstate->p_hasAggs; + qry->hasWindow = pstate->p_hasWindow; if (pstate->p_hasAggs || qry->groupClause || qry->havingQual) parseCheckAggregates(pstate, qry); + if (pstate->p_hasWindow) + parseCheckWindow(pstate, qry); + foreach(l, stmt->lockingClause) { transformLockingClause(pstate, qry, (LockingClause *) lfirst(l)); *************** *** 1564,1571 **** transformUpdateStmt(ParseState *pstate, UpdateStmt *stmt) qry->hasSubLinks = pstate->p_hasSubLinks; /* ! * Top-level aggregates are simply disallowed in UPDATE, per spec. (From ! * an implementation point of view, this is forced because the implicit * ctid reference would otherwise be an ungrouped variable.) */ if (pstate->p_hasAggs) --- 1573,1580 ---- qry->hasSubLinks = pstate->p_hasSubLinks; /* ! * Top-level aggregates nor window are simply disallowed in UPDATE, per spec. ! * (From an implementation point of view, this is forced because the implicit * ctid reference would otherwise be an ungrouped variable.) */ if (pstate->p_hasAggs) *************** *** 1575,1580 **** transformUpdateStmt(ParseState *pstate, UpdateStmt *stmt) --- 1584,1594 ---- parser_errposition(pstate, locate_agg_of_level((Node *) qry, 0)))); + if (pstate->p_hasWindow) + ereport(ERROR, + (errcode(ERRCODE_WINDOWING_ERROR), + errmsg("cannot use window function in UPDATE"))); + /* * Now we are done with SELECT-like processing, and can get on with * transforming the target list to match the UPDATE target columns. *************** *** 1674,1679 **** transformReturningList(ParseState *pstate, List *returningList) --- 1688,1699 ---- parser_errposition(pstate, locate_agg_of_level((Node *) rlist, 0)))); + /* window functions not allowed in returning clause */ + if (pstate->p_hasWindow) + ereport(ERROR, + (errcode(ERRCODE_WINDOWING_ERROR), + errmsg("cannot use window function in RETURNING"))); + /* no new relation references please */ if (list_length(pstate->p_rtable) != length_rtable) { *** a/src/backend/parser/gram.y --- b/src/backend/parser/gram.y *************** *** 385,390 **** static TypeName *TableFuncTypeName(List *columns); --- 385,395 ---- %type with_clause %type cte_list + %type partition_clause opt_partition_clause window_clause window_definition_list + %type window_definition window_specification over_clause + /* since these are not implemented, types may be unmatched actually. */ + %type opt_frame_clause frame_clause frame_extent + %type frame_bound_const frame_bound opt_frame_exclusion /* * If you make any token changes, update the keyword table in *************** *** 414,423 **** static TypeName *TableFuncTypeName(List *columns); DEFERRABLE DEFERRED DEFINER DELETE_P DELIMITER DELIMITERS DESC DICTIONARY DISABLE_P DISCARD DISTINCT DO DOCUMENT_P DOMAIN_P DOUBLE_P DROP ! EACH ELSE ENABLE_P ENCODING ENCRYPTED END_P ENUM_P ESCAPE EXCEPT EXCLUDING ! EXCLUSIVE EXECUTE EXISTS EXPLAIN EXTERNAL EXTRACT ! FALSE_P FAMILY FETCH FIRST_P FLOAT_P FOR FORCE FOREIGN FORWARD FREEZE FROM FULL FUNCTION GLOBAL GRANT GRANTED GREATEST GROUP_P --- 419,428 ---- DEFERRABLE DEFERRED DEFINER DELETE_P DELIMITER DELIMITERS DESC DICTIONARY DISABLE_P DISCARD DISTINCT DO DOCUMENT_P DOMAIN_P DOUBLE_P DROP ! EACH ELSE ENABLE_P ENCODING ENCRYPTED END_P ENUM_P ESCAPE EXCEPT EXCLUDE ! EXCLUDING EXCLUSIVE EXECUTE EXISTS EXPLAIN EXTERNAL EXTRACT ! FALSE_P FAMILY FETCH FIRST_P FLOAT_P FOLLOWING FOR FORCE FOREIGN FORWARD FREEZE FROM FULL FUNCTION GLOBAL GRANT GRANTED GREATEST GROUP_P *************** *** 444,458 **** static TypeName *TableFuncTypeName(List *columns); NOT NOTHING NOTIFY NOTNULL NOWAIT NULL_P NULLIF NULLS_P NUMERIC OBJECT_P OF OFF OFFSET OIDS OLD ON ONLY OPERATOR OPTION OR ! ORDER OUT_P OUTER_P OVERLAPS OVERLAY OWNED OWNER ! PARSER PARTIAL PASSWORD PLACING PLANS POSITION ! PRECISION PRESERVE PREPARE PREPARED PRIMARY PRIOR PRIVILEGES PROCEDURAL PROCEDURE QUOTE ! READ REAL REASSIGN RECHECK RECURSIVE REFERENCES REINDEX RELATIVE_P RELEASE RENAME REPEATABLE REPLACE REPLICA RESET RESTART RESTRICT RETURNING RETURNS REVOKE RIGHT ROLE ROLLBACK ROW ROWS RULE --- 449,463 ---- NOT NOTHING NOTIFY NOTNULL NOWAIT NULL_P NULLIF NULLS_P NUMERIC OBJECT_P OF OFF OFFSET OIDS OLD ON ONLY OPERATOR OPTION OR ! ORDER OTHERS OUT_P OUTER_P OVER OVERLAPS OVERLAY OWNED OWNER ! PARSER PARTIAL PARTITION PASSWORD PLACING PLANS POSITION ! PRECEDING PRECISION PRESERVE PREPARE PREPARED PRIMARY PRIOR PRIVILEGES PROCEDURAL PROCEDURE QUOTE ! RANGE READ REAL REASSIGN RECHECK RECURSIVE REFERENCES REINDEX RELATIVE_P RELEASE RENAME REPEATABLE REPLACE REPLICA RESET RESTART RESTRICT RETURNING RETURNS REVOKE RIGHT ROLE ROLLBACK ROW ROWS RULE *************** *** 462,478 **** static TypeName *TableFuncTypeName(List *columns); STATISTICS STDIN STDOUT STORAGE STRICT_P STRIP_P SUBSTRING SUPERUSER_P SYMMETRIC SYSID SYSTEM_P ! TABLE TABLESPACE TEMP TEMPLATE TEMPORARY TEXT_P THEN TIME TIMESTAMP TO TRAILING TRANSACTION TREAT TRIGGER TRIM TRUE_P TRUNCATE TRUSTED TYPE_P ! UNCOMMITTED UNENCRYPTED UNION UNIQUE UNKNOWN UNLISTEN UNTIL UPDATE USER USING VACUUM VALID VALIDATOR VALUE_P VALUES VARCHAR VARIADIC VARYING VERBOSE VERSION_P VIEW VOLATILE ! WHEN WHERE WHITESPACE_P WITH WITHOUT WORK WRITE XML_P XMLATTRIBUTES XMLCONCAT XMLELEMENT XMLFOREST XMLPARSE XMLPI XMLROOT XMLSERIALIZE --- 467,483 ---- STATISTICS STDIN STDOUT STORAGE STRICT_P STRIP_P SUBSTRING SUPERUSER_P SYMMETRIC SYSID SYSTEM_P ! TABLE TABLESPACE TEMP TEMPLATE TEMPORARY TEXT_P THEN TIES TIME TIMESTAMP TO TRAILING TRANSACTION TREAT TRIGGER TRIM TRUE_P TRUNCATE TRUSTED TYPE_P ! UNBOUNDED UNCOMMITTED UNENCRYPTED UNION UNIQUE UNKNOWN UNLISTEN UNTIL UPDATE USER USING VACUUM VALID VALIDATOR VALUE_P VALUES VARCHAR VARIADIC VARYING VERBOSE VERSION_P VIEW VOLATILE ! WHEN WHERE WHITESPACE_P WINDOW WITH WITHOUT WORK WRITE XML_P XMLATTRIBUTES XMLCONCAT XMLELEMENT XMLFOREST XMLPARSE XMLPI XMLROOT XMLSERIALIZE *************** *** 506,512 **** static TypeName *TableFuncTypeName(List *columns); %nonassoc BETWEEN %nonassoc IN_P %left POSTFIXOP /* dummy for postfix Op rules */ ! %nonassoc IDENT /* to support target_el without AS */ %left Op OPERATOR /* multi-character ops and user-defined operators */ %nonassoc NOTNULL %nonassoc ISNULL --- 511,521 ---- %nonassoc BETWEEN %nonassoc IN_P %left POSTFIXOP /* dummy for postfix Op rules */ ! /* ! * - to support target_el without AS (IDENT) ! * - to support frame_clause starting ROWS/RANGE ! */ ! %nonassoc IDENT ROWS RANGE %left Op OPERATOR /* multi-character ops and user-defined operators */ %nonassoc NOTNULL %nonassoc ISNULL *************** *** 6424,6430 **** select_clause: simple_select: SELECT opt_distinct target_list into_clause from_clause where_clause ! group_clause having_clause { SelectStmt *n = makeNode(SelectStmt); n->distinctClause = $2; --- 6433,6439 ---- simple_select: SELECT opt_distinct target_list into_clause from_clause where_clause ! group_clause having_clause window_clause { SelectStmt *n = makeNode(SelectStmt); n->distinctClause = $2; *************** *** 6434,6439 **** simple_select: --- 6443,6449 ---- n->whereClause = $6; n->groupClause = $7; n->havingClause = $8; + n->windowClause = $9; $$ = (Node *)n; } | values_clause { $$ = $1; } *************** *** 8202,8208 **** c_expr: columnref { $$ = $1; } * (Note that many of the special SQL functions wouldn't actually make any * sense as functional index entries, but we ignore that consideration here.) */ ! func_expr: func_name '(' ')' { FuncCall *n = makeNode(FuncCall); n->funcname = $1; --- 8212,8218 ---- * (Note that many of the special SQL functions wouldn't actually make any * sense as functional index entries, but we ignore that consideration here.) */ ! func_expr: func_name '(' ')' over_clause { FuncCall *n = makeNode(FuncCall); n->funcname = $1; *************** *** 8210,8219 **** func_expr: func_name '(' ')' n->agg_star = FALSE; n->agg_distinct = FALSE; n->func_variadic = FALSE; n->location = @1; $$ = (Node *)n; } ! | func_name '(' expr_list ')' { FuncCall *n = makeNode(FuncCall); n->funcname = $1; --- 8220,8230 ---- n->agg_star = FALSE; n->agg_distinct = FALSE; n->func_variadic = FALSE; + n->win_definition = (WinDef *) $4; n->location = @1; $$ = (Node *)n; } ! | func_name '(' expr_list ')' over_clause { FuncCall *n = makeNode(FuncCall); n->funcname = $1; *************** *** 8221,8230 **** func_expr: func_name '(' ')' n->agg_star = FALSE; n->agg_distinct = FALSE; n->func_variadic = FALSE; n->location = @1; $$ = (Node *)n; } ! | func_name '(' VARIADIC a_expr ')' { FuncCall *n = makeNode(FuncCall); n->funcname = $1; --- 8232,8242 ---- n->agg_star = FALSE; n->agg_distinct = FALSE; n->func_variadic = FALSE; + n->win_definition = (WinDef *) $5; n->location = @1; $$ = (Node *)n; } ! | func_name '(' VARIADIC a_expr ')' /* intentionally not accept over_clause */ { FuncCall *n = makeNode(FuncCall); n->funcname = $1; *************** *** 8232,8241 **** func_expr: func_name '(' ')' n->agg_star = FALSE; n->agg_distinct = FALSE; n->func_variadic = TRUE; n->location = @1; $$ = (Node *)n; } ! | func_name '(' expr_list ',' VARIADIC a_expr ')' { FuncCall *n = makeNode(FuncCall); n->funcname = $1; --- 8244,8254 ---- n->agg_star = FALSE; n->agg_distinct = FALSE; n->func_variadic = TRUE; + n->win_definition = NULL; n->location = @1; $$ = (Node *)n; } ! | func_name '(' expr_list ',' VARIADIC a_expr ')' /* intentionally not accept over_clause */ { FuncCall *n = makeNode(FuncCall); n->funcname = $1; *************** *** 8243,8252 **** func_expr: func_name '(' ')' n->agg_star = FALSE; n->agg_distinct = FALSE; n->func_variadic = TRUE; n->location = @1; $$ = (Node *)n; } ! | func_name '(' ALL expr_list ')' { FuncCall *n = makeNode(FuncCall); n->funcname = $1; --- 8256,8266 ---- n->agg_star = FALSE; n->agg_distinct = FALSE; n->func_variadic = TRUE; + n->win_definition = NULL; n->location = @1; $$ = (Node *)n; } ! | func_name '(' ALL expr_list ')' /* intentionally not accept over_clause */ { FuncCall *n = makeNode(FuncCall); n->funcname = $1; *************** *** 8258,8267 **** func_expr: func_name '(' ')' * for that in FuncCall at the moment. */ n->func_variadic = FALSE; n->location = @1; $$ = (Node *)n; } ! | func_name '(' DISTINCT expr_list ')' { FuncCall *n = makeNode(FuncCall); n->funcname = $1; --- 8272,8282 ---- * for that in FuncCall at the moment. */ n->func_variadic = FALSE; + n->win_definition = NULL; n->location = @1; $$ = (Node *)n; } ! | func_name '(' DISTINCT expr_list ')' /* intentionally not accept over_clause */ { FuncCall *n = makeNode(FuncCall); n->funcname = $1; *************** *** 8269,8278 **** func_expr: func_name '(' ')' n->agg_star = FALSE; n->agg_distinct = TRUE; n->func_variadic = FALSE; n->location = @1; $$ = (Node *)n; } ! | func_name '(' '*' ')' { /* * We consider AGGREGATE(*) to invoke a parameterless --- 8284,8294 ---- n->agg_star = FALSE; n->agg_distinct = TRUE; n->func_variadic = FALSE; + n->win_definition = NULL; n->location = @1; $$ = (Node *)n; } ! | func_name '(' '*' ')' over_clause { /* * We consider AGGREGATE(*) to invoke a parameterless *************** *** 8290,8295 **** func_expr: func_name '(' ')' --- 8306,8312 ---- n->agg_star = TRUE; n->agg_distinct = FALSE; n->func_variadic = FALSE; + n->win_definition = (WinDef *) $5; n->location = @1; $$ = (Node *)n; } *************** *** 8737,8742 **** xml_whitespace_option: PRESERVE WHITESPACE_P { $$ = TRUE; } --- 8754,8908 ---- ; /* + * Window Definitions + * + * In SQL2003 window may appear after a function call or after HAVING, with the same syntax. + * If there is window syntax after HAVING, some of the windows can refer to it. + */ + over_clause: OVER window_specification { $$ = $2; } + | OVER IDENT + { + WinDef *n = makeNode(WinDef); + n->partitionClause = NIL; + n->orderClause = NIL; + n->wfunc = NULL; + n->name = pstrdup($2); + n->location = @1; + $$ = (Node *) n; + } + | /*EMPTY*/ { $$ = NULL; } + ; + window_specification: '(' opt_partition_clause opt_sort_clause opt_frame_clause')' + { + WinDef *n = makeNode(WinDef); + n->partitionClause = $2; + n->orderClause = $3; + n->wfunc = NULL; + n->name = NULL; + n->location = @1; + $$ = (Node *) n; + } + ; + + opt_partition_clause: + partition_clause { $$ = $1; } + | /*EMPTY*/ { $$ = NIL; } + ; + partition_clause: PARTITION BY expr_list + { + $$ = $3; + } + ; + + /* Frame clause is not supported but we must recognize its grammar to report errors */ + opt_frame_clause: + frame_clause + { + /* Frame clause should be called "FRAME clause" or "ROWS/RANGE clause"? */ + ereport(ERROR, + (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), + errmsg("ROWS/RANGE clause of window functions not yet implemented"))); + } + | /*EMPTY*/ { $$ = NULL; } + ; + frame_clause: + ROWS frame_extent opt_frame_exclusion + { + $$ = NULL; + } + | RANGE frame_extent opt_frame_exclusion + { + $$ = NULL; + } + ; + /* + * frame_bound_const is redundant but needed instead of AexprConst, since + * AexprConst contains func_name which is sometimes BETWEEN + * and if so shift/reduce conflict occurs. But frame_bound doesn't + * need func_name. + */ + frame_extent: + UNBOUNDED PRECEDING { $$ = NULL; } + | CURRENT_P ROW { $$ = NULL; } + | frame_bound_const PRECEDING { $$ = NULL; } + | BETWEEN frame_bound AND frame_bound { $$ = NULL; } + ; + frame_bound: + UNBOUNDED PRECEDING { $$ = NULL; } + | CURRENT_P ROW { $$ = NULL; } + | frame_bound_const PRECEDING { $$ = NULL; } + | UNBOUNDED FOLLOWING { $$ = NULL; } + | frame_bound_const FOLLOWING { $$ = NULL; } + ; + /* + * similar to AexprConst except func_name + */ + frame_bound_const: + Iconst { $$ = makeIntConst($1, @1); } + | FCONST { $$ = makeFloatConst($1, @1); } + | Sconst { $$ = makeStringConst($1, @1); } + | BCONST { $$ = makeBitStringConst($1, @1); } + | XCONST { $$ = makeBitStringConst($1, @1); } + | ConstTypename Sconst { $$ = makeStringConstCast($2, @2, $1); } + | ConstInterval Sconst opt_interval + { + TypeName *t = $1; + t->typmods = $3; + $$ = makeStringConstCast($2, @2, t); + } + | ConstInterval '(' Iconst ')' Sconst opt_interval + { + TypeName *t = $1; + if ($6 != NIL) + { + if (list_length($6) != 1) + ereport(ERROR, + (errcode(ERRCODE_SYNTAX_ERROR), + errmsg("interval precision specified twice"), + scanner_errposition(@1))); + t->typmods = lappend($6, makeIntConst($3, @3)); + } + else + t->typmods = list_make2(makeIntConst(INTERVAL_FULL_RANGE, -1), + makeIntConst($3, @3)); + $$ = makeStringConstCast($5, @5, t); + } + | TRUE_P { $$ = makeBoolAConst(TRUE, @1); } + | FALSE_P { $$ = makeBoolAConst(FALSE, @1); } + | NULL_P { $$ = makeNullAConst(@1); } + ; + opt_frame_exclusion: + EXCLUDE CURRENT_P ROW { $$ = NULL; } + | EXCLUDE GROUP_P { $$ = NULL; } + | EXCLUDE TIES { $$ = NULL; } + | EXCLUDE NO OTHERS { $$ = NULL; } + | /*EMPTY*/ { $$ = NULL; } + ; + + window_clause: + WINDOW window_definition_list { $$ = $2; } + | /*EMPTY*/ { $$ = NIL; } + ; + window_definition_list: + window_definition + { + $$ = list_make1($1); + } + | window_definition_list ',' window_definition + { + $$ = lappend($1, $3); + } + ; + window_definition: + IDENT AS window_specification + { + WinDef *n = (WinDef *) $3; + n->name = pstrdup($1); + $$ = (Node *) n; + } + ; + + /* * Supporting nonterminals for expressions. */ *************** *** 9463,9468 **** unreserved_keyword: --- 9629,9635 ---- | ENCRYPTED | ENUM_P | ESCAPE + | EXCLUDE | EXCLUDING | EXCLUSIVE | EXECUTE *************** *** 9470,9475 **** unreserved_keyword: --- 9637,9643 ---- | EXTERNAL | FAMILY | FIRST_P + | FOLLOWING | FORCE | FORWARD | FUNCTION *************** *** 9535,9546 **** unreserved_keyword: --- 9703,9717 ---- | OIDS | OPERATOR | OPTION + | OTHERS | OWNED | OWNER | PARSER | PARTIAL | PASSWORD | PLANS + | PARTITION + | PRECEDING | PREPARE | PREPARED | PRESERVE *************** *** 9549,9554 **** unreserved_keyword: --- 9720,9726 ---- | PROCEDURAL | PROCEDURE | QUOTE + | RANGE | READ | REASSIGN | RECHECK *************** *** 9600,9610 **** unreserved_keyword: --- 9772,9784 ---- | TEMPLATE | TEMPORARY | TEXT_P + | TIES | TRANSACTION | TRIGGER | TRUNCATE | TRUSTED | TYPE_P + | UNBOUNDED | UNCOMMITTED | UNENCRYPTED | UNKNOWN *************** *** 9785,9790 **** reserved_keyword: --- 9959,9965 ---- | ONLY | OR | ORDER + | OVER | PLACING | PRIMARY | REFERENCES *************** *** 9805,9810 **** reserved_keyword: --- 9980,9986 ---- | VARIADIC | WHEN | WHERE + | WINDOW | WITH ; *** a/src/backend/parser/keywords.c --- b/src/backend/parser/keywords.c *************** *** 160,165 **** const ScanKeyword ScanKeywords[] = { --- 160,166 ---- {"enum", ENUM_P, UNRESERVED_KEYWORD}, {"escape", ESCAPE, UNRESERVED_KEYWORD}, {"except", EXCEPT, RESERVED_KEYWORD}, + {"exclude", EXCLUDE, UNRESERVED_KEYWORD}, {"excluding", EXCLUDING, UNRESERVED_KEYWORD}, {"exclusive", EXCLUSIVE, UNRESERVED_KEYWORD}, {"execute", EXECUTE, UNRESERVED_KEYWORD}, *************** *** 172,177 **** const ScanKeyword ScanKeywords[] = { --- 173,179 ---- {"fetch", FETCH, RESERVED_KEYWORD}, {"first", FIRST_P, UNRESERVED_KEYWORD}, {"float", FLOAT_P, COL_NAME_KEYWORD}, + {"following", FOLLOWING, UNRESERVED_KEYWORD}, {"for", FOR, RESERVED_KEYWORD}, {"force", FORCE, UNRESERVED_KEYWORD}, {"foreign", FOREIGN, RESERVED_KEYWORD}, *************** *** 283,300 **** const ScanKeyword ScanKeywords[] = { --- 285,306 ---- {"option", OPTION, UNRESERVED_KEYWORD}, {"or", OR, RESERVED_KEYWORD}, {"order", ORDER, RESERVED_KEYWORD}, + {"others", OTHERS, UNRESERVED_KEYWORD}, {"out", OUT_P, COL_NAME_KEYWORD}, {"outer", OUTER_P, TYPE_FUNC_NAME_KEYWORD}, + {"over", OVER, RESERVED_KEYWORD}, {"overlaps", OVERLAPS, TYPE_FUNC_NAME_KEYWORD}, {"overlay", OVERLAY, COL_NAME_KEYWORD}, {"owned", OWNED, UNRESERVED_KEYWORD}, {"owner", OWNER, UNRESERVED_KEYWORD}, {"parser", PARSER, UNRESERVED_KEYWORD}, {"partial", PARTIAL, UNRESERVED_KEYWORD}, + {"partition", PARTITION, UNRESERVED_KEYWORD}, {"password", PASSWORD, UNRESERVED_KEYWORD}, {"placing", PLACING, RESERVED_KEYWORD}, {"plans", PLANS, UNRESERVED_KEYWORD}, {"position", POSITION, COL_NAME_KEYWORD}, + {"preceding", PRECEDING, UNRESERVED_KEYWORD}, {"precision", PRECISION, COL_NAME_KEYWORD}, {"prepare", PREPARE, UNRESERVED_KEYWORD}, {"prepared", PREPARED, UNRESERVED_KEYWORD}, *************** *** 305,310 **** const ScanKeyword ScanKeywords[] = { --- 311,317 ---- {"procedural", PROCEDURAL, UNRESERVED_KEYWORD}, {"procedure", PROCEDURE, UNRESERVED_KEYWORD}, {"quote", QUOTE, UNRESERVED_KEYWORD}, + {"range", RANGE, UNRESERVED_KEYWORD}, {"read", READ, UNRESERVED_KEYWORD}, {"real", REAL, COL_NAME_KEYWORD}, {"reassign", REASSIGN, UNRESERVED_KEYWORD}, *************** *** 371,376 **** const ScanKeyword ScanKeywords[] = { --- 378,384 ---- {"temporary", TEMPORARY, UNRESERVED_KEYWORD}, {"text", TEXT_P, UNRESERVED_KEYWORD}, {"then", THEN, RESERVED_KEYWORD}, + {"ties", TIES, UNRESERVED_KEYWORD}, {"time", TIME, COL_NAME_KEYWORD}, {"timestamp", TIMESTAMP, COL_NAME_KEYWORD}, {"to", TO, RESERVED_KEYWORD}, *************** *** 383,388 **** const ScanKeyword ScanKeywords[] = { --- 391,397 ---- {"truncate", TRUNCATE, UNRESERVED_KEYWORD}, {"trusted", TRUSTED, UNRESERVED_KEYWORD}, {"type", TYPE_P, UNRESERVED_KEYWORD}, + {"unbounded", UNBOUNDED, UNRESERVED_KEYWORD}, {"uncommitted", UNCOMMITTED, UNRESERVED_KEYWORD}, {"unencrypted", UNENCRYPTED, UNRESERVED_KEYWORD}, {"union", UNION, RESERVED_KEYWORD}, *************** *** 408,413 **** const ScanKeyword ScanKeywords[] = { --- 417,423 ---- {"when", WHEN, RESERVED_KEYWORD}, {"where", WHERE, RESERVED_KEYWORD}, {"whitespace", WHITESPACE_P, UNRESERVED_KEYWORD}, + {"window", WINDOW, RESERVED_KEYWORD}, {"with", WITH, RESERVED_KEYWORD}, {"without", WITHOUT, UNRESERVED_KEYWORD}, {"work", WORK, UNRESERVED_KEYWORD}, *** a/src/backend/parser/parse_agg.c --- b/src/backend/parser/parse_agg.c *************** *** 1,7 **** /*------------------------------------------------------------------------- * * parse_agg.c ! * handle aggregates in parser * * Portions Copyright (c) 1996-2008, PostgreSQL Global Development Group * Portions Copyright (c) 1994, Regents of the University of California --- 1,7 ---- /*------------------------------------------------------------------------- * * parse_agg.c ! * handle aggregates and window functions in parser * * Portions Copyright (c) 1996-2008, PostgreSQL Global Development Group * Portions Copyright (c) 1994, Regents of the University of California *************** *** 73,78 **** transformAggregateCall(ParseState *pstate, Aggref *agg) --- 73,84 ---- errmsg("aggregate function calls cannot be nested"), parser_errposition(pstate, locate_agg_of_level((Node *) agg->args, 0)))); + if (checkExprHasWFuncs((Node *) agg->args)) + ereport(ERROR, + (errcode(ERRCODE_GROUPING_ERROR), + errmsg("aggregate function calls cannot take window functions as arguments"), + parser_errposition(pstate, + locate_agg_of_level((Node *) agg->args, 0)))); } if (min_varlevel < 0) *************** *** 85,90 **** transformAggregateCall(ParseState *pstate, Aggref *agg) --- 91,130 ---- pstate->p_hasAggs = true; } + /* + * transformWindowCall - + * + */ + void + transformWindowCall(ParseState *pstate, WFunc *wfunc) + { + int min_varlevel; + + /* + * The aggregate's level is the same as the level of the lowest-level + * variable or aggregate in its arguments; or if it contains no variables + * at all, we presume it to be local. + */ + min_varlevel = find_minimum_var_level((Node *) wfunc->args); + + /* + * An aggregate can't directly contain another aggregate call of the same + * level (though outer aggs are okay). We can skip this check if we + * didn't find any local vars or aggs. + */ + if (min_varlevel == 0) + { + if (checkExprHasWFuncs((Node *) wfunc->args)) + ereport(ERROR, + (errcode(ERRCODE_WINDOWING_ERROR), + errmsg("window function calls cannot be nested"))); + } + + if (min_varlevel < 0) + min_varlevel = 0; + wfunc->winlevelsup = min_varlevel; + pstate->p_hasWindow = true; + } /* * parseCheckAggregates *************** *** 231,236 **** parseCheckAggregates(ParseState *pstate, Query *qry) --- 271,311 ---- locate_agg_of_level((Node *) qry, 0)))); } + /* + * Window functions must not be in either WHERE, HAVING, or GRUP BY clauses. + */ + void + parseCheckWindow(ParseState *pstate, Query *qry) + { + ListCell *l; + + if (checkExprHasWFuncs(qry->jointree->quals)) + ereport(ERROR, + (errcode(ERRCODE_WINDOWING_ERROR), + errmsg("window functions not allowed in WHERE clause"))); + if (checkExprHasWFuncs((Node *) qry->jointree->fromlist)) + ereport(ERROR, + (errcode(ERRCODE_WINDOWING_ERROR), + errmsg("window functions not allowed in JOIN conditions"))); + if (checkExprHasWFuncs(qry->havingQual)) + ereport(ERROR, + (errcode(ERRCODE_WINDOWING_ERROR), + errmsg("window functions not allowed in HAVING clause"))); + + foreach(l, qry->groupClause) + { + SortGroupClause *grpcl = (SortGroupClause *) lfirst(l); + Node *expr; + + expr = get_sortgroupclause_expr(grpcl, qry->targetList); + if (expr == NULL) + continue; /* probably cannot happen */ + if (checkExprHasWFuncs(expr)) + ereport(ERROR, + (errcode(ERRCODE_WINDOWING_ERROR), + errmsg("window functions not allowed in GROUP BY clause"))); + } + } /* * check_ungrouped_columns - *** a/src/backend/parser/parse_clause.c --- b/src/backend/parser/parse_clause.c *************** *** 40,47 **** #define ORDER_CLAUSE 0 #define GROUP_CLAUSE 1 #define DISTINCT_ON_CLAUSE 2 ! static char *clauseText[] = {"ORDER BY", "GROUP BY", "DISTINCT ON"}; static void extractRemainingColumns(List *common_colnames, List *src_colnames, List *src_colvars, --- 40,49 ---- #define ORDER_CLAUSE 0 #define GROUP_CLAUSE 1 #define DISTINCT_ON_CLAUSE 2 + #define PARTITION_CLAUSE 3 + #define WIN_ORDER_CLAUSE 4 ! static char *clauseText[] = {"ORDER BY", "GROUP BY", "DISTINCT ON", "PARTITION", "ORDER BY"}; static void extractRemainingColumns(List *common_colnames, List *src_colnames, List *src_colvars, *************** *** 68,73 **** static Node *buildMergedJoinVar(ParseState *pstate, JoinType jointype, --- 70,77 ---- Var *l_colvar, Var *r_colvar); static TargetEntry *findTargetlistEntry(ParseState *pstate, Node *node, List **tlist, int clause); + static WindowClause *findWindowClause(List *wclist, const char *name, + List *partitionClause, List *orderClause); static int get_matching_location(int sortgroupref, List *sortgrouprefs, List *exprs); static List *addTargetToSortList(ParseState *pstate, TargetEntry *tle, *************** *** 565,570 **** transformRangeFunction(ParseState *pstate, RangeFunction *r) --- 569,582 ---- locate_agg_of_level(funcexpr, 0)))); } + if (pstate->p_hasWindow) + { + if (checkExprHasWFuncs(funcexpr)) + ereport(ERROR, + (errcode(ERRCODE_WINDOWING_ERROR), + errmsg("cannot use window function in function expression in FROM"))); + } + /* * OK, build an RTE for the function. */ *************** *** 1301,1306 **** findTargetlistEntry(ParseState *pstate, Node *node, List **tlist, int clause) --- 1313,1343 ---- clauseText[clause]), parser_errposition(pstate, location))); + /* + * In ORDER BY clause of a window, simple integer means + * integer data, not target pos. + */ + if (clause == WIN_ORDER_CLAUSE) + { + TargetEntry *tle; + Const *cons; + AttrNumber resno; + + resno = list_length(*tlist) + 1; + cons = makeConst(INT4OID, + -1, + sizeof(int32), + Int32GetDatum(intVal(val)), + false, + true); + tle = makeTargetEntry((Expr *) cons, + resno, + NULL, + true); + *tlist = lappend(*tlist, tle); + return tle; + } + target_pos = intVal(val); foreach(tl, *tlist) { *************** *** 1452,1457 **** transformSortClause(ParseState *pstate, --- 1489,1660 ---- } /* + * transformOrderClause - + * + * OrderClause is a set of SortBys. Only tag type is different. + */ + List * + transformOrderClause(ParseState *pstate, + List *orderlist, + List **targetlist, + bool resolveUnknown) + { + List *result = NIL; + ListCell *l; + TargetEntry *tle; + + foreach(l, orderlist) + { + SortBy *sortby = lfirst(l); + + tle = findTargetlistEntry(pstate, sortby->node, + targetlist, WIN_ORDER_CLAUSE); + result = addTargetToSortList(pstate, tle, + result, *targetlist, + sortby, resolveUnknown); + } + + return result; + } + + /* + * transformPartitionClause - + * + * Almost everything PartitionClause has is the same as GroupClause. + */ + List * + transformPartitionClause(ParseState *pstate, + List *partitionlist, + List **targetlist) + { + List *result = NIL; + ListCell *l; + TargetEntry *tle; + + foreach(l, partitionlist) + { + Oid restype; + Oid sortop; + Oid eqop; + PartitionClause *pc; + + tle = findTargetlistEntry(pstate, lfirst(l), targetlist, PARTITION_CLAUSE); + + restype = exprType((Node *) tle->expr); + + if (restype == UNKNOWNOID) + tle->expr = (Expr *) coerce_type(pstate, (Node *) tle->expr, + restype, TEXTOID, -1, + COERCION_IMPLICIT, + COERCE_IMPLICIT_CAST, + -1); + pc = makeNode(PartitionClause); + pc->tleSortGroupRef = assignSortGroupRef(tle, *targetlist); + get_sort_group_operators(restype, + false, true, false, + &sortop, &eqop, NULL); + pc->eqop = eqop; + pc->sortop = sortop; + pc->nulls_first = false; + result = lappend(result, pc); + } + + return result; + } + + /* + * transformWinDef - + * + * PARTITION BY and ORDER BY clauses are transformed and WFuncs are tied with + * WindowClause here. During this process, WindowClause that has not window function + * (this case would happen when using WINDOW clause) is removed. + */ + List * + transformWinDef(ParseState *pstate, + List *windefinition, + List **targetlist) + { + List *result = NIL, *window_clauses = NIL; + ListCell *l; + Index winref = 1; + + foreach(l, windefinition) + { + WinDef *windef = (WinDef *) lfirst(l); + List *partitionClause = NIL; + List *orderClause = NIL; + WindowClause *wc; + + partitionClause = transformPartitionClause(pstate, + windef->partitionClause, + targetlist); + orderClause = transformOrderClause(pstate, + windef->orderClause, + targetlist, + true); + + /* + * If there is the same node that has been in the list, + * refer to it. + */ + wc = findWindowClause(window_clauses, windef->name, partitionClause, orderClause); + if (!wc) + { + if (windef->name && windef->wfunc) + { + /* + * Though OVER clause uses window name, there's + * no definition in WINDOW clause. + */ + ereport(ERROR, + (errcode(ERRCODE_WRONG_OBJECT_TYPE), + errmsg("window name(%s) is not found in WINDOW clause", + windef->name))); + } + + wc = makeNode(WindowClause); + wc->partitionClause = partitionClause; + wc->orderClause = orderClause; + wc->name = windef->name; + wc->has_wfunc = false; + wc->winref = winref++; + + window_clauses = lappend(window_clauses, wc); + } + + /* + * Tie the function with the appropriate window + */ + if (windef->wfunc) + { + windef->wfunc->winref = wc->winref; + wc->has_wfunc = true; + /* + * we now recognize Window operation will be needed. + */ + pstate->p_hasWindow = true; + } + } + + /* + * filter those which isn't tied with any functions + */ + foreach(l, window_clauses) + { + WindowClause *wc = (WindowClause *) lfirst(l); + + if (wc->has_wfunc) + result = lappend(result, wc); + else + pfree(wc); + } + + list_free(window_clauses); + + return result; + } + + /* * transformDistinctClause - * transform a DISTINCT clause * *************** *** 1919,1921 **** targetIsInSortList(TargetEntry *tle, Oid sortop, List *sortList) --- 2122,2152 ---- } return false; } + + /* + * find_window_clause - + * + * search for the WindowClause which has already been in the list. + * It is done by name or by the set of partition and order. + */ + static WindowClause * + findWindowClause(List *wclist, const char *name, List *partitionClause, List *orderClause) + { + ListCell *l; + + foreach(l, wclist) + { + WindowClause *wc = (WindowClause *) lfirst(l); + if (wc->name && name) + { + if (strcmp(wc->name, name) == 0) + return wc; + } + + if (equal(wc->partitionClause, partitionClause) && + equal(wc->orderClause, orderClause)) + return wc; + } + + return NULL; + } *** a/src/backend/parser/parse_expr.c --- b/src/backend/parser/parse_expr.c *************** *** 361,367 **** transformIndirection(ParseState *pstate, Node *basenode, List *indirection) list_make1(n), list_make1(result), false, false, false, ! true, -1); } } /* process trailing subscripts, if any */ --- 361,367 ---- list_make1(n), list_make1(result), false, false, false, ! true, NULL, -1); } } /* process trailing subscripts, if any */ *************** *** 505,511 **** transformColumnRef(ParseState *pstate, ColumnRef *cref) list_make1(makeString(name2)), list_make1(node), false, false, false, ! true, cref->location); } break; } --- 505,511 ---- list_make1(makeString(name2)), list_make1(node), false, false, false, ! true, NULL, cref->location); } break; } *************** *** 546,552 **** transformColumnRef(ParseState *pstate, ColumnRef *cref) list_make1(makeString(name3)), list_make1(node), false, false, false, ! true, cref->location); } break; } --- 546,552 ---- list_make1(makeString(name3)), list_make1(node), false, false, false, ! true, NULL, cref->location); } break; } *************** *** 601,607 **** transformColumnRef(ParseState *pstate, ColumnRef *cref) list_make1(makeString(name4)), list_make1(node), false, false, false, ! true, cref->location); } break; } --- 601,607 ---- list_make1(makeString(name4)), list_make1(node), false, false, false, ! true, NULL, cref->location); } break; } *************** *** 1109,1114 **** transformFuncCall(ParseState *pstate, FuncCall *fn) --- 1109,1115 ---- fn->agg_distinct, fn->func_variadic, false, + fn->win_definition, fn->location); } *** a/src/backend/parser/parse_func.c --- b/src/backend/parser/parse_func.c *************** *** 15,20 **** --- 15,21 ---- #include "postgres.h" #include "access/heapam.h" + #include "catalog/pg_aggregate.h" #include "catalog/pg_inherits.h" #include "catalog/pg_proc.h" #include "catalog/pg_type.h" *************** *** 63,69 **** static void unknown_attribute(ParseState *pstate, Node *relref, char *attname, Node * ParseFuncOrColumn(ParseState *pstate, List *funcname, List *fargs, bool agg_star, bool agg_distinct, bool func_variadic, ! bool is_column, int location) { Oid rettype; Oid funcid; --- 64,70 ---- Node * ParseFuncOrColumn(ParseState *pstate, List *funcname, List *fargs, bool agg_star, bool agg_distinct, bool func_variadic, ! bool is_column, WinDef *windef, int location) { Oid rettype; Oid funcid; *************** *** 76,81 **** ParseFuncOrColumn(ParseState *pstate, List *funcname, List *fargs, --- 77,84 ---- Node *retval; bool retset; int nvargs; + bool isagg; + bool iswfunc; FuncDetailCode fdresult; /* *************** *** 164,169 **** ParseFuncOrColumn(ParseState *pstate, List *funcname, List *fargs, --- 167,173 ---- fdresult = func_get_detail(funcname, fargs, nargs, actual_arg_types, !func_variadic, &funcid, &rettype, &retset, &nvargs, + &isagg, &iswfunc, &declared_arg_types); if (fdresult == FUNCDETAIL_COERCION) { *************** *** 195,201 **** ParseFuncOrColumn(ParseState *pstate, List *funcname, List *fargs, NameListToString(funcname)), parser_errposition(pstate, location))); } ! else if (fdresult != FUNCDETAIL_AGGREGATE) { /* * Oops. Time to die. --- 199,205 ---- NameListToString(funcname)), parser_errposition(pstate, location))); } ! else if (fdresult != FUNCDETAIL_AGG_OR_WFUNC) { /* * Oops. Time to die. *************** *** 291,330 **** ParseFuncOrColumn(ParseState *pstate, List *funcname, List *fargs, funcexpr->location = location; retval = (Node *) funcexpr; } else { ! /* aggregate function */ ! Aggref *aggref = makeNode(Aggref); ! aggref->aggfnoid = funcid; ! aggref->aggtype = rettype; ! aggref->args = fargs; ! aggref->aggstar = agg_star; ! aggref->aggdistinct = agg_distinct; ! aggref->location = location; ! /* ! * Reject attempt to call a parameterless aggregate without (*) ! * syntax. This is mere pedantry but some folks insisted ... ! */ ! if (fargs == NIL && !agg_star) ! ereport(ERROR, ! (errcode(ERRCODE_WRONG_OBJECT_TYPE), ! errmsg("%s(*) must be used to call a parameterless aggregate function", ! NameListToString(funcname)), ! parser_errposition(pstate, location))); ! /* parse_agg.c does additional aggregate-specific processing */ ! transformAggregateCall(pstate, aggref); ! retval = (Node *) aggref; ! if (retset) ! ereport(ERROR, ! (errcode(ERRCODE_INVALID_FUNCTION_DEFINITION), ! errmsg("aggregates cannot return sets"), ! parser_errposition(pstate, location))); } return retval; --- 295,404 ---- funcexpr->location = location; retval = (Node *) funcexpr; + + if (windef) + ereport(ERROR, + (errcode(ERRCODE_WRONG_OBJECT_TYPE), + errmsg("%s is not a window function nor an aggregate function", + NameListToString(funcname)), + parser_errposition(pstate, location))); } else { ! Assert(fdresult == FUNCDETAIL_AGG_OR_WFUNC); ! if (isagg && !windef) ! { ! /* aggregate function */ ! Aggref *aggref = makeNode(Aggref); ! aggref->aggfnoid = funcid; ! aggref->aggtype = rettype; ! aggref->args = fargs; ! aggref->aggstar = agg_star; ! aggref->aggdistinct = agg_distinct; ! aggref->location = location; ! /* ! * Reject attempt to call a parameterless aggregate without (*) ! * syntax. This is mere pedantry but some folks insisted ... ! */ ! if (fargs == NIL && !agg_star) ! ereport(ERROR, ! (errcode(ERRCODE_WRONG_OBJECT_TYPE), ! errmsg("%s(*) must be used to call a parameterless aggregate function", ! NameListToString(funcname)), ! parser_errposition(pstate, location))); ! ! /* parse_agg.c does additional aggregate-specific processing */ ! transformAggregateCall(pstate, aggref); ! ! retval = (Node *) aggref; ! ! if (retset) ! ereport(ERROR, ! (errcode(ERRCODE_INVALID_FUNCTION_DEFINITION), ! errmsg("aggregates cannot return sets"), ! parser_errposition(pstate, location))); ! } ! else ! { ! /* window function */ ! WFunc *wfunc = makeNode(WFunc); ! ! /* ! * Window functions must be called with window definition. ! */ ! if (!windef) ! ereport(ERROR, ! (errcode(ERRCODE_INVALID_FUNCTION_DEFINITION), ! errmsg("window function call need window definition"), ! parser_errposition(pstate, location))); ! /* ! * normal aggregate can be called as window function by ! * the special aggregate wrapper function. By using wrappepr, ! * the performance is dropped but result is same. ! */ ! if (isagg && !iswfunc) ! wfunc->pure_agg = true; ! wfunc->winfnoid = funcid; ! wfunc->wintype = rettype; ! wfunc->args = fargs; ! wfunc->location = location; ! /* ! * agg_star is acceptable for aggregate functions ! * but distinct sould be rejected in parser though checked here again. ! */ ! wfunc->winstar = agg_star; ! if (agg_distinct) ! ereport(ERROR, ! (errcode(ERRCODE_SYNTAX_ERROR), ! errmsg("windowing functions cannot accept DISTINCT argument."))); ! /* ! * Reject attempt to call a parameterless aggregate without (*) ! * syntax. This is mere pedantry but some folks insisted ... ! */ ! if (wfunc->pure_agg && fargs == NIL && !agg_star) ! ereport(ERROR, ! (errcode(ERRCODE_WRONG_OBJECT_TYPE), ! errmsg("%s(*) must be used to call a parameterless aggregate function", ! NameListToString(funcname)), ! parser_errposition(pstate, location))); ! transformWindowCall(pstate, wfunc); ! ! retval = (Node *) wfunc; ! ! pstate->p_windef_list = lappend(pstate->p_windef_list, windef); ! windef->wfunc = wfunc; ! ! if (retset) ! ereport(ERROR, ! (errcode(ERRCODE_INVALID_FUNCTION_DEFINITION), ! errmsg("window functions cannot return sets"), ! parser_errposition(pstate, location))); ! } } return retval; *************** *** 729,734 **** func_get_detail(List *funcname, --- 803,810 ---- Oid *rettype, /* return value */ bool *retset, /* return value */ int *nvargs, /* return value */ + bool *isagg, /* return value */ + bool *iswfunc, /* return value */ Oid **true_typeids) /* return value */ { FuncCandidateList raw_candidates; *************** *** 880,886 **** func_get_detail(List *funcname, pform = (Form_pg_proc) GETSTRUCT(ftup); *rettype = pform->prorettype; *retset = pform->proretset; ! result = pform->proisagg ? FUNCDETAIL_AGGREGATE : FUNCDETAIL_NORMAL; ReleaseSysCache(ftup); return result; } --- 956,972 ---- pform = (Form_pg_proc) GETSTRUCT(ftup); *rettype = pform->prorettype; *retset = pform->proretset; ! *isagg = pform->proisagg; ! *iswfunc = pform->proiswfunc; ! /* ! * agg and wfunc are similar to each other ! * and overlaps as well. So we return agg_or_wfunc then ! * delegate to caller the detail process depending on which. ! */ ! if (pform->proisagg || pform->proiswfunc) ! result = FUNCDETAIL_AGG_OR_WFUNC; ! else ! result = FUNCDETAIL_NORMAL; ReleaseSysCache(ftup); return result; } *************** *** 1374,1376 **** LookupAggNameTypeNames(List *aggname, List *argtypes, bool noError) --- 1460,1465 ---- return oid; } + + + *** a/src/backend/rewrite/rewriteManip.c --- b/src/backend/rewrite/rewriteManip.c *************** *** 36,41 **** typedef struct --- 36,43 ---- static bool contain_aggs_of_level_walker(Node *node, contain_aggs_of_level_context *context); + static bool checkExprHasWFuncs_walker(Node *node, + contain_aggs_of_level_context *context); static bool locate_agg_of_level_walker(Node *node, locate_agg_of_level_context *context); static bool checkExprHasSubLink_walker(Node *node, void *context); *************** *** 110,115 **** contain_aggs_of_level_walker(Node *node, --- 112,161 ---- (void *) context); } + bool + checkExprHasWFuncs(Node *node) + { + contain_aggs_of_level_context context; + + context.sublevels_up = 0; + + /* + * Must be prepared to start with a Query or a bare expression tree; if + * it's a Query, we don't want to increment sublevels_up. + */ + return query_or_expression_tree_walker(node, + checkExprHasWFuncs_walker, + (void *) &context, + 0); + } + + static bool + checkExprHasWFuncs_walker(Node *node, contain_aggs_of_level_context *context) + { + if (node == NULL) + return false; + if (IsA(node, WFunc)) + { + if (((WFunc *) node)->winlevelsup == context->sublevels_up) + return true; /* abort the tree traversal and return true */ + /* else fall through to examine argument */ + } + if (IsA(node, Query)) + { + /* Recurse into subselects */ + bool result; + + context->sublevels_up++; + result = query_tree_walker((Query *) node, + checkExprHasWFuncs_walker, + (void *) context, 0); + context->sublevels_up--; + return result; + } + return expression_tree_walker(node, checkExprHasWFuncs_walker, + (void *) context); + } + /* * locate_agg_of_level - * Find the parse location of any aggregate of the specified query level. *** a/src/backend/utils/adt/Makefile --- b/src/backend/utils/adt/Makefile *************** *** 29,35 **** OBJS = acl.o arrayfuncs.o array_userfuncs.o arrayutils.o bool.o \ tsginidx.o tsgistidx.o tsquery.o tsquery_cleanup.o tsquery_gist.o \ tsquery_op.o tsquery_rewrite.o tsquery_util.o tsrank.o \ tsvector.o tsvector_op.o tsvector_parser.o \ ! txid.o uuid.o xml.o like.o: like.c like_match.c --- 29,35 ---- tsginidx.o tsgistidx.o tsquery.o tsquery_cleanup.o tsquery_gist.o \ tsquery_op.o tsquery_rewrite.o tsquery_util.o tsrank.o \ tsvector.o tsvector_op.o tsvector_parser.o \ ! txid.o uuid.o wfunc.o xml.o like.o: like.c like_match.c *** a/src/backend/utils/adt/ruleutils.c --- b/src/backend/utils/adt/ruleutils.c *************** *** 183,188 **** static void get_oper_expr(OpExpr *expr, deparse_context *context); --- 183,189 ---- static void get_func_expr(FuncExpr *expr, deparse_context *context, bool showimplicit); static void get_agg_expr(Aggref *aggref, deparse_context *context); + static void get_wfunc_expr(WFunc *wfunc, deparse_context *context); static void get_coercion_expr(Node *arg, deparse_context *context, Oid resulttype, int32 resulttypmod, Node *parentNode); *************** *** 4014,4019 **** get_rule_expr(Node *node, deparse_context *context, --- 4015,4024 ---- get_agg_expr((Aggref *) node, context); break; + case T_WFunc: + get_wfunc_expr((WFunc *) node, context); + break; + case T_ArrayRef: { ArrayRef *aref = (ArrayRef *) node; *************** *** 4972,4977 **** get_agg_expr(Aggref *aggref, deparse_context *context) --- 4977,5015 ---- appendStringInfoChar(buf, ')'); } + /* + * get_wfunc_expr - Parse back an WFunc node + */ + static void + get_wfunc_expr(WFunc *wfunc, deparse_context *context) + { + StringInfo buf = context->buf; + Oid argtypes[FUNC_MAX_ARGS]; + int nargs; + ListCell *l; + + nargs = 0; + foreach(l, wfunc->args) + { + if (nargs >= FUNC_MAX_ARGS) + ereport(ERROR, + (errcode(ERRCODE_TOO_MANY_ARGUMENTS), + errmsg("too many arguments"))); + argtypes[nargs] = exprType((Node *) lfirst(l)); + nargs++; + } + + appendStringInfo(buf, "%s(%s", + generate_function_name(wfunc->winfnoid, + nargs, argtypes, NULL), ""); + /* winstar can be set only in zero-argument aggregates */ + if (wfunc->winstar) + appendStringInfoChar(buf, '*'); + else + get_rule_expr((Node *) wfunc->args, context, true); + appendStringInfoChar(buf, ')'); + } + /* ---------- * get_coercion_expr * *************** *** 5983,5988 **** generate_function_name(Oid funcid, int nargs, Oid *argtypes, --- 6021,6028 ---- Oid p_rettype; bool p_retset; int p_nvargs; + bool p_isagg; + bool p_iswfunc; Oid *p_true_typeids; proctup = SearchSysCache(PROCOID, *************** *** 6002,6009 **** generate_function_name(Oid funcid, int nargs, Oid *argtypes, p_result = func_get_detail(list_make1(makeString(proname)), NIL, nargs, argtypes, false, &p_funcid, &p_rettype, ! &p_retset, &p_nvargs, &p_true_typeids); ! if ((p_result == FUNCDETAIL_NORMAL || p_result == FUNCDETAIL_AGGREGATE) && p_funcid == funcid) nspname = NULL; else --- 6042,6050 ---- p_result = func_get_detail(list_make1(makeString(proname)), NIL, nargs, argtypes, false, &p_funcid, &p_rettype, ! &p_retset, &p_nvargs, ! &p_isagg, &p_iswfunc, &p_true_typeids); ! if ((p_result == FUNCDETAIL_NORMAL || p_result == FUNCDETAIL_AGG_OR_WFUNC) && p_funcid == funcid) nspname = NULL; else *** /dev/null --- b/src/backend/utils/adt/wfunc.c *************** *** 0 **** --- 1,492 ---- + /*------------------------------------------------------------------------- + * + * wfunc.c + * Builtin window functions defined in SQL spec. + * + * Portions Copyright (c) 2000-2008, PostgreSQL Global Development Group + * + * + * IDENTIFICATION + * $PostgreSQL$ + * + *------------------------------------------------------------------------- + */ + + #include "postgres.h" + #include "executor/executor.h" + #include "executor/nodeWindow.h" + #include "utils/builtins.h" + + static bool rank_up(FunctionCallInfo fcinfo); + static Datum leadlag_common(FunctionCallInfo fcinfo, bool forward, bool fetchdefault); + + /* + * SQL spec window functions + */ + Datum row_number(PG_FUNCTION_ARGS); + Datum rank(PG_FUNCTION_ARGS); + Datum dense_rank(PG_FUNCTION_ARGS); + Datum percent_rank(PG_FUNCTION_ARGS); + Datum cume_dist(PG_FUNCTION_ARGS); + Datum ntile(PG_FUNCTION_ARGS); + Datum lag(PG_FUNCTION_ARGS); + Datum lag_withdefault(PG_FUNCTION_ARGS); + Datum lead(PG_FUNCTION_ARGS); + Datum lead_withdefault(PG_FUNCTION_ARGS); + Datum first_value(PG_FUNCTION_ARGS); + Datum last_value(PG_FUNCTION_ARGS); + Datum nth_value(PG_FUNCTION_ARGS); + + + /* + * ranking process information + */ + typedef struct rank_context + { + int64 rank; /* current rank */ + HeapTuple htup; /* row data one before the current */ + } rank_context; + + /* + * ntile process information + */ + typedef struct + { + int32 ntile; /* current result */ + int64 rows_per_bucket; /* row number of current bucket */ + int64 boundary; /* how many rows should be in the bucket */ + int64 remainder; /* (total rows) % (bucket num) */ + } ntile_context; + + + /* + * allocate new user space for the bran new function call. + */ + #define allocate_if_new(fcinfo, size) do{ \ + if ((fcinfo)->flinfo->fn_extra == NULL) \ + { \ + MemoryContext __oldContext; \ + __oldContext = MemoryContextSwitchTo((fcinfo)->flinfo->fn_mcxt); \ + (fcinfo)->flinfo->fn_extra = palloc0(size); \ + MemoryContextSwitchTo(__oldContext); \ + } \ + }while(0) + + /* + * utility routine for *_rank functions. + */ + static bool + rank_up(FunctionCallInfo fcinfo) + { + WindowObject winobj = PG_WINDOW_OBJECT(); + WindowState *winstate; + Window *node; + TupleTableSlot *scanslot; + TupleTableSlot *winslot; + rank_context *context; + bool up = false; /* should rank up? */ + bool res; + MemoryContext oldContext; + + allocate_if_new(fcinfo, sizeof(rank_context)); + + winstate = (WindowState *) fcinfo->context; + node = (Window *) winstate->ss.ps.plan; + context = (rank_context *) fcinfo->flinfo->fn_extra; + winslot = winstate->tmpcontext->ecxt_outertuple; + + if (context->rank == 0) + { + /* first call */ + context->rank = 1; + res = WinRowGetTuple(winobj, winslot); + Assert(res); + + oldContext = MemoryContextSwitchTo(fcinfo->flinfo->fn_mcxt); + context->htup = ExecCopySlotTuple(winslot); + MemoryContextSwitchTo(oldContext); + + if (!node->ordNumCols) + elog(ERROR, "this function requires ORDER BY clause in the window"); + } + else + { + /* we need two tuple slot to compare */ + scanslot = winstate->ss.ss_ScanTupleSlot; + + res = WinRowGetTuple(winobj, scanslot); + Assert(res); + ExecStoreTuple(context->htup, winslot, InvalidBuffer, false); + + /* tuples matching by ORDER BY clause */ + if (!WinRowIsPeer(winobj, scanslot, winslot)) + { + heap_freetuple(context->htup); + oldContext = MemoryContextSwitchTo(fcinfo->flinfo->fn_mcxt); + context->htup = ExecCopySlotTuple(scanslot); + MemoryContextSwitchTo(oldContext); + up = true; + } + } + + return up; + } + + + /* + * row_number + * just increment up from 1 until current partition finishes. + */ + Datum + row_number(PG_FUNCTION_ARGS) + { + WindowObject winobj = PG_WINDOW_OBJECT(); + + PG_RETURN_INT64(WinRowCurrentPos(winobj) + 1); + } + + + /* + * rank + * increment up if key tuple changes. The new rank number is as the current row number. + */ + Datum + rank(PG_FUNCTION_ARGS) + { + WindowObject winobj = PG_WINDOW_OBJECT(); + rank_context *context; + bool up; + + up = rank_up(fcinfo); + context = (rank_context *) fcinfo->flinfo->fn_extra; + if (up) + { + context->rank = WinRowCurrentPos(winobj) + 1; + } + + PG_RETURN_INT64(context->rank); + } + + /* + * dense_rank + * increment up if key tuple changes. The new rank number is as added up 1. + */ + Datum + dense_rank(PG_FUNCTION_ARGS) + { + rank_context *context; + bool up; + + up = rank_up(fcinfo); + context = (rank_context *) fcinfo->flinfo->fn_extra; + if (up) + { + context->rank += 1; + } + + PG_RETURN_INT64(context->rank); + } + + /* + * percent_rank + * return fraction between 0 and 1 inclusive, which + * is described as (RK - 1) / (NR - 1), where RK is the rank and NR is + * the number of total row, per spec. + */ + Datum + percent_rank(PG_FUNCTION_ARGS) + { + WindowObject winobj = PG_WINDOW_OBJECT(); + rank_context *context; + bool up; + int64 totalrow; + + up = rank_up(fcinfo); + context = (rank_context *) fcinfo->flinfo->fn_extra; + if (up) + { + context->rank = WinRowCurrentPos(winobj) + 1; + } + + totalrow = WinPartGetRowNum(winobj); + Assert(totalrow > 0); + + /* result is as the first row, per spec */ + if (totalrow == 1) + PG_RETURN_FLOAT8(0.0); + + PG_RETURN_FLOAT8((float8) (context->rank - 1) / (float8) (totalrow - 1)); + } + + /* + * cume_dist + * return fraction betweeen 0 and 1 inclusive, which + * is described as NP / NR, where NP is the number of row preceeding or peers to + * the current row, and NR is the number of total row of the partition, per spec. + */ + Datum + cume_dist(PG_FUNCTION_ARGS) + { + WindowObject winobj = PG_WINDOW_OBJECT(); + rank_context *context; + bool up; + int64 totalrow; + + up = rank_up(fcinfo); + context = (rank_context *) fcinfo->flinfo->fn_extra; + if (up || context->rank == 1) + { + /* + * The saved row is not peer to the current row or just the first, + * so count up the number of rows that are peer to the current. + */ + WindowState *winstate = (WindowState *) fcinfo->context; + WindowIter iter; + TupleTableSlot *scanslot; + TupleTableSlot *winslot; + + context->rank = WinRowCurrentPos(winobj) + 1; + scanslot = winstate->ss.ss_ScanTupleSlot; + winslot = winstate->tmpcontext->ecxt_outertuple; + ExecStoreTuple(context->htup, scanslot, InvalidBuffer, false); + + /* + * start from current + 1 + */ + iter = WinPartStartIter(winobj, context->rank); + while(WinIterNext(iter)) + { + if (!WinIterGetTuple(iter, winslot)) + elog(ERROR, "failed fetch row"); + + if (!WinRowIsPeer(winobj, scanslot, winslot)) + break; + context->rank++; + } + WinIterFinish(iter); + } + + totalrow = WinPartGetRowNum(winobj); + Assert(totalrow > 0); + + PG_RETURN_FLOAT8((float8) (context->rank) / (float8) totalrow); + } + + /* + * ntile + * compute an exact numeric value with scale0 (zero), + * ranging from 1 (one) to n, per spec. + */ + Datum + ntile(PG_FUNCTION_ARGS) + { + WindowObject winobj = PG_WINDOW_OBJECT(); + ntile_context *context; + + allocate_if_new(fcinfo, sizeof(ntile_context)); + + context = (ntile_context *) fcinfo->flinfo->fn_extra; + + if (context->ntile == 0) + { + /* first call */ + int64 total; + int32 nbuckets; + bool isnull; + + total = WinPartGetRowNum(winobj); + nbuckets = WinRowGetArg(winobj, PG_WINDOW_ARG(0), &isnull); + + /* window functions cannot be strict, so raise error */ + if (isnull) + elog(ERROR, "null bucket number"); + + context->ntile = 1; + context->rows_per_bucket = 0; + context->boundary = total / nbuckets; + if (context->boundary <= 0) + context->boundary = 1; + else + { + /* + * If the total number is not divisible, add 1 row to + * leading buckets. + */ + context->remainder = total % nbuckets; + if (context->remainder != 0) + context->boundary++; + } + } + + context->rows_per_bucket++; + if (context->boundary < context->rows_per_bucket) + { + /* ntile up */ + if (context->remainder != 0 && context->ntile == context->remainder) + { + context->remainder = 0; + context->boundary -= 1; + } + context->ntile += 1; + context->rows_per_bucket = 1; + } + + PG_RETURN_INT32(context->ntile); + } + + /* + * leadlag_common + * common operation of lead() and lag() + * for lead() forward argument is set to true, whereas lag() is to false, + * and if fetchdefault is true, the third argument is taken if the target + * datum is out of range. + */ + static Datum + leadlag_common(FunctionCallInfo fcinfo, bool forward, bool fetchdefault) + { + WindowObject winobj = PG_WINDOW_OBJECT(); + int4 offset; + Datum result; + bool isnull; + bool isout; + + offset = WinRowGetArg(winobj, PG_WINDOW_ARG(1), &isnull); + if (isnull) + PG_RETURN_NULL(); + + result = WinPartGetArg(winobj, PG_WINDOW_ARG(0), + (forward ? offset : -offset), + WINDOW_SEEK_CURRENT, &isnull, &isout); + + if (isout) + { + /* + * target row is out of the partition, so going to fetch + * default value if demanded. + */ + if (fetchdefault) + result = WinRowGetArg(winobj, PG_WINDOW_ARG(2), &isnull); + } + + if (isnull) + PG_RETURN_NULL(); + + PG_RETURN_DATUM(result); + } + + /* + * lag + * returns the value of VE evaluated on a row that is OFFSET + * number of rows before the current row within a partition, + * per spec. + */ + Datum + lag(PG_FUNCTION_ARGS) + { + return leadlag_common(fcinfo, false, false); + } + + /* + * lag_withdefault + * same as lag but accepts default value as its third argument + */ + Datum + lag_withdefault(PG_FUNCTION_ARGS) + { + return leadlag_common(fcinfo, false, true); + } + + /* + * lead + * returns the value of VE evaluated on a row that is OFFSET + * number of rows after the current row within a partition, + * per spec. + */ + Datum + lead(PG_FUNCTION_ARGS) + { + return leadlag_common(fcinfo, true, false); + } + + /* + * lead_withdefault + * same as lead but accepts default value as its third argument + */ + Datum + lead_withdefault(PG_FUNCTION_ARGS) + { + return leadlag_common(fcinfo, true, true); + } + + /* + * first_value + * return the value of VE evaluated on the first row of the + * window frame, per spec. + */ + Datum + first_value(PG_FUNCTION_ARGS) + { + WindowObject winobj = PG_WINDOW_OBJECT(); + Datum result; + bool isnull; + bool isout; + + result = WinFrameGetArg(winobj, PG_WINDOW_ARG(0), + 0, WINDOW_SEEK_HEAD, &isnull, &isout); + + if (isnull) + PG_RETURN_NULL(); + + PG_RETURN_DATUM(result); + } + + /* + * last_value + * return the value of VE evaluated on the last row of the + * window frame, per spec. + */ + Datum + last_value(PG_FUNCTION_ARGS) + { + WindowObject winobj = PG_WINDOW_OBJECT(); + Datum result; + bool isnull; + bool isout; + + result = WinFrameGetArg(winobj, PG_WINDOW_ARG(0), + 0, WINDOW_SEEK_TAIL, &isnull, &isout); + + if (isnull) + PG_RETURN_NULL(); + + PG_RETURN_DATUM(result); + } + + /* + * nth_value + * return the value of VE evaluated on the n-th row from the first + * row of the window frame, per spec. + * The argument "nth" is an exact numeric value n with scale 0 (zero). + */ + Datum + nth_value(PG_FUNCTION_ARGS) + { + WindowObject winobj = PG_WINDOW_OBJECT(); + Datum result; + bool isnull; + bool isout; + int4 nth; + + nth = WinRowGetArg(winobj, PG_WINDOW_ARG(1), &isnull); + + if (isnull) + elog(ERROR, "nth value is null"); + + result = WinFrameGetArg(winobj, PG_WINDOW_ARG(0), + nth, WINDOW_SEEK_HEAD, &isnull, &isout); + + if (isnull) + PG_RETURN_NULL(); + + PG_RETURN_DATUM(result); + } *** a/src/backend/utils/sort/tuplestore.c --- b/src/backend/utils/sort/tuplestore.c *************** *** 1052,1057 **** tuplestore_copy_read_pointer(Tuplestorestate *state, --- 1052,1087 ---- } /* + * copy write pointer to the specified read pointer + */ + void + tuplestore_write_to_read_pointer(Tuplestorestate *state, int readptr) + { + TSReadPointer *ptr = &state->readptrs[readptr]; + + Assert(readptr >= 0 && readptr < state->readptrcount); + + switch(state->status) + { + case TSS_INMEM: + ptr->current = state->memtupcount; + break; + case TSS_WRITEFILE: + BufFileTell(state->myfile, + &ptr->file, + &ptr->offset); + break; + case TSS_READFILE: + ptr->file = state->writepos_file; + ptr->offset = state->writepos_offset; + break; + default: + elog(ERROR, "invalid tuplestore state"); + break; + } + } + + /* * tuplestore_trim - remove all no-longer-needed tuples */ static void *** a/src/include/catalog/pg_attribute.h --- b/src/include/catalog/pg_attribute.h *************** *** 295,314 **** DATA(insert ( 1247 tableoid 26 0 4 -7 0 -1 -1 t p i t f f t 0)); { 1255, {"prorows"}, 700, -1, 4, 6, 0, -1, -1, FLOAT4PASSBYVAL, 'p', 'i', true, false, false, true, 0 }, \ { 1255, {"provariadic"}, 26, -1, 4, 7, 0, -1, -1, true, 'p', 'i', true, false, false, true, 0 }, \ { 1255, {"proisagg"}, 16, -1, 1, 8, 0, -1, -1, true, 'p', 'c', true, false, false, true, 0 }, \ ! { 1255, {"prosecdef"}, 16, -1, 1, 9, 0, -1, -1, true, 'p', 'c', true, false, false, true, 0 }, \ ! { 1255, {"proisstrict"}, 16, -1, 1, 10, 0, -1, -1, true, 'p', 'c', true, false, false, true, 0 }, \ ! { 1255, {"proretset"}, 16, -1, 1, 11, 0, -1, -1, true, 'p', 'c', true, false, false, true, 0 }, \ ! { 1255, {"provolatile"}, 18, -1, 1, 12, 0, -1, -1, true, 'p', 'c', true, false, false, true, 0 }, \ ! { 1255, {"pronargs"}, 21, -1, 2, 13, 0, -1, -1, true, 'p', 's', true, false, false, true, 0 }, \ ! { 1255, {"prorettype"}, 26, -1, 4, 14, 0, -1, -1, true, 'p', 'i', true, false, false, true, 0 }, \ ! { 1255, {"proargtypes"}, 30, -1, -1, 15, 1, -1, -1, false, 'p', 'i', true, false, false, true, 0 }, \ ! { 1255, {"proallargtypes"}, 1028, -1, -1, 16, 1, -1, -1, false, 'x', 'i', false, false, false, true, 0 }, \ ! { 1255, {"proargmodes"}, 1002, -1, -1, 17, 1, -1, -1, false, 'x', 'i', false, false, false, true, 0 }, \ ! { 1255, {"proargnames"}, 1009, -1, -1, 18, 1, -1, -1, false, 'x', 'i', false, false, false, true, 0 }, \ ! { 1255, {"prosrc"}, 25, -1, -1, 19, 0, -1, -1, false, 'x', 'i', false, false, false, true, 0 }, \ ! { 1255, {"probin"}, 17, -1, -1, 20, 0, -1, -1, false, 'x', 'i', false, false, false, true, 0 }, \ ! { 1255, {"proconfig"}, 1009, -1, -1, 21, 1, -1, -1, false, 'x', 'i', false, false, false, true, 0 }, \ ! { 1255, {"proacl"}, 1034, -1, -1, 22, 1, -1, -1, false, 'x', 'i', false, false, false, true, 0 } DATA(insert ( 1255 proname 19 -1 NAMEDATALEN 1 0 -1 -1 f p c t f f t 0)); DATA(insert ( 1255 pronamespace 26 -1 4 2 0 -1 -1 t p i t f f t 0)); --- 295,315 ---- { 1255, {"prorows"}, 700, -1, 4, 6, 0, -1, -1, FLOAT4PASSBYVAL, 'p', 'i', true, false, false, true, 0 }, \ { 1255, {"provariadic"}, 26, -1, 4, 7, 0, -1, -1, true, 'p', 'i', true, false, false, true, 0 }, \ { 1255, {"proisagg"}, 16, -1, 1, 8, 0, -1, -1, true, 'p', 'c', true, false, false, true, 0 }, \ ! { 1255, {"proiswfunc"}, 16, -1, 1, 9, 0, -1, -1, true, 'p', 'c', true, false, false, true, 0 }, \ ! { 1255, {"prosecdef"}, 16, -1, 1, 10, 0, -1, -1, true, 'p', 'c', true, false, false, true, 0 }, \ ! { 1255, {"proisstrict"}, 16, -1, 1, 11, 0, -1, -1, true, 'p', 'c', true, false, false, true, 0 }, \ ! { 1255, {"proretset"}, 16, -1, 1, 12, 0, -1, -1, true, 'p', 'c', true, false, false, true, 0 }, \ ! { 1255, {"provolatile"}, 18, -1, 1, 13, 0, -1, -1, true, 'p', 'c', true, false, false, true, 0 }, \ ! { 1255, {"pronargs"}, 21, -1, 2, 14, 0, -1, -1, true, 'p', 's', true, false, false, true, 0 }, \ ! { 1255, {"prorettype"}, 26, -1, 4, 15, 0, -1, -1, true, 'p', 'i', true, false, false, true, 0 }, \ ! { 1255, {"proargtypes"}, 30, -1, -1, 16, 1, -1, -1, false, 'p', 'i', true, false, false, true, 0 }, \ ! { 1255, {"proallargtypes"}, 1028, -1, -1, 17, 1, -1, -1, false, 'x', 'i', false, false, false, true, 0 }, \ ! { 1255, {"proargmodes"}, 1002, -1, -1, 18, 1, -1, -1, false, 'x', 'i', false, false, false, true, 0 }, \ ! { 1255, {"proargnames"}, 1009, -1, -1, 19, 1, -1, -1, false, 'x', 'i', false, false, false, true, 0 }, \ ! { 1255, {"prosrc"}, 25, -1, -1, 20, 0, -1, -1, false, 'x', 'i', false, false, false, true, 0 }, \ ! { 1255, {"probin"}, 17, -1, -1, 21, 0, -1, -1, false, 'x', 'i', false, false, false, true, 0 }, \ ! { 1255, {"proconfig"}, 1009, -1, -1, 22, 1, -1, -1, false, 'x', 'i', false, false, false, true, 0 }, \ ! { 1255, {"proacl"}, 1034, -1, -1, 23, 1, -1, -1, false, 'x', 'i', false, false, false, true, 0 } DATA(insert ( 1255 proname 19 -1 NAMEDATALEN 1 0 -1 -1 f p c t f f t 0)); DATA(insert ( 1255 pronamespace 26 -1 4 2 0 -1 -1 t p i t f f t 0)); *************** *** 318,337 **** DATA(insert ( 1255 procost 700 -1 4 5 0 -1 -1 FLOAT4PASSBYVAL p i t f f t DATA(insert ( 1255 prorows 700 -1 4 6 0 -1 -1 FLOAT4PASSBYVAL p i t f f t 0)); DATA(insert ( 1255 provariadic 26 -1 4 7 0 -1 -1 t p i t f f t 0)); DATA(insert ( 1255 proisagg 16 -1 1 8 0 -1 -1 t p c t f f t 0)); ! DATA(insert ( 1255 prosecdef 16 -1 1 9 0 -1 -1 t p c t f f t 0)); ! DATA(insert ( 1255 proisstrict 16 -1 1 10 0 -1 -1 t p c t f f t 0)); ! DATA(insert ( 1255 proretset 16 -1 1 11 0 -1 -1 t p c t f f t 0)); ! DATA(insert ( 1255 provolatile 18 -1 1 12 0 -1 -1 t p c t f f t 0)); ! DATA(insert ( 1255 pronargs 21 -1 2 13 0 -1 -1 t p s t f f t 0)); ! DATA(insert ( 1255 prorettype 26 -1 4 14 0 -1 -1 t p i t f f t 0)); ! DATA(insert ( 1255 proargtypes 30 -1 -1 15 1 -1 -1 f p i t f f t 0)); ! DATA(insert ( 1255 proallargtypes 1028 -1 -1 16 1 -1 -1 f x i f f f t 0)); ! DATA(insert ( 1255 proargmodes 1002 -1 -1 17 1 -1 -1 f x i f f f t 0)); ! DATA(insert ( 1255 proargnames 1009 -1 -1 18 1 -1 -1 f x i f f f t 0)); ! DATA(insert ( 1255 prosrc 25 -1 -1 19 0 -1 -1 f x i f f f t 0)); ! DATA(insert ( 1255 probin 17 -1 -1 20 0 -1 -1 f x i f f f t 0)); ! DATA(insert ( 1255 proconfig 1009 -1 -1 21 1 -1 -1 f x i f f f t 0)); ! DATA(insert ( 1255 proacl 1034 -1 -1 22 1 -1 -1 f x i f f f t 0)); DATA(insert ( 1255 ctid 27 0 6 -1 0 -1 -1 f p s t f f t 0)); DATA(insert ( 1255 oid 26 0 4 -2 0 -1 -1 t p i t f f t 0)); DATA(insert ( 1255 xmin 28 0 4 -3 0 -1 -1 t p i t f f t 0)); --- 319,339 ---- DATA(insert ( 1255 prorows 700 -1 4 6 0 -1 -1 FLOAT4PASSBYVAL p i t f f t 0)); DATA(insert ( 1255 provariadic 26 -1 4 7 0 -1 -1 t p i t f f t 0)); DATA(insert ( 1255 proisagg 16 -1 1 8 0 -1 -1 t p c t f f t 0)); ! DATA(insert ( 1255 proiswfunc 16 -1 1 9 0 -1 -1 t p c t f f t 0)); ! DATA(insert ( 1255 prosecdef 16 -1 1 10 0 -1 -1 t p c t f f t 0)); ! DATA(insert ( 1255 proisstrict 16 -1 1 11 0 -1 -1 t p c t f f t 0)); ! DATA(insert ( 1255 proretset 16 -1 1 12 0 -1 -1 t p c t f f t 0)); ! DATA(insert ( 1255 provolatile 18 -1 1 13 0 -1 -1 t p c t f f t 0)); ! DATA(insert ( 1255 pronargs 21 -1 2 14 0 -1 -1 t p s t f f t 0)); ! DATA(insert ( 1255 prorettype 26 -1 4 15 0 -1 -1 t p i t f f t 0)); ! DATA(insert ( 1255 proargtypes 30 -1 -1 16 1 -1 -1 f p i t f f t 0)); ! DATA(insert ( 1255 proallargtypes 1028 -1 -1 17 1 -1 -1 f x i f f f t 0)); ! DATA(insert ( 1255 proargmodes 1002 -1 -1 18 1 -1 -1 f x i f f f t 0)); ! DATA(insert ( 1255 proargnames 1009 -1 -1 19 1 -1 -1 f x i f f f t 0)); ! DATA(insert ( 1255 prosrc 25 -1 -1 20 0 -1 -1 f x i f f f t 0)); ! DATA(insert ( 1255 probin 17 -1 -1 21 0 -1 -1 f x i f f f t 0)); ! DATA(insert ( 1255 proconfig 1009 -1 -1 22 1 -1 -1 f x i f f f t 0)); ! DATA(insert ( 1255 proacl 1034 -1 -1 23 1 -1 -1 f x i f f f t 0)); DATA(insert ( 1255 ctid 27 0 6 -1 0 -1 -1 f p s t f f t 0)); DATA(insert ( 1255 oid 26 0 4 -2 0 -1 -1 t p i t f f t 0)); DATA(insert ( 1255 xmin 28 0 4 -3 0 -1 -1 t p i t f f t 0)); *** a/src/include/catalog/pg_class.h --- b/src/include/catalog/pg_class.h *************** *** 125,131 **** DATA(insert OID = 1247 ( pg_type PGNSP 71 PGUID 0 1247 0 0 0 0 0 f f r 28 0 t DESCR(""); DATA(insert OID = 1249 ( pg_attribute PGNSP 75 PGUID 0 1249 0 0 0 0 0 f f r 17 0 f f f f f 3 _null_ _null_ )); DESCR(""); ! DATA(insert OID = 1255 ( pg_proc PGNSP 81 PGUID 0 1255 0 0 0 0 0 f f r 22 0 t f f f f 3 _null_ _null_ )); DESCR(""); DATA(insert OID = 1259 ( pg_class PGNSP 83 PGUID 0 1259 0 0 0 0 0 f f r 24 0 t f f f f 3 _null_ _null_ )); DESCR(""); --- 125,131 ---- DESCR(""); DATA(insert OID = 1249 ( pg_attribute PGNSP 75 PGUID 0 1249 0 0 0 0 0 f f r 17 0 f f f f f 3 _null_ _null_ )); DESCR(""); ! DATA(insert OID = 1255 ( pg_proc PGNSP 81 PGUID 0 1255 0 0 0 0 0 f f r 23 0 t f f f f 3 _null_ _null_ )); DESCR(""); DATA(insert OID = 1259 ( pg_class PGNSP 83 PGUID 0 1259 0 0 0 0 0 f f r 24 0 t f f f f 3 _null_ _null_ )); DESCR("");