Thread: Proposal/design feedback needed: WITHIN GROUP (sql standard ordered set aggregate functions)
Proposal/design feedback needed: WITHIN GROUP (sql standard ordered set aggregate functions)
From
Andrew Gierth
Date:
The spec defines two types of aggregate function classed as "ordered set function", as follows: 1. An "inverse distribution function" taking one argument (which must be a grouped column or otherwise constant within groups)plus a sorted group with exactly one column: =# SELECT (func(p) WITHIN GROUP (ORDER BY q)) from ... The motivatingexample for this (and the only ones in the spec) are percentile_cont and percentile_disc, to return a percentileresult from a continuous or discrete distribution. (Thus percentile_cont(0.5) within group (order by x) is thespec's version of a median(x) function.) 2. A "hypothetical set function" taking N arguments of arbitrary types (a la VARIADIC "any", rather than a fixed list) plusa sorted group with N columns of matching types: =# SELECT (func(p1,p2,...) WITHIN GROUP (ORDER BY q1,q2,...)) from... (where typeof(p1)==typeof(q1) and so on, at least up to trivial conversions) The motivating example here is tobe able to do rank(p1,p2,...) to return the rank that the specified values would have had if they were added to the group. As usual, we do not want to constrain ourselves to supporting only the specific cases in the spec, but would prefer a general solution. We (meaning myself and Atri) have an implementation that basically works, though it is not yet complete, but before taking it any further we need to resolve the design question of how to represent these two types of function in the system catalogs. The fact that there are in effect two parts to the parameter list, which are either independent (for inverse distribution funcs) or closely related (for hypothetical set functions), doesn't seem to point to an obvious way to represent this in pg_proc/pg_aggregate. I'm not yet satisfied with the method used in our implementation, so we're throwing this open for suggestions. We will post the work-in-progress patch along with a description of its current implementation shortly. One of the major complications is that we ideally want to be able to do polymorphism based on the type of the sorted group, specifically in order to be able to do percentile_disc(float8) within group (order by anyelement) returning anyelement. (i.e. we should be able to get a discrete percentile from any type that is orderable.) The question here is how to resolve the return type both of the aggregate itself and of the finalfn. We've also had an expression of interest in extending this to allow percentile_disc(float8[]) and percentile_cont(float8[]) returning arrays; e.g. percentile_cont(array[0, 0.25, 0.5, 0.75, 1]) to return an array containing the bounds, median and quartiles in one go. This is an extension to the spec but it seems sufficiently obviously useful to be worth supporting. Comments? -- Andrew (irc:RhodiumToad)
Re: Proposal/design feedback needed: WITHIN GROUP (sql standard ordered set aggregate functions)
From
David Fetter
Date:
On Thu, Jul 18, 2013 at 03:15:14AM +0000, Andrew Gierth wrote: > The spec defines two types of aggregate function classed as "ordered set > function", as follows: > > 1. An "inverse distribution function" taking one argument (which must be > a grouped column or otherwise constant within groups) plus a sorted > group with exactly one column: > > =# SELECT (func(p) WITHIN GROUP (ORDER BY q)) from ... > > The motivating example for this (and the only ones in the spec) are > percentile_cont and percentile_disc, to return a percentile result > from a continuous or discrete distribution. (Thus > percentile_cont(0.5) within group (order by x) is the spec's version > of a median(x) function.) > > 2. A "hypothetical set function" taking N arguments of arbitrary types > (a la VARIADIC "any", rather than a fixed list) plus a sorted group > with N columns of matching types: > > =# SELECT (func(p1,p2,...) WITHIN GROUP (ORDER BY q1,q2,...)) from ... > > (where typeof(p1)==typeof(q1) and so on, at least up to trivial > conversions) > > The motivating example here is to be able to do rank(p1,p2,...) to > return the rank that the specified values would have had if they were > added to the group. > > As usual, we do not want to constrain ourselves to supporting only the > specific cases in the spec, but would prefer a general solution. > > We (meaning myself and Atri) have an implementation that basically > works, though it is not yet complete, but before taking it any further > we need to resolve the design question of how to represent these two > types of function in the system catalogs. The fact that there are in > effect two parts to the parameter list, which are either independent > (for inverse distribution funcs) or closely related (for hypothetical > set functions), doesn't seem to point to an obvious way to represent > this in pg_proc/pg_aggregate. > > I'm not yet satisfied with the method used in our implementation, What is that method? > so we're throwing this open for suggestions. We will post the > work-in-progress patch along with a description of its current > implementation shortly. > > One of the major complications is that we ideally want to be able to > do polymorphism based on the type of the sorted group, specifically > in order to be able to do > > percentile_disc(float8) within group (order by anyelement) > > returning anyelement. (i.e. we should be able to get a discrete > percentile from any type that is orderable.) The question here is > how to resolve the return type both of the aggregate itself and of > the finalfn. > > We've also had an expression of interest in extending this to allow > percentile_disc(float8[]) and percentile_cont(float8[]) returning > arrays; e.g. percentile_cont(array[0, 0.25, 0.5, 0.75, 1]) to return > an array containing the bounds, median and quartiles in one go. This > is an extension to the spec but it seems sufficiently obviously > useful to be worth supporting. > > Comments? I'm really happy to see PostgreSQL come into its own when it comes to the analytics side of the house :) Cheers, David. -- David Fetter <david@fetter.org> http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fetter@gmail.com iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
Re: Proposal/design feedback needed: WITHIN GROUP (sql standard ordered set aggregate functions)
From
Atri Sharma
Date:
On Thu, Jul 18, 2013 at 10:02 AM, David Fetter <david@fetter.org> wrote: > On Thu, Jul 18, 2013 at 03:15:14AM +0000, Andrew Gierth wrote: >> The spec defines two types of aggregate function classed as "ordered set >> function", as follows: >> >> 1. An "inverse distribution function" taking one argument (which must be >> a grouped column or otherwise constant within groups) plus a sorted >> group with exactly one column: >> >> =# SELECT (func(p) WITHIN GROUP (ORDER BY q)) from ... >> >> The motivating example for this (and the only ones in the spec) are >> percentile_cont and percentile_disc, to return a percentile result >> from a continuous or discrete distribution. (Thus >> percentile_cont(0.5) within group (order by x) is the spec's version >> of a median(x) function.) >> >> 2. A "hypothetical set function" taking N arguments of arbitrary types >> (a la VARIADIC "any", rather than a fixed list) plus a sorted group >> with N columns of matching types: >> >> =# SELECT (func(p1,p2,...) WITHIN GROUP (ORDER BY q1,q2,...)) from ... >> >> (where typeof(p1)==typeof(q1) and so on, at least up to trivial >> conversions) >> >> The motivating example here is to be able to do rank(p1,p2,...) to >> return the rank that the specified values would have had if they were >> added to the group. >> >> As usual, we do not want to constrain ourselves to supporting only the >> specific cases in the spec, but would prefer a general solution. >> >> We (meaning myself and Atri) have an implementation that basically >> works, though it is not yet complete, but before taking it any further >> we need to resolve the design question of how to represent these two >> types of function in the system catalogs. The fact that there are in >> effect two parts to the parameter list, which are either independent >> (for inverse distribution funcs) or closely related (for hypothetical >> set functions), doesn't seem to point to an obvious way to represent >> this in pg_proc/pg_aggregate. >> >> I'm not yet satisfied with the method used in our implementation, > > What is that method? We currently represent ordered set functions with a new bool flag in pg_aggregate. The flag is set to true for ordered set functions(obviously) and false for all others. The currently implemented functions i.e. percentile_disc, percentile_cont and percentile_cont for intervals have their finalfns present in pg_aggregate. The aggregate functions take in two arguments, one for the percentile value and other for the input row set. So, percentile_cont's entry in pg_proc has float8 and float8 as its parameters and another entry of percentile_cont (with the interval version as the finalfn) has float8 and interval as its parameter types. As you can see, there isn't a way right now to resolve the return type of the aggregate for polymorphic cases. This is something we wish to resolve. Regards, Atri -- Regards, Atri l'apprenant
Re: Proposal/design feedback needed: WITHIN GROUP (sql standard ordered set aggregate functions)
From
Josh Berkus
Date:
On 07/17/2013 08:15 PM, Andrew Gierth wrote: > The spec defines two types of aggregate function classed as "ordered set > function", as follows: > > 1. An "inverse distribution function" taking one argument (which must be > a grouped column or otherwise constant within groups) plus a sorted > group with exactly one column: > > =# SELECT (func(p) WITHIN GROUP (ORDER BY q)) from ... > > The motivating example for this (and the only ones in the spec) are > percentile_cont and percentile_disc, to return a percentile result > from a continuous or discrete distribution. (Thus > percentile_cont(0.5) within group (order by x) is the spec's version > of a median(x) function.) One question is how this relates to the existing SELECT agg_func(x order by y) ... syntax. Clearly there's some extra functionality here, but the two are very similar conceptually. > 2. A "hypothetical set function" taking N arguments of arbitrary types > (a la VARIADIC "any", rather than a fixed list) plus a sorted group > with N columns of matching types: > > =# SELECT (func(p1,p2,...) WITHIN GROUP (ORDER BY q1,q2,...)) from ... > > (where typeof(p1)==typeof(q1) and so on, at least up to trivial > conversions) > > The motivating example here is to be able to do rank(p1,p2,...) to > return the rank that the specified values would have had if they were > added to the group. Wow, I can't possibly grasp the purpose of this. Maybe a practical example? > We've also had an expression of interest in extending this to allow > percentile_disc(float8[]) and percentile_cont(float8[]) returning > arrays; e.g. percentile_cont(array[0, 0.25, 0.5, 0.75, 1]) to return an > array containing the bounds, median and quartiles in one go. This is an > extension to the spec but it seems sufficiently obviously useful to be > worth supporting. To be specific, I asked for this because it's already something I do using PL/R, although in PL/R it's pretty much limited to floats. Anyway, for anyone who isn't following why we want this: statitical summary reports. For example, I'd love to be able to do a quartile distribution of query execution times without resorting to R. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
Re: Proposal/design feedback needed: WITHIN GROUP (sql standard ordered set aggregate functions)
From
Andrew Gierth
Date:
Josh Berkus wrote: > On 07/17/2013 08:15 PM, Andrew Gierth wrote: > > The spec defines two types of aggregate function classed as "ordered set > > function", as follows: > > > > 1. An "inverse distribution function" taking one argument (which must be > > a grouped column or otherwise constant within groups) plus a sorted > > group with exactly one column: > > > > =# SELECT (func(p) WITHIN GROUP (ORDER BY q)) from ... > > > > The motivating example for this (and the only ones in the spec) are > > percentile_cont and percentile_disc, to return a percentile result > > from a continuous or discrete distribution. (Thus > > percentile_cont(0.5) within group (order by x) is the spec's version > > of a median(x) function.) > > One question is how this relates to the existing > > SELECT agg_func(x order by y) > > ... syntax. Clearly there's some extra functionality here, but the two > are very similar conceptually. Well, as you probably know, the spec is a whole pile of random special-case syntax and any similarities are probably more accidental than anything else. A major difference is that in agg(x order by y), the values of y are not passed to the aggregate function - they serve no purpose other than controlling the order of the "x" values. Whereas in WITHIN GROUP, the values in the ORDER BY ... clause are in some sense the primary input to the aggregate, and the "p" argument is secondary and can't vary between rows of the group. Our implementation does heavily reuse the existing executor mechanics for ORDER BY in aggregates, and it also reuses a fair chunk of the parser code for it, but there are significant differences. >[of hypothetical set functions] > Wow, I can't possibly grasp the purpose of this. Maybe a practical > example? =# select rank(123) within group (order by x) from (values (10),(50),(100),(200),(500)) v(x); would return 1 row containing the value 4, because if you added the value 123 to the grouped values, it would have been ranked 4th. Any time you want to calculate what the rank, dense_rank or cume_dist would be of a specific row within a group without actually adding the row to the group, this is how it's done. I don't have any practical examples to hand, but this beast seems to be implemented in at least Oracle and MSSQL so I guess it has uses. >[on supporting arrays of percentiles] > To be specific, I asked for this because it's already something I do > using PL/R, although in PL/R it's pretty much limited to floats. percentile_cont is limited to floats and intervals in the spec; to be precise, it's limited to taking args of either interval or any numeric type, and returns interval for interval args, and "approximate numeric with implementation-defined precision", i.e. some form of float, for numeric-type args. The definition requires interpolation between values, so it's not clear that there's any point in trying to allow other types. percentile_disc is also limited to floats and intervals in the spec, but I see absolutely no reason whatsoever for this, since the definition given is valid for any type with ordering operators; there is no reason not to make it fully polymorphic. (The requirement for ordering will be enforced in parse-analysis anyway, by the ORDER BY transformations, and the function simply returns one of the input values unaltered.) -- Andrew (irc:RhodiumToad)
Re: Proposal/design feedback needed: WITHIN GROUP (sql standard ordered set aggregate functions)
From
Josh Berkus
Date:
Andrew, > Well, as you probably know, the spec is a whole pile of random > special-case syntax and any similarities are probably more accidental > than anything else. Hah, I didn't realize that our ordered aggregate syntax even *was* spec. > A major difference is that in agg(x order by y), the values of y are > not passed to the aggregate function - they serve no purpose other > than controlling the order of the "x" values. Whereas in WITHIN GROUP, > the values in the ORDER BY ... clause are in some sense the primary > input to the aggregate, and the "p" argument is secondary and can't > vary between rows of the group. > > Our implementation does heavily reuse the existing executor mechanics > for ORDER BY in aggregates, and it also reuses a fair chunk of the > parser code for it, but there are significant differences. Well, seems like it would work the same as agg_func(constx,coly,colz ORDER BY coly, colz) ... which means you could reuse a LOT of the internal plumbing. Or am I missing something? Also, what would a CREATE AGGREGATE and state function definition for custom WITHIN GROUP aggregates look like? > Any time you want to calculate what the rank, dense_rank or cume_dist > would be of a specific row within a group without actually adding the > row to the group, this is how it's done. > > I don't have any practical examples to hand, but this beast seems to > be implemented in at least Oracle and MSSQL so I guess it has uses. Well, I still can't imagine a practical use for it, at least based on RANK. I certainly have no objections if you have the code, though. I'll also point out that mode() requires ordered input as well, so add that to the set of functions we'll want to eventually support. One thing I find myself wanting with ordered aggregates is the ability to exclude NULLs. Thoughts? -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
Re: Proposal/design feedback needed: WITHIN GROUP (sql standard ordered set aggregate functions)
From
Andrew Gierth
Date:
Josh Berkus wrote: > Hah, I didn't realize that our ordered aggregate syntax even *was* spec. The spec defines agg(x order by y) only for array_agg and xmlagg; the generalization to arbitrary other aggregates is our extension. (But kind of obvious really.) > > Our implementation does heavily reuse the existing executor mechanics > > for ORDER BY in aggregates, and it also reuses a fair chunk of the > > parser code for it, but there are significant differences. > > Well, seems like it would work the same as > > agg_func(constx,coly,colz ORDER BY coly, colz) > > ... which means you could reuse a LOT of the internal plumbing. Or am I > missing something? You missed the part above which said "...does heavily reuse..." :-) i.e. we are in fact reusing a lot of the internal plumbing. > Also, what would a CREATE AGGREGATE and state function definition for > custom WITHIN GROUP aggregates look like? Now this is exactly the part we haven't nailed down yet and want ideas for. The problem is, given that the parser is looking at: foo(p1,p2,...) within group (order by q1,q2,...) how do we best represent the possible matching functions in pg_proc and pg_aggregate? Our partial solution so far does not allow polymorphism to work properly, so we need a better way; I'm hoping for some independent suggestions before I post my own ideas. -- Andrew (irc:RhodiumToad)
Re: Proposal/design feedback needed: WITHIN GROUP (sql standard ordered set aggregate functions)
From
Josh Berkus
Date:
> The problem is, given that the parser is looking at: > > foo(p1,p2,...) within group (order by q1,q2,...) > > how do we best represent the possible matching functions in pg_proc > and pg_aggregate? Our partial solution so far does not allow > polymorphism to work properly, so we need a better way; I'm hoping for > some independent suggestions before I post my own ideas. Yeah, you'd need to extend VARIADIC somehow. That is, I should be able to define a function as: percentile_state (pctl float,ordercols VARIADIC ANY ) returns VARIADIC ANY ... so that it can handle the sorting. Another way to look at it would be: percentile_state (pctl float,orderedset ANONYMOUS ROW ) returns ANONYMOUS ROW as ... ... because really, what you're handing the state function is an anonymous row type constructed of the order by phrase. Of course, then we have to have some way to manipulate the anonymous row from within the function; at the very least, an equality operator. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
Re: Proposal/design feedback needed: WITHIN GROUP (sql standard ordered set aggregate functions)
From
Noah Misch
Date:
On Thu, Jul 18, 2013 at 10:33:15PM +0000, Andrew Gierth wrote: > Josh Berkus wrote: > > Well, seems like it would work the same as > > > > agg_func(constx,coly,colz ORDER BY coly, colz) I'd try transforming WITHIN GROUP into the above during parse analysis. The default would be the transformation for hypothetical set functions: agg(x,y,z) WITHIN GROUP (ORDER BY a,b,c) -> agg(x,y,z ORDER BY a,b,c) Add a CREATE AGGREGATE option, say SQL_INVERSE_DISTRIBUTION_FUNCTION = {true|false} or SQLIDF, that chooses the IDF transformation: agg(x) WITHIN GROUP (ORDER BY y) -> agg(x, y ORDER BY y) Then there's perhaps no new core aggregation or function candidacy machinery. (I don't know whether VARIADIC transition functions work today, but that would become an orthogonal project.) Compare how we handle standard interval typmod syntax; only the parser and deparser know about it. Atri's description upthread sounded pretty similar to that. > > Also, what would a CREATE AGGREGATE and state function definition for > > custom WITHIN GROUP aggregates look like? > > Now this is exactly the part we haven't nailed down yet and want ideas > for. PERCENTILE_DISC would be declared as (float8, anyelement) with that SQLIDF option. To diagnose nonsensical calls made through nonstandard syntax, it could dig into its AggState to verify that its second argument is equal() to its first ORDER BY expression. There would be a question of whether to accept the WITHIN GROUP syntax for any aggregate or just for those for which the standard indicates it. Then follows the question of when to deparse as WITHIN GROUP and when to deparse as the internal syntax. I'd lean toward accepting WITHIN GROUP for any aggregate but deparsing that way only SQLIDF aggregates and aggregates with names matching the standard hypothetical set function names. Or you could add a second CREATE AGGREGATE option requesting hypothetical-set-function deparse style. -- Noah Misch EnterpriseDB http://www.enterprisedb.com
Re: Re: Proposal/design feedback needed: WITHIN GROUP (sql standard ordered set aggregate functions)
From
Tom Lane
Date:
Noah Misch <noah@leadboat.com> writes: > (I don't know whether VARIADIC transition functions work today, but that would > become an orthogonal project.) Coincidentally enough, some Salesforce folk were asking me about allowing VARIADIC aggregates just a few days ago. I experimented enough to find out that if you make an array-accepting transition function, and then force the aggregate's pg_proc entry to look like it's variadic (by manually setting provariadic and some other fields), then everything seems to Just Work: the parser and executor are both fine with it. So I think all that's needed here is to add some syntax support to CREATE AGGREGATE, and probably make some tweaks in pg_dump. I was planning to go work on that sometime soon. Having said that, though, what Andrew seemed to want was VARIADIC ANY, which is a *completely* different kettle of fish, since the actual parameters can't be converted to an array. I'm not sure if that's as easy to support. regards, tom lane
Re: Re: Proposal/design feedback needed: WITHIN GROUP (sql standard ordered set aggregate functions)
From
Atri Sharma
Date:
Hi all, This is our current work-in-progress patch for WITHIN GROUP. What mostly works: - percentile_cont(float8) within group (order by float8) - percentile_cont(float8) within group (order by interval) - percentile_disc(float8) within group (order by float8) What doesn't work: - supporting other types in percentile_disc (want polymorphism to work first) - no commands yet to add new ordered set functions (want to nail down the catalog representation first) - no hypothetical set functions yet (need to resolve the above two points first) - some rough edges - probably some bugs - docs Implementation details: For execution, we repurpose the existing aggregate-orderby mechanics. Given func(directargs) WITHIN GROUP (ORDER BY args), we process the (ORDER BY args) into a tuplesort in the same way currently done for agg(args ORDER BY args). Rather than using a transfn, we then call the finalfn as finalfn(directargs), providing an API by which the finalfn can access the tuplesort. (This is somewhat inspired by the window function API, but unfortunately has nothing in common with it in terms of requirements, so we couldn't just reuse it.) func(p1,p2,...) WITHIN GROUP (ORDER BY q1,q2,...) is represented in the catalog with two pg_proc rows: func(p1,p2,...,q1,q2,...) (proisagg=true) func_final(p1,p2,...) with the usual pg_aggregate row linking them, though aggtransfn is set to InvalidOid (as is aggtranstype) and an additional flag indicates that this is an ordered set function. (This representation is inadequate for a number of reasons; it does not handle polymorphism well and would require special-case coding for hypothetical set functions, which we have not yet tackled. See our other post.) Regards, Atri -- Regards, Atri l'apprenant
Attachment
Re: Proposal/design feedback needed: WITHIN GROUP (sql standard ordered set aggregate functions)
From
John Galloway
Date:
"some Salesforce folks" that would be me! It looks like I didn't quite communicate to Tom just what I was looking for as I do indeed want to have a variable number of "any" types, as:
CREATE AGGREGATE FOO ( ANYELEMENT, <more types>, VARIADIC "any") (
...
STYPE = ANYARRAY
...)
so the corresponding transition function would be
CREATE FUNCTION FOO_sfunc( ANYARRAY, ANYELEMENT, <more types>, VARIADIC "any") RETURNS ANYARRAY
and the final func is
CREATE FUNCTION FOO_ffunc( ANYARRAY ) RETURNS ANYELEMENT
The functions are in C, and I cheat and actually use the ANYARRAY transition variable as a struct just keeping the varlena length correct (thanks to Tom for that idea). Currently I just support a fixed number of "any" args but really need to have that be variable.
So supporting VARIADIC "any" for user defined aggregates would be most useful.
On Thu, Jul 18, 2013 at 7:09 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Noah Misch <noah@leadboat.com> writes:
> (I don't know whether VARIADIC transition functions work today, but that would
> become an orthogonal project.)
Coincidentally enough, some Salesforce folk were asking me about allowing
VARIADIC aggregates just a few days ago. I experimented enough to find
out that if you make an array-accepting transition function, and then
force the aggregate's pg_proc entry to look like it's variadic (by
manually setting provariadic and some other fields), then everything
seems to Just Work: the parser and executor are both fine with it.
So I think all that's needed here is to add some syntax support to
CREATE AGGREGATE, and probably make some tweaks in pg_dump. I was
planning to go work on that sometime soon.
Having said that, though, what Andrew seemed to want was VARIADIC ANY,
which is a *completely* different kettle of fish, since the actual
parameters can't be converted to an array. I'm not sure if that's
as easy to support.
regards, tom lane
Re: Proposal/design feedback needed: WITHIN GROUP (sql standard ordered set aggregate functions)
From
Andrew Gierth
Date:
Ok, since Atri posted our work-so-far and there's not been much comment, I'll outline here my proposed plan of attack. Rather than, as in the WIP patch, using the agg finalfn to validate the split between normal args and ORDER BY args, I propose this: Firstly, as in the WIP patch, func(a) within group (order by b) is looked up as though it were func(a,b). The result must be marked as an ordered set function. A new pg_aggregate integer column, aggordnargs (?), must be equal to the number of normal args (i.e. (a) in this case). Note that this may be 0; one can see a legitimate use case for mode() within group (order by anyelement) for example. The finalfn must be defined to have the same signature, so its args are processed as if it were func_final(a,b) - but only a dummy arg is passed for b. (Similar to the case for window functions.) Resolution of polymorphic parameters and result types therefore works as normal. For hypothetical set functions we add a special case, aggordnargs=-1, for which both the aggregate and the finalfn must be defined as (variadic "any") and parse analysis detects this case and unifies the types of the normal args with those of the ORDER BY args. I propose this new syntax: create aggregate func(argtypes...) within group (argtypes...) ( [ STYPE = ... , ] [ SORTOP = ... , ] [ INITCOND = ... , ]FINALFUNC = func_final ); Ordered set functions will typically not need STYPE etc., but hypothetical set functions will be declared as, e.g.: create aggregate rank(variadic "any") within group (variadic "any") ( STYPE = boolean, INITCOND = 'f', SORTOP = >, FINALFUNC= rank_hypothetical_final ); (I'm open to comment as to whether to simply overload the aggsortop column in pg_aggregate or add a new one. I'm inclined to do the latter.) The idea here is that a column of type STYPE will be appended to the list of columns to be sorted, using SORTOP as sort operator, and all input rows will have this column initialized to the INITCOND value. This is to make it easy to implement rank() and friends by simply inserting the hypothetical row into the sort, with a true value to flag it, and finding its position in the sort result. (Better that than comparing the hypothetical row against the whole group.) (Security caveat: it will be necessary for the finalfn in such cases to validate that the additional column exists with the right type. Producing the wrong result is acceptable if the values in it are unexpected; crashing is not.) Any comment before we get back to coding? -- Andrew (irc:RhodiumToad)
Re: Proposal/design feedback needed: WITHIN GROUP (sql standard ordered set aggregate functions)
From
Noah Misch
Date:
On Tue, Jul 23, 2013 at 01:21:52AM +0000, Andrew Gierth wrote: > For hypothetical set functions we add a special case, aggordnargs=-1, > for which both the aggregate and the finalfn must be defined as > (variadic "any") and parse analysis detects this case and unifies the > types of the normal args with those of the ORDER BY args. Other aggregates based on this syntax might not desire such type unification. Having parse analysis do that distorts the character of an "any" argument. I think the proper place for such processing is the first call to a transition function. The transition functions could certainly call a new API exposed under src/backend/parser to do the heavy lifting. But let's not make the parser presume that an aggordnargs=-1 aggregate always wants its "any" arguments handled in the manner of the standard hypothetical set functions. The rest of the plan looks good so far. -- Noah Misch EnterpriseDB http://www.enterprisedb.com
Re: Proposal/design feedback needed: WITHIN GROUP (sql standard ordered set aggregate functions)
From
Andrew Gierth
Date:
Noah Misch said: > Other aggregates based on this syntax might not desire such type unification. Then there would have to be some way to distinguish that. Maybe those could have -1 and the standard hypothetical set functions -2, with some flag in CREATE AGGREGATE to sort it out. > Having parse analysis do that distorts the character of an "any" argument. I > think the proper place for such processing is the first call to a transition > function. Except there isn't one. > But let's not make the > parser presume that an aggordnargs=-1 aggregate always wants its "any" > arguments handled in the manner of the standard hypothetical set functions. This has to happen in the parser because these are errors that should be caught before execution: rank(foo) within group (order by bar,baz) rank(integercol) within group (order by textcol) And collations have to be resolved (pairwise) before sorting can happen: rank(textval COLLATE "C") within group (order by foo) -- sorts in "C" rank(textval COLLATE "C") within group (order by bar COLLATE "en_US") -- error (basically, in rank(x) within group (order by y) where x and y are collatable, the collation rules apply exactly as though you were doing (x < y), with all the implicit vs. explicit stuff included) And this: rank(1.1) within group (order by intcol) should become rank(1.1) within group (order by intcol::numeric) -- Andrew (irc:RhodiumToad)
Re: Proposal/design feedback needed: WITHIN GROUP (sql standard ordered set aggregate functions)
From
Noah Misch
Date:
On Wed, Jul 24, 2013 at 04:16:28AM +0000, Andrew Gierth wrote: > Noah Misch said: > > Other aggregates based on this syntax might not desire such type unification. > > Then there would have to be some way to distinguish that. Maybe those could > have -1 and the standard hypothetical set functions -2, with some flag in > CREATE AGGREGATE to sort it out. Sure. > > But let's not make the > > parser presume that an aggordnargs=-1 aggregate always wants its "any" > > arguments handled in the manner of the standard hypothetical set functions. > > This has to happen in the parser because these are errors that should be > caught before execution: > > rank(foo) within group (order by bar,baz) > rank(integercol) within group (order by textcol) Would be nice to have, but not an overriding concern. > And collations have to be resolved (pairwise) before sorting can happen: > > rank(textval COLLATE "C") within group (order by foo) -- sorts in "C" > rank(textval COLLATE "C") within group (order by bar COLLATE "en_US") -- error > > (basically, in rank(x) within group (order by y) where x and y are > collatable, the collation rules apply exactly as though you were doing > (x < y), with all the implicit vs. explicit stuff included) This, though, makes direct parser support nigh inevitable. The issue to resolve here is whether and to what extent we should implement the SQL-standard hypothetical set functions as special cases of some more-generic concept. Here's the declaration you proposed for the rank() HSF: > create aggregate rank(variadic "any") within group (variadic "any") ( This would be the first place to my knowledge where "any" doesn't mean unrestricted type acceptance at the parser level. If we need bespoke syntax declaring a function as an HSF with the entailed call behavior nuances, fine. Treating a declaration with this particular mix of "any" as a secret handshake requesting those nuances is not the way to go. > STYPE = boolean, > INITCOND = 'f', > SORTOP = >, > FINALFUNC = rank_hypothetical_final > ); -- Noah Misch EnterpriseDB http://www.enterprisedb.com
Re: Re: Proposal/design feedback needed: WITHIN GROUP (sql standard ordered set aggregate functions)
From
Peter Eisentraut
Date:
On Fri, 2013-07-19 at 21:29 +0530, Atri Sharma wrote: > Hi all, > > This is our current work-in-progress patch for WITHIN GROUP. Please fix these compiler warnings: parse_agg.c: In function ‘check_ungrouped_columns_walker’: parse_agg.c:848:3: warning: passing argument 1 of ‘check_ungrouped_columns_walker’ from incompatible pointer type [enabledby default] parse_agg.c:822:1: note: expected ‘struct Node *’ but argument is of type ‘struct List *’ parse_func.c: In function ‘make_fn_arguments’: parse_func.c:1540:9: warning: assignment from incompatible pointer type [enabled by default] parse_func.c:1547:15: warning: assignment from incompatible pointer type [enabled by default]
Re: Re: Proposal/design feedback needed: WITHIN GROUP (sql standard ordered set aggregate functions)
From
Peter Eisentraut
Date:
On 7/19/13 11:59 AM, Atri Sharma wrote: > Hi all, > > This is our current work-in-progress patch for WITHIN GROUP. This patch needs to be rebased.
Re: Re: Proposal/design feedback needed: WITHIN GROUP (sql standard ordered set aggregate functions)
From
Atri Sharma
Date:
Sent from my iPad On 04-Sep-2013, at 21:38, Peter Eisentraut <peter_e@gmx.net> wrote: > On 7/19/13 11:59 AM, Atri Sharma wrote: >> Hi all, >> >> This is our current work-in-progress patch for WITHIN GROUP. > > This patch needs to be rebased. > This version of patch is quite old.We will be sending an updated patch before the start of September commitfest, with allthe points you mentioned taken care of. Thanks for the points. Regards, Atri
Re: Re: Proposal/design feedback needed: WITHIN GROUP (sql standard ordered set aggregate functions)
From
Atri Sharma
Date:
On Wed, Sep 4, 2013 at 10:01 PM, Atri Sharma <atri.jiit@gmail.com> wrote: > > > Sent from my iPad > > On 04-Sep-2013, at 21:38, Peter Eisentraut <peter_e@gmx.net> wrote: > >> On 7/19/13 11:59 AM, Atri Sharma wrote: >>> Hi all, >>> >>> This is our current work-in-progress patch for WITHIN GROUP. >> >> This patch needs to be rebased. Hi All, This is our complete patch for implementation of WITHIN GROUP. Functions supported: percentile_disc percentile_cont for float8 and intervals percentile_disc and percentile_cont support arrays of percentiles as well. mode rank dense_rank percent_rank cume_dist The patch also adds support for user defined ordered set functions with CREATE AGGREGATE. Polymorphism is now supported, with the original gripes about it now solved. Essentially, we have added a new field in pg_aggregate, aggordnargs, which we use it to verify, having looked up the function, that it is being called correctly. aggordnargs holds the number of direct args to the aggregate. Hypothetical set functions build over the extension of VARIADIC, and all of the hypothetical set functions have variadic 'any' as their parameter types. Need review: 1) psql /df and /dfa output. 2) Handling of non hypothetical collations. 3) Need of mode(), and the name. Feedback/Comments? Regards, Atri
Attachment
Re: Re: Proposal/design feedback needed: WITHIN GROUP (sql standard ordered set aggregate functions)
From
Peter Eisentraut
Date:
On Fri, 2013-09-13 at 14:56 +0530, Atri Sharma wrote: > This is our complete patch for implementation of WITHIN GROUP. Please fix compiler warnings: inversedistribution.c: In function ‘mode_final’: inversedistribution.c:276:11: warning: ‘mode_val’ may be used uninitialized in this function [-Wmaybe-uninitialized] inversedistribution.c:299:8: warning: ‘last_val’ may be used uninitialized in this function [-Wmaybe-uninitialized]
Re: Re: Proposal/design feedback needed: WITHIN GROUP (sql standard ordered set aggregate functions)
From
Andrew Gierth
Date:
>>>>> "Peter" == Peter Eisentraut <peter_e@gmx.net> writes: Peter> Please fix compiler warnings: Someone should do the same in WaitForBackgroundWorkerStartup so that building with -Werror works. New patch coming shortly. -- Andrew.
Re: Re: Proposal/design feedback needed: WITHIN GROUP (sql standard ordered set aggregate functions)
From
Andrew Gierth
Date:
>>>>> "Peter" == Peter Eisentraut <peter_e@gmx.net> writes: Peter> Please fix compiler warnings: Done. -- Andrew (irc:RhodiumToad)
Attachment
Re: Re: Proposal/design feedback needed: WITHIN GROUP (sql standard ordered set aggregate functions)
From
Robert Haas
Date:
On Sat, Sep 14, 2013 at 7:23 AM, Andrew Gierth <andrew@tao11.riddles.org.uk> wrote: > Peter> Please fix compiler warnings: > > Someone should do the same in WaitForBackgroundWorkerStartup so that > building with -Werror works. I don't get a warning there. Can you be more specific about the problem? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Re: Proposal/design feedback needed: WITHIN GROUP (sql standard ordered set aggregate functions)
From
Andrew Gierth
Date:
>>>>> "Robert" == Robert Haas <robertmhaas@gmail.com> writes: >> Someone should do the same in WaitForBackgroundWorkerStartup so>> that building with -Werror works. Robert> I don't get a warning there. Can you be more specific aboutRobert> the problem? bgworker.c: In function 'WaitForBackgroundWorkerStartup': bgworker.c:866: warning: 'pid' may be used uninitialized in this function gcc 4.2.2 / freebsd 8.2 -- Andrew (irc:RhodiumToad)
Re: Re: Proposal/design feedback needed: WITHIN GROUP (sql standard ordered set aggregate functions)
From
Robert Haas
Date:
On Tue, Sep 17, 2013 at 2:27 PM, Andrew Gierth <andrew@tao11.riddles.org.uk> wrote: >>>>>> "Robert" == Robert Haas <robertmhaas@gmail.com> writes: > > >> Someone should do the same in WaitForBackgroundWorkerStartup so > >> that building with -Werror works. > > Robert> I don't get a warning there. Can you be more specific about > Robert> the problem? > > bgworker.c: In function 'WaitForBackgroundWorkerStartup': > bgworker.c:866: warning: 'pid' may be used uninitialized in this function Does the attached patch fix it for you? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Attachment
Re: Re: Proposal/design feedback needed: WITHIN GROUP (sql standard ordered set aggregate functions)
From
Andrew Gierth
Date:
>>>>> "Robert" == Robert Haas <robertmhaas@gmail.com> writes: >> bgworker.c: In function 'WaitForBackgroundWorkerStartup':>> bgworker.c:866: warning: 'pid' may be used uninitialized inthis function Robert> Does the attached patch fix it for you? It compiles without error and looks ok... -- Andrew (irc:RhodiumToad)
Re: Re: Proposal/design feedback needed: WITHIN GROUP (sql standard ordered set aggregate functions)
From
Robert Haas
Date:
On Thu, Sep 19, 2013 at 12:52 PM, Andrew Gierth <andrew@tao11.riddles.org.uk> wrote: > It compiles without error and looks ok... Thanks for checking. Committed. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company