Thread: GROUPING
Folks, While kicking the tires on the new GROUPING() feature, I noticed that NUMERIC has no cast to bit(n). GROUPING() produces essentially a bitmap, although the standard mandates for some reason that it be a numeric type. I was thinking it should produce NUMERIC rather than int4 as it does now in order to accommodate large numbers of columns, but the usefulness of the bitmap is greatly increased if there's a simple CAST to bit(n). Contravening the spec, but much more usefully, GROUPING should probably produce a (possibly ordered) set of key-value pairs. Alternatively, we could create something like GROUPING_JSON(). Cheers, David. -- David Fetter <david@fetter.org> http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fetter@gmail.com Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
David Fetter <david@fetter.org> writes: > While kicking the tires on the new GROUPING() feature, I noticed that > NUMERIC has no cast to bit(n). GROUPING() produces essentially a > bitmap, although the standard mandates for some reason that it be a > numeric type. > I was thinking it should produce NUMERIC rather than int4 as it does > now in order to accommodate large numbers of columns, but the > usefulness of the bitmap is greatly increased if there's a simple CAST > to bit(n). Maybe INT8 would be a better choice than INT4? But I'm not sure there's any practical use-case for more than 30 grouping sets anyway. Keep in mind the actual output volume probably grows like 2^N. regards, tom lane
>>>>> "Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes: >> I was thinking it should produce NUMERIC rather than int4 as it does>> now in order to accommodate large numbers of columns,but the>> usefulness of the bitmap is greatly increased if there's a simple>> CAST to bit(n). Tom> Maybe INT8 would be a better choice than INT4? But I'm not sureTom> there's any practical use-case for more than 30grouping setsTom> anyway. Keep in mind the actual output volume probably grows likeTom> 2^N. Spec says "The declared type of the result is exact numeric with an implementation-defined precision and a scale of 0 (zero)." for what that's worth. It doesn't give any hint that I can see for the max number of columns; it just defines grouping(a...,z) as being equal to 2*grouping(a...) + grouping(z). But the number of grouping sets isn't really relevant here, rather the number of columns used for grouping. In any case, if 31 isn't enough for you, you can call it multiple times: select ..., grouping(a,b,...,z), grouping(a1,b1,...z1), ... I didn't think >31 columns would be an issue, but changing it to bigint is of course trivial if anyone thinks it necessary. A possibly more interesting question is whether any other db products have operations like GROUPING() that we could usefully support? -- Andrew (irc:RhodiumToad)
On 20 May 2015 at 19:41, Tom Lane <tgl@sss.pgh.pa.us> wrote: > David Fetter <david@fetter.org> writes: >> While kicking the tires on the new GROUPING() feature, I noticed that >> NUMERIC has no cast to bit(n). GROUPING() produces essentially a >> bitmap, although the standard mandates for some reason that it be a >> numeric type. > >> I was thinking it should produce NUMERIC rather than int4 as it does >> now in order to accommodate large numbers of columns, but the >> usefulness of the bitmap is greatly increased if there's a simple CAST >> to bit(n). > > Maybe INT8 would be a better choice than INT4? But I'm not sure there's > any practical use-case for more than 30 grouping sets anyway. Keep in > mind the actual output volume probably grows like 2^N. > Actually using ROLLUP the output volume only grows linearly with N. I tend to think that having such a large number of grouping sets would be unlikely, however, it seems wrong to be putting an arbitrary limit on it that's significantly smaller than the number of columns allowed in a table. Regards, Dean
>>>>> "Dean" == Dean Rasheed <dean.a.rasheed@gmail.com> writes: >> Maybe INT8 would be a better choice than INT4? But I'm not sure>> there's any practical use-case for more than 30 groupingsets>> anyway. Keep in mind the actual output volume probably grows like>> 2^N. Dean> Actually using ROLLUP the output volume only grows linearly withDean> N. I tend to think that having such a large numberof groupingDean> sets would be unlikely, however, it seems wrong to be putting anDean> arbitrary limit on it that'ssignificantly smaller than theDean> number of columns allowed in a table. Limit on what exactly? Consider that in both MSSQL 2014 and Oracle 12 the limit on the number of arguments in a GROUPING() expression is ... 1. -- Andrew (irc:RhodiumToad)
On 21 May 2015 at 09:20, Andrew Gierth <andrew@tao11.riddles.org.uk> wrote: >>>>>> "Dean" == Dean Rasheed <dean.a.rasheed@gmail.com> writes: > > >> Maybe INT8 would be a better choice than INT4? But I'm not sure > >> there's any practical use-case for more than 30 grouping sets > >> anyway. Keep in mind the actual output volume probably grows like > >> 2^N. > > Dean> Actually using ROLLUP the output volume only grows linearly with > Dean> N. I tend to think that having such a large number of grouping > Dean> sets would be unlikely, however, it seems wrong to be putting an > Dean> arbitrary limit on it that's significantly smaller than the > Dean> number of columns allowed in a table. > > Limit on what exactly? > > Consider that in both MSSQL 2014 and Oracle 12 the limit on the number > of arguments in a GROUPING() expression is ... 1. > Actually Oracle haven't quite followed the standard. They have 2 separate functions: GROUPING() which only allows 1 parameter, and GROUPING_ID() which allows multiple parameters, and returns a bitmask like our GROUPING() function. However, their GROUPING_ID() function seems to return an arbitrary precision number and allows an arbitrary number of parameters (well, I tested it up 70 to prove it wasn't a 64-bit number). Regards, Dean
>>>>> "Dean" == Dean Rasheed <dean.a.rasheed@gmail.com> writes: >> Consider that in both MSSQL 2014 and Oracle 12 the limit on the number>> of arguments in a GROUPING() expression is ...1. Dean> Actually Oracle haven't quite followed the standard. They have 2Dean> separate functions: GROUPING() which only allows1 parameter, andDean> GROUPING_ID() which allows multiple parameters, and returns aDean> bitmask like our GROUPING()function. However, theirDean> GROUPING_ID() function seems to return an arbitrary precisionDean> number and allowsan arbitrary number of parameters (well, IDean> tested it up 70 to prove it wasn't a 64-bit number). True. It can handle more than 128 bits, even - I gave up trying after that. So. Options: 1) change GROUPING() to return bigint and otherwise leave it as is. 2) change GROUPING() to return numeric. 3) change GROUPING() so that the result type varies with the number of args. I don't see anything in the spec that actually forbids this - it just says the return type is implementation-defined exact numeric. A) in addition to any of the above, implement GROUPING_ID() as a simple alias for GROUPING(). 4) leave GROUPING() alone and add a separate GROUPING_ID() with a different return type. B) We don't currently have GROUP_ID() - does anyone want it? *) any other ideas? -- Andrew (irc:RhodiumToad)
Andrew Gierth <andrew@tao11.riddles.org.uk> writes: > So. Options: > 1) change GROUPING() to return bigint and otherwise leave it as is. > 2) change GROUPING() to return numeric. > 3) change GROUPING() so that the result type varies with the number of > args. I don't see anything in the spec that actually forbids this - it > just says the return type is implementation-defined exact numeric. I'd go with (1) --- it's cheap and doesn't lose any capability. If you do (2), you'd lose useful operations like &, unless you cast back to int which pretty much defeats the purpose. And (3) is just weird and will confuse people, especially since numeric and int don't have identical sets of operations. Or we could just leave things alone. But (1) seems like pretty cheap insurance. > A) in addition to any of the above, implement GROUPING_ID() as a simple > alias for GROUPING(). I see no reason to do that. We don't need to copy Oracle-isms; the language is already complicated enough. regards, tom lane
On 2015-05-21 16:19:27 +0100, Andrew Gierth wrote: > True. It can handle more than 128 bits, even - I gave up trying after that. > > So. Options: > > 1) change GROUPING() to return bigint and otherwise leave it as is. > > 2) change GROUPING() to return numeric. > > 3) change GROUPING() so that the result type varies with the number of > args. I don't see anything in the spec that actually forbids this - it > just says the return type is implementation-defined exact numeric. > > A) in addition to any of the above, implement GROUPING_ID() as a simple > alias for GROUPING(). > > 4) leave GROUPING() alone and add a separate GROUPING_ID() with a > different return type. > > B) We don't currently have GROUP_ID() - does anyone want it? I'd vote for either 0) do nothing or 1). I think the use case for specifying 64+ (or even 32+) columns in grouping is pretty darn slim. And as you said, it's not that hard to work around it if you need it, and that's only going to be in an automated fashion anyway. Greetings, Andres Freund
On Thu, May 21, 2015 at 04:19:27PM +0100, Andrew Gierth wrote: > >>>>> "Dean" == Dean Rasheed <dean.a.rasheed@gmail.com> writes: > > >> Consider that in both MSSQL 2014 and Oracle 12 the limit on the number > >> of arguments in a GROUPING() expression is ... 1. > > Dean> Actually Oracle haven't quite followed the standard. They have 2 > Dean> separate functions: GROUPING() which only allows 1 parameter, and > Dean> GROUPING_ID() which allows multiple parameters, and returns a > Dean> bitmask like our GROUPING() function. However, their > Dean> GROUPING_ID() function seems to return an arbitrary precision > Dean> number and allows an arbitrary number of parameters (well, I > Dean> tested it up 70 to prove it wasn't a 64-bit number). > > True. It can handle more than 128 bits, even - I gave up trying after that. > > So. Options: > > 1) change GROUPING() to return bigint and otherwise leave it as is. Seems cheap and reasonable. Making sure people know that GROUPING can be called multiple times seems like another cheap and reasonable measure. > *) any other ideas? How about a more sensible data structure as a PG-specific addon. GROUPING_JSON() seems like just the thing. Cheers, David. -- David Fetter <david@fetter.org> http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fetter@gmail.com Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
>>>>> "David" == David Fetter <david@fetter.org> writes: David> How about a more sensible data structure as a PG-specific addon.David> GROUPING_JSON() seems like just the thing. What exactly do you think it should return? -- Andrew (irc:RhodiumToad)
On Thu, May 21, 2015 at 12:21 PM, Andrew Gierth <andrew@tao11.riddles.org.uk> wrote: >>>>>> "David" == David Fetter <david@fetter.org> writes: > > David> How about a more sensible data structure as a PG-specific addon. > David> GROUPING_JSON() seems like just the thing. > > What exactly do you think it should return? I vote for { "rube" : "goldberg" }. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 21 May 2015 at 17:15, David Fetter <david@fetter.org> wrote: > On Thu, May 21, 2015 at 04:19:27PM +0100, Andrew Gierth wrote: >> >>>>> "Dean" == Dean Rasheed <dean.a.rasheed@gmail.com> writes: >> >> >> Consider that in both MSSQL 2014 and Oracle 12 the limit on the number >> >> of arguments in a GROUPING() expression is ... 1. >> >> Dean> Actually Oracle haven't quite followed the standard. They have 2 >> Dean> separate functions: GROUPING() which only allows 1 parameter, and >> Dean> GROUPING_ID() which allows multiple parameters, and returns a >> Dean> bitmask like our GROUPING() function. However, their >> Dean> GROUPING_ID() function seems to return an arbitrary precision >> Dean> number and allows an arbitrary number of parameters (well, I >> Dean> tested it up 70 to prove it wasn't a 64-bit number). >> >> True. It can handle more than 128 bits, even - I gave up trying after that. >> >> So. Options: >> >> 1) change GROUPING() to return bigint and otherwise leave it as is. > > Seems cheap and reasonable. Making sure people know that GROUPING can > be called multiple times seems like another cheap and reasonable > measure. > Yeah, seems reasonable. The lack of any bitwise operations on numeric makes it pretty-much useless in this context. Regards, Dean
>>>>> "Andres" == Andres Freund <andres@anarazel.de> writes: Andres> I'd vote for either 0) do nothing or 1). I think the use case Andres> for specifying 64+ (or even 32+) columns in grouping is pretty Andres> darn slim. And as you said, it's not that hard to work around Andres> it if you need it, and that's only going to be in an automated Andres> fashion anyway. If the vote goes with (1), this patch ought to suffice: -- Andrew (irc:RhodiumToad)
Attachment
On Thu, May 21, 2015 at 12:24:03PM -0400, Robert Haas wrote: > On Thu, May 21, 2015 at 12:21 PM, Andrew Gierth > <andrew@tao11.riddles.org.uk> wrote: > >>>>>> "David" == David Fetter <david@fetter.org> writes: > > > > David> How about a more sensible data structure as a PG-specific addon. > > David> GROUPING_JSON() seems like just the thing. > > > > What exactly do you think it should return? > > I vote for { "rube" : "goldberg" }. Your point is well taken. I had { "target_list_name" : false, ... } How about GROUPING_BYTEA()? Also is there a really great reason that bitwise operations don't work on NUMERIC? Lack of tuits is a good reason, but not, it seems to me, a great one. Cheers, David. -- David Fetter <david@fetter.org> http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fetter@gmail.com Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
David Fetter <david@fetter.org> writes: > Also is there a really great reason that bitwise operations don't work > on NUMERIC? Lack of tuits is a good reason, but not, it seems to me, > a great one. Not sure that bitwise operations make too much sense on values that are (a) possibly fractional and (b) inherently decimal not binary. regards, tom lane