Thread: GROUPING

GROUPING

From
David Fetter
Date:
Folks,

While kicking the tires on the new GROUPING() feature, I noticed that
NUMERIC has no cast to bit(n).  GROUPING() produces essentially a
bitmap, although the standard mandates for some reason that it be a
numeric type.

I was thinking it should produce NUMERIC rather than int4 as it does
now in order to accommodate large numbers of columns, but the
usefulness of the bitmap is greatly increased if there's a simple CAST
to bit(n).

Contravening the spec, but much more usefully, GROUPING should
probably produce a (possibly ordered) set of key-value pairs.
Alternatively, we could create something like GROUPING_JSON().

Cheers,
David.
-- 
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter      XMPP: david.fetter@gmail.com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate



Re: GROUPING

From
Tom Lane
Date:
David Fetter <david@fetter.org> writes:
> While kicking the tires on the new GROUPING() feature, I noticed that
> NUMERIC has no cast to bit(n).  GROUPING() produces essentially a
> bitmap, although the standard mandates for some reason that it be a
> numeric type.

> I was thinking it should produce NUMERIC rather than int4 as it does
> now in order to accommodate large numbers of columns, but the
> usefulness of the bitmap is greatly increased if there's a simple CAST
> to bit(n).

Maybe INT8 would be a better choice than INT4?  But I'm not sure there's
any practical use-case for more than 30 grouping sets anyway.  Keep in
mind the actual output volume probably grows like 2^N.
        regards, tom lane



Re: GROUPING

From
Andrew Gierth
Date:
>>>>> "Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes:
>> I was thinking it should produce NUMERIC rather than int4 as it does>> now in order to accommodate large numbers of
columns,but the>> usefulness of the bitmap is greatly increased if there's a simple>> CAST to bit(n).
 
Tom> Maybe INT8 would be a better choice than INT4?  But I'm not sureTom> there's any practical use-case for more than
30grouping setsTom> anyway.  Keep in mind the actual output volume probably grows likeTom> 2^N.
 

Spec says "The declared type of the result is exact numeric with an
implementation-defined precision and a scale of 0 (zero)."  for what
that's worth.  It doesn't give any hint that I can see for the max
number of columns; it just defines grouping(a...,z) as being equal to
2*grouping(a...) + grouping(z).

But the number of grouping sets isn't really relevant here, rather the
number of columns used for grouping.

In any case, if 31 isn't enough for you, you can call it multiple times:

select ..., grouping(a,b,...,z), grouping(a1,b1,...z1), ...

I didn't think >31 columns would be an issue, but changing it to bigint
is of course trivial if anyone thinks it necessary.

A possibly more interesting question is whether any other db products
have operations like GROUPING() that we could usefully support?

-- 
Andrew (irc:RhodiumToad)



Re: GROUPING

From
Dean Rasheed
Date:
On 20 May 2015 at 19:41, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> David Fetter <david@fetter.org> writes:
>> While kicking the tires on the new GROUPING() feature, I noticed that
>> NUMERIC has no cast to bit(n).  GROUPING() produces essentially a
>> bitmap, although the standard mandates for some reason that it be a
>> numeric type.
>
>> I was thinking it should produce NUMERIC rather than int4 as it does
>> now in order to accommodate large numbers of columns, but the
>> usefulness of the bitmap is greatly increased if there's a simple CAST
>> to bit(n).
>
> Maybe INT8 would be a better choice than INT4?  But I'm not sure there's
> any practical use-case for more than 30 grouping sets anyway.  Keep in
> mind the actual output volume probably grows like 2^N.
>

Actually using ROLLUP the output volume only grows linearly with N. I
tend to think that having such a large number of grouping sets would
be unlikely, however, it seems wrong to be putting an arbitrary limit
on it that's significantly smaller than the number of columns allowed
in a table.

Regards,
Dean



Re: GROUPING

From
Andrew Gierth
Date:
>>>>> "Dean" == Dean Rasheed <dean.a.rasheed@gmail.com> writes:
>> Maybe INT8 would be a better choice than INT4?  But I'm not sure>> there's any practical use-case for more than 30
groupingsets>> anyway.  Keep in mind the actual output volume probably grows like>> 2^N.
 
Dean> Actually using ROLLUP the output volume only grows linearly withDean> N. I tend to think that having such a large
numberof groupingDean> sets would be unlikely, however, it seems wrong to be putting anDean> arbitrary limit on it
that'ssignificantly smaller than theDean> number of columns allowed in a table.
 

Limit on what exactly?

Consider that in both MSSQL 2014 and Oracle 12 the limit on the number
of arguments in a GROUPING() expression is ... 1.

-- 
Andrew (irc:RhodiumToad)



Re: GROUPING

From
Dean Rasheed
Date:
On 21 May 2015 at 09:20, Andrew Gierth <andrew@tao11.riddles.org.uk> wrote:
>>>>>> "Dean" == Dean Rasheed <dean.a.rasheed@gmail.com> writes:
>
>  >> Maybe INT8 would be a better choice than INT4?  But I'm not sure
>  >> there's any practical use-case for more than 30 grouping sets
>  >> anyway.  Keep in mind the actual output volume probably grows like
>  >> 2^N.
>
>  Dean> Actually using ROLLUP the output volume only grows linearly with
>  Dean> N. I tend to think that having such a large number of grouping
>  Dean> sets would be unlikely, however, it seems wrong to be putting an
>  Dean> arbitrary limit on it that's significantly smaller than the
>  Dean> number of columns allowed in a table.
>
> Limit on what exactly?
>
> Consider that in both MSSQL 2014 and Oracle 12 the limit on the number
> of arguments in a GROUPING() expression is ... 1.
>

Actually Oracle haven't quite followed the standard. They have 2
separate functions: GROUPING() which only allows 1 parameter, and
GROUPING_ID() which allows multiple parameters, and returns a bitmask
like our GROUPING() function. However, their GROUPING_ID() function
seems to return an arbitrary precision number and allows an arbitrary
number of parameters (well, I tested it up 70 to prove it wasn't a
64-bit number).

Regards,
Dean



Re: GROUPING

From
Andrew Gierth
Date:
>>>>> "Dean" == Dean Rasheed <dean.a.rasheed@gmail.com> writes:
>> Consider that in both MSSQL 2014 and Oracle 12 the limit on the number>> of arguments in a GROUPING() expression is
...1.
 
Dean> Actually Oracle haven't quite followed the standard. They have 2Dean> separate functions: GROUPING() which only
allows1 parameter, andDean> GROUPING_ID() which allows multiple parameters, and returns aDean> bitmask like our
GROUPING()function. However, theirDean> GROUPING_ID() function seems to return an arbitrary precisionDean> number and
allowsan arbitrary number of parameters (well, IDean> tested it up 70 to prove it wasn't a 64-bit number).
 

True. It can handle more than 128 bits, even - I gave up trying after that.

So. Options:

1) change GROUPING() to return bigint and otherwise leave it as is.

2) change GROUPING() to return numeric.

3) change GROUPING() so that the result type varies with the number of
args. I don't see anything in the spec that actually forbids this - it
just says the return type is implementation-defined exact numeric.

A) in addition to any of the above, implement GROUPING_ID() as a simple
alias for GROUPING().

4) leave GROUPING() alone and add a separate GROUPING_ID() with a
different return type.

B) We don't currently have GROUP_ID() - does anyone want it?

*) any other ideas?

-- 
Andrew (irc:RhodiumToad)



Re: GROUPING

From
Tom Lane
Date:
Andrew Gierth <andrew@tao11.riddles.org.uk> writes:
> So. Options:

> 1) change GROUPING() to return bigint and otherwise leave it as is.

> 2) change GROUPING() to return numeric.

> 3) change GROUPING() so that the result type varies with the number of
> args. I don't see anything in the spec that actually forbids this - it
> just says the return type is implementation-defined exact numeric.

I'd go with (1) --- it's cheap and doesn't lose any capability.  If you
do (2), you'd lose useful operations like &, unless you cast back to int
which pretty much defeats the purpose.  And (3) is just weird and will
confuse people, especially since numeric and int don't have identical
sets of operations.

Or we could just leave things alone.  But (1) seems like pretty cheap
insurance.


> A) in addition to any of the above, implement GROUPING_ID() as a simple
> alias for GROUPING().

I see no reason to do that.  We don't need to copy Oracle-isms; the
language is already complicated enough.
        regards, tom lane



Re: GROUPING

From
Andres Freund
Date:
On 2015-05-21 16:19:27 +0100, Andrew Gierth wrote:
> True. It can handle more than 128 bits, even - I gave up trying after that.
> 
> So. Options:
> 
> 1) change GROUPING() to return bigint and otherwise leave it as is.
> 
> 2) change GROUPING() to return numeric.
> 
> 3) change GROUPING() so that the result type varies with the number of
> args. I don't see anything in the spec that actually forbids this - it
> just says the return type is implementation-defined exact numeric.
> 
> A) in addition to any of the above, implement GROUPING_ID() as a simple
> alias for GROUPING().
> 
> 4) leave GROUPING() alone and add a separate GROUPING_ID() with a
> different return type.
> 
> B) We don't currently have GROUP_ID() - does anyone want it?

I'd vote for either 0) do nothing or 1). I think the use case for
specifying 64+ (or even 32+) columns in grouping is pretty darn
slim. And as you said, it's not that hard to work around it if you need
it, and that's only going to be in an automated fashion anyway.

Greetings,

Andres Freund



Re: GROUPING

From
David Fetter
Date:
On Thu, May 21, 2015 at 04:19:27PM +0100, Andrew Gierth wrote:
> >>>>> "Dean" == Dean Rasheed <dean.a.rasheed@gmail.com> writes:
> 
>  >> Consider that in both MSSQL 2014 and Oracle 12 the limit on the number
>  >> of arguments in a GROUPING() expression is ... 1.
> 
>  Dean> Actually Oracle haven't quite followed the standard. They have 2
>  Dean> separate functions: GROUPING() which only allows 1 parameter, and
>  Dean> GROUPING_ID() which allows multiple parameters, and returns a
>  Dean> bitmask like our GROUPING() function. However, their
>  Dean> GROUPING_ID() function seems to return an arbitrary precision
>  Dean> number and allows an arbitrary number of parameters (well, I
>  Dean> tested it up 70 to prove it wasn't a 64-bit number).
> 
> True. It can handle more than 128 bits, even - I gave up trying after that.
> 
> So. Options:
> 
> 1) change GROUPING() to return bigint and otherwise leave it as is.

Seems cheap and reasonable.  Making sure people know that GROUPING can
be called multiple times seems like another cheap and reasonable
measure.

> *) any other ideas?

How about a more sensible data structure as a PG-specific addon.
GROUPING_JSON() seems like just the thing.

Cheers,
David.
-- 
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter      XMPP: david.fetter@gmail.com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate



Re: GROUPING

From
Andrew Gierth
Date:
>>>>> "David" == David Fetter <david@fetter.org> writes:
David> How about a more sensible data structure as a PG-specific addon.David> GROUPING_JSON() seems like just the
thing.

What exactly do you think it should return?

-- 
Andrew (irc:RhodiumToad)



Re: GROUPING

From
Robert Haas
Date:
On Thu, May 21, 2015 at 12:21 PM, Andrew Gierth
<andrew@tao11.riddles.org.uk> wrote:
>>>>>> "David" == David Fetter <david@fetter.org> writes:
>
>  David> How about a more sensible data structure as a PG-specific addon.
>  David> GROUPING_JSON() seems like just the thing.
>
> What exactly do you think it should return?

I vote for { "rube" : "goldberg" }.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: GROUPING

From
Dean Rasheed
Date:
On 21 May 2015 at 17:15, David Fetter <david@fetter.org> wrote:
> On Thu, May 21, 2015 at 04:19:27PM +0100, Andrew Gierth wrote:
>> >>>>> "Dean" == Dean Rasheed <dean.a.rasheed@gmail.com> writes:
>>
>>  >> Consider that in both MSSQL 2014 and Oracle 12 the limit on the number
>>  >> of arguments in a GROUPING() expression is ... 1.
>>
>>  Dean> Actually Oracle haven't quite followed the standard. They have 2
>>  Dean> separate functions: GROUPING() which only allows 1 parameter, and
>>  Dean> GROUPING_ID() which allows multiple parameters, and returns a
>>  Dean> bitmask like our GROUPING() function. However, their
>>  Dean> GROUPING_ID() function seems to return an arbitrary precision
>>  Dean> number and allows an arbitrary number of parameters (well, I
>>  Dean> tested it up 70 to prove it wasn't a 64-bit number).
>>
>> True. It can handle more than 128 bits, even - I gave up trying after that.
>>
>> So. Options:
>>
>> 1) change GROUPING() to return bigint and otherwise leave it as is.
>
> Seems cheap and reasonable.  Making sure people know that GROUPING can
> be called multiple times seems like another cheap and reasonable
> measure.
>

Yeah, seems reasonable. The lack of any bitwise operations on numeric
makes it pretty-much useless in this context.

Regards,
Dean



Re: GROUPING

From
Andrew Gierth
Date:
>>>>> "Andres" == Andres Freund <andres@anarazel.de> writes:

 Andres> I'd vote for either 0) do nothing or 1). I think the use case
 Andres> for specifying 64+ (or even 32+) columns in grouping is pretty
 Andres> darn slim. And as you said, it's not that hard to work around
 Andres> it if you need it, and that's only going to be in an automated
 Andres> fashion anyway.

If the vote goes with (1), this patch ought to suffice:

--
Andrew (irc:RhodiumToad)


Attachment

Re: GROUPING

From
David Fetter
Date:
On Thu, May 21, 2015 at 12:24:03PM -0400, Robert Haas wrote:
> On Thu, May 21, 2015 at 12:21 PM, Andrew Gierth
> <andrew@tao11.riddles.org.uk> wrote:
> >>>>>> "David" == David Fetter <david@fetter.org> writes:
> >
> >  David> How about a more sensible data structure as a PG-specific addon.
> >  David> GROUPING_JSON() seems like just the thing.
> >
> > What exactly do you think it should return?
> 
> I vote for { "rube" : "goldberg" }.

Your point is well taken.  I had { "target_list_name" : false, ... }

How about GROUPING_BYTEA()?

Also is there a really great reason that bitwise operations don't work
on NUMERIC?  Lack of tuits is a good reason, but not, it seems to me,
a great one.

Cheers,
David.
-- 
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter      XMPP: david.fetter@gmail.com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate



Re: GROUPING

From
Tom Lane
Date:
David Fetter <david@fetter.org> writes:
> Also is there a really great reason that bitwise operations don't work
> on NUMERIC?  Lack of tuits is a good reason, but not, it seems to me,
> a great one.

Not sure that bitwise operations make too much sense on values that
are (a) possibly fractional and (b) inherently decimal not binary.
        regards, tom lane