Re: Final Patch for GROUPING SETS - Mailing list pgsql-hackers

From Andrew Gierth
Subject Re: Final Patch for GROUPING SETS
Date
Msg-id 87wq5jgv1c.fsf@news-spur.riddles.org.uk
Whole thread Raw
In response to Re: Final Patch for GROUPING SETS  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: Final Patch for GROUPING SETS  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
>>>>> "Robert" == Robert Haas <robertmhaas@gmail.com> writes:
>> I would be interested in seeing more good examples of the size and>> type of grouping sets used in typical queries.
Robert> From what I have seen, there is interest in being able to doRobert> things like GROUP BY CUBE(a, b, c, d) and
havethat beRobert> efficient.
 

Yes, but that's not telling me anything I didn't already know.

What I'm curious about is things like:
- what's the largest cube(...) people actually make use of in practice
- do people make much use of the ability to mix cube and rollup, or  take the cross product of multiple grouping sets
- what's the most complex GROUPING SETS clause anyone has seen in  common use
Robert> That will require 16 different groupings, and we really wantRobert> to minimize the number of times we have to
re-sortto get allRobert> of those done.  For example, if we start by sorting on (a, b,Robert> c, d), we want to then
makea single pass over the dataRobert> computing the aggregates with (a, b, c, d), (a, b, c), (a,Robert> b), (a), and
()as the grouping columns.
 

In the case of cube(a,b,c,d), our code currently gives:

b,d,a,c:  (b,d,a,c),(b,d)
a,b,d:    (a,b,d),(a,b)
d,a,c:    (d,a,c),(d,a),(d)
c,d:      (c,d),(c)
b,c,d:    (b,c,d),(b,c),(b)
a,c,b:    (a,c,b),(a,c),(a),()

There is no solution in less than 6 sorts. (There are many possible
solutions in 6 sorts, but we don't attempt to prefer one over
another. The minimum number of sorts for a cube of N dimensions is
obviously N! / (r! * (N-r)!) where r = floor(N/2).)

If you want the theory: the set of grouping sets is a poset ordered by
set inclusion; what we want is a minimal partition of this poset into
chains (since any chain can be processed in one pass), which happens
to be equivalent to the problem of maximum cardinality matching in a
bipartite graph, which we solve in polynomial time with the
Hopcroft-Karp algorithm.  This guarantees us a minimal solution for
any combination of grouping sets however specified, not just for
cubes.

-- 
Andrew (irc:RhodiumToad)



pgsql-hackers by date:

Previous
From: Tatsuo Ishii
Date:
Subject: Re: pgbench -f and vacuum
Next
From: Peter Geoghegan
Date:
Subject: Re: Doing better at HINTing an appropriate column within errorMissingColumn()