Re: Parallel grouping sets - Mailing list pgsql-hackers

From Pengzhou Tang
Subject Re: Parallel grouping sets
Date
Msg-id CAG4reASx+p3z5W59O57xhPC57Z4MW3mSnE=s-MJGromnmgg2fA@mail.gmail.com
In response to Re: Parallel grouping sets  (Jesse Zhang <sbjesse@gmail.com>)
Responses Re: Parallel grouping sets  (Richard Guo <guofenglinux@gmail.com>)
List pgsql-hackers
Thanks for reviewing those patches.

Ha, I believe you meant to say a "normal aggregate", because what's
performed above gather is no longer "grouping sets", right?

The group key idea is clever in that it helps "discriminate" tuples by
their grouping set id. I haven't completely thought this through, but my
hunch is that this leaves some money on the table. For example, won't it
also lead to more expensive (and unnecessary) sorting and hashing? The
groupings with a few partials are now sharing the same tuplesort with
the groupings with a lot of groups even though we only want to tell
grouping 1 *apart from* grouping 10, not necessarily that grouping 1
needs to come before grouping 10. That's why I like the multiplexed pipe
/ "dispatched by grouping set id" idea: we only pay for sorting (or
hashing) within each grouping. That said, I'm open to the criticism that
keeping multiple tuplesorts and agg hash tables running is expensive in
itself, memory-wise ...

Cheers,
Jesse

That's something we need to test, thanks. Meanwhile, for the approach that
uses a "normal aggregate" with the grouping set id, one concern is that it
cannot use Mixed Hashed, which means that if the grouping sets contain both
non-hashable and non-sortable sets, it will fall back to one-phase aggregation.
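
To make the trade-off discussed above concrete, here is a minimal Python sketch, not PostgreSQL code, of the two ways the final aggregation step could combine partial results. The row layout `(grouping_set_id, group_key, partial_count)` and both function names are assumptions made up for illustration. The first function treats the grouping set id as a leading sort key in one shared sort; the second dispatches rows by grouping set id and sorts only within each set.

```python
from collections import defaultdict
from itertools import groupby

# Hypothetical partial-aggregate rows as they might arrive above Gather:
# (grouping_set_id, group_key, partial_count). Layout is an assumption.
partials = [
    (1, ("a",), 2), (2, ("a", "x"), 1), (1, ("b",), 1),
    (2, ("a", "y"), 1), (1, ("a",), 3), (2, ("a", "x"), 4),
]


def finalize_shared_sort(rows):
    """One sort over all rows, with the grouping set id as the leading key."""
    ordered = sorted(rows, key=lambda r: (r[0], r[1]))
    return {
        key: sum(r[2] for r in grp)
        for key, grp in groupby(ordered, key=lambda r: (r[0], r[1]))
    }


def finalize_dispatched(rows):
    """Route rows by grouping set id, then sort only within each set."""
    buckets = defaultdict(list)
    for r in rows:
        buckets[r[0]].append(r)  # sets only need telling apart, not ordering
    result = {}
    for gset_id, bucket in buckets.items():
        bucket.sort(key=lambda r: r[1])  # a smaller, independent sort per set
        for key, grp in groupby(bucket, key=lambda r: r[1]):
            result[(gset_id, key)] = sum(r[2] for r in grp)
    return result
```

Both functions produce the same combined counts. The difference is that the shared sort orders every row against every other, while the dispatched variant keeps one buffer per grouping set, which is the memory cost raised in the quoted message.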
