Re: [HACKERS] Hash support for grouping sets - Mailing list pgsql-hackers

From Andrew Gierth
Subject Re: [HACKERS] Hash support for grouping sets
Date
Msg-id 8760j0abpl.fsf@news-spur.riddles.org.uk
In response to Re: [HACKERS] Hash support for grouping sets  (Andres Freund <andres@anarazel.de>)
Responses Re: [HACKERS] Hash support for grouping sets  (Andres Freund <andres@anarazel.de>)
List pgsql-hackers
>>>>> "Andres" == Andres Freund <andres@anarazel.de> writes:
>> -    Assert(newphase == 0 || newphase == aggstate->current_phase + 1);
>> +    Assert(newphase <= 1 || newphase == aggstate->current_phase + 1);
 
Andres> I think this somewhere in the file header needs an expanded
Andres> explanation about what these "special" phases mean.
 

I could move most of the block comment for ExecInitAgg up into the file
header; would that be better?
Andres> I've not yet read up on the thread, or the whole patch, but an
Andres> explanation of the memory management for all those tables would
Andres> be good somewhere (maybe I've just not read that far).
 

I can expand on that.
Andres> "mixed types" sounds a bit weird.

changing to "if there are both types present"
>> +    values = palloc((1 + max_weight) * sizeof(double));
>> +    sets = palloc((1 + max_weight) * sizeof(Bitmapset *));
Andres> We usually cast the result of palloc.

Rough count in the backend has ~400 without casts to ~1350 with, so this
doesn't seem to have been consistently enforced.
>> +static void
>> +_outGroupingSetData(StringInfo str, const GroupingSetData *node)
>> +{
>> +    WRITE_NODE_TYPE("GSDATA");
>> +
>> +    WRITE_NODE_FIELD(set);
>> +    WRITE_FLOAT_FIELD(numGroups, "%.0f");
>> +}
 
Andres> .0f?

Copied from every other node that uses "%.0f" to write out a numGroups
value.
>> +static grouping_sets_data *
>> +preprocess_grouping_sets(PlannerInfo *root)
>> +{
Andres> It'd not hurt to add a comment as to what this function is
Andres> actually doing.

Sure.
Andres> I hope there's tests for both these branches.

A number of tests for both unsortable and unhashable cases are in the
regression test.
>> +        /*
>> +         * Is it hashable? We pretend empty sets are hashable even though we
>> +         * actually force them not to be hashed later. But don't bother if
>> +         * there's nothing but empty sets.
>> +         */
 
Andres> Why?

If there's no non-empty grouping set then there's nothing to hash (the
hash code assumes at least one column). I'll put that in the comment.
>> @@ -3214,6 +3407,11 @@ get_number_of_groups(PlannerInfo *root,
>> * estimate_hashagg_tablesize
>> *      estimate the number of bytes that a hash aggregate hashtable will
>> *      require based on the agg_costs, path width and dNumGroups.
>> + *
>> + * XXX this may be over-estimating the size now that hashagg knows to omit
>> + * unneeded columns from the hashtable. Also for mixed-mode grouping sets,
>> + * grouping columns not in the hashed set are counted here even though hashagg
>> + * won't store them. Is this a problem?
>> */
 
Andres> Hm. This seems like it could move us to use sorting, even if
Andres> not actually warranted.

This is largely a pre-existing issue that this patch doesn't try to
fix.  That would be an issue for a separate patch, I think.  I merely
added the comment to point it out.
Andres> What's the newline after the comment about?

Old style habits die hard.
>> +                            RelOptInfo *grouped_rel,
>> +                            Path *path,
>> +                            bool is_sorted,
>> +                            bool can_hash,
>> +                            PathTarget *target,
>> +                            grouping_sets_data *gd,
>> +                            const AggClauseCosts *agg_costs,
>> +                            double dNumGroups)
>> +{
>> +    Query       *parse = root->parse;
>> +
>> +    /*
>> +     * If we're not being offered sorted input, then only consider plans that
>> +     * can be done entirely by hashing.
>> +     *
>> +     * We can hash everything if it looks like it'll fit in work_mem. But if
>> +     * the input is actually sorted despite not being advertised as such, we
>> +     * prefer to make use of that in order to use less memory.
>> +     * If none of the grouping sets are sortable, then ignore the work_mem
>> +     * limit and generate a path anyway, since otherwise we'll just fail.
>> +     */
>> +    if (!is_sorted)
>> +    {
>> +        List       *new_rollups = NIL;
>> +        RollupData *unhashable_rollup = NULL;
>> +        List       *sets_data;
>> +        List       *empty_sets_data = NIL;
>> +        List       *empty_sets = NIL;
>> +        ListCell   *lc;
>> +        ListCell   *l_start = list_head(gd->rollups);
>> +        AggStrategy strat = AGG_HASHED;
>> +        Size        hashsize;
>> +        double      exclude_groups = 0.0;
>> +
>> +        Assert(can_hash);
>> +
>> +        if (pathkeys_contained_in(root->group_pathkeys, path->pathkeys))
>> +        {
>> +            unhashable_rollup = lfirst(l_start);
Andres> a) cast result of lfirst/lnext/whatnot.

casting lfirst is defensible (and only about 10% of existing uses don't
cast it), but why would you ever cast the result of lnext?
Andres> b) Why is this guaranteed to be unhashable?

It might not be unhashable; that probably needs a better name. It's the
rollup that we aren't going to hash because we're being presented with
input already in correct order. Also, it's the only rollup that can
contain empty grouping sets, which the hash code can't cope with.

Obviously this needs a comment and some consideration of naming.
>> +        if (hashsize > work_mem * 1024L && gd->rollups)
>> +            return;                /* nope, won't fit */
Andres> Didn't you above talk about generating a path anyway?  Or is
Andres> rollups guaranteed not to be set in that case (if so, not
Andres> obvious on a casual read)?
 

Unsortable groups can't participate in rollups. The presence of a rollup
implies that we're going to be offered at least one is_sorted input
path.
>> +    if (can_hash && gd->any_hashable)
>> +    {
>> +        List       *rollups = NIL;
>> +        List       *hash_sets = list_copy(gd->unsortable_sets);
>> +        double      availspace = (work_mem * 1024.0);
 
Andres> Why the parens, and why the .0?

The .0 is because the * should be done as a double, not an int.  This
isn't an uncommon idiom; there are quite a few similar uses in the
planner alone.

The parens because I thought it was clearer that way.
Andres> I think you need a higher level explanation of how you're
Andres> mapping the work_mem issue to knapsack somewhere.

The general overview ("work_mem is the knapsack size, and we need to
figure out how best to pack it with hashtables"), or the specific issue
of the scale factor for discrete chunking of the size?
Andres> Hm.  So we're solely optimizing for the number of sorts, rather
Andres> than the cost of the sorts.  Probably an acceptable tradeoff.
 

The existing cost model for sorts doesn't actually seem to help us here;
all of our sorts sort the same data, just with different comparison
columns, but cost_sort doesn't actually account for that (all its
callers except the CLUSTER planning code pass in 0.0 for
comparison_cost).

If the sort costs were correct, we could compute a "value" for each
rollup that reflected the cost difference between the sort+unique
comparisons for it, and the hash functions that would be called if we
hashed instead; and just pass that to the knapsack function.

-- 
Andrew (irc:RhodiumToad)


