Re: Add enable_groupagg GUC parameter to control GroupAggregate usage - Mailing list pgsql-hackers

From Tatsuro Yamada
Subject Re: Add enable_groupagg GUC parameter to control GroupAggregate usage
Date
Msg-id CAOKkKFvBi5tNaPkP-VEUcAr=nek0nm9=sUnP+dmqCm-vmnL9DA@mail.gmail.com
Whole thread Raw
In response to Add enable_groupagg GUC parameter to control GroupAggregate usage  (Tatsuro Yamada <yamatattsu@gmail.com>)
List pgsql-hackers
Hi David,

>Typically, in the regression tests we've used enable_sort to force a
>HashAgg. There are certainly times when that's not good enough and you
>might also need to disabe enable_indexscan too, so I understand the
desire to add this GUC.

Thank you for the explanation.
I wasn't aware that enable_sort could be used to switch from GroupAgg
to HashAgg.
From a user's perspective, I think many would expect that if
enable_hashagg exists, then enable_groupagg would as well. Adding 
such a GUC parameter seems intuitive and reasonable.So, I believe 
there's value in introducing the parameter I proposed.

On the other hand, if this technique (using enable_sort) is already
widely known and commonly used, and I'm simply unfamiliar with it,
then perhaps adding this GUC might not be worth the effort.

If several people support the idea of adding this GUC parameter,
I'm thinking of creating and submitting a patch that incorporates your
comment below. I believe both David and Ashutosh support the proposal.
Can I go ahead as planned?

What do others involved in planner and plan-tuning think?


>It's probably going to be worth going over the regression tests to
>find where we use enable_sort to disable GroupAgg and replace those
>with your new GUC. Otherwise, people looking at those tests in the
>future will be a bit confused as to why the test didn't just SET
>enable_groupagg TO false;  These will likely be good enough to serve
>as your test, rather than creating a new table to test this feature.

I agree.
Replacing existing enable_sort usages in regression tests with the new
GUC parameter would help future developers understand the intent more
clearly.
Also, I realized we can sufficiently test the feature without creating
a new table.


>I think you should also look at create_setop_path(), as I imagine that
>the same arguments for using enable_hashagg in that function apply
>equally to enable_groupagg.

I looked into create_setop_path(), and I see that the aggregation
method is currently controlled as follows:
=====
        /*
         * Mark the path as disabled if enable_hashagg is off.  While this
         * isn't exactly a HashAgg node, it seems close enough to justify
         * letting that switch control it.
         */
        if (!enable_hashagg)
            pathnode->path.disabled_nodes++;
=====
I plan to add enable_groupagg handling in a similar way.


>+ if (aggstrategy == AGG_SORTED && !enable_groupagg && enable_hashagg)
>+ ++disabled_nodes;
>
>This code looks a bit strange. You're only going to disable it if hash
>agg is enabled? If they're both disabled, let add_path() decide.
>Anyone who complains that they didn't get the aggregate type they
>wanted with both enable_hashagg and enable_groupagg set to off hasn't
>got a leg to stand on.

My intention was to make groupagg the fallback when both are disabled.
However, I understand that if the both are disabled, it's acceptable to
let the planner decide which strategy to use. I'll revise that part accordingly.


Thanks,
Tatsuro Yamada

pgsql-hackers by date:

Previous
From: Dmitry Dolgov
Date:
Subject: Re: Changing shared_buffers without restart
Next
From: Rahila Syed
Date:
Subject: Re: add function for creating/attaching hash table in DSM registry