Re: Planner, check if can use consider HASH for groupings (src/backend/optimizer/plan/planner.c) - Mailing list pgsql-hackers

From Ranier Vilela
Subject Re: Planner, check if can use consider HASH for groupings (src/backend/optimizer/plan/planner.c)
Date
Msg-id CAEudQAqewaK8nbPNcrxvp7dADAL0wZwzNd9hsa22UuaDPrGNYA@mail.gmail.com
Whole thread Raw
In response to Re: Planner, check if can use consider HASH for groupings (src/backend/optimizer/plan/planner.c)  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
Responses Re: Planner, check if can use consider HASH for groupings (src/backend/optimizer/plan/planner.c)  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
List pgsql-hackers
Em seg., 21 de set. de 2020 às 13:37, Tomas Vondra <tomas.vondra@2ndquadrant.com> escreveu:
On Mon, Sep 21, 2020 at 08:32:35AM -0300, Ranier Vilela wrote:
>Em dom., 20 de set. de 2020 às 21:10, Tomas Vondra <
>tomas.vondra@2ndquadrant.com> escreveu:
>
>> On Sun, Sep 20, 2020 at 08:09:56PM -0300, Ranier Vilela wrote:
>> >Em sex., 18 de set. de 2020 às 10:37, Tomas Vondra <
>> >tomas.vondra@2ndquadrant.com> escreveu:
>> >
>> >> On Thu, Sep 17, 2020 at 06:31:12PM -0300, Ranier Vilela wrote:
>> >> >Em qui., 17 de set. de 2020 às 17:09, Tomas Vondra <
>> >> >tomas.vondra@2ndquadrant.com> escreveu:
>> >> >
>> >> >> On Thu, Sep 17, 2020 at 02:12:12PM -0300, Ranier Vilela wrote:
>> >> >> >Hi,
>> >> >> >
>> >> >> >In case gd->any_hashable is FALSE, grouping_is_hashable is never
>> >> called.
>> >> >> >In this case, the planner could use HASH for groupings, but will
>> never
>> >> >> know.
>> >> >> >
>> >> >>
>> >> >> The condition is pretty simple - if the query has grouping sets,
>> look at
>> >> >> grouping sets, otherwise look at groupClause. For this to be an issue
>> >> >> the query would need to have both grouping sets and (independent)
>> group
>> >> >> clause - which AFAIK is not possible.
>> >> >>
>> >> >Uh?
>> >> >(parse->groupClause != NIL) If different from NIL we have
>> ((independent)
>> >> >group clause), grouping_is_hashable should check?
>> >> >(gd ? gd->any_hashable : grouping_is_hashable(parse->groupClause))))
>> >> >If gd is not NIL and gd->any_hashtable is FALSE?
>> >> >
>> >>
>> >> Sorry, I'm not quite sure I understand what this is meant to say :-(
>> >>
>> >> Anyway, (groupClause != NIL) does not mean the groupClause is somehow
>> >> independent (of what?). Add some debugging to create_grouping_paths and
>> >> you'll see that e.g. this query ends up with groupClause with 3 items:
>> >>
>> >>      select 1 from t group by grouping sets ((a), (b), (c));
>> >>
>> >> and so does this one:
>> >>
>> >>      select 1 from t group by grouping sets ((a,c), (a,b));
>> >>
>> >> I'm no expert in grouping sets, but I'd bet this means we transform the
>> >> grouping sets into a groupClause by extracting the keys. I haven't
>> >> investigated why exactly we do this, but I'm sure there's a reason (e.g.
>> >> it gives us SortGroupClause).
>> >>
>> >> You seem to believe a query can have both grouping sets and regular
>> >> grouping at the same level - but how would such query look like? Surely
>> >> you can't have two GROuP BY clauses. You can do
>> >>
>> >Translating into code:
>> >gd is grouping sets and
>> >parse->groupClause is regular grouping
>> >and we cannot have both at the same time.
>> >
>>
>> Have you verified the claim that we can't have both at the same time? As
>> I already explained, we build groupClause from grouping sets. I suggest
>> you do some debugging using the queries I used as examples, and you'll
>> see the claim is not true.
>>
>I think we already agreed that we cannot have both at the same time.
>

No, we haven't. I think we've agreed that you can't have both a group by
clause with grouping sets and then another group by clause with "plain"
grouping. But even with grouping sets you'll have groupClause generated
from the grouping sets. See preprocess_grouping_sets.

>
>>
>> >
>> >>      select 1 from t group by a, grouping sets ((b), (c));
>> >>
>> >> which is however mostly equivalent to (AFAICS)
>> >>
>> >>      select 1 from t group by grouping sets ((a,b), (a,c))
>> >>
>> >> so it's not like we have an independent groupClause either I think.
>> >>
>> >> The whole point of the original condition is - if there are grouping
>> >> sets, check if at least one can be executed using hashing (i.e. all keys
>> >> are hashable). Otherwise (without grouping sets) look at the grouping as
>> >> a whole.
>> >>
>> >Or if gd is NULL check  parse->groupClause.
>> >
>>
>> Which is what I'm saying.
>>
>> >
>> >> I don't see how your change improves this - if there are grouping sets,
>> >> it's futile to look at the whole groupClause if at least one grouping
>> >> set can't be hashed.
>> >>
>> >> But there's a simple way to disprove this - show us a case (table and a
>> >> query) where your code correctly allows hashing while the current one
>> >> does not.
>> >
>> >I'm not an expert in grouping either.
>> >The question I have here is whether gd is populated and has gd->
>> >any_hashable as FALSE,
>> >Its mean no use checking parse-> groupClause, it's a waste of time, Ok.
>> >
>>
>> I'm sorry, I don't follow your logic. Those are two separate cases. If
>> we have grouping sets, we have to check if at least one can be hashed.
>> If we don't have grouping sets, we have to check groupClause directly.
>> Why would that be a waste of time is unclear to me.
>>
>This is clear to me.
>The problem is how it was implemented in create_grouping_paths.
>

You haven't demonstrated what the problem is, though. Showing us a query
that fails would make it very clear.

>
>>
>> >
>> >>
>> >> >
>> >> >> For hashing to be worth considering, at least one grouping set has
>> to be
>> >> >> hashable - otherwise it's pointless.
>> >> >>
>> >> >> Granted, you could have something like
>> >> >>
>> >> >>      GROUP BY GROUPING SETS ((a), (b)), c
>> >> >>
>> >> >> which I think essentially says "add c to every grouping set" and that
>> >> >> will be covered by the any_hashable check.
>> >> >>
>> >> >I am not going into the merit of whether or not it is worth hashing.
>> >> >grouping_is_hashable as a last resort, must decide.
>> >> >
>> >>
>> >> I don't know what this is supposed to say either. The whole point of
>> >> this check is to simply skip construction of hash-based paths in cases
>> >> when it's obviously futile (i.e. when some of the keys don't support
>> >> hashing). We do this as early as possible, because the whole point is to
>> >> save precious CPU cycles during query planning.
>> >>
>> >
>> >> >
>> >> >> >Apparently gd pointer, will never be NULL there, verified with
>> >> Assert(gd
>> >> >> !=
>> >> >> >NULL).
>> >> >> >
>> >> >>
>> >> >> Um, what? If you add the assert right before the if condition, you
>> won't
>> >> >> even be able to do initdb. It's pretty clear it'll crash for any
>> query
>> >> >> without grouping sets.
>> >> >>
>> >> >Here not:
>> >> >Assert(gd != NULL);
>> >> >create_ordinary_grouping_paths(root, input_rel, grouped_rel,
>> >> >  agg_costs, gd, &extra,
>> >> >  &partially_grouped_rel);
>> >> >
>> >>
>> >> I have no idea where you're adding this assert. But simply adding it to
>> >> create_grouping_paths (right before the if condition changed by your
>> >> patch) will trigger a failure during initdb. Simply because for queries
>> >> without grouping sets (and with regular grouping) we pass gs=NULL.
>> >>
>> >> Try applying the attached patch and do "pg_ctl -D ... init" - you'll get
>> >> a failure proving that gd=NULL.
>> >>
>> >Well, I did a test right now, downloaded the latest HEAD and added the
>> >Assertive.
>> >I compiled everything, ran the regression tests, installed the binaries,
>> >loaded the server and installed a client database.
>> >Everything is working.
>> >Maybe in all these steps, there is no grouping that would trigger the code
>> >in question, but I doubt it.
>> >
>>
>> For me doing this leads to reliable crashes when pg_regress does initdb
>> (so before the actual checks):
>>
>>    patch -p1 < ~/grouping-assert.patch
>>    ./configure --enable-debug --enable-cassert
>>    make -s clean && make -s -j4 && make check
>>
>> And the "make check" it immediately crashes like this:
>>
>>    ============== creating temporary instance            ==============
>>    ============== initializing database system           ==============
>>
>>    pg_regress: initdb failed
>>    Examine /home/user/work/postgres/src/test/regress/log/initdb.log for
>> the reason.
>>
>> on the assert. So yeah, I'd guess there's something wrong with your
>> build. What does pg_config say?
>>
>Default.
>I never change anything. I simply clone the last head:
>git clone --single-branch https://github.com/postgres/postgres/
>and have it compiled.
>

Compiled how?
build Debug

regards,
Ranier Vilela

pgsql-hackers by date:

Previous
From: Dmitry Dolgov
Date:
Subject: Re: [HACKERS] [PATCH] Generic type subscripting
Next
From: Tom Lane
Date:
Subject: Re: recovering from "found xmin ... from before relfrozenxid ..."