Re: Planner, check if can use consider HASH for groupings (src/backend/optimizer/plan/planner.c) - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: Planner, check if can use consider HASH for groupings (src/backend/optimizer/plan/planner.c)
Date
Msg-id 20200918133721.e6hjykgoy56365go@development
Whole thread Raw
In response to Re: Planner, check if can use consider HASH for groupings (src/backend/optimizer/plan/planner.c)  (Ranier Vilela <ranier.vf@gmail.com>)
Responses Re: Planner, check if can use consider HASH for groupings (src/backend/optimizer/plan/planner.c)
List pgsql-hackers
On Thu, Sep 17, 2020 at 06:31:12PM -0300, Ranier Vilela wrote:
>Em qui., 17 de set. de 2020 às 17:09, Tomas Vondra <
>tomas.vondra@2ndquadrant.com> escreveu:
>
>> On Thu, Sep 17, 2020 at 02:12:12PM -0300, Ranier Vilela wrote:
>> >Hi,
>> >
>> >In case gd->any_hashable is FALSE, grouping_is_hashable is never called.
>> >In this case, the planner could use HASH for groupings, but will never
>> know.
>> >
>>
>> The condition is pretty simple - if the query has grouping sets, look at
>> grouping sets, otherwise look at groupClause. For this to be an issue
>> the query would need to have both grouping sets and (independent) group
>> clause - which AFAIK is not possible.
>>
>Uh?
>(parse->groupClause != NIL) If different from NIL we have ((independent)
>group clause), grouping_is_hashable should check?
>(gd ? gd->any_hashable : grouping_is_hashable(parse->groupClause))))
>If gd is not NIL and gd->any_hashtable is FALSE?
>

Sorry, I'm not quite sure I understand what this is meant to say :-(

Anyway, (groupClause != NIL) does not mean the groupClause is somehow
independent (of what?). Add some debugging to create_grouping_paths and
you'll see that e.g. this query ends up with groupClause with 3 items:

     select 1 from t group by grouping sets ((a), (b), (c));

and so does this one:

     select 1 from t group by grouping sets ((a,c), (a,b));

I'm no expert in grouping sets, but I'd bet this means we transform the
grouping sets into a groupClause by extracting the keys. I haven't
investigated why exactly we do this, but I'm sure there's a reason (e.g.
it gives us SortGroupClause).

You seem to believe a query can have both grouping sets and regular
grouping at the same level - but how would such query look like? Surely
you can't have two GROuP BY clauses. You can do

     select 1 from t group by a, grouping sets ((b), (c));

which is however mostly equivalent to (AFAICS)

     select 1 from t group by grouping sets ((a,b), (a,c))

so it's not like we have an independent groupClause either I think.

The whole point of the original condition is - if there are grouping
sets, check if at least one can be executed using hashing (i.e. all keys
are hashable). Otherwise (without grouping sets) look at the grouping as
a whole.

I don't see how your change improves this - if there are grouping sets,
it's futile to look at the whole groupClause if at least one grouping
set can't be hashed.

But there's a simple way to disprove this - show us a case (table and a
query) where your code correctly allows hashing while the current one
does not.

>
>> For hashing to be worth considering, at least one grouping set has to be
>> hashable - otherwise it's pointless.
>>
>> Granted, you could have something like
>>
>>      GROUP BY GROUPING SETS ((a), (b)), c
>>
>> which I think essentially says "add c to every grouping set" and that
>> will be covered by the any_hashable check.
>>
>I am not going into the merit of whether or not it is worth hashing.
>grouping_is_hashable as a last resort, must decide.
>

I don't know what this is supposed to say either. The whole point of
this check is to simply skip construction of hash-based paths in cases
when it's obviously futile (i.e. when some of the keys don't support
hashing). We do this as early as possible, because the whole point is to
save precious CPU cycles during query planning.

>
>> >Apparently gd pointer, will never be NULL there, verified with Assert(gd
>> !=
>> >NULL).
>> >
>>
>> Um, what? If you add the assert right before the if condition, you won't
>> even be able to do initdb. It's pretty clear it'll crash for any query
>> without grouping sets.
>>
>Here not:
>Assert(gd != NULL);
>create_ordinary_grouping_paths(root, input_rel, grouped_rel,
>  agg_costs, gd, &extra,
>  &partially_grouped_rel);
>

I have no idea where you're adding this assert. But simply adding it to
create_grouping_paths (right before the if condition changed by your
patch) will trigger a failure during initdb. Simply because for queries
without grouping sets (and with regular grouping) we pass gs=NULL.

Try applying the attached patch and do "pg_ctl -D ... init" - you'll get
a failure proving that gd=NULL.

>Otherwise Coverity would be right.
>CID 1412604 (#1 of 1): Dereference after null check (FORWARD_NULL)13.
>var_deref_model: Passing null pointer gd to create_ordinary_grouping_paths,
>which dereferences it. [show details

><https://scan6.coverity.com/eventId=31389494-14&modelId=31389494-0&fileInstanceId=105740687&filePath=%2Fdll%2Fpostgres%2Fsrc%2Fbackend%2Foptimizer%2Fplan%2Fplanner.c&fileStart=4053&fileEnd=4180>]
>
>
>Which cannot be true, gd is never NULL here.
>

Or maybe coverity simply does not realize the NULL-dereference can't
happen, because it's guarded by other conditions, or something like
that ... As the assert failure demonstrates, we do indeed get NULLs
here, and it does not crash so coverity is missing something.


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment

pgsql-hackers by date:

Previous
From: Andy Fan
Date:
Subject: Re: Dynamic gathering the values for seq_page_cost/xxx_cost
Next
From: Tomas Vondra
Date:
Subject: Re: Dynamic gathering the values for seq_page_cost/xxx_cost