Re: [HACKERS] Partition-wise aggregation/grouping - Mailing list pgsql-hackers

From Ashutosh Bapat
Subject Re: [HACKERS] Partition-wise aggregation/grouping
Date
Msg-id CAFjFpRddHm6XuKkjRpMNTZBa20J4FZe5Fydn8eM=Y9Vr0Y-tHg@mail.gmail.com
Whole thread Raw
In response to Re: [HACKERS] Partition-wise aggregation/grouping  (Jeevan Chalke <jeevan.chalke@enterprisedb.com>)
List pgsql-hackers
On Thu, Nov 23, 2017 at 6:38 PM, Jeevan Chalke
<jeevan.chalke@enterprisedb.com> wrote:
>
>
> Agree. However, there was ab existing comment in create_append_path() saying
> "We don't bother with inventing a cost_append(), but just do it here", which
> implies at sometime in future we may need it; so why not now where we are
> explicitly costing for an append node. Having a function is good so that, if
> required in future, we need update in only this function.
> Let me know if you think otherwise, I make those changes in next patchset.
>

I don't read that comment as something we will do it in future. I
don't think the amount of changes that this patch introduces just for
adding one more line of code aren't justified. There's anyway only one
place where we are costing append, so it's not that the new function
avoids code duplication. Although I am happy to defer this to the
committer, if you think that we need a separate function.

>
> As suggested by Robert, I have renamed it to APPEND_CPU_COST_MULTIPLIER in
> v7 patchset.
> Also, retained the #define for just multiplier as suggested by Robert.

Ok.


>
>>
>>
>> Anyway, it doesn't look like a good idea to pass an argument (gd) only to
>> return from that function in case of its presence. May be we should handle
>> it
>> outside this function.
>
>
> Well, I would like to have it inside the function itself. Let the function
> itself do all the necessary checking rather than doing some of them outside.

We will leave this to the committer. I don't like that style, but it's
also good to expect a function to do all related work.


>>
>> +        if (!create_child_grouping_paths(root, input_child_rel,
>> agg_costs, gd,
>> +                                         &extra))
>> +        {
>> +            /* Could not create path for childrel, return */
>> +            pfree(appinfos);
>> +            return;
>> +        }
>>
>> Can we detect this condition and bail out even before planning any of the
>> children? It looks wasteful to try to plan children only to bail out in
>> this
>> case.
>
>
> I don't think so. It is like non-reachable and added just for a safety in
> case we can't able to create a child path. The bail out conditions cannot be
> evaluated at the beginning. Do you this an Assert() will be good here? Am I
> missing something?

An Assert would help. If it's something that should not happen, we
should try catching that rather that silently ignoring it.

>
>>
>> +    /* Nothing to do if we have no live children */
>> +    if (live_children == NIL)
>> +        return;
>>
>> A parent relation with all dummy children will also be dummy. May be we
>> should
>> mark the parent dummy case using mark_dummy_rel() similar to
>> generate_partition_wise_join_paths().
>
>
> If parent is dummy, then we are not at all doing PWA. So no need to mark
> parent grouped_rel as dummy I guess.
> However, if some of the children are dummy, I am marking corresponding upper
> rel as dummy too.
> Actually, this condition will never going to be true as you said correctly
> that "A parent relation with all dummy children will also be dummy". Should
> we have an Assert() instead?

Yes.


>
>
> I have testcase for multi-level partitioned table.
> However, I did not understand by what you mean by "children with different
> order of partition key columns". I had a look over tests in
> partition_join.sql and it seems that I have cover all those scenarios.
> Please have a look over testcases added for PWA and let me know the
> scenarios missing, I will add them then.

By children with different order of partition key columns, I meant
something like this

parent(a int, b int, c int) partition by (a), child1(b int, c int, a
int) partition by b, child1_1 (c int, a int, b int);

where the attribute numbers of the partition keys in different
children are different.

>> This looks like a useful piece of general functionality
>> list_has_intersection(), which would returns boolean instead of the whole
>> intersection. I am not sure whether we should add that function to list.c
>> and
>> use here.
>
>
> Sounds good.
> But for now, I am keeping it as part of this feature itself.

ok

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company


pgsql-hackers by date:

Previous
From: Thomas Munro
Date:
Subject: Re: [HACKERS] HASH_CHUNK_SIZE vs malloc rounding
Next
From: Amit Langote
Date:
Subject: Re: [HACKERS] path toward faster partition pruning