Re: [HACKERS] advanced partition matching algorithm forpartition-wise join - Mailing list pgsql-hackers

From Ashutosh Bapat
Subject Re: [HACKERS] advanced partition matching algorithm forpartition-wise join
Date
Msg-id CAFjFpRdjYWqbhL8L36o4NxZccx_ok7SnKdBfr++Dn4Tmvu35NQ@mail.gmail.com
Whole thread Raw
In response to Re: [HACKERS] advanced partition matching algorithm forpartition-wise join  (Dmitry Dolgov <9erthalion6@gmail.com>)
Responses Re: [HACKERS] advanced partition matching algorithm forpartition-wise join  (Ashutosh Bapat <ashutosh.bapat@enterprisedb.com>)
List pgsql-hackers
Thanks for your review comments. They are helpful.

On Thu, Mar 29, 2018 at 2:53 AM, Dmitry Dolgov <9erthalion6@gmail.com> wrote:

> But actually you also mentioned
> another topic that bothers me about this patch. Different parts of the
> algorithm implementation (at least for functions that build maps of matching
> partitions) are quite dependent on each other in terms of shared state. At
> first glance in `partition_range_bounds_merge` we have about a dozen of
> variables of different mutability level, that affect the control flow:
>
>     outer_lb_index
>     inner_lb_index
>     merged
>     merged_index
>     overlap
>     merged_lb
>     merged_ub
>     finished_outer
>     finished_inner
>     ub_cmpval
>     lb_cmpval
>     inner_has_default
>     outer_has_default
>     jointype
>
> It looks a bit too much for me, and would require commentaries like "if you
> changed the logic here, also take a look there". But I'm not saying that I have
> any specific suggestions how to change it (although I'll definitely try to do
> so, at least to get some better understanding of the underlying algorithm).

I am working on commenting portions of the code to make it more clear
and readable. Will update the patches with the comments soon, mostly
this Monday.

>
> Just to make myself clear, I wanted to suggest not to change the commentary for
> `IS_PARTITIONED_REL` significantly, but just add a sentence that you need to
> check if given relation is fully set up.

For that we have REL_HAS_ALL_PART_PROPS. IS_PARTITIONED_REL is really
checking whether a relation is partitioned and following comment makes
it clear why we have to check all those things. So, I still don't see
why we need to change the comment there.

 * It's not enough to test whether rel->part_scheme is set, because it might
 * be that the basic partitioning properties of the input relations matched
 * but the partition bounds did not.

> Also, few more random notes (mostly related to readability, since I found some
> parts of the patch hard to read, but of course it's arguable).
>
> ```
>     PartitionRangeBound outer_lb;
>     PartitionRangeBound outer_ub;
>     PartitionRangeBound inner_lb;
>     PartitionRangeBound inner_ub;
>     PartitionRangeBound *merged_lb = NULL;
>     PartitionRangeBound *merged_ub = NULL;
> ```
>
> Maybe it would be better to not repeat the type here? Let's say:
>
> ```
>     PartitionRangeBound outer_lb,
>                         outer_ub,

PostgreSQL code uses both the styles, but I like declarations with the
type is right there along side the variable. It also makes it easy to
delete the variable or modify its type, without causing unnecessary
diff on adjacent lines.


> ```
> partition_range_bounds_merge(int partnatts, FmgrInfo *partsupfuncs,
>                              Oid *partcollations, PartitionBoundInfo outer_bi,
>                              int outer_nparts, PartitionBoundInfo inner_bi,
>                              int inner_nparts, JoinType jointype,
>                              List **outer_parts, List **inner_parts)
> ```
>
> From what I see in `partition.c` there are a lot functions that accept
> `partnatts`, `partcollations` only to pass it down to, e.g.
> `partition_rbound_cmp`.
> What do you think about introducing a data structure to keep these arguments,
> and pass only an instance of this structure instead?

I have discussed this in an adjacent thread, but let me repeat it
here. Those members come from PartitionKey or PartitionScheme. If we
pack those in a structure and not include it in those two
structures,we will just create a structure before calling the function
and unpack those inside the comparison function e.g.
partition_rbound_cmp(). We could argue that that will benefit
intermediate functions like partition_range_bound_cmp(), which just
pass those values down, but there is all the possibility that its
future caller may not have that packing structure available readily.
So, I am inclined not to add a new structure just for this.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company


pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: [HACKERS] [PATCH] Lockable views
Next
From: Aleksander Alekseev
Date:
Subject: Re: Cast jsonb to numeric, int, float, bool