Home > mailing lists

Re: [HACKERS] [POC] hash partitioning - Mailing list pgsql-hackers

From	amul sul
Subject	Re: [HACKERS] [POC] hash partitioning
Date	May 12, 2017 16:34:23
Msg-id	CAAJ_b94Ee2KAcBWVpYO5Wf_he9_o7uKk39LJwirkdbMJh3ivLw@mail.gmail.com Whole thread Raw
In response to	Re: [HACKERS] [POC] hash partitioning (Ashutosh Bapat <ashutosh.bapat@enterprisedb.com>)
List	pgsql-hackers

Tree view

On Wed, May 10, 2017 at 6:04 PM, Ashutosh Bapat
<ashutosh.bapat@enterprisedb.com> wrote:
> On Wed, May 3, 2017 at 6:39 PM, amul sul <sulamul@gmail.com> wrote:
>> On Thu, Apr 27, 2017 at 1:42 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>>
>>>
>>> This is not yet a detailed review - I may be missing things, and
>>> review and commentary from others is welcome.  If there is no major
>>> disagreement with the idea of moving forward using Amul's patch as a
>>> base, then I will do a more detailed review of that patch (or,
>>> hopefully, an updated version that addresses the above comments).
>>
>
> I agree that Amul's approach makes dump/restore feasible whereas
> Nagata-san's approach makes that difficult. That is a major plus point
> about Amul's patch. Also, it makes it possible to implement
> Nagata-san's syntax, which is more user-friendly in future.
>
> Here are some review comments after my initial reading of Amul's patch:
>
> Hash partitioning will partition the data based on the hash value of the
> partition key. Does that require collation? Should we throw an error/warning if
> collation is specified in PARTITION BY clause?
>
> +    int           *indexes;        /* Partition indexes; in case of hash
> +                                 * partitioned table array length will be
> +                                 * value of largest modulus, and for others
> +                                 * one entry per member of the datums array
> +                                 * (plus one if range partitioned table) */
> This may be rewritten as "Partition indexes: For hash partitioned table the
> number of indexes will be same as the largest modulus. For list partitioned
> table the number of indexes will be same as the number of datums. For range
> partitioned table the number of indexes will be number of datums plus one.".
> You may be able to reword it to a shorter version, but essentially we will have
> separate description for each strategy.
>
Okay, will fix this.

> I guess, we need to change the comments for the other members too. For example
> "datums" does not contain tuples with key->partnatts attributes for hash
> partitions. It contains a tuple with two attributes, modulus and remainder. We
> may not want to track null_index separately since rows with NULL partition key
> will fit in the partition corresponding to the hash value of NULL. OR may be we
> want to set null_index to partition which contains NULL values, if there is a
> partition created for corresponding remainder, modulus pair and set has_null
> accordingly. Accordingly we will need to update the comments.
>
> cal_hash_value() may be renamed as calc_has_value() or compute_hash_value()?
>
Okay, will rename to compute_hash_value().

> Should we change the if .. else if .. construct in RelationBuildPartitionDesc()
> to a switch case? There's very less chance that we will support a fourth
> partitioning strategy, so if .. else if .. may be fine.
>
> +                        int        mod = hbounds[i]->modulus,
> +                                place = hbounds[i]->remainder;
> Although there are places in the code where we separate variable declaration
> with same type by comma, most of the code declares each variable with the data
> type on separate line. Should variable "place" be renamed as "remainder" since
> that's what it is ultimately?
>
Okay, will rename "place" to "remainder".

> RelationBuildPartitionDesc() fills up mapping array but never uses it. In this

Agreed, mapping array is not that much useful but not useless, it
required at the end of RelationBuildPartitionDesc() while assigning
OIDs to result->oids, see for-loop just before releasing mapping
memory.

> code the index into mapping array itself is the mapping so it doesn't need to
> be maintained separately like list partiioning case. Similary next_index usage
> looks unnecessary, although that probably improves readability, so may be fine.
>
Anyway, will remove uses of "next_index".

> + *   for p_p1: satisfies_hash_partition(2, 1, pkey, value)
> + *   for p_p2: satisfies_hash_partition(4, 2, pkey, value)
> + *   for p_p3: satisfies_hash_partition(8, 0, pkey, value)
> + *   for p_p4: satisfies_hash_partition(8, 4, pkey, value)
> What the function builds is satisfies_hash_partition(2, 1, pkey). I don't see
> code to add value as an argument to the function. Is that correct?
>
Sorry for confusion,  "pkey" & "value" are the column of table in the
give example.
Renamed those column name to "a" & "b".

> +                        int        modulus = DatumGetInt32(datum);
> May be you want to rename this variable to greatest_modulus like in the other
> places.
>
Okay, will fix this.

> +                        Assert(spec->modulus > 0 && spec->remainder >= 0);
> I liked this assertion. Do you want to add spec->modulus > spec->reminder also
> here?
>
Okay, will add this too.

> +    char       *strategy;        /* partitioning strategy
> +                                   ('hash', 'list' or 'range') */
>
> We need the second line to start with '*'
>
> +-- check validation when attaching list partitions
> Do you want to say "hash" instead of "list" here?
>
You are correct, will fix this too.

> I think we need to explain the reasoning behind this syntax somewhere
> as a README or in the documentation or in the comments. Otherwise it's
> difficult to understand how various pieces of code are related.
>
Not sure about README, I think we should focus on documentation & code
comments first, and then think about developer perspective README if
hash partitioning logic is too difficult to understand .

> This is not full review. I am still trying to understand how the hash
> partitioning implementation fits with list and range partitioning. I
> am going to continue to review this patch further.
>
Thanks a lots for your help.

Regards,
Amul

pgsql-hackers by date:

From: Simon Riggs
Date: 12 May 2017, 16:19:17
Subject: Re: [HACKERS] Time based lag tracking for logical replication

From: Amit Khandekar
Date: 12 May 2017, 16:47:23
Subject: Re: [HACKERS] UPDATE of partition key

Re: [HACKERS] [POC] hash partitioning - Mailing list pgsql-hackers

Previous

Next