Re: [HACKERS] [POC] hash partitioning - Mailing list pgsql-hackers

From amul sul
Subject Re: [HACKERS] [POC] hash partitioning
Date
Msg-id CAAJ_b97D5sJRZoeRcXkw=Ao+F95-d9b9h+dHEPUAp5UipHQPkA@mail.gmail.com
Whole thread Raw
In response to Re: [HACKERS] [POC] hash partitioning  (Ashutosh Bapat <ashutosh.bapat@enterprisedb.com>)
Responses Re: [HACKERS] [POC] hash partitioning  (Ashutosh Bapat <ashutosh.bapat@enterprisedb.com>)
Re: [HACKERS] [POC] hash partitioning  (Dilip Kumar <dilipbalaut@gmail.com>)
List pgsql-hackers
On Wed, May 17, 2017 at 11:11 AM, Ashutosh Bapat
<ashutosh.bapat@enterprisedb.com> wrote:
> On Wed, May 17, 2017 at 12:04 AM, amul sul <sulamul@gmail.com> wrote:
>> On Tue, May 16, 2017 at 10:00 PM, Dilip Kumar <dilipbalaut@gmail.com> wrote:
>>> On Tue, May 16, 2017 at 4:22 PM, amul sul <sulamul@gmail.com> wrote:
>>>> v6 patch has bug in partition oid mapping and indexing, fixed in the
>>>> attached version.
>>>>
>>>> Now partition oids will be arranged in the ascending order of hash
>>>> partition bound  (i.e. modulus and remainder sorting order)
>>>
>>> Thanks for the update patch. I have some more comments.
>>>
>>> ------------
>>> + if (spec->remainder < 0)
>>> + ereport(ERROR,
>>> + (errcode(ERRCODE_INVALID_TABLE_DEFINITION),
>>> +  errmsg("hash partition remainder must be less than modulus")));
>>>
>>> I think this error message is not correct, you might want to change it
>>> to "hash partition remainder must be non-negative integer"
>>>
>>
>> Fixed in the attached version;  used "hash partition remainder must be
>> greater than or equal to 0" instead.
>
> I would suggest "non-zero positive", since that's what we are using in
> the documentation.
>

Understood, Fixed in the attached version.

>>
>>> -------
>>>
>>> +         The table is partitioned by specifying remainder and modulus for each
>>> +         partition. Each partition holds rows for which the hash value of
>>>
>>> Wouldn't it be better to say "modulus and remainder" instead of
>>> "remainder and modulus" then it will be consistent?
>>>
>>
>> You are correct, fixed in the attached version.
>>
>>> -------
>>> +       An <command>UPDATE</> that causes a row to move from one partition to
>>> +       another fails, because
>>>
>>> fails, because -> fails because
>>>
>>
>> This hunk is no longer exists in the attached patch, that was mistaken
>> copied, sorry about that.
>>
>>> -------
>>>
>>> Wouldn't it be a good idea to document how to increase the number of
>>> hash partitions, I think we can document it somewhere with an example,
>>> something like Robert explained upthread?
>>>
>>> create table foo (a integer, b text) partition by hash (a);
>>> create table foo1 partition of foo with (modulus 2, remainder 0);
>>> create table foo2 partition of foo with (modulus 2, remainder 1);
>>>
>>> You can detach foo1, create two new partitions with modulus 4 and
>>> remainders 0 and 2, and move the data over from the old partition
>>>
>>> I think it will be good information for a user to have? or it's
>>> already documented and I missed it?
>>>
>
> This is already part of documentation contained in the patch.
>
> Here are some more comments
> @@ -3296,6 +3311,14 @@ ALTER TABLE measurement ATTACH PARTITION
> measurement_y2008m02
>         not the partitioned table.
>        </para>
>       </listitem>
> +
> +     <listitem>
> +      <para>
> +       An <command>UPDATE</> that causes a row to move from one partition to
> +       another fails, because the new value of the row fails to satisfy the
> +       implicit partition constraint of the original partition.
> +      </para>
> +     </listitem>
>      </itemizedlist>
>      </para>
>      </sect3>
> The description in this chunk is applicable to all the kinds of partitioning.
> Why should it be part of a patch implementing hash partitioning?
>

This was already addressed in the previous patch(v8).

> +        Declarative partitioning only supports hash, list and range
> +        partitioning, whereas table inheritance allows data to be
> +        divided in a manner of the user's choosing.  (Note, however,
> +        that if constraint exclusion is unable to prune partitions
> +        effectively, query performance will be very poor.)
> Looks like the line width is less than 80 characters.
>

Fixed in the attached version.

> In partition_bounds_equal(), please add comments explaining why is it safe to
> check just the indexes? May be we should add code under assertion to make sure
> that the datums are equal as well.

Added assert in the attached version.

> The comment could be something
> like, "If two partitioned tables have different greatest moduli, their
> partition schemes don't match. If they have same greatest moduli, and
> all remainders have different indexes, they all have same modulus
> specified and the partitions are ordered by remainders, thus indexes
> array will be an identity i.e. index[i] = i. If the partition
> corresponding to a given remainder exists, it will have same index
> entry for both partitioned tables or if it's missing it will be -1.
> Thus if indexes array matches, corresponding datums array matches. If
> there are multiple remainders corresponding to a given partition,
> their partitions are ordered by the lowest of the remainders, thus if
> indexes array matches, both of the tables have same indexes arrays, in
> both the tables remainders corresponding to multiple partitions all
> have same indexes and thus same modulus. Thus again if the indexes are
> same, datums are same.".
>

Thanks, added with minor modification.

> In the same function
>     if (key->strategy == PARTITION_STRATEGY_HASH)
>     {
>         int            greatest_modulus;
>
>         /*
>          * Compare greatest modulus of hash partition bound which
>          * is the last element of datums array.
>          */
>         if (b1->datums[b1->ndatums - 1][0] != b2->datums[b2->ndatums - 1][0])
>             return false;
>
>         /* Compare indexes */
>         greatest_modulus = DatumGetInt32(b1->datums[b1->ndatums - 1][0]);
>         for (i = 0; i < greatest_modulus; i++)
>             if (b1->indexes[i] != b2->indexes[i])
>                 return false;
>     }
> if we return true from where this block ends, we will save one indenation level
> for rest of the code and also FWIW extra diffs in this patch because of this
> indentation change.
>

I still do believe having this code in the IF - ELSE block will be
better for longterm, rather having code clutter to avoid diff that
unpleasant for now.

> +        /*
> +         * Hash operator classes provide only equality, not ordering.
> +         * Collation, which is relevant for ordering and not equality is
> +         * irrelevant for hash partitioning.
> +         */
> A comma is missing after "equality", and may be we need "for" before
> "equality".
>          * Collation, which is relevant for ordering and not equality, is
>
> +         * we use hash operator class. */
> */ should be on new line.
>

Fixed.

Regards,
Amul

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Attachment

pgsql-hackers by date:

Previous
From: Thomas Munro
Date:
Subject: Re: [HACKERS] transition table behavior with inheritance appearsbroken (was: Declarative partitioning - another take)
Next
From: Ildus Kurbangaliev
Date:
Subject: Re: [HACKERS] Bug in ExecModifyTable function and trigger issuesfor foreign tables