Re: [HACKERS] [POC] hash partitioning - Mailing list pgsql-hackers

From amul sul
Subject Re: [HACKERS] [POC] hash partitioning
Date
Msg-id CAAJ_b95eMdLuiM2_gAZ=Bz3yoXg+OG1mHs6H4US5eRhVptPwRg@mail.gmail.com
Whole thread Raw
In response to Re: [HACKERS] [POC] hash partitioning  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: [HACKERS] [POC] hash partitioning  (amul sul <sulamul@gmail.com>)
List pgsql-hackers
On Fri, May 19, 2017 at 10:35 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Fri, May 19, 2017 at 5:32 AM, amul sul <sulamul@gmail.com> wrote:
>> Updated patch attached.  0001-patch rebased against latest head.
>> 0002-patch also incorporates code comments and error message changes
>> as per Robert's & your suggestions. Thanks !
>
> -                if (spec->strategy != PARTITION_STRATEGY_LIST)
> -                    elog(ERROR, "invalid strategy in partition bound spec");
> +                Assert(spec->strategy == PARTITION_STRATEGY_LIST);
>
> Let's just drop these hunks.  I realize this is a response to a review
> comment I made, but I take it back.  If the existing code is already
> doing it this way, there's no real need to revise it.  The patch
> doesn't even make it consistent anyway, since elsewhere you elog() for
> a similar case.  Perhaps elog() is best anyway.
>
Done.

> -    partitioning methods include range and list, where each partition is
> -    assigned a range of keys and a list of keys, respectively.
> +    partitioning methods include hash, range and list, where each partition is
> +    assigned a modulus and remainder of keys, a range of keys and a list of
> +    keys, respectively.
>
> I think this sentence has become too long and unwieldy, and is more
> unclear than helpful.  I'd just write "The currently supported
> partitioning methods are list, range, and hash."  The use of the word
> include is actually wrong here, because it implies that there are more
> not mentioned here, which is false.
>
Done.

> -      expression.  If no btree operator class is specified when creating a
> -      partitioned table, the default btree operator class for the datatype will
> -      be used.  If there is none, an error will be reported.
> +      expression.  List and range partitioning uses only btree operator class.
> +      Hash partitioning uses only hash operator class. If no operator class is
> +      specified when creating a partitioned table, the default operator class
> +      for the datatype will be used.  If there is none, an error will be
> +      reported.
> +     </para>
>
> I suggest: If no operator class is specified when creating a
> partitioned table, the default operator class of the appropriate type
> (btree for list and range partitioning, hash for hash partitioning)
> will be used.  If there is none, an error will be reported.
>
Done.

> +     <para>
> +      Since hash partitiong operator class, provide only equality,
> not ordering,
> +      collation is not relevant in hash partition key column. An error will be
> +      reported if collation is specified.
>
> partitiong -> partitioning.  Also, remove the comma after "operator
> class" and change "not relevant in hash partition key column" to "not
> relevant for hash partitioning".  Also change "if collation is
> specified" to "if a collation is specified".
>
Done.

> +   Create a hash partitioned table:
> +<programlisting>
> +CREATE TABLE orders (
> +    order_id     bigint not null,
> +    cust_id      bigint not null,
> +    status       text
> +) PARTITION BY HASH (order_id);
> +</programlisting></para>
>
> Move this down so it's just above the example of creating partitions.
>
Done.

> + * For range and list partitioned tables, datums is an array of datum-tuples
> + * with key->partnatts datums each.
> + * For hash partitioned tables, it is an array of datum-tuples with 2 datums,
> + * modulus and remainder, corresponding to a given partition.
>
> Second line is very short; reflow as one paragraph.
>
Done

>   * In case of range partitioning, it stores one entry per distinct range
>   * datum, which is the index of the partition for which a given datum
>   * is an upper bound.
> + * In the case of hash partitioning, the number of the entries in the indexes
> + * array is same as the greatest modulus amongst all partitions. For a given
> + * partition key datum-tuple, the index of the partition which would
> accept that
> + * datum-tuple would be given by the entry pointed by remainder produced when
> + * hash value of the datum-tuple is divided by the greatest modulus.
>
> Insert line break before the new text as a paragraph break.

Will wait for more inputs on Ashutosh's explanation upthread.

>
> +    char        strategy;        /* hash, list or range bounds? */
>
> Might be clearer to just write /* hash, list, or range? */ or /*
> bounds for hash, list, or range? */
>

Done, used "hash, list, or range?"

>
> +static uint32 compute_hash_value(PartitionKey key, Datum *values,
> bool *isnull);
> +
>
> I think there should be a blank line after this but not before it.
>

Done.

> I don't really see why hash partitioning needs to touch
> partition_bounds_equal() at all.  Why can't the existing logic work
> for hash partitioning without change?
>

Unlike list and range partition, ndatums does not represents size of
the indexes array, also dimension of datums  array in not the same as
a key->partnatts.

> +                                valid_bound = true;
>
> valid_modulus, maybe?
>

Sure, added.

> -                   errmsg("data type %s has no default btree operator class",
> -                          format_type_be(atttype)),
> -                         errhint("You must specify a btree operator
> class or define a default btree operator class for the data type.")));
> +                      errmsg("data type %s has no default %s operator class",
> +                             format_type_be(atttype), am_method),
> +                         errhint("You must specify a %s operator
> class or define a default %s operator class for the data type.",
> +                                 am_method, am_method)));
>
> Let's use this existing wording from typecmds.c:
>
>                      errmsg("data type %s has no default operator
> class for access method \"%s\"",
>
> and for the hint, maybe: You must specify an operator class or define
> a default operator class for the data type.  Leave out the %s, in
> other words.
>

Done.

> +        /*
> +         * Hash operator classes provide only equality, not ordering.
> +         * Collation, which is relevant for ordering and not for equality, is
> +         * irrelevant for hash partitioning.
> +         */
> +        if (*strategy == PARTITION_STRATEGY_HASH && pelem->collation != NIL)
> +            ereport(ERROR,
> +                    (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
> +                     errmsg("cannot use collation for hash partitioning"),
> +                     parser_errposition(pstate, pelem->location)));
>
> This error message is not very informative, and it requires
> propagating information about the partitioning type into parts of the
> code that otherwise don't require it.  I was waffling before on
> whether to ERROR here; I think now I'm in favor of ignoring the
> problem.  The collation won't do any harm; it just won't affect the
> behavior.
>

Removed.

> +         * Identify opclass to use.  For list and range partitioning we use
> +         * only btree operator class, which seems enough for those.  For hash
> +         * partitioning, we use hash operator class.
>
> Strange wording.  Suggest: Identify the appropriate operator class.
> For list and range partitioning, we use a btree operator class; hash
> partitioning uses a hash operator class.
>

Done

> +            FOR VALUES WITH '(' hash_partbound ')' /*TODO: syntax is
> not finalised*/
>
> Remove the comment.
>

Done.

> +                    foreach (lc, $5)
> +                    {
> +                        DefElem    *opt = (DefElem *) lfirst(lc);
> +
> +                        if (strcmp(opt->defname, "modulus") == 0)
> +                            n->modulus = defGetInt32(opt);
> +                        else if (strcmp(opt->defname, "remainder") == 0)
> +                            n->remainder = defGetInt32(opt);
> +                        else
> +                            ereport(ERROR,
> +                                    (errcode(ERRCODE_SYNTAX_ERROR),
> +                                     errmsg("unrecognized hash
> partition bound specification \"%s\"",
> +                                            opt->defname),
> +                                     parser_errposition(opt->location)));
> +                    }
>
> This logic doesn't complain if the same option is specified more than
> once.  I suggest adding a check for that, and also pushing this logic
> out into a helper function that gets called here instead of including
> it inline.
>

Added duplicate error.
About separate helper function,  can't we have as it is, because, imo,
we might not going to use that elsewhere?


> +                   errmsg("hash partition modulus must be a positive
> integer")));
>
> modulus for hash partition
>
> +                     errmsg("hash partition remainder must be a
> non-negative integer")));
>
> remainder for hash partition
>
> +            errmsg("hash partition modulus must be greater than remainder")));
>
> modulus for hash partition must be greater than remainder
>

Done.  Similar changes in gram.y as well.

> +-- values are hashed, row may map to different partitions, which result in
>
> the row
>
> +-- regression failure.  To avoid this, let's create non-default hash function
>
> create a non-default

Done.

Updated patch attached. Thanks a lot for review.

Regards,
Amul

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Attachment

pgsql-hackers by date:

Previous
From: Fabien COELHO
Date:
Subject: Re: [HACKERS] psql - add special variable to reflect the last querystatus
Next
From: tushar
Date:
Subject: [HACKERS] "create publication..all tables" ignore 'partition not supported'error