Re: [HACKERS] [POC] hash partitioning - Mailing list pgsql-hackers

From Ashutosh Bapat
Subject Re: [HACKERS] [POC] hash partitioning
Date
Msg-id CAFjFpRdobJt-gHEvg+FxmWfTh+vMsEeMpsnQNsP7ed2KoNVtPA@mail.gmail.com
Whole thread Raw
In response to Re: [HACKERS] [POC] hash partitioning  (amul sul <sulamul@gmail.com>)
Responses Re: [HACKERS] [POC] hash partitioning  (Ashutosh Bapat <ashutosh.bapat@enterprisedb.com>)
Re: [HACKERS] [POC] hash partitioning  (amul sul <sulamul@gmail.com>)
List pgsql-hackers
On Sun, May 14, 2017 at 12:30 PM, amul sul <sulamul@gmail.com> wrote:
> On Fri, May 12, 2017 at 10:39 PM, Ashutosh Bapat
> <ashutosh.bapat@enterprisedb.com> wrote:
>> On Fri, May 12, 2017 at 6:08 PM, amul sul <sulamul@gmail.com> wrote:
>>> Hi,
>>>
>>> Please find the following updated patches attached:
>>>
>>> 0001-Cleanup.patch : Does some cleanup and code refactoring required
>>> for hash partition patch. Otherwise, there will be unnecessary diff in
>>> 0002 patch
>>
>> Thanks for splitting the patch.
>>
>> +                if (isnull[0])
>> +                    cur_index = partdesc->boundinfo->null_index;
>> This code assumes that null_index will be set to -1 when has_null is false. Per
>> RelationBuildPartitionDesc() this is true. But probably we should write this
>> code as
>> if (isnull[0])
>> {
>>     if (partdesc->boundinfo->has_null)
>>         cur_index = partdesc->boundinfo->null_index;
>> }
>> That way we are certain that when has_null is false, cur_index = -1 similar to
>> the original code.
>>
> Okay will add this.

Thanks.

> I still don't understood point of having has_null
> variable, if no null accepting partition exists then null_index is
> alway set to -1 in RelationBuildPartitionDesc.  Anyway, let not change
> the original code.

I agree. has_null might have been folded as null_index == -1. But
that's not the problem of this patch.

0001 looks good to me now.


>
> [...]
>>
>> +        if (key->strategy == PARTITION_STRATEGY_HASH)
>> +        {
>> +            ndatums = nparts;
>> +            hbounds = (PartitionHashBound **) palloc(nparts *
>> +
>> sizeof(PartitionHashBound *));
>> +            i = 0;
>> +            foreach (cell, boundspecs)
>> +            {
>> +                PartitionBoundSpec *spec = lfirst(cell);
>> +
>> [ clipped ]
>> +                hbounds[i]->index = i;
>> +                i++;
>> +            }
>> For list and range partitioned table we order the bounds so that two
>> partitioned tables have them in the same order irrespective of order in which
>> they are specified by the user or hence stored in the catalogs. The partitions
>> then get indexes according the order in which their bounds appear in ordered
>> arrays of bounds. Thus any two partitioned tables with same partition
>> specification always have same PartitionBoundInfoData. This helps in
>> partition-wise join to match partition bounds of two given tables.  Above code
>> assigns the indexes to the partitions as they appear in the catalogs. This
>> means that two partitioned tables with same partition specification but
>> different order for partition bound specification will have different
>> PartitionBoundInfoData represenation.
>>
>> If we do that, probably partition_bounds_equal() would reduce to just matching
>> indexes and the last element of datums array i.e. the greatest modulus datum.
>> If ordered datums array of two partitioned table do not match exactly, the
>> mismatch can be because missing datums or different datums. If it's a missing
>> datum it will change the greatest modulus or have corresponding entry in
>> indexes array as -1. If the entry differs it will cause mismatching indexes in
>> the index arrays.
>>
> Make sense, will fix this.

I don't see this being addressed in the patches attached in the reply to Dilip.

>
>>
>> +        if (key->partattrs[i] != 0)
>> +        {
>> +            keyCol = (Node *) makeVar(1,
>> +                                      key->partattrs[i],
>> +                                      key->parttypid[i],
>> +                                      key->parttypmod[i],
>> +                                      key->parttypcoll[i],
>> +                                      0);
>> +
>> +            /* Form hash_fn(value) expression */
>> +            keyCol = (Node *) makeFuncExpr(key->partsupfunc[i].fn_oid,
>> +                                    get_fn_expr_rettype(&key->partsupfunc[i]),
>> +                                    list_make1(keyCol),
>> +                                    InvalidOid,
>> +                                    InvalidOid,
>> +                                    COERCE_EXPLICIT_CALL);
>> +        }
>> +        else
>> +        {
>> +            keyCol = (Node *) copyObject(lfirst(partexprs_item));
>> +            partexprs_item = lnext(partexprs_item);
>> +        }
>> I think we should add FuncExpr for column Vars as well as expressions.
>>
> Okay, will fix this.

Here, please add a check similar to get_quals_for_range()
1840             if (partexprs_item == NULL)
1841                 elog(ERROR, "wrong number of partition key expressions");


>
>> I think we need more comments for compute_hash_value(), mix_hash_value() and
>> satisfies_hash_partition() as to what each of them accepts and what it
>> computes.
>>
>> +        /* key's hash values start from third argument of function. */
>> +        if (!PG_ARGISNULL(i + 2))
>> +        {
>> +            values[i] = PG_GETARG_DATUM(i + 2);
>> +            isnull[i] = false;
>> +        }
>> +        else
>> +            isnull[i] = true;
>> You could write this as
>> isnull[i] = PG_ARGISNULL(i + 2);
>> if (isnull[i])
>>     values[i] = PG_GETARG_DATUM(i + 2);
>>
> Okay.

If we have used this technique somewhere else in PG code, please
mention that function/place.       /*        * Rotate hash left 1 bit before mixing in the next column.  This        *
preventsequal values in different keys from cancelling each other.        */
 


>
>> +                    foreach (lc, $5)
>> +                    {
>> +                        DefElem    *opt = (DefElem *) lfirst(lc);
>> A search on WITH in gram.y shows that we do not handle WITH options in gram.y.
>> Usually they are handled at the transformation stage. Why is this an exception?
>> If you do that, we can have all the error handling in
>> transformPartitionBound().
>>
> If so, ForValues need to return list for hash and PartitionBoundSpec
> for other two; wouldn't  that break code consistency? And such
> validation is not new in gram.y see xmltable_column_el.

Thanks for pointing that out. Ok, then may be leave it in gram.y. But
may be we should move the error handling in transform function.


>
>> +DATA(insert OID = 5028 ( satisfies_hash_partition PGNSP PGUID 12 1 0
>> 2276 0 f f f f f f i s 3 0 16 "23 23 2276" _null_ _null_ _null_ _null_
>> _null_ satisfies_hash_partition _null_ _null_ _null_ ));
>> Why is third argument to this function ANY? Shouldn't it be INT4ARRAY (variadic
>> INT4)?
>>
> Will use INT4ARRAY in next patch, but I am little sceptical of it.  we
> need an unsigned int32, but unfortunately there is not variadic uint32
> support.  How about INT8ARRAY?

Hmm, I think as long as the binary representation of given unsigned
integer doesn't change in the function call, we could cast an INT32
datums into unsigned int32, so spending extra 4 bytes per partition
key doesn't look like worth the effort.

A related question is, all hash functions have return type as
"integer" but internally they return uint32. Why not to do the same
for this function as well?

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company



pgsql-hackers by date:

Previous
From: Mark Dilger
Date:
Subject: Re: [HACKERS] Event triggers + table partitioning cause server crash in current master
Next
From: Ashutosh Bapat
Date:
Subject: Re: [HACKERS] [POC] hash partitioning