Re: [POC] A better way to expand hash indexes. - Mailing list pgsql-hackers

From Amit Kapila
Subject Re: [POC] A better way to expand hash indexes.
Date
Msg-id CAA4eK1LKBds9hY2ByaPHxNs--hPZi5gkGVvaUCpfGmUTXRhqJg@mail.gmail.com
In response to Re: [POC] A better way to expand hash indexes.  (Amit Kapila <amit.kapila16@gmail.com>)
List pgsql-hackers
On Mon, Mar 27, 2017 at 11:21 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> On Sun, Mar 26, 2017 at 11:26 AM, Mithun Cy <mithun.cy@enterprisedb.com> wrote:
>> Thanks, Amit for the review.
>> On Sat, Mar 25, 2017 at 7:03 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
>>>
>>> I think one-dimensional patch has fewer places to touch, so that looks
>>> better to me.  However, I think there is still hard coding and
>>> assumptions in code which we should try to improve.
>>
>> Great! I will continue with the one-dimensional spares improvement.
>>
>
> @@ -563,18 +563,20 @@ _hash_init_metabuffer(Buffer buf, double
> num_tuples, RegProcedure procid,\
> {
> ..
>   else
> - num_buckets = ((uint32) 1) << _hash_log2((uint32) dnumbuckets);
> + num_buckets = _hash_get_totalbuckets(_hash_spareindex(dnumbuckets));
> ..
> ..
> - metap->hashm_maxbucket = metap->hashm_lowmask = num_buckets - 1;
> - metap->hashm_highmask = (num_buckets << 1) - 1;
> + metap->hashm_maxbucket = num_buckets - 1;
> +
> + /* set highmask, which should be sufficient to cover num_buckets. */
> + metap->hashm_highmask = (1 << (_hash_log2(num_buckets))) - 1;
> + metap->hashm_lowmask = (metap->hashm_highmask >> 1);
> }
>
> I think we can't change the number of buckets to be created, or the
> lowmask and highmask calculation, here without also modifying
> _h_spoolinit(), because it sorts the data to be inserted based on the
> hashkey, which in turn depends on the number of buckets we are going
> to create during the create-index operation.  We either need to let
> create index keep creating buckets in power-of-two fashion, or we
> need to update _h_spoolinit() according to the new computation.  One
> minor drawback of using the power-of-two scheme during create index
> is that it can waste space and will be inconsistent with what the
> patch does during the split operation.
>

Few more comments:

1.
@@ -149,6 +149,84 @@ _hash_log2(uint32 num)
 	return i;
 }

+#define SPLITPOINT_PHASES_PER_GRP 4
+#define SPLITPOINT_PHASE_MASK (SPLITPOINT_PHASES_PER_GRP - 1)
+#define Buckets_First_Split_Group 4

The defines should be at the beginning of the file.

2.
+/*
+ * At splitpoint group 0 we have 2 ^ (0 + 2) = 4 buckets, then at splitpoint
+ * group 1 we have 2 ^ (1 + 2) = 8 total buckets. As the doubling continues at
+ * splitpoint group "x" we will have 2 ^ (x + 2) total buckets. Total buckets
+ * before splitpoint group x will be 2 ^ (x + 1). At each phase of allocation
+ * within splitpoint group we add 2 ^ (x - 1) buckets, as we have to divide the
+ * task of allocation of 2 ^ (x + 1) buckets among 4 phases.
+ *
+ * Also, the splitpoint group of a given bucket can be taken as
+ * (_hash_log2(bucket) - 2). We subtract 2 because each group has
+ * 2 ^ (x + 2) buckets.
+ */
..

+#define BUCKET_TO_SPLITPOINT_GRP(num_bucket) \
+ ((num_bucket <= Buckets_First_Split_Group) ? 0 : \
+ (_hash_log2(num_bucket) - 2))

In the above computation the +2 and -2 still bother me.  I think you
need them because you have defined splitpoint group zero to have four
buckets; how about not forcing that, and instead defining splitpoint
phases only from the splitpoint that has four or more buckets?

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


