Re: [POC] A better way to expand hash indexes. - Mailing list pgsql-hackers

From Amit Kapila
Subject Re: [POC] A better way to expand hash indexes.
Date
Msg-id CAA4eK1LKBds9hY2ByaPHxNs--hPZi5gkGVvaUCpfGmUTXRhqJg@mail.gmail.com
In response to Re: [POC] A better way to expand hash indexes.  (Amit Kapila <amit.kapila16@gmail.com>)
List pgsql-hackers
On Mon, Mar 27, 2017 at 11:21 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> On Sun, Mar 26, 2017 at 11:26 AM, Mithun Cy <mithun.cy@enterprisedb.com> wrote:
>> Thanks, Amit for the review.
>> On Sat, Mar 25, 2017 at 7:03 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
>>>
>>> I think one-dimensional patch has fewer places to touch, so that looks
>>> better to me.  However, I think there is still hard coding and
>>> assumptions in code which we should try to improve.
>>
>> Great! I will continue with the one-dimensional spares improvement.
>>
>
> @@ -563,18 +563,20 @@ _hash_init_metabuffer(Buffer buf, double
> num_tuples, RegProcedure procid,\
> {
> ..
>   else
> - num_buckets = ((uint32) 1) << _hash_log2((uint32) dnumbuckets);
> + num_buckets = _hash_get_totalbuckets(_hash_spareindex(dnumbuckets));
> ..
> ..
> - metap->hashm_maxbucket = metap->hashm_lowmask = num_buckets - 1;
> - metap->hashm_highmask = (num_buckets << 1) - 1;
> + metap->hashm_maxbucket = num_buckets - 1;
> +
> + /* set highmask, which should be sufficient to cover num_buckets. */
> + metap->hashm_highmask = (1 << (_hash_log2(num_buckets))) - 1;
> + metap->hashm_lowmask = (metap->hashm_highmask >> 1);
> }
>
> I think we can't change the number of buckets to be created, or the
> lowmask and highmask calculation, here without also modifying
> _h_spoolinit(), because it sorts the data to be inserted based on the
> hashkey, which in turn depends on the number of buckets we are going
> to create during the create-index operation.  We either need to let
> create index keep creating buckets in power-of-two fashion, or we
> need to update _h_spoolinit() according to the new computation.  One
> minor drawback of using the power-of-two scheme during create index
> is that it can waste space and will be inconsistent with what the
> patch does during the split operation.
>

Few more comments:

1.
@@ -149,6 +149,84 @@ _hash_log2(uint32 num)
 	return i;
 }

+#define SPLITPOINT_PHASES_PER_GRP 4
+#define SPLITPOINT_PHASE_MASK (SPLITPOINT_PHASES_PER_GRP - 1)
+#define Buckets_First_Split_Group 4

The defines should be at the beginning of the file.

2.
+/*
+ * At splitpoint group 0 we have 2 ^ (0 + 2) = 4 buckets, then at splitpoint
+ * group 1 we have 2 ^ (1 + 2) = 8 total buckets. As the doubling continues at
+ * splitpoint group "x" we will have 2 ^ (x + 2) total buckets. Total buckets
+ * before splitpoint group x will be 2 ^ (x + 1). At each phase of allocation
+ * within splitpoint group we add 2 ^ (x - 1) buckets, as we have to divide the
+ * task of allocation of 2 ^ (x + 1) buckets among 4 phases.
+ *
+ * Also, the splitpoint group of a given bucket can be taken as
+ * (_hash_log2(bucket) - 2). We subtract 2 because each group has
+ * 2 ^ (x + 2) buckets.
+ */
..

+#define BUCKET_TO_SPLITPOINT_GRP(num_bucket) \
+ ((num_bucket <= Buckets_First_Split_Group) ? 0 : \
+ (_hash_log2(num_bucket) - 2))

In the above computation the +2 and -2 still bother me.  I think you
need them because you have defined splitpoint group zero to have four
buckets; how about not forcing that, and instead defining splitpoint
phases only from the splitpoint that has four or more buckets?

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


