Re: [POC] A better way to expand hash indexes. - Mailing list pgsql-hackers

From Amit Kapila
Subject Re: [POC] A better way to expand hash indexes.
Date
Msg-id CAA4eK1L=gE+YW1OcZiUbmnboapVvZJu0jJp7Su7oqZE6pjVKvA@mail.gmail.com
In response to [HACKERS] [POC] A better way to expand hash indexes.  (Mithun Cy <mithun.cy@enterprisedb.com>)
Responses Re: [POC] A better way to expand hash indexes.  (Amit Kapila <amit.kapila16@gmail.com>)
List pgsql-hackers
On Sun, Mar 26, 2017 at 11:26 AM, Mithun Cy <mithun.cy@enterprisedb.com> wrote:
> Thanks, Amit for the review.
> On Sat, Mar 25, 2017 at 7:03 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
>>
>> I think the one-dimensional patch has fewer places to touch, so that
>> looks better to me.  However, I think there are still hard-coded
>> values and assumptions in the code which we should try to improve.
>
> Great! I will continue with the one-dimensional spares improvement.
>

@@ -563,18 +563,20 @@ _hash_init_metabuffer(Buffer buf, double num_tuples, RegProcedure procid,
{
.. else
- num_buckets = ((uint32) 1) << _hash_log2((uint32) dnumbuckets);
+ num_buckets = _hash_get_totalbuckets(_hash_spareindex(dnumbuckets));
..
..
- metap->hashm_maxbucket = metap->hashm_lowmask = num_buckets - 1;
- metap->hashm_highmask = (num_buckets << 1) - 1;
+ metap->hashm_maxbucket = num_buckets - 1;
+
+ /* set highmask, which should be sufficient to cover num_buckets. */
+ metap->hashm_highmask = (1 << (_hash_log2(num_buckets))) - 1;
+ metap->hashm_lowmask = (metap->hashm_highmask >> 1);
}

I think we can't change the number of buckets to be created, or the
lowmask and highmask calculation, here without also modifying
_h_spoolinit().  It sorts the data to be inserted by hashkey, and that
ordering in turn depends on the number of buckets we are going to
create during the create index operation.  We either need to let
create index continue to create buckets in power-of-two fashion, or we
need to update _h_spoolinit() to match the new computation.  One minor
drawback of keeping the power-of-two scheme for bucket creation during
create index is that it can waste space and will be inconsistent with
what the patch does during a split operation.
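
To make the concern concrete, here is a minimal standalone sketch (not
PostgreSQL code).  The helper names ceil_log2, masks_for, key_to_bucket
and pow2_sort_key are hypothetical.  ceil_log2 stands in for
_hash_log2(), masks_for follows the mask computation in the hunk quoted
above, and key_to_bucket follows the usual highmask/lowmask mapping.
pow2_sort_key assumes the spool orders tuples by a single power-of-two
mask, which is an assumption made for illustration only.

```c
#include <stdint.h>

/* Smallest i such that (1 << i) >= num; stand-in for _hash_log2(). */
static uint32_t ceil_log2(uint32_t num)
{
    uint32_t i = 0;
    uint32_t limit = 1;

    while (limit < num)
    {
        limit <<= 1;
        i++;
    }
    return i;
}

typedef struct
{
    uint32_t maxbucket;
    uint32_t highmask;
    uint32_t lowmask;
} BucketMasks;

/* Masks computed as in the quoted hunk, for any bucket count >= 1. */
static BucketMasks masks_for(uint32_t num_buckets)
{
    BucketMasks m;

    m.maxbucket = num_buckets - 1;
    m.highmask = ((uint32_t) 1 << ceil_log2(num_buckets)) - 1;
    m.lowmask = m.highmask >> 1;
    return m;
}

/* Map a hash key to a bucket: try highmask, fall back to lowmask. */
static uint32_t key_to_bucket(uint32_t hashkey, BucketMasks m)
{
    uint32_t bucket = hashkey & m.highmask;

    if (bucket > m.maxbucket)
        bucket &= m.lowmask;
    return bucket;
}

/* ASSUMPTION: a spool that sorts by one power-of-two mask only. */
static uint32_t pow2_sort_key(uint32_t hashkey, uint32_t num_buckets)
{
    uint32_t mask = ((uint32_t) 1 << ceil_log2(num_buckets)) - 1;

    return hashkey & mask;
}
```

With 6 buckets the masks are maxbucket = 5, highmask = 7, lowmask = 3.
Hash key 6 then lands in bucket 2 and hash key 4 in bucket 4, yet a
single-mask sort places key 4 ahead of key 6.  So the sorted input would
no longer arrive in bucket order unless the sort uses the same mapping
as the bucket computation.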


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


