Thread: [HACKERS] [POC] A better way to expand hash indexes.

[HACKERS] [POC] A better way to expand hash indexes.

From
Mithun Cy
Date:
Hi all,

As of now, we expand the hash index by doubling the number of bucket
blocks. But unfortunately, those blocks will not be used immediately.
So I thought if we can differ bucket block allocation by some factor,
hash index size can grow much efficiently. I have written a POC patch
which does following things -
Say at overflow point 'i' we need to add new "x = 2^(i-1)" buckets as
per the old code, I think we can do this addition of buckets in a
controlled way. Instead of adding all of 'x' bucket blocks at a time,
the patch will add x/4 blocks at a time. And, once those blocks are
consumed, it adds next installment of x/4 blocks. And, this will
continue until all of 'x' blocks of the overflow point 'i' is
allocated. My test result shows index size grows in a more efficient
way with above patch.

Note: This patch is just a POC. It can have bugs and I have to change
comments in the code, and README which are related to new changes.

Test:
create table t1(t int);
create index i1 on t1 using hash(t);
And then, Incrementally add rows as below.
insert into t1 select generate_series(1, 10000000);


records        base index      new patch
in million  --  size(MB)     --  index size(MB)

10                384                  384

20                768                  768

30                1417               1161

40                1531               1531

50                2556               1788

60                2785               2273

70                2963               2709

80               3060                3061

90               5111                 3575

100             5111                 3575

To implement such an incremental addition of bucket blocks I have to
increase the size of array hashm_spares in meta page by four times.
Also, mapping methods which map a total number of buckets to a
split-point position of hashm_spares array, need to be changed. These
changes create backward compatibility issues.

Implementation Details in brief:
=======================
Each element of hashm_spares (we call overflow point) is expanded into
4 slots {0, 1, 2, 3}. If 'x' (a power2 number) is the total number of
buckets to be added before the overflow point we add only a quarter of
it (x/4) to each slot, once we have consumed previously added blocks
we add next quarter of buckets and so on.
As in old code new hashm_spares[i] stores total overflow pages
allocated between those bucket allocation.


-- 
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Attachment

Re: [HACKERS] [POC] A better way to expand hash indexes.

From
Amit Kapila
Date:
On Fri, Feb 17, 2017 at 7:21 PM, Mithun Cy <mithun.cy@enterprisedb.com> wrote:
>
> To implement such an incremental addition of bucket blocks I have to
> increase the size of array hashm_spares in meta page by four times.
> Also, mapping methods which map a total number of buckets to a
> split-point position of hashm_spares array, need to be changed. These
> changes create backward compatibility issues.
>

How will high and lowmask calculations work in this new strategy?
Till now they always work on doubling strategy and I don't see you
have changed anything related to that code.  Check below places.

_hash_metapinit()
{
..
/*
* We initialize the index with N buckets, 0 .. N-1, occupying physical
* blocks 1 to N.  The first freespace bitmap page is in block N+1. Since
* N is a power of 2, we can set the masks this way:
*/
metap->hashm_maxbucket = metap->hashm_lowmask = num_buckets - 1;
metap->hashm_highmask = (num_buckets << 1) - 1;
..
}

_hash_expandtable()
{
..
if (new_bucket > metap->hashm_highmask)
{
/* Starting a new doubling */
metap->hashm_lowmask = metap->hashm_highmask;
metap->hashm_highmask = new_bucket | metap->hashm_lowmask;
}
..
}

Till now, we have worked hard for not changing the page format in a
backward incompatible way, so it will be better if we could find some
way of doing this without changing the meta page format in a backward
incompatible way.  Have you considered to store some information in
shared memory based on which we can decide how much percentage of
buckets are allocated in current table half?  I think we might be able
to construct this information after crash recovery as well.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] [POC] A better way to expand hash indexes.

From
Mithun Cy
Date:
Thanks, Amit

On Mon, Feb 20, 2017 at 9:51 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> How will high and lowmask calculations work in this new strategy?
> Till now they always work on doubling strategy and I don't see you
> have changed anything related to that code.  Check below places.

It is important that the mask has to be (2^x) -1, if we have to retain
the same hash map function. So mask variables will take same values as
before. Only place I think we need change is  _hash_metapinit();
unfortunately, I did not test for the case where we build the hash
index on already existing tuples. Now I have fixed in the latest
patch.


> Till now, we have worked hard for not changing the page format in a
> backward incompatible way, so it will be better if we could find some
> way of doing this without changing the meta page format in a backward
> incompatible way.

We are not adding any new variable/ deleting some, we increase the
size of hashm_spares and hence mapping functions should be adjusted.
The problem is the block allocation, and its management is based on
the fact that all of the buckets(will be 2^x in number) belonging to a
particular split-point is allocated at once and together. The
hashm_spares is used to record those allocations and that will be used
further by map functions to reach a particular block in the file. If
we want to change the way we allocate the buckets then hashm_spares
will change and hence mapping function. So I do not think we can avoid
incompatibility issue.

One thing I can think of is if we can increase the hashm_version of
hash index; then for old indexes, we can continue to use doubling
method and its mapping. For new indexes, we can use new way as above.

Have you considered to store some information in
> shared memory based on which we can decide how much percentage of
> buckets are allocated in current table half?  I think we might be able
> to construct this information after crash recovery as well.

I think all of above data has to be persistent. I am not able to
understand what should be/can be stored in shared buffers. Can you
please correct me if I am wrong?


-- 
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Attachment

Re: [HACKERS] [POC] A better way to expand hash indexes.

From
David Steele
Date:
On 2/21/17 4:58 AM, Mithun Cy wrote:
> Thanks, Amit
> 
> On Mon, Feb 20, 2017 at 9:51 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
>> How will high and lowmask calculations work in this new strategy?
>> Till now they always work on doubling strategy and I don't see you
>> have changed anything related to that code.  Check below places.
> 
> It is important that the mask has to be (2^x) -1, if we have to retain
> the same hash map function. So mask variables will take same values as
> before. Only place I think we need change is  _hash_metapinit();
> unfortunately, I did not test for the case where we build the hash
> index on already existing tuples. Now I have fixed in the latest
> patch.
> 
> 
>> Till now, we have worked hard for not changing the page format in a
>> backward incompatible way, so it will be better if we could find some
>> way of doing this without changing the meta page format in a backward
>> incompatible way.
> 
> We are not adding any new variable/ deleting some, we increase the
> size of hashm_spares and hence mapping functions should be adjusted.
> The problem is the block allocation, and its management is based on
> the fact that all of the buckets(will be 2^x in number) belonging to a
> particular split-point is allocated at once and together. The
> hashm_spares is used to record those allocations and that will be used
> further by map functions to reach a particular block in the file. If
> we want to change the way we allocate the buckets then hashm_spares
> will change and hence mapping function. So I do not think we can avoid
> incompatibility issue.
> 
> One thing I can think of is if we can increase the hashm_version of
> hash index; then for old indexes, we can continue to use doubling
> method and its mapping. For new indexes, we can use new way as above.
> 
> Have you considered to store some information in
>> shared memory based on which we can decide how much percentage of
>> buckets are allocated in current table half?  I think we might be able
>> to construct this information after crash recovery as well.
> 
> I think all of above data has to be persistent. I am not able to
> understand what should be/can be stored in shared buffers. Can you
> please correct me if I am wrong?

This patch does not apply at cccbdde:

$ patch -p1 < ../other/expand_hashbucket_efficiently_02
patching file src/backend/access/hash/hashovfl.c
Hunk #1 succeeded at 49 (offset 1 line).
Hunk #2 succeeded at 67 (offset 1 line).
patching file src/backend/access/hash/hashpage.c
Hunk #1 succeeded at 502 with fuzz 1 (offset 187 lines).
Hunk #2 succeeded at 518 with fuzz 2 (offset 168 lines).
Hunk #3 succeeded at 562 (offset 163 lines).
Hunk #4 succeeded at 744 (offset 124 lines).
Hunk #5 FAILED at 774.
Hunk #6 succeeded at 869 (offset 19 lines).
Hunk #7 succeeded at 1450 (offset 242 lines).
Hunk #8 succeeded at 1464 (offset 242 lines).
Hunk #9 succeeded at 1505 (offset 242 lines).
1 out of 9 hunks FAILED -- saving rejects to file
src/backend/access/hash/hashpage.c.rej
patching file src/backend/access/hash/hashutil.c
Hunk #1 succeeded at 150 (offset 1 line).
patching file src/include/access/hash.h
Hunk #2 succeeded at 180 (offset 12 lines).
Hunk #3 succeeded at 382 (offset 18 lines).

It does apply with fuzz on 2b32ac2, so it looks like c11453c and
subsequent commits are the cause.  They represent a fairly substantial
change to hash indexes by introducing WAL logging so I think you should
reevaluate your patches to be sure they still function as expected.

Marked "Waiting on Author".

-- 
-David
david@pgmasters.net



Re: [HACKERS] [POC] A better way to expand hash indexes.

From
Mithun Cy
Date:
On Thu, Mar 16, 2017 at 10:55 PM, David Steele <david@pgmasters.net> wrote:
> It does apply with fuzz on 2b32ac2, so it looks like c11453c and
> subsequent commits are the cause.  They represent a fairly substantial
> change to hash indexes by introducing WAL logging so I think you should
> reevaluate your patches to be sure they still function as expected.

Thanks, David here is the new improved patch I have also corrected the
pageinspect's test output. Also, added notes in README regarding the
new way of adding bucket pages efficiently in hash index. I also did
some more tests pgbench read only and read write;
There is no performance impact due to the patch. The growth of index
size has become much efficient as the numbers posted in the initial
proposal mail.

-- 
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Attachment

Re: [HACKERS] [POC] A better way to expand hash indexes.

From
Amit Kapila
Date:
On Sat, Mar 18, 2017 at 10:59 PM, Mithun Cy <mithun.cy@enterprisedb.com> wrote:
> On Thu, Mar 16, 2017 at 10:55 PM, David Steele <david@pgmasters.net> wrote:
>> It does apply with fuzz on 2b32ac2, so it looks like c11453c and
>> subsequent commits are the cause.  They represent a fairly substantial
>> change to hash indexes by introducing WAL logging so I think you should
>> reevaluate your patches to be sure they still function as expected.
>
> Thanks, David here is the new improved patch I have also corrected the
> pageinspect's test output. Also, added notes in README regarding the
> new way of adding bucket pages efficiently in hash index. I also did
> some more tests pgbench read only and read write;
>

To make this work, I think the calculations you have introduced are
not so easy to understand.  For example, it took me quite some time to
understand how the below function works to compute the location in
hash spares.

+uint32
+_hash_spareindex(uint32 num_bucket)
+{
..
+ /*
+ * The first 4 bucket belongs to corresponding first 4 splitpoint phases.
+ */
+ if (num_bucket <= 4)
+ return (num_bucket - 1); /* converted to base 0. */
+ splitpoint_group = _hash_log2(num_bucket) - 2; /* The are 4 buckets in
..
+ /*
+ * bucket's global splitpoint phase = total number of split point phases
+ * until its splitpoint group - splitpoint phase within this splitpoint
+ * group but after buckets own splitpoint phase.
+ */
+ tbuckets = (1 << (splitpoint_group + 2));
+ phases_beyond_bucket =
+ (tbuckets - num_bucket) / (1 << (splitpoint_group - 1));
+ return (((splitpoint_group + 1) << 2) - phases_beyond_bucket) - 1;
+}


I am not sure if it is just a matter of better comments to explain it
in a simple way or maybe we can try to find some simpler mechanism to
group the split into four (or more) equal parts.  I think if someone
else can read and share their opinion it could be helpful.  Another
idea could be to make hashm_spares a two-dimensional array
hashm_spares[32][4] where the first dimension will indicate the split
point and second will indicate the sub-split number.  I am not sure
whether it will be simpler or complex than the method used in the
proposed patch, but I think we should think a bit more to see if we
can come up with some simple technique to solve this problem.


+ * allocate them at once. Each splitpoint group will have 4 slots, we
+ * distribute the buckets equally among them. So we allocate only one
+ * forth of total buckets in new splitpoint group at time to consume
+ * one phase after another.

spelling.
/forth/fourth
/at time/at a time


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] [POC] A better way to expand hash indexes.

From
Mithun Cy
Date:
Hi Amit, Thanks for the review,

On Mon, Mar 20, 2017 at 5:17 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> idea could be to make hashm_spares a two-dimensional array
> hashm_spares[32][4] where the first dimension will indicate the split
> point and second will indicate the sub-split number.  I am not sure
> whether it will be simpler or complex than the method used in the
> proposed patch, but I think we should think a bit more to see if we
> can come up with some simple technique to solve this problem.

I think making it a 2-dimensional array will not be any useful in fact
we really treat the given array 2-dimensional elements now.
The main concern of yours I think is the calculation steps to find the
phase of the splitpoint group the bucket belongs to.
+ tbuckets = (1 << (splitpoint_group + 2));
+ phases_beyond_bucket =
+ (tbuckets - num_bucket) / (1 << (splitpoint_group - 1));
+ return (((splitpoint_group + 1) << 2) - phases_beyond_bucket) - 1;

Quickly thinking further we allocate 2^x of buckets on each phase of
any splitpoint group and we have 4 such phases in each group; At each
splitpoint group again we allocate 2^y buckets, So buckets number have
a pattern in their bitmap representation to find out which phase of
allocation they belong to.


As below
===========
Group 0 -- bit 0, 1 define which phase of group each bucket belong to.
0 -- 00000000
1 -- 00000001
2 -- 00000010
3 -- 00000011
===========
Group 1 -- bit 0, 1 define which phase of group each bucket belong to.
4 -- 00000100
5 -- 00000101
6 -- 00000110
7 -- 00000111
===========
Group 2 -- bit 1, 2 define which phase of group each bucket belong to.
8 -- 00001000
9 -- 00001001
10 -- 00001010
11 -- 00001011
12 -- 00001100
13 -- 00001101
14 -- 00001110
15 -- 00001111
===========
Group 3 -- bit 2, 3 define which phase of group each bucket belong to.
16 -- 00010000
17 -- 00010001
18 -- 00010010
19 -- 00010011
20 -- 00010100
21 -- 00010101
22 -- 00010110
23 -- 00010111
24 -- 00011000
25 -- 00011001
26 -- 00011010
27 -- 00011011
28 -- 00011100
29 -- 00011101
30 -- 00011110
31 -- 00011111
============

So we can say given a bucket x of group n > 0 bits (n-1, n) defines
which phase of group they belong to. I see an opportunity here to
completely simplify the above calculation into a simple bitwise anding
the bucket number with (n-1,n) bitmask to get the phase of allocation
of bucket x.

Formula can be (x >> (splitpoint_group - 1)) & 0x3 = phase of bucket

Does this satisfy your concern?

-- 
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] [POC] A better way to expand hash indexes.

From
Amit Kapila
Date:
On Mon, Mar 20, 2017 at 8:58 PM, Mithun Cy <mithun.cy@enterprisedb.com> wrote:
> Hi Amit, Thanks for the review,
>
> On Mon, Mar 20, 2017 at 5:17 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
>> idea could be to make hashm_spares a two-dimensional array
>> hashm_spares[32][4] where the first dimension will indicate the split
>> point and second will indicate the sub-split number.  I am not sure
>> whether it will be simpler or complex than the method used in the
>> proposed patch, but I think we should think a bit more to see if we
>> can come up with some simple technique to solve this problem.
>
> I think making it a 2-dimensional array will not be any useful in fact
> we really treat the given array 2-dimensional elements now.
>

Sure, I was telling you based on that.  If you are implicitly treating
it as 2-dimensional array, it might be easier to compute the array
offsets.

> The main concern of yours I think is the calculation steps to find the
> phase of the splitpoint group the bucket belongs to.
> + tbuckets = (1 << (splitpoint_group + 2));
> + phases_beyond_bucket =
> + (tbuckets - num_bucket) / (1 << (splitpoint_group - 1));
> + return (((splitpoint_group + 1) << 2) - phases_beyond_bucket) - 1;
>

It is not only about above calculation, but also what the patch is
doing in function _hash_get_tbuckets().  By the way function name also
seems unclear (mainly *tbuckets* in the name).

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] [POC] A better way to expand hash indexes.

From
Amit Kapila
Date:
On Tue, Mar 21, 2017 at 8:16 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> On Mon, Mar 20, 2017 at 8:58 PM, Mithun Cy <mithun.cy@enterprisedb.com> wrote:
>> Hi Amit, Thanks for the review,
>>
>> On Mon, Mar 20, 2017 at 5:17 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
>>> idea could be to make hashm_spares a two-dimensional array
>>> hashm_spares[32][4] where the first dimension will indicate the split
>>> point and second will indicate the sub-split number.  I am not sure
>>> whether it will be simpler or complex than the method used in the
>>> proposed patch, but I think we should think a bit more to see if we
>>> can come up with some simple technique to solve this problem.
>>
>> I think making it a 2-dimensional array will not be any useful in fact
>> we really treat the given array 2-dimensional elements now.
>>
>
> Sure, I was telling you based on that.  If you are implicitly treating
> it as 2-dimensional array, it might be easier to compute the array
> offsets.
>

The above sentence looks incomplete.
If you are implicitly treating it as a 2-dimensional array, it might
be easier to compute the array offsets if you explicitly also treats
as a 2-dimensional array.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] [POC] A better way to expand hash indexes.

From
Mithun Cy
Date:
Hi Amit please find the new patch

On Tue, Mar 21, 2017 at 8:16 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> It is not only about above calculation, but also what the patch is
> doing in function _hash_get_tbuckets().  By the way function name also
> seems unclear (mainly *tbuckets* in the name).

Fixed I have introduced some macros for readability and added more
comments to explain why some calculations are mad. Please let me know
if you think more improvements are needed.

>spelling.
>/forth/fourth
>/at time/at a time
Thanks fixed.

-- 
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Attachment

Re: [HACKERS] [POC] A better way to expand hash indexes.

From
Mithun Cy
Date:
On Fri, Mar 24, 2017 at 1:22 AM, Mithun Cy <mithun.cy@enterprisedb.com> wrote:
> Hi Amit please find the new patch

The pageinspect.sgml has an example which shows the output of
"hash_metapage_info()". Since we increase the spares array and
eventually ovflpoint, I have updated the example with corresponding
values..


-- 
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Attachment

Re: [POC] A better way to expand hash indexes.

From
Mithun Cy
Date:
On Tue, Mar 21, 2017 at 8:16 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> Sure, I was telling you based on that.  If you are implicitly treating
> it as 2-dimensional array, it might be easier to compute the array
>offsets.

I think calculating spares offset will not become anyway much simpler
we still need to calculate split group and split phase separately. I
have added a patch to show how a 2-dimensional spares code looks like
and where all we need changes.

-- 
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com

Attachment

Re: [POC] A better way to expand hash indexes.

From
Amit Kapila
Date:
On Sat, Mar 25, 2017 at 10:13 AM, Mithun Cy <mithun.cy@enterprisedb.com> wrote:
> On Tue, Mar 21, 2017 at 8:16 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
>> Sure, I was telling you based on that.  If you are implicitly treating
>> it as 2-dimensional array, it might be easier to compute the array
>>offsets.
>
> I think calculating spares offset will not become anyway much simpler
> we still need to calculate split group and split phase separately. I
> have added a patch to show how a 2-dimensional spares code looks like
> and where all we need changes.
>

I think one-dimensional patch has fewer places to touch, so that looks
better to me.  However, I think there is still hard coding and
assumptions in code which we should try to improve.

1.
+ /*
+ * The first 4 bucket belongs to first splitpoint group 0. And since group
+ * 0 have 4 = 2^2 buckets, we double them in group 1. So total buckets
+ * after group 1 is 8 = 2^3. Then again at group 2 we add another 2^3
+ * buckets to double the total number of buckets, which will become 2^4. I
+ * think by this time we can see a pattern which say if num_bucket > 4
+ * splitpoint group = log2(num_bucket) - 2
+ */

+ if (num_bucket <= 4)
+ splitpoint_group = 0; /* converted to base 0. */
+ else
+ splitpoint_group = _hash_log2(num_bucket) - 2;

This patch defines split point group zero has four buckets and based
on that above calculation is done.  I feel you can define it like
#define Buckets_First_Split_Group 4 and then use it in above code.
Also, in else part number 2 looks awkward, can we define it as
log2_buckets_first_group = _hash_log2(Buckets_First_Split_Group) or
some thing like that.  I think that way code will look neat.  I don't
like the way above comment is worded even though it is helpful to
understand the calculation.  If you want, then you can add such a
comment in file header, here it looks out of place.

2.
+power-of-2 groups, called "split points" in the code.  That means on every new
+split points we double the existing number of buckets.

split points/split point

3.
+ * which phase of allocation the bucket_num belogs to with in the group.

/belogs/belongs


I have still not completely reviewed the patch as I have ran out of
time for today.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: [POC] A better way to expand hash indexes.

From
Mithun Cy
Date:
Thanks, Amit for the review.
On Sat, Mar 25, 2017 at 7:03 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> I think one-dimensional patch has fewer places to touch, so that looks
> better to me.  However, I think there is still hard coding and
> assumptions in code which we should try to improve.

Great!, I will continue with spares 1-dimensional improvement.

>
> 1.
> + /*
> + * The first 4 bucket belongs to first splitpoint group 0. And since group
> + * 0 have 4 = 2^2 buckets, we double them in group 1. So total buckets
> + * after group 1 is 8 = 2^3. Then again at group 2 we add another 2^3
> + * buckets to double the total number of buckets, which will become 2^4. I
> + * think by this time we can see a pattern which say if num_bucket > 4
> + * splitpoint group = log2(num_bucket) - 2
> + */
>
> + if (num_bucket <= 4)
> + splitpoint_group = 0; /* converted to base 0. */
> + else
> + splitpoint_group = _hash_log2(num_bucket) - 2;
>
> This patch defines split point group zero has four buckets and based
> on that above calculation is done.  I feel you can define it like
> #define Buckets_First_Split_Group 4 and then use it in above code.
> Also, in else part number 2 looks awkward, can we define it as
> log2_buckets_first_group = _hash_log2(Buckets_First_Split_Group) or
> some thing like that.  I think that way code will look neat.  I don't
> like the way above comment is worded even though it is helpful to
> understand the calculation.  If you want, then you can add such a
> comment in file header, here it looks out of place.

I have removed the comments. And, defined a new macro which maps
bucket to SPLIT GROUP

#define BUCKET_TO_SPLITPOINT_GRP(num_bucket) \
        ((num_bucket <= Buckets_First_Split_Group) ? 0 : \
                                                (_hash_log2(num_bucket) - 2))

I could not find a way to explain why minus 2? better than " The
splitpoint group of a given bucket can be taken as
(_hash_log2(bucket) - 2). Subtracted by 2 because each group have 2 ^
(x + 2) buckets.". Now I have added those with existing comments I
think that should make it little better.

Adding comments about spares array in hashutils.c's file header did
not appear right to me. I think README has some details about same.

> 2.
> +power-of-2 groups, called "split points" in the code.  That means on every new
> +split points we double the existing number of buckets.
>
> split points/split point

Fixed.

> 3.
> + * which phase of allocation the bucket_num belogs to with in the group.
>
> /belogs/belongs

Fixed

-- 
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com

Attachment

Re: [POC] A better way to expand hash indexes.

From
Amit Kapila
Date:
On Sun, Mar 26, 2017 at 11:26 AM, Mithun Cy <mithun.cy@enterprisedb.com> wrote:
> Thanks, Amit for the review.
> On Sat, Mar 25, 2017 at 7:03 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
>>
>> I think one-dimensional patch has fewer places to touch, so that looks
>> better to me.  However, I think there is still hard coding and
>> assumptions in code which we should try to improve.
>
> Great!, I will continue with spares 1-dimensional improvement.
>

@@ -563,18 +563,20 @@ _hash_init_metabuffer(Buffer buf, double
num_tuples, RegProcedure procid,\
{
.. else
- num_buckets = ((uint32) 1) << _hash_log2((uint32) dnumbuckets);
+ num_buckets = _hash_get_totalbuckets(_hash_spareindex(dnumbuckets));
..
..
- metap->hashm_maxbucket = metap->hashm_lowmask = num_buckets - 1;
- metap->hashm_highmask = (num_buckets << 1) - 1;
+ metap->hashm_maxbucket = num_buckets - 1;
+
+ /* set hishmask, which should be sufficient to cover num_buckets. */
+ metap->hashm_highmask = (1 << (_hash_log2(num_buckets))) - 1;
+ metap->hashm_lowmask = (metap->hashm_highmask >> 1);
}

I think we can't change the number of buckets to be created or lowmask
and highmask calculation here without modifying _h_spoolinit() because
it sorts the data to be inserted based on hashkey which in turn
depends on the number of buckets that we are going to create during
create index operation.  We either need to allow create index
operation to still always create buckets in power-of-two fashion or we
need to update _h_spoolinit according to new computation.  One minor
drawback of using power-of-two scheme for creation of buckets during
create index is that it can lead to wastage of space and will be
inconsistent with what the patch does during split operation.


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: [POC] A better way to expand hash indexes.

From
Amit Kapila
Date:
On Mon, Mar 27, 2017 at 11:21 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> On Sun, Mar 26, 2017 at 11:26 AM, Mithun Cy <mithun.cy@enterprisedb.com> wrote:
>> Thanks, Amit for the review.
>> On Sat, Mar 25, 2017 at 7:03 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
>>>
>>> I think one-dimensional patch has fewer places to touch, so that looks
>>> better to me.  However, I think there is still hard coding and
>>> assumptions in code which we should try to improve.
>>
>> Great!, I will continue with spares 1-dimensional improvement.
>>
>
> @@ -563,18 +563,20 @@ _hash_init_metabuffer(Buffer buf, double
> num_tuples, RegProcedure procid,\
> {
> ..
>   else
> - num_buckets = ((uint32) 1) << _hash_log2((uint32) dnumbuckets);
> + num_buckets = _hash_get_totalbuckets(_hash_spareindex(dnumbuckets));
> ..
> ..
> - metap->hashm_maxbucket = metap->hashm_lowmask = num_buckets - 1;
> - metap->hashm_highmask = (num_buckets << 1) - 1;
> + metap->hashm_maxbucket = num_buckets - 1;
> +
> + /* set hishmask, which should be sufficient to cover num_buckets. */
> + metap->hashm_highmask = (1 << (_hash_log2(num_buckets))) - 1;
> + metap->hashm_lowmask = (metap->hashm_highmask >> 1);
> }
>
> I think we can't change the number of buckets to be created or lowmask
> and highmask calculation here without modifying _h_spoolinit() because
> it sorts the data to be inserted based on hashkey which in turn
> depends on the number of buckets that we are going to create during
> create index operation.  We either need to allow create index
> operation to still always create buckets in power-of-two fashion or we
> need to update _h_spoolinit according to new computation.  One minor
> drawback of using power-of-two scheme for creation of buckets during
> create index is that it can lead to wastage of space and will be
> inconsistent with what the patch does during split operation.
>

Few more comments:

1.
@@ -149,6 +149,84 @@ _hash_log2(uint32 num) return i;}

+#define SPLITPOINT_PHASES_PER_GRP 4
+#define SPLITPOINT_PHASE_MASK (SPLITPOINT_PHASES_PER_GRP - 1)
+#define Buckets_First_Split_Group 4

The defines should be at the beginning of file.

2.
+/*
+ * At splitpoint group 0 we have 2 ^ (0 + 2) = 4 buckets, then at splitpoint
+ * group 1 we have 2 ^ (1 + 2) = 8 total buckets. As the doubling continues at
+ * splitpoint group "x" we will have 2 ^ (x + 2) total buckets. Total buckets
+ * before x splitpoint group will be 2 ^ (x + 1). At each phase of allocation
+ * within splitpoint group we add 2 ^ (x - 1) buckets, as we have to divide the
+ * task of allocation of 2 ^ (x + 1) buckets among 4 phases.
+ *
+ * Also, splitpoint group of a given bucket can be taken as
+ * (_hash_log2(bucket) - 2). Subtracted by 2 because each group have
+ * 2 ^ (x + 2) buckets.
+ */
..

+#define BUCKET_TO_SPLITPOINT_GRP(num_bucket) \
+ ((num_bucket <= Buckets_First_Split_Group) ? 0 : \
+ (_hash_log2(num_bucket) - 2))

In the above computation +2 and -2 still bothers me.  I think you need
to do this because you have defined split group zero to have four
buckets, how about if you don't force that and rather define to have
split point phases only from split point which has four or more
buckets?

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: [POC] A better way to expand hash indexes.

From
Jesper Pedersen
Date:
Hi Mithun,

On 03/26/2017 01:56 AM, Mithun Cy wrote:
> Thanks, Amit for the review.
> On Sat, Mar 25, 2017 at 7:03 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
>>
>> I think one-dimensional patch has fewer places to touch, so that looks
>> better to me.  However, I think there is still hard coding and
>> assumptions in code which we should try to improve.
>
> Great!, I will continue with spares 1-dimensional improvement.
>

I ran some performance scenarios on the patch to see if the increased 
'spares' allocation had an impact. I havn't found any regressions in 
that regard.

Attached patch contains some small fixes, mainly to the documentation - 
on top of v7.

Best regards,
  Jesper


Re: [POC] A better way to expand hash indexes.

From
Mithun Cy
Date:
On Mon, Mar 27, 2017 at 11:21 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:

> I think we can't change the number of buckets to be created or lowmask
> and highmask calculation here without modifying _h_spoolinit() because
> it sorts the data to be inserted based on hashkey which in turn
> depends on the number of buckets that we are going to create during
> create index operation.  We either need to allow create index
> operation to still always create buckets in power-of-two fashion or we
> need to update _h_spoolinit according to new computation.  One minor
> drawback of using power-of-two scheme for creation of buckets during
> create index is that it can lead to wastage of space and will be
> inconsistent with what the patch does during split operation.

Yes, this was a miss. Now Number of buckets allocated during
metap_init is not always a power-of-two number. The hashbuild which
uses just the hash_mask to decide which bucket does the hashkey
belong to is not sufficient. It can give buckets beyond max_buckets
and sorting of index values based on their buckets will be out of
order. When we try to actually insert the same in hash index we loose
the advantage of the spatial locality which existed before. And, hence
indexbuild performance can reduce.

As you have said we can solve it if we allocate buckets always in
power-of-2 when we do hash index meta page init. But on other
occasions, when we try to double the existing buckets we can do the
allocation in 4 equal phases.

But I think there are 2 more ways to solve same,

A. Why not pass all 3 parameters high_mask, low_mask, max-buckets to
tuplesort and let them use _hash_hashkey2bucket to figure out which
key belong to which bucket. And then sort them. I think this way we
make both sorting and insertion to hash index both consistent with
each other.

B. In tuple sort we can use hash function bucket = hash_key %
num_buckets instead of existing one which does bitwise "and" to
determine the bucket of hash key. This way we will not wrongly assign
buckets beyond max_buckets and sorted hash keys will be in sync with
actual insertion order of _hash_doinsert.


I am adding both the patches Patch_A and Patch_B. My preference is
Patch_A and I am open for suggestion.

>+#define SPLITPOINT_PHASES_PER_GRP 4
>+#define SPLITPOINT_PHASE_MASK (SPLITPOINT_PHASES_PER_GRP - 1)
>+#define Buckets_First_Split_Group 4
Fixed.

>In the above computation +2 and -2 still bothers me.  I think you need
>to do this because you have defined split group zero to have four
>buckets, how about if you don't force that and rather define to have
>split point phases only from split point which has four or more
>buckets?

Okay as suggested instead of group zero having 4 phases of 1 bucket
each, I have recalculated the spare mapping as below.
Allocating huge chunks of bucket pages all at once isn't optimal and
we will take ages to consume those. To avoid this exponential growth
of index size, we did use a trick to breakup allocation of buckets at
the splitpoint into 4 equal phases.  If (2 ^ x) is the total buckets
need to be allocated at a splitpoint (from now on we shall call this
as a splitpoint group), then we allocate 1/4th (2 ^ (x - 2)) of total
buckets at each phase of splitpoint group. Next quarter of allocation
will only happen if buckets of the previous phase have been already
consumed.  Since for buckets number < 4 we cannot further divide it
into multiple phases, the first 3 group will have only one phase of
allocation. The groups 0, 1, 2 will allocate 1, 1, 2 buckets
respectively at once in one phase. For the groups > 2 Where we
allocate buckets > 4, the allocation process is distributed among four
equal phases. At group 3 we allocate 4 buckets in 4 different phases
{1, 1, 1, 1}, the numbers in curly braces indicate number of buckets
allocated within each phase of splitpoint group 3. And, for splitpoint
group 4 and 5 allocation phase will be {2, 2, 2, 2} = 16 buckets in
total and {4, 4, 4, 4} = 32 buckets in total.  We can see that at each
splitpoint group
we double the total number of buckets from previous group but in an
incremental phase.  The bucket pages allocated within one phase of a
splitpoint group will appear consecutively in the index.


The sortbuild_hash_*.patch can be applied independently on any of
expand_hashbucket_efficiently_08.patch
-- 
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com

Attachment

Re: [POC] A better way to expand hash indexes.

From
Amit Kapila
Date:
On Tue, Mar 28, 2017 at 10:43 AM, Mithun Cy <mithun.cy@enterprisedb.com> wrote:
> On Mon, Mar 27, 2017 at 11:21 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
>
>
> As you have said we can solve it if we allocate buckets always in
> power-of-2 when we do hash index meta page init. But on other
> occasions, when we try to double the existing buckets we can do the
> allocation in 4 equal phases.
>
> But I think there are 2 more ways to solve same,
>
> A. Why not pass all 3 parameters high_mask, low_mask, max-buckets to
> tuplesort and let them use _hash_hashkey2bucket to figure out which
> key belong to which bucket. And then sort them. I think this way we
> make both sorting and insertion to hash index both consistent with
> each other.
>
> B. In tuple sort we can use hash function bucket = hash_key %
> num_buckets instead of existing one which does bitwise "and" to
> determine the bucket of hash key. This way we will not wrongly assign
> buckets beyond max_buckets and sorted hash keys will be in sync with
> actual insertion order of _hash_doinsert.
>
>
> I am adding both the patches Patch_A and Patch_B. My preference is
> Patch_A and I am open for suggestion.
>

I think both way it can work.  I feel there is no hard pressed need
that we should make the computation in sorting same as what you do
_hash_doinsert. In patch_B,  I don't think new naming for variables is
good.
 Assert(!a->isnull1);
- hash1 = DatumGetUInt32(a->datum1) & state->hash_mask;
+ bucket1 = DatumGetUInt32(a->datum1) % state->num_buckets;

Can we use hash_mod instead of num_buckets and retain hash1 in above
code and similar other places?

>>+#define SPLITPOINT_PHASES_PER_GRP 4
>>+#define SPLITPOINT_PHASE_MASK (SPLITPOINT_PHASES_PER_GRP - 1)
>>+#define Buckets_First_Split_Group 4
> Fixed.
>
>>In the above computation +2 and -2 still bothers me.  I think you need
>>to do this because you have defined split group zero to have four
>>buckets, how about if you don't force that and rather define to have
>>split point phases only from split point which has four or more
>>buckets?
>
> Okay as suggested instead of group zero having 4 phases of 1 bucket
> each, I have recalculated the spare mapping as below.
>

Few comments:
1.
+#define BUCKETS_WITHIN_SP_GRP(sp_g, nphase) \
+ ((sp_g < SPLITPOINT_GROUPS_WITH_ONLY_ONE_PHASE) ? \
+ (1 << (sp_g - 1)) : \
+ ((nphase) * ((1 << (sp_g - 1)) >> 2)))

This will go wrong for split point group zero.  In general, I feel if
you handle computation for split groups lesser than
SPLITPOINT_GROUPS_WITH_ONLY_ONE_PHASE in the caller, then all your
macros will become much simpler and less error prone.

2.
+#define BUCKET_TO_SPLITPOINT_GRP(num_bucket) (_hash_log2(num_bucket))

What is the use of such a define, can't we directly use _hash_log2 in
the caller?


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: [POC] A better way to expand hash indexes.

From
Mithun Cy
Date:
On Tue, Mar 28, 2017 at 12:21 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> I think both way it can work.  I feel there is no hard pressed need
> that we should make the computation in sorting same as what you do
> _hash_doinsert. In patch_B,  I don't think new naming for variables is
> good.
>
>   Assert(!a->isnull1);
> - hash1 = DatumGetUInt32(a->datum1) & state->hash_mask;
> + bucket1 = DatumGetUInt32(a->datum1) % state->num_buckets;
>
> Can we use hash_mod instead of num_buckets and retain hash1 in above
> code and similar other places?

Yes done renamed it to hash_mod.

>
> Few comments:
> 1.
> +#define BUCKETS_WITHIN_SP_GRP(sp_g, nphase) \
> + ((sp_g < SPLITPOINT_GROUPS_WITH_ONLY_ONE_PHASE) ? \
> + (1 << (sp_g - 1)) : \
> + ((nphase) * ((1 << (sp_g - 1)) >> 2)))
>
> This will go wrong for split point group zero.  In general, I feel if
> you handle computation for split groups lesser than
> SPLITPOINT_GROUPS_WITH_ONLY_ONE_PHASE in the caller, then all your
> macros will become much simpler and less error prone.

Fixed, apart from SPLITPOINT_PHASE_TO_SPLITPOINT_GRP rest all macros
only handle multi phase group. The SPLITPOINT_PHASE_TO_SPLITPOINT_GRP
is used in one more place at expand index so thought kepeping it as it
is is better.
.
> 2.
> +#define BUCKET_TO_SPLITPOINT_GRP(num_bucket) (_hash_log2(num_bucket))
>
> What is the use of such a define, can't we directly use _hash_log2 in
> the caller?

No real reason, just that NAMED macro appeared more readable than just
_hash_log2. Now I have removed same.

> --
> With Regards,
> Amit Kapila.
> EnterpriseDB: http://www.enterprisedb.com


-- 
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com

Attachment

Re: [POC] A better way to expand hash indexes.

From
Robert Haas
Date:
On Tue, Mar 28, 2017 at 5:00 AM, Mithun Cy <mithun.cy@enterprisedb.com> wrote:
>> This will go wrong for split point group zero.  In general, I feel if
>> you handle computation for split groups lesser than
>> SPLITPOINT_GROUPS_WITH_ONLY_ONE_PHASE in the caller, then all your
>> macros will become much simpler and less error prone.
>
> Fixed, apart from SPLITPOINT_PHASE_TO_SPLITPOINT_GRP rest all macros
> only handle multi phase group. The SPLITPOINT_PHASE_TO_SPLITPOINT_GRP
> is used in one more place at expand index so thought kepeping it as it
> is is better.

I wonder if we should consider increasing
SPLITPOINT_GROUPS_WITH_ONLY_ONE_PHASE somewhat.  For example, split
point 4 is responsible for allocating only 16 new buckets = 128kB;
doing those in four groups of two (16kB) seems fairly pointless.
Suppose we start applying this technique beginning around splitpoint 9
or 10.  Breaking 1024 new buckets * 8kB = 8MB of index growth into 4
phases might save enough to be worthwhile.

Of course, even if we decide to apply this even for very small
splitpoints, it probably doesn't cost us anything other than some
space in the metapage.  But maybe saving space in the metapage isn't
such a bad idea anyway.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [POC] A better way to expand hash indexes.

From
Amit Kapila
Date:
On Tue, Mar 28, 2017 at 8:56 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Tue, Mar 28, 2017 at 5:00 AM, Mithun Cy <mithun.cy@enterprisedb.com> wrote:
>>> This will go wrong for split point group zero.  In general, I feel if
>>> you handle computation for split groups lesser than
>>> SPLITPOINT_GROUPS_WITH_ONLY_ONE_PHASE in the caller, then all your
>>> macros will become much simpler and less error prone.
>>
>> Fixed, apart from SPLITPOINT_PHASE_TO_SPLITPOINT_GRP rest all macros
>> only handle multi phase group. The SPLITPOINT_PHASE_TO_SPLITPOINT_GRP
>> is used in one more place at expand index so thought kepeping it as it
>> is is better.
>
> I wonder if we should consider increasing
> SPLITPOINT_GROUPS_WITH_ONLY_ONE_PHASE somewhat.  For example, split
> point 4 is responsible for allocating only 16 new buckets = 128kB;
> doing those in four groups of two (16kB) seems fairly pointless.
> Suppose we start applying this technique beginning around splitpoint 9
> or 10.  Breaking 1024 new buckets * 8kB = 8MB of index growth into 4
> phases might save enough to be worthwhile.
>

10 sounds better point to start allocating in phases.

> Of course, even if we decide to apply this even for very small
> splitpoints, it probably doesn't cost us anything other than some
> space in the metapage.  But maybe saving space in the metapage isn't
> such a bad idea anyway.
>

Yeah metapage space is scarce, so lets try to save it as much possible.


Few other comments:
+/*
+ * This is just a trick to save a division operation. If you look into the
+ * bitmap of 0-based bucket_num 2nd and 3rd most significant bit will indicate
+ * which phase of allocation the bucket_num belongs to with in the group. This
+ * is because at every splitpoint group we allocate (2 ^ x) buckets and we have
+ * divided the allocation process into 4 equal phases. This macro returns value
+ * from 0 to 3.
+ */

+#define SPLITPOINT_PHASES_WITHIN_GROUP(sp_g, bucket_num) \
+ (((bucket_num) >> (sp_g - SPLITPOINT_GROUPS_WITH_ONLY_ONE_PHASE)) & \
+ SPLITPOINT_PHASE_MASK)

This won't work if we change SPLITPOINT_GROUPS_WITH_ONLY_ONE_PHASE to
number other than 3.  I think you should change it so that it can work
with any value of  SPLITPOINT_GROUPS_WITH_ONLY_ONE_PHASE.

I think you should name this define as SPLITPOINT_PHASE_WITHIN_GROUP
as this refers to only one particular phase within group.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: [POC] A better way to expand hash indexes.

From
Mithun Cy
Date:
On Wed, Mar 29, 2017 at 10:12 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:

>> I wonder if we should consider increasing
>> SPLITPOINT_GROUPS_WITH_ONLY_ONE_PHASE somewhat.  For example, split
>> point 4 is responsible for allocating only 16 new buckets = 128kB;
>> doing those in four groups of two (16kB) seems fairly pointless.
>> Suppose we start applying this technique beginning around splitpoint 9
>> or 10.  Breaking 1024 new buckets * 8kB = 8MB of index growth into 4
>> phases might save enough to be worthwhile.
>>
>
> 10 sounds better point to start allocating in phases.

+1. At splitpoint group 10 we will allocate (2 ^ 9) buckets = 4MB in
total and each phase will allocate 2 ^ 7 buckets = 128 * 8kB = 1MB.

> Few other comments:
> +/*
> + * This is just a trick to save a division operation. If you look into the
> + * bitmap of 0-based bucket_num 2nd and 3rd most significant bit will indicate
> + * which phase of allocation the bucket_num belongs to with in the group. This
> + * is because at every splitpoint group we allocate (2 ^ x) buckets and we have
> + * divided the allocation process into 4 equal phases. This macro returns value
> + * from 0 to 3.
> + */
>
> +#define SPLITPOINT_PHASES_WITHIN_GROUP(sp_g, bucket_num) \
> + (((bucket_num) >> (sp_g - SPLITPOINT_GROUPS_WITH_ONLY_ONE_PHASE)) & \
> + SPLITPOINT_PHASE_MASK)
> This won't work if we change SPLITPOINT_GROUPS_WITH_ONLY_ONE_PHASE to
> number other than 3.  I think you should change it so that it can work
> with any value of  SPLITPOINT_GROUPS_WITH_ONLY_ONE_PHASE.

Fixed, using SPLITPOINT_GROUPS_WITH_ONLY_ONE_PHASE was accidental. All
I need is most significant 3 bits hence should be subtracted by 3
always.

> I think you should name this define as SPLITPOINT_PHASE_WITHIN_GROUP
> as this refers to only one particular phase within group.

Fixed.

Mithun C Y
EnterpriseDB: http://www.enterprisedb.com

Attachment

Re: [POC] A better way to expand hash indexes.

From
Amit Kapila
Date:
On Wed, Mar 29, 2017 at 12:51 PM, Mithun Cy <mithun.cy@enterprisedb.com> wrote:
> On Wed, Mar 29, 2017 at 10:12 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
>> Few other comments:
>> +/*
>> + * This is just a trick to save a division operation. If you look into the
>> + * bitmap of 0-based bucket_num 2nd and 3rd most significant bit will indicate
>> + * which phase of allocation the bucket_num belongs to with in the group. This
>> + * is because at every splitpoint group we allocate (2 ^ x) buckets and we have
>> + * divided the allocation process into 4 equal phases. This macro returns value
>> + * from 0 to 3.
>> + */
>>
>> +#define SPLITPOINT_PHASES_WITHIN_GROUP(sp_g, bucket_num) \
>> + (((bucket_num) >> (sp_g - SPLITPOINT_GROUPS_WITH_ONLY_ONE_PHASE)) & \
>> + SPLITPOINT_PHASE_MASK)
>> This won't work if we change SPLITPOINT_GROUPS_WITH_ONLY_ONE_PHASE to
>> number other than 3.  I think you should change it so that it can work
>> with any value of  SPLITPOINT_GROUPS_WITH_ONLY_ONE_PHASE.
>
> Fixed, using SPLITPOINT_GROUPS_WITH_ONLY_ONE_PHASE was accidental. All
> I need is most significant 3 bits hence should be subtracted by 3
> always.
>

Okay, your current patch looks good to me apart from minor comments,
so marked as Read For Committer.  Please either merge the
sort_hash_b_2.patch with main patch or submit it along with next
revision for easier reference.

Few minor comments:
1.
+splitpoint phase S.  The hashm_spares[0] is always 0, so that buckets 0 and 1
+(which belong to splitpoint group 0's phase 1 and phase 2 respectively) always
+appear at block numbers 1 and 2, just after the meta page.

This explanation doesn't seem to be right as with current patch we
start phased allocation only after splitpoint group 9.

2.
-#define HASH_MAX_SPLITPOINTS 32#define HASH_MAX_BITMAPS 128

+#define SPLITPOINT_PHASES_PER_GRP 4
+#define SPLITPOINT_PHASE_MASK (SPLITPOINT_PHASES_PER_GRP - 1)
+#define SPLITPOINT_GROUPS_WITH_ONLY_ONE_PHASE 10
+#define HASH_MAX_SPLITPOINTS \
+ ((32 - SPLITPOINT_GROUPS_WITH_ONLY_ONE_PHASE) * \
+ SPLITPOINT_PHASES_PER_GRP) + \
+ SPLITPOINT_GROUPS_WITH_ONLY_ONE_PHASE

You have changed the value of HASH_MAX_SPLITPOINTS, but the comments
explaining that value are still unchanged.  Refer below text.
"The limitation on the size of spares[] comes from the fact that there's* no point in having more than 2^32 buckets
withonly uint32 hashcodes."
 



-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: [POC] A better way to expand hash indexes.

From
Mithun Cy
Date:
That means at every new
+split point we double the existing number of buckets.  Allocating huge chucks

On Mon, Mar 27, 2017 at 11:56 PM, Jesper Pedersen
<jesper.pedersen@redhat.com> wrote:

> I ran some performance scenarios on the patch to see if the increased
> 'spares' allocation had an impact. I havn't found any regressions in that
> regard.

Thanks Jasper for testing the patch.

> Attached patch contains some small fixes, mainly to the documentation - on
> top of v7.

I have taken some of the grammatical and spell check issues you have
mentioned. One major thing I left it as it is term "splitpoint" which
you have tried to change in many places to "split point", The
splitpoint is not introduced by me, it was already used in many
places, so I think it is acceptable to use that term. I think I shall
not add changes which are not part of the core issue. I think another
patch on top of this should be okay.

-- 
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com



Re: [POC] A better way to expand hash indexes.

From
Mithun Cy
Date:
Thanks, Amit for a detailed review.

On Wed, Mar 29, 2017 at 4:09 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> Okay, your current patch looks good to me apart from minor comments,
> so marked as Read For Committer.  Please either merge the
> sort_hash_b_2.patch with main patch or submit it along with next
> revision for easier reference.

I will keep it separated just in case commiter likes
sortbuild_hash_A.patch. We can use either of sortbuild_hash_*.patch on
top of main patch.

>
> Few minor comments:
> 1.
> +splitpoint phase S.  The hashm_spares[0] is always 0, so that buckets 0 and 1
> +(which belong to splitpoint group 0's phase 1 and phase 2 respectively) always
> +appear at block numbers 1 and 2, just after the meta page.
>
> This explanation doesn't seem to be right as with current patch we
> start phased allocation only after splitpoint group 9.

Again a mistake, removed the sentence in parentheses.

> 2.
> -#define HASH_MAX_SPLITPOINTS 32
>  #define HASH_MAX_BITMAPS 128
>
> +#define SPLITPOINT_PHASES_PER_GRP 4
> +#define SPLITPOINT_PHASE_MASK (SPLITPOINT_PHASES_PER_GRP - 1)
> +#define SPLITPOINT_GROUPS_WITH_ONLY_ONE_PHASE 10
> +#define HASH_MAX_SPLITPOINTS \
> + ((32 - SPLITPOINT_GROUPS_WITH_ONLY_ONE_PHASE) * \
> + SPLITPOINT_PHASES_PER_GRP) + \
> + SPLITPOINT_GROUPS_WITH_ONLY_ONE_PHASE
>
> You have changed the value of HASH_MAX_SPLITPOINTS, but the comments
> explaining that value are still unchanged.  Refer below text.
> "The limitation on the size of spares[] comes from the fact that there's
>  * no point in having more than 2^32 buckets with only uint32 hashcodes."

The limitation is still indirectly imposed by the fact that we can
have only 2^32 buckets. But I also added a note that
HASH_MAX_SPLITPOINTS also considers that after
SPLITPOINT_GROUPS_WITH_ONLY_ONE_PHASE bucket allocation will be done
in multiple(exactly 4) phases.

-- 
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com

Attachment

Re: [POC] A better way to expand hash indexes.

From
Robert Haas
Date:
On Tue, Mar 28, 2017 at 1:13 AM, Mithun Cy <mithun.cy@enterprisedb.com> wrote:
> B. In tuple sort we can use hash function bucket = hash_key %
> num_buckets instead of existing one which does bitwise "and" to
> determine the bucket of hash key. This way we will not wrongly assign
> buckets beyond max_buckets and sorted hash keys will be in sync with
> actual insertion order of _hash_doinsert.

I think approach B is incorrect.  Suppose we have 1536 buckets and
hash values 2048, 2049, 4096, 4097, 6144, 6145, 8192, and 8193.  If I
understand correctly, each of these values should be mapped either to
bucket 0 or to bucket 1, and the goal of the sort is to put all of the
bucket 0 tuples before all of the bucket 1 tuples, so that we get
physical locality when inserting.  With approach A, the sort keys will
match the bucket numbers -- we'll be sorting the list 0, 1, 0, 1, 0,
1, 0, 1 -- and we will end up doing all of the inserts to bucket 0
before any of the inserts to bucket 1.  With approach B, we'll be
sorting 512, 513, 1024, 1025, 0, 1, 512, 513 and will end up
alternating inserts to bucket 0 with inserts to bucket 1.

To put that another way, see this comment at the top of hashsort.c:
* When building a very large hash index, we pre-sort the tuples by bucket* number to improve locality of access to the
index,and thereby avoid* thrashing.  We use tuplesort.c to sort the given index tuples into order.
 

So, you can't just decide to sort on a random number, which is what
approach B effectively does.  Or, you can, but it completely misses
the point of sorting in the first place.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [POC] A better way to expand hash indexes.

From
Mithun Cy
Date:
On Thu, Mar 30, 2017 at 7:23 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> I think approach B is incorrect.  Suppose we have 1536 buckets and
> hash values 2048, 2049, 4096, 4097, 6144, 6145, 8192, and 8193.  If I
> understand correctly, each of these values should be mapped either to
> bucket 0 or to bucket 1, and the goal of the sort is to put all of the
> bucket 0 tuples before all of the bucket 1 tuples, so that we get
> physical locality when inserting.  With approach A, the sort keys will
> match the bucket numbers -- we'll be sorting the list 0, 1, 0, 1, 0,
> 1, 0, 1 -- and we will end up doing all of the inserts to bucket 0
> before any of the inserts to bucket 1.  With approach B, we'll be
> sorting 512, 513, 1024, 1025, 0, 1, 512, 513 and will end up
> alternating inserts to bucket 0 with inserts to bucket 1.

Oops sorry, yes 2 denominators are different (one used in an insert
and another used in sorting keys) we will end up with different bucket
numbers. I think in patch B, I should have actually taken next 2-power
number of 1536 as the denominator and try to get the mod value. If the
mod value is > 1536 then reduce the denominator by half and retake the
mod to get the bucket within 1536. Which is what effectively Patch A
is doing. Approach B is a blunder, I apologize for that mistake. I
think Patch A should be considered. If adding the members of struct
Tuplesortstate is a concern I will rewrite Patch B as said above.

-- 
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com



Re: [POC] A better way to expand hash indexes.

From
Robert Haas
Date:
On Wed, Mar 29, 2017 at 8:03 AM, Mithun Cy <mithun.cy@enterprisedb.com> wrote:
> Thanks, Amit for a detailed review.

I think that the macros in hash.h need some more work:

- Pretty much any time you use the argument of a macro, you need to
parenthesize it in the macro definition to avoid surprises if the
macros is called using an expression.  That isn't done consistently
here.

- The macros make extensive use of magic numbers like 1, 2, and 3.  I
suggest something like:

#define SPLITPOINT_PHASE_BITS 2
#define SPLITPOINT_PHASES_PER_GROUP (1 << SPLITPOINT_PHASE_BITS)

And then use SPLITPOINT_PHASE_BITS any place where you're currently
saying 2.  The reference to 3 is really SPLITPOINT_PHASE_BITS + 1.

- Many of these macros are only used in one place.  Maybe just move
the computation to that place and get rid of the macro.  For example,
_hash_spareindex() could be written like this:

if (splitpoint_group < SPLITPOINT_GROUPS_WITH_ONLY_ONE_PHASE)   return splitpoint_group;

/* account for single-phase groups */
splitpoint = SPLITPOINT_GROUPS_WITH_ONLY_ONE_PHASE;

/* account for completed groups */
splitpoint += (splitpoint_group -
SPLITPOINT_GROUPS_WITH_ONLY_ONE_PHASE) << SPLITPOINT_PHASE_BITS;

/* account for phases within current group */
splitpoint += (bucket_num >> (SPLITPOINT_PHASE_BITS + 1)) &
SPLITPOINT_PHASE_MASK;

return splitpoint;

That eliminates the only use of two complicated macros and is in my
opinion more clear than what you've currently got.

- Some of these macros lack clear comments explaining their purpose.

- Some of them don't include HASH anywhere in the name, which is
essential for a header that may easily be included by non-hash index
code.

- The names don't all follow a consistent format.  Maybe that's too
much to hope for at some level, but I think they could be more
consistent than they are.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [POC] A better way to expand hash indexes.

From
Mithun Cy
Date:
Thanks Robert, I have tried to fix the comments given as below.

On Thu, Mar 30, 2017 at 9:19 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> I think that the macros in hash.h need some more work:
>
> - Pretty much any time you use the argument of a macro, you need to
> parenthesize it in the macro definition to avoid surprises if the
> macros is called using an expression.  That isn't done consistently
> here.

--I have tried to fix same in the latest patch.

> - The macros make extensive use of magic numbers like 1, 2, and 3.  I
> suggest something like:
>
> #define SPLITPOINT_PHASE_BITS 2
> #define SPLITPOINT_PHASES_PER_GROUP (1 << SPLITPOINT_PHASE_BITS)
>
> And then use SPLITPOINT_PHASE_BITS any place where you're currently
> saying 2.  The reference to 3 is really SPLITPOINT_PHASE_BITS + 1.

-- Taken modified same in the latest patch.

> - Many of these macros are only used in one place.  Maybe just move
> the computation to that place and get rid of the macro.  For example,
> _hash_spareindex() could be written like this:
>
> if (splitpoint_group < SPLITPOINT_GROUPS_WITH_ONLY_ONE_PHASE)
>     return splitpoint_group;
>
> /* account for single-phase groups */
> splitpoint = SPLITPOINT_GROUPS_WITH_ONLY_ONE_PHASE;
>
> /* account for completed groups */
> splitpoint += (splitpoint_group -
> SPLITPOINT_GROUPS_WITH_ONLY_ONE_PHASE) << SPLITPOINT_PHASE_BITS;
>
> /* account for phases within current group */
> splitpoint += (bucket_num >> (SPLITPOINT_PHASE_BITS + 1)) &
> SPLITPOINT_PHASE_MASK;
>
> return splitpoint;
>
> That eliminates the only use of two complicated macros and is in my
> opinion more clear than what you've currently got.

-- Taken, also rewrote _hash_get_totalbuckets in similar lines.

With that, we will end up with only 2 macros which have some computing code
+/* defines max number of splitpoit phases a hash index can have */
+#define HASH_MAX_SPLITPOINT_GROUP 32
+#define HASH_MAX_SPLITPOINTS \
+ (((HASH_MAX_SPLITPOINT_GROUP - HASH_SPLITPOINT_GROUPS_WITH_ONE_PHASE) * \
+  HASH_SPLITPOINT_PHASES_PER_GRP) + \
+ HASH_SPLITPOINT_GROUPS_WITH_ONE_PHASE)
+
+/* given a splitpoint phase get its group */
+#define HASH_SPLITPOINT_PHASE_TO_SPLITPOINT_GRP(sp_phase) \
+ (((sp_phase) < HASH_SPLITPOINT_GROUPS_WITH_ONE_PHASE) ? \
+ (sp_phase) : \
+ ((((sp_phase) - HASH_SPLITPOINT_GROUPS_WITH_ONE_PHASE) >> \
+   HASH_SPLITPOINT_PHASE_BITS) + \
+  HASH_SPLITPOINT_GROUPS_WITH_ONE_PHASE))

> - Some of these macros lack clear comments explaining their purpose.

-- I have written some comments to explain the use of the macros.

> - Some of them don't include HASH anywhere in the name, which is
> essential for a header that may easily be included by non-hash index
> code.

-- Fixed, all MACROS are prefixed with HASH

> - The names don't all follow a consistent format.  Maybe that's too
> much to hope for at some level, but I think they could be more
> consistent than they are.

-- Fixed, apart from old HASH_MAX_SPLITPOINTS rest all have a prefix
HASH_SPLITPOINT.

-- 
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com

Attachment

Re: [POC] A better way to expand hash indexes.

From
Robert Haas
Date:
On Thu, Mar 30, 2017 at 2:36 PM, Mithun Cy <mithun.cy@enterprisedb.com> wrote:
> Thanks Robert, I have tried to fix the comments given as below.

Thanks.

Since this changes the on-disk format of hash indexes in an
incompatible way, I think it should bump HASH_VERSION.  (Hopefully
that doesn't preclude REINDEX?)  pg_upgrade should probably also be
patched to issue a warning if upgrading from a version < 10 to a
version >= 10 whenever hash indexes are present; I thought we had
similar cases already, but I don't see them at the moment.  Maybe we
can get Bruce or someone to give us some advice on exactly what should
be done here.

In a couple of places, you say that a splitpoint group has 4 slots
rather than 4 phases.

I think that in _hash_get_totalbuckets(), you should use blah &
HASH_SPLITPOINT_PHASE_MASK rather than blah %
HASH_SPLITPOINT_PHASES_PER_GRP for consistency with _hash_spareindex
and, perhaps, speed.  Similarly, instead of blah /
HASH_SPLITPOINT_PHASES_PER_GRP, use blah >>
HASH_SPLITPOINT_PHASE_BITS.

buckets_toadd is punctuated oddly.  buckets_to_add?  Instead of
hand-calculating this, how about calculating it as
_hash_get_totalbuckets(spare_ndx) - _hash_get_totalbuckets(spare_ndx -
1)?  That way you reuse the existing logic instead of writing a
slightly different thing in a new place and maybe making a mistake.
If you're going to calculate it, use & and >> rather than % and /, as
above, and drop the parentheses around new_bucket -- this isn't a
macro definition.

+        uint32      splitpoint_group = 0;

Don't need the  = 0 here; the next reference to this variable is an
unconditional initialization.

+         */
+
+        splitpoint_group = HASH_SPLITPOINT_PHASE_TO_SPLITPOINT_GRP(spare_ndx);

I would delete the blank line.

-         * should start from ((2 ^ i) + metap->hashm_spares[i - 1] + 1).
+         * should start from
+         * (_hash_get_totalbuckets(i) + metap->hashm_spares[i - 1] + 1).

Won't survive pgindent.

-         * The number of buckets in the new splitpoint is equal to the total
-         * number already in existence, i.e. new_bucket.  Currently this maps
-         * one-to-one to blocks required, but someday we may need a more
-         * complicated calculation here.  We treat allocation of buckets as a
-         * separate WAL-logged action.  Even if we fail after this operation,
-         * won't leak bucket pages; rather, the next split will consume this
-         * space. In any case, even without failure we don't use all the space
-         * in one split operation.
+         * The number of buckets in the new splitpoint group is equal to the
+         * total number already in existence. But we do not allocate them at
+         * once. Each splitpoint group will have 4 slots, we distribute the
+         * buckets equally among them. So we allocate only one fourth of total
+         * buckets in new splitpoint group at a time to consume one phase after
+         * another. We treat allocation of buckets as a separate WAL-logged
+         * action. Even if we fail after this operation, won't leak bucket
+         * pages; rather, the next split will consume this space. In any case,
+         * even without failure we don't use all the space in one split
+         * operation.

I think here you should break this into two paragraphs -- start a new
paragraph with the sentence that begins "We treat..."

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [POC] A better way to expand hash indexes.

From
Mithun Cy
Date:
Thanks, I have tried to fix all of the comments.

On Fri, Mar 31, 2017 at 8:10 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Thu, Mar 30, 2017 at 2:36 PM, Mithun Cy <mithun.cy@enterprisedb.com> wrote:
>> Thanks Robert, I have tried to fix the comments given as below.
>
> Thanks.
>
> Since this changes the on-disk format of hash indexes in an
> incompatible way, I think it should bump HASH_VERSION.  (Hopefully
> that doesn't preclude REINDEX?)  pg_upgrade should probably also be
> patched to issue a warning if upgrading from a version < 10 to a
> version >= 10 whenever hash indexes are present; I thought we had
> similar cases already, but I don't see them at the moment.  Maybe we
> can get Bruce or someone to give us some advice on exactly what should
> be done here.

As of now increasing version ask us to REINDEX (metapage access verify
whether we are in right version)
postgres=# set enable_seqscan= off;
SET
postgres=# select * from t1 where i = 10;
ERROR:  index "hash2" has wrong hash version
HINT:  Please REINDEX it.
postgres=# insert into t1 values(10);
ERROR:  index "hash2" has wrong hash version
HINT:  Please REINDEX it.

postgres=# REINDEX INDEX hash2;
REINDEX
postgres=# select * from t1 where i = 10;
 i
----
 10
(1 row)

Last time we changed this version from 1 to 2
(4adc2f72a4ccd6e55e594aca837f09130a6af62b), from logs I see no changes
for upgrade specifically.

Hi Bruce, can you please advise us what should be done here.

> In a couple of places, you say that a splitpoint group has 4 slots
> rather than 4 phases.

--Fixed

> I think that in _hash_get_totalbuckets(), you should use blah &
> HASH_SPLITPOINT_PHASE_MASK rather than blah %
> HASH_SPLITPOINT_PHASES_PER_GRP for consistency with _hash_spareindex
> and, perhaps, speed.  Similarly, instead of blah /
> HASH_SPLITPOINT_PHASES_PER_GRP, use blah >>
> HASH_SPLITPOINT_PHASE_BITS.

--Fixed

>
> buckets_toadd is punctuated oddly.  buckets_to_add?  Instead of
> hand-calculating this, how about calculating it as
> _hash_get_totalbuckets(spare_ndx) - _hash_get_totalbuckets(spare_ndx -
> 1)?

I think this should do that, considering new_bucket is nothng but
1-based max_buckets.
buckets_to_add = _hash_get_totalbuckets(spare_ndx) - new_bucket;

That makes me do away with

+#define HASH_SPLITPOINT_PHASE_TO_SPLITPOINT_GRP(sp_phase) \
+ (((sp_phase) < HASH_SPLITPOINT_GROUPS_WITH_ONE_PHASE) ? \
+ (sp_phase) : \
+ ((((sp_phase) - HASH_SPLITPOINT_GROUPS_WITH_ONE_PHASE) >> \
+   HASH_SPLITPOINT_PHASE_BITS) + \
+  HASH_SPLITPOINT_GROUPS_WITH_ONE_PHASE))

as this is now used in only one place _hash_get_totalbuckets.
I also think the comments above can be removed now. As we have removed
the code related to multi-phased allocation there.

+         * The number of buckets in the new splitpoint group is equal
to the
+         * total number already in existence. But we do not allocate
them at
+         * once. Each splitpoint group will have 4 phases, we
distribute the
+         * buckets equally among them. So we allocate only one fourth
of total
+         * buckets in new splitpoint group at a time to consume one phase after
+         * another.

> +        uint32      splitpoint_group = 0;
>
> Don't need the  = 0 here; the next reference to this variable is an
> unconditional initialization.

--Fixed, with new code splitpoint_group is not needed.

>
> +         */
> +
> +        splitpoint_group = HASH_SPLITPOINT_PHASE_TO_SPLITPOINT_GRP(spare_ndx);
>
> I would delete the blank line.

--Fixed.

>
> -         * should start from ((2 ^ i) + metap->hashm_spares[i - 1] + 1).
> +         * should start from
> +         * (_hash_get_totalbuckets(i) + metap->hashm_spares[i - 1] + 1).
>
> Won't survive pgindent.

--Fixed as pgindent has suggested.

>
> -         * The number of buckets in the new splitpoint is equal to the total
> -         * number already in existence, i.e. new_bucket.  Currently this maps
> -         * one-to-one to blocks required, but someday we may need a more
> -         * complicated calculation here.  We treat allocation of buckets as a
> -         * separate WAL-logged action.  Even if we fail after this operation,
> -         * won't leak bucket pages; rather, the next split will consume this
> -         * space. In any case, even without failure we don't use all the space
> -         * in one split operation.
> +         * The number of buckets in the new splitpoint group is equal to the
> +         * total number already in existence. But we do not allocate them at
> +         * once. Each splitpoint group will have 4 slots, we distribute the
> +         * buckets equally among them. So we allocate only one fourth of total
> +         * buckets in new splitpoint group at a time to consume one phase after
> +         * another. We treat allocation of buckets as a separate WAL-logged
> +         * action. Even if we fail after this operation, won't leak bucket
> +         * pages; rather, the next split will consume this space. In any case,
> +         * even without failure we don't use all the space in one split
> +         * operation.
>
> I think here you should break this into two paragraphs -- start a new
> paragraph with the sentence that begins "We treat..."

-- Fixed, I have removed the first paragraph it appeared as an extra
information when we do
buckets_to_add = _hash_get_totalbuckets(spare_ndx) - new_bucket;

-- 
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com

Attachment

Re: [POC] A better way to expand hash indexes.

From
Robert Haas
Date:
On Fri, Mar 31, 2017 at 1:15 AM, Mithun Cy <mithun.cy@enterprisedb.com> wrote:
> Thanks, I have tried to fix all of the comments.

Thanks.

Hmm, don't the changes to contrib/pageinspect/expected/hash.out
indicate that you've broken something?  The hash index has only 4
buckets, so the new code shouldn't be doing anything differently, but
you've got highmask and lowmask changing for some reason.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [POC] A better way to expand hash indexes.

From
Mithun Cy
Date:
On Sat, Apr 1, 2017 at 7:05 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> Hmm, don't the changes to contrib/pageinspect/expected/hash.out
> indicate that you've broken something?  The hash index has only 4
> buckets, so the new code shouldn't be doing anything differently, but
> you've got highmask and lowmask changing for some reason.

The high mask calculation has got changed a bit to accommodate
num_buckets  which are not 2-power number.
If num_bucket is not a 2-power number highmask should be its immediate
next ((2^x) - 1) and low mask should be (highmask >> 1) to cover the
first half of the buckets. Trying to generalize same has changed the
masks for 2-power num_buckets from older implementation.

+ /* set highmask, which should be sufficient to cover num_buckets. */
+ metap->hashm_highmask = (1 << (_hash_log2(num_buckets))) - 1;

But this do not cause any adverse effect the high and low mask is
sufficiently enough to get the same mod, If we add one more bucket
then
@_hash_expandtable, immediately we make the masks bigger.
    if (new_bucket > metap->hashm_highmask)
    {
        /* Starting a new doubling */
        metap->hashm_lowmask = metap->hashm_highmask;
        metap->hashm_highmask = new_bucket | metap->hashm_lowmask;

The state (metap->hashm_highmask == metap->hashm_maxbucket) is natural
state to occur while hash index is growing and just before doubling.

Another choice I could have made is bump a number so that for 2-power
num_buckets will get highmask as similar to old code, and non 2-power
num_buckets highmask will be immediate next ((2^x) - 1).
+ /* set highmask, which should be sufficient to cover num_buckets. */
+ metap->hashm_highmask = (1 << (_hash_log2(num_buckets + 1))) - 1;

It was just a personal preference I choose 1, as it appeared
consistent with running state of hash index expansion.

Also adding a patch which implements the 2nd way.

-- 
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com

Attachment

Re: [POC] A better way to expand hash indexes.

From
Mithun Cy
Date:
On Sat, Apr 1, 2017 at 12:31 PM, Mithun Cy <mithun.cy@enterprisedb.com> wrote:

> Also adding a patch which implements the 2nd way.
Sorry, I forgot to add sortbuild_hash patch, which also needs similar
changes for the hash_mask.


-- 
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com

Attachment

Re: [POC] A better way to expand hash indexes.

From
Robert Haas
Date:
On Sat, Apr 1, 2017 at 3:29 AM, Mithun Cy <mithun.cy@enterprisedb.com> wrote:
> On Sat, Apr 1, 2017 at 12:31 PM, Mithun Cy <mithun.cy@enterprisedb.com> wrote:
>> Also adding a patch which implements the 2nd way.
> Sorry, I forgot to add sortbuild_hash patch, which also needs similar
> changes for the hash_mask.

Committed.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [POC] A better way to expand hash indexes.

From
Mithun Cy
Date:
On Tue, Apr 4, 2017 at 9:18 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> Committed.

Thanks Robert,

And also sorry, one unfortunate thing happened in the last patch while
fixing one of the review comments a variable disappeared from the
equation
@_hash_spareindex.

        splitpoint_phases +=
-               (((num_bucket - 1) >> (HASH_SPLITPOINT_PHASE_BITS + 1)) &
+               (((num_bucket - 1) >>
+                 (splitpoint_group - (HASH_SPLITPOINT_PHASE_BITS + 1))) &
                 HASH_SPLITPOINT_PHASE_MASK);   /* to 0-based value. */

I wanted most significant 3 bits. And while fixing comments in patch11
I unknowingly somehow removed splitpoint_group from the equation.
Extremely sorry for the mistake. Thanks to Ashutosh Sharma for
pointing the mistake.


-- 
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com

Attachment

Re: [POC] A better way to expand hash indexes.

From
Robert Haas
Date:
On Tue, Apr 4, 2017 at 6:33 AM, Mithun Cy <mithun.cy@enterprisedb.com> wrote:
> On Tue, Apr 4, 2017 at 9:18 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>> Committed.
>
> Thanks Robert,
>
> And also sorry, one unfortunate thing happened in the last patch while
> fixing one of the review comments a variable disappeared from the
> equation
> @_hash_spareindex.
>
>         splitpoint_phases +=
> -               (((num_bucket - 1) >> (HASH_SPLITPOINT_PHASE_BITS + 1)) &
> +               (((num_bucket - 1) >>
> +                 (splitpoint_group - (HASH_SPLITPOINT_PHASE_BITS + 1))) &
>                  HASH_SPLITPOINT_PHASE_MASK);   /* to 0-based value. */
>
> I wanted most significant 3 bits. And while fixing comments in patch11
> I unknowingly somehow removed splitpoint_group from the equation.
> Extremely sorry for the mistake. Thanks to Ashutosh Sharma for
> pointing the mistake.

Ugh, OK, committed that also.  Please try to be more careful in the future.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company