Home > mailing lists

[HACKERS] [POC] A better way to expand hash indexes. - Mailing list pgsql-hackers

From	Mithun Cy
Subject	[HACKERS] [POC] A better way to expand hash indexes.
Date	February 17, 2017 19:51:32
Msg-id	CAD__OuhG6F1gQLCgMQNnMNgoCvOLQZz9zKYJQNYvYmmJoM42gA@mail.gmail.com Whole thread Raw
Responses	Re: [HACKERS] [POC] A better way to expand hash indexes.
List	pgsql-hackers

Tree view

Hi all,

As of now, we expand the hash index by doubling the number of bucket
blocks. But unfortunately, those blocks will not be used immediately.
So I thought if we can differ bucket block allocation by some factor,
hash index size can grow much efficiently. I have written a POC patch
which does following things -
Say at overflow point 'i' we need to add new "x = 2^(i-1)" buckets as
per the old code, I think we can do this addition of buckets in a
controlled way. Instead of adding all of 'x' bucket blocks at a time,
the patch will add x/4 blocks at a time. And, once those blocks are
consumed, it adds next installment of x/4 blocks. And, this will
continue until all of 'x' blocks of the overflow point 'i' is
allocated. My test result shows index size grows in a more efficient
way with above patch.

Note: This patch is just a POC. It can have bugs and I have to change
comments in the code, and README which are related to new changes.

Test:
create table t1(t int);
create index i1 on t1 using hash(t);
And then, Incrementally add rows as below.
insert into t1 select generate_series(1, 10000000);


records        base index      new patch
in million  --  size(MB)     --  index size(MB)

10                384                  384

20                768                  768

30                1417               1161

40                1531               1531

50                2556               1788

60                2785               2273

70                2963               2709

80               3060                3061

90               5111                 3575

100             5111                 3575

To implement such an incremental addition of bucket blocks I have to
increase the size of array hashm_spares in meta page by four times.
Also, mapping methods which map a total number of buckets to a
split-point position of hashm_spares array, need to be changed. These
changes create backward compatibility issues.

Implementation Details in brief:
=======================
Each element of hashm_spares (we call overflow point) is expanded into
4 slots {0, 1, 2, 3}. If 'x' (a power2 number) is the total number of
buckets to be added before the overflow point we add only a quarter of
it (x/4) to each slot, once we have consumed previously added blocks
we add next quarter of buckets and so on.
As in old code new hashm_spares[i] stores total overflow pages
allocated between those bucket allocation.


-- 
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Attachment

expand_hashbucket_efficiently_01

pgsql-hackers by date:

From: Andreas Karlsson
Date: 17 February 2017, 19:43:23
Subject: Re: [HACKERS] REINDEX CONCURRENTLY 2.0

From: Michael Paquier
Date: 17 February 2017, 19:59:15
Subject: Re: [HACKERS] SCRAM authentication, take three

[HACKERS] [POC] A better way to expand hash indexes. - Mailing list pgsql-hackers

Attachment

Previous

Next