Re: [POC] A better way to expand hash indexes. - Mailing list pgsql-hackers

From Mithun Cy
Subject Re: [POC] A better way to expand hash indexes.
Date
Msg-id CAD__OujD-iBxm91ZcqziaYftWqJxnFqgMv361V9zke83s6ifBg@mail.gmail.com
Whole thread Raw
In response to Re: [POC] A better way to expand hash indexes.  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
Thanks, I have tried to address all of the comments.

On Fri, Mar 31, 2017 at 8:10 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Thu, Mar 30, 2017 at 2:36 PM, Mithun Cy <mithun.cy@enterprisedb.com> wrote:
>> Thanks Robert, I have tried to fix the comments given as below.
>
> Thanks.
>
> Since this changes the on-disk format of hash indexes in an
> incompatible way, I think it should bump HASH_VERSION.  (Hopefully
> that doesn't preclude REINDEX?)  pg_upgrade should probably also be
> patched to issue a warning if upgrading from a version < 10 to a
> version >= 10 whenever hash indexes are present; I thought we had
> similar cases already, but I don't see them at the moment.  Maybe we
> can get Bruce or someone to give us some advice on exactly what should
> be done here.

As of now, increasing the version asks us to REINDEX (the metapage access
verifies whether we are on the right version):
postgres=# set enable_seqscan= off;
SET
postgres=# select * from t1 where i = 10;
ERROR:  index "hash2" has wrong hash version
HINT:  Please REINDEX it.
postgres=# insert into t1 values(10);
ERROR:  index "hash2" has wrong hash version
HINT:  Please REINDEX it.

postgres=# REINDEX INDEX hash2;
REINDEX
postgres=# select * from t1 where i = 10;
 i
----
 10
(1 row)
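For illustration, the guard that produces the error above amounts to comparing the version stored in the index metapage against the compiled-in HASH_VERSION. A minimal standalone sketch (the struct layout, constant value, and function name here are simplified stand-ins, not the patch's actual code):

```c
#include <stdint.h>

#define HASH_VERSION 3          /* assumption: bumped from 2 for the new layout */

/* simplified stand-in for the hash metapage header */
typedef struct HashMetaPageData
{
    uint32_t    hashm_magic;
    uint32_t    hashm_version;
} HashMetaPageData;

/* returns 0 if the on-disk version matches, -1 if a REINDEX is required */
static int
hash_version_ok(const HashMetaPageData *metap)
{
    return (metap->hashm_version == HASH_VERSION) ? 0 : -1;
}
```

Any scan or insert that touches the metapage hits this check first, which is why both the SELECT and the INSERT above fail with the same hint until the index is rebuilt.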

Last time we changed this version from 1 to 2
(4adc2f72a4ccd6e55e594aca837f09130a6af62b); from the logs I see no
changes made specifically for upgrade.

Hi Bruce, can you please advise us on what should be done here?

> In a couple of places, you say that a splitpoint group has 4 slots
> rather than 4 phases.

--Fixed

> I think that in _hash_get_totalbuckets(), you should use blah &
> HASH_SPLITPOINT_PHASE_MASK rather than blah %
> HASH_SPLITPOINT_PHASES_PER_GRP for consistency with _hash_spareindex
> and, perhaps, speed.  Similarly, instead of blah /
> HASH_SPLITPOINT_PHASES_PER_GRP, use blah >>
> HASH_SPLITPOINT_PHASE_BITS.

--Fixed
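The mask/shift forms are equivalent to modulo/division here because the number of phases per group is a power of two. A quick standalone check (the constant values are assumed to mirror the patch's definitions of 4 phases per group):

```c
#include <stdint.h>

/* assumed to mirror the patch's definitions: 2 phase bits, 4 phases per group */
#define HASH_SPLITPOINT_PHASE_BITS      2
#define HASH_SPLITPOINT_PHASES_PER_GRP  (1 << HASH_SPLITPOINT_PHASE_BITS)
#define HASH_SPLITPOINT_PHASE_MASK      (HASH_SPLITPOINT_PHASES_PER_GRP - 1)

/* phase within its splitpoint group: AND with the mask instead of modulo */
static uint32_t
phase_in_group(uint32_t sp_phase)
{
    return sp_phase & HASH_SPLITPOINT_PHASE_MASK;
}

/* group offset: right shift drops the phase bits instead of dividing */
static uint32_t
group_offset(uint32_t sp_phase)
{
    return sp_phase >> HASH_SPLITPOINT_PHASE_BITS;
}
```

Besides consistency with _hash_spareindex, the compiler may not reduce a modulo to a mask on its own unless the divisor is a visible power-of-two constant, so spelling it as a mask is the safer idiom.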

>
> buckets_toadd is punctuated oddly.  buckets_to_add?  Instead of
> hand-calculating this, how about calculating it as
> _hash_get_totalbuckets(spare_ndx) - _hash_get_totalbuckets(spare_ndx -
> 1)?

I think this should do it, considering that new_bucket is nothing but the
1-based max_buckets:
buckets_to_add = _hash_get_totalbuckets(spare_ndx) - new_bucket;

That makes me do away with

+#define HASH_SPLITPOINT_PHASE_TO_SPLITPOINT_GRP(sp_phase) \
+ (((sp_phase) < HASH_SPLITPOINT_GROUPS_WITH_ONE_PHASE) ? \
+ (sp_phase) : \
+ ((((sp_phase) - HASH_SPLITPOINT_GROUPS_WITH_ONE_PHASE) >> \
+   HASH_SPLITPOINT_PHASE_BITS) + \
+  HASH_SPLITPOINT_GROUPS_WITH_ONE_PHASE))

as it is now used in only one place, _hash_get_totalbuckets().
I also think the following comment can be removed now, as we have removed
the code related to multi-phased allocation there:

+         * The number of buckets in the new splitpoint group is equal to the
+         * total number already in existence. But we do not allocate them at
+         * once. Each splitpoint group will have 4 phases, we distribute the
+         * buckets equally among them. So we allocate only one fourth of total
+         * buckets in new splitpoint group at a time to consume one phase after
+         * another.

> +        uint32      splitpoint_group = 0;
>
> Don't need the  = 0 here; the next reference to this variable is an
> unconditional initialization.

--Fixed; with the new code, splitpoint_group is not needed.

>
> +         */
> +
> +        splitpoint_group = HASH_SPLITPOINT_PHASE_TO_SPLITPOINT_GRP(spare_ndx);
>
> I would delete the blank line.

--Fixed.

>
> -         * should start from ((2 ^ i) + metap->hashm_spares[i - 1] + 1).
> +         * should start from
> +         * (_hash_get_totalbuckets(i) + metap->hashm_spares[i - 1] + 1).
>
> Won't survive pgindent.

--Fixed as pgindent has suggested.

>
> -         * The number of buckets in the new splitpoint is equal to the total
> -         * number already in existence, i.e. new_bucket.  Currently this maps
> -         * one-to-one to blocks required, but someday we may need a more
> -         * complicated calculation here.  We treat allocation of buckets as a
> -         * separate WAL-logged action.  Even if we fail after this operation,
> -         * won't leak bucket pages; rather, the next split will consume this
> -         * space. In any case, even without failure we don't use all the space
> -         * in one split operation.
> +         * The number of buckets in the new splitpoint group is equal to the
> +         * total number already in existence. But we do not allocate them at
> +         * once. Each splitpoint group will have 4 slots, we distribute the
> +         * buckets equally among them. So we allocate only one fourth of total
> +         * buckets in new splitpoint group at a time to consume one phase after
> +         * another. We treat allocation of buckets as a separate WAL-logged
> +         * action. Even if we fail after this operation, won't leak bucket
> +         * pages; rather, the next split will consume this space. In any case,
> +         * even without failure we don't use all the space in one split
> +         * operation.
>
> I think here you should break this into two paragraphs -- start a new
> paragraph with the sentence that begins "We treat..."

-- Fixed. I have removed the first paragraph; it appeared to be redundant
information once we do
buckets_to_add = _hash_get_totalbuckets(spare_ndx) - new_bucket;

-- 
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com

Attachment
