Re: [POC] A better way to expand hash indexes. - Mailing list pgsql-hackers

From Robert Haas
Subject Re: [POC] A better way to expand hash indexes.
Date
Msg-id CA+TgmoZp2Vf1Dt0u7RgcpfUf5dQfoSb3eKKVzvpnCEHF+h3PcA@mail.gmail.com
Whole thread Raw
In response to Re: [POC] A better way to expand hash indexes.  (Mithun Cy <mithun.cy@enterprisedb.com>)
Responses Re: [POC] A better way to expand hash indexes.  (Mithun Cy <mithun.cy@enterprisedb.com>)
List pgsql-hackers
On Thu, Mar 30, 2017 at 2:36 PM, Mithun Cy <mithun.cy@enterprisedb.com> wrote:
> Thanks Robert, I have tried to fix the comments given as below.

Thanks.

Since this changes the on-disk format of hash indexes in an
incompatible way, I think it should bump HASH_VERSION.  (Hopefully
that doesn't preclude REINDEX?)  pg_upgrade should probably also be
patched to issue a warning if upgrading from a version < 10 to a
version >= 10 whenever hash indexes are present; I thought we had
similar cases already, but I don't see them at the moment.  Maybe we
can get Bruce or someone to give us some advice on exactly what should
be done here.

In a couple of places, you say that a splitpoint group has 4 slots
rather than 4 phases.

I think that in _hash_get_totalbuckets(), you should use blah &
HASH_SPLITPOINT_PHASE_MASK rather than blah %
HASH_SPLITPOINT_PHASES_PER_GRP for consistency with _hash_spareindex
and, perhaps, speed.  Similarly, instead of blah /
HASH_SPLITPOINT_PHASES_PER_GRP, use blah >>
HASH_SPLITPOINT_PHASE_BITS.

buckets_toadd is punctuated oddly.  buckets_to_add?  Instead of
hand-calculating this, how about calculating it as
_hash_get_totalbuckets(spare_ndx) - _hash_get_totalbuckets(spare_ndx -
1)?  That way you reuse the existing logic instead of writing a
slightly different thing in a new place and maybe making a mistake.
If you're going to calculate it, use & and >> rather than % and /, as
above, and drop the parentheses around new_bucket -- this isn't a
macro definition.

+        uint32      splitpoint_group = 0;

Don't need the  = 0 here; the next reference to this variable is an
unconditional initialization.

+         */
+
+        splitpoint_group = HASH_SPLITPOINT_PHASE_TO_SPLITPOINT_GRP(spare_ndx);

I would delete the blank line.

-         * should start from ((2 ^ i) + metap->hashm_spares[i - 1] + 1).
+         * should start from
+         * (_hash_get_totalbuckets(i) + metap->hashm_spares[i - 1] + 1).

Won't survive pgindent.

-         * The number of buckets in the new splitpoint is equal to the total
-         * number already in existence, i.e. new_bucket.  Currently this maps
-         * one-to-one to blocks required, but someday we may need a more
-         * complicated calculation here.  We treat allocation of buckets as a
-         * separate WAL-logged action.  Even if we fail after this operation,
-         * won't leak bucket pages; rather, the next split will consume this
-         * space. In any case, even without failure we don't use all the space
-         * in one split operation.
+         * The number of buckets in the new splitpoint group is equal to the
+         * total number already in existence. But we do not allocate them at
+         * once. Each splitpoint group will have 4 slots, we distribute the
+         * buckets equally among them. So we allocate only one fourth of total
+         * buckets in new splitpoint group at a time to consume one phase after
+         * another. We treat allocation of buckets as a separate WAL-logged
+         * action. Even if we fail after this operation, won't leak bucket
+         * pages; rather, the next split will consume this space. In any case,
+         * even without failure we don't use all the space in one split
+         * operation.

I think here you should break this into two paragraphs -- start a new
paragraph with the sentence that begins "We treat..."

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



pgsql-hackers by date:

Previous
From: Venkata B Nagothi
Date:
Subject: Re: [BUGS] Bug in Physical Replication Slots (at least 9.5)?
Next
From: Vitaly Burovoy
Date:
Subject: Re: sequence data type