Thanks for having a look at this.
On Mon, 21 Jun 2021 at 05:02, Zhihong Yu <zyu@yugabyte.com> wrote:
> + * GH_ELEMENT_TYPE defines the data type that the hashtable stores. Each
> + * instance of GH_ELEMENT_TYPE which is stored in the hash table is done so
> + * inside a GH_SEGMENT.
>
> I think the second sentence can be written as (since done means stored, it is redundant):
I've rewords this entire paragraph slightly.
> Each instance of GH_ELEMENT_TYPE is stored in the hash table inside a GH_SEGMENT.
>
> + * Macros for translating a bucket's index into the segment and another to
> + * determine the item number within the segment.
> + */
> +#define GH_INDEX_SEGMENT(i) (i) / GH_ITEMS_PER_SEGMENT
>
> into the segment -> into the segment number (in the code I see segidx but I wonder if segment index may cause slight
confusion).
I've adjusted this comment
> + GH_BITMAP_WORD used_items[GH_BITMAP_WORDS]; /* A 1-bit for each used item
> + * in the items array */
>
> 'A 1-bit' -> One bit (A and 1 mean the same)
I think you might have misread this. We're storing a 1-bit for each
used item rather than a 0-bit. If I remove the 'A' then it's not
clear what the meaning of each bit's value is.
> + uint32 first_free_segment;
>
> Since the segment may not be totally free, maybe name the field first_segment_with_free_slot
I don't really like that. It feels too long to me.
> + * This is similar to GH_NEXT_ONEBIT but flips the bits before operating on
> + * each GH_BITMAP_WORD.
>
> It seems the only difference from GH_NEXT_ONEBIT is in this line:
>
> + GH_BITMAP_WORD word = ~(words[wordnum] & mask); /* flip bits */
>
> If a 4th parameter is added to signify whether the flipping should be done, these two functions can be unified.
I don't want to do that. I'd rather have them separate to ensure the
compiler does not create any additional needless branching. Those
functions are pretty hot.
> + * next insert will store in this segment. If it's already pointing to an
> + * earlier segment, then leave it be.
>
> The last sentence is confusing: first_free_segment cannot point to some segment and earlier segment at the same
time.
> Maybe drop the last sentence.
I've adjusted this comment to become:
* Check if we need to lower the next_free_segment. We want new inserts
* to fill the lower segments first, so only change the first_free_segment
* if the removed entry was stored in an earlier segment.
Thanks for having a look at this.
I'll attach an updated patch soon.
David