Re: Comment fix and question about dshash.c - Mailing list pgsql-hackers

From Antonin Houska
Subject Re: Comment fix and question about dshash.c
Date
Msg-id 15639.1540639321@localhost
Whole thread Raw
In response to Re: Comment fix and question about dshash.c  (Thomas Munro <thomas.munro@enterprisedb.com>)
Responses Re: Comment fix and question about dshash.c
List pgsql-hackers
Thomas Munro <thomas.munro@enterprisedb.com> wrote:

> On Sat, Oct 27, 2018 at 12:30 AM Antonin Houska <ah@cybertec.at> wrote:
>
> Thanks, will fix.
>
> > 2. Can anyone please explain this macro?
> >
> > /* Max entries before we need to grow.  Half + quarter = 75% load factor. */
> > #define MAX_COUNT_PER_PARTITION(hash_table)                             \
> >         (BUCKETS_PER_PARTITION(hash_table->size_log2) / 2 + \
> >          BUCKETS_PER_PARTITION(hash_table->size_log2) / 4)
> >
> > I'm failing to understand why the maximum number of hash table entries in a
> > partition should be smaller than the number of buckets in that partition.
> >
> > The fact that MAX_COUNT_PER_PARTITION refers to entries and not buckets is
> > obvious from this condition in dshash_find_or_insert()
> >
> >         /* Check if we are getting too full. */
> >         if (partition->count > MAX_COUNT_PER_PARTITION(hash_table))
>
> Are you saying there is a bug in this logic (which is nbuckets * 0.75
> expressed as integer maths), or saying that 0.75 is not a good maximum
> load factor?  I looked around at a couple of general purpose hash
> tables and saw that some used 0.75 and some used 1.0, as a wasted
> space-vs-collision trade-off.  If I have my maths right[1], with 0.75
> you expect to have 75 entries in ~53 buckets, but with 1.0 you expect
> to have 100 entries in ~64 buckets.

I don't know how exactly you apply the [1] formula (what is "n" and what is
"N" in your case?), but my consideration was much simpler: For example, if
BUCKETS_PER_PARTITION returns 8 (power of 2 is expected here and also more
convenient), then MAX_COUNT_PER_PARTITION returns 8 / 2 + 8 / 4 = 6. Thus the
hashtable gets resized if we're going to add the 7th entry to the partition,
i.e. we the number of entries in the partition is lower than the number of
buckets. Is that o.k.?

> [1] https://math.stackexchange.com/questions/470662/average-number-of-bins-occupied-when-throwing-n-balls-into-n-bins

--
Antonin Houska
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26, A-2700 Wiener Neustadt
Web: https://www.cybertec-postgresql.com


pgsql-hackers by date:

Previous
From: Peter Eisentraut
Date:
Subject: Re: Remove obsolete pg_attrdef.adsrc column
Next
From: Antonin Houska
Date:
Subject: Re: Comment fix and question about dshash.c