Re: Comment fix and question about dshash.c - Mailing list pgsql-hackers

From Antonin Houska
Subject Re: Comment fix and question about dshash.c
Date
Msg-id 16250.1540641063@localhost
Whole thread Raw
In response to Re: Comment fix and question about dshash.c  (Antonin Houska <ah@cybertec.at>)
Responses Re: Comment fix and question about dshash.c  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
List pgsql-hackers
Antonin Houska <ah@cybertec.at> wrote:

> Thomas Munro <thomas.munro@enterprisedb.com> wrote:
>
> > On Sat, Oct 27, 2018 at 12:30 AM Antonin Houska <ah@cybertec.at> wrote:
> >
> > Are you saying there is a bug in this logic (which is nbuckets * 0.75
> > expressed as integer maths), or saying that 0.75 is not a good maximum
> > load factor?  I looked around at a couple of general purpose hash
> > tables and saw that some used 0.75 and some used 1.0, as a wasted
> > space-vs-collision trade-off.  If I have my maths right[1], with 0.75
> > you expect to have 75 entries in ~53 buckets, but with 1.0 you expect
> > to have 100 entries in ~64 buckets.
>
> I don't know how exactly you apply the [1] formula (what is "n" and what is
> "N" in your case?), but my consideration was much simpler: For example, if
> BUCKETS_PER_PARTITION returns 8 (power of 2 is expected here and also more
> convenient), then MAX_COUNT_PER_PARTITION returns 8 / 2 + 8 / 4 = 6. Thus the
> hashtable gets resized if we're going to add the 7th entry to the partition,
> i.e. we the number of entries in the partition is lower than the number of
> buckets. Is that o.k.?

Well, it may be o.k. I've just checked what the fill factor means in hash
index and it's also the number of entries divided by the number of buckets.

--
Antonin Houska
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26, A-2700 Wiener Neustadt
Web: https://www.cybertec-postgresql.com


pgsql-hackers by date:

Previous
From: Antonin Houska
Date:
Subject: Re: Comment fix and question about dshash.c
Next
From: Tomas Vondra
Date:
Subject: Re: Comment fix and question about dshash.c