Re: Comment fix and question about dshash.c - Mailing list pgsql-hackers

From Thomas Munro
Subject Re: Comment fix and question about dshash.c
Date
Msg-id CAEepm=3f5Uvu=6GB2PXOPEOtQ-EgqSKOxZ_emonhLsLYK5uAZg@mail.gmail.com
In response to Re: Comment fix and question about dshash.c  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
List pgsql-hackers
On Sun, Oct 28, 2018 at 1:22 AM Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:
> On 10/27/2018 01:51 PM, Antonin Houska wrote:
> > Antonin Houska <ah@cybertec.at> wrote:
> >> Thomas Munro <thomas.munro@enterprisedb.com> wrote:
> >>> On Sat, Oct 27, 2018 at 12:30 AM Antonin Houska <ah@cybertec.at> wrote:
> >>> Are you saying there is a bug in this logic (which is nbuckets * 0.75
> >>> expressed as integer maths), or saying that 0.75 is not a good maximum
> >>> load factor?  I looked around at a couple of general purpose hash
> >>> tables and saw that some used 0.75 and some used 1.0, as a wasted
> >>> space-vs-collision trade-off.  If I have my maths right[1], with 0.75
> >>> you expect to have 75 entries in ~53 buckets, but with 1.0 you expect
> >>> to have 100 entries in ~64 buckets.
> >>
> >> I don't know how exactly you apply the [1] formula (what is "n" and what is
> >> "N" in your case?), but my consideration was much simpler: For example, if
> >> BUCKETS_PER_PARTITION returns 8 (power of 2 is expected here and also more
> >> convenient), then MAX_COUNT_PER_PARTITION returns 8 / 2 + 8 / 4 = 6. Thus the
> >> hashtable gets resized if we're going to add the 7th entry to the partition,
> >> i.e. the number of entries in the partition is lower than the number of
> >> buckets. Is that o.k.?
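
To spell out the integer maths first: for a power-of-two number of
buckets, nbuckets / 2 + nbuckets / 4 is exactly floor(0.75 * nbuckets),
so MAX_COUNT_PER_PARTITION is just the 0.75 load factor limit written
in integer arithmetic.  A trivial check of the 8-bucket example:

select 8 / 2 + 8 / 4 as integer_maths, 8 * 0.75 as exact;
-> 6 | 6.00

That is, with 8 buckets per partition the resize is triggered by the
7th entry, exactly when the load factor would exceed 0.75.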

n balls = the keys being hashed and inserted
N bins = the hash table buckets

Unless we know of some special properties of the hash function and
input data, I believe we have to treat the hashes like uniformly
distributed random numbers when making predictions, hence the
balls-into-bins probability stuff that can tell you about the expected
distribution.
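
(The standard result is that throwing n balls uniformly at random into
N bins leaves about N * (1 - (1 - 1/N)^n) bins occupied, since each bin
is empty with probability (1 - 1/N)^n.)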

The expected number of occupied bins for 1 million balls into 1 million bins:

select 1000000.0 * (1.0 - pow(1.0 - (1.0 / 1000000), 1000000.0));
-> 632120.7427683549057142050000000

The same thing by counting distinct positive random numbers modulo 1 million:

select count(distinct (random() * 200000000)::int % 1000000) from
generate_series(1, 1000000) ss;
-> 632373, 632246, 631954, ...

Distinct hashes of the first 1 million integers (arbitrary keys)
modulo 1 million (buckets):

select count(distinct abs(hashoid(n)) % 1000000) from
generate_series(1, 1000000) ss(n);
-> 632115

(For a moment I was confused about getting a higher number until I
realised I need abs() to avoid doubling the effective number of
buckets...)

> > Well, it may be o.k. I've just checked what the fill factor means in a hash
> > index and it's also the number of entries divided by the number of buckets.
> >
>
> Using load factor ~0.75 is definitely the right thing to do. One way to
> interpret it is "average chain length" (or whatever the approach is in
> that particular hash table implementation), and one of the main points
> of hash tables is to eliminate linear scans. We could pick a value
> closer to 1.0, but experience says it's not worth it - it's easy to get
> annoyingly long chains due to hash collisions, in exchange for fairly
> minimal space savings.

Using the hashes of the first 1 million integers, let's try putting
them into different numbers of buckets, and see how many buckets we
get with each chain length:

WITH entries AS (SELECT abs(hashoid(generate_series(1, 1000000))) %
NBUCKETS AS bucket),
     lengths AS (SELECT bucket, COUNT(*) AS chain_length FROM entries
GROUP BY bucket)
SELECT chain_length, COUNT(*) AS count
  FROM lengths GROUP BY chain_length ORDER BY 2 DESC;

NBUCKETS = 1000000 (load factor 1.0)

 chain_length | count
--------------+--------
            1 | 367537
            2 | 184580
            3 |  61109
            4 |  15197
            5 |   3079
            6 |    509
            7 |     95
            8 |      7
            9 |      2

NBUCKETS = 1333333 (load factor 0.75)

 chain_length | count
--------------+--------
            1 | 472012
            2 | 177174
            3 |  44348
            4 |   8361
            5 |   1243
            6 |    143
            7 |      9
            8 |      2

NBUCKETS = 2000000 (load factor 0.5)

 chain_length | count
--------------+--------
            1 | 606451
            2 | 151741
            3 |  25226
            4 |   3148
            5 |    323
            6 |     28
            7 |      2
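
To collapse each of those distributions into a single number: the
expected number of entries a successful lookup has to visit (assuming
each inserted key is equally likely to be looked up) is the average
position within its chain, which for load factor f should be close to
the textbook 1 + f/2.  Substituting NBUCKETS as before:

WITH entries AS (SELECT abs(hashoid(generate_series(1, 1000000))) %
NBUCKETS AS bucket),
     lengths AS (SELECT bucket, COUNT(*) AS chain_length FROM entries
GROUP BY bucket)
SELECT sum(chain_length * (chain_length + 1) / 2.0) / sum(chain_length)
       AS avg_entries_visited
  FROM lengths;

That should come out around 1.5, 1.37 and 1.25 for load factors 1.0,
0.75 and 0.5 respectively, so going from 1.0 to 0.75 saves roughly one
chain hop in every eight successful lookups.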

> That being said, I wonder if we should tweak NTUP_PER_BUCKET=1 in hash
> joins to a lower value. IIRC we ended up with 1.0 because originally it
> was set to 10.0, and we reduced it to 1.0 in 9.5, which gave us a nice
> speedup. But I don't recall if we tried using an even lower value
> (probably not). Maybe we don't need to do that because we only build the
> hash table at the very end, when we know exactly how many entries it will
> contain, so we don't need to do lookups and inserts at the same time,
> and we don't need to grow the hash table (at least in the non-parallel
> case). And we end up with a 0.75 load factor on average, due to the
> doubling (the sizes are essentially uniformly distributed between
> 0.5+epsilon and 1.0-epsilon).
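
As a quick sanity check of that last point, picking one doubling range
arbitrarily (1025 to 2048 entries in a table that has just doubled to
2048 buckets) and assuming the final tuple count lands uniformly
anywhere in it:

select avg(ntuples / 2048.0) from generate_series(1025, 2048) ss(ntuples);
-> 0.75024...

which is indeed ~0.75 on average.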

Yeah, it would indeed be interesting to experiment with load factors
lower than 1.0.  Every link in the chain is a cache miss.  In future
we should work on mitigating that by prefetching during both building
and probing (see nearby thread), but I suspect you can only do that
effectively for the bucket header and perhaps the first tuple in the
chain; if I'm right then longer chains will become even more expensive
relative to short ones.

By the way, aside from wasting memory, when you add extra buckets you
make full/right outer hash joins slower because they scan all the
buckets.

-- 
Thomas Munro
http://www.enterprisedb.com

