Re: [HACKERS] Performance degradation in TPC-H Q18 - Mailing list pgsql-hackers

From Robert Haas
Subject Re: [HACKERS] Performance degradation in TPC-H Q18
Date
Msg-id CA+TgmoZrytS7ZTYXWqU9yznCsoMc3h-UV7jWAoWZcUwcGO5+DQ@mail.gmail.com
Whole thread Raw
In response to Re: [HACKERS] Performance degradation in TPC-H Q18  (Andres Freund <andres@anarazel.de>)
Responses Re: [HACKERS] Performance degradation in TPC-H Q18  (Andres Freund <andres@anarazel.de>)
List pgsql-hackers
On Wed, Mar 1, 2017 at 8:48 AM, Andres Freund <andres@anarazel.de> wrote:
>> BTW, another option to consider is lowering the target fillfactor.
>> IIRC, Kuntal mentioned to me that cranking it down seemed to fix the
>> issue.  Obviously, it's better to detect when we need a lower
>> fillfactor than to always use a lower one, but obviously the tighter
>> you pack the hash table, the more likely it is that you're going to
>> have these kinds of problems.
>
> Yea, that'd be one approach, but I feel better dynamically increasing
> the fillfactor like in the patch. Even with a lower fillfactor you could
> see issues, and as you say a higher fillfactor is nice [TM]; after the
> patch I played with *increasing* the fillfactor, without being able to
> measure a downside.

I am going to bet that increasing the fillfactor would be a mistake.
The expected length of a clump in the table is going to be about
1/(1-ff), which increases very quickly beyond the current value of
0.8.  For example, if the actual fillfactor is 0.9 and the keys are
perfectly distributed -- nine in a row and then one empty slot,
lather, rinse, repeat -- probing for a key that's not there has a 10%
chance of hitting an empty slot, a 10% chance of hitting the slot just
before an empty slot, and so on.  So the expected number of probes to
find that there's no match is 4.5.  However, it could easily happen
that you have 3 or 4 empty slots in a row in one place and 27 or 36
occupied slots in a row in another place, and that could cause all
kinds of grief.  Moreover, real-world hash functions on real-world
data aren't always perfect, which increases the chances of things
going off track.  I'm not exactly sure how high a fillfactor is too
high, but note that
https://en.wikipedia.org/wiki/Hash_table#Open_addressing claims that
"performance dramatically degrades when the load factor grows beyond
0.7 or so", which isn't cited but the same text appears in
https://www.unf.edu/~wkloster/3540/wiki_book2.pdf for whatever that's
worth.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



pgsql-hackers by date:

Previous
From: Andres Freund
Date:
Subject: Re: [HACKERS] Performance degradation in TPC-H Q18
Next
From: Peter Eisentraut
Date:
Subject: Re: [HACKERS] perlcritic