Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in aninfinite loop - Mailing list pgsql-bugs

From Andres Freund
Subject Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in aninfinite loop
Date
Msg-id 20180126234537.dyu6jwnqz7snpq7n@alap3.anarazel.de
Whole thread Raw
In response to Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in aninfinite loop  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
Responses Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in an infinite loop
List pgsql-bugs
Hi,

On 2017-12-10 23:09:42 +0100, Tomas Vondra wrote:
> 1) randomized hashing, i.e. using the extended hash functions and a
> random seed

Given the lack of extended hash functions for everything this doesn't
seem quite viable. But I think we can make it work quite easily
nonetheless - we can just do the combining with a random seed outside of
the hashtable. We kinda already do so (with a really crappy combiner) in
TupleHashTableHash(), except that we'd actually have to use a random
seed.

This seems like something we should do, just out of defensiveness. It'll
not be as good as persisting the hashfunction's state across calls, but
well, ...

Let me write a patch for that.


> 2) universal hashing [1], i.e. passing the hash through ((a*h+b) mod p)
> formula, where h is "our" hash, and (a,b) are random and p is a prime.
> This can't possibly fix the collisions reported by Todd, because the
> result will remain the same. But it should fix "my" chains of values by
> spreading them a bit (and the a,b,p values are generated on each grow,
> so in the unlikely case where we get a chain, it should disappear after
> a growth)

This seems on the too expensive side of things for my taste to do
unconditionally.


> 3) disabling growth of the hash table when the fillfactor would drop too
> much (e.g. below 0.2)

Yea, I think we should just apply something like this.


> FWIW I do agree the data sets shared in this thread are pretty extreme
> and it doesn't make much sense to slow the regular cases. I'll be
> perfectly happy if we stop the OOM, making those cases fast is a bonus.

Yea, agreed on that. I'm kinda inclined to go for stop-growing in 10,
and so something better in 11. And then later possibly backpatch if
we've grown some confidence?

Greetings,

Andres Freund


pgsql-bugs by date:

Previous
From: Tomas Vondra
Date:
Subject: Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in aninfinite loop
Next
From: Guo Xiang Tan
Date:
Subject: Re: BUG #15032: Segmentation fault when running a particular query