Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in aninfinite loop - Mailing list pgsql-bugs

From Tomas Vondra
Subject Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in aninfinite loop
Date
Msg-id ebe8b411-006e-7863-3c60-67ddafe1c636@2ndquadrant.com
Whole thread Raw
In response to Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in aninfinite loop  (Andres Freund <andres@anarazel.de>)
Responses Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in an infinite loop  (Andres Freund <andres@anarazel.de>)
List pgsql-bugs

On 12/06/2017 08:35 PM, Andres Freund wrote:
> On 2017-12-06 20:32:24 +0100, Tomas Vondra wrote:
>> On 12/05/2017 11:01 PM, Tomas Vondra wrote:
>>>
>>> Based on the initial discussion, the problem here is twofold.
>>>
>>> Firstly, the code assumes that if it gets certain number of bucket
>>> collisions (essentially, the initial bucket and certain number of
>>> neighboring buckets already being full), making the table larger is
>>> guaranteed to reduce the number of collisions.
>>>
>>> Which is obviously untrue for duplicate hash values, but it may also
>>> happen for distinct hash values that form a chain (think a sequence of
>>> hash values 1,2,3,4,5,6,7,...,K - that is never gonna get fixed).
> 
>> I've managed to construct (by sheer brute force) an example that breaks
>> simplehash in this way. It triggers the growth by having a sequence of
>> distinct but consecutive hash values, e.g. 1, 2, 3, 4, ..., 1000, and
>> then trying to insert a value with duplicate hash (say, 4).
> 
> I fail to be excited by this. That's possible for any sort of hashtable
> in some way. You might hit issues due to resizing or due to lookup
> performance, but you'll hit problem. That's a fairly fundamental issue
> of pure hashtables.
> 

Eh? I think you misunderstood the issue this triggers. I'm perfectly
fine with poor lookup performance - that's expected and reasonable. But
it actually triggers the same behavior as the already presented example,
i.e. the hash table grows indefinitely (so eventually gets OOM).

I really doubt allocating 3GB of memory (which is when my laptop gives
up and kills the query) to do distinct on 50MB table is a sane behavior.
Even if the table is constructed in so intentionally evil way.

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


pgsql-bugs by date:

Previous
From: Andres Freund
Date:
Subject: Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in aninfinite loop
Next
From: Andres Freund
Date:
Subject: Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in an infinite loop