Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in aninfinite loop - Mailing list pgsql-bugs

From Tomas Vondra
Subject Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in aninfinite loop
Date
Msg-id 06523583-b9c5-1424-ef0a-1fc8063e9ed7@2ndquadrant.com
Whole thread Raw
In response to Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in an infinite loop  (Andres Freund <andres@anarazel.de>)
Responses Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in aninfinite loop  (Andres Freund <andres@anarazel.de>)
List pgsql-bugs

On 12/06/2017 09:00 PM, Andres Freund wrote:
> 
> 
> On December 6, 2017 11:57:44 AM PST, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
>>
>>
>> On 12/06/2017 08:35 PM, Andres Freund wrote:
>>> On 2017-12-06 20:32:24 +0100, Tomas Vondra wrote:
>>>> On 12/05/2017 11:01 PM, Tomas Vondra wrote:
>>>>>
>>>>> Based on the initial discussion, the problem here is twofold.
>>>>>
>>>>> Firstly, the code assumes that if it gets certain number of bucket
>>>>> collisions (essentially, the initial bucket and certain number of
>>>>> neighboring buckets already being full), making the table larger is
>>>>> guaranteed to reduce the number of collisions.
>>>>>
>>>>> Which is obviously untrue for duplicate hash values, but it may
>> also
>>>>> happen for distinct hash values that form a chain (think a sequence
>> of
>>>>> hash values 1,2,3,4,5,6,7,...,K - that is never gonna get fixed).
>>>
>>>> I've managed to construct (by sheer brute force) an example that
>> breaks
>>>> simplehash in this way. It triggers the growth by having a sequence
>> of
>>>> distinct but consecutive hash values, e.g. 1, 2, 3, 4, ..., 1000,
>> and
>>>> then trying to insert a value with duplicate hash (say, 4).
>>>
>>> I fail to be excited by this. That's possible for any sort of
>> hashtable
>>> in some way. You might hit issues due to resizing or due to lookup
>>> performance, but you'll hit problem. That's a fairly fundamental
>> issue
>>> of pure hashtables.
>>>
>>
>> Eh? I think you misunderstood the issue this triggers. I'm perfectly
>> fine with poor lookup performance - that's expected and reasonable. But
>> it actually triggers the same behavior as the already presented
>> example,
>> i.e. the hash table grows indefinitely (so eventually gets OOM).
>>
>> I really doubt allocating 3GB of memory (which is when my laptop gives
>> up and kills the query) to do distinct on 50MB table is a sane
>> behavior.
>> Even if the table is constructed in so intentionally evil way.
> 
> No, I got that. I just don't think time and memory are that different.
>
It's one thing when the hash table takes longer to lookup something or
when it consumes a bit more memory. Say, ~2x more than needed, give or
take. I'm perfectly fine with that, particularly when it's a worst-case
evil data set like this one.

But it's a very different situation when it consumes tens of GBs of
memory (essentially building the largest possible hash table, despite
the load factor dropping to 0), because the code grows the hash table
assuming it will magically resolve the collisions. And even if the
machine has enough RAM, it's guaranteed to fail we'll try growing the
table beyond that limit.

I don't quite understand why should that be OK? It doesn't seem very
sane to me.

FWIW I've constructed the data sets for two reasons - to convince myself
that my understanding of the simplehash code is correct, and to provide
a data set triggering the other growth condition in simplehash code. My
understanding is that if we stop growing the table after the load factor
drops below some threshold (as TL proposed earlier in this thread), it
should address both of these cases.

cheers

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


pgsql-bugs by date:

Previous
From: Andres Freund
Date:
Subject: Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in an infinite loop
Next
From: Andres Freund
Date:
Subject: Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in aninfinite loop