Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in an infinite loop - Mailing list pgsql-bugs

From Andres Freund
Subject Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in an infinite loop
Date
Msg-id 790BC80E-D8E5-499B-A1E3-8EE0752B53ED@anarazel.de
Whole thread Raw
In response to Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in aninfinite loop  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
Responses Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in aninfinite loop  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
List pgsql-bugs

On December 6, 2017 11:57:44 AM PST, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
>
>
>On 12/06/2017 08:35 PM, Andres Freund wrote:
>> On 2017-12-06 20:32:24 +0100, Tomas Vondra wrote:
>>> On 12/05/2017 11:01 PM, Tomas Vondra wrote:
>>>>
>>>> Based on the initial discussion, the problem here is twofold.
>>>>
>>>> Firstly, the code assumes that if it gets certain number of bucket
>>>> collisions (essentially, the initial bucket and certain number of
>>>> neighboring buckets already being full), making the table larger is
>>>> guaranteed to reduce the number of collisions.
>>>>
>>>> Which is obviously untrue for duplicate hash values, but it may
>also
>>>> happen for distinct hash values that form a chain (think a sequence
>of
>>>> hash values 1,2,3,4,5,6,7,...,K - that is never gonna get fixed).
>>
>>> I've managed to construct (by sheer brute force) an example that
>breaks
>>> simplehash in this way. It triggers the growth by having a sequence
>of
>>> distinct but consecutive hash values, e.g. 1, 2, 3, 4, ..., 1000,
>and
>>> then trying to insert a value with duplicate hash (say, 4).
>>
>> I fail to be excited by this. That's possible for any sort of
>hashtable
>> in some way. You might hit issues due to resizing or due to lookup
>> performance, but you'll hit problem. That's a fairly fundamental
>issue
>> of pure hashtables.
>>
>
>Eh? I think you misunderstood the issue this triggers. I'm perfectly
>fine with poor lookup performance - that's expected and reasonable. But
>it actually triggers the same behavior as the already presented
>example,
>i.e. the hash table grows indefinitely (so eventually gets OOM).
>
>I really doubt allocating 3GB of memory (which is when my laptop gives
>up and kills the query) to do distinct on 50MB table is a sane
>behavior.
>Even if the table is constructed in so intentionally evil way.

No, I got that. I just don't think time and memory are that different.

Andres

--
Sent from my Android device with K-9 Mail. Please excuse my brevity.


pgsql-bugs by date:

Previous
From: Tomas Vondra
Date:
Subject: Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in aninfinite loop
Next
From: Tomas Vondra
Date:
Subject: Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in aninfinite loop