Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in aninfinite loop - Mailing list pgsql-bugs

From Tomas Vondra
Subject Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in aninfinite loop
Date
Msg-id 2a76784c-06cb-46ec-35be-9895d144b839@2ndquadrant.com
Whole thread Raw
In response to Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in aninfinite loop  ("Todd A. Cook" <tcook@blackducksoftware.com>)
Responses Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in aninfinite loop  ("Todd A. Cook" <tcook@blackducksoftware.com>)
Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in aninfinite loop  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
List pgsql-bugs

On 12/05/2017 10:23 PM, Todd A. Cook wrote:
> On 11/27/17 23:03, Tom Lane wrote:
>> Thomas Munro <thomas.munro@enterprisedb.com> writes:
>>> When SH_INSERT tries to insert that final extra value, insertdist
>>> keeps exceeding SH_GROW_MAX_DIB (25) no matter how many times we
>>> double the size (at least until my computer gives up, somewhere around
>>> 11 doublings and 75GB of virtual memory).  If you set SH_GROW_MAX_DIB
>>> to 26 then it succeeds, but I guess some other attack could be crafted
>>> for that.  What is the theory behind this parameter?
>>
>> You beat me to it --- after looking at simplehash.h I'd guessed that
>> either the SH_GROW_MAX_DIB or SH_GROW_MAX_MOVE code path was causing
>> an infinite loop, but I'd not gotten to determining which one yet.
>>
>> I'd ask what's the theory behind SH_GROW_MAX_MOVE, as well.  Neither
>> of them are obviously loop-proof.
>>
>> Note that the sample data has a lot of collisions:
>>
>> regression=# select hashint8(val), count(*) from reproducer group by 1
>> order by 2 desc;
>>    hashint8   | count
>> -------------+-------
>>     441526644 |  2337
>>   -1117776826 |  1221
>>   -1202007016 |   935
>>   -2068831050 |   620
>>    1156644653 |   538
>>     553783815 |   510
>>     259780770 |   444
>>     371047036 |   394
>>     915722575 |   359
>>   ... etc etc ...
>>
>> It's evidently more complicated than just that the code fails with
>> more than SH_GROW_MAX_DIB duplicate hashcodes, but I suspect not
>> by a lot.  There needs to be a safety valve that prevents letting
>> the hash fill factor approach zero, which is what's happening in
>> this test case.
> 
> FWIW, I can also reproduce the infinite loop with 167834 unique values.
> 

Unique values or unique *hash* values?

Can you share the data, so that whoever fixes the bug can verify it also
fixes your example?

> It kinda looks like the problematic collisions arise from masking the
> computed hash values; e.g.:
> 
>     SH_INITIAL_BUCKET(SH_TYPE * tb, uint32 hash)
>     {
>     return hash & tb->sizemask;
>     }
> 

That's pretty standard way to do modulo (when computing index of the
hash bucket), and certainly not the root cause.

> (Also FWIW, changing SH_FILLFACTOR to 0.5 (from 0.9) did not help any.)
> 

Based on the initial discussion, the problem here is twofold.

Firstly, the code assumes that if it gets certain number of bucket
collisions (essentially, the initial bucket and certain number of
neighboring buckets already being full), making the table larger is
guaranteed to reduce the number of collisions.

Which is obviously untrue for duplicate hash values, but it may also
happen for distinct hash values that form a chain (think a sequence of
hash values 1,2,3,4,5,6,7,...,K - that is never gonna get fixed).

This causes the insane growth to the largest possible hash table.

Secondly, there's a bug that for the largest hash table we set
sizemask=0, which means we never ever get bucket index other than 0.

This causes the infinite loop.


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


pgsql-bugs by date:

Previous
From: Tom Lane
Date:
Subject: Re: BUG #14948: cost overflow
Next
From: "Todd A. Cook"
Date:
Subject: Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in aninfinite loop