Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in aninfinite loop - Mailing list pgsql-bugs

From Tomas Vondra
Subject Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in aninfinite loop
Date
Msg-id c15378b3-d4aa-a5c0-5538-dd52904dcbc6@2ndquadrant.com
Whole thread Raw
In response to Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in aninfinite loop  ("Todd A. Cook" <tcook@blackducksoftware.com>)
Responses Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in aninfinite loop  ("Todd A. Cook" <tcook@blackducksoftware.com>)
List pgsql-bugs

On 12/05/2017 11:33 PM, Todd A. Cook wrote:
> On 12/05/17 17:01, Tomas Vondra wrote:
>>
>>
>> On 12/05/2017 10:23 PM, Todd A. Cook wrote:
>>> On 11/27/17 23:03, Tom Lane wrote:
>>>> Thomas Munro <thomas.munro@enterprisedb.com> writes:
>>>>> When SH_INSERT tries to insert that final extra value, insertdist
>>>>> keeps exceeding SH_GROW_MAX_DIB (25) no matter how many times we
>>>>> double the size (at least until my computer gives up, somewhere around
>>>>> 11 doublings and 75GB of virtual memory).  If you set SH_GROW_MAX_DIB
>>>>> to 26 then it succeeds, but I guess some other attack could be crafted
>>>>> for that.  What is the theory behind this parameter?
>>>>
>>>> You beat me to it --- after looking at simplehash.h I'd guessed that
>>>> either the SH_GROW_MAX_DIB or SH_GROW_MAX_MOVE code path was causing
>>>> an infinite loop, but I'd not gotten to determining which one yet.
>>>>
>>>> I'd ask what's the theory behind SH_GROW_MAX_MOVE, as well.  Neither
>>>> of them are obviously loop-proof.
>>>>
>>>> Note that the sample data has a lot of collisions:
>>>>
>>>> regression=# select hashint8(val), count(*) from reproducer group by 1
>>>> order by 2 desc;
>>>>     hashint8   | count
>>>> -------------+-------
>>>>      441526644 |  2337
>>>>    -1117776826 |  1221
>>>>    -1202007016 |   935
>>>>    -2068831050 |   620
>>>>     1156644653 |   538
>>>>      553783815 |   510
>>>>      259780770 |   444
>>>>      371047036 |   394
>>>>      915722575 |   359
>>>>    ... etc etc ...
>>>>
>>>> It's evidently more complicated than just that the code fails with
>>>> more than SH_GROW_MAX_DIB duplicate hashcodes, but I suspect not
>>>> by a lot.  There needs to be a safety valve that prevents letting
>>>> the hash fill factor approach zero, which is what's happening in
>>>> this test case.
>>>
>>> FWIW, I can also reproduce the infinite loop with 167834 unique values.
>>>
>>
>> Unique values or unique *hash* values?
> 
> Unique values.
> 
> 
>> Can you share the data, so that whoever fixes the bug can verify it also
>> fixes your example?
> 
> Sure.  It's attached.
> 

Seems the dataset has pretty much the same issue as the one reported
before, that is

select hashint8(val), count(distinct val), count(*) from temp_f_03 group
by 1 order by 2 desc;

      hashint8   | count | count
    -------------+-------+-------
     -1971396144 |    45 |    45
      2035896403 |    42 |    42
     -1633843397 |    30 |    30
      1425704662 |    29 |    29
      -455482779 |    22 |    22
      -300163565 |    17 |    17
     -1803963420 |    17 |    17
      -537082846 |    14 |    14
       603707034 |    13 |    13
      -176369887 |    12 |    12
      1274957136 |    11 |    11
      1465522632 |    11 |    11
     -1589862230 |    10 |    10
     -1145403239 |    10 |    10

i.e. there are many hash collisions (more than in the other data set).


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


pgsql-bugs by date:

Previous
From: jarda.urik@gmail.com
Date:
Subject: BUG #14949: array_append() - performance issues (in update)
Next
From: kop@meme.com
Date:
Subject: BUG #14950: RPM EPEL PostGIS24_10 install breaks "yum update"