Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in aninfinite loop - Mailing list pgsql-bugs

From Tomas Vondra
Subject Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in aninfinite loop
Date
Msg-id 7de4b2d6-b119-7d15-04d8-4c0fcb29cbc3@2ndquadrant.com
Whole thread Raw
In response to Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in aninfinite loop  (Andres Freund <andres@anarazel.de>)
Responses Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in an infinite loop  (Tom Lane <tgl@sss.pgh.pa.us>)
Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in aninfinite loop  (Andres Freund <andres@anarazel.de>)
List pgsql-bugs
On 12/06/2017 06:19 PM, Andres Freund wrote:
> On 2017-12-06 12:14:24 -0500, Tom Lane wrote:
>> Checking out hashint8() on random data shows no such obvious fault.
>> You've apparently got a data set that exposes a weakness in hashint8,
>> but it's not very clear what that is.
> 
> It's intentionally designed to cause problems afaict:
> 
> http://archives.postgresql.org/message-id/861b9f1f-cdc0-bc49-2595-80bc39c37dc3%40blackducksoftware.com
> 
> 
>> In any case, the hashtable code needs to not fall over in the
>> presence of a lot of collisions, regardless of the exact reason
>> for there being a lot.
> 
> Yes, we need to be more resilient about it. Working on a patch.
> 

I've been playing with this a bit today (sorry for trespassing on your
turf a bit, but curiosity killed the cat), particularly with the tree
basic ideas mentioned in this thread. Let me share some numbers.

1) randomized hashing, i.e. using the extended hash functions and a
random seed

2) universal hashing [1], i.e. passing the hash through ((a*h+b) mod p)
formula, where h is "our" hash, and (a,b) are random and p is a prime.
This can't possibly fix the collisions reported by Todd, because the
result will remain the same. But it should fix "my" chains of values by
spreading them a bit (and the a,b,p values are generated on each grow,
so in the unlikely case where we get a chain, it should disappear after
a growth)

[1] https://en.wikipedia.org/wiki/Universal_hashing

3) disabling growth of the hash table when the fillfactor would drop too
much (e.g. below 0.2)

Very rough patches for these three approaches are attached.


Now, some very rough numbers comparing the impact of those approaches
(and their combinations). The following two tables present timings for
for a couple of datasets - five cases without any collisions (allowing
us to quantify impact on regular datasets), the two datasets generated
by me (one without collisions, one with collisions), and the two issues
reported by Todd.

RH - randomized hashing (seed + extended)
UH - universal hashing
ST - stop growing

             master     RH     UH     ST    RH     UH    ST
    -------------------------------------------------------
    Ok/1        542    540    552    537    0%     2%   -1%
    Ok/2        232    221    242    237   -5%     4%    2%
    Ok/3        161    167    180    161    4%    12%    0%
    Ok/4        276    270    285    274   -2%     3%   -1%
    Ok/5        174    179    193    174    3%    11%    0%
    Tomas/1     543    555    551    533    2%     1%   -2%
    Tomas/2     OOM   2322   3251      -
    Todd/1      OOM    OOM     63     63
    Todd/2      OOM    OOM    OOM     72


This is mostly as expected, although I'm surprised that the universal
hashing seems to fix Todd's first reproducer. The "stop growing"
approach technically fixes "my" dataset, but it takes so much time I
haven't timed it.

In terms of slowdown (compared to master), I'd say RH and ST are mostly
neutral. Universal hashing has significant impact in some cases, but I
suppose that might be fixed by using a slightly different scheme
(instead of using simple modulo arithmetic).

Now, let's see on combinations of the approaches (broken into two
tables, so that it doesn't get mangled by mail clients):

             master  RH+UH  RH+ST  UH+ST  RH+UH+ST
    Ok/1        542    553    545    549       558
    Ok/2        232    241    234    241       250
    Ok/3        161    180    170    180       187
    Ok/4        276    274    270    285       290
    Ok/5        174    187    182    193       198
    Tomas/1     543    571    561    554       570
    Tomas/2       -   2380   2340   3130      2380
    Todd/1        -     61     65     63        64
    Todd/2        -      -     90     91        93

              RH+UH   RH+ST  UH+ST  RH+UH+ST
    Ok/1         2%      1%     1%        3%
    Ok/2         4%      1%     4%        8%
    Ok/3        12%      6%    12%       16%
    Ok/4        -1%     -2%     3%        5%
    Ok/5         7%      5%    11%       14%
    Tomas/1      5%      3%     2%        5%

It seems only the "stop growth" has to be part of the solution, as it's
included in any combination fixing all cases shared in this thread. I'd
say (randomized hashing + stop growing) seems like the best option.

FWIW I do agree the data sets shared in this thread are pretty extreme
and it doesn't make much sense to slow the regular cases. I'll be
perfectly happy if we stop the OOM, making those cases fast is a bonus.

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment

pgsql-bugs by date:

Previous
From: benthorner@hotmail.co.uk
Date:
Subject: BUG #14957: service initdb fails (initdbcmd run in background)
Next
From: Devrim Gündüz
Date:
Subject: Re: BUG #14957: service initdb fails (initdbcmd run in background)