Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in aninfinite loop - Mailing list pgsql-bugs

From Tomas Vondra
Subject Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in aninfinite loop
Date
Msg-id 0c3d7ca9-1619-8b78-0a33-277a08db335a@2ndquadrant.com
Whole thread Raw
In response to Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in aninfinite loop  (Andres Freund <andres@anarazel.de>)
List pgsql-bugs

On 12/06/2017 10:12 PM, Andres Freund wrote:
> On 2017-12-06 22:05:22 +0100, Tomas Vondra wrote:
>> On 12/06/2017 09:46 PM, Andres Freund wrote:
>>> On 2017-12-06 21:38:42 +0100, Tomas Vondra wrote:
>>>> It's one thing when the hash table takes longer to lookup something or
>>>
>>> longer aka "forever".
>>>
>>
>> Not necessarily.
> 
>> The datasets I shared are somewhat extreme in the sense that there are
>> many contiguous sequences of hash values, but it only takes one such
>> sequence with at least SH_GROW_MAX_MOVE values to trigger the issue. So
>> the hash table may still be perfectly fine for most keys, and only
>> slightly slower for the keys in the sequence.
> 
> Meh, we're talking about adversarial attacks here.
> 
> 
>>>> when it consumes a bit more memory. Say, ~2x more than needed, give or
>>>> take. I'm perfectly fine with that, particularly when it's a worst-case
>>>> evil data set like this one.
>>>
>>> I think the way to prevent that kind of attack is to add randomization.
>>>
>>
>> By randomization you mean universal hashing [1], or something else?
> 
> No, adding in a random seed to the hash. It'd be a lot better if we
> had a way to provide internal state to the hashfunction, but even
> just using hash_combine()ing a random number into a hash would be a
> lot better than what we're doing now.
> 

The universal hashing (of "our" hash value) would have the benefit that
we could change it when growing the hash table. I.e. do this

    h(x) = ((A * hash_any(x) + B) mod P)) mod M

and generate new A, B, P whenever we grow the hashtable. That would
further reduce the probability of either of the failures described in
this thread (I mean, hitting it again after growing the table).

I don't know if it's worth the additional CPU cycles, of course.


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


pgsql-bugs by date:

Previous
From: Andres Freund
Date:
Subject: Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in aninfinite loop
Next
From: "Todd A. Cook"
Date:
Subject: Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in aninfinite loop