Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in aninfinite loop - Mailing list pgsql-bugs

From Tomas Vondra
Subject Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in aninfinite loop
Date
Msg-id 67e5af9b-f509-7b9e-f7a1-e01112f6c18d@2ndquadrant.com
Whole thread Raw
In response to Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in aninfinite loop  (Andres Freund <andres@anarazel.de>)
Responses Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in aninfinite loop  (Andres Freund <andres@anarazel.de>)
List pgsql-bugs

On 12/06/2017 09:46 PM, Andres Freund wrote:
> On 2017-12-06 21:38:42 +0100, Tomas Vondra wrote:
>> It's one thing when the hash table takes longer to lookup something or
> 
> longer aka "forever".
> 

Not necessarily.

The datasets I shared are somewhat extreme in the sense that there are
many contiguous sequences of hash values, but it only takes one such
sequence with at least SH_GROW_MAX_MOVE values to trigger the issue. So
the hash table may still be perfectly fine for most keys, and only
slightly slower for the keys in the sequence.

> 
>> when it consumes a bit more memory. Say, ~2x more than needed, give or
>> take. I'm perfectly fine with that, particularly when it's a worst-case
>> evil data set like this one.
> 
> I think the way to prevent that kind of attack is to add randomization.
> 

By randomization you mean universal hashing [1], or something else?

[1] https://en.wikipedia.org/wiki/Universal_hashing

In any case, randomization should make it much harder (or pretty much
impossible) to construct datasets that are guaranteed to cause failures
of this kind. No problem with that (although if someone hits the issue,
it will be more difficult to reproduce).

It would still be possible to hit it by chance, but that should be very
unlikely (OTOH the current code assumes the same thing about collisions
and sequences, and here we are.)

> 
>> FWIW I've constructed the data sets for two reasons - to convince myself
>> that my understanding of the simplehash code is correct, and to provide
>> a data set triggering the other growth condition in simplehash code. My
>> understanding is that if we stop growing the table after the load factor
>> drops below some threshold (as TL proposed earlier in this thread), it
>> should address both of these cases.
> 
> Yea, I'm not adverse to adding a few stopgaps that break in a less
> annoying manner. WAll I'm saying is that I don't think we need to be
> super concerned about this specific way of breaking things.
> 

OK, understood.


cheers

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


pgsql-bugs by date:

Previous
From: Stephen Frost
Date:
Subject: Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in aninfinite loop
Next
From: Andres Freund
Date:
Subject: Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in aninfinite loop