On 2017-12-06 22:05:22 +0100, Tomas Vondra wrote:
> On 12/06/2017 09:46 PM, Andres Freund wrote:
> > On 2017-12-06 21:38:42 +0100, Tomas Vondra wrote:
> >> It's one thing when the hash table takes longer to lookup something or
> >
> > longer aka "forever".
> >
>
> Not necessarily.
> The datasets I shared are somewhat extreme in the sense that there are
> many contiguous sequences of hash values, but it only takes one such
> sequence with at least SH_GROW_MAX_MOVE values to trigger the issue. So
> the hash table may still be perfectly fine for most keys, and only
> slightly slower for the keys in the sequence.
Meh, we're talking about adversarial attacks here.
> >> when it consumes a bit more memory. Say, ~2x more than needed, give or
> >> take. I'm perfectly fine with that, particularly when it's a worst-case
> >> evil data set like this one.
> >
> > I think the way to prevent that kind of attack is to add randomization.
> >
>
> By randomization you mean universal hashing [1], or something else?
No, adding a random seed into the hash. It'd be a lot better if we had
a way to provide internal state to the hashfunction, but even just
hash_combine()ing a random number into the hash would be a lot better
than what we're doing now.
Greetings,
Andres Freund