Home > mailing lists

Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in aninfinite loop - Mailing list pgsql-bugs

From	Andres Freund
Subject	Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in aninfinite loop
Date	January 29, 2018 22:32:20
Msg-id	20180129193219.fk5vnliembhacuys@alap3.anarazel.de Whole thread Raw
In response to	Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in an infinite loop (Tom Lane <tgl@sss.pgh.pa.us>)
List	pgsql-bugs

Tree view

Hi,

On 2018-01-29 13:45:10 -0500, Tom Lane wrote:
> Also, the fact that you needed one more ORDER BY in testing on a
> single machine is already proof of my worry about the regression
> tests, and it'll only get worse when testing across a variety
> of machines.

That's an entirely unconvincing argument. A lot of changes to hashtable
code will result in ordering changes, and has done so in the past. Sure
there occasionally was need for a followup ORDER BY addition, but that's
not that bad. The fact that only a single order by was needed when
entirely changing the hash function output (by randomizing it) doesn't
at all seem to suggest that it's a huge problem.

On 2018-01-29 14:11:26 -0500, Tom Lane wrote:
> One other point here is that it's not really clear to me what a randomly
> varying IV is supposed to accomplish.  Surely we're not intending that
> it prevents somebody from crafting a data set that causes bad hash
> performance.

I do think we should make that harder.

> If a user with DB access wants to cause a performance
> problem, there are and always will be plenty of other avenues to making
> that happen.

Sure, but that's different from somebody triggering, e.g. by website
input, such behaviour.

> If the idea is that for a data set that otherwise would have
> bad hash performance, choosing a different IV would (almost always) fix
> it, that sounds good but you're ignoring the inverse case: for a data set
> that works fine, there would be some choices of IV that create a problem
> where there was none before.  I see no reason to think that the
> probability of the former kind of situation is higher than the latter.

It's easily intentionally triggerable vs. not.

> So I'm on board with using the extended hash functions when available,
> but I'm not convinced that a varying IV buys us anything but trouble.

I think we really need both. A constant IV of e.g. 0 for a single-column
hashtable doesn't really fix much, even though it does improve hash
quality a bit for multi-column indexes. A random IV does address issues
due to a lot of tuples falling into the same bucket, but does not
address actual collisions. Both together are pretty convincing tho.

Greetings,

Andres Freund

pgsql-bugs by date:

From: Andres Freund
Date: 29 January 2018, 22:26:47
Subject: Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in aninfinite loop

From: Greg Stark
Date: 30 January 2018, 01:16:24
Subject: Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in aninfinite loop

Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in aninfinite loop - Mailing list pgsql-bugs

Previous

Next