On 11/28/17 13:14, Andres Freund wrote:
> On 2017-11-28 09:55:13 -0500, Todd A. Cook wrote:
>> On 11/27/17 23:03, Tom Lane wrote:
>>>
>>> Note that the sample data has a lot of collisions:
>>>
>>> regression=# select hashint8(val), count(*) from reproducer group by 1 order by 2 desc;
>>>   hashint8   | count
>>> -------------+-------
>>>    441526644 |  2337
>>>  -1117776826 |  1221
>>>  -1202007016 |   935
>>>  -2068831050 |   620
>>>   1156644653 |   538
>>>    553783815 |   510
>>>    259780770 |   444
>>>    371047036 |   394
>>>    915722575 |   359
>>> ... etc etc ...
>>
>> In case it matters, the complete data set will have some outlier values with 10k to 100k
>> collisions in this column.
>
> To make sure we're on the same page, this is data intentionally created
> to have a lot of hash collisions, is that right?

More or less. It might be more accurate to say that it's created in such a way that
we expect to get lots of collisions.

-- todd
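
For readers following along, here is a minimal sketch of how int8 data can end up with many identical hash values, and how the per-value counts in the query above could be reproduced outside the database. It assumes that hashint8 first folds the 64-bit input down to 32 bits (XOR of the low half with the high half, or with the complement of the high half for negative inputs) before applying its final mixing step. That is my reading of hashfunc.c and should be checked against the source. The final mix is deliberately left out, since two inputs that fold to the same 32-bit value must hash identically whatever the mix does. The names fold_int8, collision_counts, and the sample data are hypothetical, not taken from the reproducer.

```python
from collections import Counter

MASK32 = 0xFFFFFFFF


def fold_int8(val: int) -> int:
    """Fold a signed 64-bit integer to 32 bits.

    Assumed model of the step hashint8 performs before its final mix:
    low half XOR high half, using the complemented high half when the
    input is negative. Not the real hash output.
    """
    lo = val & MASK32
    hi = (val >> 32) & MASK32  # Python's >> is arithmetic for negatives
    if val < 0:
        hi = ~hi & MASK32
    return lo ^ hi


def collision_counts(values):
    """Rough analogue of: select fold(val), count(*) ... group by 1 order by 2 desc."""
    return Counter(fold_int8(v) for v in values).most_common()


# Hypothetical data: every value has identical high and low halves,
# so they all fold to 0 and would share a single hash value.
sample = [(k << 32) | k for k in range(1, 1001)] + [1, 2, 3]

for folded, count in collision_counts(sample)[:4]:
    print(f"{folded:>11} | {count:>5}")
```

Under that assumption, any data set whose high and low 32-bit halves are correlated will pile up on a few hash values without anyone having set out to attack the hash function, which fits the "more or less" answer above.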