Re: General purpose hashing func in pgbench - Mailing list pgsql-hackers

From Ildar Musin
Subject Re: General purpose hashing func in pgbench
Date
Msg-id 75f45845-24ed-347b-66d5-2f18d39c6793@postgrespro.ru
Whole thread Raw
In response to Re: General purpose hashing func in pgbench  (Teodor Sigaev <teodor@sigaev.ru>)
Responses Re: General purpose hashing func in pgbench  (Ildar Musin <i.musin@postgrespro.ru>)
List pgsql-hackers
Hello Teodor,

Thank you for reviewing this patch.

On 06.03.2018 15:53, Teodor Sigaev wrote:
>>> Patch applies, compiles, pgbench & global "make check" ok, doc
>>> built ok.
>
> Agree.
>
> If I understand upthread correctly, implementation of Murmur hash
> algorithm based on Austin Appleby work
> https://github.com/aappleby/smhasher/blob/master/src/MurmurHash2.cpp
>
> If so, I have notice and objections:
>
> 1) Seems, it's good idea to add credits to Austin Appleby to
> comments.
>

Sounds fair, I'll send an updated version soon.

> 2) Reference implementaion directly says (link above): // 2. It will
> not produce the same results on little-endian and big-endian //
> machines.
>
> I don't think that is good thing for testing and benchmarking for
> several reasons: it could produce different data collection,
> different selects, different distribution.
>
> 3) Again, from comments of reference implementation: // Note - This
> code makes a few assumptions about how your machine behaves - // 1.
> We can read a 4-byte value from any address without crashing
>
> It's not true for all supported platforms. Any box with strict
> aligment will SIGBUSed here.
>

I think that both points refer to the fact that original algorithm
accepts a byte string as an input, slices it up by 8 bytes and form
unsigned int values from it. In that case the order of bytes in the
input string does matter since it may result in different integers on
different architectures. And it is also fair requirement for a byte
string to be aligned as some architectures cannot handle unaligned data.
In this patch though I slightly modified the original algorithm in a way
that it takes unsigned ints as an input (not byte string), so neither of
this points should be a problem as it seems to me. But I'll double check
it on big-endian machine with strict alignment (Sparc).

Thanks!

-- 
Ildar Musin
i.musin@postgrespro.ru


pgsql-hackers by date:

Previous
From: Darafei "Komяpa" Praliaskouski
Date:
Subject: Re: All Taxi Services need Index Clustered Heap Append
Next
From: Arthur Zakirov
Date:
Subject: Re: [HACKERS] Another oddity in handling of WCO constraints inpostgres_fdw