On Tue, Aug 22, 2017 at 8:14 AM, amul sul <sulamul@gmail.com> wrote: > Attaching patch 0002 for the reviewer's testing.
I think that this 0002 is not something we can think of committing because there's no guarantee that hash functions will return the same results on all platforms. However, what we could and, I think, should do is hash some representative values of each data type and verify that hash(x) and hashextended(x, 0) come out the same at least as to the low-order 32 bits -- and maybe that hashextend(x, 1) comes out different as to the low-order 32 bits. The easiest way to do this seems to be to cast to bit(32). For example:
SELECT v, hashint4(v)::bit(32) as standard, hashint4extended(v, 0)::bit(32) as extended0, hashint4extended(v, 1)::bit(32) as extended1 FROM (VALUES (0), (1), (17), (42), (550273), (207112489)) x(v) WHERE hashint4(v)::bit(32) != hashint4extended(v, 0)::bit(32) OR hashint4(v)::bit(32) = hashint4extended(v, 1)::bit(32);
I suggest adding a version of this for each data type.
Thanks for the suggestion, I have updated 0002-patch accordingly.
Using this I found some strange behaviours as follow:
1) standard and extended0 output for the jsonb_hash case is not same.
2) standard and extended0 output for the hash_range case is not same when input
is int4range(550274, 1550274) other case in the patch are fine. This can be
reproducible with other values as well, for e.g. int8range(1570275, 208112489).
Will look into this tomorrow.
Another case which I want to discuss is, extended and standard version of
hashfloat4, hashfloat8 & hash_numeric function will have the same output for zero
value irrespective of seed value. Do you think we need a fix for this?
From your latest version of 0001, I get:
rangetypes.c:1297:8: error: unused variable 'rngtypid' [-Werror,-Wunused-variable] Oid rngtypid = RangeTypeGetOid(r);
Fixed in the attached version.
I suggest not duplicating the comments from each regular function into the extended function, but just writing /* Same approach as hashfloat8 */ when the implementation is non-trivial (no need for this if the extended function is a single line or the original function had no comments anyway).
Fixed in the attached version.
hash_aclitem seems embarrassingly stupid. I suggest that we make the extended version slightly less stupid -- i.e. if the seed is non-zero, actually call hash_any_extended on the sum and pass the seed through.
* Reset info about hash function whenever we pick up new info about * equality operator. This is so we can ensure that the hash function * matches the operator. */ typentry->flags &= ~(TCFLAGS_CHECKED_HASH_PROC); + typentry->flags &= ~(TCFLAGS_CHECKED_HASH_EXTENDED_PROC);