Re: [HACKERS] Hash Functions - Mailing list pgsql-hackers

From Robert Haas
Subject Re: [HACKERS] Hash Functions
Date
Msg-id CA+TgmoYPvoTMGJYkVBA=2j1o1wpZ-WZCYiS7_B-AqqZBkWT4HQ@mail.gmail.com
Whole thread Raw
In response to Re: [HACKERS] Hash Functions  (amul sul <sulamul@gmail.com>)
Responses Re: [HACKERS] Hash Functions
List pgsql-hackers
On Tue, Aug 22, 2017 at 8:14 AM, amul sul <sulamul@gmail.com> wrote:
> Attaching patch 0002 for the reviewer's testing.

I think that this 0002 is not something we can think of committing
because there's no guarantee that hash functions will return the same
results on all platforms.  However, what we could and, I think, should
do is hash some representative values of each data type and verify
that hash(x) and hashextended(x, 0) come out the same at least as to
the low-order 32 bits -- and maybe that hashextend(x, 1) comes out
different as to the low-order 32 bits.  The easiest way to do this
seems to be to cast to bit(32).  For example:

SELECT v, hashint4(v)::bit(32) as standard,       hashint4extended(v, 0)::bit(32) as extended0,
hashint4extended(v,1)::bit(32) as extended1
 
FROM (VALUES (0), (1), (17), (42), (550273), (207112489)) x(v)
WHERE hashint4(v)::bit(32) != hashint4extended(v, 0)::bit(32)  OR hashint4(v)::bit(32) = hashint4extended(v,
1)::bit(32);

I suggest adding a version of this for each data type.

From your latest version of 0001, I get:

rangetypes.c:1297:8: error: unused variable 'rngtypid'     [-Werror,-Wunused-variable]       Oid
rngtypid= RangeTypeGetOid(r);
 

I suggest not duplicating the comments from each regular function into
the extended function, but just writing /* Same approach as hashfloat8
*/ when the implementation is non-trivial (no need for this if the
extended function is a single line or the original function had no
comments anyway).

hash_aclitem seems embarrassingly stupid.  I suggest that we make the
extended version slightly less stupid -- i.e. if the seed is non-zero,
actually call hash_any_extended on the sum and pass the seed through.
         * Reset info about hash function whenever we pick up new info about         * equality operator.  This is so
wecan ensure that the hash function         * matches the operator.         */        typentry->flags &=
~(TCFLAGS_CHECKED_HASH_PROC);
+        typentry->flags &= ~(TCFLAGS_CHECKED_HASH_EXTENDED_PROC);

Adjust comment: "about hash function" -> "about hash functions", "hash
functions matches" -> "hash functions match".

+extern Datum
+hash_any_extended(register const unsigned char *k, register int
+                  keylen, uint64 seed);

Formatting.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



pgsql-hackers by date:

Previous
From: "Bossart, Nathan"
Date:
Subject: Re: [HACKERS] [Proposal] Allow users to specify multiple tables inVACUUM commands
Next
From: Tom Lane
Date:
Subject: Re: [HACKERS] Improving overflow checks when adding tuple to PGresult Re: [GENERAL] Retrieving query results