Saner interval hash function - Mailing list pgsql-hackers
From | Tom Lane |
---|---|
Subject | Saner interval hash function |
Date | |
Msg-id | 23798.1238795165@sss.pgh.pa.us Whole thread Raw |
Responses |
Re: Saner interval hash function
Re: Saner interval hash function |
List | pgsql-hackers |
The present implementation of interval_hash() is very carefully designed and coded ... to meet the wrong specification :-(. What it should be doing is producing equal hashcodes for values that interval_eq() considers equal. The error is exhibited in the recent bug report #4748. Attached is a prototype fix for this, which I have verified fixes #4748. The only problem with applying it is that it changes the contents of hash indexes on interval columns. This is no problem for HEAD, seeing that we've already whacked around hash indexes in far more fundamental ways than that since 8.3. However, since interval hashing is broken in this way in every active branch, we really ought to back-patch. I don't think there's a whole lot of choice in the matter. We have to patch this, and put in the next release notes "if you have any hash indexes on interval columns, REINDEX them after updating". Does anyone see it differently, or have some brilliant idea for another solution? regards, tom lane Index: timestamp.c =================================================================== RCS file: /cvsroot/pgsql/src/backend/utils/adt/timestamp.c,v retrieving revision 1.197 diff -c -r1.197 timestamp.c *** timestamp.c 15 Mar 2009 20:31:19 -0000 1.197 --- timestamp.c 3 Apr 2009 21:34:51 -0000 *************** *** 2047,2052 **** --- 2047,2053 ---- TimeOffset span1, span2; + /* If you change this, see also interval_hash() */ span1 = interval1->time; span2 = interval2->time; *************** *** 2131,2158 **** Datum interval_hash(PG_FUNCTION_ARGS) { ! Interval *key = PG_GETARG_INTERVAL_P(0); uint32 thash; - uint32 mhash; /* ! * To avoid any problems with padding bytes in the struct, we figure the ! * field hashes separately and XOR them. This also provides a convenient ! * framework for dealing with the fact that the time field might be either ! * double or int64. */ #ifdef HAVE_INT64_TIMESTAMP thash = DatumGetUInt32(DirectFunctionCall1(hashint8, ! Int64GetDatumFast(key->time))); #else thash = DatumGetUInt32(DirectFunctionCall1(hashfloat8, ! Float8GetDatumFast(key->time))); #endif ! thash ^= DatumGetUInt32(hash_uint32(key->day)); ! /* Shift so "k days" and "k months" don't hash to the same thing */ ! mhash = DatumGetUInt32(hash_uint32(key->month)); ! thash ^= mhash << 24; ! thash ^= mhash >> 8; PG_RETURN_UINT32(thash); } --- 2132,2162 ---- Datum interval_hash(PG_FUNCTION_ARGS) { ! Interval *interval = PG_GETARG_INTERVAL_P(0); ! TimeOffset span; uint32 thash; /* ! * We must produce equal hashvals for values that interval_cmp_internal() ! * considers equal. So, compute the net span the same way it does, ! * and then hash that, using either int64 or float8 hashing. */ + span = interval->time; + #ifdef HAVE_INT64_TIMESTAMP + span += interval->month * INT64CONST(30) * USECS_PER_DAY; + span += interval->day * INT64CONST(24) * USECS_PER_HOUR; + thash = DatumGetUInt32(DirectFunctionCall1(hashint8, ! Int64GetDatumFast(span))); #else + span += interval->month * ((double) DAYS_PER_MONTH * SECS_PER_DAY); + span += interval->day * ((double) HOURS_PER_DAY * SECS_PER_HOUR); + thash = DatumGetUInt32(DirectFunctionCall1(hashfloat8, ! Float8GetDatumFast(span))); #endif ! PG_RETURN_UINT32(thash); }
pgsql-hackers by date: