Saner interval hash function - Mailing list pgsql-hackers

From Tom Lane
Subject Saner interval hash function
Date
Msg-id 23798.1238795165@sss.pgh.pa.us
Whole thread Raw
Responses Re: Saner interval hash function
Re: Saner interval hash function
List pgsql-hackers
The present implementation of interval_hash() is very carefully designed
and coded ... to meet the wrong specification :-(.  What it should
be doing is producing equal hashcodes for values that interval_eq()
considers equal.  The error is exhibited in the recent bug report #4748.

Attached is a prototype fix for this, which I have verified fixes #4748.
The only problem with applying it is that it changes the contents of
hash indexes on interval columns.  This is no problem for HEAD, seeing
that we've already whacked around hash indexes in far more fundamental
ways than that since 8.3.  However, since interval hashing is broken
in this way in every active branch, we really ought to back-patch.

I don't think there's a whole lot of choice in the matter.  We have to
patch this, and put in the next release notes "if you have any hash
indexes on interval columns, REINDEX them after updating".  Does anyone
see it differently, or have some brilliant idea for another solution?

            regards, tom lane

Index: timestamp.c
===================================================================
RCS file: /cvsroot/pgsql/src/backend/utils/adt/timestamp.c,v
retrieving revision 1.197
diff -c -r1.197 timestamp.c
*** timestamp.c    15 Mar 2009 20:31:19 -0000    1.197
--- timestamp.c    3 Apr 2009 21:34:51 -0000
***************
*** 2047,2052 ****
--- 2047,2053 ----
      TimeOffset    span1,
                  span2;

+     /* If you change this, see also interval_hash() */
      span1 = interval1->time;
      span2 = interval2->time;

***************
*** 2131,2158 ****
  Datum
  interval_hash(PG_FUNCTION_ARGS)
  {
!     Interval   *key = PG_GETARG_INTERVAL_P(0);
      uint32        thash;
-     uint32        mhash;

      /*
!      * To avoid any problems with padding bytes in the struct, we figure the
!      * field hashes separately and XOR them.  This also provides a convenient
!      * framework for dealing with the fact that the time field might be either
!      * double or int64.
       */
  #ifdef HAVE_INT64_TIMESTAMP
      thash = DatumGetUInt32(DirectFunctionCall1(hashint8,
!                                                Int64GetDatumFast(key->time)));
  #else
      thash = DatumGetUInt32(DirectFunctionCall1(hashfloat8,
!                                              Float8GetDatumFast(key->time)));
  #endif
!     thash ^= DatumGetUInt32(hash_uint32(key->day));
!     /* Shift so "k days" and "k months" don't hash to the same thing */
!     mhash = DatumGetUInt32(hash_uint32(key->month));
!     thash ^= mhash << 24;
!     thash ^= mhash >> 8;
      PG_RETURN_UINT32(thash);
  }

--- 2132,2162 ----
  Datum
  interval_hash(PG_FUNCTION_ARGS)
  {
!     Interval   *interval = PG_GETARG_INTERVAL_P(0);
!     TimeOffset    span;
      uint32        thash;

      /*
!      * We must produce equal hashvals for values that interval_cmp_internal()
!      * considers equal.  So, compute the net span the same way it does,
!      * and then hash that, using either int64 or float8 hashing.
       */
+     span = interval->time;
+
  #ifdef HAVE_INT64_TIMESTAMP
+     span += interval->month * INT64CONST(30) * USECS_PER_DAY;
+     span += interval->day * INT64CONST(24) * USECS_PER_HOUR;
+
      thash = DatumGetUInt32(DirectFunctionCall1(hashint8,
!                                                Int64GetDatumFast(span)));
  #else
+     span += interval->month * ((double) DAYS_PER_MONTH * SECS_PER_DAY);
+     span += interval->day * ((double) HOURS_PER_DAY * SECS_PER_HOUR);
+
      thash = DatumGetUInt32(DirectFunctionCall1(hashfloat8,
!                                                Float8GetDatumFast(span)));
  #endif
!
      PG_RETURN_UINT32(thash);
  }


pgsql-hackers by date:

Previous
From: Grzegorz Jaskiewicz
Date:
Subject: Re: a few crazy ideas about hash joins
Next
From: Tom Lane
Date:
Subject: Re: GetCurrentVirtualXIDs()