Thread: dynahash API questions

dynahash API questions

From
Neil Conway
Date:
I'd like to use the dynahash package to construct a hash table whose
keys are an array of k Datums. The value of "k" for a particular hash
table is known at hash_create() time. This raises two questions:

(1) How should variably-sized keys be represented?

The simplest way would be just a k * sizeof(Datum) block of memory.
Unfortunately, the dynahash package requires the key to be a prefix of
the full hash entry, so a variably-sized key would mean that we wouldn't
know the right offset to use for accesses to fields of a hash entry
struct until runtime. Maybe you could make this ugly by hiding the
offset calculation in macros, but it still seems gross. An alternative
is to just make the hash key a pointer to an array of Datums:
   struct HashEntry { Datum *key; };

which makes the hash key constant-sized, but requires an extra level of
indirection and therefore a bit of legwork when writing the custom
copying, comparison and hashing functions. Anyone see a better way of
doing this?

(2) How should the hashing and comparison functions be implemented?

We can get the right behavior by using the appropriate opclass members.
To call the corresponding fmgr functions, we need access to some state
(FmgrInfo, etc.) within the hashing and comparison functions. However,
the signatures for these function pointers are:
   typedef uint32 (*HashValueFunc) (const void *key, Size keysize);   typedef int (*HashCompareFunc) (const void *key1,
constvoid *key2,                                   Size keysize);
 

i.e. there is no provision for passing additional data to either of
these callbacks. It would be possible to workaround this by including a
pointer to the additional data in the key itself:
   struct HashEntryKey   {       Datum *key_values;       void *data_ptr;   };
   struct HashEntry { HashEntryKey key; };

and then making sure to set the data_ptr correctly before calling
hash_search(). This works, but obviously it's pretty ugly. However, the
simple solution of adding a void pointer argument to these function
pointers would uglify all the existing hashing and comparison functions.
Does anyone have any ideas for a better solution?

-Neil




Re: dynahash API questions

From
Tom Lane
Date:
Neil Conway <neilc@samurai.com> writes:
> ... This works, but obviously it's pretty ugly. However, the
> simple solution of adding a void pointer argument to these function
> pointers would uglify all the existing hashing and comparison functions.
> Does anyone have any ideas for a better solution?

The solution that's been used so far is a static variable known to
the caller and the hash/comparison functions; see for instance
CurTupleHashTable in executor/execGrouping.c.  At some point there
might be enough of these to justify adding a void pointer argument
... not sure that we're there yet though.

(Right offhand it sounds like you might be reinventing execGrouping.c
--- what is your application exactly?)
        regards, tom lane


Re: dynahash API questions

From
Neil Conway
Date:
On Sat, 2006-11-25 at 01:36 -0500, Tom Lane wrote:
> The solution that's been used so far is a static variable known to
> the caller and the hash/comparison functions; see for instance
> CurTupleHashTable in executor/execGrouping.c

Ah, fair enough -- that should work.

> (Right offhand it sounds like you might be reinventing execGrouping.c
> --- what is your application exactly?)

I'm playing around with writing a memoization facility as a UDF:
cache(f, a1, a2, ...) = f(a1, a2, ...), except that cache() keeps an
internal hash of f()'s return values and uses that to avoid calling f()
when possible. Hence the need for a hash table whose keys are (a1,
a2, ...).

-Neil