dynahash API questions - Mailing list pgsql-hackers

From Neil Conway
Subject dynahash API questions
Date
Msg-id 1164430158.23622.242.camel@localhost.localdomain
Whole thread Raw
Responses Re: dynahash API questions
List pgsql-hackers
I'd like to use the dynahash package to construct a hash table whose
keys are an array of k Datums. The value of "k" for a particular hash
table is known at hash_create() time. This raises two questions:

(1) How should variably-sized keys be represented?

The simplest way would be just a k * sizeof(Datum) block of memory.
Unfortunately, the dynahash package requires the key to be a prefix of
the full hash entry, so a variably-sized key would mean that we wouldn't
know the right offset to use for accesses to fields of a hash entry
struct until runtime. Maybe you could make this ugly by hiding the
offset calculation in macros, but it still seems gross. An alternative
is to just make the hash key a pointer to an array of Datums:
   struct HashEntry { Datum *key; };

which makes the hash key constant-sized, but requires an extra level of
indirection and therefore a bit of legwork when writing the custom
copying, comparison and hashing functions. Anyone see a better way of
doing this?

(2) How should the hashing and comparison functions be implemented?

We can get the right behavior by using the appropriate opclass members.
To call the corresponding fmgr functions, we need access to some state
(FmgrInfo, etc.) within the hashing and comparison functions. However,
the signatures for these function pointers are:
   typedef uint32 (*HashValueFunc) (const void *key, Size keysize);   typedef int (*HashCompareFunc) (const void *key1,
constvoid *key2,                                   Size keysize);
 

i.e. there is no provision for passing additional data to either of
these callbacks. It would be possible to workaround this by including a
pointer to the additional data in the key itself:
   struct HashEntryKey   {       Datum *key_values;       void *data_ptr;   };
   struct HashEntry { HashEntryKey key; };

and then making sure to set the data_ptr correctly before calling
hash_search(). This works, but obviously it's pretty ugly. However, the
simple solution of adding a void pointer argument to these function
pointers would uglify all the existing hashing and comparison functions.
Does anyone have any ideas for a better solution?

-Neil




pgsql-hackers by date:

Previous
From: "Luke Lonergan"
Date:
Subject: Re: [PATCHES] Avg performance for int8/numeric
Next
From: Tom Lane
Date:
Subject: Re: dynahash API questions