Re: NAMEDATALEN Changes - Mailing list pgsql-hackers

From Greg Copeland
Subject Re: NAMEDATALEN Changes
Date
Msg-id CCC.200202141401.g1EE1Dk24399@CopelandConsulting.Net
In response to Re: NAMEDATALEN Changes  (Neil Conway <nconway@klamath.dyndns.org>)
Responses Re: NAMEDATALEN Changes  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Wednesday 13 February 2002 23:59, Neil Conway wrote:
> On Wed, 2002-02-13 at 20:00, Tom Lane wrote:

[perf hit comment removed]

>
> I've attached a pretty trivial patch that implements this. Instead of
> automatically hashing NAMEDATALEN bytes, hashname() uses only strlen()
> bytes: this should improve both the common case (small identifiers, 5-10
> characters long), as well as reduce the penalty when NAMEDATALEN is
> increased. The patch passes the regression tests, FWIW. I didn't remove
> cc_hashname() -- I'll tackle that tomorrow unless anyone objects...
>
> I also did some pretty simple benchmarks; however, I'd appreciate it if
> anyone could confirm these results.
>

Please bear with me on this, as this is my first posting with any real 
content.  I'm making a rather large assumption here, so please don't hang me 
out to dry if I've overlooked anything.  Please correct as needed.

The primary assumption is that actual key lengths can be less than 
NAMEDATALEN.  That is, the string "shortkey" is a valid input key (??) and 
provides a key length of 8 bytes as input to the hash_any() function, even 
though NAMEDATALEN may be something like 128 or larger.  If this assumption 
is correct, then wouldn't hashing the default input key size (NAMEDATALEN) 
when it exceeds the actual key length be a bug?  That is to say, if we have a 
key with only 8 bytes of data and we iterate over 128 bytes, wouldn't the 
resulting hash be arbitrary and invalid, since it would be hashing memory 
that does not reflect the key being hashed?
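
To make the concern concrete, here is a minimal sketch of what I have in 
mind.  Everything in it is an assumption for illustration: DEMO_NAMEDATALEN 
is a made-up value, demo_hash() is plain FNV-1a standing in for hash_any() 
(it is not the real function), and the "stray byte" simply simulates whatever 
might sit in the buffer after the key.  It only shows that a hash over the 
full buffer depends on every byte after the key, while a hash over strlen() 
bytes does not; it says nothing about what those bytes actually contain in 
the backend.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEMO_NAMEDATALEN 128    /* hypothetical value, for illustration only */

/* FNV-1a, used purely as a stand-in; this is NOT PostgreSQL's hash_any(). */
static uint32_t demo_hash(const unsigned char *key, size_t len)
{
    uint32_t h = 2166136261u;

    for (size_t i = 0; i < len; i++)
    {
        h ^= key[i];
        h *= 16777619u;
    }
    return h;
}

/* Copy name into a zero-filled DEMO_NAMEDATALEN buffer (name must fit). */
static void demo_set_name(unsigned char *buf, const char *name)
{
    size_t n = strlen(name);

    assert(n < DEMO_NAMEDATALEN - 1);
    memset(buf, 0, DEMO_NAMEDATALEN);
    memcpy(buf, name, n);
}

/* Returns 1 if one stray trailing byte changes the full-buffer hash. */
static int demo_full_hash_sees_trailing_bytes(const char *name)
{
    unsigned char clean[DEMO_NAMEDATALEN];
    unsigned char dirty[DEMO_NAMEDATALEN];

    demo_set_name(clean, name);
    demo_set_name(dirty, name);
    dirty[DEMO_NAMEDATALEN - 1] = 0x5A;     /* simulate leftover memory */

    return demo_hash(clean, DEMO_NAMEDATALEN) !=
           demo_hash(dirty, DEMO_NAMEDATALEN);
}

/* Returns 1 if hashing only strlen() bytes gives the same value for both. */
static int demo_strlen_hash_ignores_trailing_bytes(const char *name)
{
    unsigned char clean[DEMO_NAMEDATALEN];
    unsigned char dirty[DEMO_NAMEDATALEN];

    demo_set_name(clean, name);
    demo_set_name(dirty, name);
    dirty[DEMO_NAMEDATALEN - 1] = 0x5A;     /* same stray byte as above */

    return demo_hash(clean, strlen((const char *) clean)) ==
           demo_hash(dirty, strlen((const char *) dirty));
}
```

If the bytes after the key are always padded the same way, the full-buffer 
hash is consistent and only the cost is at issue; if they are not, the two 
hashes of the same key can differ.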

If my assumptions are correct, then it sounds like the strlen() 
implementation (assuming input keys are valid C strings) is really the proper 
one, short of using an adjusted min(NAMEDATALEN, strlen()) type approach.
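
For what it's worth, the min(NAMEDATALEN, strlen()) idea could be sketched 
roughly as follows.  This is only a guess at the shape of such a function, 
not the patch under discussion: SKETCH_NAMEDATALEN, sketch_bounded_len(), 
sketch_hash() and sketch_hashname() are all hypothetical names, and 
sketch_hash() is again FNV-1a rather than hash_any().  The bounded length is 
computed with memchr() so that a key with no terminating NUL inside the limit 
is never read past the limit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_NAMEDATALEN 32   /* hypothetical value, for illustration only */

/* Length of s, inspecting at most max bytes (similar to POSIX strnlen()). */
static size_t sketch_bounded_len(const char *s, size_t max)
{
    const char *nul = (const char *) memchr(s, '\0', max);

    return nul ? (size_t) (nul - s) : max;
}

/* FNV-1a stand-in; this is NOT PostgreSQL's hash_any(). */
static uint32_t sketch_hash(const unsigned char *key, size_t len)
{
    uint32_t h = 2166136261u;

    for (size_t i = 0; i < len; i++)
    {
        h ^= key[i];
        h *= 16777619u;
    }
    return h;
}

/* Hash min(SKETCH_NAMEDATALEN, strlen(name)) bytes of name. */
static uint32_t sketch_hashname(const char *name)
{
    assert(name != NULL);

    return sketch_hash((const unsigned char *) name,
                       sketch_bounded_len(name, SKETCH_NAMEDATALEN));
}
```

With this shape, a short key such as "shortkey" is hashed over its 8 bytes 
only, and the result no longer changes when the limit is raised.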

[snip - var NAMEDATALEN benchmark results]


Greg
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.0.6 (GNU/Linux)
Comment: For info see http://www.gnupg.org

iD8DBQE8a8Mg4lr1bpbcL6kRAlaxAJ47CO+ExL/ZMo/i6LDoetXrul9qqQCfQli3
AvqN6RJjSuAH/p/mpZ8J4JY=
=wnVM
-----END PGP SIGNATURE-----

