Re: Hash function for numeric (WIP) - Mailing list pgsql-patches

From Tom Lane
Subject Re: Hash function for numeric (WIP)
Date
Msg-id 14201.1177682545@sss.pgh.pa.us
In response to Re: Hash function for numeric (WIP)  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Hash function for numeric (WIP)
List pgsql-patches
I wrote:
> I feel uncomfortable about this proposal because it will compute
> different hashes for values that differ only in having different
> numbers of trailing zeroes.  Now the numeric.c code is supposed to
> suppress extra trailing zeroes on output, but that's never been a
> correctness property ... are we willing to make it one?

> There are various related cases involving unstripped leading zeroes.

> Another point is that sign = NUMERIC_NAN makes it a NAN regardless
> of any other fields; ignoring the sign does not get the right result
> here.

Something else I just remembered is that ndigits = 0 makes it a zero
regardless of the weight.

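Perhaps a sufficiently robust way would be to form the hash as the
XOR of each supplied digit, circular-shifted by say 3 times the
digit's weight.  This is insensitive to leading/trailing zeroes,
since a zero digit contributes nothing to the result, and it yields
zero for ndigits = 0 regardless of the weight: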

    if (is NAN)
        return -1;    // or any other fixed value
    hash = 0;
    shift = 3 * weight;
    for (i = 0; i < ndigits; i++)
    {
        // rotate digit[i] left by (shift mod 32) bits and XOR it in
        thisshift = (shift & 31);
        hash ^= ((uint32) digit[i]) << thisshift;
        if (thisshift > 0)
            hash ^= ((uint32) digit[i]) >> (32 - thisshift);
        shift -= 3;
    }
    return hash;

That might look pretty ugly, but then again hash_any isn't especially
cheap.

            regards, tom lane
