On Mon, 2007-04-30 at 00:04 -0400, Tom Lane wrote:
> I'm still not very comfortable with that. You're proposing to add a
> pretty obvious failure mechanism --- any numeric-returning function
> that failed to "normalize" its output would now create a subtle,
> hard-to-find bug.
What about teaching hash_numeric() to explicitly ignore leading and
trailing zero digits?
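Something like the sketch below is what I have in mind (untested, and
not what's in the attached patch; the NUMERIC_DIGITS, NUMERIC_NDIGITS,
NUMERIC_WEIGHT, and NUMERIC_SIGN accessors are the existing numeric.c
ones, and the sign/weight mixing at the end is just illustrative):

/* As it would appear in src/backend/utils/adt/numeric.c */
Datum
hash_numeric(PG_FUNCTION_ARGS)
{
    Numeric       key = PG_GETARG_NUMERIC(0);
    NumericDigit *digits = NUMERIC_DIGITS(key);
    int           start = 0;
    int           end = NUMERIC_NDIGITS(key);
    int           weight = NUMERIC_WEIGHT(key);
    Datum         digit_hash;

    /* Skip leading zero digits, adjusting the weight to match. */
    while (start < end && digits[start] == 0)
    {
        start++;
        weight--;
    }

    /* Skip trailing zero digits; they don't affect the value. */
    while (end > start && digits[end - 1] == 0)
        end--;

    /* All-zero digit array: every representation of zero hashes alike. */
    if (start == end)
        PG_RETURN_UINT32(0);

    /* Hash only the significant digits, then mix in sign and weight. */
    digit_hash = hash_any((unsigned char *) (digits + start),
                          (end - start) * sizeof(NumericDigit));
    PG_RETURN_UINT32(DatumGetUInt32(digit_hash) ^
                     (uint32) weight ^ NUMERIC_SIGN(key));
}

That way a function returning an un-normalized 1.10::numeric would hash
the same as 1.1::numeric, which is exactly the failure mode you're
worried about.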
> Perhaps a suitable test would be to compare the number of
> hash collisions in a large set of randomly-chosen-but-distinct
> numeric values.
Okay, I did a little testing. I created a table with ~2 million
numerics, generated with equal probability by one of the two
expressions:
random()::numeric * ((random() * 100) ^ 20) * 100
random()::numeric * -100
There were 251 duplicate inputs that I didn't bother eliminating; of
course, they should have affected both hash functions identically.
Results:
neilc=# create temp table r1 as select hash_numeric(a), count(*) from x
group by hash_numeric(a);
neilc=# create temp table r2 as select old_hash_numeric(a), count(*)
from x group by old_hash_numeric(a);
neilc=# select count(*) from r1 where count > 1;
 count
--------
 123342
(1 row)
neilc=# select count(*) from r2 where count > 1;
 count
-------
   756
(1 row)
old_hash_numeric() is the existing hash_any()-based hash;
hash_numeric() is my coding of your suggested hash function (see
attached patch).
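For scale: with ~2 million distinct inputs and a 32-bit hash, the
birthday bound says a uniform hash should produce on the order of 470
colliding pairs. A quick back-of-the-envelope check (assuming
uniformity, nothing PostgreSQL-specific):

#include <stdio.h>

int
main(void)
{
    double n = 2e6;              /* approx. number of distinct inputs */
    double space = 4294967296.0; /* 2^32 possible hash values */

    /* Expected colliding pairs under a uniform hash: n(n-1)/2 / 2^32 */
    printf("expected collisions: %.0f\n", n * (n - 1.0) / 2.0 / space);
    return 0;
}

That prints about 466; adding in the 251 duplicate inputs puts the
expectation in the neighborhood of the 756 that old_hash_numeric()
actually produced, while 123342 is more than two orders of magnitude
beyond it.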
So it seems we need a better hash if we're not going to use hash_any().
The analysis required to write a robust hash function from scratch is
precisely why I'd prefer to use hash_any() if possible.
-Neil