Re: Fast, stable, portable hash function producing 4-byte or 8-byte values? - Mailing list pgsql-general

From Erwin Brandstetter
Subject Re: Fast, stable, portable hash function producing 4-byte or 8-byte values?
Date
Msg-id CAGHENJ6GibwodRSmgsQ2tLnXs9HcNeBJ2e2NYtwCB4Pbfqm-ow@mail.gmail.com
Whole thread Raw
In response to Re: Fast, stable, portable hash function producing 4-byte or 8-bytevalues?  (Laurenz Albe <laurenz.albe@cybertec.at>)
Responses Re: Fast, stable, portable hash function producing 4-byte or 8-byte values?
Re: Fast, stable, portable hash function producing 4-byte or 8-byte values?
List pgsql-general
Thanks for the suggestion. Seems like a good assumption and I have been using hashtext() in the past. But I am uncertain whether it is the best option.

Guess Tom's warning in https://www.postgresql.org/message-id/9434.1568839177@sss.pgh.pa.us about portability only refers to hashtextextended() and friends not being there in Postgres 10 or older.

But why are none of these functions documented? Does the project still not ...

> want people to rely on them continuing to have exactly the current behavior.

I am not complaining, maybe just nobody did the work. But it's also mentioned in this old thread, that hastext() changed in the past. Is all of that outdated and we are welcome to use those functions for indexing?

Filtering with amprocnum = 2 gets functions producing bigint in Postgres 11 or later.  Not sure about the exact meaning of amprocnum, manual says "Support function number".

Remaining problem either way: no hash function returning bigint for Postgres 10.

Regards
Erwin

On Tue, Dec 10, 2019 at 11:13 PM Laurenz Albe <laurenz.albe@cybertec.at> wrote:
On Tue, 2019-12-10 at 22:11 +0100, Erwin Brandstetter wrote:
> I am looking for stable hash functions producing 8-byte or 4-byte hashes from long text values in Postgres 10 or later.
>
> [...]
>
> There is an old post from 2012 by Tom Lane suggesting that hashtext() and friends are not for users:
>
> https://www.postgresql.org/message-id/24463.1329854466%40sss.pgh.pa.us

Changing a hash function would corrupt hash indexes, wouldn't it?

So I'd expect these functions to be pretty stable:

SELECT amp.amproc
FROM pg_amproc AS amp
   JOIN pg_opfamily AS opf ON amp.amprocfamily = opf.oid
   JOIN pg_am ON opf.opfmethod = pg_am.oid
WHERE pg_am.amname = 'hash'
  AND amp.amprocnum = 1;

Or at least there would have to be a fat warning in the release notes
to reindex hash indexes.

Yours,
Laurenz Albe
--
Cybertec | https://www.cybertec-postgresql.com

pgsql-general by date:

Previous
From: Adrian Klaver
Date:
Subject: Re: PGUSER and initdb
Next
From: Erwin Brandstetter
Date:
Subject: Re: Fast, stable, portable hash function producing 4-byte or 8-byte values?