Re: Aggregate versions of hashing functions (md5, sha1, etc...) - Mailing list pgsql-general

From	Dominique Devienne
Subject	Re: Aggregate versions of hashing functions (md5, sha1, etc...)
Date	July 11 15:11:44
Msg-id	CAFCRh-9dMQC99F22VreuOF9sv7kNjqVzXvaHZQerk0aBHUyhTA@mail.gmail.com
In response to	Re: Aggregate versions of hashing functions (md5, sha1, etc...) (Dominique Devienne <ddevienne@gmail.com>)
List	pgsql-general

On Fri, Jul 11, 2025 at 11:00 AM Dominique Devienne <ddevienne@gmail.com> wrote:
> The current md5() and pgcrypto.digest() functions roll the x1
> init, xN process, and x1 finish into a single call, processing a
> single bytea (or perhaps more intelligently for TOAST'ed values, the
> 2K "rows" of those in streaming-fashion, hopefully. Can a dev confirm?)

FWIW, I've [asked ChatGPT about that][1]. Assuming it's right (md5
and pgcrypto.digest do not leverage the "substring optimization" on
TOAST'ed bytea), that's an unfortunate lost opportunity, especially for
byteas close to the 1GB limit. And again (sorry to lay it on
thick...), when one has to chunk manually for sizes > 1GB, the lack
of an aggregate is a bit crippling, I'm afraid.
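
To make the init / process / finish split concrete, here is a minimal sketch
outside the database, using Python's standard hashlib. It only illustrates the
general streaming-hash pattern and makes no claim about how md5() or
pgcrypto.digest() are implemented. The function names and the 2000-byte chunk
size are hypothetical, chosen for illustration.

```python
import hashlib


def one_shot_digest(data: bytes, algo: str = "sha256") -> str:
    """Init, process, and finish rolled into one call over a single value."""
    return hashlib.new(algo, data).hexdigest()


def streaming_digest(chunks, algo: str = "sha256") -> str:
    """x1 init, xN process, x1 finish, over any iterable of byte chunks."""
    h = hashlib.new(algo)       # init
    for chunk in chunks:
        h.update(chunk)         # process, once per chunk
    return h.hexdigest()        # finish


def split(data: bytes, size: int):
    """Yield consecutive slices of at most `size` bytes (hypothetical helper)."""
    for i in range(0, len(data), size):
        yield data[i:i + size]


if __name__ == "__main__":
    payload = bytes(range(256)) * 4096  # 1 MiB of sample data
    # Feeding the chunks in order gives the same digest as hashing the whole value.
    assert streaming_digest(split(payload, 2000)) == one_shot_digest(payload)
```

An aggregate would play the role of streaming_digest here: each input row is
one update call, and the digest is only finalized once, at the end. Chunk order
matters, so such an aggregate would need a well-defined input ordering.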

So again, can a dev confirm what ChatGPT blurted out?

And if true, is there any interest in improving TOAST support so that
the current scalar digests do true streaming hashing?

And of course, the main point of this thread: adding (true streaming)
aggregate support in a future version?

Thanks, --DD

[1]: https://chatgpt.com/share/6870fe03-416c-800e-8633-a76e478a794a
