Re: Aggregate versions of hashing functions (md5, sha1, etc...) - Mailing list pgsql-general

From Dominique Devienne
Subject Re: Aggregate versions of hashing functions (md5, sha1, etc...)
Msg-id CAFCRh-9dMQC99F22VreuOF9sv7kNjqVzXvaHZQerk0aBHUyhTA@mail.gmail.com
In response to Re: Aggregate versions of hashing functions (md5, sha1, etc...)  (Dominique Devienne <ddevienne@gmail.com>)
List pgsql-general
On Fri, Jul 11, 2025 at 11:00 AM Dominique Devienne <ddevienne@gmail.com> wrote:
> The current md5() and pgcrypto.digest() functions roll the x1
> init, xN process, and x1 finish into a single call, processing a
> single bytea (or, hopefully and more intelligently for TOAST'ed values,
> its ~2K chunk "rows" in streaming fashion. Can a dev confirm?)
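
To make the "x1 init, xN process, x1 finish" distinction concrete, here is a minimal sketch in Python using the standard hashlib module. It is not PostgreSQL code. CHUNK_SIZE, md5_one_shot and md5_streaming are illustrative names I made up, and the 2000-byte slice size only loosely mirrors the ~2K TOAST chunks mentioned above. The sketch shows that feeding a value to an incremental MD5 context slice by slice gives the same digest as hashing it in a single call. That equivalence is what would let a digest stream over TOAST chunks instead of needing the whole value at once.

```python
import hashlib

# Illustrative slice size only (assumption): roughly the ~2K TOAST chunk
# size mentioned above; any positive size yields the same digest.
CHUNK_SIZE = 2000


def md5_one_shot(data: bytes) -> str:
    # init, process everything, and finish, all rolled into a single call
    return hashlib.md5(data).hexdigest()


def md5_streaming(data: bytes, chunk_size: int = CHUNK_SIZE) -> str:
    h = hashlib.md5()                                  # x1 init
    view = memoryview(data)                            # slice without copying
    for start in range(0, len(view), chunk_size):
        h.update(view[start:start + chunk_size])       # xN process
    return h.hexdigest()                               # x1 finish


if __name__ == "__main__":
    payload = bytes(range(256)) * 100                  # 25,600 bytes of sample data
    assert md5_streaming(payload) == md5_one_shot(payload)
    print(md5_streaming(payload))
```

The memoryview avoids copying each slice. Only the hash context, a few dozen bytes, has to persist between steps.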

FWIW, I've [asked ChatGPT about that][1], and assuming it's right (that md5()
and pgcrypto.digest() don't leverage the "substring optimization" that lets
substring() fetch only the needed chunks of an uncompressed TOAST'ed bytea),
that's an unfortunate lost opportunity, especially for byteas approaching
the 1GB limit. And again (sorry to lay it on thick...), when values larger
than 1GB must be chunked manually across rows, the lack of an aggregate is
rather crippling, I'm afraid.
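
Here is a sketch of what such an aggregate would compute over manually chunked rows. It is again in Python rather than SQL and rests on stated assumptions. md5_sfunc, md5_finalfunc and md5_agg are hypothetical names, loosely mirroring the SFUNC and FINALFUNC of a CREATE AGGREGATE definition. The (chunk number, chunk bytes) row layout is likewise invented for illustration. The aggregate is simply a fold: one transition step per row, then one final step.

```python
import hashlib
from functools import reduce
from typing import Iterable


def md5_sfunc(state, chunk: bytes):
    # Hypothetical transition step: fold one row's chunk into the running state.
    state.update(chunk)
    return state


def md5_finalfunc(state) -> str:
    # Hypothetical final step: turn the running state into the hex digest.
    return state.hexdigest()


def md5_agg(chunks: Iterable[bytes]) -> str:
    # Stand-in for a hypothetical md5_agg(chunk ORDER BY chunk_no):
    # chunks must arrive in their original order, since MD5 is order-sensitive.
    return md5_finalfunc(reduce(md5_sfunc, chunks, hashlib.md5()))


if __name__ == "__main__":
    # One logical value split across rows, as when manually chunking data > 1GB.
    rows = [(2, b"world"), (1, b"hello "), (3, b"!")]
    ordered = [chunk for _, chunk in sorted(rows)]
    assert md5_agg(ordered) == hashlib.md5(b"hello world!").hexdigest()
    print(md5_agg(ordered))
```

Because the digest depends on chunk order, a SQL version would only be useful with an ORDER BY inside the aggregate call, as string_agg() already allows.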

So again, can a dev confirm what ChatGPT blurted out?

And if true, is there any interest in improving the current scalar digest
functions so they hash TOAST'ed values in a truly streaming fashion?

And of course, the main point of this thread: could (truly streaming)
aggregate versions be added in a future release?

Thanks, --DD

[1]: https://chatgpt.com/share/6870fe03-416c-800e-8633-a76e478a794a