Re: MD5 aggregate - Mailing list pgsql-hackers

From Noah Misch
Subject Re: MD5 aggregate
Date
Msg-id 20130626214814.GA865548@tornado.leadboat.com
Whole thread Raw
In response to Re: MD5 aggregate  (Dean Rasheed <dean.a.rasheed@gmail.com>)
Responses Re: MD5 aggregate
List pgsql-hackers
On Wed, Jun 26, 2013 at 09:04:34PM +0100, Dean Rasheed wrote:
> On 26 June 2013 19:32, Noah Misch <noah@leadboat.com> wrote:
> > On Mon, Jun 17, 2013 at 11:34:52AM +0100, Dean Rasheed wrote:

> > md5_agg() is well-defined and not cryptographically novel, and your use case
> > is credible.  However, not every useful-sometimes function belongs in core; we
> > mostly stick to functions with ubiquitous value and functions that would be
> > onerous to implement externally.  (We do carry legacy stuff that wouldn't make
> > the cut today.)  In my estimation, md5_agg() does not make that cut.  The
> > variety of intuitive md5_agg() definitions attested upthread doesn't bode well
> > for its broad applicability.  It could just as well live in an extension
> > published on PGXN.  Mine is just one opinion, though; I won't be horrified if
> > the community wants an md5_agg() in core.
> 
> I disagree with this though. I see md5_agg() as a natural extension of
> the already-in-core md5() functions and underlying code, satisfying a
> well-defined use-case, and providing a checksum comparable with
> externally generated md5 sums.

All true, but I don't see those facts justifying core inclusion.

> A quick google search reveals several people asking for something like
> this, and people recommending md5(string_agg(...)) or
> md5(string_agg(md5(...))) based solutions, which are doomed to failure
> on larger tables. So I think that there is a case for having md5_agg()
> in core as an alternative to such hacks, while having more
> sophisticated crypto functions available as extensions.

My nine Google hits for "md5(string_agg" included one Stack Overflow answer,
several mirrors of that answer, and a few posts on this thread.

> I haven't looked at pgcrypto because, as I said, performance wasn't my
> primary criterion either, but removing the unnessary data copy just
> seemed like good sense.
> 
> I'll take a look at it, and then as you and Peter suggest, look to
> split the patch into two. I think I will withdraw md5_total() because
> I was never entirely happy with that anyway.

Understood; feel free to back off from any performance improvements in which
you didn't wish to involve yourself.

If we do end up moving forward with md5_agg(), I note that the pgcrypto
version already has an initialize/accumulate/finalize API division.  A patch
importing that code largely-unchanged would be easier to verify than a patch
restructuring the src/backend/libpq/md5.c implementation.  The two patches
would then be a "use pgcrypto md5.c in core" patch and an md5_agg() patch.

Thanks,
nm

-- 
Noah Misch
EnterpriseDB                                 http://www.enterprisedb.com



pgsql-hackers by date:

Previous
From: Selena Deckelmann
Date:
Subject: Re: Kudos for Reviewers -- straw poll
Next
From: Jeff Janes
Date:
Subject: Re: Spin Lock sleep resolution