Re: MD5 aggregate - Mailing list pgsql-hackers
From | Noah Misch |
---|---|
Subject | Re: MD5 aggregate |
Date | |
Msg-id | 20130626214814.GA865548@tornado.leadboat.com Whole thread Raw |
In response to | Re: MD5 aggregate (Dean Rasheed <dean.a.rasheed@gmail.com>) |
Responses |
Re: MD5 aggregate
|
List | pgsql-hackers |
On Wed, Jun 26, 2013 at 09:04:34PM +0100, Dean Rasheed wrote: > On 26 June 2013 19:32, Noah Misch <noah@leadboat.com> wrote: > > On Mon, Jun 17, 2013 at 11:34:52AM +0100, Dean Rasheed wrote: > > md5_agg() is well-defined and not cryptographically novel, and your use case > > is credible. However, not every useful-sometimes function belongs in core; we > > mostly stick to functions with ubiquitous value and functions that would be > > onerous to implement externally. (We do carry legacy stuff that wouldn't make > > the cut today.) In my estimation, md5_agg() does not make that cut. The > > variety of intuitive md5_agg() definitions attested upthread doesn't bode well > > for its broad applicability. It could just as well live in an extension > > published on PGXN. Mine is just one opinion, though; I won't be horrified if > > the community wants an md5_agg() in core. > > I disagree with this though. I see md5_agg() as a natural extension of > the already-in-core md5() functions and underlying code, satisfying a > well-defined use-case, and providing a checksum comparable with > externally generated md5 sums. All true, but I don't see those facts justifying core inclusion. > A quick google search reveals several people asking for something like > this, and people recommending md5(string_agg(...)) or > md5(string_agg(md5(...))) based solutions, which are doomed to failure > on larger tables. So I think that there is a case for having md5_agg() > in core as an alternative to such hacks, while having more > sophisticated crypto functions available as extensions. My nine Google hits for "md5(string_agg" included one Stack Overflow answer, several mirrors of that answer, and a few posts on this thread. > I haven't looked at pgcrypto because, as I said, performance wasn't my > primary criterion either, but removing the unnessary data copy just > seemed like good sense. > > I'll take a look at it, and then as you and Peter suggest, look to > split the patch into two. I think I will withdraw md5_total() because > I was never entirely happy with that anyway. Understood; feel free to back off from any performance improvements in which you didn't wish to involve yourself. If we do end up moving forward with md5_agg(), I note that the pgcrypto version already has an initialize/accumulate/finalize API division. A patch importing that code largely-unchanged would be easier to verify than a patch restructuring the src/backend/libpq/md5.c implementation. The two patches would then be a "use pgcrypto md5.c in core" patch and an md5_agg() patch. Thanks, nm -- Noah Misch EnterpriseDB http://www.enterprisedb.com
pgsql-hackers by date: