Thread: sha1, sha2 functions into core?
I would like to see whether there is support for adding sha1 and sha2 functions into the core. These are obviously well-known and widely used functions, but currently the only way to get them is either through pgcrypto or one of the PLs. We could say that's OK, but then we do support md5 in core, which then encourages people to use that, when they really shouldn't use that for new applications. Another weirdness is that md5() doesn't return bytea but instead the result hex-encoded in a string, which makes it weird to use in some cases. One thing that might be reasonable would be to move the digest() functions digest(data text, type text) returns bytea digest(data bytea, type text) returns bytea from pgcrypto into core, so that pgcrypto is mostly restricted to encryption, and can be kept at arm's length for those who need to do that. (Side note: Would the extension mechanism be able to easily cope with a move like that?)
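Editor's illustration of the hex-versus-bytea point above. This is a minimal sketch in Python rather than SQL, using the standard `hashlib` module as a stand-in for the server-side functions; the variable names are hypothetical. It only shows the difference between a hex-encoded text result (what md5() returns) and a raw binary result (what digest() returns as bytea).

```python
import hashlib

data = b"hello"

# Analogous to core md5(): the digest comes back hex-encoded as text,
# so a 16-byte MD5 value occupies 32 characters.
md5_hex = hashlib.md5(data).hexdigest()

# Analogous to pgcrypto's digest(data, 'sha1') / digest(data, 'sha256'):
# the digest comes back as raw bytes (bytea on the SQL side).
sha1_raw = hashlib.sha1(data).digest()
sha256_raw = hashlib.sha256(data).digest()

# Converting the hex text back to binary recovers the raw 16-byte digest.
md5_raw = bytes.fromhex(md5_hex)

print(type(md5_hex).__name__, len(md5_hex))
print(type(sha1_raw).__name__, len(sha1_raw), len(sha256_raw))
```

The hex form is twice as large as the binary form, which is one reason a bytea return type is more convenient for storage and indexing.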
Peter Eisentraut <peter_e@gmx.net> writes: > I would like to see whether there is support for adding sha1 and sha2 > functions into the core. I can't get excited about that, but could put up with it as long as there wasn't scope creep ... > One thing that might be reasonable would be to move the digest() > functions > digest(data text, type text) returns bytea > digest(data bytea, type text) returns bytea > from pgcrypto into core, ... which this approach would create, because digest() isn't restricted to just those algorithms. I think it'd be better to just invent two new functions, which also avoids issues for applications that currently expect the digest functions to be installed in pgcrypto's schema. regards, tom lane
On 08/10/2011 02:06 PM, Peter Eisentraut wrote: > I would like to see whether there is support for adding sha1 and sha2 > functions into the core. These are obviously well-known and widely used > functions, but currently the only way to get them is either through > pgcrypto or one of the PLs. We could say that's OK, but then we do > support md5 in core, which then encourages people to use that, when they > really shouldn't use that for new applications. Another weirdness is > that md5() doesn't return bytea but instead the result hex-encoded in a > string, which makes it weird to use in some cases. > > One thing that might be reasonable would be to move the digest() > functions > > digest(data text, type text) returns bytea > digest(data bytea, type text) returns bytea > > from pgcrypto into core, so that pgcrypto is mostly restricted to > encryption, and can be kept at arm's length for those who need to do > that. > > (Side note: Would the extension mechanism be able to easily cope with a > move like that?) > It's come up before: <http://archives.postgresql.org/pgsql-hackers/2009-09/msg01293.php> +1 for returning bytea though. cheers andrew
On Wed, Aug 10, 2011 at 2:24 PM, Andrew Dunstan <andrew@dunslane.net> wrote: > It's come up before: > <http://archives.postgresql.org/pgsql-hackers/2009-09/msg01293.php> I was about to wonder out loud if we might be trying to hit a moving target.... -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Wed, Aug 10, 2011 at 7:06 PM, Peter Eisentraut <peter_e@gmx.net> wrote: > I would like to see whether there is support for adding sha1 and sha2 > functions into the core. These are obviously well-known and widely used > functions, but currently the only way to get them is either through > pgcrypto or one of the PLs. We could say that's OK, but then we do > support md5 in core, which then encourages people to use that, when they > really shouldn't use that for new applications. Slightly different, but related - I've seen complaints that we only use md5 for password storage/transmission, which is apparently not acceptable under some government security standards. In the most recent case, they wanted to be able to use sha256 for password storage (transmission isn't really an issue where SSL can be used of course). If we're ready to move more hashing functions into core, then it seems reasonable to add more options for password storage to help those who need to meet mandated standards. -- Dave Page Blog: http://pgsnake.blogspot.com Twitter: @pgsnake EnterpriseDB UK: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On ons, 2011-08-10 at 19:29 +0100, Dave Page wrote: > On Wed, Aug 10, 2011 at 7:06 PM, Peter Eisentraut <peter_e@gmx.net> wrote: > > I would like to see whether there is support for adding sha1 and sha2 > > functions into the core. These are obviously well-known and widely used > > functions, but currently the only way to get them is either through > > pgcrypto or one of the PLs. We could say that's OK, but then we do > > support md5 in core, which then encourages people to use that, when they > > really shouldn't use that for new applications. > > Slightly different, but related - I've seen complaints that we only > use md5 for password storage/transmission, which is apparently not > acceptable under some government security standards. In the most > recent case, they wanted to be able to use sha256 for password storage > (transmission isn't really an issue where SSL can be used of course). Yeah, that's one of those things. These days, using md5 for anything raises red flags, so it would be better to slowly move some alternatives into place. > If we're ready to move more hashing functions into core, then it seems > reasonable to add more options for password storage to help those who > need to meet mandated standards. Yes, that would be good.
On ons, 2011-08-10 at 14:26 -0400, Robert Haas wrote: > On Wed, Aug 10, 2011 at 2:24 PM, Andrew Dunstan <andrew@dunslane.net> wrote: > > It's come up before: > > <http://archives.postgresql.org/pgsql-hackers/2009-09/msg01293.php> > > I was about to wonder out loud if we might be trying to hit a moving target.... I think we are dealing with a lot more moving targets than adding a new version of SHA every 12 to 15 years.
On 10.08.2011 21:45, Peter Eisentraut wrote: > On ons, 2011-08-10 at 14:26 -0400, Robert Haas wrote: >> On Wed, Aug 10, 2011 at 2:24 PM, Andrew Dunstan<andrew@dunslane.net> wrote: >>> It's come up before: >>> <http://archives.postgresql.org/pgsql-hackers/2009-09/msg01293.php> >> >> I was about to wonder out loud if we might be trying to hit a moving target.... > > I think we are dealing with a lot more moving targets than adding a new > version of SHA every 12 to 15 years. Moving to something more modern for internal use is one thing. But regarding the user-visible md5() function, how about we jump off this treadmill and remove it altogether? And provide a backwards-compatible function in pgcrypto. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
On Wed, Aug 10, 2011 at 21:02, Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> wrote: > On 10.08.2011 21:45, Peter Eisentraut wrote: >> >> On ons, 2011-08-10 at 14:26 -0400, Robert Haas wrote: >>> >>> On Wed, Aug 10, 2011 at 2:24 PM, Andrew Dunstan<andrew@dunslane.net> >>> wrote: >>>> >>>> It's come up before: >>>> <http://archives.postgresql.org/pgsql-hackers/2009-09/msg01293.php> >>> >>> I was about to wonder out loud if we might be trying to hit a moving >>> target.... >> >> I think we are dealing with a lot more moving targets than adding a new >> version of SHA every 12 to 15 years. > > Moving to a something more modern for internal use is one thing. But > regarding the user-visible md5() function, how about we jump off this > treadmill and remove it altogether? And provide a backwards-compatible > function in pgcrypto. -1. There are certainly a number of perfectly valid use-cases for md5, and it would probably break a *lot* of applications to remove it. +1 for adding the SHA functions to core as choices, of course. -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/
On ons, 2011-08-10 at 14:19 -0400, Tom Lane wrote: > > One thing that might be reasonable would be to move the digest() > > functions > > digest(data text, type text) returns bytea > > digest(data bytea, type text) returns bytea > > from pgcrypto into core, > > ... which this approach would create, because digest() isn't > restricted > to just those algorithms. I think it'd be better to just invent two > new functions, which also avoids issues for applications that > currently > expect the digest functions to be installed in pgcrypto's schema. I would also prefer to simply add sha1(bytea/text) => bytea, but the existing md5 function is md5(bytea/text) => text, so either the new functions would be inconsistent, or we make the new functions broken like the old one, or we invent a different naming system, such as digest().
On Thu, Aug 11, 2011 at 09:06, Peter Eisentraut <peter_e@gmx.net> wrote: > On ons, 2011-08-10 at 14:19 -0400, Tom Lane wrote: >> > One thing that might be reasonable would be to move the digest() >> > functions >> > digest(data text, type text) returns bytea >> > digest(data bytea, type text) returns bytea >> > from pgcrypto into core, >> >> ... which this approach would create, because digest() isn't >> restricted >> to just those algorithms. I think it'd be better to just invent two >> new functions, which also avoids issues for applications that >> currently >> expect the digest functions to be installed in pgcrypto's schema. > > I would also prefer to simply add sha1(bytea/text) => bytea, but the > existing md5 function is md5(bytea/text) => test, so either the new > functions would be inconsistent, or we make the new functions broken > like the old one, or we invent a different naming system, such as > digest(). You could always combine them and create digest_sha1(bytea/text) => bytea, etc. That still won't have the "open ended" problem of just digest(). -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/
On Wed, Aug 10, 2011 at 9:19 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Peter Eisentraut <peter_e@gmx.net> writes: >> I would like to see whether there is support for adding sha1 and sha2 >> functions into the core. > > I can't get excited about that, but could put up with it as long as > there wasn't scope creep ... > >> One thing that might be reasonable would be to move the digest() >> functions >> digest(data text, type text) returns bytea >> digest(data bytea, type text) returns bytea >> from pgcrypto into core, > > ... which this approach would create, because digest() isn't restricted > to just those algorithms. I think it'd be better to just invent two > new functions, which also avoids issues for applications that currently > expect the digest functions to be installed in pgcrypto's schema. I would suggest digest() with fixed list of algorithms: md5, sha1, sha2. The uncommon/obsolete algorithms that can be used from digest() if compiled with openssl, are not something we need to worry over. In fact we have never "supported" them, as no testing has been done. Then we could also add hexdigest() which would fix whole bytea/hex confusion without bloating pg_proc. -- marko
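Editor's illustration of the "digest() with a fixed list of algorithms, plus hexdigest()" idea. This is a sketch in Python, not the pgcrypto implementation; the `ALLOWED` tuple and both function bodies are assumptions made for illustration, with `hashlib` standing in for the server's hash code.

```python
import hashlib

# Hypothetical fixed allowlist, following "md5, sha1, sha2" from the proposal.
ALLOWED = ("md5", "sha1", "sha224", "sha256", "sha384", "sha512")


def digest(data: bytes, algo: str) -> bytes:
    """Return the raw digest, rejecting anything outside the fixed list."""
    name = algo.lower()
    if name not in ALLOWED:
        raise ValueError(f"unsupported digest algorithm: {algo}")
    return hashlib.new(name, data).digest()


def hexdigest(data: bytes, algo: str) -> str:
    """Same digest, hex-encoded as text (the form md5() returns today)."""
    return digest(data, algo).hex()


print(hexdigest(b"", "md5"))
print(len(digest(b"abc", "sha256")))
```

With this shape, supporting another algorithm means one more entry in the list rather than a new pair of functions, which is the pg_proc-bloat argument made above.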
Marko Kreen <markokr@gmail.com> writes: > On Wed, Aug 10, 2011 at 9:19 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> ... which this approach would create, because digest() isn't restricted >> to just those algorithms. I think it'd be better to just invent two >> new functions, which also avoids issues for applications that currently >> expect the digest functions to be installed in pgcrypto's schema. > I would suggest digest() with fixed list of algorithms: md5, sha1, sha2. > The uncommon/obsolete algorithms that can be used > from digest() if compiled with openssl, are not something we > need to worry over. In fact we have never "supported" them, > as no testing has been done. Hmm ... they may be untested by us, but I feel sure that if we remove that functionality from pgcrypto, *somebody* is gonna complain. I don't see anything much wrong with sha1(bytea/text) -> bytea. There's no law that says it has to work exactly like md5() does. regards, tom lane
On 08/11/2011 10:46 AM, Tom Lane wrote: > Marko Kreen<markokr@gmail.com> writes: >> On Wed, Aug 10, 2011 at 9:19 PM, Tom Lane<tgl@sss.pgh.pa.us> wrote: >>> ... which this approach would create, because digest() isn't restricted >>> to just those algorithms. I think it'd be better to just invent two >>> new functions, which also avoids issues for applications that currently >>> expect the digest functions to be installed in pgcrypto's schema. >> I would suggest digest() with fixed list of algorithms: md5, sha1, sha2. >> The uncommon/obsolete algorithms that can be used >> from digest() if compiled with openssl, are not something we >> need to worry over. In fact we have never "supported" them, >> as no testing has been done. > Hmm ... they may be untested by us, but I feel sure that if we remove > that functionality from pgcrypto, *somebody* is gonna complain. Yeah. Maybe we should add a test or two. > I don't see anything much wrong with sha1(bytea/text) -> bytea. > There's no law that says it has to work exactly like md5() does. > > I agree. We could provide an md5_b(text/bytea) -> bytea if people are really concerned about orthogonality. cheers andrew
On Thu, Aug 11, 2011 at 5:46 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Marko Kreen <markokr@gmail.com> writes: >> On Wed, Aug 10, 2011 at 9:19 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >>> ... which this approach would create, because digest() isn't restricted >>> to just those algorithms. I think it'd be better to just invent two >>> new functions, which also avoids issues for applications that currently >>> expect the digest functions to be installed in pgcrypto's schema. > >> I would suggest digest() with fixed list of algorithms: md5, sha1, sha2. > >> The uncommon/obsolete algorithms that can be used >> from digest() if compiled with openssl, are not something we >> need to worry over. In fact we have never "supported" them, >> as no testing has been done. > > Hmm ... they may be untested by us, but I feel sure that if we remove > that functionality from pgcrypto, *somebody* is gonna complain. Well, if you are worried about that, you can duplicate current pgcrypto behaviour - if postgres is compiled against openssl, you get whatever algorithms are available in that particular version of openssl. My point was that giving such an open-ended list of algorithms was a bad idea, but there is no problem keeping the old behaviour. > I don't see anything much wrong with sha1(bytea/text) -> bytea. > There's no law that says it has to work exactly like md5() does. The problem is that the list of must-have algorithms is getting quite long: md5, sha1, sha224, sha256, sha384, sha512, + at least 4 from upcoming sha3. Another problem is that generic hashes are a bad way of storing passwords - identical passwords will look identical, and it's easy to brute-force passwords as the algorithms are very fast. So the question is: is there an actual *good* reason to add each algorithm separately, without a uniform API, to core functions? If the user requests are about storing passwords, and we want to make that easier, then we should import crypt() also, as that is the secure way for password storage. Then the md5(), md5_b() plus bunch of sha-s will look silly. -- marko
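Editor's illustration of the password-storage argument above. This is a sketch under stated assumptions: Python's `hashlib.pbkdf2_hmac` is swapped in as a generic salted, deliberately slow function. It is not the algorithm pgcrypto's crypt() uses, and the function names and round count are hypothetical.

```python
import hashlib
import secrets


def fast_unsalted(password: str) -> bytes:
    """A plain general-purpose hash: fast, and no salt."""
    return hashlib.sha256(password.encode()).digest()


def salted_slow(password: str, salt: bytes, rounds: int = 100_000) -> bytes:
    """Stand-in for a crypt()-style scheme: per-user salt plus many iterations."""
    return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, rounds)


# Two users who pick the same password get the same stored value,
# so one cracked entry (or one precomputed table) reveals both.
alice_plain = fast_unsalted("secret")
bob_plain = fast_unsalted("secret")
print(alice_plain == bob_plain)

# With a random per-user salt the stored values differ, and each guess
# costs `rounds` hash evaluations instead of one.
alice_salt, bob_salt = secrets.token_bytes(16), secrets.token_bytes(16)
alice_salted = salted_slow("secret", alice_salt)
bob_salted = salted_slow("secret", bob_salt)
print(alice_salted == bob_salted)
```

Verification stores the salt next to the hash and recomputes `salted_slow(candidate, salt)` at login time.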
On Aug 12, 2011, at 5:02 AM, Marko Kreen wrote: > My point was that giving such open-ended list of algorithms > was bad idea, but there is no problem keeping old behaviour. > >> I don't see anything much wrong with sha1(bytea/text) -> bytea. >> There's no law that says it has to work exactly like md5() does. > > The problem is that list of must-have algorithms is getting > quite long: md5, sha1, sha224, sha256, sha384, sha512, > + at least 4 from upcoming sha3. +1 I think some sort of digest() function that takes a parameter naming the algorithm would be the way to go. That's not to say that the existing named functions couldn't continue to exist -- md5() in core and sha1() in pgcrypto. But it sure seems to me like we ought to have just one function for digests (or 2, if we also have hexdigest()). Best, David
On Thu, Aug 11, 2011 at 5:46 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Marko Kreen <markokr@gmail.com> writes: >> On Wed, Aug 10, 2011 at 9:19 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >>> ... which this approach would create, because digest() isn't restricted >>> to just those algorithms. I think it'd be better to just invent two >>> new functions, which also avoids issues for applications that currently >>> expect the digest functions to be installed in pgcrypto's schema. > >> I would suggest digest() with fixed list of algorithms: md5, sha1, sha2. > >> The uncommon/obsolete algorithms that can be used >> from digest() if compiled with openssl, are not something we >> need to worry over. In fact we have never "supported" them, >> as no testing has been done. > > Hmm ... they may be untested by us, but I feel sure that if we remove > that functionality from pgcrypto, *somebody* is gonna complain. If you don't want to break digest() but do not want such behaviour in core, we could go with hash(data, algo) that has a fixed number of digests, but also a couple of non-cryptographic hashes like crc32, lookup2/3. This would also fix the problem of people using hashtext() in user code. -- marko
On Fri, Aug 12, 2011 at 10:14:58PM +0300, Marko Kreen wrote: > On Thu, Aug 11, 2011 at 5:46 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > > Marko Kreen <markokr@gmail.com> writes: > >> On Wed, Aug 10, 2011 at 9:19 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > >>> ... which this approach would create, because digest() isn't restricted > >>> to just those algorithms. I think it'd be better to just invent two > >>> new functions, which also avoids issues for applications that currently > >>> expect the digest functions to be installed in pgcrypto's schema. > > > >> I would suggest digest() with fixed list of algorithms: md5, sha1, sha2. > > > >> The uncommon/obsolete algorithms that can be used > >> from digest() if compiled with openssl, are not something we > >> need to worry over. In fact we have never "supported" them, > >> as no testing has been done. > > > > Hmm ... they may be untested by us, but I feel sure that if we remove > > that functionality from pgcrypto, *somebody* is gonna complain. > > If you dont want to break digest() but do not want such behaviour in core, > we could go with hash(data, algo) that has fixed number of digests, > but also couple non-cryptographic hashes like crc32, lookup2/3. > This would also fix the problem of people using hashtext() in user code. Hmm, this thread seems to have petered out without a conclusion. Just wanted to comment that there _are_ non-password storage uses for these digests: I use them in a context of storing large files in a bytea column, as a means to doing data deduplication, and avoiding pushing files from clients to server and back. Ross -- Ross Reedstrom, Ph.D. reedstrm@rice.edu Systems Engineer & Admin, Research Scientist phone: 713-348-6166 Connexions http://cnx.org fax: 713-348-3665 Rice University MS-375, Houston, TX 77005 GPG Key fingerprint = F023 82C8 9B0E 2CC6 0D8E F888 D3AE 810E 88F0 BEDE
On Wed, Aug 31, 2011 at 11:12 AM, Ross J. Reedstrom <reedstrm@rice.edu> wrote: > Hmm, this thread seems to have petered out without a conclusion. Just > wanted to comment that there _are_ non-password storage uses for these > digests: I use them in a context of storing large files in a bytea > column, as a means to doing data deduplication, and avoiding pushing > files from clients to server and back. Yes, agreed: there is no decent content-addressing type in PostgreSQL, so one rolls their own using shas and joins; I've seen this more than once. It's a useful way to get non-bloated index on a series of (larger than sha1) values where one only cares about the equality operator (hash indexes, as unattractive as they were before in PostgreSQL's implementation are even less so now with streaming replication). When that content to be addressed can be submitted from another source, anything with md5 is correctly met with suspicion. We have gone to the trouble of using pgcrypto to get sha1 access, but I know of other applications that would have preferred to use sha but settle for md5 simply because it's known to be bundled in core everywhere. CREATE EXTENSION -- particularly if there is *any* way (is there? even with ugliness like utility statement hooks) to configure it on the provider end to not require superuser for common extensions like 'pgcrypto' -- could ablate this issue and one could get off the hash "treadmill", including md5 -- but I think that would be a mistake. Applications need a high quality digest to enable any kind of principled content addressing use case, and I think making that any harder than a builtin is going to negatively impact the state of things at large. As a compromise, I'd also be happy with making CREATE EXTENSION so trivial that everyone who has that use case can get pgcrypto on any hosting provider. -- fdr
On ons, 2011-08-31 at 13:12 -0500, Ross J. Reedstrom wrote: > Hmm, this thread seems to have petered out without a conclusion. Just > wanted to comment that there _are_ non-password storage uses for these > digests: I use them in a context of storing large files in a bytea > column, as a means to doing data deduplication, and avoiding pushing > files from clients to server and back. But I suppose you don't need the hash function in the database system for that.
On Fri, Sep 02, 2011 at 09:54:07PM +0300, Peter Eisentraut wrote: > On ons, 2011-08-31 at 13:12 -0500, Ross J. Reedstrom wrote: > > Hmm, this thread seems to have petered out without a conclusion. Just > > wanted to comment that there _are_ non-password storage uses for these > > digests: I use them in a context of storing large files in a bytea > > column, as a means to doing data deduplication, and avoiding pushing > > files from clients to server and back. > > But I suppose you don't need the hash function in the database system > for that. > It is very useful to have the same hash function used internally by PostgreSQL exposed externally. I know you can get the code and add an equivalent one of your own... Regards, Ken
On Fri, Sep 02, 2011 at 02:05:45PM -0500, ktm@rice.edu wrote: > On Fri, Sep 02, 2011 at 09:54:07PM +0300, Peter Eisentraut wrote: > > On ons, 2011-08-31 at 13:12 -0500, Ross J. Reedstrom wrote: > > > Hmm, this thread seems to have petered out without a conclusion. Just > > > wanted to comment that there _are_ non-password storage uses for these > > > digests: I use them in a context of storing large files in a bytea > > > column, as a means to doing data deduplication, and avoiding pushing > > > files from clients to server and back. > > > > But I suppose you don't need the hash function in the database system > > for that. > > > > It is very useful to have the same hash function used internally by > PostgreSQL exposed externally. I know you can get the code and add an > equivalent one of your own... > Thanks for the support Ken, but Peter's right: the only backend use in my particular case is to let the backend do the hash calc during bulk loads: in the production code path, having the hash in two places doesn't save any work, since the client code has to calculate the hash in order to test for its existence in the backend. I suppose if the network cost was negligible, I could just push the files anyway, and have a before-insert trigger calculate the hash and do the dedup: then it'd be hidden in the backend completely. But as is, I can do all the work in the client. Ross -- Ross Reedstrom, Ph.D. reedstrm@rice.edu Systems Engineer & Admin, Research Scientist phone: 713-348-6166 Connexions http://cnx.org fax: 713-348-3665 Rice University MS-375, Houston, TX 77005 GPG Key fingerprint = F023 82C8 9B0E 2CC6 0D8E F888 D3AE 810E 88F0 BEDE
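Editor's illustration of the client-side deduplication workflow described above: the client hashes the file first and only pushes the content if the server does not already have that digest. This is a sketch, not Ross's code; `BlobStore` is a hypothetical in-memory stand-in for a table with a digest key and a bytea column, and SHA-256 is an assumed choice of digest.

```python
import hashlib


class BlobStore:
    """Hypothetical stand-in for a table keyed by digest with a bytea payload."""

    def __init__(self) -> None:
        self._blobs: dict[bytes, bytes] = {}
        self.uploads = 0  # counts how many times content was actually pushed

    def has(self, digest: bytes) -> bool:
        # Cheap existence check: only the digest crosses the network.
        return digest in self._blobs

    def put(self, digest: bytes, content: bytes) -> None:
        self._blobs[digest] = content
        self.uploads += 1


def store_file(store: BlobStore, content: bytes) -> bytes:
    """Hash on the client, then upload only if the digest is not yet known."""
    digest = hashlib.sha256(content).digest()
    if not store.has(digest):
        store.put(digest, content)
    return digest


store = BlobStore()
first = store_file(store, b"large file contents")
second = store_file(store, b"large file contents")   # duplicate: no upload
third = store_file(store, b"a different file")
print(store.uploads)
```

Because the existence check needs the digest before any upload, the hash has to be computed on the client anyway, which is the point made in the message above.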
On Fri, Sep 02, 2011 at 04:27:46PM -0500, Ross J. Reedstrom wrote: > On Fri, Sep 02, 2011 at 02:05:45PM -0500, ktm@rice.edu wrote: > > On Fri, Sep 02, 2011 at 09:54:07PM +0300, Peter Eisentraut wrote: > > > On ons, 2011-08-31 at 13:12 -0500, Ross J. Reedstrom wrote: > > > > Hmm, this thread seems to have petered out without a conclusion. Just > > > > wanted to comment that there _are_ non-password storage uses for these > > > > digests: I use them in a context of storing large files in a bytea > > > > column, as a means to doing data deduplication, and avoiding pushing > > > > files from clients to server and back. > > > > > > But I suppose you don't need the hash function in the database system > > > for that. > > > > > > > It is very useful to have the same hash function used internally by > > PostgreSQL exposed externally. I know you can get the code and add an > > equivalent one of your own... > > > Thanks for the support Ken, but Peter's right: the only backend use in > my particular case is to let the backend do the hash calc during bulk > loads: in the production code path, having the hash in two places > doesn't save any work, since the client code has to calculate the hash > in order to test for its existence in the backend. I suppose if the > network cost was negligible, I could just push the files anyway, and > have a before-insert trigger calculate the hash and do the dedup: then > it'd be hidden in the backend completely. But as is, I can do all the > work in the client. > While it is true that it doesn't save any work, my motivation for having it exposed is that "good" hash functions are non-trivial to find. I have dealt with computational artifacts produced by hash functions that seemed at first to be good. We use a very well behaved function within the database and exposing it will help prevent bad user hash function implementations. Regards, Ken
Is there a TODO here? --------------------------------------------------------------------------- On Wed, Aug 10, 2011 at 09:43:18PM +0300, Peter Eisentraut wrote: > On ons, 2011-08-10 at 19:29 +0100, Dave Page wrote: > > On Wed, Aug 10, 2011 at 7:06 PM, Peter Eisentraut <peter_e@gmx.net> wrote: > > > I would like to see whether there is support for adding sha1 and sha2 > > > functions into the core. These are obviously well-known and widely used > > > functions, but currently the only way to get them is either through > > > pgcrypto or one of the PLs. We could say that's OK, but then we do > > > support md5 in core, which then encourages people to use that, when they > > > really shouldn't use that for new applications. > > > > Slightly different, but related - I've seen complaints that we only > > use md5 for password storage/transmission, which is apparently not > > acceptable under some government security standards. In the most > > recent case, they wanted to be able to use sha256 for password storage > > (transmission isn't really an issue where SSL can be used of course). > > Yeah, that's one of those things. These days, using md5 for anything > raises red flags, so it would be better to slowly move some alternatives > into place. > > > If we're ready to move more hashing functions into core, then it seems > > reasonable to add more options for password storage to help those who > > need to meet mandated standards. > > Yes, that would be good. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. +
On Wed, Aug 15, 2012 at 6:11 AM, Bruce Momjian <bruce@momjian.us> wrote:
> Is there a TODO here?

There is still open ToDecide here:

1) Status quo - md5() in core, everything else in pgcrypto

2) Start moving other hashes as separate functions into core: eg. sha1()
   Problems:
   - sha1 will be obsolete soon, like md5
   - many newer hashes: sha2 family, upcoming sha3 family
   - hex vs. binary api issue - hex-by-default is not good when you actually need cryptographic hash (eg. indexes)
   - does not solve the password storage problem.

3) Move digest() from pgcrypto into core, with same api.
   Problems:
   - does not solve the password storage problem.

4) Move both digest() and crypt() into core, with same api.

Password problem - the cryptographic hashes are meant for universal usage, thus they need to be usable even on megabytes of data. This means they are easily bruteforceable, when the amount of data is microscopic (1..16 chars). Also they are unsalted, thus making cracking even easier. crypt() is better api for passwords. So when the main need to have hashes is password storage, then 2) is bad choice.

My vote: 4), 1)

-- marko
Marko Kreen <markokr@gmail.com> writes: > On Wed, Aug 15, 2012 at 6:11 AM, Bruce Momjian <bruce@momjian.us> wrote: >> Is there a TODO here? > There is still open ToDecide here: [snip] The argument against moving crypto code into core remains the same as it was, ie export regulations. I don't see that that situation has changed at all. Thus, I think we should leave all the pgcrypto code where it is, in an extension that's easily separated out by anybody who's concerned about legal restrictions. The recent improvements in the ease of installing extensions have made it even less interesting than it used to be to merge extension-supported code into core --- if anything, we ought to be trying to move functionality the other way. If anybody's concerned about the security of our password storage, they'd be much better off working on improving the length and randomness of the salt string than replacing the md5 hash per se. regards, tom lane
On 08/15/2012 06:48 AM, Tom Lane wrote: >> On Wed, Aug 15, 2012 at 6:11 AM, Bruce Momjian <bruce@momjian.us> wrote: >>> Is there a TODO here? > > If anybody's concerned about the security of our password storage, > they'd be much better off working on improving the length and randomness > of the salt string than replacing the md5 hash per se. Or change to an md5 HMAC rather than straight md5 with salt. Last I checked (which admittedly was a while ago) there were still no known cryptographic weaknesses associated with an HMAC based on md5. Joe -- Joe Conway credativ LLC: http://www.credativ.us Linux, PostgreSQL, and general Open Source Training, Service, Consulting, & 24x7 Support
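Editor's illustration of the "md5 HMAC rather than straight md5 with salt" suggestion. This is a minimal sketch using Python's standard `hmac` module; the key, message, and function names are hypothetical, and nothing here reflects how PostgreSQL actually stores or checks passwords.

```python
import hashlib
import hmac


def md5_hmac(key: bytes, message: bytes) -> bytes:
    """HMAC (RFC 2104 construction) with MD5 as the underlying hash."""
    return hmac.new(key, message, hashlib.md5).digest()


def verify(key: bytes, message: bytes, tag: bytes) -> bool:
    """Constant-time comparison, to avoid leaking how many bytes matched."""
    return hmac.compare_digest(md5_hmac(key, message), tag)


key = b"server-side secret key"        # assumed: a secret the attacker lacks
tag = md5_hmac(key, b"user password")

print(len(tag))
print(verify(key, b"user password", tag))
print(verify(key, b"wrong password", tag))
```

Unlike a salt, the HMAC key has to stay secret to add anything, which is the extra key-management burden noted later in the thread.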
On 08/15/2012 11:22 AM, Joe Conway wrote: > On 08/15/2012 06:48 AM, Tom Lane wrote: >>> On Wed, Aug 15, 2012 at 6:11 AM, Bruce Momjian <bruce@momjian.us> wrote: >>>> Is there a TODO here? >> If anybody's concerned about the security of our password storage, >> they'd be much better off working on improving the length and randomness >> of the salt string than replacing the md5 hash per se. > Or change to an md5 HMAC rather than straight md5 with salt. Last I > checked (which admittedly was a while ago) there were still no known > cryptographic weaknesses associated with an HMAC based on md5. > Possibly. I still think the right time to revisit this whole area will be when the NIST Hash Function competition ends supposedly later this year. See <http://csrc.nist.gov/groups/ST/hash/timeline.html>. At that time we should probably consider moving our password handling to use the new standard function. cheers andrew
On Wed, Aug 15, 2012 at 11:37:04AM -0400, Andrew Dunstan wrote: > > On 08/15/2012 11:22 AM, Joe Conway wrote: > >On 08/15/2012 06:48 AM, Tom Lane wrote: > >>>On Wed, Aug 15, 2012 at 6:11 AM, Bruce Momjian <bruce@momjian.us> wrote: > >>>>Is there a TODO here? > >>If anybody's concerned about the security of our password storage, > >>they'd be much better off working on improving the length and randomness > >>of the salt string than replacing the md5 hash per se. > >Or change to an md5 HMAC rather than straight md5 with salt. Last I > >checked (which admittedly was a while ago) there were still no known > >cryptographic weaknesses associated with an HMAC based on md5. > > > > > > Possibly. I still think the right time to revisit this whole area > will be when the NIST Hash Function competition ends supposedly > later this year. See > <http://csrc.nist.gov/groups/ST/hash/timeline.html>. At that time we > should probably consider moving our password handling to use the new > standard function. Are we really going to be comfortable with an algorithm that is new? -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. +
On Wed, Aug 15, 2012 at 10:22 AM, Joe Conway <mail@joeconway.com> wrote: > On 08/15/2012 06:48 AM, Tom Lane wrote: >>> On Wed, Aug 15, 2012 at 6:11 AM, Bruce Momjian <bruce@momjian.us> wrote: >>>> Is there a TODO here? >> >> If anybody's concerned about the security of our password storage, >> they'd be much better off working on improving the length and randomness >> of the salt string than replacing the md5 hash per se. > > Or change to an md5 HMAC rather than straight md5 with salt. Last I > checked (which admittedly was a while ago) there were still no known > cryptographic weaknesses associated with an HMAC based on md5. There is no cryptographic weakness with md5 that matters for password hashing either, really. The best known preimage attack IIRC is 2^123 (well outside of any practical brute force search) and the algorithm is very well studied. The main issue with md5 is that it's fast enough that you can search low entropy passwords (rainbow tables etc), which does not depend on the strength of the hashing algorithm. If the hacker has access to the salt, then it will only slow him/her down somewhat because the search will have to be restarted for each password. The sha family is engineered to be fast and is therefore not meaningfully safer IMO. Ditto the NIST hash function (upcoming sha-3) that Andrew is mentioning downthread (that might be a good idea for other reasons but I don't really think it's better in terms of securing user passwords). If you want to give the user good password security, I think you have only two choices: 1) allow use of hmac as you suggest (but this forces the user to maintain an additional password or some token) 2) force or at least strongly encourage the user to choose a high entropy password. A lot of people argue for 3) use a purposefully slow hashing function like bcrypt, but I disagree: I don't like any scheme that encourages use of low entropy passwords. merlin
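Merlin's point about salts can be illustrated with a small sketch. It assumes a hypothetical leaked table of (user, salt, digest) rows and a plain salt-then-password MD5 construction; neither is how Postgres itself stores passwords, and the names below are made up for illustration. The salt forces the attacker to restart the wordlist for every row, but each guess still costs only one fast hash.

```python
import hashlib


def salted_md5(salt: bytes, password: str) -> str:
    # Hypothetical storage scheme for illustration: md5(salt || password).
    return hashlib.md5(salt + password.encode()).hexdigest()


def dictionary_attack(stored, wordlist):
    """Try every word against every leaked row.

    stored: iterable of (user, salt, digest) tuples.
    Returns the recovered passwords and the number of hashes computed.
    """
    found = {}
    guesses = 0
    for user, salt, digest in stored:
        # Because each row has its own salt, no work is shared between
        # rows: the wordlist is walked again from the start.
        for word in wordlist:
            guesses += 1
            if salted_md5(salt, word) == digest:
                found[user] = word
                break
    return found, guesses
```

A password that is not in the wordlist survives regardless of the hash function, which is the argument for high entropy passwords made above.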
On 08/15/2012 11:48 AM, Bruce Momjian wrote: > On Wed, Aug 15, 2012 at 11:37:04AM -0400, Andrew Dunstan wrote: >> On 08/15/2012 11:22 AM, Joe Conway wrote: >>> On 08/15/2012 06:48 AM, Tom Lane wrote: >>>>> On Wed, Aug 15, 2012 at 6:11 AM, Bruce Momjian <bruce@momjian.us> wrote: >>>>>> Is there a TODO here? >>>> If anybody's concerned about the security of our password storage, >>>> they'd be much better off working on improving the length and randomness >>>> of the salt string than replacing the md5 hash per se. >>> Or change to an md5 HMAC rather than straight md5 with salt. Last I >>> checked (which admittedly was a while ago) there were still no known >>> cryptographic weaknesses associated with an HMAC based on md5. >>> >> >> >> Possibly. I still think the right time to revisit this whole area >> will be when the NIST Hash Function competition ends supposedly >> later this year. See >> <http://csrc.nist.gov/groups/ST/hash/timeline.html>. At that time we >> should probably consider moving our password handling to use the new >> standard function. > Are we really going to be comfortable with an algorithm that is new? > The only thing that will be new about it will be that it's the new standard. There is a reason these crypto function competitions run for quite a few years. cheers andrew
On 08/15/2012 08:49 AM, Merlin Moncure wrote: > 1) allow use hmac as you suggest (but this forces user to maintain > additional password or some token) Not really. You would store a token and the HMAC of the token using the user password on the server side. You would need access to the hash function on the client side as well. On authentication the server sends the token to the client, and the client calculates the HMAC using the user provided password. The result is sent back to the server for comparison. This way the user's password is never actually sent over the wire. Now this is still susceptible to a replay attack, but you can fix that by adding another layer. On authentication the server generates a new nonce (random token) and sends it to the client along with the stored token, as well as calculating the HMAC of the nonce using the stored user HMAC as the key. On the client side the process is repeated -- HMAC(nonce,HMAC(token,password)). This provides a one time calculation preventing replay and does not expose the user's password or token-HMAC over the wire. The final problem as you stated is weak passwords and some kind of dictionary attack against a stolen set of tokens and HMACs. Didn't we add a hook some time ago for a user-provided password checker? Joe -- Joe Conway credativ LLC: http://www.credativ.us Linux, PostgreSQL, and general Open Source Training, Service, Consulting, & 24x7 Support
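The two-layer exchange Joe describes can be sketched roughly as follows. This is only an illustration under stated assumptions: SHA-256 as the underlying hash (the thread discusses an md5 HMAC), 16-byte random tokens and nonces, and hypothetical function names. The key/message order follows his description: the password keys the inner HMAC over the token, and the stored HMAC keys the outer HMAC over the nonce.

```python
import hashlib
import hmac
import secrets


def hmac_digest(key: bytes, msg: bytes) -> bytes:
    # SHA-256 is an assumption made for this sketch.
    return hmac.new(key, msg, hashlib.sha256).digest()


def enroll(password: str):
    """Server side, at password-set time: keep the token and the HMAC only."""
    token = secrets.token_bytes(16)
    verifier = hmac_digest(password.encode(), token)
    return token, verifier


def new_nonce() -> bytes:
    """Server side, per authentication attempt."""
    return secrets.token_bytes(16)


def client_response(password: str, token: bytes, nonce: bytes) -> bytes:
    """Client side: HMAC(nonce, HMAC(token, password)) in Joe's notation."""
    verifier = hmac_digest(password.encode(), token)
    return hmac_digest(verifier, nonce)


def server_verify(verifier: bytes, nonce: bytes, response: bytes) -> bool:
    """Server side: recompute from the stored verifier and compare."""
    expected = hmac_digest(verifier, nonce)
    return hmac.compare_digest(expected, response)
```

A response captured for one nonce does not verify against a fresh nonce, which is the replay protection described above. Note that in this sketch the response is computed from the stored verifier alone, so a stolen verifier would be enough to answer challenges.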
On Wed, Aug 15, 2012 at 09:18:48AM -0700, Joe Conway wrote: > The final problem as you stated is weak passwords and some kind of > dictionary attack against a stolen set of tokens and HMACs. Didn't we > add a hook some time ago for user provided password checker? Yes, contrib/passwordcheck: http://www.postgresql.org/docs/9.2/static/passwordcheck.html -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. +
On Wed, Aug 15, 2012 at 4:48 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Marko Kreen <markokr@gmail.com> writes: >> On Wed, Aug 15, 2012 at 6:11 AM, Bruce Momjian <bruce@momjian.us> wrote: >>> Is there a TODO here? > >> There is still open ToDecide here: [snip] > > The argument against moving crypto code into core remains the same as it > was, ie export regulations. I don't see that that situation has changed > at all. Thus, I think we should leave all the pgcrypto code where it > is, in an extension that's easily separated out by anybody who's > concerned about legal restrictions. The recent improvements in the ease > of installing extensions have made it even less interesting than it used > to be to merge extension-supported code into core --- if anything, we > ought to be trying to move functionality the other way. Ok. > If anybody's concerned about the security of our password storage, > they'd be much better off working on improving the length and randomness > of the salt string than replacing the md5 hash per se. Sorry, I was not clear enough - by "password storage" I meant password storage for application-specific passwords, not postgres users. Postgres' own usage of md5 is kind of fine, as: - only superusers can see the hashes (pg_shadow). - if somebody sees the contents of pg_shadow, they don't need to crack them, they can use the hashes to log in immediately. But for storage of application passwords, the hash needs to be one-way and hard to crack, to decrease any damage caused by leaks. -- marko
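Marko's second point can be made concrete with a sketch of the md5 password exchange as I understand it from the frontend/backend protocol documentation: the stored value is "md5" followed by md5(password || username), and the client answers a 4-byte server salt with "md5" followed by md5(stored hex || salt). The function names are hypothetical and the details should be checked against the protocol docs before relying on them.

```python
import hashlib


def md5_hex(data: bytes) -> str:
    return hashlib.md5(data).hexdigest()


def stored_hash(password: str, username: str) -> str:
    # What ends up in pg_shadow, per the assumption stated above.
    return "md5" + md5_hex((password + username).encode())


def response_from_stored(stored: str, salt: bytes) -> str:
    # Only the stored hash and the per-connection salt are needed here;
    # the cleartext password never enters the computation.
    inner = stored[3:]  # strip the "md5" prefix
    return "md5" + md5_hex(inner.encode() + salt)


def response_from_password(password: str, username: str, salt: bytes) -> str:
    # What a normal client computes when the user types a password.
    return response_from_stored(stored_hash(password, username), salt)
```

Both paths produce the same response, which is why reading pg_shadow is enough to log in without cracking anything.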
On 8/15/12 6:48 AM, Tom Lane wrote: > The argument against moving crypto code into core remains the same as it > was, ie export regulations. I don't see that that situation has changed > at all. Actually, I believe that it has, based on my experience getting an export certificate for Sun Postgres back in 2008. The US Federal government lifted restrictions on shipping well-known cryptographic algorithms to most countries several years ago, except to specific countries with embargoes (Iran, Burma, etc.). However, *all* exports of software to those embargoed countries are restricted, cryptographic or not. The USA does require an export certificate for any cryptographic-supporting software which is shipped from the USA. For that, however, MD5 and our support for SSL authentication already requires a certificate, whether we include SHA or not. So, my personal non-lawyer experience is that including SHA in core or not would make no difference whatsoever to our export status. The above is all secondhand legal knowledge, so if it really matters to our decisions on what algorithms we include in Core, we should ask SFLC for a real opinion. We certainly shouldn't make one based on assumptions. I think it's more significant, though, that nobody has been able to demonstrate that SHA hashing of passwords actually makes Postgres more secure. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
On 08/20/2012 03:10 PM, Josh Berkus wrote: > On 8/15/12 6:48 AM, Tom Lane wrote: >> The argument against moving crypto code into core remains the same as it >> was, ie export regulations. I don't see that that situation has changed >> at all. > Actually, I believe that it has, based on my experience getting an > export certificate for Sun Postgres back in 2008. > > The US Federal government lifted restrictions on shipping well-known > cryptographic algorithms to most countries several years ago, except to > specific countries with embargoes (Iran, Burma, etc.). However, *all* > exports of software to those embargoed countries are restricted, > cryptographic or not. > > The USA does require an export certificate for any > cryptographic-supporting software which is shipped from the USA. For > that, however, MD5 and our support for SSL authentication already > requires a certificate, whether we include SHA or not. So, my personal > non-lawyer experience is that including SHA in core or not would make no > difference whatsoever to our export status. > > The above is all secondhand legal knowledge, so if it really matters to > our decisions on what algorithms we include in Core, we should ask SFLC > for a real opinion. We certainly shouldn't make one based on assumptions. > > I think it's more significant, though, that nobody has been able to > demonstrate that SHA hashing of passwords actually makes Postgres more > secure. > I don't think US export regulations are the only issue. Some other countries (mostly the usual suspects) forbid the use of crypto software. If we build more crypto functions into the core we make it harder to use Postgres legally in those places. cheers andrew
> I don't think US export regulations are the only issue. Some other > countries (mostly the usual suspects) forbid the use of crypto software. > If we build more crypto functions into the core we make it harder to use > Postgres legally in those places. Again, that sounds like we need an actual legal opinion if we're going to make a decision on that basis. So let's make the decision on whether we even *want* SHA in core, and if we do, we can ask our attorneys/community if it's a legal problem. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
On 08/20/2012 01:21 PM, Josh Berkus wrote: > > >> I don't think US export regulations are the only issue. Some other >> countries (mostly the usual suspects) forbid the use of crypto software. >> If we build more crypto functions into the core we make it harder to use >> Postgres legally in those places. I fail to see how that is our problem. We shouldn't make the software less useful because of those places. JD -- Command Prompt, Inc. - http://www.commandprompt.com/ PostgreSQL Support, Training, Professional Services and Development High Availability, Oracle Conversion, Postgres-XC @cmdpromptinc - 509-416-6579
On 08/20/2012 04:26 PM, Joshua D. Drake wrote: > > On 08/20/2012 01:21 PM, Josh Berkus wrote: >> >> >>> I don't think US export regulations are the only issue. Some other >>> countries (mostly the usual suspects) forbid the use of crypto >>> software. >>> If we build more crypto functions into the core we make it harder to >>> use >>> Postgres legally in those places. > > I fail to see how that is our problem. We shouldn't make the software > less useful because of those places. > > But there is absolutely no evidence that we are making it less useful. Postgres is designed to be extensible and we've just enhanced that. pgcrypto makes use of that. If we can leverage that to make Postgres available to more people then why would we not do so? cheers andrew
On 08/20/2012 01:33 PM, Andrew Dunstan wrote: > But there is absolutely no evidence that we are making it less useful. > Postgres is designed to be extensible and we've just enhanced that. > pgcrypto makes use of that. If we can leverage that to make Postgres > available to more people then why would we not do so? O.k., that is a valid argument. Let me counter. Everybody else does it, why don't we? PostgreSQL is extensible, modular and programmable, why are we limiting those features by not including them in core? Contrib, whether we like it or not, is not core. For some things it makes absolute sense to keep them in contrib or pgxn but cryptography is pretty much a basic core feature set at this point. MySQL, MSSQL, Oracle (not sure if integrated or as a pack) and not to mention Java and Python all have them integrated. Sincerely, Joshua D. Drake -- Command Prompt, Inc. - http://www.commandprompt.com/ PostgreSQL Support, Training, Professional Services and Development High Availability, Oracle Conversion, Postgres-XC @cmdpromptinc - 509-416-6579
On 20 August 2012 21:26, Joshua D. Drake <jd@commandprompt.com> wrote: > > On 08/20/2012 01:21 PM, Josh Berkus wrote: >> >> >> >>> I don't think US export regulations are the only issue. Some other >>> countries (mostly the usual suspects) forbid the use of crypto software. >>> If we build more crypto functions into the core we make it harder to use >>> Postgres legally in those places. > > > I fail to see how that is our problem. We shouldn't make the software less > useful because of those places. Agreed. I find the idea of some secret policeman urging the use of MySQL because it doesn't have a built-in SHA-1 cryptographic hash function extremely far-fetched. The BitTorrent protocol uses SHA-1 to validate chunks, and it has been variously estimated that 10% - 50% of all internet traffic is BitTorrent traffic. SHA-1 is also integral to the way that git makes content effectively tamper-proof: http://www.youtube.com/watch?v=4XpnKHJAok8#t=56m -- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services
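As a small illustration of the git remark: git names a file's content (a "blob") by the SHA-1 of a short header plus the bytes, so any change to the content changes its name. The sketch below reproduces that calculation for blobs only; the function name is made up, and trees and commits use other header types not shown here.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    # Header is "blob <length in bytes>" followed by a NUL byte.
    header = b"blob %d\x00" % len(content)
    return hashlib.sha1(header + content).hexdigest()
```

The result should match what `git hash-object` reports for a file with the same bytes.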
> If the hacker has access to the salt, then it will only slow > him/her down somewhat because the search will be have to be > restarted for each password. This. Further, anyone using MD5 or SHA* or any hash function for any serious storage of passwords is nuts, in this day and age. GPUs and rentable cloud computers mean that testing billions of passwords per second is easy for anyone, salted or not. The issue is not Postgres' internal use of MD5 for passwords - that's a red herring, as it is basically no more or less secure than any other hashing algorithm that is not designed to be slow (the way bcrypt, scrypt, and PBKDF2 are). The issue is simply exposing a more useful day to day algorithm by default. Much of the world uses SHA instead of MD5 these days for all sorts of purposes. So I am torn on this. On the one hand, having a few more things in core would be very nice, as it seems silly we have md5() as a builtin but sha256() requires a special module. But once you add sha* in, why not AES? Blowfish? Why not go the whole way and include some extremely useful ones such as bcrypt? At that point, we've deprecated pgcrypto and moved everything to core. While I personally would love to see that someday (then we can boast "built-in crypto" :), I recognize that will be a very tough sell. So I will take the addition of whatever we can, including just a sha() as this thread asked for. > 3) use a purposefully slow hashing function like bcrypt. > > but I disagree: I don't like any scheme that encourages use of low > entropy passwords. Perhaps off-topic, but how do you figure that?
-- Greg Sabino Mullane greg@turnstep.com End Point Corporation http://www.endpoint.com/ PGP Key: 0x14964AC8 201208201849 http://biglumber.com/x/web?pk=2529DF6AB8F79407E94445B4BC9B906714964AC8
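The difference Greg draws between ordinary hash functions and ones designed to be slow can be sketched with PBKDF2, which is available in Python's standard library. The iteration count, the SHA-256 choice, and the helper names are assumptions for illustration, not recommendations from the thread.

```python
import hashlib
import time


def fast_hash(salt: bytes, password: bytes) -> bytes:
    # One pass of a general-purpose hash: cheap for users and attackers alike.
    return hashlib.sha256(salt + password).digest()


def slow_hash(salt: bytes, password: bytes, iterations: int = 100_000) -> bytes:
    # PBKDF2 repeats an HMAC many times, so every guess costs the
    # attacker roughly `iterations` times more work.
    return hashlib.pbkdf2_hmac("sha256", password, salt, iterations)


def seconds_per_call(fn, *args, repeat: int = 3) -> float:
    """Rough wall-clock cost of one call, averaged over a few runs."""
    start = time.perf_counter()
    for _ in range(repeat):
        fn(*args)
    return (time.perf_counter() - start) / repeat
```

Comparing `seconds_per_call(fast_hash, salt, pw)` with `seconds_per_call(slow_hash, salt, pw)` on a given machine shows the per-guess cost an attacker would have to pay; the actual ratio depends on hardware.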
"Joshua D. Drake" <jd@commandprompt.com> writes: > On 08/20/2012 01:33 PM, Andrew Dunstan wrote: >> But there is absolutely no evidence that we are making it less useful. >> Postgres is designed top be extensible and we've just enhanced that. >> pgcrypto makes use of that. If we cen leverage that to make Postgres >> available to more people then why would we not do so? > O.k. that is valid a valid argument. Let me counter. > Everybody else does it, why don't we? PostgreSQL is extensible, modular > and programmable, why are we limiting those features by not including > them in core? Contrib, whether we like it or not, is not core. Nonsense. By that argument, all the sweat we've expended on extensibility was wasted effort, and everything should be in core. pg_crypto's functionality is perfectly fine where it is. The fact that there might be some contexts where people actively don't want the functionality in core is just a small extra reason not to be in a hurry to merge it --- but even without that, I'd vote against this on overall project management grounds. We should be looking to push decouplable bits of functionality *out* of core, not bring them back in. The only reason I can see for pushing more crypto into core is if we needed to stop using MD5 for the core password authentication functionality. While that might come to pass eventually, I am aware of no evidence whatever that SHAn, per se, is an improvement over MD5 for password auth purposes. Moreover, as Josh just mentioned, anybody who thinks it might be insufficiently secure for their purposes has got plenty of alternatives available today (SSL certificates, PAM backed by whatever-you-want, etc). TBH, I think if we do anything at all about this in the near future, it'll be window dressing to shut up the people who heard once that MD5 was insecure and know nothing about it beyond that --- but if Postgres uses MD5 for passwords, it must be insecure. 
So I tend to agree with Andrew that we should wait till the NIST competition dust settles; but what I'll be looking for afterwards is which algorithm has the most street cred with the average slashdotter. Also, as I mentioned upthread, we need to do more than just drop in a new hashing algorithm. MD5 is far from being the weakest link in what we're doing today. regards, tom lane
On 08/20/2012 07:08 PM, Tom Lane wrote: > Moreover, as Josh just mentioned, anybody who > thinks it might be insufficiently secure for their purposes has got > plenty of alternatives available today (SSL certificates, PAM backed > by whatever-you-want, etc). > Yeah, I think we need to emphasize this lots more. Anyone who wants really secure authentication needs to be getting away from password based auth altogether. Another hash function will make very little difference. cheers andrew
On 08/20/2012 05:12 PM, Andrew Dunstan wrote: > > > On 08/20/2012 07:08 PM, Tom Lane wrote: > > >> Moreover, as Josh just mentioned, anybody who >> thinks it might be insufficiently secure for their purposes has got >> plenty of alternatives available today (SSL certificates, PAM backed >> by whatever-you-want, etc). >> > > Yeah, I think we need to emphasize this lots more. Anyone who wants > really secure authentication needs to be getting away from password > based auth altogether. Another hash function will make very little > difference. Actually, I concede here. If we were pushing our other abilities more visibly, I don't know that this argument would ever come up. Sincerely, Joshua D. Drake > > cheers > > andrew > > > > -- Command Prompt, Inc. - http://www.commandprompt.com/ PostgreSQL Support, Training, Professional Services and Development High Availability, Oracle Conversion, Postgres-XC @cmdpromptinc - 509-416-6579
On Mon, Aug 20, 2012 at 5:54 PM, Greg Sabino Mullane <greg@turnstep.com> wrote: >> 3) use a purposefully slow hashing function like bcrypt. >> >> but I disagree: I don't like any scheme that encourages use of low >> entropy passwords. > > Perhaps off-topic, but how to do you figure that? Yeah -- bcrypt's main claim to fame is that it's slow. A *lot* of people argue you're better off with a slow hash and that's reasonable but I just don't like the speed/convenience tradeoff. I suppose I'm impatient. My take on this is that relying on hash speed to protect you if the attacker has the hash, the salt, and knows the algorithm is pretty weak sauce. At best it lowers the entropy requirements somewhat: an 80 bit entropy password is not brute forcible no matter how many server farmed GPUs you have. The mechanics of how the hash is calculated (see Joe C's excellent comments upthread) are much more important considerations than algorithm choice. If you have high security requirements and your users refuse to use high entropy passwords, I think you're better off going 2-factor than foisting slowness on everything that needs to authenticate. merlin
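Merlin's entropy argument comes down to simple arithmetic, sketched below. The guess rates are assumptions picked for illustration (the thread only says "billions of passwords per second"), and the function names are made up.

```python
import math

SECONDS_PER_YEAR = 365.25 * 24 * 3600


def years_to_exhaust(entropy_bits: int, guesses_per_second: float) -> float:
    """Time to try every candidate in a 2**entropy_bits search space."""
    return 2 ** entropy_bits / guesses_per_second / SECONDS_PER_YEAR


def bits_bought_by_slowdown(factor: float) -> float:
    """Making each guess `factor` times slower is worth log2(factor) bits."""
    return math.log2(factor)
```

Under these assumptions, `years_to_exhaust(80, 1e12)` comes out in the tens of thousands of years even at a trillion guesses per second, while `years_to_exhaust(40, 1e9)` is a matter of minutes. A hash that is 100,000 times slower buys `bits_bought_by_slowdown(1e5)`, a little under 17 bits, which is one way to read "at best it lowers the entropy requirements somewhat".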
On Mon, Aug 20, 2012 at 07:08:12PM -0400, Tom Lane wrote: > The only reason I can see for pushing more crypto into core is > if we needed to stop using MD5 for the core password authentication > functionality. While that might come to pass eventually, I am aware of > no evidence whatever that SHAn, per se, is an improvement over MD5 for > password auth purposes. Moreover, as Josh just mentioned, anybody who > thinks it might be insufficiently secure for their purposes has got > plenty of alternatives available today (SSL certificates, PAM backed > by whatever-you-want, etc). > > TBH, I think if we do anything at all about this in the near future, > it'll be window dressing to shut up the people who heard once that MD5 > was insecure and know nothing about it beyond that --- but if Postgres > uses MD5 for passwords, it must be insecure. So I tend to agree with > Andrew that we should wait till the NIST competition dust settles; but > what I'll be looking for afterwards is which algorithm has the most > street cred with the average slashdotter. > > Also, as I mentioned upthread, we need to do more than just drop in > a new hashing algorithm. MD5 is far from being the weakest link > in what we're doing today. If anyone believes Tom is inaccurate in "MD5 is far from being the weakest link", see this 2004 email from Greg Stark explaining the odds of salt reuse and password packet replay: http://archives.postgresql.org/pgsql-hackers/2004-08/msg01540.php -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. +