Thread: New types for transparent encryption
Our manual says we can use pgcrypto functions or encrypted filesystems for data encryption. http://www.postgresql.org/docs/8.4/static/encryption-options.html However, they are not always the best approaches in some cases. For pgcrypto functions, user's SQL must contain keyword strings and they need to consider which column is encrypted. Users complaint that that they want to treat encrypted values as if not-encrypted. For encrypted filesystems, all of database will be encrypted and thare are considerable overheads. In addition, encrypted filesystems are not well-maintained on some platforms. I'd like to submit a proposal to add types that encryped or decrypted transparently to contrib/pgcrypto. It is a simple wrapper type of bytea. The pseudo code could be: CREATE TYPE encrypted_text ( INPUT = pgp_sym_encrypt_text(textin($1), passward(), options()), OUTPUT = textout(pgp_sym_decrypt_text($1,passward(), options())), LIKE bytea ); passward() and options() are SQL functions and we can re-define them if needed. The default implementations are to refer custom GUC variables (pgcrypto.password and pgcrypto.options) so that encryption are done only in database server and applications don't have to know the details. I hope this will be an enhancement of contrib/pgcrypto. Comments welcome. Regards, --- ITAGAKI Takahiro NTT Open Source Software Center
Itagaki Takahiro wrote: > Our manual says we can use pgcrypto functions or encrypted filesystems > for data encryption. > http://www.postgresql.org/docs/8.4/static/encryption-options.html > > However, they are not always the best approaches in some cases. > > For pgcrypto functions, user's SQL must contain keyword strings > and they need to consider which column is encrypted. Users complaint > that that they want to treat encrypted values as if not-encrypted. > > For encrypted filesystems, all of database will be encrypted > and thare are considerable overheads. In addition, encrypted > filesystems are not well-maintained on some platforms. > > > I'd like to submit a proposal to add types that encryped or > decrypted transparently to contrib/pgcrypto. It is a simple > wrapper type of bytea. The pseudo code could be: > > CREATE TYPE encrypted_text ( > INPUT = pgp_sym_encrypt_text(textin($1), passward(), options()), > OUTPUT = textout(pgp_sym_decrypt_text($1, passward(), options())), > LIKE bytea > ); > > passward() and options() are SQL functions and we can re-define them > if needed. The default implementations are to refer custom GUC variables > (pgcrypto.password and pgcrypto.options) so that encryption are done > only in database server and applications don't have to know the details. What kind of attacks would this protect against? Seems a bit pointless to me if the password is being sent to the server anyway. If the attacker has superuser access to the server, he can harvest the passwords as the clients send them in. If he doesn't, the usual access controls with GRANT/REVOKE would be enough. I would see some value in this if the encryption was done in the client, and the server never saw any decrypted values. That would reduce the damage of a compromised server. A driver extension to do that transparently would be neat. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
On Tue, Jul 7, 2009 at 10:09 AM, Heikki Linnakangas<heikki.linnakangas@enterprisedb.com> wrote: > > What kind of attacks would this protect against? Seems a bit pointless > to me if the password is being sent to the server anyway. If the > attacker has superuser access to the server, he can harvest the > passwords as the clients send them in. If he doesn't, the usual access > controls with GRANT/REVOKE would be enough. It would still protect against offline attacks such as against backup files. -- greg http://mit.edu/~gsstark/resume.pdf
Greg Stark wrote: > On Tue, Jul 7, 2009 at 10:09 AM, Heikki > Linnakangas<heikki.linnakangas@enterprisedb.com> wrote: >> What kind of attacks would this protect against? Seems a bit pointless >> to me if the password is being sent to the server anyway. If the >> attacker has superuser access to the server, he can harvest the >> passwords as the clients send them in. If he doesn't, the usual access >> controls with GRANT/REVOKE would be enough. > > It would still protect against offline attacks such as against backup files. True, but filesystem-level encryption handles that scenario with less pain. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Tue, Jul 07, 2009 at 05:35:28PM +0900, Itagaki Takahiro wrote: > Our manual says we can use pgcrypto functions or encrypted filesystems > for data encryption. > http://www.postgresql.org/docs/8.4/static/encryption-options.html As other posters have put it, I'd be very sceptical of server-side decryption. If the server "has" all the necessary bits to decrypt the data, all bets are off. [encryption might be OK, with an asymmetrical scheme in the vein of public key cryptography]. A client-side decryption (and maybe encryption as well) seems way more attractive. For that, libpqtypes[1],[2] might come in very handy. [1] <http://pgfoundry.org/projects/libpqtypes/> [2] <http://libpqtypes.esilo.com/> Regards - -- tomás -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.6 (GNU/Linux) iD8DBQFKUyC2Bcgs9XrR2kYRAiJoAJ9426t1bMtZ90690cwU9X+F4GJZkgCfZsJ2 YIon8ulaHI64l5GKbDwV4hM= =I9fS -----END PGP SIGNATURE-----
In response to Itagaki Takahiro <itagaki.takahiro@oss.ntt.co.jp>: > Our manual says we can use pgcrypto functions or encrypted filesystems > for data encryption. > http://www.postgresql.org/docs/8.4/static/encryption-options.html > > However, they are not always the best approaches in some cases. > > For pgcrypto functions, user's SQL must contain keyword strings > and they need to consider which column is encrypted. Users complaint > that that they want to treat encrypted values as if not-encrypted. > > For encrypted filesystems, all of database will be encrypted > and thare are considerable overheads. In addition, encrypted > filesystems are not well-maintained on some platforms. > > > I'd like to submit a proposal to add types that encryped or > decrypted transparently to contrib/pgcrypto. It is a simple > wrapper type of bytea. The pseudo code could be: > > CREATE TYPE encrypted_text ( > INPUT = pgp_sym_encrypt_text(textin($1), passward(), options()), > OUTPUT = textout(pgp_sym_decrypt_text($1, passward(), options())), > LIKE bytea > ); > > passward() and options() are SQL functions and we can re-define them > if needed. The default implementations are to refer custom GUC variables > (pgcrypto.password and pgcrypto.options) so that encryption are done > only in database server and applications don't have to know the details. > > > I hope this will be an enhancement of contrib/pgcrypto. > Comments welcome. As others have noted, I doubt that the overall protection ability would be very great with such a feature. What I'd _really_ like to see is native PKI encryption implemented, so roles could have keys associated with them, and roles could encrypt data for other roles and sign data, thus allowing some really powerful data protection schemes. Just coming up with a methodology for this is non-trivial, though. -- Bill Moran http://www.potentialtech.com http://people.collaborativefusion.com/~wmoran/
Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes: > Itagaki Takahiro wrote: >> CREATE TYPE encrypted_text ( >> INPUT = pgp_sym_encrypt_text(textin($1), passward(), options()), >> OUTPUT = textout(pgp_sym_decrypt_text($1, passward(), options())), >> LIKE bytea >> ); >> >> passward() and options() are SQL functions and we can re-define them >> if needed. The default implementations are to refer custom GUC variables >> (pgcrypto.password and pgcrypto.options) so that encryption are done >> only in database server and applications don't have to know the details. > What kind of attacks would this protect against? I agree that this seems more like offering security theater than real security. I'm also pretty concerned about the implications of a datatype whose I/O operations fundamentally don't work without knowledge of values that are supposed to be kept secret. What is your expectation for how pg_dump will handle such columns? regards, tom lane
Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> wrote: > Greg Stark wrote: > > It would still protect against offline attacks such as against backup files. > > True, but filesystem-level encryption handles that scenario with less pain. Yes, I intended offline attacks, and also agree that ilesystem-level encryption will be a solution. However, as I wrote in the first mail, standard users want to avoid encrypted filesystems that are not maintained or supported officially. I just suggested postgres to have an alternative solution of filesystem-level encryption or infrastructure providing easy way to define new types that have only difference in typin/typout attributes, such as CREATE TYPE INHERITS or CREATE DOMAIN WITH INPUT/OUTPUT. It is too difficult for standard users to define operators and index support methods. Regards, --- ITAGAKI Takahiro NTT Open Source Software Center
On Wed, Jul 8, 2009 at 1:49 AM, Itagaki Takahiro<itagaki.takahiro@oss.ntt.co.jp> wrote: > Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> wrote: > >> Greg Stark wrote: >> > It would still protect against offline attacks such as against backup files. >> >> True, but filesystem-level encryption handles that scenario with less pain. > > Yes, I intended offline attacks, and also agree that ilesystem-level > encryption will be a solution. However, as I wrote in the first mail, > standard users want to avoid encrypted filesystems that are not maintained > or supported officially. I don't see how filesystem encryption helps actually. Your backups are probably filesystem level backups so they will have the decrypted files. Also your archived logs will have the decrypted data, etc. Encrypting the data before it's ever written to disk means you don't have to worry about all the different places your data ends up. Actually pg_dump seems like it would be solvable if you had an escape syntax to indicate that a literal contains the encrypted value in hex. Perhaps something like the bytea syntax we're looking at adopting now. So '\xdeadbeaf'::encrypted_type would be a perfectly valid literal which the user could load even while not knowing what it represents, and that would be what you would get if you access the field, for example with pg_dump, with the guc unset -- just beware you don't run pg_dump set with the guc set to the *wrong* key :) However I have a different concern which hasn't been raised yet. Encrypting lots of small chunks of data with the same key is a very dangerous thing to do and it's very tricky to get right. Merely applying one of the standard stream or block ciphers directly to those short strings will *not* be secure. I think there are techniques for dealing with this but I'm not sure what tradeoffs they have. -- greg http://mit.edu/~gsstark/resume.pdf
> Encrypting lots of small chunks of data with the same key is a very > dangerous thing to do and it's very tricky to get right. Using an initialization vector (IV) is the way to go, recommend using CBC or CFB mode. Although, an IV is never supposed to be used more than once with the same key; that can leak hints about the plaintext. Where is the randomly generated IV stored for use during decryption? -- Andrew Chernow eSilo, LLC every bit counts http://www.esilo.com/
Andrew Chernow wrote: > >> Encrypting lots of small chunks of data with the same key is a very >> dangerous thing to do and it's very tricky to get right. > > Using an initialization vector (IV) is the way to go, recommend using > CBC or CFB mode. Although, an IV is never supposed to be used more > than once with the same key; that can leak hints about the plaintext. > Where is the randomly generated IV stored for use during decryption? Well, you can store it along with the encrypted data. The IV doesn't need to be secret, just random. I do that for one of my clients. cheers andrew
Andrew Dunstan wrote: > > > Andrew Chernow wrote: >> >>> Encrypting lots of small chunks of data with the same key is a very >>> dangerous thing to do and it's very tricky to get right. >> >> Using an initialization vector (IV) is the way to go, recommend using >> CBC or CFB mode. Although, an IV is never supposed to be used more >> than once with the same key; that can leak hints about the plaintext. >> Where is the randomly generated IV stored for use during decryption? > > Well, you can store it along with the encrypted data. The IV doesn't > need to be secret, just random. I do that for one of my clients. > That's correct. Duh!?! Probably the first N bytes of the cipher text. Would the IV be regenerated every time the plaintext is updated, to avoid using it twice? For instace: update t set text = 'abc' where id = 1 . ISTM that the IV for OLD.text should be thrown away. Where would the key come from? Where would it be stored? What cipher is used? -- Andrew Chernow eSilo, LLC every bit counts http://www.esilo.com/
tomas@tuxteam.de wrote: > As other posters have put it, I'd be very sceptical of server-side > decryption. If the server "has" all the necessary bits to decrypt the > data, all bets are off. Server can access both encrypted data and its password, but we can put them in different disk drives. We cannot decrypt the data unless we have all copies of the drives. If postgres server is started manually, there might be another design that DBA sets password as a postmaster's startup parameter. If do so, the password is only in memory but not in disk drives. Regards, --- ITAGAKI Takahiro NTT Open Source Software Center
On Wed, Jul 8, 2009 at 6:43 AM, Itagaki Takahiro<itagaki.takahiro@oss.ntt.co.jp> wrote: > > tomas@tuxteam.de wrote: > >> As other posters have put it, I'd be very sceptical of server-side >> decryption. If the server "has" all the necessary bits to decrypt the >> data, all bets are off. > > Server can access both encrypted data and its password, but we can put > them in different disk drives. We cannot decrypt the data unless we have > all copies of the drives. I thought the proposal was to have a GUC variable with the password, so it would be purely in memory. I had assumed the application would have the password and call SET early in the connection. Perhaps only certain components of the database would actually have access to the password. That makes a lot more sense in these ajaxy soapy days of web 2.0 where perhaps only the payment processing soap service would actually have the password to encrypt credit card details. (Though in that case I would expect the soap service to do the encryption itself....) -- greg http://mit.edu/~gsstark/resume.pdf
gsstark@mit.edu (Greg Stark) writes: > However I have a different concern which hasn't been raised yet. > Encrypting lots of small chunks of data with the same key is a very > dangerous thing to do and it's very tricky to get right. Yeah, that's exactly the sort of thing that would be Most Useful for someone trying to do differential cryptanalysis. http://en.wikipedia.org/wiki/Differential_cryptanalysis It would provide an *exact* vector for differential attack if the attacker has the ability to add in a series of bits of data of their choosing before capturing the thus-encrypted dump. If you add some more-or-less-randomish salt, ala SSHA, that could be of some tiny help, maybe, arguably, but I doubt that's usable :-(. http://www.openldap.org/faq/data/cache/347.html -- let name="cbbrowne" and tld="acm.org" in String.concat "@" [name;tld];; http://linuxfinances.info/info/sap.html Why do scientists call it research when looking for something new?
ac@esilo.com (Andrew Chernow) writes: > Would the IV be regenerated every time the plaintext is updated, to > avoid using it twice? For instace: update t set text = 'abc' where id > = 1 . ISTM that the IV for OLD.text should be thrown away. > > Where would the key come from? Where would it be stored? What cipher is used? LDAP authentication systems tend to use SSHA these days... http://www.openldap.org/faq/data/cache/347.html With SSHA, the key used for hashing passwords is picked randomly; often by grabbing a few bytes from /dev/random. It's not important that it be cryptographically secure, as it is presented directly as part of the stored password. In python, SSH hashes thus: You need two inputs: 1. "password", which is the value that is to be hidden 2. "salt", a seed value. The point isn't for "salt" to need to be super-secure, just for it to not be frequently repeated. "Fairly random" seems to be generally good enough. import sha from base64 import b64encode ctx = sha.new( password ) ctx.update( salt ) hash = "{SSHA}" + b64encode(ctx.digest() + salt ) Sort-of-aside: FYI, I tried implementing SSHA in pl/pgsql, with mixed results. It interoperated fine with other SSHA implementations as long as the salt values were plain text. The SSHA implementation in OpenLDAP (slappasswd) uses 4 byte binary values (I think it grabs them from /dev/random or /dev/urandom); unfortunately that wouldn't "play OK" with my pl/pgsql implementation. I think having that work would be pretty keen, could share code if anyone is interested... -- output = reverse("ofni.secnanifxunil" "@" "enworbbc") http://linuxdatabases.info/info/unix.html Rules of the Evil Overlord #145. "My dungeon cell decor will not feature exposed pipes. While they add to the gloomy atmosphere, they are good conductors of vibrations and a lot of prisoners know Morse code." <http://www.eviloverlord.com/>
On Tue, Jul 07, 2009 at 05:35:28PM +0900, Itagaki Takahiro wrote: > Our manual says we can use pgcrypto functions or encrypted filesystems > for data encryption. > http://www.postgresql.org/docs/8.4/static/encryption-options.html > > However, they are not always the best approaches in some cases. > > For pgcrypto functions, user's SQL must contain keyword strings > and they need to consider which column is encrypted. Users complaint > that that they want to treat encrypted values as if not-encrypted. As others have said, handling encryption client side would seem to offer many more benefits (transparently within libpq offering easy adoption). > passward() and options() are SQL functions and we can re-define them > if needed. The default implementations are to refer custom GUC variables > (pgcrypto.password and pgcrypto.options) so that encryption are done > only in database server and applications don't have to know the details. Should the password be this widely shared? it would seem to make more sense if it was a write-only variable and never exposed outside the crypto module. You wouldn't even need to be a super-user to collect all the passwords otherwise, just create a function that has the name of something common and have it stash the password aware somewhere. -- Sam http://samason.me.uk/
Sam Mason <sam@samason.me.uk> wrote: > As others have said, handling encryption client side would seem to offer > many more benefits (transparently within libpq offering easy adoption). Libpq is not the only driver. Do we need to develop transparent decryption for each drivers? (libpq, JDBC, npgsql, py-postgresql, ...) Also, if you disallow server-side decode, you cannot create indexes on encrypted values because the same values are not always encrypted to the same codes. (Indexes will sort keys based on order of decoded values.) I think there is no difference between client-side decryption and clinet-supplied password as far as clinet-server communication is encrypted (i.e, SSL login). > Should the password be this widely shared? it would seem to make more > sense if it was a write-only variable and never exposed outside the > crypto module. We can use an user-defined GUC variables as a write-only variable. When we supply some show_hook function to GUC variable, SET still works but SHOW only shows '****' and hides real passwords. Regards, --- ITAGAKI Takahiro NTT Open Source Software Center
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Mon, Jul 13, 2009 at 01:22:30PM +0900, Itagaki Takahiro wrote: > > Sam Mason <sam@samason.me.uk> wrote: > > > As others have said, handling encryption client side would seem to offer > > many more benefits (transparently within libpq offering easy adoption). > > Libpq is not the only driver. Do we need to develop transparent decryption > for each drivers? (libpq, JDBC, npgsql, py-postgresql, ...) Just define a protocol. Of course there is more work in that, so yes, this is one point going against client-side. > Also, if you disallow server-side decode, you cannot create indexes on > encrypted values because the same values are not always encrypted to the > same codes. (Indexes will sort keys based on order of decoded values.) Definitely another point against client-side. *If* there is some random element in encryption (salt, IV, whatever), you can't index on an encrypted field. If there isn't, the encryption will be possibly weak (being amenable at least to a rainbow-table attack). > I think there is no difference between client-side decryption and > clinet-supplied password as far as clinet-server communication is > encrypted (i.e, SSL login). There definitely is a difference. If someone hi-jacks the running server (trojan, privilege escalation), s/he still doesn'nt get at the data if they only can be decrypted client-side. OTOH, with server-side decryption, all bets are off in this case, since the keys are lying around there (maybe somewhat obfuscated, but still accessible). But this has already been hashed out in another thread AFAIR. Regards - -- tomás -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.6 (GNU/Linux) iD8DBQFKWr5FBcgs9XrR2kYRAkbdAJ9JN4mXGE1uD5EGCzWgZh4dsfCPpwCfaTew 2uD3F59+Gm1wR/jnYChvF+M= =WWhL -----END PGP SIGNATURE-----
On Mon, Jul 13, 2009 at 01:22:30PM +0900, Itagaki Takahiro wrote: > Sam Mason <sam@samason.me.uk> wrote: > > As others have said, handling encryption client side would seem to offer > > many more benefits (transparently within libpq offering easy adoption). > > Libpq is not the only driver. Do we need to develop transparent decryption > for each drivers? (libpq, JDBC, npgsql, py-postgresql, ...) Yes, by definition if it's client side then it can do whatever is most appropriate for that specific client. If the libraries are in some specific language this may or may not make this easier. For example, adding transparent encryption appears to be somewhat trivial to something like haskelldb[1] because it has enough information exposed to it already to do "the right thing". Doing the right thing in languages that don't have this level of detail already exposed, i.e. libpq and most other libraries, is going to be different/harder. Note that I wasn't thinking of haskelldb when I proposed the idea, I just remembered it when trying to think of a good example. > Also, if you disallow server-side decode, you cannot create indexes on > encrypted values because the same values are not always encrypted to the > same codes. (Indexes will sort keys based on order of decoded values.) Yes, that's the point. Client side encryption is about protecting the plaintext from an attacker who has full control of the server. This may or may not be the problem you're trying to solve, but if the attacker doesn't have control over the server then encryption doesn't seem necessary--just use normal permissions and/or views ensure your security properties. > I think there is no difference between client-side decryption and > clinet-supplied password as far as clinet-server communication is > encrypted (i.e, SSL login). As tomas said, there's a massive difference here. In one, the server is part of your trusted computing base, in another it doesn't have to be. > > Should the password be this widely shared? it would seem to make more > > sense if it was a write-only variable and never exposed outside the > > crypto module. > > We can use an user-defined GUC variables as a write-only variable. > When we supply some show_hook function to GUC variable, > SET still works but SHOW only shows '****' and hides real passwords. OK, but I still don't understand what threat model you're trying to protect against. Which bits of the system are trusted and which bits are assumed malicious? "Trusted" being a technical term from the security world meaning if it goes wrong the security of the system will fail (for example you trust your doctor to respect your privacy and to not kill you), beware that PG uses the term differently with its embedded languages. It appears the main reason you're proposed this is to make it easier for new users to use crypto with their PG database---this seems to be the wrong way around. In my experience, security is hard because determining what's trusted and what's not is difficult. Once you've determined this it's reasonably easy to work the details out and get it implemented. -- Sam http://samason.me.uk/ [1] http://haskelldb.sourceforge.net/haskelldb.pdf