Thread: New types for transparent encryption

New types for transparent encryption

From
Itagaki Takahiro
Date:
Our manual says we can use pgcrypto functions or encrypted filesystems
for data encryption.
http://www.postgresql.org/docs/8.4/static/encryption-options.html

However, they are not always the best approaches in some cases.

For pgcrypto functions, user's SQL must contain keyword strings
and they need to consider which column is encrypted. Users complaint
that that they want to treat encrypted values as if not-encrypted.

For encrypted filesystems, all of database will be encrypted
and thare are considerable overheads. In addition, encrypted
filesystems are not well-maintained on some platforms.


I'd like to submit a proposal to add types that encryped or
decrypted transparently to contrib/pgcrypto. It is a simple
wrapper type of bytea. The pseudo code could be:
 CREATE TYPE encrypted_text (   INPUT = pgp_sym_encrypt_text(textin($1), passward(), options()),   OUTPUT =
textout(pgp_sym_decrypt_text($1,passward(), options())),   LIKE bytea );
 

passward() and options() are SQL functions and we can re-define them
if needed. The default implementations are to refer custom GUC variables
(pgcrypto.password and pgcrypto.options) so that encryption are done
only in database server and applications don't have to know the details.


I hope this will be an enhancement of contrib/pgcrypto.
Comments welcome.

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center



Re: New types for transparent encryption

From
Heikki Linnakangas
Date:
Itagaki Takahiro wrote:
> Our manual says we can use pgcrypto functions or encrypted filesystems
> for data encryption.
> http://www.postgresql.org/docs/8.4/static/encryption-options.html
> 
> However, they are not always the best approaches in some cases.
> 
> For pgcrypto functions, user's SQL must contain keyword strings
> and they need to consider which column is encrypted. Users complaint
> that that they want to treat encrypted values as if not-encrypted.
> 
> For encrypted filesystems, all of database will be encrypted
> and thare are considerable overheads. In addition, encrypted
> filesystems are not well-maintained on some platforms.
> 
> 
> I'd like to submit a proposal to add types that encryped or
> decrypted transparently to contrib/pgcrypto. It is a simple
> wrapper type of bytea. The pseudo code could be:
> 
>   CREATE TYPE encrypted_text (
>     INPUT = pgp_sym_encrypt_text(textin($1), passward(), options()),
>     OUTPUT = textout(pgp_sym_decrypt_text($1, passward(), options())),
>     LIKE bytea
>   );
> 
> passward() and options() are SQL functions and we can re-define them
> if needed. The default implementations are to refer custom GUC variables
> (pgcrypto.password and pgcrypto.options) so that encryption are done
> only in database server and applications don't have to know the details.

What kind of attacks would this protect against? Seems a bit pointless
to me if the password is being sent to the server anyway. If the
attacker has superuser access to the server, he can harvest the
passwords as the clients send them in. If he doesn't, the usual access
controls with GRANT/REVOKE would be enough.

I would see some value in this if the encryption was done in the client,
and the server never saw any decrypted values. That would reduce the
damage of a compromised server. A driver extension to do that
transparently would be neat.

--  Heikki Linnakangas EnterpriseDB   http://www.enterprisedb.com


Re: New types for transparent encryption

From
Greg Stark
Date:
On Tue, Jul 7, 2009 at 10:09 AM, Heikki
Linnakangas<heikki.linnakangas@enterprisedb.com> wrote:
>
> What kind of attacks would this protect against? Seems a bit pointless
> to me if the password is being sent to the server anyway. If the
> attacker has superuser access to the server, he can harvest the
> passwords as the clients send them in. If he doesn't, the usual access
> controls with GRANT/REVOKE would be enough.

It would still protect against offline attacks such as against backup files.

-- 
greg
http://mit.edu/~gsstark/resume.pdf


Re: New types for transparent encryption

From
Heikki Linnakangas
Date:
Greg Stark wrote:
> On Tue, Jul 7, 2009 at 10:09 AM, Heikki
> Linnakangas<heikki.linnakangas@enterprisedb.com> wrote:
>> What kind of attacks would this protect against? Seems a bit pointless
>> to me if the password is being sent to the server anyway. If the
>> attacker has superuser access to the server, he can harvest the
>> passwords as the clients send them in. If he doesn't, the usual access
>> controls with GRANT/REVOKE would be enough.
> 
> It would still protect against offline attacks such as against backup files.

True, but filesystem-level encryption handles that scenario with less pain.

--  Heikki Linnakangas EnterpriseDB   http://www.enterprisedb.com


Re: New types for transparent encryption

From
tomas@tuxteam.de
Date:
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Tue, Jul 07, 2009 at 05:35:28PM +0900, Itagaki Takahiro wrote:
> Our manual says we can use pgcrypto functions or encrypted filesystems
> for data encryption.
> http://www.postgresql.org/docs/8.4/static/encryption-options.html

As other posters have put it, I'd be very sceptical of server-side
decryption. If the server "has" all the necessary bits to decrypt the
data, all bets are off.

[encryption might be OK, with an asymmetrical scheme in the
vein of public key cryptography].

A client-side decryption (and maybe encryption as well) seems way more
attractive. For that, libpqtypes[1],[2] might come in very handy.

[1] <http://pgfoundry.org/projects/libpqtypes/>
[2] <http://libpqtypes.esilo.com/>

Regards
- -- tomás
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)

iD8DBQFKUyC2Bcgs9XrR2kYRAiJoAJ9426t1bMtZ90690cwU9X+F4GJZkgCfZsJ2
YIon8ulaHI64l5GKbDwV4hM=
=I9fS
-----END PGP SIGNATURE-----


Re: New types for transparent encryption

From
Bill Moran
Date:
In response to Itagaki Takahiro <itagaki.takahiro@oss.ntt.co.jp>:

> Our manual says we can use pgcrypto functions or encrypted filesystems
> for data encryption.
> http://www.postgresql.org/docs/8.4/static/encryption-options.html
> 
> However, they are not always the best approaches in some cases.
> 
> For pgcrypto functions, user's SQL must contain keyword strings
> and they need to consider which column is encrypted. Users complaint
> that that they want to treat encrypted values as if not-encrypted.
> 
> For encrypted filesystems, all of database will be encrypted
> and thare are considerable overheads. In addition, encrypted
> filesystems are not well-maintained on some platforms.
> 
> 
> I'd like to submit a proposal to add types that encryped or
> decrypted transparently to contrib/pgcrypto. It is a simple
> wrapper type of bytea. The pseudo code could be:
> 
>   CREATE TYPE encrypted_text (
>     INPUT = pgp_sym_encrypt_text(textin($1), passward(), options()),
>     OUTPUT = textout(pgp_sym_decrypt_text($1, passward(), options())),
>     LIKE bytea
>   );
> 
> passward() and options() are SQL functions and we can re-define them
> if needed. The default implementations are to refer custom GUC variables
> (pgcrypto.password and pgcrypto.options) so that encryption are done
> only in database server and applications don't have to know the details.
> 
> 
> I hope this will be an enhancement of contrib/pgcrypto.
> Comments welcome.

As others have noted, I doubt that the overall protection ability would
be very great with such a feature.

What I'd _really_ like to see is native PKI encryption implemented, so
roles could have keys associated with them, and roles could encrypt data
for other roles and sign data, thus allowing some really powerful data
protection schemes.

Just coming up with a methodology for this is non-trivial, though.

-- 
Bill Moran
http://www.potentialtech.com
http://people.collaborativefusion.com/~wmoran/


Re: New types for transparent encryption

From
Tom Lane
Date:
Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:
> Itagaki Takahiro wrote:
>> CREATE TYPE encrypted_text (
>> INPUT = pgp_sym_encrypt_text(textin($1), passward(), options()),
>> OUTPUT = textout(pgp_sym_decrypt_text($1, passward(), options())),
>> LIKE bytea
>> );
>> 
>> passward() and options() are SQL functions and we can re-define them
>> if needed. The default implementations are to refer custom GUC variables
>> (pgcrypto.password and pgcrypto.options) so that encryption are done
>> only in database server and applications don't have to know the details.

> What kind of attacks would this protect against?

I agree that this seems more like offering security theater than real
security.  I'm also pretty concerned about the implications of a
datatype whose I/O operations fundamentally don't work without knowledge
of values that are supposed to be kept secret.  What is your expectation
for how pg_dump will handle such columns?
        regards, tom lane


Re: New types for transparent encryption

From
Itagaki Takahiro
Date:
Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> wrote:

> Greg Stark wrote:
> > It would still protect against offline attacks such as against backup files.
> 
> True, but filesystem-level encryption handles that scenario with less pain.

Yes, I intended offline attacks, and also agree that ilesystem-level
encryption will be a solution. However, as I wrote in the first mail,
standard users want to avoid encrypted filesystems that are not maintained
or supported officially.

I just suggested postgres to have an alternative solution of filesystem-level
encryption or infrastructure providing easy way to define new types that
have only difference in typin/typout attributes, such as CREATE TYPE INHERITS
or CREATE DOMAIN WITH INPUT/OUTPUT. It is too difficult for standard users to
define operators and index support methods.

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center




Re: New types for transparent encryption

From
Greg Stark
Date:
On Wed, Jul 8, 2009 at 1:49 AM, Itagaki
Takahiro<itagaki.takahiro@oss.ntt.co.jp> wrote:
> Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> wrote:
>
>> Greg Stark wrote:
>> > It would still protect against offline attacks such as against backup files.
>>
>> True, but filesystem-level encryption handles that scenario with less pain.
>
> Yes, I intended offline attacks, and also agree that ilesystem-level
> encryption will be a solution. However, as I wrote in the first mail,
> standard users want to avoid encrypted filesystems that are not maintained
> or supported officially.

I don't see how filesystem encryption helps actually. Your backups are
probably filesystem level backups so they will have the decrypted
files. Also your archived logs will have the decrypted data, etc.
Encrypting the data before it's ever written to disk means you don't
have to worry about all the different places your data ends up.

Actually pg_dump seems like it would be solvable if you had an escape
syntax to indicate that a literal contains the encrypted value in hex.
Perhaps something like the bytea syntax we're looking at adopting now.
So '\xdeadbeaf'::encrypted_type would be a perfectly valid literal
which the user could load even while not knowing what it represents,
and that would be what you would get if you access the field, for
example with pg_dump, with the guc unset -- just beware you don't run
pg_dump set with the guc set to the *wrong* key :)

However I have a different concern which hasn't been raised yet.
Encrypting lots of small chunks of data with the same key is a very
dangerous thing to do and it's very tricky to get right. Merely
applying one of the standard stream or block ciphers directly to those
short strings will *not* be secure. I think there are techniques for
dealing with this but I'm not sure what tradeoffs they have.


-- 
greg
http://mit.edu/~gsstark/resume.pdf


Re: New types for transparent encryption

From
Andrew Chernow
Date:
> Encrypting lots of small chunks of data with the same key is a very
> dangerous thing to do and it's very tricky to get right.

Using an initialization vector (IV) is the way to go, recommend using CBC or CFB 
mode.  Although, an IV is never supposed to be used more than once with the same 
key; that can leak hints about the plaintext.  Where is the randomly generated 
IV stored for use during decryption?

-- 
Andrew Chernow
eSilo, LLC
every bit counts
http://www.esilo.com/


Re: New types for transparent encryption

From
Andrew Dunstan
Date:

Andrew Chernow wrote:
>
>> Encrypting lots of small chunks of data with the same key is a very
>> dangerous thing to do and it's very tricky to get right.
>
> Using an initialization vector (IV) is the way to go, recommend using 
> CBC or CFB mode.  Although, an IV is never supposed to be used more 
> than once with the same key; that can leak hints about the plaintext.  
> Where is the randomly generated IV stored for use during decryption?

Well, you can store it along with the encrypted data. The IV doesn't 
need to be secret, just random. I do that for one of my clients.

cheers

andrew


Re: New types for transparent encryption

From
Andrew Chernow
Date:
Andrew Dunstan wrote:
> 
> 
> Andrew Chernow wrote:
>>
>>> Encrypting lots of small chunks of data with the same key is a very
>>> dangerous thing to do and it's very tricky to get right.
>>
>> Using an initialization vector (IV) is the way to go, recommend using 
>> CBC or CFB mode.  Although, an IV is never supposed to be used more 
>> than once with the same key; that can leak hints about the plaintext.  
>> Where is the randomly generated IV stored for use during decryption?
> 
> Well, you can store it along with the encrypted data. The IV doesn't 
> need to be secret, just random. I do that for one of my clients.
> 

That's correct.  Duh!?!  Probably the first N bytes of the cipher text.

Would the IV be regenerated every time the plaintext is updated, to avoid using 
it twice?  For instace: update t set text = 'abc' where id = 1 .  ISTM that the 
IV for OLD.text should be thrown away.

Where would the key come from?  Where would it be stored?  What cipher is used?

-- 
Andrew Chernow
eSilo, LLC
every bit counts
http://www.esilo.com/


Re: New types for transparent encryption

From
Itagaki Takahiro
Date:
tomas@tuxteam.de wrote:

> As other posters have put it, I'd be very sceptical of server-side
> decryption. If the server "has" all the necessary bits to decrypt the
> data, all bets are off.

Server can access both encrypted data and its password, but we can put
them in different disk drives. We cannot decrypt the data unless we have
all copies of the drives.

If postgres server is started manually, there might be another design
that DBA sets password as a postmaster's startup parameter. If do so,
the password is only in memory but not in disk drives.

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center




Re: New types for transparent encryption

From
Greg Stark
Date:
On Wed, Jul 8, 2009 at 6:43 AM, Itagaki
Takahiro<itagaki.takahiro@oss.ntt.co.jp> wrote:
>
> tomas@tuxteam.de wrote:
>
>> As other posters have put it, I'd be very sceptical of server-side
>> decryption. If the server "has" all the necessary bits to decrypt the
>> data, all bets are off.
>
> Server can access both encrypted data and its password, but we can put
> them in different disk drives. We cannot decrypt the data unless we have
> all copies of the drives.

I thought the proposal was to have a GUC variable with the password,
so it would be purely in memory. I had assumed the application would
have the password and call SET early in the connection. Perhaps only
certain components of the database would actually have access to the
password. That makes a lot more sense in these ajaxy soapy days of web
2.0 where perhaps only the payment processing soap service would
actually have the password to encrypt credit card details. (Though in
that case I would expect the soap service to do the encryption
itself....)

-- 
greg
http://mit.edu/~gsstark/resume.pdf


Re: New types for transparent encryption

From
Chris Browne
Date:
gsstark@mit.edu (Greg Stark) writes:
> However I have a different concern which hasn't been raised yet.
> Encrypting lots of small chunks of data with the same key is a very
> dangerous thing to do and it's very tricky to get right. 

Yeah, that's exactly the sort of thing that would be Most Useful for
someone trying to do differential cryptanalysis.
  http://en.wikipedia.org/wiki/Differential_cryptanalysis

It would provide an *exact* vector for differential attack if the
attacker has the ability to add in a series of bits of data of their
choosing before capturing the thus-encrypted dump.

If you add some more-or-less-randomish salt, ala SSHA, that could be
of some tiny help, maybe, arguably, but I doubt that's usable :-(.  http://www.openldap.org/faq/data/cache/347.html
-- 
let name="cbbrowne" and tld="acm.org" in String.concat "@" [name;tld];;
http://linuxfinances.info/info/sap.html
Why do scientists call it research when looking for something new?


Re: New types for transparent encryption

From
Chris Browne
Date:
ac@esilo.com (Andrew Chernow) writes:
> Would the IV be regenerated every time the plaintext is updated, to
> avoid using it twice?  For instace: update t set text = 'abc' where id
> = 1 .  ISTM that the IV for OLD.text should be thrown away.
>
> Where would the key come from?  Where would it be stored?  What cipher is used?

LDAP authentication systems tend to use SSHA these days...
http://www.openldap.org/faq/data/cache/347.html

With SSHA, the key used for hashing passwords is picked randomly;
often by grabbing a few bytes from /dev/random.  It's not important
that it be cryptographically secure, as it is presented directly as
part of the stored password.

In python, SSH hashes thus:

You need two inputs:

1. "password", which is the value that is to be hidden
2. "salt", a seed value.

The point isn't for "salt" to need to be super-secure, just for it to
not be frequently repeated.  "Fairly random" seems to be generally
good enough.
  import sha from base64   import b64encode  ctx = sha.new( password )   ctx.update( salt )   hash = "{SSHA}" +
b64encode(ctx.digest() + salt )
 

Sort-of-aside:

FYI, I tried implementing SSHA in pl/pgsql, with mixed results.  

It interoperated fine with other SSHA implementations as long as the
salt values were plain text.

The SSHA implementation in OpenLDAP (slappasswd) uses 4 byte binary
values (I think it grabs them from /dev/random or /dev/urandom);
unfortunately that wouldn't "play OK" with my pl/pgsql implementation.
I think having that work would be pretty keen, could share code if
anyone is interested...
-- 
output = reverse("ofni.secnanifxunil" "@" "enworbbc")
http://linuxdatabases.info/info/unix.html
Rules  of the  Evil Overlord  #145. "My  dungeon cell  decor  will not
feature exposed pipes.  While they add to the  gloomy atmosphere, they
are good  conductors of vibrations and  a lot of  prisoners know Morse
code." <http://www.eviloverlord.com/>


Re: New types for transparent encryption

From
Sam Mason
Date:
On Tue, Jul 07, 2009 at 05:35:28PM +0900, Itagaki Takahiro wrote:
> Our manual says we can use pgcrypto functions or encrypted filesystems
> for data encryption.
> http://www.postgresql.org/docs/8.4/static/encryption-options.html
> 
> However, they are not always the best approaches in some cases.
> 
> For pgcrypto functions, user's SQL must contain keyword strings
> and they need to consider which column is encrypted. Users complaint
> that that they want to treat encrypted values as if not-encrypted.

As others have said, handling encryption client side would seem to offer
many more benefits (transparently within libpq offering easy adoption).

> passward() and options() are SQL functions and we can re-define them
> if needed. The default implementations are to refer custom GUC variables
> (pgcrypto.password and pgcrypto.options) so that encryption are done
> only in database server and applications don't have to know the details.

Should the password be this widely shared? it would seem to make more
sense if it was a write-only variable and never exposed outside the
crypto module.  You wouldn't even need to be a super-user to collect all
the passwords otherwise, just create a function that has the name of
something common and have it stash the password aware somewhere.

--  Sam  http://samason.me.uk/


Re: New types for transparent encryption

From
Itagaki Takahiro
Date:
Sam Mason <sam@samason.me.uk> wrote:

> As others have said, handling encryption client side would seem to offer
> many more benefits (transparently within libpq offering easy adoption).

Libpq is not the only driver. Do we need to develop transparent decryption
for each drivers? (libpq, JDBC, npgsql, py-postgresql, ...)

Also, if you disallow server-side decode, you cannot create indexes on
encrypted values because the same values are not always encrypted to the
same codes. (Indexes will sort keys based on order of decoded values.)

I think there is no difference between client-side decryption and
clinet-supplied password as far as clinet-server communication is
encrypted (i.e, SSL login).

> Should the password be this widely shared? it would seem to make more
> sense if it was a write-only variable and never exposed outside the
> crypto module.

We can use an user-defined GUC variables as a write-only variable.
When we supply some show_hook function to GUC variable,
SET still works but SHOW only shows '****' and hides real passwords.

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center




Re: New types for transparent encryption

From
tomas@tuxteam.de
Date:
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Mon, Jul 13, 2009 at 01:22:30PM +0900, Itagaki Takahiro wrote:
> 
> Sam Mason <sam@samason.me.uk> wrote:
> 
> > As others have said, handling encryption client side would seem to offer
> > many more benefits (transparently within libpq offering easy adoption).
> 
> Libpq is not the only driver. Do we need to develop transparent decryption
> for each drivers? (libpq, JDBC, npgsql, py-postgresql, ...)

Just define a protocol. Of course there is more work in that, so yes,
this is one point going against client-side.

> Also, if you disallow server-side decode, you cannot create indexes on
> encrypted values because the same values are not always encrypted to the
> same codes. (Indexes will sort keys based on order of decoded values.)

Definitely another point against client-side. *If* there is some random
element in encryption (salt, IV, whatever), you can't index on an
encrypted field. If there isn't, the encryption will be possibly weak
(being amenable at least to a rainbow-table attack).

> I think there is no difference between client-side decryption and
> clinet-supplied password as far as clinet-server communication is
> encrypted (i.e, SSL login).

There definitely is a difference. If someone hi-jacks the running server
(trojan, privilege escalation), s/he still doesn'nt get at the data if
they only can be decrypted client-side. OTOH, with server-side
decryption, all bets are off in this case, since the keys are lying
around there (maybe somewhat obfuscated, but still accessible).

But this has already been hashed out in another thread AFAIR.

Regards
- -- tomás
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)

iD8DBQFKWr5FBcgs9XrR2kYRAkbdAJ9JN4mXGE1uD5EGCzWgZh4dsfCPpwCfaTew
2uD3F59+Gm1wR/jnYChvF+M=
=WWhL
-----END PGP SIGNATURE-----


Re: New types for transparent encryption

From
Sam Mason
Date:
On Mon, Jul 13, 2009 at 01:22:30PM +0900, Itagaki Takahiro wrote:
> Sam Mason <sam@samason.me.uk> wrote:
> > As others have said, handling encryption client side would seem to offer
> > many more benefits (transparently within libpq offering easy adoption).
> 
> Libpq is not the only driver. Do we need to develop transparent decryption
> for each drivers? (libpq, JDBC, npgsql, py-postgresql, ...)

Yes, by definition if it's client side then it can do whatever is
most appropriate for that specific client.  If the libraries are in
some specific language this may or may not make this easier.  For
example, adding transparent encryption appears to be somewhat trivial to
something like haskelldb[1] because it has enough information exposed to
it already to do "the right thing".  Doing the right thing in languages
that don't have this level of detail already exposed, i.e. libpq and
most other libraries, is going to be different/harder.  Note that I
wasn't thinking of haskelldb when I proposed the idea, I just remembered
it when trying to think of a good example.

> Also, if you disallow server-side decode, you cannot create indexes on
> encrypted values because the same values are not always encrypted to the
> same codes. (Indexes will sort keys based on order of decoded values.)

Yes, that's the point.  Client side encryption is about protecting the
plaintext from an attacker who has full control of the server.

This may or may not be the problem you're trying to solve, but if the
attacker doesn't have control over the server then encryption doesn't
seem necessary--just use normal permissions and/or views ensure your
security properties.

> I think there is no difference between client-side decryption and
> clinet-supplied password as far as clinet-server communication is
> encrypted (i.e, SSL login).

As tomas said, there's a massive difference here.  In one, the server is
part of your trusted computing base, in another it doesn't have to be.

> > Should the password be this widely shared? it would seem to make more
> > sense if it was a write-only variable and never exposed outside the
> > crypto module.
> 
> We can use an user-defined GUC variables as a write-only variable.
> When we supply some show_hook function to GUC variable,
> SET still works but SHOW only shows '****' and hides real passwords.

OK, but I still don't understand what threat model you're trying to
protect against.  Which bits of the system are trusted and which bits
are assumed malicious?  "Trusted" being a technical term from the
security world meaning if it goes wrong the security of the system will
fail (for example you trust your doctor to respect your privacy and
to not kill you), beware that PG uses the term differently with its
embedded languages.

It appears the main reason you're proposed this is to make it easier
for new users to use crypto with their PG database---this seems to
be the wrong way around.  In my experience, security is hard because
determining what's trusted and what's not is difficult.  Once you've
determined this it's reasonably easy to work the details out and get it
implemented.

--  Sam  http://samason.me.uk/
[1] http://haskelldb.sourceforge.net/haskelldb.pdf