Thread: Looking for advice on database encryption
What are folks doing to protect sensitive data in their databases?

We're running on the assumption that the _really_ sensitive data is too
sensitive for us to just trust the front-end programs that connect to it.

The decision coming down from on high is that we need to encrypt certain
fields. That's fine; we looked at pgcrypto, but found the requirement to
use pgp on the command line for key management to be a problem.

So we're trying to implement the encryption in the front-end, but the
problem we're having is searching on the encrypted fields. Since we have
to decrypt each field to search on it, queries that previously took
seconds now take minutes (or worse).

We've tested a number of cryptographic accelerator products. In case
nobody else has tried this, let me give away the ending: none that we've
found are any faster than a typical server CPU.

So, it's a pretty open-ended question, since we're still pretty open to
different approaches, but how are others approaching this problem?

The goal here is that if we're going to encrypt the data, it should be
encrypted in such a way that if an attacker gets ahold of a dump of the
database, they still can't access the data without the passphrases of
the individuals who entered the data.

--
Bill Moran
http://www.potentialtech.com
http://people.collaborativefusion.com/~wmoran/
Bill Moran wrote on 16.04.2009 21:40:
> The goal here is that if we're going to encrypt the data, it should
> be encrypted in such a way that if an attacker gets ahold of a dump
> of the database, they still can't access the data without the
> passphrases of the individuals who entered the data.

I'm by far not an expert, but my naive attempt would be to store the
database files in an encrypted filesystem.

Thomas
Bill Moran wrote:
> What are folks doing to protect sensitive data in their databases?

I would probably do my encryption in the application layer, and only
encrypt the sensitive fields.

Fields used as indexes probably should not be encrypted, unless the only
index operation is EQ/NE; in that case you could use the encrypted value
itself as the search key. This would even work for foreign key
relations. Of course, if part of your cryptography regimen involves key
expiration and rotation, there'd be the hellacious problem of
decrypting/re-encrypting.

It really all depends on what the security requirements are. -Somewhere-
there's a weak spot. In the above model, it's the application server
that's doing the cryptography: if it gets compromised, then the keys can
be extracted, and all bets are off.
In response to Thomas Kellerer <spam_eater@gmx.net>:

> Bill Moran wrote on 16.04.2009 21:40:
> > The goal here is that if we're going to encrypt the data, it should
> > be encrypted in such a way that if an attacker gets ahold of a dump
> > of the database, they still can't access the data without the
> > passphrases of the individuals who entered the data.
>
> I'm by far not an expert, but my naive attempt would be to store the
> database files in an encrypted filesystem.

That was the first suggestion when we started brainstorming ideas.
Unfortunately, it fails to protect us from the most likely attack
vector: SQL injection/application layer bugs. In an SQL injection
(for example), the fact that the filesystem is encrypted does zero
to protect the sensitive data.

--
Bill Moran
http://www.potentialtech.com
http://people.collaborativefusion.com/~wmoran/
On Apr 16, 2009, at 12:40 PM, Bill Moran wrote:

(This is the traditional "you're asking the wrong question" response.)

> What are folks doing to protect sensitive data in their databases?

I don't think that's a useful way to look at it. Protecting sensitive
data in the entire system, where the database is just one part of that
system, is likely to lead to a much better answer.

> We're running on the assumption that the _really_ sensitive data
> is too sensitive for us to just trust the front-end programs that
> connect to it.
>
> The decision coming down from on high is that we need to encrypt
> certain fields.

If that's the mandate, then that's what you have to do. It's unlikely to
make the system overall much more secure, though, and likely no more
secure than some much less intrusive approaches.

> That's fine; we looked at pgcrypto, but found
> the requirement to use pgp on the command line for key management
> to be a problem.
>
> So we're trying to implement the encryption in the front-end, but
> the problem we're having is searching on the encrypted fields. Since
> we have to decrypt each field to search on it, queries that previously
> took seconds now take minutes (or worse).
>
> We've tested a number of cryptographic accelerator products. In
> case nobody else has tried this, let me give away the ending: none
> that we've found are any faster than a typical server CPU.
>
> So, it's a pretty open-ended question, since we're still pretty open
> to different approaches, but how are others approaching this problem?
>
> The goal here is that if we're going to encrypt the data, it should
> be encrypted in such a way that if an attacker gets ahold of a dump
> of the database, they still can't access the data without the
> passphrases of the individuals who entered the data.

If the concern is database dumps, then encrypting the output of pg_dump
will pretty much solve the problem. But if the attack vector is the
common one of compromising the front end, then encrypting data in the
database while allowing the front end to decrypt it is likely useless.

If the concern is "what if an attacker got access to the server?" then
physical security is likely to have much better ROI than some random
encryption regime.

Can you go back and ask your management what their actual security or
compliance needs are? If it's a real business need, you probably want to
find a decent security guy, have him draft the questions that management
need to answer, and start from there, rather than trying to clean up
after someone has already made 95% of the decisions, in an uninformed
way, for you.

Cheers,
  Steve
On Thu, April 16, 2009 13:20, Bill Moran wrote:
> In response to Thomas Kellerer <spam_eater@gmx.net>:
>
>> Bill Moran wrote on 16.04.2009 21:40:
>> > The goal here is that if we're going to encrypt the data, it should
>> > be encrypted in such a way that if an attacker gets ahold of a dump
>> > of the database, they still can't access the data without the
>> > passphrases of the individuals who entered the data.
>>
>> I'm by far not an expert, but my naive attempt would be to store the
>> database files in an encrypted filesystem.
>
> That was the first suggestion when we started brainstorming ideas.
> Unfortunately, it fails to protect us from the most likely attack
> vector: SQL injection/application layer bugs. In an SQL injection
> (for example), the fact that the filesystem is encrypted does zero
> to protect the sensitive data.

I'll chime in here, even though I probably shouldn't.

A lot depends on which standard you're trying to meet: general security
(and common sense) vs. PCI DSS vs. NSA/DoD vs. some other standard.

Do you need to decrypt the values once they're in the system? Do you
need the items in an index? Do the values need to be part of a
constraint / foreign key relationship (because a hashed value may cause
you a lot of headaches!)?

Look at these different scenarios and think about the data (in both
encrypted and unencrypted format) before you decide HOW you want to do
it.

Tim
--
Timothy J. Bruce
Bill Moran wrote on 16.04.2009 22:20:
>> I'm by far not an expert, but my naive attempt would be to store the
>> database files in an encrypted filesystem.
>
> That was the first suggestion when we started brainstorming ideas.
> Unfortunately, it fails to protect us from the most likely attack
> vector: SQL injection/application layer bugs. In an SQL injection
> (for example), the fact that the filesystem is encrypted does zero
> to protect the sensitive data.

Which is something different from your statement

>> The goal here is that if we're going to encrypt the data, it should
>> be encrypted in such a way that if an attacker gets ahold of a dump
>> of the database, they still can't access the data without the
>> passphrases of the individuals who entered the data.

which only talks about someone getting hold of the contents of the
server's harddisk.

As you ultimately have to decrypt the data to display it to the user, he
can always take a screenshot (or copy & paste the text from the web
front end) and walk away. He doesn't even need to use SQL injection.

As for SQL injection itself, there are pretty robust defenses against it
(prepared statements, sanitizing and cleaning any user input, maybe even
controlling access to the data through stored procedures, which can add
an additional layer of security).

I agree with Kenneth: you need to be more precise about which scenario
you have to deal with.

Thomas
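(For illustration, the prepared-statement defense Thomas mentions looks
like this in PostgreSQL; the employees table and its columns are
hypothetical.)

    -- The parameter is passed as data, never spliced into the SQL
    -- string, so quotes or SQL fragments in the value cannot change
    -- the statement's structure.
    PREPARE find_employee (text) AS
        SELECT emp_id, emp_name
          FROM employees
         WHERE emp_name = $1;

    EXECUTE find_employee('O''Brien');  -- embedded quote is harmless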
In response to Steve Atkins <steve@blighty.com>:
>
> On Apr 16, 2009, at 12:40 PM, Bill Moran wrote:
>
> (This is the traditional "you're asking the wrong question" response.)
>
> > What are folks doing to protect sensitive data in their databases?
>
> I don't think that's a useful way to look at it. Protecting sensitive
> data in the entire system, where the database is just one part of that
> system, is likely to lead to a much better answer.

<snip>

I disagree. We're already addressing the issues of security on the
application level through extensive testing and data validation out the
wazoo (to prevent SQL injection and other application breaches). All our
servers are in highly secure data centers. We have VPNs and access
restrictions at the IP and the user level to the 9s.

It's still not enough. My task here is to develop a system to protect
the data in the event that all of those fail. As a result, I'm looking
for general advice. I already have a system in place.

This is apparently another part that I should have described in more
detail. So, here goes.

To draw a parallel example in the application: imagine that you're an
employee in a business. When you're hired, you enter your SSN into the
company database. Now, your department manager needs to have access to
your SSN for various reasons, so the system grants access to your
encryption key to the department manager. Based on system policy, the
division manager has access to all the data in the department, and the
company head has access to all divisions. As a result, the company head
can get your SSN out of the database using the passphrase for his key.

However, Joe over in IT can not access your SSN. Even though he's the
DBA and can pull a full text dump of the database at will, he can not
decrypt your SSN unless he has a passphrase to one of the keys that can
decrypt it. All that is pretty standard PKI stuff, and I've created the
tables and the functions that implement it.

The problem comes when the company head wants to search through the
database to find out which employee has a specific SSN. He should be
able to do so, since he has access to everything, but the logistics of
doing so in a reasonable amount of time are rather complex and very time
consuming. On a million rows with the SSN unencrypted, such a query
would take less than a second with an appropriate index, but pulling
those million rows into the application in order to decrypt each one and
see if it matches can easily take a half hour or longer.

That's where we're having difficulty. Our requirements are that the data
must be strongly protected, but the appropriate people must be able to
do (often complex) searches on it that complete in record time.

--
Bill Moran
http://www.potentialtech.com
http://people.collaborativefusion.com/~wmoran/
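(A minimal sketch of the kind of key-granting tables Bill describes,
assuming PGP-style keypairs per principal; every table, column, and key
name here is hypothetical.)

    -- Each sensitive value is encrypted with a per-record data key;
    -- that data key is stored once per principal allowed to read it,
    -- wrapped with that principal's public key.
    CREATE TABLE principals (
        principal_id serial PRIMARY KEY,
        name         text  NOT NULL,
        public_key   bytea NOT NULL   -- PGP public key block
    );

    CREATE TABLE employees (
        emp_id        serial PRIMARY KEY,
        ssn_encrypted bytea NOT NULL  -- ciphertext of the SSN
    );

    CREATE TABLE key_grants (
        emp_id       int REFERENCES employees,
        principal_id int REFERENCES principals,
        wrapped_key  bytea NOT NULL,  -- data key, encrypted to principal
        PRIMARY KEY (emp_id, principal_id)
    );

    -- To read an SSN, a manager unwraps wrapped_key with his private
    -- key and passphrase, then decrypts ssn_encrypted with the result.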
In response to Thomas Kellerer <spam_eater@gmx.net>:

> Bill Moran wrote on 16.04.2009 22:20:
> >> I'm by far not an expert, but my naive attempt would be to store the
> >> database files in an encrypted filesystem.
> >
> > That was the first suggestion when we started brainstorming ideas.
> > Unfortunately, it fails to protect us from the most likely attack
> > vector: SQL injection/application layer bugs.
>
> Which is something different from your statement
>
> >> The goal here is that if we're going to encrypt the data, it should
> >> be encrypted in such a way that if an attacker gets ahold of a dump
> >> of the database, they still can't access the data without the
> >> passphrases of the individuals who entered the data.
>
> which only talks about someone getting hold of the contents of the
> server's harddisk.

Not really. You're making an assumption that a pg_dump can only be run
on the server itself. Let's chalk this up to miscommunication and allow
me to rephrase: the data needs to be encrypted in such a way that if an
attacker can get an offline copy of the data by any means, they have no
greater access to the data than they would have if they used the
application to access it. I already have that using PKI.

Again, it seems that I left too many details out of my description of
the problem. See my post in response to Steve Atkins for a more detailed
description, and I apologize for being too vague the first go-round.

--
Bill Moran
http://www.potentialtech.com
http://people.collaborativefusion.com/~wmoran/
Bill Moran wrote on 16.04.2009 23:06:
>> which only talks about someone getting hold of the contents of the
>> server's harddisk.
>
> Not really. You're making an assumption that a pg_dump can only be
> run on the server itself.

Right, I forgot that. But then it's similar to the situation where the
user displays the data and walks away with the screenshot...

If you have an application server sitting in the middle, you can limit
connections to the database to the app server itself. Or even put the
appserver on the same box as the database server and limit connections
to localhost only. In that case the attacker needs to be able to log in
to the server directly.

> and I apologize for being too vague the first go-round.

No problem. This happens to me all the time. Once a discussion starts
about a topic, I find myself wondering how I could forget all the
details that I'm being asked about ;)

Thomas
Couldn't you just add a PGP-based column (or similar encryption
protocol) for authentication? This would protect you against injection
attacks, would it not?

You could also use PGP or similar for key management, if I'm not
mistaken.

-Will

-----Original Message-----
In response to Thomas Kellerer <spam_eater@gmx.net>:

That was the first suggestion when we started brainstorming ideas.
Unfortunately, it fails to protect us from the most likely attack
vector: SQL injection/application layer bugs. In an SQL injection
(for example), the fact that the filesystem is encrypted does zero
to protect the sensitive data.
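(For illustration, a PGP-based column along the lines Will suggests can
be built with pgcrypto's public-key functions; the table name is
hypothetical and the elided key blocks are placeholders.)

    -- Encrypt with the recipient's public key; only the matching
    -- private key plus its passphrase can decrypt.
    INSERT INTO employees (ssn_encrypted)
    VALUES (pgp_pub_encrypt('123-45-6789',
                dearmor('-----BEGIN PGP PUBLIC KEY BLOCK----- ...')));

    SELECT pgp_pub_decrypt(ssn_encrypted,
               dearmor('-----BEGIN PGP PRIVATE KEY BLOCK----- ...'),
               'user passphrase')
      FROM employees;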
Bill Moran wrote:
> The problem comes when the company head wants to search through the
> database to find out which employee has a specific SSN. He should
> be able to do so, since he has access to everything, but the logistics of
> doing so in a reasonable amount of time are rather complex and very
> time consuming. On a million rows with the SSN unencrypted, such a
> query would take less than a second with an appropriate index, but
> pulling those million rows into the application in order to decrypt
> each one and see if it matches can easily take a half hour or longer.
>
> That's where we're having difficulty. Our requirements are that the
> data must be strongly protected, but the appropriate people must be
> able to do (often complex) searches on it that complete in record
> time.

An index on the encrypted SSN field would do this just fine. If an
authorized person needs to find the record with a specific SSN, they
encrypt that SSN and then look up the ciphertext in the database...
done.
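(A sketch of that lookup, assuming pgcrypto and a deterministic cipher
mode (ECB here) so that the same plaintext always produces the same
ciphertext; see the caveats raised in the replies below. Table, column,
and key names are hypothetical.)

    CREATE TABLE employees (
        emp_id        serial PRIMARY KEY,
        ssn_encrypted bytea NOT NULL
    );
    CREATE INDEX employees_ssn_idx ON employees (ssn_encrypted);

    INSERT INTO employees (ssn_encrypted)
    VALUES (encrypt('123-45-6789'::bytea, 'secret-key'::bytea,
                    'aes-ecb/pad:pkcs'));

    -- The search value is encrypted and compared ciphertext to
    -- ciphertext, so the btree index is usable and nothing has to be
    -- decrypted during the scan.
    SELECT emp_id
      FROM employees
     WHERE ssn_encrypted = encrypt('123-45-6789'::bytea,
                                   'secret-key'::bytea,
                                   'aes-ecb/pad:pkcs');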
If the purpose of encryption is for financial or medical data transmission security, or something of a higher order, you may want to implement a stronger type of security such as SSL or PGP or some other type of public/private key process.
You could create a schema that contains views of the data without the sensitive data, and have the users use that schema for their needs; this assumes it is basically used to view or report on the data.
Just some thoughts.
Michael Black
> Date: Thu, 16 Apr 2009 15:40:12 -0400
> From: wmoran@potentialtech.com
> To: pgsql-general@postgresql.org
> Subject: [GENERAL] Looking for advice on database encryption
>
> <snip>
On Thu Apr 16 05:06 PM, Bill Moran wrote:
>
> The problem comes when the company head wants to search through the
> database to find out which employee has a specific SSN. He should be
> able to do so, since he has access to everything, but the logistics of
> doing so in a reasonable amount of time are rather complex and very
> time consuming. On a million rows with the SSN unencrypted, such a
> query would take less than a second with an appropriate index, but
> pulling those million rows into the application in order to decrypt
> each one and see if it matches can easily take a half hour or longer.
>
> That's where we're having difficulty. Our requirements are that the
> data must be strongly protected, but the appropriate people must be
> able to do (often complex) searches on it that complete in record time.

Would storing a one-way hash of the SSN work for you? I.e. combine sha1
and/or md5, use a salt...

SELECT ssn_encrypted FROM employees WHERE ssn_hash =
yourhashmethod(SSN_PLAINTEXT)

So you have both an encrypted version of the SSN and a one-way hash of
it. That's how we store credit card numbers.
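(Concretely, a salted-hash column along those lines might use
pgcrypto's digest(); the table name and the fixed salt are
illustrative.)

    ALTER TABLE employees ADD COLUMN ssn_hash bytea;

    -- Store a salted one-way hash next to the ciphertext.
    UPDATE employees
       SET ssn_hash = digest('per-deployment-salt' || '123-45-6789',
                             'sha1');

    -- Exact-match searches touch only the hash column; the ciphertext
    -- is decrypted for display after the row has been found.
    SELECT ssn_encrypted
      FROM employees
     WHERE ssn_hash = digest('per-deployment-salt' || '123-45-6789',
                             'sha1');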
Eric Soroos wrote:
>> An index on the encrypted SSN field would do this just fine. If an
>> authorized person needs to find the record with a specific SSN, they
>> encrypt that SSN and then look up the ciphertext in the database...
>> done.
>
> This will only work for electronic code book (ECB) ciphers, and not
> chained block ciphers, since the initialization vector will randomize
> the output of the encryption so that E(foo) != E(foo), just to prevent
> this sort of attack.

Can those sorts of chained block ciphers decode blocks in a different
order than they were originally encoded? For this sort of application,
wouldn't each field or record pretty much have to be encrypted
discretely, so that they can be decrypted in any order, or any single
record be decrypted on its own?
Thomas Kellerer <spam_eater@gmx.net> wrote:
>
> Bill Moran wrote on 16.04.2009 23:06:
> >> which only talks about someone getting hold of the contents of the
> >> server's harddisk.
> >
> > Not really. You're making an assumption that a pg_dump can only be
> > run on the server itself.
>
> Right, I forgot that.
>
> But then it's similar to the situation where the user displays the
> data and walks away with the screenshot...

Actually, it's completely different. If a user walks away with a
screenshot of data that they had access to anyway, then the application
developer is not culpable. However, if a flaw is found in the
application and a user can use it to gain escalated privs and access
data that would normally not be available, the application developer is
going out of business. If a user finds a flaw but it simply results in
an error, because the layer of security behind it prevents an
information leak, then the application developer doesn't look very bad
at all. Layered security saves the day!

> If you have an application server sitting in the middle, you can limit
> connections to the database to the app server itself. Or even put the
> appserver on the same box as the database server and limit connections
> to localhost only. In that case the attacker needs to be able to log
> in to the server directly.

You're assuming that the application is perfect. With the data we're
protecting, we don't have that luxury. This isn't a particularly new
view of security. CERT has hundreds of pages documenting how this is
correct security practice. If it weren't, there wouldn't need to be
firewalls between Windows servers and the Internet.

The part that's unique (from my experience) is the demand that the data
be so readily accessible. Usually, highly secure data is understood to
be difficult to access, but that understanding doesn't exist in this
market. It's an unreasonable expectation on the part of our clients, to
be honest, but if we can find a way to meet it, we leave the competition
in the dust.

Thanks for the feedback so far.

--
Bill Moran
http://www.potentialtech.com
>> That's where we're having difficulty. Our requirements are that the
>> data must be strongly protected, but the appropriate people must be
>> able to do (often complex) searches on it that complete in record
>> time.
>
> An index on the encrypted SSN field would do this just fine. If an
> authorized person needs to find the record with a specific SSN, they
> encrypt that SSN and then look up the ciphertext in the database...
> done.

This will only work for electronic code book (ECB) ciphers, and not
chained block ciphers, since the initialization vector will randomize
the output of the encryption so that E(foo) != E(foo), just to prevent
this sort of attack.

You're looking for a hash function, since that's a one-way, stable
function meant for comparing.

eric
"Will Rutherdale (rutherw)" <rutherw@cisco.com> wrote: > > Couldn't you just add a PGP based column (or similar encryption > protocol) for authentication? This would protect you against injection > attacks, would it not? > > You could also use PGP or similar for key management if I'm not > mistaken. Thanks for the input, Will. We're already doing this, the problem we've had is that the time to decrypt the data is making access too slow. Basically, people administrators need to be able to say, "show me all the registrants whose personal medical information is x" and get results in a reasonable amount of time. Decrypting the data to do the matching is about 100x slower than a typical seq scan. To give you an idea of what we've tried, I've tried pgcrypto, openssl with rc4, des and 3des, using envelope encryption, and raw aes-128 symmetrical encryption. In addition, we've purchased two different hardware accelerators for crypto to find that both of them are slower than the CPU itself, and they're both the high-end "enterprise" class cards. -- Bill Moran http://www.potentialtech.com
Michael Black <michaelblack75052@hotmail.com> wrote:
>
> If the purpose of encryption is for financial or medical data
> transmission security, or something of a higher order, you may want to
> implement a stronger type of security such as SSL or PGP or some other
> type of public/private key process.
>
> You could create a schema that contains views of the data without the
> sensitive data, and have the users use that schema for their needs;
> this assumes it is basically used to view or report on the data.

Thanks for the input, Michael. We're already working on using PKI; the
big problem we're having is the speed of access when an administrator
needs to search through the encrypted data.

--
Bill Moran
http://www.potentialtech.com
John R Pierce <pierce@hogranch.com> wrote:
>
> Eric Soroos wrote:
> >> An index on the encrypted SSN field would do this just fine. If an
> >> authorized person needs to find the record with a specific SSN, they
> >> encrypt that SSN and then look up the ciphertext in the database...
> >> done.
> >
> > This will only work for electronic code book (ECB) ciphers, and not
> > chained block ciphers, since the initialization vector will randomize
> > the output of the encryption so that E(foo) != E(foo), just to prevent
> > this sort of attack.
>
> Can those sorts of chained block ciphers decode blocks in a different
> order than they were originally encoded? For this sort of application,
> wouldn't each field or record pretty much have to be encrypted
> discretely, so that they can be decrypted in any order, or any single
> record be decrypted on its own?

Eric is right about CBC ciphers. The problem is that any function that
will produce the same output for the same input (such as md5 or sha)
leaves us open to brute force attacks if the number of choices is small,
or pattern discovery attacks in other cases. And anything that protects
us against such attacks (such as aes-cbc) will generate data that I
can't pre-encrypt and search against.

I haven't tried it, but I don't believe CBC ciphers can decrypt data out
of order.

In the implementation I've built, the IV is stored with the ciphertext,
much the same way that crypt() stores the salt with the password hash.
As a result, if you have the key, you then have all the data required to
decrypt the field, but you can't easily brute force it or do any pattern
analysis.

--
Bill Moran
http://www.potentialtech.com
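(A sketch of that store-the-IV-with-the-ciphertext scheme, expressed in
pgcrypto terms for illustration; Bill's actual implementation is in the
front end, and these function names are hypothetical.)

    -- Encrypt: a fresh 16-byte IV is generated per value and prepended
    -- to the ciphertext, so identical plaintexts encrypt differently.
    CREATE FUNCTION encrypt_field(text, bytea) RETURNS bytea
    LANGUAGE sql AS $$
        SELECT s.iv || encrypt_iv(convert_to($1, 'utf8'), $2, s.iv,
                                  'aes-cbc/pad:pkcs')
          FROM (SELECT gen_random_bytes(16) AS iv) s;
    $$;

    -- Decrypt: split the stored value back into its IV (first 16
    -- bytes) and ciphertext (the rest).
    CREATE FUNCTION decrypt_field(bytea, bytea) RETURNS text
    LANGUAGE sql AS $$
        SELECT convert_from(
                 decrypt_iv(substring($1 FROM 17), $2,
                            substring($1 FROM 1 FOR 16),
                            'aes-cbc/pad:pkcs'),
                 'utf8');
    $$;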
"Jonathan Bond-Caron" <jbondc@openmv.com> wrote: > > On Thu Apr 16 05:06 PM, Bill Moran wrote: > > > > The problem comes when the company head wants to search through the > > database to find out which employee has a specific SSN. He should be > > able to do so, since he has access to everything, but the logistics of > > doing so in a reasonable amount of time are rather complex and very > > time consuming. On a million rows with the SSN unencrypted, such a > > query would take less than a second with an appropriate index, but > > pulling those million rows into the application in order to decrypt > > each one and see if it matches can easily take a half hour or longer. > > > > That's where we're having difficulty. Our requirements are that the > > data must be strongly protected, but the appropriate people must be > > able to do (often complex) searches on it that complete in record time. > > Would storing a one-way hash of the SSN work for you? i.e. combine sha1 > and/or md5, use a salt... > > SELECT ssn_encrypted FROM employees WHERE ssn_hash = > yourhashmethod(SSN_PLAINTEXT) > > So you have both an encrypted version of the SSN and a one-way hash of it. > > That's how we store credit card numbers. We're considering that for some fields. It does limit a lot ... we can't do partial matching for example. Other fields don't work so well. If I try to use that trick on a field that has too few choices (true/false is the worst, but anything with less than a few thousand possibilities, i.e. "county of residence" is a problem) then I've left it open to an easy dictionary attack. Thanks for the input. -- Bill Moran http://www.potentialtech.com
> What are folks doing to protect sensitive data in their databases?
>
> <snip>
Take the performance hit. If people on high want the data encrypted, then they have to suffer the performance penalty, however bad.
Could you not write some server extensions to encrypt / decrypt the data server side, coupled with a custom index implementation?
Can you use a global server side key or do you need fine grained encryption?
Is a database the correct tool for the job if you want this level of encryption and granularity?
Also, how secure are your communication channels? What stops me snooping the data in transit (ARP poisoning and other techniques, etc.)?
Chris Ellis
In response to Chris.Ellis@shropshire.gov.uk:

> > What are folks doing to protect sensitive data in their databases?
> >
> > <snip>
>
> Take the performance hit. If people on high want the data encrypted,
> then they have to suffer the performance penalty, however bad.

As reasonable as that sounds, I don't think it's true. We've already
brainstormed a dozen ways to work around the performance issue (creative
hashing, backgrounding the decryption and using ajax to display the
results as they're decrypted ...)

Problem is that all of these methods complicate things in the
application. I was hoping there were better approaches to the solution,
but I'm starting to think that we're already on the right path.

> Could you not write some server extensions to encrypt / decrypt the
> data server side, coupled with a custom index implementation?

Not sure how the index implementation would work. The server-side
encryption doesn't really help much ... it's difficult to add more DB
servers in order to improve throughput, but adding more web servers fits
easily into our load balanced setup. In any event, the addition of
processing cores (no matter where) doesn't speed up the decryption of
individual items, it only allows us to do more in parallel.

> Can you use a global server side key or do you need fine grained
> encryption?
>
> Is a database the correct tool for the job if you want this level of
> encryption and granularity?

The global server side key puts us in pretty much the same situation
that filesystem encryption does, which is not quite as strong as we're
looking for.

I've considered the possibility of using something other than the DB,
but I can't think of any storage method that gains us anything over the
DB. Also, if we use something different than the DB, we then have to
come up with a way to replicate it to the backup datacenter. If we put
the data in the DB, slony is already set up to take care of that.

> Also, how secure are your communication channels? What stops me
> snooping the data in transit (ARP poisoning and other techniques,
> etc.)?

We do what we can. Everything is transferred over HTTPS, and we log and
monitor activity. We're constantly looking for ways to improve that side
of things as well, but that's a discussion for a different forum.
--
Bill Moran
http://www.potentialtech.com
http://people.collaborativefusion.com/~wmoran/
> > Take the performance hit. If people on high want the data encrypted, then
> > they have to suffer the performance penalty, however bad.
>
> As reasonable as that sounds, I don't think it's true. We've already
> brainstormed a dozen ways to work around the performance issue (creative
> hashing, backgrounding the decryption and using ajax to display the
> results as they're decrypted ...)
>
> Problem is that all of these methods complicate things in the
> application. I was hoping there were better approaches to the
> solution, but I'm starting to think that we're already on the
> right path.
>
> > Could you not write some server extensions to encrypt / decrypt the data
> > server side, coupled with a custom index implementation?
>
> Not sure how the index implementation would work. The server-side
> encryption doesn't really help much ... it's difficult to add more
> DB servers in order to improve throughput, but adding more web
> servers fits easily into our load balanced setup. In any event,
> the addition of processing cores (no matter where) doesn't speed
> up the decryption of individual items, it only allows us to do more
> in parallel.
Move all DB calls to stored procedures and let the stored procedures
handle the encryption / decryption with a given key. If your
communication channels are secure, then this is just as secure as
decrypting the data in the application. This also allows DBs to be
clustered, with the likes of PL/Proxy.

You could create a custom datatype to hold the encrypted data, then
functions to access it.
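(For illustration, the stored-procedure approach might look like this
with pgcrypto; the key is supplied per call rather than stored on the
server, and the table and function names are hypothetical.)

    CREATE FUNCTION get_employee_ssn(int, text) RETURNS text
    LANGUAGE sql STABLE AS $$
        -- Decryption happens inside the function, server side; the
        -- application never handles the raw ciphertext.
        SELECT pgp_sym_decrypt(ssn_encrypted, $2)
          FROM employees
         WHERE emp_id = $1;
    $$;

    SELECT get_employee_ssn(42, 'user passphrase');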
> > Can you use a global server side key or do you need fine grained
> > encryption?
> >
> > Is a database the correct tool for the job if you want this level of
> > encryption and granularity?
> The global server side key puts us in pretty much the same situation that
> filesystem encryption does, which is not quite as strong as we're
> looking for.
> I've considered the possibility of using something other than the
> DB, but I can't think of any storage method that gains us anything over
> the DB. Also, if we use something different than the DB, we then have
> to come up with a way to replicated it to the backup datacenter. If
> we put the data in the DB, slony is already set up to take care of that.
File system: leave the replication up to the SAN. Store your data in flat files which are encrypted with each key, an index per user, etc.
>
> > Also, how secure are your communication channels? What stops me
> > snooping the data in transit (ARP poisoning and other techniques,
> > etc.)?
>
> We do what we can. Everything is transferred over HTTPS, and we log and
> monitor activity. We're constantly looking for ways to improve that
> side of things as well, but that's a discussion for a different forum.
On Thu, Apr 16, 2009 at 05:06:13PM -0400, Bill Moran wrote:
> I disagree. We're already addressing the issues of security on the
> application level through extensive testing and data validation out
> the wazoo (to prevent SQL injection and other application breaches).
> All our servers are in highly secure data centers. We have VPNs and
> access restrictions at the IP and the user level to the 9s.
>
> It's still not enough.
>
> My task here is to develop a system to protect the data in the event
> that all of those fail. As a result, I'm looking for general advice.

Mine would be to define what you do trust and not what you don't trust.
I think you need to do that before you can get much further. At the
moment the problem seems somewhat ill defined.

For example, you say that you don't trust the application, yet the user
must trust the application as they're entering their secret into it. How
does the user ascertain that the application they're talking to is the
"real" one and that it hasn't been replaced with a pretend one that
sends their secret off to an attacker who has access to a real version
of the program?

Protecting against this in general is, as far as I know, impossible. The
get-out clause is that you're not trying to solve the general case:
you've got a specific set of use cases that you need to solve.

--
  Sam  http://samason.me.uk/
In response to Sam Mason <sam@samason.me.uk>:

> Mine would be to define what you do trust and not what you don't
> trust. I think you need to do that before you can get much further. At
> the moment the problem seems somewhat ill defined.
>
> For example, you say that you don't trust the application, yet the
> user must trust the application as they're entering their secret into
> it. How does the user ascertain that the application they're talking
> to is the "real" one and that it hasn't been replaced with a pretend
> one that sends their secret off to an attacker who has access to a
> real version of the program?
>
> Protecting against this in general is, as far as I know, impossible.
> The get-out clause is that you're not trying to solve the general
> case: you've got a specific set of use cases that you need to solve.

The primary portal into the application right now is a web site. As a
result, this part of it is handled by typical SSL certs and the like.

As far as the trust factor, you've blurred the lines a bit. My job is to
ensure that the user doesn't know or care about the lines between
application and database, but trusts the system as a whole. However, I
need to clearly define those lines and ensure that each part of the
whole has enough security measures to withstand a flaw in one of the
other parts. Think of the design of postfix, where each program (smtpd,
qmgr, etc.) doesn't trust the input of the other programs and runs in
its own sandbox.

--
Bill Moran
http://www.potentialtech.com
http://people.collaborativefusion.com/~wmoran/
On Fri, Apr 17, 2009 at 09:52:30AM -0400, Bill Moran wrote:
> In response to Sam Mason <sam@samason.me.uk>:
> > For example, you say that you don't trust the application, yet the
> > user must trust the application as they're entering their secret
> > into it. How does the user ascertain that the application they're
> > talking to is the "real" one and that it hasn't been replaced with a
> > pretend one that sends their secret off to an attacker who has
> > access to a real version of the program?
>
> The primary portal into the application right now is a web site. As
> a result, this part of it is handled by typical SSL certs and the
> like.

OK, that defers the problem nicely.

> As far as the trust factor, you've blurred the lines a bit. My job
> is to ensure that the user doesn't know or care about the lines
> between application and database, but trusts the system as a whole.
> However, I need to clearly define those lines and ensure that each
> part of the whole has enough security measures to withstand a flaw in
> one of the other parts. Think of the design of postfix, where each
> program (smtpd, qmgr, etc.) doesn't trust the input of the other
> programs and runs in its own sandbox.

Sorry, my example of where to place trust was a bad one; let's try some
other ones.

The Postgres process: do you trust that the database engine is secure?
This implies that the frontend program can send the user's secret to the
database engine and the decryption will be done "inside" the database. I
believe this to be the case; otherwise, for the user to query on SSN, to
pick an example you were using before, you would need to send *every*
encrypted SSN to the client, where they would decrypt it with their
secret to find the one they wanted.

Backups: you mentioned that if someone stole the backups they shouldn't
be able to get any more information than if they were using the client
interface. If every sensitive field is encrypted then you're protected
against some attacks, but you'd be better off encrypting the backup.
Where is it OK to place the trust here?

--
  Sam  http://samason.me.uk/
In response to Sam Mason <sam@samason.me.uk>:
>
> The Postgres process: do you trust that the database engine is secure?
> This implies that the frontend program can send the user's secret to
> the database engine and the decryption will be done "inside" the
> database.
>
> Backups: you mentioned that if someone stole the backups they
> shouldn't be able to get any more information than if they were using
> the client interface. If every sensitive field is encrypted then
> you're protected against some attacks, but you'd be better off
> encrypting the backup. Where is it OK to place the trust here?

Nowhere, really. The goal is not to trust any one part of the system. As
a result, we can protect the data across multiple security failures. For
example, backups will actually be encrypted twice. In order to recover
data from our backups, an attacker would have to physically acquire a
tape, then steal or brute-force 2 different encryption keys before they
could access the data.

In the end, the only people that should be trusted with the data are the
users who own the data. Each user will explicitly grant access to their
data to program administrators; the software will simply facilitate the
process.

We put as many layers as we can everywhere. Usually this is limited when
we start hitting performance issues and have to remove a layer to keep
performance where it needs to be. The goal is to have as many layers as
possible while keeping the system as performant as the client expects.

We only get one shot at this. If there's a data leak, a lot of people
are going to be very upset and we're going to be out of business, so
we're implementing the tightest security possible at every layer. This
thread is only one part of the overall process, as it specifically
relates to the database layer.

--
Bill Moran
http://www.potentialtech.com
http://people.collaborativefusion.com/~wmoran/
Bill Moran wrote:
> Eric is right about CBC ciphers. The problem is that any function that
> will produce the same output for the same input (such as md5 or sha)
> leaves us open to brute force attacks if the number of choices is
> small, or pattern discovery attacks in other cases. And anything that
> protects us against such attacks (such as aes-cbc) will generate data
> that I can't pre-encrypt and search against.
>
> In the implementation I've built, the IV is stored with the
> ciphertext, much the same way that crypt() stores the salt with the
> password hash. As a result, if you have the key, you then have all the
> data required to decrypt the field, but you can't easily brute force
> it or do any pattern analysis.

Searching encrypted data is difficult in a situation like this. There is
research (e.g. [1,2]) into encrypting relatively large _text_ fields so
that the ciphertext is amenable to search, but in general all the
schemes sacrifice functionality for security: partial matches, for
example, are relatively difficult to achieve, and regular expressions
are virtually impossible without manual expansion. Plus you'd have to
roll your own, which would be prone to error. My current university
project is in this area, developing a system for PostgreSQL that allows
secure search on encrypted _text_ fields (e.g. full documents) using
[2], but for a field as small as an SSN, there isn't really anything
obvious.

Having said that, using a block cipher in ECB mode on the SSN should be
enough to perform fast exact matches (based on my limited knowledge of
SSNs). Assuming the 'user' table is normalised so all the SSNs in it
will be unique, the possibility of frequency analysis on the ciphertext
is slim, especially since a 9-digit SSN encoded as ASCII will easily fit
into a single block of most recent ciphers (AES has a 16-byte block, for
example). Each SSN's ciphertext will be as unique as the original.
Similarly for a one-way hash function. An SSN will only have 1 billion
possible combinations, so a brute force attack would be possible, but I
don't see a way of avoiding this.

You could look into stream ciphers for left-most matches, but these are
almost always susceptible to statistical attacks when used incorrectly.
Generally speaking, searching requires a pattern, which leads to
possible attacks. I think you'll have to either put up with the
inefficiency or sacrifice some amount of security.

Cheers,
Will.

[1] http://www.cs.berkeley.edu/~dawnsong/papers/se.pdf
[2] http://gnunet.org/papers/secureindex.pdf
On Fri, Apr 17, 2009 at 10:33:15AM -0400, Bill Moran wrote:
> The goal is not to trust any one part of the system.
> As a result, we can protect the data across multiple security failures.

As far as I know this isn't a good way to go about designing secure
systems. You're better off defining a set of plausible attack vectors
and designing the system so it's not vulnerable to them.

> For example, backups will actually be encrypted twice. In order to
> recover data from our backups, an attacker would have to physically
> acquire a tape, then steal or brute-force 2 different encryption
> keys before they could access the data.

OK, so there would be two vectors here: each entity would have their own
secret they use to encrypt the backup. Breaking the backup requires the
collusion (remember, security is normally much easier to break from the
inside) or compromise of both entities. The question then becomes what
happens with the backup before it's been encrypted by both entities, and
how can you ensure that one can't compromise the entire system? Your
trust would therefore be that at most one entity will be compromised.
Once you've said that, you can ask yourself whether this is appropriate.

Saying that you're encrypting data twice is, from a security
perspective, vacuous, because you're leaving open the possibility of
encrypting it twice in series on the same server, and hence an attacker
just needs to compromise that one server to recover both keys and break
the system. If you're defining this single server as trusted, then
you're assuming you're capable of protecting this server from attacks.
Other definitions would partition the server into parts (say, different
user accounts, and then you need to worry about privilege escalation)
and say that some parts are trusted. The "root" user is by definition a
trusted entity in any Unix system; Microsoft tried to break this
assumption in Windows, but I think they gave up in the end. The "root"
user will normally carry more authority[1] than normal users, but it
should not need absolute authority as it does in conventional operating
systems.

> In the end, the only people that should be trusted with the data are
> the users who own the data. Each user will explicitly grant access
> to their data to program administrators; the software will simply
> facilitate the process.

OK, but more definitions are required to understand what you mean by
this statement. You don't need to send them all here, but if I was
designing something similar to what you are, I'd want to know what I was
up against.

> We put as many layers as we can everywhere. Usually this is limited
> when we start hitting performance issues and have to remove a layer
> to keep performance where it needs to be. The goal is to have as
> many layers as possible while keeping the system as performant as
> the client expects.

I, personally, wouldn't recommend doing things this way. I'd define
where you put your trust and design around that. You'll get a much
stronger system, and you'll know where to invest time in protecting it,
because you know what you trust and that if those items are compromised
everything falls down. If you don't know what you're trusting, you don't
know where to expend the effort.

> We only get one shot at this. If there's a data leak, a lot of
> people are going to be very upset and we're going to be out of
> business, so we're implementing the tightest security possible
> at every layer. This thread is only one part of the overall
> process as it specifically relates to the database layer.
Yes, that sounds reasonable and to be expected.

Hope that's all somewhat helpful!

--
  Sam  http://samason.me.uk/

[1] I'm using "authority" in a technical sense, meaning the set of
actions that an entity can cause to occur, directly or indirectly by
talking to other actors in the system. Strictly speaking, from a
security perspective there is no difference between a normal user and
root in a UNIX system, because of the presence of various privilege
escalating mechanisms (i.e. the set-uid bit on executables). In practice
it's OK, but not safe, to assume that the system is configured in a way
that this doesn't occur and that any occurrences are "bugs".