Re: [FEATURE REQUEST] Encrypted indexes over encrypted data - Mailing list pgsql-hackers

From Bear Giles
Subject Re: [FEATURE REQUEST] Encrypted indexes over encrypted data
Date
Msg-id CALBNtw4c=MRPXfCPeEnFnb44_5pBOBGYFpp7THj0QLKyaSBc9A@mail.gmail.com
Whole thread Raw
In response to Re: [FEATURE REQUEST] Encrypted indexes over encrypted data  (Andres Freund <andres@anarazel.de>)
List pgsql-hackers
There are alternatives. If you know what you want to find, e.g., a search by username or email address, you can store a strong hash of the value as an indexed column. By "strong hash" I mean don't just use md5 or sha1, or even one round with a salt. I can give you more details about how and why offline.

So you might have a record with:

   id serial primary key,
   email_hash text not null indexed,
   first_name_hash text indexed,
   last_name_hash text indexed,
   phone_number_hash text indexed ,
   'wallet'_containing_all_encrypted_values text

and that allows you to search on email, first name, last name, or phone number, or some combination on them. But no expressions. The hashing would be done in your app, not the database. You also probably want to convert everything to lowercase, maybe remove spaces, etc., before computing the hash.

You should be prepared to handle multiple matches. It's unlikely that an email or phone number hash won't be unique but it's safest to always be prepared for more than one match, decrypt the 'wallet', and then do a final comparison. That also gives you a bit of protection from an attacker creating an account and then changing the hash values to match someone else. You can use that to support very limited expressions, e.g., also keep a hash on the first three letters of their last name, but that will compromise your security a bit since it allows an attacker to perform some statistical analysis on the data.

Finally there's the general advice that hashes (and encrypted values) should always have a version number of some sort. It could be something as simple as 3$hash, or it could be a composite column or even a user-defined type. The # indicates is a lookup into a table, perhaps in your app, that tells you which hashing algorithm and salt to use. It makes life a lot easier if the security audit tells you that you need to change your cipher/salt/key/whatever but you can't do it immediately since you don't know everything you need in order to do it, e.g., the password that you need in order to recompute the hash value. With that version number it's easy to continue to accept the existing password so they can log in, and in the background you quietly recompute the hash using the new salt/algorithm/whatever and update their record. I've worked for some pretty knowledgeable companies that have overlooked this.

On Thu, Aug 9, 2018 at 6:05 AM, Andres Freund <andres@anarazel.de> wrote:


On August 9, 2018 5:30:26 PM GMT+05:30, Danylo Hlynskyi <abcz2.uprola@gmail.com> wrote:
> ?Is it hard to implement soluition 2?

Yes.

To the point that I'm fairly certain that an implementation would be considered to costly to maintain (vs benefit) of proposed.

Andres

--
Sent from my Android device with K-9 Mail. Please excuse my brevity.


pgsql-hackers by date:

Previous
From: Andrew Dunstan
Date:
Subject: Re: [HACKERS] proposal - Default namespaces for XPath expressions(PostgreSQL 11)
Next
From: Tom Lane
Date:
Subject: Re: libpq should not look up all host addresses at once