Re: Anonymized database dumps - Mailing list pgsql-general

From Marko Kreen
Subject Re: Anonymized database dumps
Date
Msg-id 20120319094116.GA14674@gmail.com
In response to Re: Anonymized database dumps  (hari.fuchs@gmail.com)
List pgsql-general
On Mon, Mar 19, 2012 at 10:12:01AM +0100, hari.fuchs@gmail.com wrote:
> Janning Vygen <vygen@kicktipp.de> writes:
> > pgcrypto does not work for this scenario as far as i know.
> >
> > pgcrypto enables me to encrypt my data and let only a user with the
> > right password (or key or whatever) decrypt it, right? So if i run it
> > in a test environment without this password the application is broken.
> >
> > I still want to use these table columns in my test environment but
> > instead of real email addresses i want addresses like
> > random_number@example.org.
> >
> > You might be right that it is a good idea to additionally encrypt this data.
>
> Maybe you could change your application so that it doesn't access the
> critical tables directly and instead define views for them which, based
> on current_user, either do decryption or return random strings.

Encryption is the wrong tool for "anonymization".

The right tool is hmac(), which gives you a one-way hash that
is protected by a key, which means the other side can't even
calculate the hashes unless they have the same key.

You can calculate it with pgcrypto when dumping,
or later when post-processing the dumps.
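
As a rough sketch of the post-processing route, here is the same keyed
hash computed in Python. pgcrypto's hmac(data, key, type) computes the
standard HMAC, so values hashed at dump time with the same key and
algorithm should match. The function name pseudonymize, the key value
and the choice of SHA-256 are assumptions for illustration only.

```python
import hashlib
import hmac


def pseudonymize(value: str, key: bytes) -> str:
    """Return the HMAC-SHA256 of value under key, as a hex string."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


# Hypothetical key: keep the real one secret and out of the test environment.
SECRET_KEY = b"change-me"

if __name__ == "__main__":
    # The same input and key always give the same 64-character hex digest,
    # so joins and uniqueness constraints on the column keep working.
    print(pseudonymize("someone@example.com", SECRET_KEY))
```

Without the key, the other side cannot recompute the hash for a guessed
address, which is the property a plain unkeyed hash lacks.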

But it produces random-looking values; if you need something
realistic-looking, you need custom mapping logic.
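
One possible shape for such mapping logic, sketched in Python: derive a
number from the HMAC and build an address of the form
random_number@example.org, as asked for earlier in the thread. The
function name fake_email, the use of the first 8 digest bytes and the
SHA-256 choice are assumptions, not something prescribed here.

```python
import hashlib
import hmac


def fake_email(real_email: str, key: bytes) -> str:
    """Map a real address to a stable, fake <number>@example.org address."""
    digest = hmac.new(key, real_email.encode("utf-8"), hashlib.sha256).digest()
    # Assumption: 8 bytes (64 bits) of the digest is enough to make
    # collisions unlikely for a typical table size.
    number = int.from_bytes(digest[:8], "big")
    return f"{number}@example.org"


if __name__ == "__main__":
    # Hypothetical key and input, for illustration only.
    print(fake_email("someone@example.com", b"change-me"))
```

The mapping is deterministic, so the same real address always becomes
the same fake one across tables and across dumps made with the same key.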

--
marko

