Re: Anonymized database dumps - Mailing list pgsql-general

From Janning Vygen
Subject Re: Anonymized database dumps
Msg-id 4F66E9CD.80702@kicktipp.de
In response to Re: Anonymized database dumps  (Kiriakos Georgiou <kg.postgresql@olympiakos.com>)
List pgsql-general
pgcrypto does not work for this scenario as far as I know.

pgcrypto lets me encrypt my data so that only a user with the
right password (or key or whatever) can decrypt it, right? So if I run
the application in a test environment without this password, it breaks.

I still want to use these table columns in my test environment, but
instead of real email addresses I want addresses like
random_number@example.org.
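A minimal sketch of that replacement, assuming a hypothetical table `users` with a unique integer `id` and an `email` column. Python's built-in sqlite3 module is used only so the snippet runs stand-alone; the UPDATE statement itself should also be valid PostgreSQL and would be run against the test copy. Deriving the address from the primary key instead of a random number keeps the results unique, so a UNIQUE constraint on the email column still holds.

```python
import sqlite3

# In-memory stand-in for the test copy; table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT UNIQUE)")
conn.executemany(
    "INSERT INTO users (id, email) VALUES (?, ?)",
    [(1, "alice@real.com"), (2, "bob@real.com"), (3, None)],
)

# The anonymizing statement: build the address from the primary key so every
# row gets a distinct fake address; rows without an address keep their NULL.
ANONYMIZE_SQL = "UPDATE users SET email = id || '@example.org' WHERE email IS NOT NULL"
conn.execute(ANONYMIZE_SQL)
conn.commit()

rows = conn.execute("SELECT id, email FROM users ORDER BY id").fetchall()
print(rows)
```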

You might be right that it is a good idea to additionally encrypt this data.

regards
Janning

On 19.03.2012 06:24, Kiriakos Georgiou wrote:
> I would store sensitive data encrypted in the database.  Check the pgcrypto module.
>
> Kiriakos
>
>
> On Mar 18, 2012, at 1:00 PM, Janning Vygen wrote:
>
>> Hi,
>>
>> I am working on PostgreSQL 9.1 and loving it!
>>
>> Sometimes we need a full database dump to test some performance issues with real data.
>>
>> Of course we don't want sensitive data like bunches of e-mail addresses on our development machines, as they are of no interest to developers and should be kept secure.
>>
>> So we need an anonymized database dump. I thought about a few ways to achieve this.
>>
>> 1. The best solution would be a special db user and some rules which fire on reading some tables and replace private data with random data. A dump made as this special user would then not copy the sensitive data at all. The user just has a different view of this database, even when he calls pg_dump.
>>
>> But as rules are not fired on COPY, it can't work, right?
>>
>> 2. The other solution I can think of is something like
>>
>> pg_dump | sed > pgdump_anon
>>
>> where 'sed' does a lot of magical replace operations on the content of the dump. I don't think this is going to work reliably.
>>
>> 3. More reliable would be to dump the database, restore it on a different server, run some SQL script which randomizes some data, and dump it again. Hmm, this seems to be the only reliable way so far. But it is no fun when dumping and restoring takes an hour.
>>
>> Does anybody have a better idea how to achieve an anonymized database dump?
>>
>> regards
>> Janning
>>
>> --
>> Kicktipp GmbH
>>
>> Venloer Straße 8, 40477 Düsseldorf
>> Sitz der Gesellschaft: Düsseldorf
>> Geschäftsführung: Janning Vygen
>> Handelsregister Düsseldorf: HRB 55639
>>
>> http://www.kicktipp.de/
>
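Option 2 in the quoted message is fragile with sed because sed only sees lines of text, not tables and columns. A somewhat more robust variant of the same idea is a small filter that understands the COPY blocks of a plain-format pg_dump: it reads the column list from each `COPY ... FROM stdin;` header and rewrites only the chosen column until the `\.` end marker. This is a sketch under stated assumptions: the table name `users`, the column name `email` and the replacement pattern are hypothetical; it only handles the plain (SQL script) output format with COPY data, not dumps made with `--inserts`; and it relies on COPY's text format escaping tabs inside values as `\t`, so splitting rows on real tab characters is safe.

```python
import re
import sys

# Hypothetical target: which column to scrub in which table.
TABLE = "users"
COLUMN = "email"

# Matches plain-format COPY headers such as
#   COPY users (id, email) FROM stdin;
#   COPY public.users (id, email) FROM stdin;
COPY_RE = re.compile(r'^COPY (?:"?\w+"?\.)?"?(\w+)"? \((.*)\) FROM stdin;$')


def anonymize_dump(lines):
    """Yield the dump line by line, replacing COLUMN values inside TABLE's COPY block."""
    col_idx = None  # index of COLUMN while inside the target COPY block, else None
    row_no = 0
    for line in lines:
        if col_idx is None:
            m = COPY_RE.match(line.rstrip("\n"))
            if m and m.group(1) == TABLE:
                cols = [c.strip().strip('"') for c in m.group(2).split(",")]
                if COLUMN in cols:
                    col_idx = cols.index(COLUMN)
            yield line
            continue
        if line.rstrip("\n") == "\\.":
            # The end-of-data marker closes the COPY block.
            col_idx = None
            yield line
            continue
        # Fields are separated by real tab characters; tabs inside values are
        # escaped as "\t" in COPY text format, so a plain split is safe.
        fields = line.rstrip("\n").split("\t")
        row_no += 1
        if fields[col_idx] != "\\N":  # keep NULLs as NULL
            fields[col_idx] = f"{row_no}@example.org"
        yield "\t".join(fields) + "\n"


def main():
    # Stream stdin to stdout so the filter can sit in a shell pipeline.
    sys.stdout.writelines(anonymize_dump(sys.stdin))
```

Saved as a script that calls `main()` at the bottom (for example a hypothetical `anonymize_dump.py`), it would slot into the pipeline as `pg_dump mydb | python anonymize_dump.py > pgdump_anon`, avoiding the restore-and-redump round trip of option 3.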