Re: Anonymized database dumps - Mailing list pgsql-general

From Bill Moran
Subject Re: Anonymized database dumps
Msg-id 20120319175527.5eb99af57c91b932f561f97a@potentialtech.com
In response to Re: Anonymized database dumps  (Kiriakos Georgiou <kg.postgresql@olympiakos.com>)
Responses Re: Anonymized database dumps
List pgsql-general
In response to Kiriakos Georgiou <kg.postgresql@olympiakos.com>:

> The data anonymizer process is flawed because you are one misstep away from data spillage.

In our case, it's only one layer.

Other layers that exist:
* The systems where this test data is instantiated can't send email
* The systems where this data exists have limited access (i.e., not all
  developers can access it, and it's not used for typical testing --
  only for specific testing that requires production-like data)

You are correct, however, in that there's always the danger of
spillage if new sensitive data is added and the sanitation script
is not properly updated.  It's part of the ongoing overhead of
maintaining such a system.
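
One way to reduce that overhead is to make the sanitation step refuse to
run when the schema contains a column nobody has classified. The following
is only a minimal sketch of that idea, not the script discussed in this
thread: the table names, column names, and the CLASSIFICATION mapping are
all hypothetical.

```python
# Hypothetical guard: every column must be classified before a dump is sanitized.
# All table and column names below are illustrative assumptions.

SENSITIVE = "sensitive"  # must be scrubbed by the sanitation script
SAFE = "safe"            # may be copied to the test system as-is

# Hypothetical classification, maintained alongside the sanitation script.
CLASSIFICATION = {
    ("users", "id"): SAFE,
    ("users", "email"): SENSITIVE,
    ("users", "created_at"): SAFE,
}


def unclassified_columns(schema_columns, classification):
    """Return the (table, column) pairs that have no classification yet."""
    return sorted(col for col in schema_columns if col not in classification)


def check_schema(schema_columns, classification=CLASSIFICATION):
    """Raise if the schema has a column the sanitation script does not know about."""
    missing = unclassified_columns(schema_columns, classification)
    if missing:
        names = ", ".join(f"{table}.{column}" for table, column in missing)
        raise RuntimeError(f"unclassified columns, refusing to sanitize: {names}")
```

In practice the list of (table, column) pairs could be read from
information_schema.columns on the production schema, so a newly added
column stops the dump until someone decides whether it is sensitive.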

> Sensitive data should be stored encrypted to begin.  For test databases you or your developers can invoke a process
> that replaces the real encrypted data with fake encrypted data (for which everybody has the key/password.)  Or if the
> overhead is too much (ie billions of rows), you can have different decrypt() routines on your test databases that return
> fake data without touching the real encrypted columns.

The thing is, this process has the same potential data spillage
issues as sanitizing the data.  I find it intriguing, however, and
I'm going to see if there are places where this approach might
have advantages over our current one.

Since much of our sensitive data is already de-identified, it
provides an additional layer of protection at that level as well.
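
For illustration, the "different decrypt() routine" idea quoted above might
look roughly like the sketch below. It assumes the stand-in only needs to
return plausible, stable values; the function name fake_decrypt and the
FAKE_NAMES pool are hypothetical and do not come from the thread.

```python
import hashlib

# Hypothetical pool of obviously fake values (an assumption for this sketch).
FAKE_NAMES = ["Alice Example", "Bob Sample", "Carol Placeholder", "Dave Testcase"]


def fake_decrypt(ciphertext: bytes) -> str:
    """Test-environment stand-in for decrypt(): uses no key, returns no real data.

    The same ciphertext always maps to the same fake value, so repeated
    lookups and joins stay consistent within a test database.
    """
    digest = hashlib.sha256(ciphertext).digest()
    index = int.from_bytes(digest[:4], "big") % len(FAKE_NAMES)
    return FAKE_NAMES[index]
```

Note that this only protects columns that are actually stored encrypted;
a new sensitive column added in plaintext would still reach the test
system, which is the same spillage risk described above.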

--
Bill Moran
http://www.potentialtech.com
http://people.collaborativefusion.com/~wmoran/