Re: Run statements before pg_dump in same transaction? - Mailing list pgsql-general

From Greg Sabino Mullane
Subject Re: Run statements before pg_dump in same transaction?
Date
Msg-id 8a622e83cd86f2d35a79f4f8d5c65cd8@biglumber.com
Whole thread Raw
In response to [GENERAL] Run statements before pg_dump in same transaction?  (François Beausoleil <francois@teksol.info>)
List pgsql-general
-----BEGIN PGP SIGNED MESSAGE-----
Hash: RIPEMD160


François Beausoleil asked:
> To that end, we'd like to anonymize the data before it leaves the database server.
>
> One solution we thought of would be to run statements prior to pg_dump, but within
> the same transaction, something like this:
>
> BEGIN;
> UPDATE users SET email = 'dev+' || id || '@example.com', password_hash = '/* hash of "password" */', ...;
> -- launch pg_dump as usual, ensuring a ROLLBACK at the end
...
> Is there a ready-made solution for this?

No - at least not with generating a dump and scrubbing *before* the data comes
out. Some other ideas:

* Periodically do a full dump to another database under your control, sanitize the
data, and make all dev dumps come from *that* database. Process roughly becomes:

* pg_dump herokudb | psql dev_temp
* <sanitize dev_temp>
* drop existing dev_old; rename devdb to dev_old; rename dev_temp to devdb
* Devs can pg_dump devdb at will

That still moves your sensitive data to another server though, even temporarily.
Another approach is to use the -T flag of pg_dump to exclude certain tables.
Make modified copies of them on the server, then rename them after the dump
(or simply put them in a new namespace):

* (create a dev.users identical (including indexes, etc.) to public.users)
* truncate table dev.users;
* insert into dev.users select * from public.users;
* update dev.users set email = 'dev' + || ...etc. <e.g. sanitize data>
* pg_dump -d <herokus DATABASE_URL> -T public.users > devs_use_this.pg

Then dev could do: set schema = dev, public;

Or you could simply move the sanitized table back:

alter table dev.users set schema public;

If you are going to rename, it may be simpler to not make an identical copy
of the affected tables (i.e. with indexes) but just a data-only copy:

create table dev.users as select * from public.users;
<sanitize dev.users>
pg_dump herokudb --schema-only | psql devdb
pg_dump herokudb --data-only -T public.users | psql devdb
psql devdb -c 'insert into public.users select * from dev.users'


- --
Greg Sabino Mullane greg@turnstep.com
End Point Corporation http://www.endpoint.com/
PGP Key: 0x14964AC8 201703240911
http://biglumber.com/x/web?pk=2529DF6AB8F79407E94445B4BC9B906714964AC8
-----BEGIN PGP SIGNATURE-----

iEYEAREDAAYFAljVHHsACgkQvJuQZxSWSshUbgCg7TzCkAzT4wKoKd5/2rruzLte
TJcAoI7AvGdGzlNp5b3N+LFJ9DWIZ8/C
=7heB
-----END PGP SIGNATURE-----




pgsql-general by date:

Previous
From: lin
Date:
Subject: [GENERAL] postgres source code function "internal_ping" may be not right insome conditions
Next
From: Tom Lane
Date:
Subject: Re: postgres source code function "internal_ping" may be not right in some conditions