Re: [GENERAL] Run statements before pg_dump in same transaction? - Mailing list pgsql-general

From Adrian Klaver
Subject Re: [GENERAL] Run statements before pg_dump in same transaction?
Date
Msg-id 8d8a3652-032a-b89d-40ca-c5f7c7ab44b6@aklaver.com
Whole thread Raw
In response to [GENERAL] Run statements before pg_dump in same transaction?  (François Beausoleil <francois@teksol.info>)
List pgsql-general
On 03/23/2017 12:06 PM, François Beausoleil wrote:
> Hi all!
>
> For development purposes, we dump the production database to local. It's fine because the DB is small enough. The
company'sgrowing and we want to reduce risks. To that end, we'd like to anonymize the data before it leaves the
databaseserver. 
>
> One solution we thought of would be to run statements prior to pg_dump, but within the same transaction, something
likethis: 
>
> BEGIN;
> UPDATE users SET email = 'dev+' || id || '@example.com', password_hash = '/* hash of "password" */', ...;
> -- launch pg_dump as usual, ensuring a ROLLBACK at the end
> -- pg_dump must run with the *same* connection, obviously
>
> -- if not already done by pg_dump
> ROLLBACK;
>
> Is there a ready-made solution for this? Our DB is hosted on Heroku, and we don't have 100% flexibility in how we
dump.

The only thing I could find is:

https://devcenter.heroku.com/articles/heroku-postgres-backups

Direct database-to-database copies

So copy your production server to a second server on Heroku, anonymize
the data on the second server and then dump that data.

>
> I searched for "postgresql anonymize data dump before download"[1] and variations, but I didn't see anything highly
relevant.
>
> Thanks!
> François
>
> PS: Cross-posted to http://dba.stackexchange.com/q/168023/3935
>
>   [1]: https://duckduckgo.com/?q=postgresql+anonymize+data+dump+before+download
>
>
>


--
Adrian Klaver
adrian.klaver@aklaver.com


pgsql-general by date:

Previous
From: Adrian Klaver
Date:
Subject: Re: [GENERAL] audit function and old.column
Next
From: John DeSoi
Date:
Subject: Re: [GENERAL] pg_last_xact_replay_timestamp() sometimes reports unlikely, very large delays