Hi,
On Sat, Apr 22, 2023 at 12:53:23PM -0400, Isaac Morland wrote:
>
> I have an application for this: creating various dev/test versions of data
> from production.
>
> Start by restoring a copy of production from backup. Then successively
> create several altered versions of the data and save them to a place where
> developers can pick them up. For example, you might have one version which
> has all data old than 1 year deleted, and another where 99% of the
> students/customers/whatever are deleted. Anonymization could also be
> applied. This would give you realistic (because it ultimately originates
> from production) test data.
>
> This could be done by starting a non-committable transaction, making the
> adjustments, then doing a pg_dump in the same transaction (using --snapshot
> to allow it to see that transaction). Then rollback, and repeat for the
> other versions. This saves repeatedly restoring the (probably very large)
> production data each time.
There already are tools to handle those use cases. Looks for instance at
https://github.com/mla/pg_sample to backup a consistent subset of the data, or
https://github.com/rjuju/pg_anonymize to transparently pg_dump (or
interactively query) anonymized data.
Both tool also works when connected on a physical standby, while trying to
update data before dumping them wouldn't.