Thread: automated refresh of dev from prod

automated refresh of dev from prod

From
Julie Nishimura
Date:
Hello everybody, I am new to the PostgreSQL environment, but trying to get up to speed.
Can you please share your experience on how to automate refreshing the dev environment on a regular basis (desirably weekly), taking into consideration that some of the prod DBs can be very large (like 20+ TB)?

Any suggestions?

Thank you!

RE: automated refresh of dev from prod

From
Scot Kreienkamp
Date:

My method is complex and not ideal for newbies, but it is incredibly fast and should scale to almost any size of database. Mine are not nearly as large, though.

 

I use two methods… the normal backup/restore for longer-lived development environments, and for shorter-lived environments I use Postgres native mirroring from a secondary prod server to all my dev environments, then use LVM snapshots to take a snapshot of the Postgres mount. Mount the snapshot and start up a second Postgres instance in it, and you have a mirror of production ready for use. That only lasts for a finite amount of time, though (until you fill the space dedicated to the snapshot) before it becomes unusable; that's the downside… it can't be long-lived (hours, 1-2 days maybe). The upside is that the refresh from production in my environment for a 400G database takes 3 seconds. It is a trade-off and not fit for every use; that's why we also use the traditional backup/restore in some cases.
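A rough sketch of that snapshot flow is below. The volume names, sizes, and paths are placeholders for illustration, not Scot's actual setup:

```shell
# Assumed layout: the replica's data directory lives on LVM volume /dev/vg0/pgdata.
# 1. Snapshot the volume; the snapshot only needs space for blocks that
#    change after it is taken (this is the space that eventually fills up).
lvcreate --snapshot --size 50G --name pgdata_snap /dev/vg0/pgdata

# 2. Mount the snapshot at a separate mount point.
mkdir -p /mnt/pgdata_snap
mount /dev/vg0/pgdata_snap /mnt/pgdata_snap

# 3. Remove the replication config from the copy so it comes up as a
#    standalone primary, then start a second instance on its own port.
rm -f /mnt/pgdata_snap/recovery.conf
pg_ctl -D /mnt/pgdata_snap -o "-p 5433" start

# When dev is done (or the snapshot space fills), tear it down:
pg_ctl -D /mnt/pgdata_snap stop
umount /mnt/pgdata_snap
lvremove -f /dev/vg0/pgdata_snap
```

The refresh is fast because nothing is copied: the snapshot is copy-on-write, and the new instance simply performs crash recovery on what looks like a cleanly snapshotted data directory.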

 

 

Scot Kreienkamp | Senior Systems Engineer | La-Z-Boy Corporate
One La-Z-Boy Drive | Monroe, Michigan 48162 | Office: 734-384-6403 | Mobile: 734-915-1444 | Email: Scot.Kreienkamp@la-z-boy.com

From: Julie Nishimura [mailto:juliezain@hotmail.com]
Sent: Wednesday, February 27, 2019 4:16 PM
To: pgsql-general@postgresql.org
Subject: automated refresh of dev from prod

 



Hello everybody, I am new to the PostgreSQL environment, but trying to get up to speed.

Can you please share your experience on how to automate refreshing the dev environment on a regular basis (desirably weekly), taking into consideration that some of the prod DBs can be very large (like 20+ TB)?

 

Any suggestions?

 

Thank you!


Re: automated refresh of dev from prod

From
Ron
Date:
On 2/27/19 3:15 PM, Julie Nishimura wrote:
Hello everybody, I am new to the PostgreSQL environment, but trying to get up to speed.
Can you please share your experience on how to automate refreshing the dev environment on a regular basis (desirably weekly), taking into consideration that some of the prod DBs can be very large (like 20+ TB)?

Any suggestions?

Weekly refreshes of 20TB from prod to dev seems a bit excessive.


--
Angular momentum makes the world go 'round.

Re: automated refresh of dev from prod

From
Stephen Frost
Date:
Greetings,

* Julie Nishimura (juliezain@hotmail.com) wrote:
> Hello everybody, I am new to postgresql environment, but trying to get up to speed.
> Can you please share your experience on how you can automate refreshment of dev environment on regular basis
> (desirably weekly), taking for consideration some of prod dbs can be very large (like 20+ TB)
>
> Any suggestions?

The approach that I like to recommend is to have your backup/restore
solution be involved in this refreshing process, so that you're also
testing that your backup/restore process works correctly.  For dealing
with larger databases, using a backup tool which has parallel backup,
parallel restore, and is able to restore just the files which are
different from the backup can make the restore take much less time (this
is what the 'delta-restore' option in pgbackrest does, and it was
specifically written to support exactly this kind of prod->dev periodic
refresh, though other tools may also support that these days).
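For example, a scheduled dev refresh using pgBackRest's delta restore might look roughly like this; the stanza name and data directory path are assumptions for illustration:

```shell
# Stop the dev cluster before restoring over its data directory.
pg_ctl -D /var/lib/pgsql/11/data stop

# --delta checksums the files already present in the data directory and
# rewrites only those that differ from the backup, instead of copying
# the whole multi-TB cluster on every refresh.
pgbackrest --stanza=prod --delta restore

# Start dev back up; PostgreSQL replays WAL to reach a consistent point.
pg_ctl -D /var/lib/pgsql/11/data start
```

Because most blocks in a 20+ TB database typically don't change between weekly refreshes, the delta restore touches only a small fraction of the data, and each refresh doubles as a test that the backups actually restore.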

As mentioned elsewhere on this thread, using snapshots can also be a
good approach though you have to be sure that the snapshot is completely
atomic across all filesystems that PostgreSQL is using, or you have to
deal with running pg_start/stop_backup and putting a backup_label into
place for the restored snapshot and a recovery.conf to provide a way for
PG to get at any WAL which was generated while the snapshot (or
snapshots) was being taken.
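For the non-atomic snapshot case, the sequence is roughly as follows. Paths, the snapshot step, and the restore_command are placeholders, and the syntax matches the pre-v12 recovery.conf era this thread refers to:

```shell
# 1. Tell PostgreSQL a base backup is starting (non-exclusive mode).
psql -c "SELECT pg_start_backup('dev-refresh', false, false);"

# 2. Snapshot every filesystem the cluster uses: the data directory,
#    each tablespace, and pg_wal. (The snapshot commands depend on
#    your storage layer.)

# 3. Finish the backup; this returns the backup_label contents, which
#    must be saved as 'backup_label' in the restored snapshot's data dir.
psql -c "SELECT labelfile FROM pg_stop_backup(false);"

# 4. On the restored snapshot, add a recovery.conf so PG can fetch the
#    WAL that was generated while the snapshots were being taken.
cat > /restore/pgdata/recovery.conf <<'EOF'
restore_command = 'cp /wal_archive/%f %p'
EOF
```

If the snapshot is truly atomic across all filesystems, none of this is needed: the restored copy looks like a crashed server and recovers on its own.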

Thanks!

Stephen


Re: automated refresh of dev from prod

From
Ben Chobot
Date:
On Feb 28, 2019, at 8:04 AM, Stephen Frost <sfrost@snowman.net> wrote:
>
> Greetings,
>
> * Julie Nishimura (juliezain@hotmail.com) wrote:
>> Hello everybody, I am new to postgresql environment, but trying to get up to speed.
>> Can you please share your experience on how you can automate refreshment of dev environment on regular basis
>> (desirably weekly), taking for consideration some of prod dbs can be very large (like 20+ TB)
>>
>> Any suggestions?
>
> The approach that I like to recommend is to have your backup/restore
> solution be involved in this refreshing process, so that you're also
> testing that your backup/restore process works correctly.  For dealing
> with larger databases, using a backup tool which has parallel backup,
> parallel restore, and is able to restore just the files which are
> different from the backup can make the restore take much less time (this
> is what the 'delta-restore' option in pgbackrest does, and it was
> specifically written to support exactly this kind of prod->dev periodic
> refresh, though other tools may also support that these days).
>
> As mentioned elsewhere on this thread, using snapshots can also be a
> good approach though you have to be sure that the snapshot is completely
> atomic across all filesystems that PostgreSQL is using, or you have to
> deal with running pg_start/stop_backup and putting a backup_label into
> place for the restored snapshot and a recovery.conf to provide a way for
> PG to get at any WAL which was generated while the snapshot (or
> snapshots) was being taken.

Very much yes to everything Stephen says. Regularly refreshing nonprod via your normal backup/restore process is an efficient way to test your backups, and snapshots are a great way to do backups when your data volume is greater than your churn between backups. (And at 20+ TB, I hope that's the case for you.)