Thread: Fastest way to duplicate a quite large database
Hi!

I have a database "Customer" with about 60 GB of data. I know I can back up and restore, but this seems too slow.

Is there any other option to duplicate this database as "CustomerTest" as fast as possible (even faster than backup/restore) - better if in one operation (something like "copy database A to B")?

I would like to run this every day, overnight, with minimal impact, to prepare a test environment based on production data.

Thanks,

--
Kind regards,

Edson Carlos Ericksson Richter
On 4/12/2016 7:25 AM, Edson Richter wrote:
>
> I have a database "Customer" with about 60 GB of data.
> I know I can back up and restore, but this seems too slow.
>
> Is there any other option to duplicate this database as "CustomerTest"
> as fast as possible (even faster than backup/restore) - better if in
> one operation (something like "copy database A to B")?
> I would like to run this every day, overnight, with minimal impact to
> prepare a test environment based on production data.

copy to the same machine, or copy to a different test server? different answers.

--
john r pierce, recycling bits in santa cruz
Same machine, same cluster - just different database name.

Kind regards,

Edson Carlos Ericksson Richter

On 12/04/2016 11:46, John R Pierce wrote:
> On 4/12/2016 7:25 AM, Edson Richter wrote:
>>
>> I have a database "Customer" with about 60 GB of data.
>> I know I can back up and restore, but this seems too slow.
>>
>> Is there any other option to duplicate this database as
>> "CustomerTest" as fast as possible (even faster than backup/restore)
>> - better if in one operation (something like "copy database A to B")?
>> I would like to run this every day, overnight, with minimal impact to
>> prepare a test environment based on production data.
>
> copy to the same machine, or copy to a different test server?
> different answers.
> Hi!
>
> I have a database "Customer" with about 60 GB of data.
> I know I can back up and restore, but this seems too slow.
>
> Is there any other option to duplicate this database as "CustomerTest" as fast as possible (even faster than backup/restore) - better if in one operation (something like "copy database A to B")?
>
> I would like to run this every day, overnight, with minimal impact to prepare a test environment based on production data.
Hum, I don't know exactly how to do it, but on Linux you could put the "Customer" database in a tablespace which resides on a BTRFS filesystem. BTRFS can do a quick "snapshot" of the filesystem, and you can then set things up for "incremental backup", as talked about here: https://btrfs.wiki.kernel.org/index.php/Incremental_Backup . From some reading, though, btrfs is reportedly a performance dog compared to other filesystems.
An interesting take on using various filesystems for PostgreSQL: http://www.slideshare.net/fuzzycz/postgresql-on-ext4-xfs-btrfs-and-zfs

Another on btrfs + PostgreSQL: http://www.cybertec.at/2015/01/forking-databases-the-art-of-copying-without-copying/
<quote from above>
...
So we managed to fork a 15 GB database in 6 seconds with only a small hiccup in performance. We are ready to start up the forked database.
...
</quote>
I got a number of hits searching on "postgresql btrfs" using Google search.
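Purely as an illustration of the snapshot idea (a sketch, not a tested recipe): it assumes the cluster's entire data directory, WAL included, sits on one btrfs subvolume, and the paths and host name below are invented.

    # Assumed layout: /srv/pg/main is a btrfs subvolume holding the whole data directory.
    SNAP=/srv/pg/customer_snap_$(date +%F)

    # Copy-on-write snapshot: near-instant regardless of database size. Taken
    # against a running cluster it is only crash-consistent, so a server started
    # from it will perform crash recovery first.
    sudo btrfs subvolume snapshot -r /srv/pg/main "$SNAP"

    # Optionally ship it to another box, per the btrfs wiki article
    # (add -p <previous-snapshot> to make the transfer incremental).
    sudo btrfs send "$SNAP" | ssh root@testhost "btrfs receive /srv/pg/"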
How many surrealists does it take to screw in a lightbulb? One to hold the griffon and one to fill the bathtub with brightly colored LEDs.
Maranatha! <><
John McKown
> On Apr 12, 2016, at 10:51 AM, Edson Richter <edsonrichter@hotmail.com> wrote:
>
> Same machine, same cluster - just different database name.
>
> [...]

Not sure how fast is "fast" for your system. You could try:

    create database customer_test with template customer;

I'm able to duplicate a 20 GB database in a couple of minutes with the above command. A couple of caveats:

1. No active connections to customer are allowed during the create.
2. You'll likely have to recreate the search_path and reissue connect grants to the newly created database.
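For illustration, the whole overnight refresh could be scripted along these lines. This is only a sketch under a few assumptions: it runs as a superuser, and the "app_user" role standing in for whoever needs connect rights is invented, not something from this thread.

    #!/bin/bash
    # Refresh CustomerTest from Customer using CREATE DATABASE ... TEMPLATE.
    set -e

    # Assumes nothing is connected to the test copy overnight.
    psql -d postgres -c 'DROP DATABASE IF EXISTS "CustomerTest"'

    # Caveat 1: the template must have no other connections. Block new ones
    # (ALLOW_CONNECTIONS needs PostgreSQL 9.5+) and terminate existing ones.
    psql -d postgres -c 'ALTER DATABASE "Customer" ALLOW_CONNECTIONS false'
    psql -d postgres -c "SELECT pg_terminate_backend(pid) FROM pg_stat_activity
                          WHERE datname = 'Customer' AND pid <> pg_backend_pid()"

    # The copy itself: a file-level copy of the whole database, no dump/restore.
    psql -d postgres -c 'CREATE DATABASE "CustomerTest" TEMPLATE "Customer"'

    # Let production connections back in.
    psql -d postgres -c 'ALTER DATABASE "Customer" ALLOW_CONNECTIONS true'

    # Caveat 2: re-issue connect grants (and any search_path setting) on the copy.
    psql -d postgres -c 'GRANT CONNECT ON DATABASE "CustomerTest" TO app_user'

On releases before 9.5, ALTER DATABASE ... CONNECTION LIMIT 0 plus terminating existing backends is a rough substitute for the ALLOW_CONNECTIONS step (it does not stop superusers, though).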
On 04/12/2016 07:51 AM, Edson Richter wrote:
> Same machine, same cluster - just different database name.

Hmm, running tests against the same cluster that is running the production database would seem to be a performance hit against the production database, and potentially dangerous should the tests trip a bug that crashes the server.

--
Adrian Klaver
adrian.klaver@aklaver.com
On 4/12/2016 7:55 AM, John McKown wrote:
> Hum, I don't know exactly how to do it, but on Linux you could put
> the "Customer" database in a tablespace which resides on a BTRFS
> filesystem. BTRFS can do a quick "snapshot" of the filesystem....

except, tablespaces aren't standalone, and there's no provision for importing the contents of a tablespace. all the metadata remains in the default tablespace, which leaves all sorts of room for problems if you do this.

the /best/ way to achieve what the OP is asking for would likely be to run the tests on a separate server (or at least a separate postgres instance, aka cluster), and use pg_basebackup to rebuild this test instance.

--
john r pierce, recycling bits in santa cruz
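For anyone who wants a concrete starting point for that route, a rough sketch; the host, directories, port and role name are all invented examples, not anything from this thread.

    # Rebuild a throwaway test instance from the production cluster.
    rm -rf /srv/pg/test_data

    # Physical copy of the whole cluster plus the WAL needed to start it.
    # Requires a role with REPLICATION privilege and a pg_hba.conf entry for it.
    pg_basebackup -h prod-db -U repl_user -D /srv/pg/test_data -X stream -P

    # Bring the copy up as an independent cluster on its own port.
    pg_ctl -D /srv/pg/test_data -o "-p 5433" -l /srv/pg/test_data/startup.log start

    # Tests connect to the copy; the database keeps its production name.
    psql -p 5433 -d Customer

If the production cluster has extra tablespaces and the copy lands on the same host, pg_basebackup's --tablespace-mapping option is needed to relocate them.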
> On 4/12/2016 7:55 AM, John McKown wrote:
>> Hum, I don't know exactly how to do it, but on Linux you could put the "Customer" database in a tablespace which resides on a BTRFS filesystem. BTRFS can do a quick "snapshot" of the filesystem....
>
> except, tablespaces aren't standalone, and there's no provision for importing the contents of a tablespace. all the metadata remains in the default tablespace, which leaves all sorts of room for problems if you do this.
>
> the /best/ way to achieve what the OP is asking for would likely be to run the tests on a separate server (or at least a separate postgres instance, aka cluster), and use pg_basebackup to rebuild this test instance.
Good point. The reference, http://www.cybertec.at/2015/01/forking-databases-the-art-of-copying-without-copying/ , actually uses btrfs facilities to make the copy, which is then used by a _different_ PostgreSQL server instance on the same machine. The database _name_ would stay "Customer", but it would reside in the separate PostgreSQL server's PGDATA area instead of the production server's. Basically, any production programs or scripts could be used, as is, against the test data simply by using the test server's port or unix socket instead of the production server's. Not quite as good as being on a separate Linux image, but not too shabby either. IMO.
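Continuing the invented paths from the snapshot sketch earlier in the thread, spinning up that second instance on the same machine might look roughly like this; the port, socket directory and data directory names are illustrative only.

    # Writable clone of the read-only snapshot taken earlier.
    sudo btrfs subvolume snapshot /srv/pg/customer_snap_2016-04-12 /srv/pg/test_main

    # The snapshot came from a running server, so clear its stale lock file.
    sudo rm -f /srv/pg/test_main/postmaster.pid

    # Start it as a separate cluster on its own port and socket directory, so
    # production scripts only need a different -p/-h to hit the test data.
    sudo -u postgres mkdir -p /tmp/pg_test
    sudo -u postgres pg_ctl -D /srv/pg/test_main -o "-p 5433 -k /tmp/pg_test" start

    # Same database name ("Customer"), different server:
    psql -h /tmp/pg_test -p 5433 -d Customer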
How many surrealists does it take to screw in a lightbulb? One to hold the griffon and one to fill the bathtub with brightly colored LEDs.
Maranatha! <><
John McKown
> On Apr 12, 2016, at 11:14 AM, John R Pierce <pierce@hogranch.com> wrote:
>
> [...]
>
> the /best/ way to achieve what the OP is asking for would likely be to run the tests on a separate server (or at least a separate postgres instance, aka cluster), and use pg_basebackup to rebuild this test instance.

I agree with John's post. I should have mentioned that my template database is never production. It's an obfuscated copy of the production data on separate hardware. I use "create with template" to spin up copies for developers/testers to provide a representative data set (not identical to production). And, since the create doesn't copy table statistics, I have to kick off a post-copy background process to gather them:

    nohup vacuumdb --analyze-only --quiet --dbname=${DATABASE} &>/dev/null &

Still, with all that, users are able to drop and recreate a test database within a coffee break.
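As an illustration only, with invented names ("customer_template" for the already-obfuscated template, "customer_dev" as the default copy), the per-developer spin-up plus the background analyze could be as small as:

    #!/bin/bash
    # Recreate a developer/tester copy from the sanitized template database.
    DATABASE="${1:-customer_dev}"

    # Fails if someone is still connected to the copy being replaced.
    dropdb --if-exists "$DATABASE"
    createdb --template=customer_template "$DATABASE"

    # Post-copy step from above: gather statistics in the background so the
    # fresh copy is usable right away.
    nohup vacuumdb --analyze-only --quiet --dbname="$DATABASE" &>/dev/null &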