Thread: Large Database Restore
I am restoring a 51GB backup file, and the restore has been running for almost 26 hours. There have been no errors and it is still working. I have turned fsync off, but even that did not speed things up. Can anyone provide me with the optimal settings for restoring a large database?
Thanks in advance!
Lee
On Thu, May 17, 2007 at 08:19:08AM -0500, Lee Keel wrote:
> I am restoring a 51GB backup file, and the restore has been running for
> almost 26 hours. There have been no errors and it is still working. I have
> turned fsync off, but even that did not speed things up. Can anyone provide
> me with the optimal settings for restoring a large database?

Well, fsync=off is the biggest one. The others to look at:

maintenance_work_mem = 1GB (I generally have problems setting it over 1GB, and I'm not sure it would add much benefit)

checkpoint_timeout = 1 hour

checkpoint_segments = huge (you want to reduce the frequency of checkpoints, probably to no more than one every 20 minutes)

Finally, if you've got a multi-CPU machine, you might want to build all the indexes and table constraints as a separate step, and run them through perl or something so that you'll utilize more than just one CPU. Of course the downside to all of this is that it would mean starting over from scratch.

Ultimately though, once your database gets past a certain size, you really want to be using PITR and not pg_dump as your main recovery strategy.

--
Jim Nasby                  decibel@decibel.org
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)
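For reference, the combination Jim describes corresponds to postgresql.conf settings along these lines; the values shown are illustrative rather than recommendations for any particular machine, and should be reverted once the restore finishes:

    # temporary bulk-restore settings -- revert these after the restore completes
    fsync = off                    # skip forced WAL flushes (unsafe for normal running)
    maintenance_work_mem = 1GB     # speeds up CREATE INDEX and foreign-key checks
    checkpoint_timeout = 1h        # checkpoint at most once per hour on the timer
    checkpoint_segments = 128      # plenty of WAL segments before a forced checkpoint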
On 05/17/07 11:04, Jim C. Nasby wrote:
[snip]
> Ultimately though, once your database gets past a certain size, you
> really want to be using PITR and not pg_dump as your main recovery
> strategy.

But doesn't that just replay each transaction? It must manage the index nodes during each update/delete/insert, and multiple UPDATE statements mean that you hit the same page over and over again.

--
Ron Johnson, Jr.
Jefferson LA USA

Give a man a fish, and he eats for a day.
Hit him with a fish, and he goes away for good!
Yes, but the implication is that large databases probably don't update every row between backup periods.

On Thu, 17 May 2007, Ron Johnson wrote:
> On 05/17/07 11:04, Jim C. Nasby wrote:
> [snip]
>> Ultimately though, once your database gets past a certain size, you
>> really want to be using PITR and not pg_dump as your main recovery
>> strategy.
>
> But doesn't that just replay each transaction? It must manage the
> index nodes during each update/delete/insert, and multiple UPDATE
> statements mean that you hit the same page over and over again.
Yes, but that's not always a valid assumption. And still PITR must update the index at each "insert", which is much slower than the "bulk load then create index" of pg_dump.

On 05/17/07 16:01, Ben wrote:
> Yes, but the implication is that large databases probably don't update
> every row between backup periods.
>
> On Thu, 17 May 2007, Ron Johnson wrote:
>> On 05/17/07 11:04, Jim C. Nasby wrote:
>> [snip]
>>> Ultimately though, once your database gets past a certain size, you
>>> really want to be using PITR and not pg_dump as your main recovery
>>> strategy.
>>
>> But doesn't that just replay each transaction? It must manage the
>> index nodes during each update/delete/insert, and multiple UPDATE
>> statements mean that you hit the same page over and over again.

--
Ron Johnson, Jr.
Jefferson LA USA

Give a man a fish, and he eats for a day.
Hit him with a fish, and he goes away for good!
I don't know if my database is typical (as there probably is no such thing), but to restore a full dump (pg_dumpall) takes over 4 hours on my backup server, but to restore a low level backup (about 35GB) and then process 145 WAL files (since Tuesday morning when the last low level backup was run) took me around 2 hours today.

--
Mike Nolan
On 05/17/07 16:49, Michael Nolan wrote:
> I don't know if my database is typical (as there probably is no such
> thing), but to restore a full dump (pg_dumpall) takes over 4 hours on my
> backup server, but to restore a low level backup (about 35GB)

Meaning a tarball of $PGDATA?

> and then
> process 145 WAL files (since Tuesday morning when the last low level
> backup was run) took me around 2 hours today

--
Ron Johnson, Jr.
Jefferson LA USA

Give a man a fish, and he eats for a day.
Hit him with a fish, and he goes away for good!
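For context, the WAL-replay step Michael mentions is driven by a recovery.conf file dropped into the restored data directory before the postmaster is started; a minimal sketch, with a hypothetical archive path, looks like:

    # recovery.conf -- placed in $PGDATA of the restored base backup.
    # The server fetches and replays archived WAL segments in order,
    # then comes online once no more segments can be found.
    restore_command = 'cp /mnt/wal_archive/%f %p'

The %f and %p placeholders are replaced by the WAL file name and the path the server wants it copied to.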
Thanks to everyone for their input on this. After reading all the emails and some of the documentation (section 23.3), I think this is all a little more than what I need. My database is basically read-only, and all I was looking to do is take snapshots of it and be able to restore them on a developer's machine without it taking 30 hours. So I was thinking I would zip the data directories associated with my database, then the developer could just copy the zip file and unzip it in their own data directory. My question now is: what file would a developer need to change to add this new directory to their database list, or will it just automatically show up when they refresh the service?
From: pgsql-general-owner@postgresql.org [mailto:pgsql-general-owner@postgresql.org] On Behalf Of Michael Nolan
Sent: Thursday, May 17, 2007 7:03 PM
To: Ron Johnson; pgsql-general@postgresql.org
Subject: Re: [GENERAL] Large Database Restore
On 5/17/07, Ron Johnson <ron.l.johnson@cox.net> wrote:
On 05/17/07 16:49, Michael Nolan wrote:
> I don't know if my database is typical (as there probably is no such
> thing), but to restore a full dump (pg_dumpall) takes over 4 hours on my
> backup server, but to restore a low level backup (about 35GB)
Meaning a tarball of $PGDATA?
Actually, it's two different areas because I've got a second tablespace on a separate physical drive for indexes, but yes, it's a tarball of all the files that PG uses, following the procedures in section 23.3 of the documentation.
It works very well, though I still don't understand why, if there are no changes to the warm standby server's tables, only queries, it isn't possible to keep restoring WAL files to keep the warm standby server in parallel with the live server. (I'm guessing there must be SOMETHING that happens at the end of the recovery process, or some time after that, which makes the next WAL file unprocessable, but I can't figure it out from the docs.)
--
Mike Nolan
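For anyone following along, the section 23.3 procedure Michael refers to boils down to roughly the following, assuming WAL archiving (archive_command) is already configured; the label and paths are made up for illustration:

    -- in psql: mark the start of the backup
    SELECT pg_start_backup('weekly base backup');

    # in a shell: tar up the cluster directory and any extra tablespaces
    tar czf /backups/pgdata.tar.gz $PGDATA
    tar czf /backups/pg_indexes.tar.gz /mnt/disk2/pg_indexes

    -- in psql: mark the end of the backup
    SELECT pg_stop_backup();

Restoring means unpacking those tarballs and letting the server replay the archived WAL written since pg_start_backup(), which is the two-hour path Michael describes.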
Lee Keel wrote:
> Thanks to everyone for their input on this. After reading all the emails
> and some of the documentation (section 23.3), I think this is all a little
> more than what I need. My database is basically read-only, and all I was
> looking to do is take snapshots of it and be able to restore them on a
> developer's machine without it taking 30 hours. So I was thinking I would
> zip the data directories associated with my database, then the developer
> could just copy the zip file and unzip it in their own data directory. My
> question now is: what file would a developer need to change to add this new
> directory to their database list, or will it just automatically show up
> when they refresh the service?

You can't do a file-level backup of one database I'm afraid, only all databases in one installation.

--
Richard Huxton
Archonet Ltd
So then the best way to do this kind of backup\restore is to use pg_dump? Is there any plan in the future to be able to do some sort of file-level backup like SqlServer?

-LK

-----Original Message-----
From: Richard Huxton [mailto:dev@archonet.com]
Sent: Friday, May 18, 2007 10:35 AM
To: Lee Keel
Cc: Michael Nolan; Ron Johnson; pgsql-general@postgresql.org
Subject: Re: [GENERAL] Large Database Restore

Lee Keel wrote:
[snip]

You can't do a file-level backup of one database I'm afraid, only all databases in one installation.

--
Richard Huxton
Archonet Ltd
Lee Keel escribió:
> So then the best way to do this kind of backup\restore is to use pg_dump?
> Is there any plan in the future to be able to do some sort of file-level
> backup like SqlServer?

Actually you can do single databases, but you must also include some other directories besides the database directory. You would need to include everything, excluding things in the "base" directory, but not exclude your databases directory in "base".

--
Alvaro Herrera                  http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Alvaro Herrera wrote:
> Lee Keel escribió:
>> So then the best way to do this kind of backup\restore is to use pg_dump?
>> Is there any plan in the future to be able to do some sort of file-level
>> backup like SqlServer?
>
> Actually you can do single databases, but you must also include some
> other directories besides the database directory. You would need to
> include everything, excluding things in the "base" directory, but not
> exclude your databases directory in "base".

Will that not cause trouble if you've added users though?

--
Richard Huxton
Archonet Ltd
Richard Huxton escribió:
> Alvaro Herrera wrote:
>> Lee Keel escribió:
>>> So then the best way to do this kind of backup\restore is to use pg_dump?
>>> Is there any plan in the future to be able to do some sort of file-level
>>> backup like SqlServer?
>>
>> Actually you can do single databases, but you must also include some
>> other directories besides the database directory. You would need to
>> include everything, excluding things in the "base" directory, but not
>> exclude your databases directory in "base".
>
> Will that not cause trouble if you've added users though?

Huh, maybe it will, but then I don't see how. Can you be more specific?

--
Alvaro Herrera                  http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.
Alvaro Herrera wrote:
> Richard Huxton escribió:
>> Alvaro Herrera wrote:
>>> Lee Keel escribió:
>>>> So then the best way to do this kind of backup\restore is to use pg_dump?
>>>> Is there any plan in the future to be able to do some sort of file-level
>>>> backup like SqlServer?
>>>
>>> Actually you can do single databases, but you must also include some
>>> other directories besides the database directory. You would need to
>>> include everything, excluding things in the "base" directory, but not
>>> exclude your databases directory in "base".
>>
>> Will that not cause trouble if you've added users though?
>
> Huh, maybe it will, but then I don't see how. Can you be more specific?

Well, user info is stored in "global" (as is the list of databases, and it looks like language handler definitions too). Run strings over the files and you'll see.

--
Richard Huxton
Archonet Ltd
Richard Huxton escribió:
> Alvaro Herrera wrote:
>> Richard Huxton escribió:
>>> Alvaro Herrera wrote:
>>>> Lee Keel escribió:
>>>>> So then the best way to do this kind of backup\restore is to use pg_dump?
>>>>> Is there any plan in the future to be able to do some sort of file-level
>>>>> backup like SqlServer?
>>>>
>>>> Actually you can do single databases, but you must also include some
>>>> other directories besides the database directory. You would need to
>>>> include everything, excluding things in the "base" directory, but not
>>>> exclude your databases directory in "base".
>>>
>>> Will that not cause trouble if you've added users though?
>>
>> Huh, maybe it will, but then I don't see how. Can you be more specific?
>
> Well, user info is stored in "global" (as is the list of databases, and
> it looks like language handler definitions too). Run strings over the
> files and you'll see.

Oh, the fear is that you would overwrite the new user files with the data from the backup. Yeah, that's correct, you would. Also on restore you would overwrite the pg_xlog and pg_clog areas and the control file, which while good for your newly restored database, would render the other databases corrupted. Absolutely true.

The only way I see to make this work would be to freeze the involved database (with vacuum freeze), then stop the postmaster cleanly, then make the tarball of just the DB directory. But then, if shared catalog state changed between the backup and the restore (say, because you create a table, which gets registered in the shared catalog pg_shdepend), said changes would not be rolled back either, leading to possible problems later on.

I hadn't noticed this was so difficult to do! (You can solve the pg_shdepend problem by copying that database's entries elsewhere and then restoring them along with the tarball, but I'm not sure it's workable for the other shared catalogs.) A lot more involved than you probably want anyway.

--
Alvaro Herrera                  http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
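To make the directory names in this exchange concrete, a PostgreSQL 8.x data directory is laid out roughly like this (the user-database OID shown is made up; you can look yours up with SELECT oid FROM pg_database WHERE datname = 'yourdb'):

    $PGDATA/
      base/            one subdirectory per database, named by its OID
        1/             template1
        16385/         a user database (OID illustrative)
      global/          shared catalogs: roles/users, pg_database, pg_control, ...
      pg_xlog/         write-ahead log segments for the whole cluster
      pg_clog/         transaction commit status for the whole cluster
      pg_tblspc/       symlinks to any additional tablespaces
      postgresql.conf, pg_hba.conf, ...

Because global, pg_xlog, and pg_clog are shared by every database in the cluster, restoring them alongside a single database's base directory is exactly what clobbers the others, as Alvaro describes.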
Lee Keel wrote:
> So then the best way to do this kind of backup\restore is to use pg_dump?
> Is there any plan in the future to be able to do some sort of file-level
> backup like SqlServer?

Oh you *can* do a file-level backup, but only of the entire cluster. If you have information shared between databases (users, transaction IDs etc.) then I don't see any way for single-database file-copying to work.

--
Richard Huxton
Archonet Ltd
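For the read-only snapshot case Lee describes, the per-database dump-and-restore route would look roughly like this (database and file names are made up for illustration):

    # on the source server: dump a single database in compressed custom format
    pg_dump -Fc -f proddb.dump proddb

    # on the developer's machine: create an empty database and restore into it
    createdb devcopy
    pg_restore -d devcopy proddb.dump

The whole-cluster alternative Richard mentions (stop the postmaster, copy or zip the entire data directory, unpack it on the developer's machine) also works, but only for the complete installation, never for one database on its own.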