Thread: Large Database Restore
I am restoring a 51GB backup file, and the restore has been running for almost 26 hours. There have been no errors and it is still working. I have turned fsync off, but even that did not speed things up. Can anyone provide me with the optimal settings for restoring a large database?
Thanks in advance!
Lee
On Thu, May 17, 2007 at 08:19:08AM -0500, Lee Keel wrote:
> I am restoring a 51GB backup file, and the restore has been running for
> almost 26 hours. There have been no errors and it is still working. I have
> turned fsync off, but even that did not speed things up. Can anyone provide
> me with the optimal settings for restoring a large database?

Well, fsync=off is the biggest one. The others to look at:

maintenance_work_mem = 1GB (I generally have problems setting it over 1GB, and I'm not sure it would add much benefit)

checkpoint_timeout = 1 hour

checkpoint_segments = huge (you want to reduce the frequency of checkpoints, probably to no more than one every 20 minutes)

Finally, if you've got a multi-CPU machine, you might want to build all the indexes and table constraints as a separate step, and run them through perl or something so that you'll utilize more than just one CPU. Of course the downside to all of this is that it would mean starting over from scratch.

Ultimately though, once your database gets past a certain size, you really want to be using PITR and not pg_dump as your main recovery strategy.

--
Jim Nasby                  decibel@decibel.org
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)
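For reference, the combination Jim describes corresponds to postgresql.conf settings along these lines; the values shown are illustrative rather than recommendations for any particular machine, and should be reverted once the restore finishes:

    # temporary bulk-restore settings -- revert these after the restore completes
    fsync = off                    # skip forced WAL flushes (unsafe for normal running)
    maintenance_work_mem = 1GB     # speeds up CREATE INDEX and foreign-key checks
    checkpoint_timeout = 1h        # checkpoint at most once per hour on the timer
    checkpoint_segments = 128      # plenty of WAL segments before a forced checkpoint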
On 05/17/07 11:04, Jim C. Nasby wrote:
[snip]
> Ultimately though, once your database gets past a certain size, you
> really want to be using PITR and not pg_dump as your main recovery
> strategy.

But doesn't that just replay each transaction? It must manage the index nodes during each update/delete/insert, and multiple UPDATE statements mean that you hit the same page over and over again.

--
Ron Johnson, Jr.
Jefferson LA USA

Give a man a fish, and he eats for a day.
Hit him with a fish, and he goes away for good!
Yes, but the implication is that large databases probably don't update every row between backup periods.

On Thu, 17 May 2007, Ron Johnson wrote:
> On 05/17/07 11:04, Jim C. Nasby wrote:
> [snip]
>> Ultimately though, once your database gets past a certain size, you
>> really want to be using PITR and not pg_dump as your main recovery
>> strategy.
>
> But doesn't that just replay each transaction? It must manage the
> index nodes during each update/delete/insert, and multiple UPDATE
> statements mean that you hit the same page over and over again.
Yes, but that's not always a valid assumption. And still PITR must update the index at each "insert", which is much slower than the "bulk load then create index" of pg_dump.

On 05/17/07 16:01, Ben wrote:
> Yes, but the implication is that large databases probably don't update
> every row between backup periods.
>
> On Thu, 17 May 2007, Ron Johnson wrote:
>> On 05/17/07 11:04, Jim C. Nasby wrote:
>> [snip]
>>> Ultimately though, once your database gets past a certain size, you
>>> really want to be using PITR and not pg_dump as your main recovery
>>> strategy.
>>
>> But doesn't that just replay each transaction? It must manage the
>> index nodes during each update/delete/insert, and multiple UPDATE
>> statements mean that you hit the same page over and over again.

--
Ron Johnson, Jr.
Jefferson LA USA

Give a man a fish, and he eats for a day.
Hit him with a fish, and he goes away for good!
I don't know if my database is typical (as there probably is no such thing), but to restore a full dump (pg_dumpall) takes over 4 hours on my backup server, but to restore a low level backup (about 35GB) and then process 145 WAL files (since Tuesday morning when the last low level backup was run) took me around 2 hours today.

--
Mike Nolan
On 05/17/07 16:49, Michael Nolan wrote:
> I don't know if my database is typical (as there probably is no such
> thing), but to restore a full dump (pg_dumpall) takes over 4 hours on my
> backup server, but to restore a low level backup (about 35GB)

Meaning a tarball of $PGDATA?

> and then
> process 145 WAL files (since Tuesday morning when the last low level
> backup was run) took me around 2 hours today

--
Ron Johnson, Jr.
Jefferson LA USA

Give a man a fish, and he eats for a day.
Hit him with a fish, and he goes away for good!
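For context, the WAL-replay step Michael mentions is driven by a recovery.conf file dropped into the restored data directory before the postmaster is started; a minimal sketch, with a hypothetical archive path, looks like:

    # recovery.conf -- placed in $PGDATA of the restored base backup.
    # The server fetches and replays archived WAL segments in order,
    # then comes online once no more segments can be found.
    restore_command = 'cp /mnt/wal_archive/%f %p'

The %f and %p placeholders are replaced by the WAL file name and the path the server wants it copied to.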
Thanks to everyone for their input on this. After reading all the emails and some of the documentation (section 23.3), I think this is all a little more than what I need. My database is basically read-only, and all I was looking to do is take snapshots of it and be able to restore them on a developer's machine without it taking 30 hours. So I was thinking I would zip the data directories associated with my database, then the developer could just copy the zip file and unzip it in their own data directory. My question now is: what file would a developer need to change to add this new directory to their database list, or will it just automatically show up when they refresh the service?
From: pgsql-general-owner@postgresql.org [mailto:pgsql-general-owner@postgresql.org] On Behalf Of Michael Nolan
Sent: Thursday, May 17, 2007 7:03 PM
To: Ron Johnson; pgsql-general@postgresql.org
Subject: Re: [GENERAL] Large Database Restore
On 5/17/07, Ron Johnson <ron.l.johnson@cox.net> wrote:
On 05/17/07 16:49, Michael Nolan wrote:
> I don't know if my database is typical (as there probably is no such
> thing), but to restore a full dump (pg_dumpall) takes over 4 hours on my
> backup server, but to restore a low level backup (about 35GB)
Meaning a tarball of $PGDATA?
Actually, it's two different areas because I've got a second tablespace on a separate physical drive for indexes, but yes, it's a tarball of all the files that PG uses, following the procedures in section 23.3 of the documentation.
It works very well, though I still don't understand why, if there are no changes to the warm standby server's tables, only queries, it isn't possible to keep restoring WAL files to keep the warm standby server in parallel with the live server. (I'm guessing there must be SOMETHING that happens at the end of the recovery process, or some time after that, which makes the next WAL file unprocessable, but I can't figure it out from the docs.)
--
Mike Nolan
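For anyone following along, the section 23.3 procedure Michael refers to boils down to roughly the following, assuming WAL archiving (archive_command) is already configured; the label and paths are made up for illustration:

    -- in psql: mark the start of the backup
    SELECT pg_start_backup('weekly base backup');

    # in a shell: tar up the cluster directory and any extra tablespaces
    tar czf /backups/pgdata.tar.gz $PGDATA
    tar czf /backups/pg_indexes.tar.gz /mnt/disk2/pg_indexes

    -- in psql: mark the end of the backup
    SELECT pg_stop_backup();

Restoring means unpacking those tarballs and letting the server replay the archived WAL written since pg_start_backup(), which is the two-hour path Michael describes.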
Lee Keel wrote:
> Thanks to everyone for their input on this. After reading all the emails
> and some of the documentation (section 23.3), I think this is all a little
> more than what I need. My database is basically read-only, and all I was
> looking to do is take snapshots of it and be able to restore them on a
> developer's machine without it taking 30 hours. So I was thinking I would
> zip the data directories associated with my database, then the developer
> could just copy the zip file and unzip it in their own data directory. My
> question now is: what file would a developer need to change to add this new
> directory to their database list, or will it just automatically show up
> when they refresh the service?

You can't do a file-level backup of one database I'm afraid, only all databases in one installation.

--
Richard Huxton
Archonet Ltd
So then the best way to do this kind of backup\restore is to use pg_dump? Is there any plan in the future to be able to do some sort of file-level backup like SqlServer?

-LK

-----Original Message-----
From: Richard Huxton [mailto:dev@archonet.com]
Sent: Friday, May 18, 2007 10:35 AM
To: Lee Keel
Cc: Michael Nolan; Ron Johnson; pgsql-general@postgresql.org
Subject: Re: [GENERAL] Large Database Restore

Lee Keel wrote:
[snip]

You can't do a file-level backup of one database I'm afraid, only all databases in one installation.

--
Richard Huxton
Archonet Ltd
Lee Keel escribió:
> So then the best way to do this kind of backup\restore is to use pg_dump?
> Is there any plan in the future to be able to do some sort of file-level
> backup like SqlServer?

Actually you can do single databases, but you must also include some other directories besides the database directory. You would need to include everything, excluding things in the "base" directory, but not exclude your databases directory in "base".

--
Alvaro Herrera                  http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Alvaro Herrera wrote:
> Lee Keel escribió:
>> So then the best way to do this kind of backup\restore is to use pg_dump?
>> Is there any plan in the future to be able to do some sort of file-level
>> backup like SqlServer?
>
> Actually you can do single databases, but you must also include some
> other directories besides the database directory. You would need to
> include everything, excluding things in the "base" directory, but not
> exclude your databases directory in "base".

Will that not cause trouble if you've added users though?

--
Richard Huxton
Archonet Ltd
Richard Huxton escribió:
> Alvaro Herrera wrote:
>> Lee Keel escribió:
>>> So then the best way to do this kind of backup\restore is to use pg_dump?
>>> Is there any plan in the future to be able to do some sort of file-level
>>> backup like SqlServer?
>>
>> Actually you can do single databases, but you must also include some
>> other directories besides the database directory. You would need to
>> include everything, excluding things in the "base" directory, but not
>> exclude your databases directory in "base".
>
> Will that not cause trouble if you've added users though?

Huh, maybe it will, but then I don't see how. Can you be more specific?

--
Alvaro Herrera                  http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.
Alvaro Herrera wrote:
> Richard Huxton escribió:
>> Alvaro Herrera wrote:
>>> Lee Keel escribió:
>>>> So then the best way to do this kind of backup\restore is to use pg_dump?
>>>> Is there any plan in the future to be able to do some sort of file-level
>>>> backup like SqlServer?
>>>
>>> Actually you can do single databases, but you must also include some
>>> other directories besides the database directory. You would need to
>>> include everything, excluding things in the "base" directory, but not
>>> exclude your databases directory in "base".
>>
>> Will that not cause trouble if you've added users though?
>
> Huh, maybe it will, but then I don't see how. Can you be more specific?

Well, user info is stored in "global" (as is the list of databases, and it looks like language handler definitions too). Run strings over the files and you'll see.

--
Richard Huxton
Archonet Ltd
Richard Huxton escribió:
> Alvaro Herrera wrote:
>> Richard Huxton escribió:
>>> Alvaro Herrera wrote:
>>>> Lee Keel escribió:
>>>>> So then the best way to do this kind of backup\restore is to use pg_dump?
>>>>> Is there any plan in the future to be able to do some sort of file-level
>>>>> backup like SqlServer?
>>>>
>>>> Actually you can do single databases, but you must also include some
>>>> other directories besides the database directory. You would need to
>>>> include everything, excluding things in the "base" directory, but not
>>>> exclude your databases directory in "base".
>>>
>>> Will that not cause trouble if you've added users though?
>>
>> Huh, maybe it will, but then I don't see how. Can you be more specific?
>
> Well, user info is stored in "global" (as is the list of databases, and
> it looks like language handler definitions too). Run strings over the
> files and you'll see.

Oh, the fear is that you would overwrite the new user files with the data from the backup. Yeah, that's correct, you would. Also on restore you would overwrite the pg_xlog and pg_clog areas and the control file, which while good for your newly restored database, would render the other databases corrupted. Absolutely true.

The only way I see to make this work would be to freeze the involved database (with vacuum freeze), then stop the postmaster cleanly, then make the tarball of just the DB directory. But then, if shared catalog state changed between the backup and the restore (say, because you create a table, which gets registered in the shared catalog pg_shdepend), said changes would not be rolled back either, leading to possible problems later on.

I hadn't noticed this was so difficult to do! (You can solve the pg_shdepend problem by copying that database's entries elsewhere and then restoring them along with the tarball, but I'm not sure it's workable for the other shared catalogs.) A lot more involved than you probably want anyway.

--
Alvaro Herrera                  http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
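To make the directory names in this exchange concrete, a PostgreSQL 8.x data directory is laid out roughly like this (the user-database OID shown is made up; you can look yours up with SELECT oid FROM pg_database WHERE datname = 'yourdb'):

    $PGDATA/
      base/            one subdirectory per database, named by its OID
        1/             template1
        16385/         a user database (OID illustrative)
      global/          shared catalogs: roles/users, pg_database, pg_control, ...
      pg_xlog/         write-ahead log segments for the whole cluster
      pg_clog/         transaction commit status for the whole cluster
      pg_tblspc/       symlinks to any additional tablespaces
      postgresql.conf, pg_hba.conf, ...

Because global, pg_xlog, and pg_clog are shared by every database in the cluster, restoring them alongside a single database's base directory is exactly what clobbers the others, as Alvaro describes.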
Lee Keel wrote:
> So then the best way to do this kind of backup\restore is to use pg_dump?
> Is there any plan in the future to be able to do some sort of file-level
> backup like SqlServer?

Oh you *can* do a file-level backup, but only of the entire cluster. If you have information shared between databases (users, transaction IDs etc.) then I don't see any way for single-database file-copying to work.

--
Richard Huxton
Archonet Ltd
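For the read-only snapshot case Lee describes, the per-database dump-and-restore route would look roughly like this (database and file names are made up for illustration):

    # on the source server: dump a single database in compressed custom format
    pg_dump -Fc -f proddb.dump proddb

    # on the developer's machine: create an empty database and restore into it
    createdb devcopy
    pg_restore -d devcopy proddb.dump

The whole-cluster alternative Richard mentions (stop the postmaster, copy or zip the entire data directory, unpack it on the developer's machine) also works, but only for the complete installation, never for one database on its own.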