Thread: Fastest way to restore a database

Fastest way to restore a database

From
William Garrison
Date:
I know that PostgreSQL is slow at restoring databases. But there are
some tricks to use when speeding it up. Here is a brief list I compiled
from reading the docs and reading some forums. Is there a definitive
list of things to do?

* Turn off fsync
So it won’t flush after every commit
* Turn off full_page_writes
Don’t write the WAL archives in a safe way. But we don’t need WAL
archives during a restore. Future versions of postgres will let you turn
off WAL archives entirely
* Turn off archive_mode
This disables the archiving of WAL logs
* Increase the checkpoint_segments parameter (the default is 3 – so...
maybe 10?)
Increases the number of transactions that can happen before a WAL checkpoint
* The --single-transaction paremeter to pg_restore *might* speed it up
One transaction is more efficient, but an entire DB restore might be too
big of a transaction. So I’m not so sure about this option
* Increase the maintenance_work_mem setting to 512MB
Gives more memory to CREATE_INDEX commands, which is part of the restore
process
* (PostgreSql 8.3 only) Turn off synchronous_commit
This makes it so that the database won’t wait for the WAL checkpoint to
be completed before moving on to the next operation. Again, we don’t
want WAL archiving during a restore anyway.

Are any of the above items not going to help? Anything I'm missing? Is
there a way to disable foreign key constraints during the restore since
I know it is already a good database?


I am using postgreSQL 8.2.9 on Win32

Re: Fastest way to restore a database

From
Greg Smith
Date:
On Fri, 12 Sep 2008, William Garrison wrote:

> Is there a definitive list of things to do?

That section of the documention is pretty good:
http://www.postgresql.org/docs/current/static/populate.html

The main thing it's missing is a discussion of how to cut down on disk
commit overhead by either usinc async commit or turning fsync off.  If
you've got a good caching controller that may not be needed though.

The other large chunk of information it doesn't really go into is what
server tuning you could do to improve general performance, which obviously
would then help with loading as well.
http://wiki.postgresql.org/wiki/Tuning_Your_PostgreSQL_Server goes over
much of that.

> * Turn off full_page_writes
> Don’t write the WAL archives in a safe way. But we don’t need WAL archives
> during a restore. Future versions of postgres will let you turn off WAL
> archives entirely

Ideally you'd be using COPY such that the table was just created or
truncated before loading, which (if archive_mode is off) keeps them from
being WAL logged, as described in 14.4.7.  If you do that and vastly
increase checkpoint_segments, full_page_writes has minimal impact.

> * Increase the checkpoint_segments parameter (the default is 3 – so... maybe
> 10?)

64-256 is the usual range you'll see people using for bulk loading.

> * Increase the maintenance_work_mem setting to 512MB

I haven't really seen any real improvement setting that over 256MB.  If
you've got RAM to waste it doesn't really matter if you set it too high
though.

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD

Re: Fastest way to restore a database

From
William Garrison
Date:
Thanks so much!

So... if I am using pg_dump and pg_restore with a compressed backup,
then it is using COPY, correct?  And I think that would follow a CREATE
TABLE statement as mentioned in the first link... so no WAL files written?

Greg Smith wrote:
> On Fri, 12 Sep 2008, William Garrison wrote:
>
>> Is there a definitive list of things to do?
>
> That section of the documention is pretty good:
> http://www.postgresql.org/docs/current/static/populate.html
>
> The main thing it's missing is a discussion of how to cut down on disk
> commit overhead by either usinc async commit or turning fsync off.  If
> you've got a good caching controller that may not be needed though.
>
> The other large chunk of information it doesn't really go into is what
> server tuning you could do to improve general performance, which
> obviously would then help with loading as well.
> http://wiki.postgresql.org/wiki/Tuning_Your_PostgreSQL_Server goes
> over much of that.
>
>> * Turn off full_page_writes
>> Don’t write the WAL archives in a safe way. But we don’t need WAL
>> archives during a restore. Future versions of postgres will let you
>> turn off WAL archives entirely
>
> Ideally you'd be using COPY such that the table was just created or
> truncated before loading, which (if archive_mode is off) keeps them
> from being WAL logged, as described in 14.4.7.  If you do that and
> vastly increase checkpoint_segments, full_page_writes has minimal impact.
>
>> * Increase the checkpoint_segments parameter (the default is 3 –
>> so... maybe 10?)
>
> 64-256 is the usual range you'll see people using for bulk loading.
>
>> * Increase the maintenance_work_mem setting to 512MB
>
> I haven't really seen any real improvement setting that over 256MB.
> If you've got RAM to waste it doesn't really matter if you set it too
> high though.
>
> --
> * Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD


Re: Fastest way to restore a database

From
Tom Lane
Date:
William Garrison <postgres@mobydisk.com> writes:
> So... if I am using pg_dump and pg_restore with a compressed backup,
> then it is using COPY, correct?  And I think that would follow a CREATE
> TABLE statement as mentioned in the first link... so no WAL files written?

Only if you use --single-transaction.

The worry expressed upthread about the transaction being "too large" is
unfounded, btw.  Unlike some other DBs, PG doesn't have a finite-size
undo log.

            regards, tom lane

Re: Fastest way to restore a database

From
Robert Treat
Date:
On Friday 12 September 2008 14:32:07 Greg Smith wrote:
> On Fri, 12 Sep 2008, William Garrison wrote:
> > Is there a definitive list of things to do?
>
> That section of the documention is pretty good:
> http://www.postgresql.org/docs/current/static/populate.html
>
> The main thing it's missing is a discussion of how to cut down on disk
> commit overhead by either usinc async commit or turning fsync off.  If
> you've got a good caching controller that may not be needed though.
>
> The other large chunk of information it doesn't really go into is what
> server tuning you could do to improve general performance, which obviously
> would then help with loading as well.
> http://wiki.postgresql.org/wiki/Tuning_Your_PostgreSQL_Server goes over
> much of that.
>
> > * Turn off full_page_writes
> > Don’t write the WAL archives in a safe way. But we don’t need WAL
> > archives during a restore. Future versions of postgres will let you turn
> > off WAL archives entirely
>
> Ideally you'd be using COPY such that the table was just created or
> truncated before loading, which (if archive_mode is off) keeps them from
> being WAL logged, as described in 14.4.7.  If you do that and vastly
> increase checkpoint_segments, full_page_writes has minimal impact.
>
> > * Increase the checkpoint_segments parameter (the default is 3 – so...
> > maybe 10?)
>
> 64-256 is the usual range you'll see people using for bulk loading.
>

Don't forget to bump up checkpoint_timeout along with that... actually, I
blogged a couple of times on this topic:


http://people.planetpostgresql.org/xzilla/index.php?/archives/133-Getting-faster-database-restores-on-postgresql-8.1.html
http://people.planetpostgresql.org/xzilla/index.php?/archives/223-Measuring-database-restore-times.html

A little old, but might be helpful.

--
Robert Treat
http://www.omniti.com
Database: Scalability: Consulting:

Re: Fastest way to restore a database

From
Scott Ribe
Date:
> The worry expressed upthread about the transaction being "too large" is
> unfounded, btw.  Unlike some other DBs, PG doesn't have a finite-size
> undo log.

Sure, it won't fail. But would there be some point at which it would become
slower than multiple transactions? Or is it always faster (or at least as
fast)?


--
Scott Ribe
scott_ribe@killerbytes.com
http://www.killerbytes.com/
(303) 722-0567 voice



Re: Fastest way to restore a database

From
Tom Lane
Date:
Scott Ribe <scott_ribe@killerbytes.com> writes:
>> The worry expressed upthread about the transaction being "too large" is
>> unfounded, btw.  Unlike some other DBs, PG doesn't have a finite-size
>> undo log.

> Sure, it won't fail. But would there be some point at which it would become
> slower than multiple transactions? Or is it always faster (or at least as
> fast)?

I can't think of any reason it would be slower.

There are certainly issues you could run into with very long
transactions, like vacuum not being able to remove bloat elsewhere.

            regards, tom lane

Re: Fastest way to restore a database

From
Greg Smith
Date:
On Fri, 12 Sep 2008, Robert Treat wrote:

> Don't forget to bump up checkpoint_timeout along with that... actually, I
> blogged a couple of times on this topic

And with that there's enough information to start a dedicated page on this
topic:  http://wiki.postgresql.org/wiki/Bulk_Loading_and_Restores

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD

Re: Fastest way to restore a database

From
Robert Treat
Date:
On Friday 12 September 2008 15:55:46 Tom Lane wrote:
> Scott Ribe <scott_ribe@killerbytes.com> writes:
> >> The worry expressed upthread about the transaction being "too large" is
> >> unfounded, btw.  Unlike some other DBs, PG doesn't have a finite-size
> >> undo log.
> >
> > Sure, it won't fail. But would there be some point at which it would
> > become slower than multiple transactions? Or is it always faster (or at
> > least as fast)?
>
> I can't think of any reason it would be slower.
>
> There are certainly issues you could run into with very long
> transactions, like vacuum not being able to remove bloat elsewhere.
>

Which reminds me (and not seeing it elsewhere), on full restores you will
probably want to disable autovacuum entirely, as it will compete for
reasources and can lead to locking issues as well. Note, this can sometimes
apply to more narrow restore scenarios, but it isnt as cut and dried.  (Ie,
with multiple database in a cluster, you dont want to disable it for all
databases, though it'd be nice to disable it for the one you're restoring)

--
Robert Treat
Build A Brighter LAMP :: Linux Apache {middleware} PostgreSQL