From: Craig Ringer
Subject: Re: [PERFORMANCE] Buying hardware
Msg-id: 497E5E55.4020601@postnewspapers.com.au
In response to: Re: [PERFORMANCE] Buying hardware (Jeff <threshar@torgo.978.org>)
List: pgsql-performance

Jeff wrote:

> If you use PITR + rsync you can create a binary snapshot of the db, so
> restore time is simply how long it takes to untar / whatever it into
> place.  Our backup script basically does:
>
> archive backup directory
> pg_start_backup
> rsync
> pg_stop_backup
>
> voila. I have 2 full copies of the db.
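
Concretely, a minimal sketch of such a script (the paths, database name
and backup label are all placeholders, and error handling is omitted):

    #!/bin/sh
    PGDATA=/var/lib/postgresql/data    # placeholder
    BACKUP=/backup/pgdata              # placeholder

    # "archive backup directory": rotate the previous copy aside
    rm -rf "$BACKUP.old"
    mv "$BACKUP" "$BACKUP.old"

    # Tell Pg a base backup is starting, copy the cluster, then finish
    psql -d postgres -c "SELECT pg_start_backup('nightly');"
    rsync -a "$PGDATA/" "$BACKUP/"
    psql -d postgres -c "SELECT pg_stop_backup();"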

Note that this does NOT protect you against a Pg bug, a filesystem
issue, or a RAID / disk media issue that results in corruption or damage
to the database that isn't immediately detected.

I personally wouldn't want to rely on PITR alone. I take periodic SQL
dumps as well, using PITR more for disaster recovery in case of
software/user error, so I can roll back to when the error occurred
rather than the potentially much earlier time of the last backup.

A dump is much more likely to trip over a problem, so it's good to run
periodically just to make sure the entire DB is readable. It's also more
likely to be restorable if something goes horribly wrong.
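
(A plain pg_dump to /dev/null is a cheap way to get that full
read-through even if you don't keep the output; the database name is a
placeholder:)

    # Forces every row in every table to be read; if this completes
    # without errors, the whole DB is at least readable.
    pg_dump mydb > /dev/null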

Personally, I might be OK with WAL-archiving based backups if I were
using a warm-standby server in continuous recovery mode that would
notify me if anything went wrong with the restore. I'm much more
comfortable having an SQL dump around, though.
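
(For reference, the warm-standby arrangement mentioned above is just a
second server kept in continuous recovery from the WAL archive.
Assuming the contrib pg_standby module and a placeholder archive path,
its recovery.conf boils down to:)

    # recovery.conf on the standby: pg_standby waits for each WAL
    # segment to appear in the archive, keeping the server in recovery;
    # a failure here is your signal that something is wrong.
    restore_command = 'pg_standby /wal_archive %f %p %r'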

> You could even expand it a bit
> and, after the rsync & friends have it, fire up the instance and run
> pg_dump against it for a pg_restore compatible dump "just in case".

Yep, that's what I do.
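
In script form that extra step is roughly the following (port and paths
are placeholders, and the copy must first be brought to consistency by
replaying the archived WAL):

    # Start a throwaway postmaster on the rsynced copy, take a logical
    # dump from it, then shut it down again.
    pg_ctl -D /backup/pgdata -o "-p 5433" -w start
    pg_dump -p 5433 -Fc -f /backup/mydb.dump mydb
    pg_ctl -D /backup/pgdata -m fast stop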

> It takes a long time to restore a 300GB db, even if you cheat and
> parallelify some of it. 8.4 may get a pg_restore that can load in
> parallel - which will help somewhat.
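
Assuming the -j/--jobs switch under discussion actually makes it into
8.4, that would look something like this against a custom-format dump:

    # Hypothetical until 8.4 ships: restore using four parallel jobs.
    pg_restore -j 4 -d mydb /backup/mydb.dump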

It's a great pity that pg_dump can't dump in parallel, though. It makes
sense, given that Pg has no mechanism for one backend to "join" another
backend's existing transaction, and that without such a mechanism you'd
risk inconsistent snapshots, but it's still a bit of a pity.
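
To make the risk concrete: a naive "parallel dump" like the following
(table names invented) gives each pg_dump its own snapshot, so any
commit landing between the two starts leaves the output files mutually
inconsistent:

    # Each pg_dump runs in its own transaction with its own snapshot.
    pg_dump -t orders      mydb > orders.sql &
    pg_dump -t order_items mydb > items.sql &
    wait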

Is it actually that hard to let one backend join another's (strictly
read-only) transaction? The first backend already ensures that tuples
in the xmin/xmax window its snapshot requires aren't VACUUMed away, etc.

--
Craig Ringer
