Need to replace SAN, best method with least downtime? (8.4.4) - Mailing list pgsql-general

From Marinos Yannikos
Subject Need to replace SAN, best method with least downtime? (8.4.4)
Msg-id 4DB2DB8C.4070509@geizhals.at
List pgsql-general
Hi,

I have a beefy server with two SANs, one "fast" (A) and one "slow" (B), and 1.3TB worth
of 8.4.4 databases on A. A needs to be replaced/wiped completely with as little
downtime as possible. It's flash-based and the modules need to be replaced, so
"swapping the SAN and keeping the disks" is not an option. The databases are relatively
busy, generating 8-50 16MB WAL segments per minute.
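For a sense of scale, the WAL rate above works out as follows. This is a small Python sketch of the arithmetic only, assuming the default 16MB segment size and 1GB = 1024MB; the function names are illustrative.

```python
# Rough WAL-volume estimate from the figures above:
# 8-50 WAL segments per minute, 16 MB per segment (assumed default size).
SEGMENT_MB = 16


def wal_mb_per_minute(segments_per_minute: int, segment_mb: int = SEGMENT_MB) -> int:
    """WAL volume generated per minute, in MB."""
    return segments_per_minute * segment_mb


def wal_gb_per_hour(segments_per_minute: int, segment_mb: int = SEGMENT_MB) -> float:
    """WAL volume generated per hour, in GB (1 GB = 1024 MB)."""
    return wal_mb_per_minute(segments_per_minute, segment_mb) * 60 / 1024


if __name__ == "__main__":
    for rate in (8, 50):
        print(f"{rate} segments/min -> {wal_mb_per_minute(rate)} MB/min, "
              f"{wal_gb_per_hour(rate):.1f} GB/hour")
```

At 8 segments per minute that is 128MB/min (7.5GB/hour); at 50 it is 800MB/min (about 46.9GB/hour). That is roughly how fast a WAL archive would grow, and how much replay backlog a standby would accumulate per hour of lag.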

Several methods spring to mind:

a) pg_dumpall, wipe, restore (alternatively, dump the global objects separately
and all databases in parallel)

This is probably 100% safe but would take a long time (pg_dumpall currently takes
~440 minutes), so it's not useful unless the other methods are all too risky.
Access to the DB has to be blocked during the dump to avoid data loss.
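The parallel variant of a) could be scripted roughly like this. It is a Python sketch, not a tested procedure: the database names and output directory are hypothetical, it assumes pg_dump and pg_dumpall are on the PATH and can connect without a password prompt, and it only builds the command lines unless run_parallel() is called explicitly.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from typing import List


def dump_commands(databases: List[str], outdir: str) -> List[List[str]]:
    """Build argv lists: one pg_dumpall call for global objects (roles,
    tablespaces) and one custom-format pg_dump call per database."""
    cmds = [["pg_dumpall", "--globals-only", "-f", f"{outdir}/globals.sql"]]
    for db in databases:
        cmds.append(["pg_dump", "-Fc", "-f", f"{outdir}/{db}.dump", db])
    return cmds


def run_parallel(cmds: List[List[str]], jobs: int = 4) -> None:
    """Run the commands concurrently, at most `jobs` at a time.
    Raises subprocess.CalledProcessError if any of them fails."""
    with ThreadPoolExecutor(max_workers=jobs) as pool:
        futures = [pool.submit(subprocess.run, cmd, check=True) for cmd in cmds]
        for fut in futures:
            fut.result()  # re-raises the exception if this command failed


if __name__ == "__main__":
    # Hypothetical database names and backup directory on SAN B.
    for cmd in dump_commands(["shop", "stats"], "/mnt/sanb/backup"):
        print(" ".join(cmd))
    # run_parallel(dump_commands(["shop", "stats"], "/mnt/sanb/backup"))  # actually dump
```

Custom-format (-Fc) dumps have the advantage that pg_restore in 8.4 can restore them with several parallel jobs (-j), which may shorten the restore half of the downtime as well.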

b) set up a PITR slave (warm standby) on the same box, fail over to it, replace
SAN A, then set up a PITR slave on A and eventually fail over to it

This would probably reduce my downtime to nearly nothing (apart from waiting for
the slave to replay any backlog of archived WAL before bringing it up as the
master). I cannot judge how risky it is in terms of data integrity. Also, it
means running at reduced performance for a long time, since a 1.3TB hot backup
has to be taken before I can fail back over to SAN A.
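Option b) depends on a correct recovery.conf on the standby. One way to reduce the risk of typos is to generate it from a few checked paths. The sketch below assumes the contrib pg_standby program is used as the restore_command, as in an 8.4 warm-standby setup; the archive and trigger-file paths are hypothetical, and the sanity checks are illustrative rather than exhaustive.

```python
from pathlib import PurePosixPath


def standby_recovery_conf(archive_dir: str, trigger_file: str) -> str:
    """Return recovery.conf text for a pg_standby-based warm standby.

    pg_standby waits for each requested WAL file to appear in archive_dir
    and ends recovery (so the standby comes up as master) once trigger_file
    exists. %f, %p and %r are expanded by the server, not by this script.
    """
    archive = PurePosixPath(archive_dir)
    trigger = PurePosixPath(trigger_file)

    # Illustrative checks against the easiest mistakes to make.
    for p in (archive, trigger):
        if not p.is_absolute():
            raise ValueError(f"path must be absolute: {p}")
        if "'" in str(p) or " " in str(p):
            raise ValueError(f"path must not contain quotes or spaces: {p}")
    if trigger.parent == archive:
        raise ValueError("keep the trigger file out of the WAL archive directory")

    return (
        f"restore_command = 'pg_standby -t {trigger} {archive} %f %p %r'\n"
        f"recovery_end_command = 'rm -f {trigger}'\n"
    )


if __name__ == "__main__":
    print(standby_recovery_conf("/mnt/sanb/wal_archive", "/tmp/pgsql.trigger.5432"), end="")
```

Failing over would then be a matter of creating the trigger file (e.g. with touch) once the standby has caught up.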

c) set up a tablespace on B and move as many tables/databases over to it as
possible without severe service degradation. Then shut down Postgres, perform a
filesystem-level backup of the remaining data on A, replace A, restore, then
move things back to the default tablespace.

Moving big tables/databases will cause service degradation or interruption, but
only a few objects are really big and those aren't critical. I am hoping to end up
with <=150GB of data to back up/restore, which should take 20-30 minutes
(possibly less with rsync).
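The 20-30 minute estimate implies a certain sustained copy rate, and the same rate shows why shrinking the data left on A matters. A quick Python check of the arithmetic, assuming 1GB = 1024MB and 1TB = 1024GB (the actual SAN throughput is not known here):

```python
def required_mb_per_s(data_gb: float, minutes: float) -> float:
    """Sustained throughput in MB/s needed to copy data_gb within `minutes`."""
    return data_gb * 1024 / (minutes * 60)


def minutes_needed(data_gb: float, mb_per_s: float) -> float:
    """Minutes needed to copy data_gb at a sustained mb_per_s."""
    return data_gb * 1024 / mb_per_s / 60


if __name__ == "__main__":
    print(f"150GB in 30 min: {required_mb_per_s(150, 30):.1f} MB/s")
    print(f"150GB in 20 min: {required_mb_per_s(150, 20):.1f} MB/s")
    # At the faster of those rates, a full 1.3TB copy would take roughly:
    print(f"1.3TB at 128 MB/s: {minutes_needed(1.3 * 1024, 128):.0f} min")
```

So 150GB in 20-30 minutes needs roughly 85-128MB/s sustained, and at 128MB/s the full 1.3TB would take close to three hours each way.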

What would you do and why? I am leaning towards c) at the moment because I am
unsure about b): I cannot quickly check the integrity of the slave's datadir
before I wipe the SAN (or can I?), and I don't know how well the slow SAN will
hold up with all the busy tables running on it. It also has to be done very
carefully, with no mistakes in recovery.conf etc., or I might trash my datadir
or WAL archive dir.

Is there anything unsafe about c) that I am missing? Going through a few hundred
tables and indexes to classify them and eventually move them is a lot of work,
but it seems worth it to me.
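The classification step could be partly automated. The Python sketch below turns a list of relations into ALTER ... SET TABLESPACE statements and reports roughly how much would stay on A. Assumptions: the input rows would come from a catalog query (schema, relname, relkind and pg_relation_size() from pg_class; not shown); the tablespace name san_b, the size threshold, the hand-set "critical" flag and the rule "move small objects, and big ones only if they are not critical" are all hypothetical. In 8.4, ALTER TABLE ... SET TABLESPACE does not move the table's indexes, so indexes get their own ALTER INDEX statements, and each move holds an exclusive lock on the relation while its files are copied.

```python
from typing import Iterable, List, NamedTuple

# Map pg_class.relkind to the ALTER keyword; other kinds (e.g. TOAST
# tables, which follow their parent table) are skipped.
KEYWORDS = {"r": "TABLE", "i": "INDEX"}


class Relation(NamedTuple):
    schema: str
    name: str
    kind: str          # pg_class.relkind: 'r' = table, 'i' = index
    size_bytes: int    # e.g. from pg_relation_size()
    critical: bool     # hypothetical flag set by hand during classification


def quote_ident(ident: str) -> str:
    """Double-quote an SQL identifier, doubling any embedded quotes."""
    return '"' + ident.replace('"', '""') + '"'


def should_move(rel: Relation, max_bytes: int) -> bool:
    """Hypothetical rule: small objects move quickly enough to do online;
    big ones are moved only if a longer interruption is acceptable."""
    return rel.size_bytes <= max_bytes or not rel.critical


def move_statements(rels: Iterable[Relation], tablespace: str, max_bytes: int) -> List[str]:
    """ALTER ... SET TABLESPACE statements, smallest relations first."""
    stmts = []
    for rel in sorted(rels, key=lambda r: r.size_bytes):
        keyword = KEYWORDS.get(rel.kind)
        if keyword is None or not should_move(rel, max_bytes):
            continue
        stmts.append(
            f"ALTER {keyword} {quote_ident(rel.schema)}.{quote_ident(rel.name)} "
            f"SET TABLESPACE {quote_ident(tablespace)};"
        )
    return stmts


def bytes_left_on_a(rels: Iterable[Relation], max_bytes: int) -> int:
    """Approximate size of the tables/indexes that would stay on SAN A
    (ignores TOAST data and system catalogs)."""
    return sum(r.size_bytes for r in rels
               if r.kind in KEYWORDS and not should_move(r, max_bytes))
```

Sorting by size puts the quick, low-risk moves at the front of the generated script, so problems show up before the long-running moves start.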

Thanks,
  Marinos
