Thread: Need to replace SAN, best method with least downtime? (8.4.4)

Need to replace SAN, best method with least downtime? (8.4.4)

From
Marinos Yannikos
Date:
Hi,

I have a beefy server with 2 SANs, 1 "fast" (A) and 1 "slow" (B) and 1.3TB worth
of 8.4.4 databases on A. A needs to be replaced/wiped completely with as little
downtime as possible. It's flash-based and the modules need to be replaced, so
no "swapping the SAN and keeping the disks". The databases are relatively busy,
generating 8-50 16MB WAL segments per minute.
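
For scale, the WAL rate quoted above can be converted into a data volume per hour. This is only a rough sketch: it assumes the 8-50 segments per minute range holds steady and counts 1 GB as 1024 MB. The function name is made up for illustration.

```python
SEGMENT_MB = 16  # WAL segment size stated in the post


def wal_volume_per_hour_gb(segments_per_minute, segment_mb=SEGMENT_MB):
    """Return the WAL volume in GB produced per hour at a given segment rate."""
    return segments_per_minute * segment_mb * 60 / 1024


low = wal_volume_per_hour_gb(8)    # quiet periods
high = wal_volume_per_hour_gb(50)  # busy periods
print(f"{low:.1f} to {high:.1f} GB of WAL per hour")
```

Any WAL archive used during the migration would have to absorb this volume for as long as the standby or backup lags behind.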

Several methods spring to mind:

a) pg_dumpall, wipe, restore (alternatively pg_dump global objects and all
databases in parallel)

This will probably be 100% safe but take a long time (pg_dumpall takes ~440
minutes currently), so it's not useful unless the other methods are all too
risky. Access to DB needs to be prevented during backup to avoid data loss.

b) set up a PITR slave (warm standby) on the same box, fail over to it, replace
SAN A, then set up a PITR slave on A and fail over to it eventually

This would probably reduce my downtime to nearly nothing (except waiting for
slave to read in archived WAL before restarting it as master, if there is some
backlog). I cannot judge how risky it is in terms of data integrity. Also, it
means running at reduced performance for a long time (1.3TB "hot backup" needs
to be performed for fail over back to SAN A).
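
As an illustration of what the standby side of b) might look like on 8.4, the sketch below renders a one-line recovery.conf that uses pg_standby from contrib as the restore_command. Using pg_standby at all is an assumption, and the archive directory and trigger file paths are hypothetical placeholders, not values from this setup.

```python
def render_recovery_conf(archive_dir, trigger_file):
    """Build recovery.conf text for a pg_standby-based warm standby.

    %f, %p and %r are placeholders that PostgreSQL itself substitutes
    when it runs the restore_command.
    """
    restore = f"pg_standby -t {trigger_file} {archive_dir} %f %p %r"
    return f"restore_command = '{restore}'\n"


# Hypothetical paths: a WAL archive on SAN B and a trigger file in /tmp.
conf_text = render_recovery_conf("/mnt/san_b/wal_archive", "/tmp/pgsql.trigger")
print(conf_text, end="")
```

Creating the trigger file is what ends recovery and brings the standby up as the new master, so the path should be one that nothing else writes to by accident.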

c) set up a tablespace on B and move as many tables/databases over to it as
possible without severe service degradation. Then shut down Postgres, perform a
filesystem-level backup of the remaining data on A, replace A, restore, then
move things back to the default tablespace.

Moving big tables/databases will cause service degradation or interruption, but
only a few objects are really big and those aren't critical. I am hoping to end up
with <=150GB of data to back up/restore, which should take 20-30 minutes
(possibly less with rsync).
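
The 20-30 minute estimate implies a sustained copy rate that can be checked against what the storage actually delivers. A minimal sketch, assuming the estimate covers a single pass over the 150GB and counting 1 GB as 1024 MB (the function name is illustrative):

```python
def required_throughput_mb_s(size_gb, minutes):
    """Return the sustained MB/s needed to copy size_gb within the given minutes."""
    return size_gb * 1024 / (minutes * 60)


fast = required_throughput_mb_s(150, 20)  # rate needed for the 20-minute case
slow = required_throughput_mb_s(150, 30)  # rate needed for the 30-minute case
print(f"needs roughly {slow:.0f} to {fast:.0f} MB/s sustained")
```

If the backup and the restore each have to fit into that window, the required rate doubles.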

What would you do and why? I am considering c) at the moment because I am unsure
about b): I cannot check the integrity of the slave's datadir quickly before I
wipe the SAN (or can I?), and I don't know how well the slow SAN will hold up if
all busy tables are moved to it. It also has to be done very carefully, with no
mistakes in recovery.conf etc., or I might trash my datadir or WAL archive dir.

Is there anything unsafe about c) that I am missing here? Looking at a few hundred
tables and indices to classify and eventually move them is a lot of work, but it
seems worth it to me.
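
The classification step in c) could be partly scripted. The sketch below takes a list of relations with their sizes and emits ALTER ... SET TABLESPACE statements for everything above a size threshold, largest first. The relation names, sizes, threshold and the tablespace name san_b are all hypothetical, and the tablespace is assumed to have been created on B beforehand with CREATE TABLESPACE.

```python
def plan_moves(relations, tablespace, min_size_gb):
    """Return ALTER statements for relations at or above min_size_gb.

    relations is a list of (name, kind, size_gb) tuples, where kind is
    'table' or 'index'. Indexes are listed separately because moving a
    table does not move its indexes along with it.
    """
    statements = []
    for name, kind, size_gb in sorted(relations, key=lambda r: -r[2]):
        if size_gb >= min_size_gb:
            keyword = "TABLE" if kind == "table" else "INDEX"
            statements.append(f"ALTER {keyword} {name} SET TABLESPACE {tablespace};")
    return statements


# Hypothetical inventory; real sizes would come from the catalog.
inventory = [
    ("sessions", "table", 2.5),
    ("archive_log_pkey", "index", 60.0),
    ("archive_log", "table", 400.0),
]
plan = plan_moves(inventory, "san_b", min_size_gb=50)
for statement in plan:
    print(statement)
```

Each of these statements holds an exclusive lock on the relation while its files are copied, which is why only the big, non-critical objects are good candidates.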

Thanks,
  Marinos

Re: Need to replace SAN, best method with least downtime? (8.4.4)

From
Gabriele Bartolini
Date:
Hi Marinos,

On 23/04/11 16:00, Marinos Yannikos wrote:
> b) set up a PITR slave (warm standby) on the same box, fail over to
> it, replace SAN A, then set up a PITR slave on A and fail over to it
> eventually

Based on what you said (on a professional basis this would require a
thorough assessment), I would recommend using a warm standby and then
switching over to the new server. This operation will give you the shortest
downtime. When the original server is restored, switch back to it using
the same procedure.

You can find information on this procedure in the documentation and in
PostgreSQL 9 Administration Cookbook
(http://www.postgresql.org/docs/books/).

If you are not sure you can do this by yourself, you can still contact
professional companies that regularly provide this kind of service around
the world (mine is one of them, but for a complete list see
http://www.postgresql.org/support/professional_support) and let them do
the job for you. Probably cheaper and safer.

Cheers,
Gabriele

--
  Gabriele Bartolini - 2ndQuadrant Italia
  PostgreSQL Training, Services and Support
  gabriele.bartolini@2ndQuadrant.it | www.2ndQuadrant.it


Re: Need to replace SAN, best method with least downtime? (8.4.4)

From
John R Pierce
Date:
On 04/23/11 7:00 AM, Marinos Yannikos wrote:
> Hi,
>
> I have a beefy server with 2 SANs, 1 "fast" (A) and 1 "slow" (B) and
> 1.3TB worth of 8.4.4 databases on A. A needs to be replaced/wiped
> completely with as little downtime as possible. It's flash-based and
> the modules need to be replaced, so no "swapping the SAN and keeping
> the disks". The databases are relatively busy, generating 8-50 16MB
> WAL segments per minute).
>
> Several methods spring to mind:

f)  set up an mdraid mirror between your SAN A logical volumes and new
storage. When it's finished mirroring, remove the SAN A LUN(s) from the
mirror(s).