Thread: Least intrusive way to move primary data

Least intrusive way to move primary data

From
Armand du Plessis
Date:
We're looking into options for the least intrusive way of moving our pg_data onto faster storage. The basic setup is as follows:

6 disk RAID-0 array of EBS volumes used for primary data storage
2 disk RAID-0 array of EBS volumes used for transaction logs
RAID arrays are xfs

It's the primary data volumes we'd like to switch onto a faster RAID array. 

The options I'm looking at are the following: 
  • Set up a new slave on similar hardware and do a normal master -> slave failover when it's ready. Upside: minimal downtime. Downside: quite a lot of work for something that's essentially just a copy of the data.
  • rsync the data storage to the new RAID-0 array, stop Postgres, then rsync again to apply changes. Change mount points and restart Postgres. Longer downtime, but it seems to be the most straightforward.
  • Go the full route: stop Postgres, make consistent snapshots of all the volumes, recreate a new RAID-0 array from them, and restart. https://github.com/alestic/ec2-consistent-snapshot
The rsync route sounds like the most straightforward option; however, I'm a little worried about its consistency on a busy filesystem, even though Postgres will be stopped when the second sync happens.

Scripting the snapshotting and the reassembly of the RAID array sounds like the most reliable way of doing it, and possibly the quickest as well.

Any pitfalls, gotchas or suggestions for making the switch?

Kind regards,

Armand

Re: Least intrusive way to move primary data

From
"Andrew W. Gibbs"
Date:
Going with your first option, a master->slave replication, has the
added benefit that you build the expertise for doing Continuous Point
In Time Recovery, and after you do this storage system migration you
can use that knowledge to put in place a permanent standby server.
Yes, it is a bit of work, but you'd kill two birds with one stone, and
these are worthwhile birds.  If you've got a busy server, and you want
to do regular back-ups, and you don't have a much more
expensive/sophisticated solution at your disposal (or you want to
support load balancing of reads in the future), this is probably
something you want to do anyway.

I don't see why your rsync option wouldn't work.  Unless there is
something about rsync that I don't know (and there might be), I would
think that you would end up with consistent copies of the file
systems, and so the question in my mind is whether the downtime for
shutting down Postgres and doing a second rsync will be acceptable.
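
To make the two-pass rsync concrete, here is a minimal sketch that only builds the command sequence and prints it, so nothing touches the disks. The mount points and data directory are hypothetical placeholders, not paths from this thread, and the flag choice (-a, -H, --delete, optional --checksum) is one reasonable assumption rather than the only correct one.

```python
import shlex


def rsync_cmd(src, dst, checksum=False):
    """Build one rsync invocation that mirrors the contents of src into dst."""
    cmd = ["rsync", "-a", "-H", "--delete"]
    if checksum:
        # Compare file contents instead of size and mtime. Much slower,
        # so it lengthens the downtime window if used on the final pass.
        cmd.append("--checksum")
    # Trailing slashes: copy the directory contents, not the directory itself.
    cmd += [src.rstrip("/") + "/", dst.rstrip("/") + "/"]
    return cmd


def migration_plan(src, dst, pgdata, checksum_final=False):
    """Return the ordered commands for the two-pass copy."""
    return [
        rsync_cmd(src, dst),                              # 1. bulk copy while Postgres runs
        ["pg_ctl", "stop", "-D", pgdata, "-m", "fast"],   # 2. clean shutdown
        rsync_cmd(src, dst, checksum=checksum_final),     # 3. catch-up pass on quiescent files
    ]


if __name__ == "__main__":
    # Hypothetical paths, for illustration only.
    for step in migration_plan("/mnt/old_data", "/mnt/new_data",
                               "/mnt/old_data/pgdata"):
        print(shlex.join(step))
```

The first pass may copy files mid-write, which is harmless because the second pass runs after the shutdown and re-copies whatever changed. Swapping the mount points and restarting Postgres is left out on purpose, since it depends on the local fstab and RAID layout.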

If you go the "snapshot" route, the most important thing is ensuring
that you're "crash consistent" across all volumes together.  You'll
want to make sure that whatever is implementing the snapshot can
support lumping together all of the volumes into a management group of
some sort.  It's up to you to figure out the implementation details of
whatever solution you're using.  Just take care not to end up in a
situation such that all of the intra-volume writes are consistent but
the inter-volume writes are not, which would mean a corrupted
Postgres.
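
One way to get that cross-volume consistency is to freeze every filesystem first, start a snapshot of every member volume, and only then thaw. The sketch below shows that ordering, with the thaw guaranteed even if a snapshot call fails. It assumes xfs_freeze for the freeze and thaw, and `aws ec2 create-snapshot` as a stand-in for whatever snapshot tool is actually in use. The mount points and volume IDs are made up, and the `run` callback is a hypothetical hook so the sequence can be dry-run.

```python
def snapshot_all(mounts, volume_ids, run):
    """Freeze all filesystems, start a snapshot of every volume, then thaw.

    `run` is called with each command as a list of strings; pass
    subprocess.check_call to execute for real, or a printer to dry-run.
    """
    frozen = []
    try:
        for mount in mounts:
            run(["xfs_freeze", "-f", mount])
            frozen.append(mount)
        # Every snapshot is started while all filesystems are still frozen,
        # so the volumes are captured at the same logical point.
        for volume_id in volume_ids:
            run(["aws", "ec2", "create-snapshot", "--volume-id", volume_id])
    finally:
        # Thaw in reverse order, even if a freeze or snapshot call raised.
        for mount in reversed(frozen):
            run(["xfs_freeze", "-u", mount])


if __name__ == "__main__":
    # Hypothetical mount points and volume IDs, printed instead of executed.
    snapshot_all(
        ["/mnt/pgdata", "/mnt/pgxlog"],
        ["vol-data0", "vol-data1", "vol-xlog0"],
        run=lambda cmd: print(" ".join(cmd)),
    )
```

If Postgres is already stopped, as in the original plan, the freeze is a belt-and-braces step. If it is running, the freeze is what keeps the data and transaction log volumes in step with each other.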

  -- AWG

On Thu, May 30, 2013 at 10:58:10AM +0200, Armand du Plessis wrote:
> We're looking into options for the least intrusive way of moving our
> pg_data onto faster storage. The basic setup is as follows :
>
> 6 disk RAID-0 array of EBS volumes used for primary data storage
> 2 disk RAID-0 array of EBS volumes used for transaction logs
> RAID arrays are xfs
>
> It's the primary data volumes we'd like to switch onto a faster RAID array.
>
> The options I'm looking at are the following:
>
>    - Setup new slave on similar hardware, do a normal master -> slave
>    failover when it's ready. Upside, minimal downtime. Downside, quite a lot
>    of work for something that's essentially just a copy of data
>    - rsync the data storage to new RAID-0 array. stop Postgres. rsync again
>    to apply changes. Change mount points and restart Postgres. Longer downtime
>    but seems to be the most straight forward.
>    - Go the full route of stopping Postgres, make consistent snapshots of
>    all the volumes and recreate a new RAID-0 array from them and restarting.
>    https://github.com/alestic/ec2-consistent-snapshot
>
> The rsync route sounds like the most straight-forward option, however I'm a
> little worried about it's consistency on a busy filesystem despite it being
> stopped when the second sync happens.
>
> Scripting the snapshotting, and reassembling of the RAID array sounds like
> the most reliable way of doing it and possibly the quickest as well.
>
> Any pitfalls, gotchas or suggestions for making the switch?
>
> Kind regards,
>
> Armand


Re: Least intrusive way to move primary data

From
Armand du Plessis
Date:

On Thu, May 30, 2013 at 12:19 PM, Andrew W. Gibbs <awgibbs@awgibbs.com> wrote:
> Going with your first option, a master->slave replication, has the
> added benefit that you build the expertise for doing Continuous Point
> In Time Recovery, and after you do this storage system migration you
> can use that knowledge to put in a place a permanent standby server.
> Yes, it is a bit of work, but you'd kill two birds with one stone, and
> these are worthwhile birds.  If you've got a busy server, and you want
> to do regular back-ups, and you don't have a much more
> expensive/sophisticated solution at your behest (or you want to
> support load balancing of reads in the future), this is probably
> something you want to do anyway.

Thanks, Andrew. I actually have a streaming slave running for backups at the moment; it's just not as powerful as the primary. Upgrading it and doing the failover would probably be an option as well.
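
Before failing over to a streaming slave, it helps to confirm that it has replayed everything the primary has written. A minimal sketch of that check, assuming the two WAL positions have already been fetched as text (on the 9.x releases current at the time, from pg_current_xlog_location() on the primary and pg_last_xlog_replay_location() on the slave). The helper names are hypothetical, and the arithmetic should be treated as approximate on older releases.

```python
def lsn_to_bytes(lsn):
    """Convert a WAL position such as '16/B374D848' to a byte offset."""
    high, low = lsn.split("/")
    return (int(high, 16) << 32) + int(low, 16)


def replay_lag_bytes(primary_lsn, standby_lsn):
    """Bytes of WAL the standby still has to replay (0 means caught up)."""
    return lsn_to_bytes(primary_lsn) - lsn_to_bytes(standby_lsn)


if __name__ == "__main__":
    # Made-up positions, for illustration only.
    print(replay_lag_bytes("16/B374D848", "16/B374D800"))
```

With writes to the primary stopped, a lag of zero suggests the slave is safe to promote (for example with `pg_ctl promote`).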