Thread: Question on placing database on a network attached storage device

Hi,
 
I'm putting together a mission critical application which will use a Postgresql database, and I am trying to decide if I should place my Postgresql database on a network attached storage appliance or on local disks. 
 
If I place the database on the NAS, it will get "backed up" by the NAS device's "snap" type of images.  That means the NAS will make a copy of the delta, or difference, between the current version of the files on it, and the version from the last time the "snap" was done.  The "snap" images will be dumped to tape at some interval of time.
 
I'm concerned about using a "snap" of an open and active database to restore the database if the need arises.  Now, postgresql is pretty good about keeping checkpoints and recovering, but I'm still concerned about data integrity, (who knows what state any internal pointers are in when the "snap is taken), and as my application will need to archive groups of files that have to be kept together in order for the data in those files to be meaningful, (a group of files could be in the process of being archived to the database, the NAS takes a "snap" when only some of the files have been written in the database, and then the rest of the files are written in the database).  There is also the question of speed, Network attached storage vs. local disks, (network speed is pretty good 100 Meg and will be going to 1 Gig in the future).
 
I suppose I could place the database on the NAS, shutdown the database when a backup of the database is desired, have Postgresql create the database re-creation script, let the NAS take it's "snap", and then bring the database back up.
 
Any suggestions, experiences, or observations?
Chris,

The NAS is a Network Appliance 920c.  I understand that an individual file
while be "intact" during a snap, (not sure I believe it, but that's what I'm
told).  The NAS may be able to monitor the state of an individual file and
not "snap" it if it is on an "open" state or has a pending write, or some
such, but as the NAS has no knowledge of groups of files that need to exist
in a known state, (it deals on the file level), it cannot guarantee a
consistent state across a group of files.  (The database will be on the NAS
and will be NFS mounted filesystem to a Linux Redhat 3 running the
application.)
----- Original Message -----
From: "Chris Travers" <chris@verkiel.metatrontech.com>
To: "N/A" <srichardjanet@charter.net>
Cc: <pgsql-admin@postgresql.org>
Sent: Saturday, July 30, 2005 1:44 PM
Subject: Re: [ADMIN] Question on placing database on a network attached
storage device


> N/A wrote:
>
> > Hi,
> >
> > I'm putting together a mission critical application which will use a
> > Postgresql database, and I am trying to decide if I should place my
> > Postgresql database on a network attached storage appliance or on
> > local disks.
>
> Wherever possible, I think that local storage is almost always best as
> it is a less complex system (hence less can go wrong).  However, you
> will probably get better answers if you tell us what sort of NAS you are
> looking at using, how it will connect to your system, etc.
>
> >
> > If I place the database on the NAS, it will get "backed up" by the NAS
> > device's "snap" type of images.  That means the NAS will make a copy
> > of the delta, or difference, between the current version of the files
> > on it, and the version from the last time the "snap" was done.  The
> > "snap" images will be dumped to tape at some interval of time.
> >
>
> How internally consistant is this snap?  I.e. if the file is changed
> while the snap is being taken, what happens?  This behavior may be
> product-dependnant and has serious ramifications for your data.
>
> > I'm concerned about using a "snap" of an open and active database to
> > restore the database if the need arises.  Now, postgresql is pretty
> > good about keeping checkpoints and recovering, but I'm still concerned
> > about data integrity, (who knows what state any internal pointers are
> > in when the "snap is taken), and as my application will need to
> > archive groups of files that have to be kept together in order for the
> > data in those files to be meaningful, (a group of files could be in
> > the process of being archived to the database, the NAS takes a "snap"
> > when only some of the files have been written in the database, and
> > then the rest of the files are written in the database).  There is
> > also the question of speed, Network attached storage vs. local disks,
> > (network speed is pretty good 100 Meg and will be going to 1 Gig in
> > the future).
>
> Personally I think that there are better approaches than NAS for this,
> but again I am unwilling to say "never use it."
>
> >
> > I suppose I could place the database on the NAS, shutdown the database
> > when a backup of the database is desired, have Postgresql create the
> > database re-creation script, let the NAS take it's "snap", and then
> > bring the database back up.
> >
> > Any suggestions, experiences, or observations?
>
> Best Wishes,
> Chris Travers
> Metatron Technology Consulting
>

Re: Question on placing database on a network attached storage

From
Chris Travers
Date:
N/A wrote:

>Chris,
>
>The NAS is a Network Appliance 920c.  I understand that an individual file
>while be "intact" during a snap, (not sure I believe it, but that's what I'm
>told).
>
Depending on the way that data is journalled, this may be a possibility
(istr XFS is able to do this).

>  The NAS may be able to monitor the state of an individual file and
>not "snap" it if it is on an "open" state or has a pending write, or some
>such, but as the NAS has no knowledge of groups of files that need to exist
>in a known state, (it deals on the file level), it cannot guarantee a
>consistent state across a group of files.
>
That is a serious concern.  Every table in PostgreSQL consists of at
least one physical file.  Transactions may touch several of these
files.  While it is not impossible to solve this problem (as I
mentioned, I think XFS has this capability), it is hardly trivial or
something you would normally expect.

Also I wonder what the impact of several Postmasters trying to access
data over NFS would be.

Best Wishes,
Chris Travers

Re: Question on placing database on a network attached

From
Jeff Frost
Date:
If you are using postgresql-8.0.x, then you can use PITR and base backups to
achieve a fairly consistent backup on a NAS which restores rather quickly.  If
the NAS supports hard links, then you can use the --link-dest option of rsync
to make the base backups quite fast and still retain more than one base backup
at a time.  The only problem with this solution is that you need at least as
much disk space as your PGDATA directory contains, plus enough space for the
WAL archives between base backups.

If the NAS does not support hard links, but does do snapshotting as you
indicate, then using rsync with the --partial flag against the previous
base backup would also be quite fast and bandwidth efficient.  The
snapshotting on the NAS would take care of the prior version base backup for
you and all you need do is rotate out the WAL archives according to how many
snapshots you care about.

Using PITR in this way has the added benefit of allowing you to restore the DB
to some point in the past if need be.

--
Jeff Frost, Owner     <jeff@frostconsultingllc.com>
Frost Consulting, LLC     http://www.frostconsultingllc.com/
Phone: 650-780-7908    FAX: 650-649-1954

On Sat, 30 Jul 2005, N/A wrote:

> The NAS is a Network Appliance 920c.  I understand that an individual file
> while be "intact" during a snap, (not sure I believe it, but that's what I'm
> told).  The NAS may be able to monitor the state of an individual file and
> not "snap" it if it is on an "open" state or has a pending write, or some
> such, but as the NAS has no knowledge of groups of files that need to exist
> in a known state, (it deals on the file level), it cannot guarantee a
> consistent state across a group of files.  (The database will be on the NAS
> and will be NFS mounted filesystem to a Linux Redhat 3 running the
> application.)


Re: Question on placing database on a network attached storage

From
Alvaro Herrera
Date:
On Sat, Jul 30, 2005 at 02:59:30PM -0700, Chris Travers wrote:

> Also I wonder what the impact of several Postmasters trying to access
> data over NFS would be.

That will lead to corruption very quickly.

--
Alvaro Herrera (<alvherre[a]alvh.no-ip.org>)
"Estoy de acuerdo contigo en que la verdad absoluta no existe...
El problema es que la mentira sí existe y tu estás mintiendo" (G. Lama)

Re: Question on placing database on a network attached storage

From
Chris Travers
Date:
N/A wrote:

> Hi,
>
> I'm putting together a mission critical application which will use a
> Postgresql database, and I am trying to decide if I should place my
> Postgresql database on a network attached storage appliance or on
> local disks.

Wherever possible, I think that local storage is almost always best as
it is a less complex system (hence less can go wrong).  However, you
will probably get better answers if you tell us what sort of NAS you are
looking at using, how it will connect to your system, etc.

>
> If I place the database on the NAS, it will get "backed up" by the NAS
> device's "snap" type of images.  That means the NAS will make a copy
> of the delta, or difference, between the current version of the files
> on it, and the version from the last time the "snap" was done.  The
> "snap" images will be dumped to tape at some interval of time.
>

How internally consistant is this snap?  I.e. if the file is changed
while the snap is being taken, what happens?  This behavior may be
product-dependnant and has serious ramifications for your data.

> I'm concerned about using a "snap" of an open and active database to
> restore the database if the need arises.  Now, postgresql is pretty
> good about keeping checkpoints and recovering, but I'm still concerned
> about data integrity, (who knows what state any internal pointers are
> in when the "snap is taken), and as my application will need to
> archive groups of files that have to be kept together in order for the
> data in those files to be meaningful, (a group of files could be in
> the process of being archived to the database, the NAS takes a "snap"
> when only some of the files have been written in the database, and
> then the rest of the files are written in the database).  There is
> also the question of speed, Network attached storage vs. local disks,
> (network speed is pretty good 100 Meg and will be going to 1 Gig in
> the future).

Personally I think that there are better approaches than NAS for this,
but again I am unwilling to say "never use it."

>
> I suppose I could place the database on the NAS, shutdown the database
> when a backup of the database is desired, have Postgresql create the
> database re-creation script, let the NAS take it's "snap", and then
> bring the database back up.
>
> Any suggestions, experiences, or observations?

Best Wishes,
Chris Travers
Metatron Technology Consulting

Attachment