Thread: File system snapshots for multiple file systems

File system snapshots for multiple file systems

From
Bruce Momjian
Date:
Right now it isn't possible to use file system snapshots as a reliable
backup if you are using multiple file systems for tablespaces because
most systems don't allow the simultaneous snapshotting of multiple file
systems.  Our documentation mentions this:

    If your database is spread across multiple file systems, there might not
    be any way to obtain exactly-simultaneous frozen snapshots of all the
    volumes. For example, if your data files and WAL log are on different
    disks, or if tablespaces are on different file systems, it might not be
    possible to use snapshot backup because the snapshots must be
    simultaneous. Read your file system documentation very carefully before
    trusting to the consistent-snapshot technique in such situations. The
    safest approach is to shut down the database server for long enough to
    establish all the frozen snapshots.

However, it occurred to me that if someone turned on continuous archiving
during the file system snapshots, then you could use PITR to recover
from file system snapshots that were not simultaneous.

Should this be documented?

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + If your life is a hard drive, Christ can be your backup. +


Re: File system snapshots for multiple file systems

From
Heikki Linnakangas
Date:
Bruce Momjian wrote:
> Right now it isn't possible to use file system snapshots as a reliable
> backup if you are using multiple file systems for tablespaces because
> most systems don't allow the simultaneous snapshotting of multiple file
> systems.  Our documentation mentions this:
> 
>     If your database is spread across multiple file systems, there might not
>     be any way to obtain exactly-simultaneous frozen snapshots of all the
>     volumes. For example, if your data files and WAL log are on different
>     disks, or if tablespaces are on different file systems, it might not be
>     possible to use snapshot backup because the snapshots must be
>     simultaneous. Read your file system documentation very carefully before
>     trusting to the consistent-snapshot technique in such situations. The
>     safest approach is to shut down the database server for long enough to
>     establish all the frozen snapshots.
> 
> However, it occurred to me that if someone turned on continuous archiving
> during the file system snapshots, then you could use PITR to recover
> from file system snapshots that were not simultaneous.
> 
> Should this be documented?

If you use continuous archiving, the snapshot indeed doesn't need to be 
atomic. In fact, you can use tar instead of filesystem snapshots. And in 
fact, that's exactly how you take the base backup with PITR, and that is 
documented.

Incidentally, I looked at this stuff just a couple of days ago, and it 
occurred to me that we really should make it easier to take a hot backup 
with that mechanism. We shouldn't require setting up archive_command, 
and WAL archiving, if all you want is to take a backup from a live 
system. From user point of view, it should be a matter of:

1. call pg_start_backup('foo')
2. tar/etc. the whole data directory, except for pg_xlog
3. tar pg_xlog
4. call pg_stop_backup()

If we just made sure that we don't delete or recycle any WAL files while 
the backup is being taken, that would work, right?
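
The four steps above can be sketched as a dry run that only builds and prints the commands. This is a minimal illustration, assuming the function and directory names used in this thread (pg_start_backup, pg_stop_backup, pg_xlog) and that psql and GNU tar are available; the helper names and paths are hypothetical. The rest of the thread discusses whether the sequence is safe without WAL archiving, so treat it as the proposed user-facing flow, not a verified procedure.

```python
import os
import shlex


def sql_literal(text):
    """Quote a string as a SQL literal by doubling single quotes."""
    return "'" + text.replace("'", "''") + "'"


def hot_backup_commands(label, data_dir, out_dir, wal_subdir="pg_xlog"):
    """Return the four steps as argv lists, in the order given in the message."""
    base_tar = os.path.join(out_dir, "base.tar")
    wal_tar = os.path.join(out_dir, wal_subdir + ".tar")
    return [
        # 1. mark the start of the backup
        ["psql", "-c", "SELECT pg_start_backup(%s);" % sql_literal(label)],
        # 2. archive the data directory, leaving out the WAL directory
        ["tar", "-cf", base_tar, "--exclude=" + wal_subdir, "-C", data_dir, "."],
        # 3. archive the WAL directory afterwards
        ["tar", "-cf", wal_tar, "-C", data_dir, wal_subdir],
        # 4. mark the end of the backup
        ["psql", "-c", "SELECT pg_stop_backup();"],
    ]


if __name__ == "__main__":
    # Hypothetical paths; nothing is executed, the commands are only printed.
    for argv in hot_backup_commands("foo", "/var/lib/pgsql/data", "/backup"):
        print(" ".join(shlex.quote(arg) for arg in argv))
```

Building argv lists instead of shell strings keeps the label quoting in one place; running them for real would be a matter of passing each list to subprocess.run.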

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com


Re: File system snapshots for multiple file systems

From
"Jonah H. Harris"
Date:
On Mon, Apr 7, 2008 at 2:58 PM, Heikki Linnakangas
<heikki@enterprisedb.com> wrote:
>  Incidentally, I looked at this stuff just a couple of days ago, and it
> occurred to me that we really should make it easier to take a hot backup
> with that mechanism. We shouldn't require setting up archive_command, and
> WAL archiving, if all you want is to take a backup from a live system. From
> user point of view, it should be a matter of:
>
>  1. call pg_start_backup('foo')
>  2. tar/etc. the whole data directory, except for pg_xlog
>  3. tar pg_xlog
>  4. call pg_stop_backup()
>
>  If we just made sure that we don't delete or recycle any WAL files while
> the backup is being taken, that would work, right?

Or checkpoint, yes?  I don't see tar backing up large (100+GB)
databases in < 5 minutes.

-- 
Jonah H. Harris, Sr. Software Architect | phone: 732.331.1324
EnterpriseDB Corporation | fax: 732.331.1301
499 Thornall Street, 2nd Floor | jonah.harris@enterprisedb.com
Edison, NJ 08837 | http://www.enterprisedb.com/


Re: File system snapshots for multiple file systems

From
Bruce Momjian
Date:
Heikki Linnakangas wrote:
> > However, it occurred to me that if someone turned on continuous archiving
> > during the file system snapshots, then you could use PITR to recover
> > from file system snapshots that were not simultaneous.
> > 
> > Should this be documented?
> 
> If you use continuous archiving, the snapshot indeed doesn't need to be 
> atomic. In fact, you can use tar instead of filesystem snapshots. And in 
> fact, that's exactly how you take the base backup with PITR, and that is 
> documented.

Right.  My point is that for people who don't want continuous archiving,
doing it _only_ during the snapshots allows multi-filesystem snapshots
to be reliable.

> Incidentally, I looked at this stuff just a couple of days ago, and it 
> occurred to me that we really should make it easier to take a hot backup 
> with that mechanism. We shouldn't require setting up archive_command, 
> and WAL archiving, if all you want is to take a backup from a live 
> system. From user point of view, it should be a matter of:
> 
> 1. call pg_start_backup('foo')
> 2. tar/etc. the whole data directory, except for pg_xlog
> 3. tar pg_xlog
> 4. call pg_stop_backup()
> 
> If we just made sure that we don't delete or recycle any WAL files while 
> the backup is being taken, that would work, right?

Yes, agreed.  This is exactly the issue I was raising.  You are showing
it as tar, but I am showing it as multi-volume snapshots.  To do what
you want to do above, you would have to stop doing checkpoints during
the start/stop, which I am a little afraid of with tar --- file system
snapshots are much faster.  Also, you would have to take a file system
snapshot of the pg_xlog file system at the end, but it is the same logic.



Re: File system snapshots for multiple file systems

From
Fujii Masao
Date:
Heikki Linnakangas wrote:
> 1. call pg_start_backup('foo')
> 2. tar/etc. the whole data directory, except for pg_xlog
> 3. tar pg_xlog
> 4. call pg_stop_backup()
> 
> If we just made sure that we don't delete or recycle any WAL files while 
> the backup is being taken, that would work, right?

When is the backup history file for PITR backed up?
Just after pg_stop_backup()? Next backup?

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
TEL (03)5860-5115
FAX (03)5463-5490


Re: File system snapshots for multiple file systems

From
Heikki Linnakangas
Date:
Jonah H. Harris wrote:
> On Mon, Apr 7, 2008 at 2:58 PM, Heikki Linnakangas
> <heikki@enterprisedb.com> wrote:
>>  Incidentally, I looked at this stuff just a couple of days ago, and it
>> occurred to me that we really should make it easier to take a hot backup
>> with that mechanism. We shouldn't require setting up archive_command, and
>> WAL archiving, if all you want is to take a backup from a live system. From
>> user point of view, it should be a matter of:
>>
>>  1. call pg_start_backup('foo')
>>  2. tar/etc. the whole data directory, except for pg_xlog
>>  3. tar pg_xlog
>>  4. call pg_stop_backup()
>>
>>  If we just made sure that we don't delete or recycle any WAL files while
>> the backup is being taken, that would work, right?
> 
> Or checkpoint, yes?  I don't see tar backing up large (100+GB)
> databases in < 5 minutes.

Right.

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com


Re: File system snapshots for multiple file systems

From
tomas@tuxteam.de
Date:
On Mon, Apr 07, 2008 at 03:03:41PM -0400, Jonah H. Harris wrote:
> On Mon, Apr 7, 2008 at 2:58 PM, Heikki Linnakangas
> <heikki@enterprisedb.com> wrote:
> >  Incidentally, I looked at this stuff just a couple of days ago, and it

[...]

> Or checkpoint, yes?  I don't see tar backing up large (100+GB)
> databases in < 5 minutes.

Checkpointing is definitely coolest. If your file system doesn't do
that, rsync is a good poor man's replacement:

  first rsync (takes long)
              (or work from an older backup)
  pg_start_backup(...)
  rsync (should be much faster)
  rsync WAL
  pg_stop_backup()

I regularly rsync moderately active 500GB filesystems on fairly feeble
hardware in about 5-10 minutes (for daily backups).

Regards
-- tomás


Re: File system snapshots for multiple file systems

From
Heikki Linnakangas
Date:
tomas@tuxteam.de wrote:
> Checkpointing is definitely coolest. If your file system doesn't do
> that, rsync is a good poor man's replacement:
> 
>   first rsync (takes long)
>               (or work from an older backup)
>   pg_start_backup(...)
>   rsync (should be much faster)
>   rsync WAL
>   pg_stop_backup()
> 
> I regularly rsync moderately active 500GB filesystems on fairly feeble
> hardware in about 5-10 minutes (for daily backups).

That will *not* get you a consistent, safe backup. If a PostgreSQL 
checkpoint happens while you're rsyncing the WAL, the data files won't 
(necessarily) contain updates that were made between the rsync of the 
data finished and the checkpoint.

To do that, you'd need to set up continuous archiving, and do a PITR 
restore.

What I was complaining/suggesting is that we should make what you did
actually work, because it's a lot simpler. And as Jonah pointed out, 
we'd need to inhibit checkpoints between pg_start_backup() and 
pg_stop_backup() to make it work.

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com


Re: File system snapshots for multiple file systems

From
tomas@tuxteam.de
Date:
On Tue, Apr 08, 2008 at 08:52:03AM +0100, Heikki Linnakangas wrote:
> tomas@tuxteam.de wrote:

[...]

> What I was complaining/suggesting is that we should make what you did
> actually work, because it's a lot simpler. And as Jonah pointed out, 
> we'd need to inhibit checkpoints between pg_start_backup() and 
> pg_stop_backup() to make it work.

Thanks -- I think I got that.

Regards
-- tomás


Re: File system snapshots for multiple file systems

From
Tom Lane
Date:
Heikki Linnakangas <heikki@enterprisedb.com> writes:
> What I was complaining/suggesting is that we should make what you did
> actually work, because it's a lot simpler. And as Jonah pointed out, 
> we'd need to inhibit checkpoints between pg_start_backup() and 
> pg_stop_backup() to make it work.

I don't think that follows --- what you'd need is to prevent the
checkpoints from removing/recycling old WAL files, but you can still
allow a checkpoint to occur.  Any subsequent recovery from the backup
would need to replay from the checkpoint identified by the backup label
file anyway.
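
A toy model may make the point concrete. It is only a sketch under strong assumptions: WAL records are whole-value writes (so replaying one twice is harmless), and the names WAL, state_at and restore are made up for illustration and are not PostgreSQL internals. Two volumes are frozen at different times while a checkpoint happens in between; replaying all retained WAL from the backup-start checkpoint gives a consistent result, while replaying only from the later checkpoint does not.

```python
import copy

# Each WAL record is (lsn, volume, key, value) and rewrites the whole value,
# so applying a record that a snapshot already contains is harmless.
WAL = [
    (1, "a", "x", 1),
    (2, "b", "y", 1),
    (3, "a", "x", 2),
    (4, "b", "y", 2),
    (5, "a", "z", 1),
    (6, "b", "y", 3),
]

BACKUP_START_LSN = 0      # checkpoint named in the backup label (assumed)
LATER_CHECKPOINT_LSN = 4  # checkpoint that occurs while the backup runs
END_LSN = 6               # last record covered by the copied WAL


def state_at(wal, lsn):
    """Contents of both volumes after applying every record up to lsn."""
    volumes = {"a": {}, "b": {}}
    for rec_lsn, vol, key, value in wal:
        if rec_lsn <= lsn:
            volumes[vol][key] = value
    return volumes


def restore(snapshots, wal, replay_after_lsn):
    """Start from the frozen volumes and replay records after the given LSN."""
    volumes = copy.deepcopy(snapshots)
    for rec_lsn, vol, key, value in wal:
        if rec_lsn > replay_after_lsn:
            volumes[vol][key] = value
    return volumes


# Non-simultaneous snapshots: volume a is frozen after LSN 2, volume b after LSN 5.
snapshots = {"a": state_at(WAL, 2)["a"], "b": state_at(WAL, 5)["b"]}

from_backup_start = restore(snapshots, WAL, BACKUP_START_LSN)
from_later_checkpoint = restore(snapshots, WAL, LATER_CHECKPOINT_LSN)

print(from_backup_start == state_at(WAL, END_LSN))      # True
print(from_later_checkpoint == state_at(WAL, END_LSN))  # False: a["x"] is stale
```

In this model the checkpoint at LSN 4 does no harm by itself; what matters is that records 1 through 6 are still available at restore time and that replay starts at the backup-start checkpoint.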

Whether it's a good idea or not is a bit debatable though.  I'm
concerned about the WAL partition filling up (--> PANIC), especially
if you forget to pg_stop_backup after getting your backup.
        regards, tom lane


Re: File system snapshots for multiple file systems

From
Gregory Stark
Date:
"Tom Lane" <tgl@sss.pgh.pa.us> writes:

> Whether it's a good idea or not is a bit debatable though.  I'm
> concerned about the WAL partition filling up (--> PANIC), especially
> if you forget to pg_stop_backup after getting your backup.

We check if pg_start_backup is in effect when we get an ENOSPC error on the
WAL partition and if so turn it off, clean old WAL segments, and march on.

The major concern being that someone might have a bad backup. pg_stop_backup()
could scream but they might not notice. Not sure how much more we could do
about that.

--  Gregory Stark EnterpriseDB          http://www.enterprisedb.com Ask me about EnterpriseDB's PostGIS support!


Re: File system snapshots for multiple file systems

From
Tom Lane
Date:
Gregory Stark <stark@enterprisedb.com> writes:
> "Tom Lane" <tgl@sss.pgh.pa.us> writes:
>> Whether it's a good idea or not is a bit debatable though.  I'm
>> concerned about the WAL partition filling up (--> PANIC), especially
>> if you forget to pg_stop_backup after getting your backup.

> We check if pg_start_backup is in effect when we get an ENOSPC error on the WAL
> partition and if so turn it off, clean old WAL segments, and march on.

> The major concern being that someone might have a bad backup. pg_stop_backup()
> could scream but they might not notice. Not sure how much more we could do
> about that.

Not putting in the foot-gun in the first place is what we could do about
it.
        regards, tom lane


Re: File system snapshots for multiple file systems

From
Magnus Hagander
Date:
Tom Lane wrote:
> Gregory Stark <stark@enterprisedb.com> writes:
> > "Tom Lane" <tgl@sss.pgh.pa.us> writes:
> >> Whether it's a good idea or not is a bit debatable though.  I'm
> >> concerned about the WAL partition filling up (--> PANIC),
> >> especially if you forget to pg_stop_backup after getting your
> >> backup.
> 
> > We check if pg_start_backup is in effect when we get an ENOSPC error on
> > the WAL partition and if so turn it off, clean old WAL segments,
> > and march on.
> 
> > The major concern being that someone might have a bad backup.
> > pg_stop_backup() could scream but they might not notice. Not sure
> > how much more we could do about that.
> 
> Not putting in the foot-gun in the first place is what we could do
> about it.

AFAIK, this is a foot-gun that other databases provide, simply because
it's *very* useful when used right. But if you leave a hanging backup
process, it *will* fill your disk and eventually shut down the
database. Making sure that does not happen is a function of the backup
software and of the monitoring software.

One way I think at least MSSQL deals with it is by issuing both the
start and end backup pieces (actually they run as a single command, but
internally it's split I'm sure) over the same connection, and if that
connection goes away, it'll automatically consider the backup aborted.
I would assume the others do something similar in the case of a crash,
but if the backup process just hangs, it will fill up the disk (I've
had this happen back when backups were actually made to tape, and the
tape ran out, for example)

//Magnus


Re: File system snapshots for multiple file systems

From
Heikki Linnakangas
Date:
Tom Lane wrote:
> Heikki Linnakangas <heikki@enterprisedb.com> writes:
>> What I was complaining/suggesting is that we should make what you did
>> actually work, because it's a lot simpler. And as Jonah pointed out, 
>> we'd need to inhibit checkpoints between pg_start_backup() and 
>> pg_stop_backup() to make it work.
> 
> I don't think that follows --- what you'd need is to prevent the
> checkpoints from removing/recycling old WAL files, but you can still
> allow a checkpoint to occur.  Any subsequent recovery from the backup
> would need to replay from the checkpoint identified by the backup label
> file anyway.

I was thinking that the restore would be a normal non-PITR recovery, but 
if we do it as a PITR restore, that's true.

> Whether it's a good idea or not is a bit debatable though.  I'm
> concerned about the WAL partition filling up (--> PANIC), especially
> if you forget to pg_stop_backup after getting your backup.

Yep, that would suck. We already have that problem if you set up 
continuous archiving, and archive_command starts failing, don't we?

As a simple safeguard, we could have a user-settable maximum number of 
segments, and give up on the backup after that. Though failing the 
backup isn't nice either.
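
A rough sketch of that safeguard, as a toy policy object. Everything here is hypothetical (the class name, the method names and the max_retained_segments limit are not existing PostgreSQL settings); it only illustrates the trade-off of releasing WAL and failing the backup once the limit is exceeded.

```python
class BackupWalRetention:
    """Toy policy: hold back WAL segments during a backup, up to a limit."""

    def __init__(self, max_retained_segments):
        self.max_retained_segments = max_retained_segments
        self.backup_active = False
        self.retained = []

    def start_backup(self):
        self.backup_active = True
        self.retained = []

    def segment_finished(self, name):
        """Report a segment that is otherwise recyclable.

        Returns the list of segments that may be recycled right now.
        """
        if not self.backup_active:
            return [name]
        self.retained.append(name)
        if len(self.retained) > self.max_retained_segments:
            # Limit exceeded: give up on the backup rather than fill the disk.
            self.backup_active = False
            released, self.retained = self.retained, []
            return released
        return []

    def stop_backup(self):
        """Return True if no WAL needed by the backup was released early."""
        still_valid = self.backup_active
        self.backup_active = False
        self.retained = []
        return still_valid


if __name__ == "__main__":
    policy = BackupWalRetention(max_retained_segments=2)
    policy.start_backup()
    for segment in ["seg1", "seg2", "seg3"]:
        print(segment, "->", policy.segment_finished(segment))
    print("backup valid:", policy.stop_backup())
```

The caller has to check the value returned by stop_backup; a backup that was abandoned silently is exactly the bad-backup risk raised earlier in the thread.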

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com


Re: File system snapshots for multiple file systems

From
Bruce Momjian
Date:
I have applied the following patch to document the use of continuous
archiving backups to allow non-simultaneous snapshots.

I don't think we want to go any farther to stop WAL recycling during
backups.

---------------------------------------------------------------------------

Bruce Momjian wrote:
> Right now it isn't possible to use file system snapshots as a reliable
> backup if you are using multiple file systems for tablespaces because
> most systems don't allow the simultaneous snapshotting of multiple file
> systems.  Our documentation mentions this:
>
>     If your database is spread across multiple file systems, there might not
>     be any way to obtain exactly-simultaneous frozen snapshots of all the
>     volumes. For example, if your data files and WAL log are on different
>     disks, or if tablespaces are on different file systems, it might not be
>     possible to use snapshot backup because the snapshots must be
>     simultaneous. Read your file system documentation very carefully before
>     trusting to the consistent-snapshot technique in such situations. The
>     safest approach is to shut down the database server for long enough to
>     establish all the frozen snapshots.
>
> However, it occurred to me that if someone turned on continuous archiving
> during the file system snapshots, then you could use PITR to recover
> from file system snapshots that were not simultaneous.
>
> Should this be documented?

Index: doc/src/sgml/backup.sgml
===================================================================
RCS file: /cvsroot/pgsql/doc/src/sgml/backup.sgml,v
retrieving revision 2.117
diff -c -c -r2.117 backup.sgml
*** doc/src/sgml/backup.sgml    5 Apr 2008 01:34:05 -0000    2.117
--- doc/src/sgml/backup.sgml    9 Apr 2008 02:39:44 -0000
***************
*** 386,394 ****
     not be possible to use snapshot backup because the snapshots
     <emphasis>must</> be simultaneous.
     Read your file system documentation very carefully before trusting
!    to the consistent-snapshot technique in such situations.  The safest
!    approach is to shut down the database server for long enough to
!    establish all the frozen snapshots.
    </para>

    <para>
--- 386,402 ----
     not be possible to use snapshot backup because the snapshots
     <emphasis>must</> be simultaneous.
     Read your file system documentation very carefully before trusting
!    to the consistent-snapshot technique in such situations.
!   </para>
!
!   <para>
!    If simultaneous snapshots are not possible, one option is to shut down
!    the database server long enough to establish all the frozen snapshots.
!    Another option is perform a continuous archiving base backup (<xref
!    linkend="backup-base-backup">) because such backups are immune to file
!    system changes during the backup.  This requires enabling continuous
!    archiving just during the backup process; restore is done using
!    continuous archive recovery (<xref linkend="backup-pitr-recovery">).
    </para>

    <para>