Thread: Seeking information about backup/recovery

Seeking information about backup/recovery

From

Mary Edie Meredith

Date:

04 September 2003, 14:54:20

Dear PostgreSQL admin'ers,


Our group at OSDL have been  porting our DBT test kits to PostgreSQL.
In getting up to speed on PostgreSQL, we have not found a way to recover
from a serious database failure (disk corruption, disk/volume failure).

The following scenario described in the 7.3 docs and the "PostgreSQL
7.4devel Documentation" is exactly what we are looking for:


"WAL offers the opportunity for a new method for database on-line backup
and restore (BAR). To use this method, one would have to make periodic
saves of data files to another disk, a tape or another host and also
archive the WAL log files. The database file copy and the archived log
files could be used to restore just as if one were restoring after a
crash. Each time a new database file copy was made the old log files
could be removed. Implementing this facility will require the logging of
data file and index creation and deletion; it will also require
development of a method for copying the data files (operating system
copy commands are not suitable). "

Since it states that WAL, "offers the opportunity" for what we need, we
concluded this is _not supported at 7.3 and the 7.4 beta releases.

Is this conclusion correct?

Of course we would like to do even more - point in time recovery,
incremental backup, but for now just the basics.

Point in Time recovery is listed as "Urgent" on the TODO list,
incremental backups are listed under "Admin", so it appears that those
items are recognized as important.  What we cannot understand is why the
basic backup/restore described above is not on the TODO list.

Can anyone enlighten us?



--
Mary Edie Meredith <maryedie@osdl.org>
Open Source Development Lab

Re: Seeking information about backup/recovery

From

Bruce Momjian

Date:

04 September 2003, 16:30:49

Right.  We need point-in-time-recovery.  Someone is working on it and we
hope to have it for 7.5.

---------------------------------------------------------------------------

Mary Edie Meredith wrote:
> Dear PostgreSQL admin'ers,
>
>
> Our group at OSDL have been  porting our DBT test kits to PostgreSQL.
> In getting up to speed on PostgreSQL, we have not found a way to recover
> from a serious database failure (disk corruption, disk/volume failure).
>
> The following scenario described in the 7.3 docs and the "PostgreSQL
> 7.4devel Documentation" is exactly what we are looking for:
>
>
> "WAL offers the opportunity for a new method for database on-line backup
> and restore (BAR). To use this method, one would have to make periodic
> saves of data files to another disk, a tape or another host and also
> archive the WAL log files. The database file copy and the archived log
> files could be used to restore just as if one were restoring after a
> crash. Each time a new database file copy was made the old log files
> could be removed. Implementing this facility will require the logging of
> data file and index creation and deletion; it will also require
> development of a method for copying the data files (operating system
> copy commands are not suitable). "
>
> Since it states that WAL, "offers the opportunity" for what we need, we
> concluded this is _not supported at 7.3 and the 7.4 beta releases.
>
> Is this conclusion correct?
>
> Of course we would like to do even more - point in time recovery,
> incremental backup, but for now just the basics.
>
> Point in Time recovery is listed as "Urgent" on the TODO list,
> incremental backups are listed under "Admin", so it appears that those
> items are recognized as important.  What we cannot understand is why the
> basic backup/restore described above is not on the TODO list.
>
> Can anyone enlighten us?
>
>
>
> --
> Mary Edie Meredith <maryedie@osdl.org>
> Open Source Development Lab
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 7: don't forget to increase your free space map settings
>

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

Re: Seeking information about backup/recovery

From

Murthy Kambhampaty

Date:

04 September 2003, 18:35:42

IMHO, while point-in-time recovery would be great to have, there are many
applications that benefit from having online backup and recovery without
needing log roll-forward. For example, Oracle contrasts "Full Database
Point-in-time Recovery", the feature mentioned by Bruce, with "Tablespace
Point in Time Recovery (TSPITR)" [1]. With postgresql and WAL, you can
implement the equivalent of TSPITR. By using filesystem snapshot
capabilities provided by certain storage appliances (NetApp Filer?) or by
operating system service in Linux (Logical Volume Manager, "LVM") to "...
make periodic saves of data files to another disk, a tape or another host
and also archive the WAL log files". Once you have the copy, you can put it
on disk, start a postgresql server (call it the backup server) on it, and
then dump and restore the data to a running server ("production server") on
the same or a different host.

This procedure allows you to recover a recent copy of any database, schema
or table, depending on the frequency with which you take snapshots. By using
rysnc to copy the snapshots over to a backup disk volume you can get
incremental backup capability, which shrinks the backup window to a few
minutes (and less than an hour even for databases with multiple gigabytes
worth of INSERTs). With such a small backup window, one can minimize data
loss in case of disk failure or corruption by increasing backup frequency.

There have been several discussions of this, including here
http://marc.theaimsgroup.com/?l=postgresql-admin&w=2&r=1&s=backup+routine&q=
b
here
http://marc.theaimsgroup.com/?l=postgresql-admin&w=2&r=1&s=LVM+snapshots&q=b
and here
http://marc.theaimsgroup.com/?l=postgresql-general&m=104610149723362&w=4


Cheers,
    Murthy


[1]
http://storacle.princeton.edu:9001/oracle8-doc/server.805/a58396/ch13.htm)


>-----Original Message-----
>From: Bruce Momjian [mailto:pgman@candle.pha.pa.us]
>Sent: Thursday, September 04, 2003 14:16
>To: Mary Edie Meredith
>Cc: pgsql-admin@postgresql.org; osdldbt-general
>Subject: Re: [ADMIN] Seeking information about backup/recovery
>
>
>
>Right.  We need point-in-time-recovery.  Someone is working on
>it and we
>hope to have it for 7.5.
>
>---------------------------------------------------------------
>------------
>
>Mary Edie Meredith wrote:
>> Dear PostgreSQL admin'ers,
>>
>>
>> Our group at OSDL have been  porting our DBT test kits to
>PostgreSQL.
>> In getting up to speed on PostgreSQL, we have not found a
>way to recover
>> from a serious database failure (disk corruption,
>disk/volume failure).
>>
>> The following scenario described in the 7.3 docs and the "PostgreSQL
>> 7.4devel Documentation" is exactly what we are looking for:
>>
>>
>> "WAL offers the opportunity for a new method for database
>on-line backup
>> and restore (BAR). To use this method, one would have to
>make periodic
>> saves of data files to another disk, a tape or another host and also
>> archive the WAL log files. The database file copy and the
>archived log
>> files could be used to restore just as if one were restoring after a
>> crash. Each time a new database file copy was made the old log files
>> could be removed. Implementing this facility will require
>the logging of
>> data file and index creation and deletion; it will also require
>> development of a method for copying the data files (operating system
>> copy commands are not suitable). "
>>
>> Since it states that WAL, "offers the opportunity" for what
>we need, we
>> concluded this is _not supported at 7.3 and the 7.4 beta releases.
>>
>> Is this conclusion correct?
>>
>> Of course we would like to do even more - point in time recovery,
>> incremental backup, but for now just the basics.
>>
>> Point in Time recovery is listed as "Urgent" on the TODO list,
>> incremental backups are listed under "Admin", so it appears
>that those
>> items are recognized as important.  What we cannot
>understand is why the
>> basic backup/restore described above is not on the TODO list.
>>
>> Can anyone enlighten us?
>>
>>
>>
>> --
>> Mary Edie Meredith <maryedie@osdl.org>
>> Open Source Development Lab
>>
>>
>> ---------------------------(end of
>broadcast)---------------------------
>> TIP 7: don't forget to increase your free space map settings
>>
>
>--
>  Bruce Momjian                        |  http://candle.pha.pa.us
>  pgman@candle.pha.pa.us               |  (610) 359-1001
>  +  If your life is a hard drive,     |  13 Roberts Road
>  +  Christ can be your backup.        |  Newtown Square,
>Pennsylvania 19073
>
>---------------------------(end of
>broadcast)---------------------------
>TIP 9: the planner will ignore your desire to choose an index
>scan if your
>      joining column's datatypes do not match
>

Re: Seeking information about backup/recovery

From

Bruce Momjian

Date:

04 September 2003, 19:12:30

Murthy Kambhampaty wrote:
> IMHO, while point-in-time recovery would be great to have, there are many
> applications that benefit from having online backup and recovery without
> needing log roll-forward. For example, Oracle contrasts "Full Database
> Point-in-time Recovery", the feature mentioned by Bruce, with "Tablespace
> Point in Time Recovery (TSPITR)" [1]. With postgresql and WAL, you can
> implement the equivalent of TSPITR. By using filesystem snapshot
> capabilities provided by certain storage appliances (NetApp Filer?) or by
> operating system service in Linux (Logical Volume Manager, "LVM") to "...
> make periodic saves of data files to another disk, a tape or another host
> and also archive the WAL log files". Once you have the copy, you can put it
> on disk, start a postgresql server (call it the backup server) on it, and
> then dump and restore the data to a running server ("production server") on
> the same or a different host.
>
> This procedure allows you to recover a recent copy of any database, schema
> or table, depending on the frequency with which you take snapshots. By using
> rysnc to copy the snapshots over to a backup disk volume you can get
> incremental backup capability, which shrinks the backup window to a few
> minutes (and less than an hour even for databases with multiple gigabytes
> worth of INSERTs). With such a small backup window, one can minimize data
> loss in case of disk failure or corruption by increasing backup frequency.

I assume you are contrasting _any_ point-in-time recovery to recover up
to the crash point, right?

Anyway, unfortunately, WAL doesn't contain enough information to recover
without having the file system files in some consistent state, even if
that state is old.  In fact, the files have to be consistent as of the
last checkpoint.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

Re: Seeking information about backup/recovery

From

Murthy Kambhampaty

Date:

04 September 2003, 20:30:01

On Thursday, September 04, 2003 18:12, Bruce Momjian
[mailto:pgman@candle.pha.pa.us]
>Murthy Kambhampaty wrote:
...
>I assume you are contrasting _any_ point-in-time recovery to recover up
>to the crash point, right?
>
Right.

>Anyway, unfortunately, WAL doesn't contain enough information
>to recover
>without having the file system files in some consistent state, even if
>that state is old.  In fact, the files have to be consistent as of the
>last checkpoint.
This, I'm not so sure of. On Linux, an xfs_freeze -f <fs_to_freeze> pauses
the filesystem and flushes all writes to disk; a snapshot of $PGDATA's
filesystem taken while it is frozen gives a consistent copy of $PGDATA as of
that instant (similar functionality is obtained by compiling the VFS locking
patch into the kernel).

The discussion at
http://marc.theaimsgroup.com/?l=postgresql-admin&w=2&r=1&s=LVM+snapshots&q=b
, includes log files from postmaster startup and shutdown on the backup
$PGDATA, and AFAICT, WAL recovery does not roll back to the last checkpoint.
If there is a better way to test this, let me know, and I'm happy to do it.

Thanks,
    Murthy

PS: From the man page for xfs_freeze:

"The -f flag requests the specified XFS filesystem to be frozen from new
modifications.   When this is selected, all ongoing transactions in the
filesystem are allowed to complete, new write system calls are  halted,
other calls which modify the filesystem are halted, and all dirty data,
metadata, and  log  information  are  written  to  disk.   Any  process
attempting to write to the frozen filesystem will block waiting for the
filesystem to be unfrozen."

When $PGDATA/pg_xlog/ is on disks different from $PGDATA's,
the XFS filesystem still allows online BAR with the following sequence:

1. rysnc -avr --delete $PGDATA/ <backup server>::mirror_pgdata/
2. xfs_freeze -f $PGDATA/pg_clog/
3. xfs_freeze -f $PGDATA
4. create snapshots and mount
5. xfs_freeze -f $PGDATA
6. xfs_freeze -f $PGDATA/pg_clog/
7. rysnc -avr --delete --exclude=pg_xlog/ $PGDATA/ <backup
server>::mirror_pgdata/pg_xlog/
8. rysnc -avr --delete $PGDATA/pg_xlog/ $PGDATA/ <backup
server>::mirror_pgdata/pg_xlog/
9. remove snapshots

By freezing both volumes ($PGDATA/pg_clog/ and $PGDATA/) during snapshot
creation,
a "consistent" copy is assured. Freezing the pg_xlog first, and unfreezing
it last, makes sure
that no CHECKPOINT operations are missed, ensuring the consistency of the
copy.

Re: Seeking information about backup/recovery

From

Bruce Momjian

Date:

04 September 2003, 21:33:03

Murthy Kambhampaty wrote:
> >Anyway, unfortunately, WAL doesn't contain enough information
> >to recover
> >without having the file system files in some consistent state, even if
> >that state is old.  In fact, the files have to be consistent as of the
> >last checkpoint.
> This, I'm not so sure of. On Linux, an xfs_freeze -f <fs_to_freeze> pauses
> the filesystem and flushes all writes to disk; a snapshot of $PGDATA's
> filesystem taken while it is frozen gives a consistent copy of $PGDATA as of
> that instant (similar functionality is obtained by compiling the VFS locking
> patch into the kernel).
>
> The discussion at
> http://marc.theaimsgroup.com/?l=postgresql-admin&w=2&r=1&s=LVM+snapshots&q=b
> , includes log files from postmaster startup and shutdown on the backup
> $PGDATA, and AFAICT, WAL recovery does not roll back to the last checkpoint.
> If there is a better way to test this, let me know, and I'm happy to do it.

Yes, this has been discussed before.  If you snapshot xfs, you can then
restore using that snapshot.  Of course, you will be recovering from WAL
as though there was a crash, but that works fine.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

Re: [osdldbt-general] Re: Seeking information about

From

Mary Edie Meredith

Date:

06 September 2003, 15:33:35

On Thu, 2003-09-04 at 22:06, Justin Clift wrote:
> Hi Mary,
>
> Are you talking about something more "realtime" than pg_dumpall?

Yes, definitely.
>
> You may be wondering "why the heck haven't they added this before?"...
> just imagine this scenario... we *very rarely* encounter database
> failures that aren't hardware related, and pretty much any decent admin
> has a good backup schedule in place.

True.

I guess I'm trying to understand what is the "good backup schedule" and
procedure in the case of PostgreSQL.

Let's say I cannot afford to lose more than one day of updates.  It
appears that I need to backup nightly the entire database.  Whether you
use either pg_dumpall or you backup by simply coping the files,  this
approach does not scale for enterprise size databases.  For the copy
scheme the database has to be down, effecting availability.  The restore
for pg_dumpall would be longer than for the copy scheme since the
indexes have to be reconstructed.  An enterprise database system would
have lots equipment to avoid single points of failure, but they still
happen, human mistakes happen (oops I deleted the table!) , etc, etc, so
you still have to do the backups.

If the loss of a day's work is too costly, you will be faced with
backing up even more frequently.
>
> PITR is definitely a needed thing though, so that we can support those
> businesses where every bit of data can't afford to be lost after the
> COMMIT.  There are other solutions presently (replication, various
> hardware things, etc), but PITR is the best move forwards.

Point in time recovery will certainly solve the issue.  However, PITR is
really more than I was asking to get.  I was just asking for a way to
recover to the last commit (RTTLC).

In this scenario (I don't expect this to be big news to anyone, I'm just
explaining what I was hoping to find) there is one full backup made
occasionally.  The updates from that point are saved in an archived
log.  This is usually done with the database up.  In a failure
situation, you have to restore from the full backup and re-run the
updates from the archived logs.  You have to run all of them to get a
consistent database back.   The frequency of full backups is determined
by how long you are willing to wait to do a restore.

PITR is more precise in that you can run the archived logs up to the
point that you want (like before you deleted that table).

So recovering to a specific point in time is still _very desirable and
totally superior, I am just used to seeing RTTLC appear first as
features evolve in database development and was expecting to see it
called out separately in the TODO list.

My conclusion from the responses so far is that the community is
planning to deliver RTTLC as part of PITR.

BTW, thanks to all for your patient and kind responses.  You can't
imagine how _much it is appreciated.

>
> :-)
>
> Regards and best wishes,
>
> Justin Clift
>
>
> Bruce Momjian wrote:
>
> > Right.  We need point-in-time-recovery.  Someone is working on it and we
> > hope to have it for 7.5.
> >
> > ---------------------------------------------------------------------------
> >
> > Mary Edie Meredith wrote:
> >
> >>Dear PostgreSQL admin'ers,
> >>
> >>
> >>Our group at OSDL have been  porting our DBT test kits to PostgreSQL.
> >>In getting up to speed on PostgreSQL, we have not found a way to recover
> >>from a serious database failure (disk corruption, disk/volume failure).
> >>
> >>The following scenario described in the 7.3 docs and the "PostgreSQL
> >>7.4devel Documentation" is exactly what we are looking for:
> >>
> >>
> >>"WAL offers the opportunity for a new method for database on-line backup
> >>and restore (BAR). To use this method, one would have to make periodic
> >>saves of data files to another disk, a tape or another host and also
> >>archive the WAL log files. The database file copy and the archived log
> >>files could be used to restore just as if one were restoring after a
> >>crash. Each time a new database file copy was made the old log files
> >>could be removed. Implementing this facility will require the logging of
> >>data file and index creation and deletion; it will also require
> >>development of a method for copying the data files (operating system
> >>copy commands are not suitable). "
> >>
> >>Since it states that WAL, "offers the opportunity" for what we need, we
> >>concluded this is _not supported at 7.3 and the 7.4 beta releases.
> >>
> >>Is this conclusion correct?
> >>
> >>Of course we would like to do even more - point in time recovery,
> >>incremental backup, but for now just the basics.
> >>
> >>Point in Time recovery is listed as "Urgent" on the TODO list,
> >>incremental backups are listed under "Admin", so it appears that those
> >>items are recognized as important.  What we cannot understand is why the
> >>basic backup/restore described above is not on the TODO list.
> >>
> >>Can anyone enlighten us?
> >>
> >>
> >>
> >>--
> >>Mary Edie Meredith <maryedie@osdl.org>
> >>Open Source Development Lab
> >>
> >>
> >>---------------------------(end of broadcast)---------------------------
> >>TIP 7: don't forget to increase your free space map settings
> >>
> >
> >
>
>
>
>
> -------------------------------------------------------
> This sf.net email is sponsored by:ThinkGeek
> Welcome to geek heaven.
> http://thinkgeek.com/sf
> _______________________________________________
> osdldbt-general mailing list
> osdldbt-general@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/osdldbt-general
--
Mary Edie Meredith <maryedie@osdl.org>
Open Source Development Lab

Re: [osdldbt-general] Re: Seeking information about

From

Tom Lane

Date:

06 September 2003, 22:02:47

Mary Edie Meredith <maryedie@osdl.org> writes:
> My conclusion from the responses so far is that the community is
> planning to deliver RTTLC as part of PITR.

Right, exactly.  Given the WAL code we already have, the next step is
to add management facilities that let you archive and replay WAL logs,
instead of just discarding them after the next checkpoint.  AFAICS,
there's really no significant increase in complexity to support recovery
to a past time point as well as recovery to the end of the log.

            regards, tom lane