Thread: Incremental backup

Incremental backup

From
Martin Marques
Date:
How's this issue going on the 7.4 development tree?
I saw it on the TODO list, but didn't find much on the archives of this
mailing list.

--
Porqué usar una base de datos relacional cualquiera,
si podés usar PostgreSQL?
-----------------------------------------------------------------
Martín Marqués                  |        mmarques@unl.edu.ar
Programador, Administrador, DBA |       Centro de Telematica                      Universidad Nacional
        del Litoral 
-----------------------------------------------------------------



Re: Incremental backup

From
Bruce Momjian
Date:
Someone at Red Hat is working on point-in-time recovery, also known as
incremental backups.  It will be in 7.4.

---------------------------------------------------------------------------

Martin Marques wrote:
> How's this issue going on the 7.4 development tree?
> I saw it on the TODO list, but didn't find much on the archives of this 
> mailing list.
> 
> -- 
> Porqu? usar una base de datos relacional cualquiera,
> si pod?s usar PostgreSQL?
> -----------------------------------------------------------------
> Mart?n Marqu?s                  |        mmarques@unl.edu.ar
> Programador, Administrador, DBA |       Centro de Telematica
>                        Universidad Nacional
>                             del Litoral
> -----------------------------------------------------------------
> 
> 
> ---------------------------(end of broadcast)---------------------------
> TIP 4: Don't 'kill -9' the postmaster
> 

--  Bruce Momjian                        |  http://candle.pha.pa.us pgman@candle.pha.pa.us               |  (610)
359-1001+  If your life is a hard drive,     |  13 Roberts Road +  Christ can be your backup.        |  Newtown Square,
Pennsylvania19073
 


Re: Incremental backup

From
"Christopher Kings-Lynne"
Date:
> Someone at Red Hat is working on point-in-time recovery, also known as
> incremental backups.  It will be in 7.4.

Does that mean that the poor guy/gal is implementing redo for all the index
types?

Chris



Re: Incremental backup

From
Bruce Momjian
Date:
Christopher Kings-Lynne wrote:
> > Someone at Red Hat is working on point-in-time recovery, also known as
> > incremental backups.  It will be in 7.4.
> 
> Does that mean that the poor guy/gal is implementing redo for all the index
> types?

No idea.

--  Bruce Momjian                        |  http://candle.pha.pa.us pgman@candle.pha.pa.us               |  (610)
359-1001+  If your life is a hard drive,     |  13 Roberts Road +  Christ can be your backup.        |  Newtown Square,
Pennsylvania19073
 


Re: Incremental backup

From
Patrick Macdonald
Date:
Bruce Momjian wrote:
> 
> Someone at Red Hat is working on point-in-time recovery, also known as
> incremental backups. 

PITR and incremental backup are different beasts.  PITR deals with a backup
+ logs.  Incremental backup deals with a full backup + X smaller/incremental
backups.

So... it doesn't look like anyone is working on incremental backup at the 
moment.

Cheers,
Patrick

> It will be in 7.4.
> 
> ---------------------------------------------------------------------------
> 
> Martin Marques wrote:
> > How's this issue going on the 7.4 development tree?
> > I saw it on the TODO list, but didn't find much on the archives of this
> > mailing list.
> >
> > --
> > Porqu? usar una base de datos relacional cualquiera,
> > si pod?s usar PostgreSQL?
> > -----------------------------------------------------------------
> > Mart?n Marqu?s                  |        mmarques@unl.edu.ar
> > Programador, Administrador, DBA |       Centro de Telematica
> >                        Universidad Nacional
> >                             del Litoral
> > -----------------------------------------------------------------
> >
> >
> > ---------------------------(end of broadcast)---------------------------
> > TIP 4: Don't 'kill -9' the postmaster
> >
> 
> --
>   Bruce Momjian                        |  http://candle.pha.pa.us
>   pgman@candle.pha.pa.us               |  (610) 359-1001
>   +  If your life is a hard drive,     |  13 Roberts Road
>   +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
> 
> ---------------------------(end of broadcast)---------------------------
> TIP 4: Don't 'kill -9' the postmaster


Re: Incremental backup

From
Bruce Momjian
Date:
Patrick Macdonald wrote:
> Bruce Momjian wrote:
> > 
> > Someone at Red Hat is working on point-in-time recovery, also known as
> > incremental backups. 
> 
> PITR and incremental backup are different beasts.  PITR deals with a backup
> + logs.  Incremental backup deals with a full backup + X smaller/incremental
> backups.
> 
> So... it doesn't look like anyone is working on incremental backup at the 
> moment.

But why would someone want incremental backups compared to PITR?  The
backup would be mixture of INSERTS, UPDATES, and DELETES, right?  Seems
pretty weird.  :-)

--  Bruce Momjian                        |  http://candle.pha.pa.us pgman@candle.pha.pa.us               |  (610)
359-1001+  If your life is a hard drive,     |  13 Roberts Road +  Christ can be your backup.        |  Newtown Square,
Pennsylvania19073
 


Re: Incremental backup

From
Patrick Macdonald
Date:
Bruce Momjian wrote:
> 
> Patrick Macdonald wrote:
> > Bruce Momjian wrote:
> > >
> > > Someone at Red Hat is working on point-in-time recovery, also known as
> > > incremental backups.
> >
> > PITR and incremental backup are different beasts.  PITR deals with a backup
> > + logs.  Incremental backup deals with a full backup + X smaller/incremental
> > backups.
> >
> > So... it doesn't look like anyone is working on incremental backup at the
> > moment.
> 
> But why would someone want incremental backups compared to PITR?  The
> backup would be mixture of INSERTS, UPDATES, and DELETES, right?  Seems
> pretty weird.  :-)

Yeah, it's a different method of producing a similar outcome.  However, many
companies do not want to be concerned with the management (and space)
of archived logs.  Incremental backup allows them the option of performing 
a full backup and then only backing up the modifications on a regular basis.
When it's time to restore, they'll restore the full backup and then the 
proper sequence of incremental backups. 

Cheers,
Patrick


Re: Incremental backup

From
Bruce Momjian
Date:
Patrick Macdonald wrote:
> > But why would someone want incremental backups compared to PITR?  The
> > backup would be mixture of INSERTS, UPDATES, and DELETES, right?  Seems
> > pretty weird.  :-)
> 
> Yeah, it's a different method of producing a similar outcome.  However, many
> companies do not want to be concerned with the management (and space)
> of archived logs.  Incremental backup allows them the option of performing 
> a full backup and then only backing up the modifications on a regular basis.
> When it's time to restore, they'll restore the full backup and then the 
> proper sequence of incremental backups. 

Wow, I never even thought that was possible.  Do other db's support that
feature?

--  Bruce Momjian                        |  http://candle.pha.pa.us pgman@candle.pha.pa.us               |  (610)
359-1001+  If your life is a hard drive,     |  13 Roberts Road +  Christ can be your backup.        |  Newtown Square,
Pennsylvania19073
 


Re: Incremental backup

From
Rod Taylor
Date:
> Wow, I never even thought that was possible.  Do other db's support that
> feature?

Isn't that basically what the current replication kits for Postgresql do
-- via triggers and log tables?

--
Rod Taylor <rbt@rbt.ca>

PGP Key: http://www.rbt.ca/rbtpub.asc

Re: Incremental backup

From
Patrick Macdonald
Date:
Bruce Momjian wrote:
> 
> Patrick Macdonald wrote:
>
> > Yeah, it's a different method of producing a similar outcome.  However, many
> > companies do not want to be concerned with the management (and space)
> > of archived logs.  Incremental backup allows them the option of performing
> > a full backup and then only backing up the modifications on a regular basis.
> > When it's time to restore, they'll restore the full backup and then the
> > proper sequence of incremental backups.
> 
> Wow, I never even thought that was possible.  Do other db's support that
> feature?

I know Oracle and DB2 have incremental backup in their arsenal (and iirc,
SQL Server has something called "differential backup").  Whatever the name,
it's a win at the enterprise level. 

Cheers,
Patrick


Re: Incremental backup

From
Manfred Koizar
Date:
On Thu, 13 Feb 2003 19:24:13 -0500, Patrick Macdonald
<patrickm@redhat.com> wrote:
>I know Oracle and DB2 have incremental backup in their arsenal (and iirc,
>SQL Server has something called "differential backup").  Whatever the name,
>it's a win at the enterprise level. 

"A differential backup copies only the database pages that have been
modified after the last full database backup."

This could be doable using XLogRecPtr pd_lsn in the page headers, but
I don't see an easy way to do it on a live database.

ServusManfred


Re: Incremental backup

From
Martin Marques
Date:
On Jue 13 Feb 2003 16:38, Bruce Momjian wrote:
> Patrick Macdonald wrote:
> > Bruce Momjian wrote:
> > > Someone at Red Hat is working on point-in-time recovery, also known as
> > > incremental backups.
> >
> > PITR and incremental backup are different beasts.  PITR deals with a
> > backup + logs.  Incremental backup deals with a full backup + X
> > smaller/incremental backups.
> >
> > So... it doesn't look like anyone is working on incremental backup at the
> > moment.
>
> But why would someone want incremental backups compared to PITR?  The
> backup would be mixture of INSERTS, UPDATES, and DELETES, right?  Seems
> pretty weird.  :-)

Good backup systems, such as Informix (it's the one I used) doesn't do a query
backup, but a pages backup. What I mean is that it looks for pages in the
system that has changed from the las full backup and backs them up.

That's how an incremental backup works. PITR is another thing, which is even
more important. :-)

--
Porqué usar una base de datos relacional cualquiera,
si podés usar PostgreSQL?
-----------------------------------------------------------------
Martín Marqués                  |        mmarques@unl.edu.ar
Programador, Administrador, DBA |       Centro de Telematica                      Universidad Nacional
        del Litoral 
-----------------------------------------------------------------



Re: Incremental backup

From
Bruce Momjian
Date:
OK, once we have PITR, will anyone want incremental backups?

---------------------------------------------------------------------------

Martin Marques wrote:
> On Jue 13 Feb 2003 16:38, Bruce Momjian wrote:
> > Patrick Macdonald wrote:
> > > Bruce Momjian wrote:
> > > > Someone at Red Hat is working on point-in-time recovery, also known as
> > > > incremental backups.
> > >
> > > PITR and incremental backup are different beasts.  PITR deals with a
> > > backup + logs.  Incremental backup deals with a full backup + X
> > > smaller/incremental backups.
> > >
> > > So... it doesn't look like anyone is working on incremental backup at the
> > > moment.
> >
> > But why would someone want incremental backups compared to PITR?  The
> > backup would be mixture of INSERTS, UPDATES, and DELETES, right?  Seems
> > pretty weird.  :-)
> 
> Good backup systems, such as Informix (it's the one I used) doesn't do a query 
> backup, but a pages backup. What I mean is that it looks for pages in the 
> system that has changed from the las full backup and backs them up.
> 
> That's how an incremental backup works. PITR is another thing, which is even 
> more important. :-)
> 
> -- 
> Porqu? usar una base de datos relacional cualquiera,
> si pod?s usar PostgreSQL?
> -----------------------------------------------------------------
> Mart?n Marqu?s                  |        mmarques@unl.edu.ar
> Programador, Administrador, DBA |       Centro de Telematica
>                        Universidad Nacional
>                             del Litoral
> -----------------------------------------------------------------
> 
> 
> ---------------------------(end of broadcast)---------------------------
> TIP 5: Have you checked our extensive FAQ?
> 
> http://www.postgresql.org/users-lounge/docs/faq.html
> 

--  Bruce Momjian                        |  http://candle.pha.pa.us pgman@candle.pha.pa.us               |  (610)
359-1001+  If your life is a hard drive,     |  13 Roberts Road +  Christ can be your backup.        |  Newtown Square,
Pennsylvania19073
 


Re: Incremental backup

From
Martin Marques
Date:
On Vie 14 Feb 2003 09:52, Bruce Momjian wrote:
> OK, once we have PITR, will anyone want incremental backups?

I will probably not need it, but I know of people how have databases which
build dumps of more then 20GB.
They are interested in live incremental backups.

--
Porqué usar una base de datos relacional cualquiera,
si podés usar PostgreSQL?
-----------------------------------------------------------------
Martín Marqués                  |        mmarques@unl.edu.ar
Programador, Administrador, DBA |       Centro de Telematica                      Universidad Nacional
        del Litoral 
-----------------------------------------------------------------



Re: Incremental backup

From
Greg Copeland
Date:
On Fri, 2003-02-14 at 06:52, Bruce Momjian wrote:
> OK, once we have PITR, will anyone want incremental backups?
> 
> ---------------------------------------------------------------------------
> 
> Martin Marques wrote:
> > On Jue 13 Feb 2003 16:38, Bruce Momjian wrote:
> > > Patrick Macdonald wrote:
> > > > Bruce Momjian wrote:
> > > > > Someone at Red Hat is working on point-in-time recovery, also known as
> > > > > incremental backups.
> > > >
> > > > PITR and incremental backup are different beasts.  PITR deals with a
> > > > backup + logs.  Incremental backup deals with a full backup + X
> > > > smaller/incremental backups.
> > > >
> > > > So... it doesn't look like anyone is working on incremental backup at the
> > > > moment.
> > >
> > > But why would someone want incremental backups compared to PITR?  The
> > > backup would be mixture of INSERTS, UPDATES, and DELETES, right?  Seems
> > > pretty weird.  :-)
> > 
> > Good backup systems, such as Informix (it's the one I used) doesn't do a query 
> > backup, but a pages backup. What I mean is that it looks for pages in the 
> > system that has changed from the las full backup and backs them up.
> > 
> > That's how an incremental backup works. PITR is another thing, which is even 
> > more important. :-)

I do imagine for some people it will register high on their list.

-- 
Greg Copeland <greg@copelandconsulting.net>
Copeland Computer Consulting



Re: Incremental backup

From
Kevin Brown
Date:
Bruce Momjian wrote:
> 
> OK, once we have PITR, will anyone want incremental backups?

None of my database references (Date's "Introduction to Database
Systems" and Garcia-Molina's "Database Systems - The Complete Book",
in particular) seem to talk about PITR at all.  At least, there's no
index entry for it.  And a google search for "point in time recovery"
yields mostly marketing fluff.

Is there a good reference for this that someone can point me to?  I'm
interested in exactly how it'll work, especially in terms of how logs
are stored versus the main data store, effects on performance, etc.

Thanks, and sorry for the newbie question.  :-(


-- 
Kevin Brown                          kevin@sysexperts.com


Re: Incremental backup

From
Christopher Browne
Date:
A long time ago, in a galaxy far, far away, kevin@sysexperts.com (Kevin Brown) wrote:
> Bruce Momjian wrote:
>> 
>> OK, once we have PITR, will anyone want incremental backups?
>
> None of my database references (Date's "Introduction to Database
> Systems" and Garcia-Molina's "Database Systems - The Complete Book",
> in particular) seem to talk about PITR at all.  At least, there's no
> index entry for it.  And a google search for "point in time recovery"
> yields mostly marketing fluff.

Well, from an "academic DBMS" standpoint, it isn't terribly
interesting, since it involves assumptions of messy imperfection that
academics prefer to avoid.  

And that's not intended to insult the academics; it is often
reasonable to leave that "out of scope" much as an academic OS
researcher might prefer to try to avoid putting attention on things
like binary linkers, text file editors, and SCM systems like CVS,
which, while terribly important from a practical standpoint, don't
make for interesting OS research.

> Is there a good reference for this that someone can point me to?
> I'm interested in exactly how it'll work, especially in terms of how
> logs are stored versus the main data store, effects on performance,
> etc.
>
> Thanks, and sorry for the newbie question.  :-(

Unfortunately, the best sources I can think of are in the "O-Word"
literature, and the /practical/ answers require digging into really
messy bits of the documentation.

What it amounts to is that anyone that isn't a "near-O*****-guru" would be
strongly advised not to engage in PITR activity.  

It doesn't surprise me overly that the documentation is poor: those
that can't figure it out despite the challenges almost surely
shouldn't be using the functionality...

What PITR generally consists of is the notion that you want to recover
to the state at a particular moment in time.

In O*****-nomenclature, this means that you recover as at some earlier
moment for which you have a good backup, and then re-apply changes,
which in their terms, are kept in "archive logs," which are somewhat
analagous to WAL files.
-- 
If this was helpful, <http://svcs.affero.net/rm.php?r=cbbrowne> rate me
http://www3.sympatico.ca/cbbrowne/x.html
"We blew it -- too big, too slow..." - Bill Gates talking about NT, as
noted by Steven McGeady of Intel during a meeting with Gates


Re: Incremental backup

From
Kevin Brown
Date:
Christopher Browne wrote:
> What PITR generally consists of is the notion that you want to recover
> to the state at a particular moment in time.
> 
> In O*****-nomenclature, this means that you recover as at some earlier
> moment for which you have a good backup, and then re-apply changes,
> which in their terms, are kept in "archive logs," which are somewhat
> analagous to WAL files.

Yeah, that's pretty much what I figured.

Oracle has something they call "rollback segments" which I assume are
separate bits of data that have enough information to reverse changes
that were made to the database during a transaction, and I figured
PITR would (or could) apply particular saved rollback segments to the
current state in order to "roll back" a table, tablespace, or database
to the state it was in at a particular point in time.

As it is, it sounds like PITR is a bit less refined than I expected.


So the relevant question is: how is *our* PITR going to work?  In
particular, how is it going to interact with our WAL files and the
table store?  If I'm not mistaken, right now (well, as of 7.2 anyway)
we round robin through a fixed set of WAL files.  For PITR, I assume
we'd need an archivelog function that would copy the WAL files as
they're checkpointed to some other location (with destination names
that reflect their order in time), just for starters.


It'd be *awfully* nice if you could issue a command to roll a table
(or, perhaps, a tablespace, if you've got a bunch of foreign keys and
such) back to a particular point in time, from the command line, with
no significant advance preparation (so long as the required files are
still around, and if they're not then abort the operation with the
appropriate error message).  But it doesn't sound like that's what
we're talking about when we talk about PITR...



I wouldn't expect the O***** docs to be particularly revealing about
how the database manages PITR at the file level, but if it does, would
you happen to know where so I can look at it?  What I've seen so far
is very basic and not very revealing at all...



-- 
Kevin Brown                          kevin@sysexperts.com


Re: Incremental backup

From
Curt Sampson
Date:
On Fri, 14 Feb 2003, Bruce Momjian wrote:

> OK, once we have PITR, will anyone want incremental backups?

Well, I'm not entirely clear on how PITR will work, so I may be off-base
here, but it seems to me that offering incremental backups that back
up only changed pages might not be all that big a win, given how
postgres writes its pages. On DBMSs that don't use MVCC, if you change a
particular item in a row ten times, one page is changed. If you do it in
postgres, you could well be changing ten pages, as the system writes the
two copies of the entire row wherever it can find space. So in databases
where a lot of rows are changed, where an incremental backup would
normally be a win because it would be much smaller than the logs over a
given period, it isn't going to be with postgres.

But you know, if we could get rid of redundant changes in the logs we're
using for backup, that could save a lot of space in a situation like
the one I described above. If a particular row and column is changed
fifty times over the course of a month, it's going to be recorded fifty
times in the log. But there's really no need for all fifty of those,
if you don't mind not being able to restore to any time before the
current time. You can reduce the size of the logs you need to store
for backup by throwing away the first forty-nine of those changes, and
keeping only the most recent version. There shouldn't be any worries
about referential integrity, because when you do a restore, you start
with a full backup that is ok, and once you've successfully applied
all the transactions in the log, you know it will be ok again, so any
intermediate states during the restore where integrity is not maintained
are not a problem.

cjs
-- 
Curt Sampson  <cjs@cynic.net>   +81 90 7737 2974   http://www.netbsd.org   Don't you know, in this new Dark Age, we're
alllight.  --XTC
 


Re: Incremental backup

From
Tom Lane
Date:
Curt Sampson <cjs@cynic.net> writes:
> ... But there's really no need for all fifty of those,
> if you don't mind not being able to restore to any time before the
> current time.

Which, of course, is exactly the point of PITR designs.

When you know that your assistant trainee DBA deleted most of your
database with a mistyped command last Tuesday evening around 8pm,
it is cold comfort to know that your database has faithfully preserved
his committed changes.  You want to get back to where you were Tuesday
afternoon, or preferably Tuesday evening 7:59pm.  This is what PITR
setups can do for you.

If you don't feel you need PITR capability, fine ... but don't tell
the people who want it that they have no need for it.
        regards, tom lane


Re: Incremental backup

From
Curt Sampson
Date:
On Fri, 14 Feb 2003, Kevin Brown wrote:

> Oracle has something they call "rollback segments" which I assume are
> separate bits of data that have enough information to reverse changes
> that were made to the database during a transaction, and I figured
> PITR would (or could) apply particular saved rollback segments to the
> current state in order to "roll back" a table, tablespace, or database
> to the state it was in at a particular point in time.

You're right about what rollback segments are; they hold the information
about previous versions of a row, so that if a transaction is rolled
back, the previous data can be restored. (Postgres doesn't need this,
since it doesn't update the rows in place: the old copy of the row stays
as and where it is, and a new copy is made with the new data. [The old
copy can later be removed, after no transactions reference it any more,
with a VACUUM.])

Oracle9i has a new feature called a "flashback query," which uses the
information in the rollback segements to let you query the database
in a previous state. (I.e., "select such and such from this table as
of two hours ago.") Postgres could do this using the older copies of
rows as well, though the performance often wouldn't be pretty, since
your indexes become useless, I believe. (Don't they point to only the
latest copy of a row?) Still, it would be cool and life-saving in some
situations.

But yeah, PITR takes a snapshot and goes in a forward direction, not a
backwards one. This is just what Oracle does, too, using the redo logs.

> For PITR, I assume we'd need an archivelog function that would
> copy the WAL files as they're checkpointed to some other location
> (with destination names that reflect their order in time), just for
> starters.

Well, I'm not sure that you'd really need to have any special archiving
facilities: you just need to let the old files sit there, and keep
creating new ones. You do lose a bit of performance in that you can't
recycle log segments, but you could always just fork off a (well niced)
process to create a new, empty log segment at the time you start in on
the last pre-created one, so that you will have another pre-created one
ready when you finish the current one.

BTW, why exactly do we pre-create log segments, anyway? I see this comment
in backend/access/transam/xlog.c:
   /*    * Zero-fill the file.  We have to do this the hard way to ensure that    * all the file space has really been
allocated--- on platforms that    * allow "holes" in files, just seeking to the end doesn't allocate    * intermediate
space. This way, we know that we have all the space    * and (after the fsync below) that all the indirect blocks are
down   * on disk.  Therefore, fdatasync(2) or O_DSYNC will be sufficient to    * sync future writes to the log file.
*/

This seems to imply to me that fdatasync will, when synchronizing the
data blocks of a file, not necessarially synchronize the indirect
blocks, which seems a little...odd. It's not like there's much point
in writing the data to the disk if you can't get to it. I'd understood
fdatasync as just a way to avoid updaing the inode's last changed date.
Are there OSes that implement fdatasync in a way that you could still
lose data?

> It'd be *awfully* nice if you could issue a command to roll a table
> (or, perhaps, a tablespace, if you've got a bunch of foreign keys and
> such) back to a particular point in time, from the command line, with
> no significant advance preparation (so long as the required files are
> still around, and if they're not then abort the operation with the
> appropriate error message).  But it doesn't sound like that's what
> we're talking about when we talk about PITR...

I don't think most people are thinking of that when they think of PITR;
I think they're thinking of applying changes from a log to a previous
version of a database.

And you can't do such a rollback at all, except on an entire database,
because of the potential integrity violations. The best you could do
would be to generate SQL for the changes you'd need to get back to the
previous point, and see if you can execute these changes.


cjs
-- 
Curt Sampson  <cjs@cynic.net>   +81 90 7737 2974   http://www.netbsd.org   Don't you know, in this new Dark Age, we're
alllight.  --XTC
 


Re: Incremental backup

From
Curt Sampson
Date:
On Sat, 15 Feb 2003, Tom Lane wrote:

> Curt Sampson <cjs@cynic.net> writes:
> > ... But there's really no need for all fifty of those,
> > if you don't mind not being able to restore to any time before the
> > current time.
>
> Which, of course, is exactly the point of PITR designs.
>
> When you know that your assistant trainee DBA deleted most of your
> database with a mistyped command last Tuesday evening around 8pm,
> it is cold comfort to know that your database has faithfully preserved
> his committed changes.  You want to get back to where you were Tuesday
> afternoon, or preferably Tuesday evening 7:59pm.  This is what PITR
> setups can do for you.
>
> If you don't feel you need PITR capability, fine ... but don't tell
> the people who want it that they have no need for it.

Hey, I never said you *have* to do this compression!

I envisioned it as an option. I'd like, for example, to be able to aim a
program that the last eight weeks worth of log files and say, "compress
the first seven weeks of this, but leave the last week fully intact."
Then I can save some space (quite a lot, if my updates have certain
characteristics), and yet still get back to Tuesday evening at 7:59 p.m.

cjs
-- 
Curt Sampson  <cjs@cynic.net>   +81 90 7737 2974   http://www.netbsd.org   Don't you know, in this new Dark Age, we're
alllight.  --XTC
 


Re: Incremental backup

From
Tom Lane
Date:
Curt Sampson <cjs@cynic.net> writes:
> Oracle9i has a new feature called a "flashback query," which uses the
> information in the rollback segements to let you query the database
> in a previous state. (I.e., "select such and such from this table as
> of two hours ago.") Postgres could do this using the older copies of
> rows as well,

Yeah, good ol' time-travel.  This was built into Postgres in Berkeley
days, and later ripped out for performance reasons.

> though the performance often wouldn't be pretty, since
> your indexes become useless, I believe. (Don't they point to only the
> latest copy of a row?)

No.  If they did, they'd not work under MVCC.

> BTW, why exactly do we pre-create log segments, anyway?

Partly because the logic is designed to work in the PITR case, but
mostly because we don't want to suffer an out-of-disk-space condition
while we are in the midst of using a log segment.

> This seems to imply to me that fdatasync will, when synchronizing the
> data blocks of a file, not necessarially synchronize the indirect
> blocks, which seems a little...odd.

The comment is being paranoid about whether fdatasync is correctly
implemented everywhere.  Whether you consider this worry justified or
not is academic, since the possible out-of-disk-space failure mode is
reason enough to do it anyway.
        regards, tom lane


Re: Incremental backup

From
Marc Munro
Date:
Kevin,
Hi.  I was looking into PITR for PostgreSQL myself about a year back but
life intervened.  I am an Oracle DBA so may be able to help you with an
understanding of how Oracle does this.

You wrote:
> Oracle has something they call "rollback segments" which I assume are
> separate bits of data that have enough information to reverse changes
> that were made to the database during a transaction, and I figured
> PITR would (or could) apply particular saved rollback segments to the
> current state in order to "roll back" a table, tablespace, or database
> to the state it was in at a particular point in time.
> 
> As it is, it sounds like PITR is a bit less refined than I expected.

Actually Oracle uses its redo logs (like PostgreSQL' WAL) and
archivelogs (copies of old redo logs) rather than rollback segments.

The way it works, more or less, is that you start from a hot backup, and
roll-forward using archivelogs and redo logs to the point in time to
which you wish to recover.  The whole point of this is not so much that
we can restore our database to some point in the past, as that we can
recover to just before some disaster struck, from a starting point of a
previous hot backup.  In fact, the whole PITR thing for Oracle seems to
be simply an enabling technology for hot backups.  This seems to me to
be one of the killer enterprise features that PostgreSQL currently
lacks.

> So the relevant question is: how is *our* PITR going to work?  In
> particular, how is it going to interact with our WAL files and the
> table store?  If I'm not mistaken, right now (well, as of 7.2 anyway)
> we round robin through a fixed set of WAL files.  For PITR, I assume
> we'd need an archivelog function that would copy the WAL files as
> they're checkpointed to some other location (with destination names
> that reflect their order in time), just for starters.

I believe that three things are needed:
1) filesystem-based hot backups.  It may be possible to simply copy the
database files even though they would be open.  Oracle places each
tablespace into a hot-backup mode prior to taking the copy. In this
mode, updates to the files are queued-up to be applied once the
tablespace is taken out of hot-backup mode.  This all seems quite tricky
and will slow the database down.

2) Auto-archiving of WAL files.  Just as you suggest.

3) A recovery controller that can figure out the state of the restored
database, and manage the reapplication of archived and current WAL
files.
> 
> It'd be *awfully* nice if you could issue a command to roll a table
> (or, perhaps, a tablespace, if you've got a bunch of foreign keys and
> such) back to a particular point in time, from the command line, with
> no significant advance preparation (so long as the required files are
> still around, and if they're not then abort the operation with the
> appropriate error message).  But it doesn't sound like that's what
> we're talking about when we talk about PITR...

Nice but lots of work for questionable benefit.  If your application
needs to keep track of history there are better ways to do it.  If not,
then PITR should be seen as an exceptional circumstance and so need not
be so easy to manage.
> I wouldn't expect the O***** docs to be particularly revealing about
> how the database manages PITR at the file level, but if it does, would
> you happen to know where so I can look at it?  What I've seen so far
> is very basic and not very revealing at all...

The best Oracle documentation on this is available through Oracle
technet (technet.oracle.com) for which you will have to need to
register.  Look for documentation->Oracle 9i Documentation->list of
books->Backup and Recovery Concepts.

-- 
Marc        marc@bloodnok.com


Re: Incremental backup

From
Kevin Brown
Date:
Curt Sampson wrote:
> Oracle9i has a new feature called a "flashback query," which uses the
> information in the rollback segements to let you query the database
> in a previous state. (I.e., "select such and such from this table as
> of two hours ago.") Postgres could do this using the older copies of
> rows as well, though the performance often wouldn't be pretty, since
> your indexes become useless, I believe. (Don't they point to only the
> latest copy of a row?) Still, it would be cool and life-saving in some
> situations.

That sounds like a really nice feature.  We don't do rollback
segments, but I suppose we could (perhaps at the expense of
performance).

> > For PITR, I assume we'd need an archivelog function that would
> > copy the WAL files as they're checkpointed to some other location
> > (with destination names that reflect their order in time), just for
> > starters.
> 
> Well, I'm not sure that you'd really need to have any special archiving
> facilities: you just need to let the old files sit there, and keep
> creating new ones. You do lose a bit of performance in that you can't
> recycle log segments, but you could always just fork off a (well niced)
> process to create a new, empty log segment at the time you start in on
> the last pre-created one, so that you will have another pre-created one
> ready when you finish the current one.

It's probably just as easy to fork off a well-niced process to copy
the log segments elsewhere.

If you're a shop trying to eke out the last bit of performance from
PostgreSQL, then you're probably going to put the transaction logs on
the fastest bit of disk you've got -- a solid-state disk if you can.
That means the amount of space you've got on the filesystem holding
the logs isn't necessarily all that high.  For PITR you almost
certainly want to put the old log files somewhere else for
safekeeping: since you're not using them anymore and you probably want
to keep a lot of them, the old log files need to go somewhere that has
tons of space and is highly reliable.  For that area, performance is
the least of your concerns, as long as it's fast enough that you can
keep up with the database as it writes transaction logs.

And chances are you'll want a background process that compresses the
old log files to minimize the amount of space they eat.


> It's not like there's much point in writing the data to the disk if
> you can't get to it. I'd understood fdatasync as just a way to avoid
> updaing the inode's last changed date.  Are there OSes that
> implement fdatasync in a way that you could still lose data?

I wouldn't think so.  Otherwise what would be the point of having it?

> And you can't do such a rollback at all, except on an entire database,
> because of the potential integrity violations. The best you could do
> would be to generate SQL for the changes you'd need to get back to the
> previous point, and see if you can execute these changes.

Well, you could do the rollback for all tables involved in the
referential integrity tree and leave it at that, but yeah it's
probably simpler to rollback the whole database.

It would still be a useful feature, but I don't know that it's worth
the upfront performance cost of having to save off rollback segments.
If you did have such a feature you'd want to be able to control
whether or not it's enabled, so that you take the performance hit only
if you need the feature.



-- 
Kevin Brown                          kevin@sysexperts.com