Thread: Patch for disaster recovery

Patch for disaster recovery

From
Bruce Momjian
Date:
I have added this define to help people recover deleted tuples.
Of course it is only to be used for disaster recovery.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
Index: src/backend/utils/time/tqual.c
===================================================================
RCS file: /cvsroot/pgsql/src/backend/utils/time/tqual.c,v
retrieving revision 1.81
diff -c -c -r1.81 tqual.c
*** src/backend/utils/time/tqual.c    31 Dec 2004 22:02:56 -0000    1.81
--- src/backend/utils/time/tqual.c    20 Feb 2005 04:52:13 -0000
***************
*** 776,781 ****
--- 776,786 ----
  HeapTupleSatisfiesSnapshot(HeapTupleHeader tuple, Snapshot snapshot,
                             Buffer buffer)
  {
+ /* This is to be used only for disaster recovery and requires serious analysis. */
+ #ifdef MAKE_ALL_TUPLES_VISIBLE
+     return true;
+ #endif
+
      if (!(tuple->t_infomask & HEAP_XMIN_COMMITTED))
      {
          if (tuple->t_infomask & HEAP_XMIN_INVALID)

Re: Patch for disaster recovery

From
Michael Fuhr
Date:
On Sat, Feb 19, 2005 at 11:56:28PM -0500, Bruce Momjian wrote:

> I have added this define to help people recover deleted tuples.
> Of course it is only to be used for disaster recovery.

Wouldn't the patch as provided require a special build of the
postmaster, useful only for disaster recovery?  What about making
the behavior configurable at run time, say via a command-line option?
The code could remain in an #ifdef block and be included or not
included at build time depending on a configure script option.
People who don't want the functionality at all could omit it from
their build (or not enable it, depending on the default), while
people who do want it could use the same postmaster for normal
operation and for recovery, the latter function requiring starting
the postmaster with a "disaster recovery mode" option.

Just brainstorming.  Thoughts?

--
Michael Fuhr
http://www.fuhr.org/~mfuhr/

Re: Patch for disaster recovery

From
Tom Lane
Date:
Bruce Momjian <pgman@candle.pha.pa.us> writes:
> + /* This is to be used only for disaster recovery and requires serious analysis. */
> + #ifdef MAKE_ALL_TUPLES_VISIBLE
> +     return true;
> + #endif

If this requires serious analysis, why did you install the patch without
any public discussion?

I cannot imagine any situation where this wouldn't break things beyond
belief, and so I request that you revert the patch until you've provided
a real example in which it would help.

            regards, tom lane

Re: Patch for disaster recovery

From
Michael Fuhr
Date:
On Sat, Feb 19, 2005 at 10:56:46PM -0700, Michael Fuhr wrote:
>
> What about making the behavior configurable at run time, say via a
> command-line option?

Such functionality should be easy to backpatch to earlier versions,
shouldn't it?  Pleas for help could then be answered with, "upgrade
to the latest version in the branch you're running, build with this
configure option, start the postmaster with this run-time option."

--
Michael Fuhr
http://www.fuhr.org/~mfuhr/

Re: Patch for disaster recovery

From
Michael Fuhr
Date:
On Sat, Feb 19, 2005 at 11:07:55PM -0700, Michael Fuhr wrote:
> On Sat, Feb 19, 2005 at 10:56:46PM -0700, Michael Fuhr wrote:
> >
> > What about making the behavior configurable at run time, say via a
> > command-line option?
>
> Such functionality should be easy to backpatch to earlier versions,
> shouldn't it?

Hmmm...after seeing Tom's reply, I suppose I should have first
asked, "Gee, looks simple, but does it work?"  Should I even bother
experimenting with it in a test environment?

--
Michael Fuhr
http://www.fuhr.org/~mfuhr/

Re: Patch for disaster recovery

From
Tom Lane
Date:
Michael Fuhr <mike@fuhr.org> writes:
> Hmmm...after seeing Tom's reply, I suppose I should have first
> asked, "Gee, looks simple, but does it work?"  Should I even bother
> experimenting with it in a test environment?

Experiment away, but I have a hard time visualizing how you'll find it
useful.

Just brainstorming here, but it seems to me that what might make some
kind of sense is a command to "undelete all tuples in this table".
You do that, you look through them, you delete the versions you don't
want, you're happy.  The problem with the patch as-is is that (a)
"deleting the versions you don't want" is a no-op, so you cannot keep
straight what you've done in terms of filtering out garbage; and (b)
when you revert to a non-broken postmaster, the stuff you couldn't see
before goes back to being unseeable, because after all you didn't change
its state.

With either the snapshot kluge or the undelete-all kluge, you have an
issue in that constraints are broken wholesale --- you can see lots of
duplicate row versions that violate unique constraints, deleted versions
that violate FK constraints because they reference also-deleted master
rows, deleted versions that violate later-added CHECK constraints, etc.
I'd sort of like to see the system flip into some mode that says "we're
not promising constraints are honored", and then you can't go back to
normal operation without going through some pushup that checks all the
remaining rows satisfy the declared constraints.

In any case I suspect it's a bad idea to treat tuples as good if their
originating transaction did not commit.  For starters, such a tuple
might not possess all the index entries it should (if the originating
transaction failed before inserting said entries).  I think what we
want to think about is overriding delete commands, not overriding
insert failures.

Not sure where this leads to, but it's not leading to an undocumented
one-line hack in tqual.c, and definitely not *that* one-line hack.

            regards, tom lane

Re: Patch for disaster recovery

From
Michael Fuhr
Date:
On Sun, Feb 20, 2005 at 02:21:00AM -0500, Tom Lane wrote:

> Just brainstorming here, but it seems to me that what might make some
> kind of sense is a command to "undelete all tuples in this table".

Would it be possible to start a transaction with specific snapshot
parameters?  Could you "time warp" that way, assuming that what
you're looking for hasn't already been vacuumed away?  Rather than
modifying anything, you'd SELECT or COPY the data somewhere and
then restore it when you came "back to the future."  Just another
brainstorm...certainly not thought very far through yet.

--
Michael Fuhr
http://www.fuhr.org/~mfuhr/

Re: Patch for disaster recovery

From
Bruce Momjian
Date:
Tom Lane wrote:
> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > + /* This is to be used only for disaster recovery and requires serious analysis. */
> > + #ifdef MAKE_ALL_TUPLES_VISIBLE
> > +     return true;
> > + #endif
>
> If this requires serious analysis, why did you install the patch without
> any public discussion?
>
> I cannot imagine any situation where this wouldn't break things beyond
> belief, and so I request that you revert the patch until you've provided
> a real example in which it would help.

We regularly have people on IRC who delete data and then want to recover
it.  By having the define it makes it easier for us to help them without
them having to add actual C code.

Does that make sense?

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

Re: Patch for disaster recovery

From
Bruce Momjian
Date:
Tom Lane wrote:
> Michael Fuhr <mike@fuhr.org> writes:
> > Hmmm...after seeing Tom's reply, I suppose I should have first
> > asked, "Gee, looks simple, but does it work?"  Should I even bother
> > experimenting with it in a test environment?
>
> Experiment away, but I have a hard time visualizing how you'll find it
> useful.
>
> Just brainstorming here, but it seems to me that what might make some
> kind of sense is a command to "undelete all tuples in this table".
> You do that, you look through them, you delete the versions you don't
> want, you're happy.  The problem with the patch as-is is that (a)
> "deleting the versions you don't want" is a no-op, so you cannot keep
> straight what you've done in terms of filtering out garbage; and (b)
> when you revert to a non-broken postmaster, the stuff you couldn't see
> before goes back to being unseeable, because after all you didn't change
> its state.

It would be used only for dumping the table out and reverting to the
regular postmaster.

> With either the snapshot kluge or the undelete-all kluge, you have an
> issue in that constraints are broken wholesale --- you can see lots of
> duplicate row versions that violate unique constraints, deleted versions
> that violate FK constraints because they reference also-deleted master
> rows, deleted versions that violate later-added CHECK constraints, etc.
> I'd sort of like to see the system flip into some mode that says "we're
> not promising constraints are honored", and then you can't go back to
> normal operation without going through some pushup that checks all the
> remaining rows satisfy the declared constraints.
>
> In any case I suspect it's a bad idea to treat tuples as good if their
> originating transaction did not commit.  For starters, such a tuple
> might not possess all the index entries it should (if the originating
> transaction failed before inserting said entries).  I think what we
> want to think about is overriding delete commands, not overriding
> insert failures.
>
> Not sure where this leads to, but it's not leading to an undocumented
> one-line hack in tqual.c, and definitely not *that* one-line hack.

Yea, it should be more precise about showing only delete stuff and not
uncommitted creates.  Let me work on that.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

Re: Patch for disaster recovery

From
Bruce Momjian
Date:
Tom Lane wrote:
> Not sure where this leads to, but it's not leading to an undocumented
> one-line hack in tqual.c, and definitely not *that* one-line hack.

OK, it turns out the bottom of the function is the right place to fix
this.  Patch attached and applied.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
Index: src/backend/utils/time/tqual.c
===================================================================
RCS file: /cvsroot/pgsql/src/backend/utils/time/tqual.c,v
retrieving revision 1.82
diff -c -c -r1.82 tqual.c
*** src/backend/utils/time/tqual.c    20 Feb 2005 04:56:00 -0000    1.82
--- src/backend/utils/time/tqual.c    20 Feb 2005 14:56:33 -0000
***************
*** 776,786 ****
  HeapTupleSatisfiesSnapshot(HeapTupleHeader tuple, Snapshot snapshot,
                             Buffer buffer)
  {
- /* This is to be used only for disaster recovery and requires serious analysis. */
- #ifdef MAKE_ALL_TUPLES_VISIBLE
-     return true;
- #endif
-
      if (!(tuple->t_infomask & HEAP_XMIN_COMMITTED))
      {
          if (tuple->t_infomask & HEAP_XMIN_INVALID)
--- 776,781 ----
***************
*** 965,971 ****
--- 960,971 ----
          }
      }

+ /* This is to be used only for disaster recovery and requires serious analysis. */
+ #ifdef MAKE_ALL_TUPLES_VISIBLE
      return false;
+ #else
+     return true;
+ #endif
  }



Re: Patch for disaster recovery

From
Bruce Momjian
Date:
Tom Lane wrote:
> Not sure where this leads to, but it's not leading to an undocumented
> one-line hack in tqual.c, and definitely not *that* one-line hack.

Sorry, here is the proper change I just applied:

/* This is to be used only for disaster recovery and requires serious analysis. */
#ifndef MAKE_ALL_TUPLES_VISIBLE
    return false;
#else
    return true;
#endif

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

Re: Patch for disaster recovery

From
Bruno Wolff III
Date:
On Sun, Feb 20, 2005 at 09:43:11 -0500,
  Bruce Momjian <pgman@candle.pha.pa.us> wrote:
>
> We regularly have people on IRC who delete data and then want to recover
> it.  By having the define it makes it easier for us to help them without
> them having to add actual C code.
>
> Does that make sense?

You aren't going to get a consistant snapshot if you get back all of the
deleted rows. With autovacuum it is going to get harder to do this, because
accidentally making large changes in a table is going to trigger a vacuum.
It seems like the right way to do this is a recovery using the PITR
system and putting effort into making that easier is the way to go.

Re: Patch for disaster recovery

From
Bruce Momjian
Date:
Bruno Wolff III wrote:
> On Sun, Feb 20, 2005 at 09:43:11 -0500,
>   Bruce Momjian <pgman@candle.pha.pa.us> wrote:
> >
> > We regularly have people on IRC who delete data and then want to recover
> > it.  By having the define it makes it easier for us to help them without
> > them having to add actual C code.
> >
> > Does that make sense?
>
> You aren't going to get a consistant snapshot if you get back all of the
> deleted rows. With autovacuum it is going to get harder to do this, because
> accidentally making large changes in a table is going to trigger a vacuum.

Right, but as far as I remember the vacuum isn't going to reuse the
rows.  Rather, it is just going to mark them for later reuse.

> It seems like the right way to do this is a recovery using the PITR
> system and putting effort into making that easier is the way to go.

You are assuming someone is running PITR already, which they might not.

This define is just so we can help folks recover deleted stuff under our
guidance.  Also, I think a general GUC mechanism is too dangerous.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

Re: Patch for disaster recovery

From
Tom Lane
Date:
Bruce Momjian <pgman@candle.pha.pa.us> writes:
> Tom Lane wrote:
>> Not sure where this leads to, but it's not leading to an undocumented
>> one-line hack in tqual.c, and definitely not *that* one-line hack.

> Sorry, here is the proper change I just applied:

> /* This is to be used only for disaster recovery and requires serious analysis. */
> #ifndef MAKE_ALL_TUPLES_VISIBLE
>     return false;
> #else
>     return true;
> #endif

AFAICS this has no value whatsoever.  Assuming that someone has a
disaster recovery problem on their hands, how likely is it that they
will know that that code is there, or be able to turn it on (most
users don't compile from source anymore), or be able to use it
effectively, given the complete lack of documentation?  As is, this
is of value only to someone familiar with the code, and such a someone
could go in and modify the logic for themselves just as easily as
turn on a #define.

I think the only real effect of this patch will be to confuse people
who are reading the source code.  tqual.c is already complicated and
fragile enough --- it doesn't need conditionally compiled "features"
that we can't even explain the use of.

            regards, tom lane

Re: Patch for disaster recovery

From
Bruce Momjian
Date:
Tom Lane wrote:
> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > Tom Lane wrote:
> >> Not sure where this leads to, but it's not leading to an undocumented
> >> one-line hack in tqual.c, and definitely not *that* one-line hack.
>
> > Sorry, here is the proper change I just applied:
>
> > /* This is to be used only for disaster recovery and requires serious analysis. */
> > #ifndef MAKE_ALL_TUPLES_VISIBLE
> >     return false;
> > #else
> >     return true;
> > #endif
>
> AFAICS this has no value whatsoever.  Assuming that someone has a
> disaster recovery problem on their hands, how likely is it that they
> will know that that code is there, or be able to turn it on (most
> users don't compile from source anymore), or be able to use it
> effectively, given the complete lack of documentation?  As is, this
> is of value only to someone familiar with the code, and such a someone
> could go in and modify the logic for themselves just as easily as
> turn on a #define.
>
> I think the only real effect of this patch will be to confuse people
> who are reading the source code.  tqual.c is already complicated and
> fragile enough --- it doesn't need conditionally compiled "features"
> that we can't even explain the use of.

I need a note somewhere to remember where to tell people to modify the
code to recovery something.  Do you have a better idea?  You want just a
comment rather than a define?

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

Re: Patch for disaster recovery

From
Richard Huxton
Date:
Bruce Momjian wrote:
> Tom Lane wrote:
>>I think the only real effect of this patch will be to confuse people
>>who are reading the source code.  tqual.c is already complicated and
>>fragile enough --- it doesn't need conditionally compiled "features"
>>that we can't even explain the use of.
>
>
> I need a note somewhere to remember where to tell people to modify the
> code to recovery something.  Do you have a better idea?  You want just a
> comment rather than a define?

A short guide to disaster recovery would be useful. There's a real
shortage of people qualified to help users in this situation. I feel
comfortable in telling them what *not* to do, but nothing more.

--
   Richard Huxton
   Archonet Ltd

Re: Patch for disaster recovery

From
Simon Riggs
Date:
On Sun, 2005-02-20 at 13:08 -0500, Bruce Momjian wrote:
> Bruno Wolff III wrote:
> > On Sun, Feb 20, 2005 at 09:43:11 -0500,
> >   Bruce Momjian <pgman@candle.pha.pa.us> wrote:
> > >
> > > We regularly have people on IRC who delete data and then want to recover
> > > it.

That is (one of) the purpose(s) of PITR.

> By having the define it makes it easier for us to help them without
> > > them having to add actual C code.
> > >
> > > Does that make sense?
> >
> > You aren't going to get a consistant snapshot if you get back all of the
> > deleted rows. With autovacuum it is going to get harder to do this, because
> > accidentally making large changes in a table is going to trigger a vacuum.
>
> Right, but as far as I remember the vacuum isn't going to reuse the
> rows.  Rather, it is just going to mark them for later reuse.
>
> > It seems like the right way to do this is a recovery using the PITR
> > system and putting effort into making that easier is the way to go.
>
> You are assuming someone is running PITR already, which they might not.

I do hope you/we all recommend PITR as the supported route for recovery
situations?? Otherwise it never will get easier.

If PITR were more clearly recommended, then perhaps more people would
think about about using it ahead of time, just like everybody does with
commercial databases.

Best Regards, Simon Riggs