Thread: XLog: how to log?

XLog: how to log?

From

Alvaro Herrera

Date:

10 May 2004, 20:13:58

Hackers,

I'm trying to figure out how to log the subtransaction operations.

A simple idea would be to add the committed subxact Xids to the
xl_xact_commit and xl_xact_abort structs.  However, this will be a
variable length array, and those structs already have variable length
arrays at the end (filenodes to drop).  I don't see how would I overlay
that into a C struct.

A different idea would be to write a XLog record for subxact aborts
(which would include Xids of all committed child subxacts), and
at main transaction commit write another (different) XLog record, where
XLogRecord.Xid = mainTransactionXid, which would carry the committed
subtransactions.

Does this make any sense?  Ideas, comments?

-- 
Alvaro Herrera (<alvherre[a]dcc.uchile.cl>)
"No reniegues de lo que alguna vez creíste"

Re: XLog: how to log?

From

Tom Lane

Date:

11 May 2004, 00:06:32

Alvaro Herrera <alvherre@dcc.uchile.cl> writes:
> A simple idea would be to add the committed subxact Xids to the
> xl_xact_commit and xl_xact_abort structs.  However, this will be a
> variable length array, and those structs already have variable length
> arrays at the end (filenodes to drop).  I don't see how would I overlay
> that into a C struct.

Just means you have to do some address arithmetic instead of being able
to reference the additional data as a struct member.  Tedious but hardly
difficult.  See for instance the handling of "move" data in
XLOG_HEAP_MOVE records, or "unused" data in XLOG_HEAP_CLEAN.

I'd not recommend emitting additional xlog records unless there's a
genuine log-semantics-related reason for doing so.  (Thinks about it...)
It seems like you'd want to emit a subtrans abort record to carry
information about file deletions, but there's no real need to emit
subtrans commit records, is there?
        regards, tom lane

Re: XLog: how to log?

From

Simon Riggs

Date:

11 May 2004, 03:45:18

On Tue, 2004-05-11 at 04:05, Tom Lane wrote:
> Alvaro Herrera <alvherre@dcc.uchile.cl> writes:
> > A simple idea would be to add the committed subxact Xids to the
> > xl_xact_commit and xl_xact_abort structs.  However, this will be a
> > variable length array, and those structs already have variable length
> > arrays at the end (filenodes to drop).  I don't see how would I overlay
> > that into a C struct.
> 
> Just means you have to do some address arithmetic instead of being able
> to reference the additional data as a struct member.  Tedious but hardly
> difficult.  See for instance the handling of "move" data in
> XLOG_HEAP_MOVE records, or "unused" data in XLOG_HEAP_CLEAN.
> 
> I'd not recommend emitting additional xlog records unless there's a
> genuine log-semantics-related reason for doing so.  (Thinks about it...)
> It seems like you'd want to emit a subtrans abort record to carry
> information about file deletions, but there's no real need to emit
> subtrans commit records, is there?
> 

Should we write a subtrans commit record when WAL_DEBUG is set? We could
remove it completely in the future when the interaction between PITR
recovery and subtransactions have been fully covered off.

My thinking is that subtransactions themselves are not valid recovery
targets. As long as we can tell the top level commits from the
subtransactions then it should be easy to recover. My thinking is that
if a subtransaction id is wrongly quoted as a recover point, then this
would be treated as a "rollforward to txnid X, but do not include this
one". Does that make sense? Does everything we have support that?

Best Regards, Simon Riggs

Re: XLog: how to log?

From

Alvaro Herrera

Date:

11 May 2004, 10:56:50

On Tue, May 11, 2004 at 07:44:53AM +0100, Simon Riggs wrote:

> Tom Lane wrote:
> > Just means you have to do some address arithmetic instead of being able
> > to reference the additional data as a struct member.  Tedious but hardly
> > difficult.  See for instance the handling of "move" data in
> > XLOG_HEAP_MOVE records, or "unused" data in XLOG_HEAP_CLEAN.

Got it.  I had skimmed looking for an example of this but didn't find
this.

> Should we write a subtrans commit record when WAL_DEBUG is set? We could
> remove it completely in the future when the interaction between PITR
> recovery and subtransactions have been fully covered off.

Actually, the only thing a subtransaction has to recover is mark its Xid
as either committed of aborted in clog, and possible file deletions, so
there shouldn't be too much interaction with your work.

(Unrelated: note that after main transaction commit, a committed
subtransaction is indistinguishable from a committed main transaction --
and with the current idea of XLog I have, after recovering a transaction
tree from XLog there won't be any mark in pg_subtrans.  So the system
will not be exactly as it was before but it won't matter.)

> My thinking is that subtransactions themselves are not valid recovery
> targets. As long as we can tell the top level commits from the
> subtransactions then it should be easy to recover. My thinking is that
> if a subtransaction id is wrongly quoted as a recover point, then this
> would be treated as a "rollforward to txnid X, but do not include this
> one". Does that make sense? Does everything we have support that?

Hmm ... I think it should be forbidden to quote a subtrans Xid as
rollforward point.  Not sure if that can be done though, or how to do
it.

-- 
Alvaro Herrera (<alvherre[a]dcc.uchile.cl>)
"La gente vulgar solo piensa en pasar el tiempo;
el que tiene talento, en aprovecharlo"

Re: XLog: how to log?

From

Tom Lane

Date:

11 May 2004, 11:39:27

Alvaro Herrera <alvherre@dcc.uchile.cl> writes:
> Hmm ... I think it should be forbidden to quote a subtrans Xid as
> rollforward point.  Not sure if that can be done though, or how to do
> it.

Seems like a nonissue, unless the XLOG trace makes a subtrans look the
same as a main trans, which it'd not do would it?

We could allow specification of a subtrans ID to be interpreted the same
as specification of its parent main trans.  Dunno if that's actually
useful to anyone.  Actually, I'd think that people would generally
specify recovery up to a particular timestamp, and not be interested in
xact numbers at all ...
        regards, tom lane

Re: XLog: how to log?

From

Bruce Momjian

Date:

11 May 2004, 12:33:53

Tom Lane wrote:
> Alvaro Herrera <alvherre@dcc.uchile.cl> writes:
> > Hmm ... I think it should be forbidden to quote a subtrans Xid as
> > rollforward point.  Not sure if that can be done though, or how to do
> > it.
> 
> Seems like a nonissue, unless the XLOG trace makes a subtrans look the
> same as a main trans, which it'd not do would it?
> 
> We could allow specification of a subtrans ID to be interpreted the same
> as specification of its parent main trans.  Dunno if that's actually
> useful to anyone.  Actually, I'd think that people would generally
> specify recovery up to a particular timestamp, and not be interested in
> xact numbers at all ...

I don't think timestamp is going to be precise enough.  Basically I can
see someone saying I want recovery up to 4am, but anything more specific
will need xid.  I suggested that we write an xlog dump tool so you can
see the xids (with some xid details) and rough timestamps stored in the
WAL file and choose the xid for recovery.

--  Bruce Momjian                        |  http://candle.pha.pa.us pgman@candle.pha.pa.us               |  (610)
359-1001+  If your life is a hard drive,     |  13 Roberts Road +  Christ can be your backup.        |  Newtown Square,
Pennsylvania19073

Re: XLog: how to log?

From

Alvaro Herrera

Date:

11 May 2004, 12:40:45

On Tue, May 11, 2004 at 10:37:49AM -0400, Tom Lane wrote:
> Alvaro Herrera <alvherre@dcc.uchile.cl> writes:
> > Hmm ... I think it should be forbidden to quote a subtrans Xid as
> > rollforward point.  Not sure if that can be done though, or how to do
> > it.
> 
> Seems like a nonissue, unless the XLOG trace makes a subtrans look the
> same as a main trans, which it'd not do would it?

Huh, certainly not.

> We could allow specification of a subtrans ID to be interpreted the same
> as specification of its parent main trans.  Dunno if that's actually
> useful to anyone.

Probably not.

-- 
Alvaro Herrera (<alvherre[a]dcc.uchile.cl>)
"If it wasn't for my companion, I believe I'd be having
the time of my life"  (John Dunbar)

Re: XLog: how to log?

From

Tom Lane

Date:

11 May 2004, 17:33:40

Simon Riggs <simon@2ndquadrant.com> writes:
> Currently, recovery loops until end of xlogs. There is no exit condition
> from the loop. There is not currently a timestamp on the xlogs -
> anywhere apart from the file date on each xlog.

Sure there is: every transaction commit record (and I think abort as
well) carries a timestamp.  This was put in with PITR in mind in the
very beginning.
        regards, tom lane

Re: XLog: how to log?

From

Simon Riggs

Date:

11 May 2004, 17:38:53

On Tue, 2004-05-11 at 16:33, Bruce Momjian wrote:
> Tom Lane wrote:
> > Alvaro Herrera <alvherre@dcc.uchile.cl> writes:
> > > Hmm ... I think it should be forbidden to quote a subtrans Xid as
> > > rollforward point.  Not sure if that can be done though, or how to do
> > > it.
> > 
> > Seems like a nonissue, unless the XLOG trace makes a subtrans look the
> > same as a main trans, which it'd not do would it?
> > 

I agree that a subtrans xid should not be a valid rollforward point.

Forgive me discussing what seems like obvious points - I'm sure you
appreciate we need an exact statement of how/when to terminate recovery
and that might be found in looking harder at the subtrans questions.
This is my third re-write of this e-mail, since I keep thinking of
additional things while going for the "definitive statement". I had
thought this was straightforward...

Currently, recovery loops until end of xlogs. There is no exit condition
from the loop. There is not currently a timestamp on the xlogs -
anywhere apart from the file date on each xlog.

Xids are assigned sequentially to transactions as they start. However,
Xids are not committed sequentially. Moreover, checkpoint records do not
wait for transactions to complete, so a checkpoint could record an Xid,
yet a lower Xid might still be in progress and commit sometime after the
checkpoint. So, when we do a backup, we might take with us a pg_control
that has a particular Xid, only to find lots of later committed, but
earlier Xids in the xlogs. So Xid can have no lower bound. (and a fully
formed clog is essential to recovery).

If we go searching for a particular Xid, there is no way to tell whether
an Xid suggested by a user is too big or too small for use as a recovery
target. We need to recover - it is the only way to tell; if we find an
Xid that matches, we stop. If not, we keep going until end of logs, when
we need to issue a "recovered fully - the Xid you gave was not valid",
which may take some time and is also very clearly not what was wanted.
(If they had wanted full recovery, they would have asked).

So searching on an Xid is inherently a poor way to recover. Which is a
shame, because it seemed like an easy target. Unless of course, we live
with this vagueness and get on and build the XLogSpy...

Xlog records ARE written sequentially, so a timestamp written to the
xlogs COULD be used as a target for halting recovery. We would be able
to decide, ahead of starting recovery, whether we would be able to
sensibly recover to that point by using the pg_control checkpoint time
as the lower bound and the file write times of the highest xlog as the
upper bound. Once decided that the target timestamp lies between upper
and lower bounds, we begin recovery, knowing exactly where it will
complete.

During recovery, we would search for a timestamp. If found exactly,
stop. If exceeded, stop. Any transactions not committed at that point
are, as we say, out of luck. ....This approach has a certainty about it
that I think is much better than the error prone Xid hunting approach,
and is also more attuned to the human reality (time matters, Xids
don't).

Earlier, Bruce and I had discussed that for reasons of time pressure,
the PITR code for this release would consist of 
a) recovery to a particular Xid
b) later, a utility that allowed xlogs to be inspected to allow DBA to
decide which is the correct Xid to recover to.
Those ideas don't sound as good now....

Therefore: action on me? - add a timestamp to EACH xlog record -
something I had been shying away from.

On Tue, 2004-05-11 at 14:56, Alvaro Herrera wrote: 
> (Unrelated: note that after main transaction commit, a committed
> subtransaction is indistinguishable from a committed main transaction --
> and with the current idea of XLog I have, after recovering a transaction
> tree from XLog there won't be any mark in pg_subtrans.  So the system
> will not be exactly as it was before but it won't matter.)

I don't think we need a subtrans commit directly, since if the top-level
commits after the subtrans has committed, then we're good. 

However, if a subtrans aborts, yet the top-level commits there will be
data written to the database about an aborted transaction. We don't have
Undo, so the subtrans clog must be updated to show that the subtrans
aborted, otherwise we would read both the committed (top-level) and the
uncommitted data (subtrans).

Another way of putting it - if it was worth writing before a crash, it
is worth recovering after a crash. Shurely?

> > We could allow specification of a subtrans ID to be interpreted the same
> > as specification of its parent main trans.  Dunno if that's actually
> > useful to anyone.  Actually, I'd think that people would generally
> > specify recovery up to a particular timestamp, and not be interested in
> > xact numbers at all ...
> 
> I don't think timestamp is going to be precise enough.  Basically I can
> see someone saying I want recovery up to 4am, but anything more specific
> will need xid.  I suggested that we write an xlog dump tool so you can
> see the xids (with some xid details) and rough timestamps stored in the
> WAL file and choose the xid for recovery.

Bruce, As I started this e-mail (1st time), I completely agreed with
you. I've now had to switch my thinking.

(Doesn't effect archiving architecture....)

I'm a little dazed....comments anyone?

Best regards, Simon Riggs

Re: XLog: how to log?

From

Bruce Momjian

Date:

11 May 2004, 18:06:01

Simon Riggs wrote:
> During recovery, we would search for a timestamp. If found exactly,
> stop. If exceeded, stop. Any transactions not committed at that point
> are, as we say, out of luck. ....This approach has a certainty about it
> that I think is much better than the error prone Xid hunting approach,
> and is also more attuned to the human reality (time matters, Xids
> don't).
> 
> Earlier, Bruce and I had discussed that for reasons of time pressure,
> the PITR code for this release would consist of 
> a) recovery to a particular Xid
> b) later, a utility that allowed xlogs to be inspected to allow DBA to
> decide which is the correct Xid to recover to.
> Those ideas don't sound as good now....
> 
> Therefore: action on me? - add a timestamp to EACH xlog record -
> something I had been shying away from.

Many transactions are going to have the same timestamp because that just
isn't precise enough to choose a particular transaction.  I agree
finding a particular xid in the logs is hard.  We could just scan the
logs to see if we find the xid before doing the recovery.

Anyway, I though we agreed to just get total recovery working for 7.5
and we can deal with recovery to a particular point later.  Or we can
just add timestamps to the wal file header and restore to a particular
wal file date timestamp.

--  Bruce Momjian                        |  http://candle.pha.pa.us pgman@candle.pha.pa.us               |  (610)
359-1001+  If your life is a hard drive,     |  13 Roberts Road +  Christ can be your backup.        |  Newtown Square,
Pennsylvania19073

Re: XLog: how to log?

From

Alvaro Herrera

Date:

11 May 2004, 18:26:31

On Tue, May 11, 2004 at 09:25:37PM +0100, Simon Riggs wrote:
> On Tue, 2004-05-11 at 16:33, Bruce Momjian wrote:
> > Tom Lane wrote:
> > > Alvaro Herrera <alvherre@dcc.uchile.cl> writes:
> > > > Hmm ... I think it should be forbidden to quote a subtrans Xid as
> > > > rollforward point.  Not sure if that can be done though, or how to do
> > > > it.
> > > 
> > > Seems like a nonissue, unless the XLOG trace makes a subtrans look the
> > > same as a main trans, which it'd not do would it?
> 
> I agree that a subtrans xid should not be a valid rollforward point.

If you try to do that you'll fail because there will be no XLog record
signalling the commit of a subtransaction.  They will be marked
committed as necessary as a subproduct of main transaction committing.

> Currently, recovery loops until end of xlogs. There is no exit condition
> from the loop. There is not currently a timestamp on the xlogs -
> anywhere apart from the file date on each xlog.

Both xact commit and abort have timestamps in the XLog.  I think valid
recovery points are transaction commit/abort, not transaction start.

> If we go searching for a particular Xid, there is no way to tell whether
> an Xid suggested by a user is too big or too small for use as a recovery
> target. We need to recover - it is the only way to tell; if we find an
> Xid that matches, we stop. If not, we keep going until end of logs, when
> we need to issue a "recovered fully - the Xid you gave was not valid",
> which may take some time and is also very clearly not what was wanted.

I think the user should first examine the logs with whatever tools are
provided, and use a timestamp or a Xid listed in the XLog.  If they use
a Xid that's not listed, it's not our fault ...

> b) later, a utility that allowed xlogs to be inspected to allow DBA to
> decide which is the correct Xid to recover to.

Why is this difficult?  There are lots of subsys_desc() functions which
already returns what's in each log record as a string.  The tool could
initially just dump that ...

> Therefore: action on me? - add a timestamp to EACH xlog record -
> something I had been shying away from.

You only need timestamps in xl_xact_commit and xl_xact_abort, which are
already there.

> On Tue, 2004-05-11 at 14:56, Alvaro Herrera wrote: 
> > (Unrelated: note that after main transaction commit, a committed
> > subtransaction is indistinguishable from a committed main transaction --
> > and with the current idea of XLog I have, after recovering a transaction
> > tree from XLog there won't be any mark in pg_subtrans.  So the system
> > will not be exactly as it was before but it won't matter.)
> 
> I don't think we need a subtrans commit directly, since if the top-level
> commits after the subtrans has committed, then we're good. 

If the subxact wrote a tuple, its Xid has to be in the pg_clog.  Thus
we need to recover the pg_clog write.

> However, if a subtrans aborts, yet the top-level commits there will be
> data written to the database about an aborted transaction. We don't have
> Undo, so the subtrans clog must be updated to show that the subtrans
> aborted, otherwise we would read both the committed (top-level) and the
> uncommitted data (subtrans).

If the aborted subxact wrote a tuple, its Xid has to be in the pg_clog.

> Another way of putting it - if it was worth writing before a crash, it
> is worth recovering after a crash. Surely?

Right.  What I was saying is that we don't need pg_subtrans info,
because that's only needed while the subtransaction is marked as
"subcommitted" but it's parent hasn't committed or aborted yet.  The
subcommitted status is changed to committed/aborted when the main
transaction commits or aborts; at recovery time, we already know if that
happenned or not so we can mark it right away.

-- 
Alvaro Herrera (<alvherre[a]dcc.uchile.cl>)
"La vida es para el que se aventura"

Re: XLog: how to log?

From

Tom Lane

Date:

11 May 2004, 18:54:09

Alvaro Herrera <alvherre@dcc.uchile.cl> writes:
> Both xact commit and abort have timestamps in the XLog.  I think valid
> recovery points are transaction commit/abort, not transaction start.

A transaction abort cannot be an interesting stop point either; you may
as well stop at the latest preceding commit.  Whatever the aborted
transaction did need not be entered into the database.

The abort record timestamps may be useful for navigation (to help
determine where you want to stop) but AFAICS only main transaction
commit records are useful stopping points.
        regards, tom lane

Re: XLog: how to log?

From

Simon Riggs

Date:

11 May 2004, 18:58:48

On Tue, 2004-05-11 at 21:29, Tom Lane wrote:
> Simon Riggs <simon@2ndquadrant.com> writes:
> > Currently, recovery loops until end of xlogs. There is no exit condition
> > from the loop. There is not currently a timestamp on the xlogs -
> > anywhere apart from the file date on each xlog.
> 
> Sure there is: every transaction commit record (and I think abort as
> well) carries a timestamp.  This was put in with PITR in mind in the
> very beginning.
> 

When being wrong has a purpose, I just love it.

I'll put my code-goggles back on the right way round.

Thanks, Simon

Re: XLog: how to log?

From

Simon Riggs

Date:

11 May 2004, 19:03:02

On Tue, 2004-05-11 at 22:01, Bruce Momjian wrote:
> Anyway, I though we agreed to just get total recovery working for 7.5
> and we can deal with recovery to a particular point later.

Back on target now! Regards, Simon

Re: XLog: how to log?

From

Tom Lane

Date:

11 May 2004, 19:08:04

Bruce Momjian <pgman@candle.pha.pa.us> writes:
> Many transactions are going to have the same timestamp because that just
> isn't precise enough to choose a particular transaction.

I think this argument is largely a red herring ... but if it makes you
feel better, we could change the contents of the commit timestamp to
be gettimeofday() output (seconds+microseconds) instead of just time()
output.  That should be precise enough for practical purposes.
        regards, tom lane

Re: XLog: how to log?

From

Simon Riggs

Date:

11 May 2004, 19:08:12

Thanks to all of you for such swift advice and correction!
Have a good evening...


On Tue, 2004-05-11 at 22:26, Alvaro Herrera wrote:
> On Tue, May 11, 2004 at 09:25:37PM +0100, Simon Riggs wrote:
> > On Tue, 2004-05-11 at 16:33, Bruce Momjian wrote:
> > > Tom Lane wrote:
> > > > Alvaro Herrera <alvherre@dcc.uchile.cl> writes:

> Both xact commit and abort have timestamps in the XLog.  I think valid
> recovery points are transaction commit/abort, not transaction start.
> 

Agreed.

> 
> > If we go searching for a particular Xid, there is no way to tell whether
> > an Xid suggested by a user is too big or too small for use as a recovery
> > target. We need to recover - it is the only way to tell; if we find an
> > Xid that matches, we stop. If not, we keep going until end of logs, when
> > we need to issue a "recovered fully - the Xid you gave was not valid",
> > which may take some time and is also very clearly not what was wanted.
> 
> I think the user should first examine the logs with whatever tools are
> provided, and use a timestamp or a Xid listed in the XLog.  If they use
> a Xid that's not listed, it's not our fault ...

Blame is in the eye of the beholder. Anyway, because of above, no
worries.

> 
> > b) later, a utility that allowed xlogs to be inspected to allow DBA to
> > decide which is the correct Xid to recover to.
> 
> Why is this difficult?  There are lots of subsys_desc() functions which
> already returns what's in each log record as a string.  The tool could
> initially just dump that ...
> 

It's not hard, its all a question of finite time resources.

> 
> > Therefore: action on me? - add a timestamp to EACH xlog record -
> > something I had been shying away from.
> 
> You only need timestamps in xl_xact_commit and xl_xact_abort, which are
> already there.
> 
> > On Tue, 2004-05-11 at 14:56, Alvaro Herrera wrote: 
> > > (Unrelated: note that after main transaction commit, a committed
> > > subtransaction is indistinguishable from a committed main transaction --
> > > and with the current idea of XLog I have, after recovering a transaction
> > > tree from XLog there won't be any mark in pg_subtrans.  So the system
> > > will not be exactly as it was before but it won't matter.)
> > 
> > I don't think we need a subtrans commit directly, since if the top-level
> > commits after the subtrans has committed, then we're good. 
> 
> If the subxact wrote a tuple, its Xid has to be in the pg_clog.  Thus
> we need to recover the pg_clog write.
> 
> > However, if a subtrans aborts, yet the top-level commits there will be
> > data written to the database about an aborted transaction. We don't have
> > Undo, so the subtrans clog must be updated to show that the subtrans
> > aborted, otherwise we would read both the committed (top-level) and the
> > uncommitted data (subtrans).
> 
> If the aborted subxact wrote a tuple, its Xid has to be in the pg_clog.
> 
> > Another way of putting it - if it was worth writing before a crash, it
> > is worth recovering after a crash. Surely?
> 
> Right.  What I was saying is that we don't need pg_subtrans info,
> because that's only needed while the subtransaction is marked as
> "subcommitted" but it's parent hasn't committed or aborted yet.  The
> subcommitted status is changed to committed/aborted when the main
> transaction commits or aborts; at recovery time, we already know if that
> happenned or not so we can mark it right away.

Maybe I'm just not following you, which seems likely.

Surely it is possible that a subtrans could abort, yet the fulltrans
commits. Yet that doesn't in any way imply that the subtrans should
change from aborted to committed?

Anyway, I've raised my thoughts, so if you're still OK, thats fine by
me. I'll get back to the wheel to see what can be spun...

Best regards, Simon Riggs

Re: XLog: how to log?

From

Bruce Momjian

Date:

11 May 2004, 19:10:28

Tom Lane wrote:
> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > Many transactions are going to have the same timestamp because that just
> > isn't precise enough to choose a particular transaction.
> 
> I think this argument is largely a red herring ... but if it makes you
> feel better, we could change the contents of the commit timestamp to
> be gettimeofday() output (seconds+microseconds) instead of just time()
> output.  That should be precise enough for practical purposes.

I am saying timestamp as used for specifying a recovery location might
not be unique enough, no?

--  Bruce Momjian                        |  http://candle.pha.pa.us pgman@candle.pha.pa.us               |  (610)
359-1001+  If your life is a hard drive,     |  13 Roberts Road +  Christ can be your backup.        |  Newtown Square,
Pennsylvania19073

Re: XLog: how to log?

From

Tom Lane

Date:

11 May 2004, 19:20:07

Alvaro Herrera <alvherre@dcc.uchile.cl> writes:
> On Tue, May 11, 2004 at 09:25:37PM +0100, Simon Riggs wrote:
>> Another way of putting it - if it was worth writing before a crash, it
>> is worth recovering after a crash. Surely?

> Right.

Wrong.  Only actions that were *committed* before the crash, or in
general the selected recovery stopping time, need be recovered.  The
actions-so-far of transactions that were in progress at that time don't
have to be redone.  It doesn't really matter if we do do them; we have a
consistent database state either way.  I'm just pointing this out for
the sake of clear thinking.

> What I was saying is that we don't need pg_subtrans info,
> because that's only needed while the subtransaction is marked as
> "subcommitted" but it's parent hasn't committed or aborted yet.  The
> subcommitted status is changed to committed/aborted when the main
> transaction commits or aborts; at recovery time, we already know if that
> happenned or not so we can mark it right away.

Agreed, if pg_subtrans is never consulted except for children of
in-progress (or recently-completed) main transactions, then it need not
be recovered.  However, I'm not sure this is really true.  Couldn't
there be an entry in pg_clog showing a SUBTRANS COMMIT state immediately
after a recovery?  If the main transaction never gets to commit or
abort, there would be nothing in XLOG to cue the recovery process to go
and clear that clog entry.  So without some further mechanism I'd expect
the first backend that visits the tuple in question to go and consult
pg_subtrans.  There would need to be a way of knowing "XIDs older than
this are by definition not to be sought in pg_subtrans".  With a cutoff,
you could assume that SUBTRANS COMMIT entries older than the cutoff can
be changed to ABORT state.

What's not clear to me is how we can advance that cutoff; it might have
to go along with the clog truncation logic.
        regards, tom lane

Re: XLog: how to log?

From

Alvaro Herrera

Date:

11 May 2004, 19:30:12

On Tue, May 11, 2004 at 11:07:47PM +0100, Simon Riggs wrote:

> On Tue, 2004-05-11 at 22:26, Alvaro Herrera wrote:

> > Right.  What I was saying is that we don't need pg_subtrans info,
> > because that's only needed while the subtransaction is marked as
> > "subcommitted" but it's parent hasn't committed or aborted yet.  The
> > subcommitted status is changed to committed/aborted when the main
> > transaction commits or aborts; at recovery time, we already know if that
> > happenned or not so we can mark it right away.
> 
> Maybe I'm just not following you, which seems likely.
> 
> Surely it is possible that a subtrans could abort, yet the fulltrans
> commits. Yet that doesn't in any way imply that the subtrans should
> change from aborted to committed?

The subtransaction will be recorded as aborted in pg_clog (I mean there
will be a XLog record saying that it aborted).  That protects it from
showing as committed, no matter what happens to the parent xact.

-- 
Alvaro Herrera (<alvherre[a]dcc.uchile.cl>)
"La tristeza es un muro entre dos jardines" (Khalil Gibran)

Re: XLog: how to log?

From

Tom Lane

Date:

11 May 2004, 20:04:10

Bruce Momjian <pgman@candle.pha.pa.us> writes:
> Tom Lane wrote:
>> I think this argument is largely a red herring ... but if it makes you
>> feel better, we could change the contents of the commit timestamp to
>> be gettimeofday() output (seconds+microseconds) instead of just time()
>> output.  That should be precise enough for practical purposes.

> I am saying timestamp as used for specifying a recovery location might
> not be unique enough, no?

Why not?  I don't think there are going to be practical situations where
the user knows that he wants transactions up till exactly 6:48:52 PM
anyway.  He'll be lucky if he knows that the junior DBA dropped the
wrong table around 6:30 :-(.  It's even less likely that he desperately
needs to revert to before such a disaster but still get in a transaction
that happened to commit at the same second.  Take it down to the
microsecond level and the use-case becomes vanishingly small.

What is *very* likely, but has no chance of being supported by PITR,
is that he wishes he could keep the results of transactions committed
shortly after the error he's trying to undo.  At the microsecond level,
the transaction commit order is not that predictable anyway (except of
course when one transaction's lock blocks another).  I think it's
pointless to get all exercised about whether you can stop on *this*
transaction and not *that* one when they are committing so close
together.  Half the time the user will be wishing he could omit the
first and include the second, but he doesn't get to do that, and so
I say being able to point to the stopping xact by xid is just an
exercise in technical hair-splitting.  I don't think it solves any
real-world problem.

Another way of putting this is that one-second resolution in the states
you have available to restore to is plenty good enough for real-world
cases.
        regards, tom lane

Re: XLog: how to log?

From

Bruce Momjian

Date:

12 May 2004, 00:47:41

Tom Lane wrote:
> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > Tom Lane wrote:
> >> I think this argument is largely a red herring ... but if it makes you
> >> feel better, we could change the contents of the commit timestamp to
> >> be gettimeofday() output (seconds+microseconds) instead of just time()
> >> output.  That should be precise enough for practical purposes.
> 
> > I am saying timestamp as used for specifying a recovery location might
> > not be unique enough, no?
> 
> Why not?  I don't think there are going to be practical situations where
> the user knows that he wants transactions up till exactly 6:48:52 PM
> anyway.  He'll be lucky if he knows that the junior DBA dropped the
> wrong table around 6:30 :-(.  It's even less likely that he desperately
> needs to revert to before such a disaster but still get in a transaction
> that happened to commit at the same second.  Take it down to the
> microsecond level and the use-case becomes vanishingly small.

Here is my logic.  Once they have a way to dump the WAL contents, folks
trying to recover to a specific point in the past are going to look at
the WAL dump and hopefully identify the transaction that was bad.  They
then will want to roll back to just before that transaction.

Do they subtract one second from the transaction?  Seems it would be
easier to just pick the xid that was just before the bad one.  Also,
considering the various time formats and timezone issues that it is
simpler to just have them specify an xid.

--  Bruce Momjian                        |  http://candle.pha.pa.us pgman@candle.pha.pa.us               |  (610)
359-1001+  If your life is a hard drive,     |  13 Roberts Road +  Christ can be your backup.        |  Newtown Square,
Pennsylvania19073

Re: XLog: how to log?

From

Simon Riggs

Date:

12 May 2004, 04:23:54

On Wed, 2004-05-12 at 04:47, Bruce Momjian wrote:
> Tom Lane wrote:
> > Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > > Tom Lane wrote:
> > >> I think this argument is largely a red herring ... but if it makes you
> > >> feel better, we could change the contents of the commit timestamp to
> > >> be gettimeofday() output (seconds+microseconds) instead of just time()
> > >> output.  That should be precise enough for practical purposes.
> > 
> > > I am saying timestamp as used for specifying a recovery location might
> > > not be unique enough, no?
> > 
> > Why not?  I don't think there are going to be practical situations where
> > the user knows that he wants transactions up till exactly 6:48:52 PM
> > anyway.  He'll be lucky if he knows that the junior DBA dropped the
> > wrong table around 6:30 :-(.  It's even less likely that he desperately
> > needs to revert to before such a disaster but still get in a transaction
> > that happened to commit at the same second.  Take it down to the
> > microsecond level and the use-case becomes vanishingly small.
> 
> Here is my logic.  Once they have a way to dump the WAL contents, folks
> trying to recover to a specific point in the past are going to look at
> the WAL dump and hopefully identify the transaction that was bad.  They
> then will want to roll back to just before that transaction.
> 
> Do they subtract one second from the transaction?  Seems it would be
> easier to just pick the xid that was just before the bad one.  Also,
> considering the various time formats and timezone issues that it is
> simpler to just have them specify an xid.

Well, I think I agree with both sides of this debate.

Solution: provide both timestamp AND Xid capability. We assume that if
they specify Xid, it is because they know and, for whatever reason,
care, about the exact specification of where recovery stops.

If you know a large statement just executed in error, then you want to
restore back to just before the error.

My earlier angst was based upon mistaking that there was no timestamp.
There is now a simple choice of recovery targets and fairly simple to
implement both. Design is now clear for me.

Best Regards, Simon Riggs

Re: XLog: how to log?

From

"Zeugswetter Andreas SB SD"

Date:

12 May 2004, 06:20:24

> > I think this argument is largely a red herring ... but if it makes you
> > feel better, we could change the contents of the commit timestamp to
> > be gettimeofday() output (seconds+microseconds) instead of just time()
> > output.  That should be precise enough for practical purposes.
>
> I am saying timestamp as used for specifying a recovery location might
> not be unique enough, no?

Maybe the api should allow a xid only in addition to a timestamp, for use when
timestamp alone is not precise enough. That would solve the problem of not
finding the xid. I would stay away from microseconds.

Andreas