
Forward zeroing of pg_clog - Mailing list pgsql-hackers

From	Tom Lane
Subject	Forward zeroing of pg_clog
Date	August 30, 2004 15:21:05
Msg-id	8958.1093889989@sss.pgh.pa.us
Responses	Re: Forward zeroing of pg_clog Re: Forward zeroing of pg_clog
List	pgsql-hackers


I just spent some time chasing weird failures ("PANIC: cannot abort
transaction 201109, it was already committed" after some but not all
errors) which I eventually realized were because pg_clog contained
commit and abort flags for several thousand transactions ahead of where
the current XID counter is in my test database.

How did it get that way?  Well, yesterday I was testing the XLOG mods to
support huge COMMIT records, so I ran a test script that would commit a
transaction with 20000 subcommitted subtransactions.  I then kill -9'd
the backend to force WAL replay of that large transaction.

WAL replay sets the XID counter to one more than the largest XID that it
sees evidence of in the replayed log.  However, it's not looking inside
the COMMIT or ABORT records, and so in this case the largest XID it saw
was that of the parent transaction.  The actual pre-crash XID counter
was of course 20000 more than that.

This particular issue is just a simple oversight in xact_redo, and it's
easily fixed: make sure nextXID gets advanced past all of the committed
or aborted subXIDs too.
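
A minimal sketch of that fix, as a standalone illustration rather than the
actual xact_redo code.  The function names (redo_advance_next_xid and its
helpers) and the FIRST_NORMAL_XID constant are hypothetical; the sketch
assumes XIDs are 32-bit counters compared modulo 2^32 and that the first few
XID values are reserved, so the counter has to skip them after wrapping.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t TransactionId;

/* Assumption: XIDs below this value are reserved for special purposes. */
#define FIRST_NORMAL_XID ((TransactionId) 3)

/* Wraparound-aware "a >= b" for normal XIDs (modulo-2^32 comparison). */
static int
xid_follows_or_equals(TransactionId a, TransactionId b)
{
    return (int32_t) (a - b) >= 0;
}

/* Push *next_xid strictly past xid, if it is not already there. */
static void
advance_next_xid_past(TransactionId *next_xid, TransactionId xid)
{
    if (xid_follows_or_equals(xid, *next_xid))
    {
        *next_xid = xid + 1;
        /* Skip the reserved XIDs if the counter wrapped around. */
        if (*next_xid < FIRST_NORMAL_XID)
            *next_xid = FIRST_NORMAL_XID;
    }
}

/*
 * Hypothetical redo step for one COMMIT or ABORT record: consider the
 * parent XID and every subtransaction XID listed inside the record,
 * not just the XID in the record header.
 */
static void
redo_advance_next_xid(TransactionId *next_xid,
                      TransactionId parent_xid,
                      const TransactionId *sub_xids,
                      size_t nsub)
{
    size_t i;

    assert(sub_xids != NULL || nsub == 0);

    advance_next_xid_past(next_xid, parent_xid);
    for (i = 0; i < nsub; i++)
        advance_next_xid_past(next_xid, sub_xids[i]);
}
```

With the counter at 201109 and a record listing subXIDs 201110, 201112 and
201111, this leaves the counter at 201113 regardless of the order in which
the subXIDs appear in the record.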

But thinking about it, I realized that we have some other issues in the
same area.  Because subxact commit sets clog bits but emits no WAL
record, it's at least theoretically possible that post-crash there will
be written-out clog bits for XIDs ahead of every XID of which there is
any record in the WAL data.  RecordTransactionCommit and friends have
other cases in which they think it's sufficient to write a clog entry
and no WAL entry.  Perhaps that's broken, but I think the cleanest fix
is that the clog code ought to forcibly zero all clog entries ahead of
whatever nextXID is settled on by WAL replay.  Otherwise we run some
risk of subtransactions that are still running looking like they are
subcommitted (or worse) in the clog data.

This is already true at the page level: when advancing into a new page
we zero it instead of reading anything from disk.  I am thinking of
adding code to StartupCLOG to zero the remaining portion of the
"current" page too.
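
Roughly what I have in mind, as a standalone sketch and not the real
StartupCLOG code.  It assumes the usual clog layout of two status bits per
transaction, four transactions per byte, and 8192-byte pages, with lower
XIDs in the lower-order bits of each byte; the names zero_clog_page_from
and clog_get_status are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef uint32_t TransactionId;

/* Assumed layout: 2 status bits per XID, 4 XIDs per byte, 8K pages. */
#define CLOG_BLCKSZ          8192
#define CLOG_BITS_PER_XACT   2
#define CLOG_XACTS_PER_BYTE  4
#define CLOG_XACTS_PER_PAGE  (CLOG_BLCKSZ * CLOG_XACTS_PER_BYTE)

/* Read the 2-bit status of xid from the page that contains it. */
static unsigned
clog_get_status(const unsigned char *page, TransactionId xid)
{
    uint32_t slot = xid % CLOG_XACTS_PER_PAGE;
    uint32_t byteno = slot / CLOG_XACTS_PER_BYTE;
    uint32_t bshift = (slot % CLOG_XACTS_PER_BYTE) * CLOG_BITS_PER_XACT;

    return (page[byteno] >> bshift) & 0x3;
}

/*
 * Zero the status bits of next_xid and of every later XID on the same
 * page, leaving the entries for earlier XIDs untouched.  "page" must be
 * the in-memory image of the clog page containing next_xid.
 */
static void
zero_clog_page_from(unsigned char *page, TransactionId next_xid)
{
    uint32_t slot = next_xid % CLOG_XACTS_PER_PAGE;
    uint32_t byteno = slot / CLOG_XACTS_PER_BYTE;
    uint32_t bshift = (slot % CLOG_XACTS_PER_BYTE) * CLOG_BITS_PER_XACT;

    assert(page != NULL);

    /* Keep the low-order bits (earlier XIDs); clear the rest of the byte. */
    page[byteno] &= (unsigned char) ((1u << bshift) - 1);

    /* Zero every byte after the partially cleared one. */
    memset(page + byteno + 1, 0, CLOG_BLCKSZ - byteno - 1);
}
```

If next_xid happens to be the first XID of a page, this clears the whole
page, which matches what already happens when we advance into a new page.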

Thoughts?
        regards, tom lane
