Re: Two-phase commit issues - Mailing list pgsql-hackers

From Heikki Linnakangas
Subject Re: Two-phase commit issues
Date
Msg-id Pine.OSF.4.61.0505190907400.219440@kosh.hut.fi
In response to Two-phase commit issues  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Two-phase commit issues
List pgsql-hackers

On Wed, 18 May 2005, Tom Lane wrote:

> * The major missing issue that I've come across so far is that
> subtransaction and multixact state isn't preserved across a crash.
> Assuming that we want to store only top-level XIDs in the shared-memory
> list of prepared XIDs (which I think is important), it is essential that
> crash restart rebuild the pg_subxact status for prepared transactions.
> The subxacts of a prepared xact have to be seen as still running, and
> they won't be unless the subxact links are there.  Since subxact.c is
> designed to wipe all its state on restart, we need to recreate those
> entries.  Fortunately this doesn't seem hard: the state file for a
> prepared xact will include all of its subxact XIDs, and we can just
> do SubTransSetParent() on them while rereading the state file.  (AFAICS
> it's sufficient to make each subxact link directly to the top XID, even
> if there was a more complex hierarchy originally.)  Similarly, we've got
> to reconstruct MultiXactIds that any prepared xacts are members of, else
> row-level locks taken out by prepared xacts won't be enforced correctly.
> I think this can be handled if we add to the state files a list of all
> MultiXactIds that each prepared xact belongs to, and then during restart
> forcibly recreate those MultiXactIds.  (They would only be rebuilt with
> prepared XIDs, not any ordinary XIDs that might originally have been
> members.)  This seems to require some new code in multixact.c, but not
> anything fundamentally difficult --- Alvaro, do you see any likely
> problems in this stuff?

The subtransaction part is in fact there already, and it's done just like 
you described. RecoverPreparedTransactions function reads the subxids from 
the state file and calls SubTransSetParent for them.
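
To illustrate the flattening described above, here is a toy Python model. It is a sketch, not the PostgreSQL code: the dict `subtrans` stands in for pg_subtrans, and `recover_prepared`, `top_xid_of`, `is_still_running` and the state-file layout are names made up for this example. Only the idea of linking every subxid directly to the top XID comes from the discussion.

```python
# Toy model: rebuild parent links for the subxacts of prepared transactions.
# All names are hypothetical; this is not PostgreSQL's implementation.

def sub_trans_set_parent(subtrans, xid, parent):
    """Record that `xid` is a child of `parent` (stand-in for SubTransSetParent)."""
    subtrans[xid] = parent


def top_xid_of(subtrans, xid):
    """Follow parent links until reaching an XID that has no parent."""
    while xid in subtrans:
        xid = subtrans[xid]
    return xid


def recover_prepared(state_files):
    """Rebuild the links after a restart that wiped the in-memory state."""
    subtrans = {}
    prepared_top_xids = set()
    for state in state_files:
        prepared_top_xids.add(state["xid"])
        for subxid in state["subxids"]:
            # Link straight to the top XID, ignoring the original hierarchy.
            sub_trans_set_parent(subtrans, subxid, state["xid"])
    return subtrans, prepared_top_xids


def is_still_running(subtrans, prepared_top_xids, xid):
    """A subxact counts as running if its top-level XID is still prepared."""
    return top_xid_of(subtrans, xid) in prepared_top_xids


# Assumed state-file contents: one prepared xact with three subxacts.
state_files = [{"xid": 100, "subxids": [101, 102, 103]}]
subtrans, prepared = recover_prepared(state_files)
```

Because every link points at the top XID, a lookup takes a single hop no matter how deep the original nesting was.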

As Alvaro pointed out elsewhere, the multixacts are harder because a 
backend doesn't know which multixactids it belongs to. AFAICS, the most 
straightforward solution is to xlog every CreateMultixact call, so that 
the multixact slru files can be completely reconstructed on recovery.
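
A minimal sketch of that idea in Python, assuming a simple in-memory list as the log. `MultiXactStore`, `replay` and the record layout are invented for the example and do not correspond to anything in multixact.c.

```python
# Toy model: log every multixact creation so the mapping can be replayed.
# Hypothetical names throughout; not PostgreSQL's multixact.c.

class MultiXactStore:
    def __init__(self):
        self.members = {}   # multixact id -> tuple of member XIDs
        self.next_id = 1

    def create(self, xids, wal):
        mxid = self.next_id
        self.next_id += 1
        members = tuple(sorted(xids))
        # Log before the mapping is used, so replay can always reproduce it.
        wal.append(("multixact_create", mxid, members))
        self.members[mxid] = members
        return mxid


def replay(wal):
    """Rebuild the store from scratch using only the log records."""
    store = MultiXactStore()
    for record in wal:
        if record[0] == "multixact_create":
            _, mxid, members = record
            store.members[mxid] = members
            store.next_id = max(store.next_id, mxid + 1)
    return store


wal = []
live = MultiXactStore()
first = live.create({100, 205}, wal)
second = live.create({100, 205, 310}, wal)
recovered = replay(wal)
```

Here no backend has to remember which multixacts it is a member of; the log alone is enough to rebuild the mapping.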

> * The patch is designed to dump state files into WAL as well as onto
> disk.  Why?  Wouldn't it be better just to write and fsync the state
> file before reporting successful prepare?  That would get rid of the
> need for checkpoint-time fsyncs.

Performance and correctness. There mustn't be a valid state file on 
disk before the WAL entries of that transaction are on disk. Otherwise, 
recovery might recover a transaction that in fact aborted right after 
it wrote the state file.

If we fsync the WAL prepare record first, and the state file second, a 
crash in between would make it impossible to recover the transaction even 
though the WAL says it's prepared.

WAL logging the state file completely saves us one fsync. The state files 
are usually small, say < 1 kb, so the tradeoff to write it twice and save 
one fsync is probably well worth it.
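
The trade-off can be sketched with a toy Python model. `Node`, its fields and the record layout are assumptions made for this example, not the patch's data structures. The prepare record carries the full state file contents, so a single WAL flush makes the prepare durable and recovery can recreate the state file from WAL alone.

```python
# Toy model: the prepare record in WAL carries the whole state file, so one
# WAL flush makes the prepare durable. Hypothetical names; not the patch itself.

class Node:
    def __init__(self):
        self.wal = []            # all WAL records written so far
        self.wal_flushed = 0     # number of records known to be on disk
        self.state_files = {}    # gid -> contents, possibly not yet fsynced
        self.fsync_count = 0

    def prepare(self, gid, contents):
        self.wal.append(("prepare", gid, contents))
        self.wal_flushed = len(self.wal)   # the single fsync
        self.fsync_count += 1
        self.state_files[gid] = contents   # written lazily, no fsync here

    def crash(self):
        # Worst case: unflushed WAL and every un-fsynced state file are lost.
        del self.wal[self.wal_flushed:]
        self.state_files = {}

    def recover(self):
        # Replay the flushed WAL and recreate or remove state files.
        for kind, gid, contents in self.wal:
            if kind == "prepare":
                self.state_files[gid] = contents
            elif kind in ("commit_prepared", "rollback_prepared"):
                self.state_files.pop(gid, None)


node = Node()
node.prepare("gid-1", b"xid=100;subxids=101,102")
node.crash()
node.recover()
```

In this model a state file can only reappear after a crash if its prepare record reached disk, which is the ordering requirement stated above.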

Third, we have to cater for PITR. I haven't given it much thought, but if 
we want to do log shipping and PITR, I believe we must have everything in 
the WAL.

> * I'm inclined to think that the "gid" identifiers for prepared
> transactions ought to be SQL identifiers (names), not string literals.
> Was there a particular reason for making them strings?

Sure. No reason. While you're at it, do you think it's possible to make 
them unlimited in size? I couldn't think of a simple way.

> * What are we going to do with GUC variables?  My feeling is that
> the only sane answer is that PREPARE is the same as COMMIT as far as
> local GUC variables go, and COMMIT/ROLLBACK PREPARED have no effect
> on GUC state.  Otherwise it's really unclear what to do.  Consider
>     SET myvar = foo;
>     BEGIN;
>     SET myvar = bar;
>     PREPARE gid;
>     SHOW myvar;        -- what do you see ... foo or bar?
>     SET myvar = baz;    -- is this even legal?
>     ROLLBACK PREPARED gid;
>     SHOW myvar;        -- now what do you see ... foo or baz?
> Since local GUC changes aren't going to be saved/restored across a
> crash anyway, I can't see a point in doing anything really complex.
>
> * There are some fairly ugly cases associated with creation and deletion
> of temporary tables as well.  I think we might want to just decree that
> you can't PREPARE a transaction that included creating or dropping a
> temp table.  Does anyone have much of a problem with that?

I think the safest way to handle the GUC case as well is to just refuse to 
prepare a transaction that has changed local GUC variables.
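
A rough Python sketch of such a check, covering the temp table case in the same way. The `Transaction` class, `PrepareRefused` and `try_prepare` are hypothetical names for this illustration and say nothing about how the backend would actually track these changes.

```python
# Toy model: refuse PREPARE when the transaction touched session-local state.
# Hypothetical names; the real backend tracks these things differently.

class PrepareRefused(Exception):
    pass


class Transaction:
    def __init__(self):
        self.changed_gucs = set()     # names of GUC variables SET in this xact
        self.temp_table_ddl = False   # created or dropped a temp table?
        self.prepared_as = None

    def set_guc(self, name, value):
        # Only the fact that the variable changed matters for this check.
        self.changed_gucs.add(name)

    def prepare(self, gid):
        if self.changed_gucs:
            changed = ", ".join(sorted(self.changed_gucs))
            raise PrepareRefused("changed GUC variables: " + changed)
        if self.temp_table_ddl:
            raise PrepareRefused("created or dropped a temp table")
        self.prepared_as = gid


def try_prepare(txn, gid):
    """Return True if the prepare went through, False if it was refused."""
    try:
        txn.prepare(gid)
        return True
    except PrepareRefused:
        return False


clean = Transaction()
dirty = Transaction()
dirty.set_guc("myvar", "bar")
```

With this rule the SHOW myvar questions in the quoted example never come up, because the PREPARE in that sequence would be rejected.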

Another possibility is to rethink the contract of PREPARE TRANSACTION and 
COMMIT/ROLLBACK PREPARED. If PREPARE TRANSACTION put the backend into 
a state where you can't do anything other than COMMIT/ROLLBACK the prepared 
transaction, we could do more sensible things with GUC and temp tables. 
That would have complications of its own though. What would happen if 
another backend then tried to COMMIT/ROLLBACK the transaction the original 
backend is still tied to?

- Heikki

