Tom Lane wrote:
> At the API level, I like the PREPARE/COMMIT/ROLLBACK statements, but I
> think you have missed a bet in that it needs to be possible to issue
> "COMMIT PREPARED gid" for the same gid several times without error.
> Consider a scenario where the transaction monitor crashes during the
> commit phase. When it recovers, it will be aware that it had committed
> to commit, but it won't know which nodes were successfully committed.
> So it will need to resend the COMMIT commands. It would be bad for the
> nodes to simply say "yes boss" if they are told to COMMIT a gid they
> have no record of. So I think the gid's have to stick around after
> COMMIT PREPARED or ROLLBACK PREPARED, and there needs to be a fourth
> command (RELEASE PREPARED?) to actually remove the state data when the
> transaction monitor is satisfied that everything's done. RELEASE of
> an unknown gid is okay to be a no-op.

Isn't this usually where the GTM would issue "recover" requests to
determine the state of the individual resources involved in the global
transaction, and then only commit/abort the resources that need it? (I
think the equivalent in Heikki's work is a SELECT of the
pg_prepared_xact view)

I found the Berkeley DB distributed transaction docs quite useful for
working out how two-phase commit fits together:
http://pybsddb.sourceforge.net/ref/xa/intro.html
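
To make the "recover" flow concrete, here is a rough sketch in Python. It is only an in-memory model, not PostgreSQL or driver code: the Node class, its methods, and recover() are hypothetical names, and list_prepared() stands in for whatever the real recover request is (for example a SELECT on the prepared-transactions view). The point is that the TM asks each node which gids are still prepared and only commits those, so it never sends COMMIT for a gid the node no longer knows about.

```python
class Node:
    """Hypothetical stand-in for one resource manager (e.g. one backend)."""

    def __init__(self, name):
        self.name = name
        self.prepared = set()   # gids prepared but not yet resolved
        self.committed = []     # gids committed on this node, in order

    def prepare(self, gid):
        self.prepared.add(gid)

    def list_prepared(self):
        # Stands in for a "recover" request against this node.
        return set(self.prepared)

    def commit_prepared(self, gid):
        # A node refuses to commit a gid it has no record of.
        if gid not in self.prepared:
            raise LookupError(f"{self.name}: no prepared transaction {gid!r}")
        self.prepared.remove(gid)
        self.committed.append(gid)


def recover(nodes, decided_commit):
    """Re-drive commits after a TM crash.

    decided_commit is the set of gids that the TM's own log says it had
    decided to commit. Prepared gids outside that set are left alone here;
    deciding what to do with them is outside this sketch.
    Returns the (node name, gid) pairs committed during recovery.
    """
    resolved = []
    for node in nodes:
        for gid in sorted(node.list_prepared() & decided_commit):
            node.commit_prepared(gid)
            resolved.append((node.name, gid))
    return resolved


# Both nodes prepare tx1; the TM commits on node a, then "crashes".
a, b = Node("a"), Node("b")
for n in (a, b):
    n.prepare("tx1")
a.commit_prepared("tx1")

# After restart, only node b still lists tx1 as prepared.
result = recover([a, b], {"tx1"})
```

In this model node a is never asked to commit tx1 a second time, because it no longer reports tx1 as prepared.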
> I would be inclined to require GIDs to be numbers (probably int8's)
> instead of strings, so that we don't have any problems with funny
> characters in the file names. That's negotiable though, as we could
> certainly uuencode the strings or something to avoid that trap.

Aren't the GIDs generated externally by the GTM? We need more than an
int8 there. See for example Heikki's JDBC driver patch: it is given a
javax.transaction.xa.Xid by the TM in prepare/commit/etc. The Xid is
basically just an integer format ID plus a couple of raw byte arrays
(the global transaction ID and the branch qualifier). The driver
base64-encodes that into a string GID to give to the backend.
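
For illustration, a string GID could be built from an Xid's parts roughly as below. This is a sketch in Python and not the encoding the driver patch actually uses: the function names and the byte layout (4-byte format ID, 1-byte gtrid length, gtrid, bqual) are made up for the example. It uses the URL-safe base64 alphabet because standard base64 can emit "/", which would run into the file-name concern above.

```python
import base64
import struct

MAX_PART = 64  # XA limits gtrid and bqual to 64 bytes each


def xid_to_gid(format_id, gtrid, bqual):
    """Pack an Xid's three parts into one ASCII string (hypothetical layout)."""
    if len(gtrid) > MAX_PART or len(bqual) > MAX_PART:
        raise ValueError("gtrid and bqual must be at most 64 bytes each")
    # 4-byte signed format id, 1-byte gtrid length, then the raw bytes.
    raw = struct.pack(">iB", format_id, len(gtrid)) + gtrid + bqual
    return base64.urlsafe_b64encode(raw).decode("ascii")


def gid_to_xid(gid):
    """Reverse xid_to_gid, returning (format_id, gtrid, bqual)."""
    raw = base64.urlsafe_b64decode(gid)
    format_id, gtrid_len = struct.unpack(">iB", raw[:5])
    return format_id, raw[5:5 + gtrid_len], raw[5 + gtrid_len:]


# Round trip with arbitrary binary content in both byte arrays.
gid = xid_to_gid(1, b"\x00\xffglobal", b"branch\xfe")
roundtrip = gid_to_xid(gid)
```

Whatever the exact layout, the resulting GID is an opaque string well beyond what fits in an int8.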
-O