two-phase commit - Mailing list pgsql-patches

From Heikki Linnakangas
Subject two-phase commit
Date
Msg-id Pine.OSF.4.58.0310100948360.117068@kosh.hut.fi
Whole thread Raw
Responses Re: two-phase commit
Re: two-phase commit
List pgsql-patches
Hi,

I saw some discussion about two-hase commits on hackers-list and wanted
therefore to submit this patch for discussion now, although it's still a
bit of a mess, just to let you guys know that I'm working on this.

This patch adds three new commands:

PREPCOMMIT <global transaction id>

This performs the 1st phase. After issuing this command, the backend is
free to begin new transactions. The given global transaction id (gid) is
used later to do the 2nd phase commit with COMMITPREPARED.

COMMITPREPARED <global transaction id>

Performs the 2nd phase. This marks the transaction as committed, and
releases any locks held.

LISTPREPARED

Prints a list of all transactions in prepared state to the console. You
must currently start postmaster with -d 1 to see the output. It should of
course return a result set, but it not quite there yet.

Here's a simple way to try this out.

1. bin/postmaster -d 1
2. CREATE TABLE twophase (foobar int);
3. INSERT INTO twophase VALUES (1);
4. INSERT INTO twophase VALUES (2);

5. BEGIN; UPDATE twophase SET foobar = 10 WHERE foobar = 1;
6. PREPCOMMIT 'gid'

The transaction is now in prepared state. The backend is now free to do
something else.

7. SELECT * FROM foobar;
8. BEGIN; UPDATE twophase SET foobar = 11 WHERE foobar = 1;

This command should block, because the previous transaction is in prepared
state and it's still holding the lock.

9. in a new psql session: COMMITPREPARED 'gid';

The update started at line 8. should now return with UPDATE 0, as the
prepared transaction committed and took away the row with value 10.

You can also try killing the postmaster after PREPCOMMIT. When you restart
it and continue with line 8, it should still block. The locks held by the
prepared transactions persist over shutdown/restart.

Open issues:

1. No ABORTPREPARED yet. That's going to be trivial to add.

2. The only permanent information about any locks held by a prepared
transaction is currently only kept in xlog. There is a hack in place that
keeps the checkpointing routine from doing a checkpoint past the begin of
any prepared transactions, which would cause that information to be lost.
I guess a separate log like clog but for locks is needed.

3. Related to point 2, there is a race condition with the hack.
There is probably still a lot of other locking issues and bugs as well.

4. No support for notifications, DDL stuff etc. yet.

The complicated part was making the locks work right. When a
transaction is prepared with PREPCOMMIT, all locks acquired by the backend
are "offloaded" to another pseudo PGPROC-struct, and a record is also
written to xlog about each lock so that they can be re-acquired if the
server crashes before the transaction is committed. There is a new
function called LockPersistAll in lock.c that does that.

I started working on two-phase commit because I wanted to get myself
more familiar with postgresql internals and source code. I actually
thought that I would never get this far, but it wasn't that hard after
all.

This patch (after the open issues are solved) is enough for a Java
Transaction API compliant resource manager implementation in the JDBC
driver. That would allow Postgresql to be used together with a J2EE server
and message queue software etc. I haven't looked at the X/Open
specification yet.

There has also been discussion about using 2PC to implement synchronous
replication. To do that, you would need a global transaction manager in
addition to this patch, but I'm personally not interested in that.

I would very much appreciate any comments on this patch, and I will
continue working on this if you agree this is the right direction. I'll be
away for the weekend but I'll be back on Monday.

- Heikki

Attachment

pgsql-patches by date:

Previous
From: Alvaro Herrera
Date:
Subject: Re: [Fwd: Make psql use all pretty print options]
Next
From: Manuel Sugawara
Date:
Subject: Re: Spanish translations of pg_dump and pg_resetxlog