Thread: two-phase commit

two-phase commit

From
Heikki Linnakangas
Date:
Hi,

I saw some discussion about two-hase commits on hackers-list and wanted
therefore to submit this patch for discussion now, although it's still a
bit of a mess, just to let you guys know that I'm working on this.

This patch adds three new commands:

PREPCOMMIT <global transaction id>

This performs the 1st phase. After issuing this command, the backend is
free to begin new transactions. The given global transaction id (gid) is
used later to do the 2nd phase commit with COMMITPREPARED.

COMMITPREPARED <global transaction id>

Performs the 2nd phase. This marks the transaction as committed, and
releases any locks held.

LISTPREPARED

Prints a list of all transactions in prepared state to the console. You
must currently start postmaster with -d 1 to see the output. It should of
course return a result set, but it not quite there yet.

Here's a simple way to try this out.

1. bin/postmaster -d 1
2. CREATE TABLE twophase (foobar int);
3. INSERT INTO twophase VALUES (1);
4. INSERT INTO twophase VALUES (2);

5. BEGIN; UPDATE twophase SET foobar = 10 WHERE foobar = 1;
6. PREPCOMMIT 'gid'

The transaction is now in prepared state. The backend is now free to do
something else.

7. SELECT * FROM foobar;
8. BEGIN; UPDATE twophase SET foobar = 11 WHERE foobar = 1;

This command should block, because the previous transaction is in prepared
state and it's still holding the lock.

9. in a new psql session: COMMITPREPARED 'gid';

The update started at line 8. should now return with UPDATE 0, as the
prepared transaction committed and took away the row with value 10.

You can also try killing the postmaster after PREPCOMMIT. When you restart
it and continue with line 8, it should still block. The locks held by the
prepared transactions persist over shutdown/restart.

Open issues:

1. No ABORTPREPARED yet. That's going to be trivial to add.

2. The only permanent information about any locks held by a prepared
transaction is currently only kept in xlog. There is a hack in place that
keeps the checkpointing routine from doing a checkpoint past the begin of
any prepared transactions, which would cause that information to be lost.
I guess a separate log like clog but for locks is needed.

3. Related to point 2, there is a race condition with the hack.
There is probably still a lot of other locking issues and bugs as well.

4. No support for notifications, DDL stuff etc. yet.

The complicated part was making the locks work right. When a
transaction is prepared with PREPCOMMIT, all locks acquired by the backend
are "offloaded" to another pseudo PGPROC-struct, and a record is also
written to xlog about each lock so that they can be re-acquired if the
server crashes before the transaction is committed. There is a new
function called LockPersistAll in lock.c that does that.

I started working on two-phase commit because I wanted to get myself
more familiar with postgresql internals and source code. I actually
thought that I would never get this far, but it wasn't that hard after
all.

This patch (after the open issues are solved) is enough for a Java
Transaction API compliant resource manager implementation in the JDBC
driver. That would allow Postgresql to be used together with a J2EE server
and message queue software etc. I haven't looked at the X/Open
specification yet.

There has also been discussion about using 2PC to implement synchronous
replication. To do that, you would need a global transaction manager in
addition to this patch, but I'm personally not interested in that.

I would very much appreciate any comments on this patch, and I will
continue working on this if you agree this is the right direction. I'll be
away for the weekend but I'll be back on Monday.

- Heikki

Attachment

Re: two-phase commit

From
Heikki Linnakangas
Date:
Here is the second version of my two phase commit patch. Please see
the discussion at the hackers-list.

- Heikki

Attachment

Re: two-phase commit

From
Bruce Momjian
Date:
I assume you have looked at Satoshi Nagayasu's work:

    http://snaga.org/pgsql/

If not, would you work with him on something we can add to 7.5 when
development starts?

---------------------------------------------------------------------------

Heikki Linnakangas wrote:
> Hi,
>
> I saw some discussion about two-hase commits on hackers-list and wanted
> therefore to submit this patch for discussion now, although it's still a
> bit of a mess, just to let you guys know that I'm working on this.
>
> This patch adds three new commands:
>
> PREPCOMMIT <global transaction id>
>
> This performs the 1st phase. After issuing this command, the backend is
> free to begin new transactions. The given global transaction id (gid) is
> used later to do the 2nd phase commit with COMMITPREPARED.
>
> COMMITPREPARED <global transaction id>
>
> Performs the 2nd phase. This marks the transaction as committed, and
> releases any locks held.
>
> LISTPREPARED
>
> Prints a list of all transactions in prepared state to the console. You
> must currently start postmaster with -d 1 to see the output. It should of
> course return a result set, but it not quite there yet.
>
> Here's a simple way to try this out.
>
> 1. bin/postmaster -d 1
> 2. CREATE TABLE twophase (foobar int);
> 3. INSERT INTO twophase VALUES (1);
> 4. INSERT INTO twophase VALUES (2);
>
> 5. BEGIN; UPDATE twophase SET foobar = 10 WHERE foobar = 1;
> 6. PREPCOMMIT 'gid'
>
> The transaction is now in prepared state. The backend is now free to do
> something else.
>
> 7. SELECT * FROM foobar;
> 8. BEGIN; UPDATE twophase SET foobar = 11 WHERE foobar = 1;
>
> This command should block, because the previous transaction is in prepared
> state and it's still holding the lock.
>
> 9. in a new psql session: COMMITPREPARED 'gid';
>
> The update started at line 8. should now return with UPDATE 0, as the
> prepared transaction committed and took away the row with value 10.
>
> You can also try killing the postmaster after PREPCOMMIT. When you restart
> it and continue with line 8, it should still block. The locks held by the
> prepared transactions persist over shutdown/restart.
>
> Open issues:
>
> 1. No ABORTPREPARED yet. That's going to be trivial to add.
>
> 2. The only permanent information about any locks held by a prepared
> transaction is currently only kept in xlog. There is a hack in place that
> keeps the checkpointing routine from doing a checkpoint past the begin of
> any prepared transactions, which would cause that information to be lost.
> I guess a separate log like clog but for locks is needed.
>
> 3. Related to point 2, there is a race condition with the hack.
> There is probably still a lot of other locking issues and bugs as well.
>
> 4. No support for notifications, DDL stuff etc. yet.
>
> The complicated part was making the locks work right. When a
> transaction is prepared with PREPCOMMIT, all locks acquired by the backend
> are "offloaded" to another pseudo PGPROC-struct, and a record is also
> written to xlog about each lock so that they can be re-acquired if the
> server crashes before the transaction is committed. There is a new
> function called LockPersistAll in lock.c that does that.
>
> I started working on two-phase commit because I wanted to get myself
> more familiar with postgresql internals and source code. I actually
> thought that I would never get this far, but it wasn't that hard after
> all.
>
> This patch (after the open issues are solved) is enough for a Java
> Transaction API compliant resource manager implementation in the JDBC
> driver. That would allow Postgresql to be used together with a J2EE server
> and message queue software etc. I haven't looked at the X/Open
> specification yet.
>
> There has also been discussion about using 2PC to implement synchronous
> replication. To do that, you would need a global transaction manager in
> addition to this patch, but I'm personally not interested in that.
>
> I would very much appreciate any comments on this patch, and I will
> continue working on this if you agree this is the right direction. I'll be
> away for the weekend but I'll be back on Monday.
>
> - Heikki

Content-Description:

[ Attachment, skipping... ]

>
> ---------------------------(end of broadcast)---------------------------
> TIP 9: the planner will ignore your desire to choose an index scan if your
>       joining column's datatypes do not match

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073