Re: Are you actively hacking on 2PC? - Mailing list pgsql-patches

From Heikki Linnakangas
Subject Re: Are you actively hacking on 2PC?
Msg-id Pine.OSF.4.61.0506102254360.181839@kosh.hut.fi
In response to Re: Are you actively hacking on 2PC?  (Alvaro Herrera <alvherre@surnet.cl>)
List pgsql-patches
On Wed, 8 Jun 2005, Alvaro Herrera wrote:

> Additionally I collected some to-do entries, maybe you find it useful.

Good list overall. Comments to some entries below:

> * Clean up the callback support.

Yeah. Possibly remove the callback abstraction altogether and wire the
calls to the different functions directly into a switch statement or
similar. I expected there to be more functions in the callback tables,
but it didn't turn out that way.
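To make the suggestion concrete, here is a minimal sketch contrasting the two styles. The resource-manager ids, the callback signature and the counting callbacks are hypothetical illustrations, not code from the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical resource-manager ids stored in a prepared transaction's state. */
typedef enum
{
    TWOPHASE_RM_LOCK_ID,
    TWOPHASE_RM_INVAL_ID,
    TWOPHASE_RM_NOTIFY_ID
} TwoPhaseRmgrId;

typedef void (*TwoPhaseCallback) (unsigned int xid, const void *data);

/* Call counters so the dispatch can be observed. */
int lock_calls = 0;
int inval_calls = 0;
int notify_calls = 0;

void lock_postcommit(unsigned int xid, const void *data)   { (void) xid; (void) data; lock_calls++; }
void inval_postcommit(unsigned int xid, const void *data)  { (void) xid; (void) data; inval_calls++; }
void notify_postcommit(unsigned int xid, const void *data) { (void) xid; (void) data; notify_calls++; }

/* Before: a callback table indexed by resource-manager id. */
const TwoPhaseCallback postcommit_callbacks[] = {
    lock_postcommit,     /* TWOPHASE_RM_LOCK_ID */
    inval_postcommit,    /* TWOPHASE_RM_INVAL_ID */
    notify_postcommit    /* TWOPHASE_RM_NOTIFY_ID */
};

void dispatch_via_table(TwoPhaseRmgrId id, unsigned int xid, const void *data)
{
    postcommit_callbacks[id](xid, data);
}

/* After: the same calls wired directly into a switch statement. */
void dispatch_via_switch(TwoPhaseRmgrId id, unsigned int xid, const void *data)
{
    switch (id)
    {
        case TWOPHASE_RM_LOCK_ID:
            lock_postcommit(xid, data);
            break;
        case TWOPHASE_RM_INVAL_ID:
            inval_postcommit(xid, data);
            break;
        case TWOPHASE_RM_NOTIFY_ID:
            notify_postcommit(xid, data);
            break;
    }
}
```

With only one function per resource manager and phase, the switch is hardly longer than the table, and GCC's -Wswitch (part of -Wall) warns if a new enum value is added without a matching case.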

> * Is PREPARE TRANSACTION the best interface?  What about marking a transaction
>  for preparing when it begins?  Apparently the XA spec requires/prefers this.

I'd prefer not to mark the transaction at begin. I might not know it's
going to be a two-phase transaction until later on. At the very least, it
must still be possible to do a regular commit even if you marked the
transaction as two-phase at begin.

We'll have to see what the XA spec says about it.

BTW: while googling for the XA spec, I bumped into RFC 2371, Transaction
Internet Protocol (TIP), which seems to be a protocol for coordinating a
two-phase commit over TCP. It looks quite flexible; it should work either
way. I don't know how widely adopted it is; this is the first time I've
heard of it.

> * Check whether the TwoPhaseStateLock usage per GetPreparedTransactionXidList
>  is safe.  (In particular in CheckPointTwoPhase)

I don't see a problem, no other locks are held at the same time. Can you
elaborate?
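For illustration, assume GetPreparedTransactionXidList only copies the xids out while holding TwoPhaseStateLock and takes no other lock (the patch code isn't quoted here, so this is an assumption). The pattern then looks roughly like this sketch, with a pthread mutex standing in for the LWLock and all names hypothetical. Because only one lock is ever held, there is no lock ordering that could deadlock.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

#define MAX_PREPARED 8          /* hypothetical capacity */

typedef unsigned int TransactionId;

/* Hypothetical stand-in for the shared two-phase state. */
typedef struct
{
    pthread_mutex_t lock;       /* plays the role of TwoPhaseStateLock */
    int             nprepared;
    TransactionId   xids[MAX_PREPARED];
} TwoPhaseStateData;

/*
 * Copy the xids of all prepared transactions into a malloc'd array.
 * The mutex is the only lock taken, and it is released before returning,
 * so a caller (e.g. a checkpoint) can work on the copy without holding it.
 * Returns the number of xids, or -1 on allocation failure.
 */
int get_prepared_xid_list(TwoPhaseStateData *state, TransactionId **xids)
{
    int n;

    pthread_mutex_lock(&state->lock);
    n = state->nprepared;
    *xids = malloc((n > 0 ? (size_t) n : 1) * sizeof(TransactionId));
    if (*xids == NULL)
    {
        pthread_mutex_unlock(&state->lock);
        return -1;
    }
    for (int i = 0; i < n; i++)
        (*xids)[i] = state->xids[i];
    pthread_mutex_unlock(&state->lock);

    return n;
}
```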

> * Rethink about using both WAL and a state file.  (If we declared the
>  intent to persist a transaction when it begins, maybe we don't need
>  the state file at all.)

The purpose of the state file is twofold. First, the original backend
uses it to send a message, in a sense, to the backend that's later going
to finish the second phase of the commit. It contains the instructions for
doing the finishing.

Secondly, the pg_twophase directory and the set of valid state files in it
allows prepared transactions to be picked up by recovery after crash.
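A rough sketch of both uses, under loudly stated assumptions: each file is named after the xid in hex inside a directory standing in for pg_twophase, and the header holds only a hypothetical magic number and the xid. Recovery treats every file that passes those checks as a prepared transaction.

```c
#define _POSIX_C_SOURCE 200809L  /* POSIX.1-2008 interfaces: opendir, mkdtemp */
#include <assert.h>
#include <dirent.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef unsigned int TransactionId;

#define STATE_FILE_MAGIC 0x57F94531u   /* hypothetical */

/* Hypothetical minimal header; a real state file carries much more. */
typedef struct
{
    unsigned int  magic;
    TransactionId xid;
} StateFileHeader;

void state_file_path(char *buf, size_t len, const char *dir, TransactionId xid)
{
    snprintf(buf, len, "%s/%08X", dir, xid);
}

/* First phase: leave the finishing instructions behind in a file. */
int write_state_file(const char *dir, TransactionId xid)
{
    char path[512];
    StateFileHeader hdr = { STATE_FILE_MAGIC, xid };
    FILE *f;

    state_file_path(path, sizeof(path), dir, xid);
    f = fopen(path, "wb");
    if (f == NULL)
        return -1;
    if (fwrite(&hdr, sizeof(hdr), 1, f) != 1)
    {
        fclose(f);
        return -1;
    }
    /* A crash-safe version would also fsync() the file and the directory. */
    return fclose(f) == 0 ? 0 : -1;
}

/* Second phase done: the file is no longer needed. */
int remove_state_file(const char *dir, TransactionId xid)
{
    char path[512];

    state_file_path(path, sizeof(path), dir, xid);
    return remove(path);
}

/* Recovery: every valid state file left in dir is a prepared transaction. */
int recover_prepared(const char *dir, TransactionId *out, int max)
{
    DIR *d = opendir(dir);
    struct dirent *de;
    int n = 0;

    if (d == NULL)
        return -1;
    while (n < max && (de = readdir(d)) != NULL)
    {
        char path[512];
        unsigned long xid;
        StateFileHeader hdr;
        FILE *f;

        /* Only names of exactly eight hex digits are candidates. */
        if (strlen(de->d_name) != 8 || strspn(de->d_name, "0123456789ABCDEF") != 8)
            continue;
        xid = strtoul(de->d_name, NULL, 16);

        state_file_path(path, sizeof(path), dir, (TransactionId) xid);
        f = fopen(path, "rb");
        if (f == NULL)
            continue;
        /* A torn or foreign file fails these checks and is skipped. */
        if (fread(&hdr, sizeof(hdr), 1, f) == 1 &&
            hdr.magic == STATE_FILE_MAGIC && hdr.xid == (TransactionId) xid)
            out[n++] = (TransactionId) xid;
        fclose(f);
    }
    closedir(d);
    return n;
}
```

Creating and removing one such file per transaction is exactly the per-transaction file system cost mentioned further down.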

The first need could be covered with a shared memory structure, but
shared memory is fixed-size, so it can't hold an arbitrary amount of state.
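A sketch of the shared memory alternative for the first need shows the limitation. The slot count, slot size and names are hypothetical: once every preallocated slot is taken, or the instructions don't fit in one slot, there is nowhere to put them.

```c
#include <assert.h>
#include <string.h>

typedef unsigned int TransactionId;

/* Hypothetical limits: the segment is sized once, at server start. */
#define MAX_PREPARED_XACTS 4
#define STATE_DATA_MAX     64

typedef struct
{
    int           in_use;
    TransactionId xid;
    size_t        len;
    char          data[STATE_DATA_MAX];  /* the "finishing instructions" */
} PreparedSlot;

/* Stand-in for a shared memory array; it can never grow. */
PreparedSlot prepared_slots[MAX_PREPARED_XACTS];

/*
 * Hand off the finishing instructions for xid. Returns the slot index,
 * or -1 if the data doesn't fit in a slot or every slot is in use.
 */
int store_in_shmem(TransactionId xid, const void *data, size_t len)
{
    if (len > STATE_DATA_MAX)
        return -1;
    for (int i = 0; i < MAX_PREPARED_XACTS; i++)
    {
        if (!prepared_slots[i].in_use)
        {
            prepared_slots[i].in_use = 1;
            prepared_slots[i].xid = xid;
            prepared_slots[i].len = len;
            memcpy(prepared_slots[i].data, data, len);
            return i;
        }
    }
    return -1;
}
```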

I don't know of any good alternatives for the second need. The information
needs to be retained across checkpoints, so the WAL alone is not enough.

I wouldn't worry about the overhead of writing the same data twice; the
state files are typically around 300 bytes in the current implementation.
And they could probably be cut down even more with some thought, for
example by making the gid variable-size.
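As an illustration of the variable-size gid idea, here is a hypothetical header layout using a C99 flexible array member. The 200-byte maximum (GIDSIZE) and the field names are assumptions for the sketch, not the patch's actual layout; the point is only that a short gid no longer pays for the full fixed buffer.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef unsigned int TransactionId;

#define GIDSIZE 200   /* assumed maximum gid length, including the NUL */

/* Fixed layout: every state file pays for the full GIDSIZE bytes. */
typedef struct
{
    TransactionId xid;
    char          gid[GIDSIZE];
} FixedGidHeader;

/* Variable layout: store the length, then only the bytes actually used. */
typedef struct
{
    TransactionId  xid;
    unsigned short gidlen;    /* strlen(gid) + 1 */
    char           gid[];     /* C99 flexible array member */
} VarGidHeader;

size_t var_header_size(const char *gid)
{
    return sizeof(VarGidHeader) + strlen(gid) + 1;
}

/* Returns NULL if the gid is too long or allocation fails; caller frees. */
VarGidHeader *make_var_header(TransactionId xid, const char *gid)
{
    size_t len = strlen(gid) + 1;
    VarGidHeader *h;

    if (len > GIDSIZE)
        return NULL;
    h = malloc(sizeof(VarGidHeader) + len);
    if (h == NULL)
        return NULL;
    h->xid = xid;
    h->gidlen = (unsigned short) len;
    memcpy(h->gid, gid, len);
    return h;
}
```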

I'm a bit bothered by the fact that we have to create and delete a file
for each transaction, though. That could be expensive on some file
systems.

I don't see how it would help to declare the intent to persist a
transaction at begin.

> * Why is the new code in GetOldestXmin dependent on allDbs?  That seems bogus.

It's a bit accidental. GetOldestXmin is used in three places:

1. It determines the safe truncation point for the subtrans files.
2. It determines the vacuum cut-off point; tuples older than that can be
removed.
3. Similar to 2, it's used in index building to determine which tuples
are dead.

Case 1 needs to take prepared transactions into account. Cases 2 & 3
don't, because a prepared transaction can't read any tuples anymore.

Case 1 happens to call GetOldestXmin with allDbs set to true, so I just
wired the logic to that parameter. It should be cleaned up.
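One way the cleanup could look is to give the function an explicit flag for prepared transactions instead of tying them to allDbs. This is a hypothetical sketch, not the real GetOldestXmin: the per-backend array, the include_prepared flag and the simplified wraparound-aware comparison (modelled loosely on TransactionIdPrecedes, but ignoring the special permanent xids) are all assumptions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;

#define InvalidTransactionId ((TransactionId) 0)

/* Simplified modulo-2^32 comparison: does a come before b? */
bool xid_precedes(TransactionId a, TransactionId b)
{
    return (int32_t) (a - b) < 0;
}

/* Hypothetical per-backend entry. */
typedef struct
{
    TransactionId xmin;
    int           dbid;
} ProcEntry;

/*
 * Oldest xmin among running backends (optionally restricted to my_dbid),
 * optionally also counting prepared transactions. The two choices are
 * independent parameters instead of one being implied by the other.
 */
TransactionId oldest_xmin(TransactionId next_xid,
                          const ProcEntry *procs, int nprocs,
                          const TransactionId *prepared, int nprepared,
                          int my_dbid, bool all_dbs, bool include_prepared)
{
    TransactionId result = next_xid;

    for (int i = 0; i < nprocs; i++)
    {
        if (!all_dbs && procs[i].dbid != my_dbid)
            continue;
        if (procs[i].xmin != InvalidTransactionId &&
            xid_precedes(procs[i].xmin, result))
            result = procs[i].xmin;
    }

    /* Subtrans truncation (case 1) needs these; vacuum and index builds don't. */
    if (include_prepared)
        for (int i = 0; i < nprepared; i++)
            if (xid_precedes(prepared[i], result))
                result = prepared[i];

    return result;
}
```

With this split, the subtrans truncation caller would pass include_prepared = true, while vacuum and index builds would pass false regardless of all_dbs.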

- Heikki
