Re: Are you actively hacking on 2PC? - Mailing list pgsql-patches
From | Heikki Linnakangas |
---|---|
Subject | Re: Are you actively hacking on 2PC? |
Date | |
Msg-id | Pine.OSF.4.61.0506102254360.181839@kosh.hut.fi |
In response to | Re: Are you actively hacking on 2PC? (Alvaro Herrera <alvherre@surnet.cl>) |
List | pgsql-patches |
On Wed, 8 Jun 2005, Alvaro Herrera wrote:

> Additionally I collected some to-do entries, maybe you find it useful.

Good list overall. Comments on some entries below:

> * Clean up the callback support.

Yeah. Possibly remove the callback abstraction altogether and wire the calls to the different functions directly in a switch statement etc. I expected there to be more functions in the callback tables, but it didn't turn out that way.

> * Is PREPARE TRANSACTION the best interface? What about marking a transaction
>   for preparing when it begins? Apparently the XA spec requires/prefers this.

I'd prefer not to mark the transaction on begin. I might not know it's going to be a two-phase transaction until later on. At the very least, it must still be possible to do a regular commit even if you mark the transaction as two-phase in the begin. We'll have to see what the XA spec says about it.

BTW: while googling for the XA spec, I bumped into RFC2371 - Transaction Internet Protocol, which seems to be a protocol for coordinating a two-phase commit over TCP. It looks quite flexible; it should work either way. I don't know how widely adopted it is, this is the first time I've heard of it.

> * Check whether the TwoPhaseStateLock usage per GetPreparedTransactionXidList
>   is safe. (In particular in CheckPointTwoPhase)

I don't see a problem, no other locks are held at the same time. Can you elaborate?

> * Rethink about using both WAL and a state file. (If we declared the
>   intent to persist a transaction when it begins, maybe we don't need
>   the state file at all.)

The purpose of the state file is two-fold. First, the original backend uses it to send a message, in a sense, to the backend that's later going to finish the second phase of the commit. It contains the instructions for doing the finishing. Secondly, the pg_twophase directory and the set of valid state files in it allow prepared transactions to be picked up by recovery after a crash.

The first need could be covered with a shared memory structure. But shared memory is fixed-size. I don't know of any good alternatives for the second need. The information needs to be retained over checkpoints, so the WAL alone is not enough.

I wouldn't worry about the overhead of writing the same data twice; the state files are typically around 300 bytes in the current implementation. And that could probably be cut down even more with some thought, by making the gid variable-size for example.

I'm a bit bothered by the fact that we have to create and delete a file for each transaction, though. That could be expensive on some file systems.

I don't see how it would help to declare the intent to persist a transaction at begin.

> * Why is the new code in GetOldestXmin dependent on allDbs? That seems bogus.

It's a bit accidental. GetOldestXmin is used in three places:

1. It determines the safe truncation point for the subtrans files.
2. It determines the vacuum cut-off point; tuples older than that can be removed.
3. Similar to 2, it's used in index building to determine which tuples are dead.

Case 1 needs to take prepared transactions into account. Cases 2 & 3 don't, because a prepared transaction can't read any tuples anymore. Case 1 happens to call GetOldestXmin with allDbs set to true, so I just wired the logic to that parameter. It should be cleaned up.

- Heikki
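
One way the GetOldestXmin cleanup mentioned above might look is to make the caller's intent explicit instead of keying it on allDbs. The following is only a toy sketch of that idea, not the PostgreSQL function: the names Xid, min_xid, oldest_xmin and the include_prepared flag are all hypothetical, the transaction lists are passed in as plain arrays rather than read from shared memory, and XID wraparound is ignored.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy transaction id (assumption: plain integer order, no wraparound). */
typedef uint32_t Xid;

/* Hypothetical helper: return the smaller of `current` and every xid in the array. */
static Xid
min_xid(const Xid *xids, size_t n, Xid current)
{
    for (size_t i = 0; i < n; i++)
        if (xids[i] < current)
            current = xids[i];
    return current;
}

/*
 * Hypothetical stand-in for the cleaned-up interface. The caller says
 * explicitly whether prepared transactions must hold back the result:
 *   - subtrans truncation (case 1) passes include_prepared = true
 *   - vacuum and index build (cases 2 and 3) pass false
 * next_xid is returned when no transaction holds the result back.
 */
Xid
oldest_xmin(const Xid *running, size_t n_running,
            const Xid *prepared, size_t n_prepared,
            Xid next_xid, bool include_prepared)
{
    Xid result = min_xid(running, n_running, next_xid);

    if (include_prepared)
        result = min_xid(prepared, n_prepared, result);
    return result;
}
```

With running transactions 105 and 110 and one prepared transaction 90, the subtrans caller would get 90 while the vacuum caller would get 105, which is the distinction between case 1 and cases 2 & 3.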