Thread: In-doubt window
Some of you may remember that we discussed libpqxx's higher-level support for transactionality. It was pointed out (by Tom, IIRC) that the connection to the backend may be lost just after you've sent out the commit, while you're still waiting for a confirmation. So I built a class that expends some optional extra effort to reconnect after this type of failure, and find out whether the transaction was actually committed or not. It involves writing a record to a dedicated "transaction log" table inside the transaction, and deleting it right after the commit--or it could be the other way around: write the record before the transaction, then delete it inside the transaction. If the library ever needs to reconnect to find out if the transaction went through, it looks up this record and its mere existence tells it what it needs to know. A long time after, somebody else pointed out a remaining hole in this approach: what if your transaction took so long that it's still doing its commit when you reconnect, and so the state you see on reconnect suggests that the transaction was aborted--when in reality it may still complete successfully? The only things I could think of that didn't require new database features were ways to slightly improve the odds: wait a while before reconnecting; ensure that constraints are checked before sending the commit; only resolve the in-doubt situation if the answer is "yes, there was definitely a commit." But now my bright friend and colleague German Rodriguez has come up with an idea that might just work. It involves locking, which I don't know much about so I'd be interested to hear from people with a better grasp of how locking works and what's possible in the postgres implementation. The proposed scheme, based on German's idea, goes as follows: 1. create a log record 2. begin transaction 3. lock record 4. delete record 5. perform "payload" transaction 6. commit--let's say this fails! 7. reconnect 8. check if log record exists:- if no, the transaction was successfully committed- if yes, and we can read it: transactionwas aborted- let's say yes, but it's still locked: transaction still in progress 9. poll for a change in the record's situation For extra points, I might even use a trigger to reduce the need for polling (though I don't think that's a particularly good idea since it won't help the abort case and sacrifices performance of the normal case) What do you guys think--could this work? TIA, Jeroen
"Jeroen T. Vermeulen" <jtv@xs4all.nl> writes: > Some of you may remember that we discussed libpqxx's higher-level > support for transactionality. > [complicated scheme] Another way you can look to see if a transaction has completed is to see if there is still an ExclusiveLock record for it in pg_locks. For this you need to know the transaction's internal XID, but you can get that from the XMAX of the log record it deleted. In other words: insert log record;begin;delete log record;<<payload work>>commit; On reconnect: log record not present: it committedelse, read XMAX from log recordzero -> failed before the delete :-(not zero -> look formatch in pg_locksfound -> transaction still in progress, waitnot found -> it aborted This is all pretty messy though. Seems like it would be a whole lot easier if the backend exposed ways to (a) obtain your current transaction number and (b) inquire directly about the commit status of a transaction. However, with the present transaction infrastructure (b) would only work for a limited time window after the original transaction (until the corresponding clog space got recycled). So that might or might not be good enough for your purposes. regards, tom lane
On Mon, Oct 20, 2003 at 12:18:44PM -0400, Tom Lane wrote: > Another way you can look to see if a transaction has completed is to see > if there is still an ExclusiveLock record for it in pg_locks. For this > you need to know the transaction's internal XID, but you can get that > from the XMAX of the log record it deleted. In other words: Does this mean there's already a record in the database that (exists && is locked) if and only if the transaction is still in progress? Dirty implementation details aside, that would be simple enough--sounds like just what I need. If only I gain the ability to see if the transaction is still in progress, I can just patch that into my existing implementation to close the hole. Anything else is optimization! BTW, I guess I'd need to either create or update the record inside the main transaction to get a useful XID on it, right? > log record not present: it committed > else, read XMAX from log record > zero -> failed before the delete :-( > not zero -> look for match in pg_locks > found -> transaction still in progress, wait > not found -> it aborted It may be very postgres-specific but it's not really all that messy. At least not if I structure my code right. C++ exceptions make it very easy to "drop out of the algorithm" at any stage to convey the message that the transaction failed. Which isn't even bad news under the circumstances; the only bad news is no news. And something I really like about your algorithm is that there's no inconclusive exit. That's the important thing. > This is all pretty messy though. Seems like it would be a whole lot > easier if the backend exposed ways to (a) obtain your current > transaction number and (b) inquire directly about the commit status of > a transaction. However, with the present transaction infrastructure > (b) would only work for a limited time window after the original > transaction (until the corresponding clog space got recycled). So that > might or might not be good enough for your purposes. We discussed that with Bruce at the FOSDEM last February... But if I can do it without API and FE/BE protocol changes, barring any major performance difference in the normal case, that's obviously better. Might be a useful test vehicle as well. If we end up with something that works but only leaves some garbage collection task that the client isn't really the right place for, there's no need to make highly specific changes to the core just to support one algorithm. Something like the "active notification" Bruce came up with recently might do the trick. Or maybe we'd want a "kill" feature similar to the Unix command to force an abort onto an ongoing transaction. That way we could force Schroedinger's cat to be either dead or alive, so to speak. Thanks for your help BTW. Jeroen
"Jeroen T. Vermeulen" <jtv@xs4all.nl> writes: > On Mon, Oct 20, 2003 at 12:18:44PM -0400, Tom Lane wrote: >> Another way you can look to see if a transaction has completed is to see >> if there is still an ExclusiveLock record for it in pg_locks. For this >> you need to know the transaction's internal XID, but you can get that >> from the XMAX of the log record it deleted. In other words: > Does this mean there's already a record in the database that (exists && > is locked) if and only if the transaction is still in progress? It's not actually a record in the database ... pg_locks is a view of internal lock-manager state. (It's new as of 7.3, BTW; dunno how far back you want your stuff to work.) But it will get the job done for you. regards, tom lane
On Mon, Oct 20, 2003 at 01:35:13PM -0400, Tom Lane wrote: > > It's not actually a record in the database ... pg_locks is a view of > internal lock-manager state. (It's new as of 7.3, BTW; dunno how far > back you want your stuff to work.) But it will get the job done for > you. Sure beats what I have now--I can even compile it in conditionally or check for it at runtime. One question though: how likely is the code to require changes for future postgres versions? Jeroen
"Jeroen T. Vermeulen" <jtv@xs4all.nl> writes: > One question though: how likely is the code to require changes for > future postgres versions? Impossible to say ... I wouldn't foresee any near-term changes, though. Whether the other approach would be any more future-proof is not clear either. regards, tom lane
On Mon, Oct 20, 2003 at 03:39:31PM -0400, Tom Lane wrote: > > Impossible to say ... I wouldn't foresee any near-term changes, though. > Whether the other approach would be any more future-proof is not clear > either. How standardized is row locking anyway? If I can just ensure that the record stays locked (or flagged in some other externally visible way) while the transaction is running, we have the problem licked. Jeroen
"Jeroen T. Vermeulen" <jtv@xs4all.nl> writes: > On Mon, Oct 20, 2003 at 03:39:31PM -0400, Tom Lane wrote: >> Whether the other approach would be any more future-proof is not clear >> either. > How standardized is row locking anyway? Not at all, which is one reason I think that you might be fooling yourself to imagine that a solution dependent on row locking is more future-proof than one that depends explicitly on some Postgres-specific behavior. The pg_locks approach is at least likely to be more obvious about it if it breaks ... regards, tom lane