Thread: In-doubt window

In-doubt window

From

"Jeroen T. Vermeulen"

Date:

20 October 2003, 10:36:15

Some of you may remember that we discussed libpqxx's higher-level
support for transactionality.

It was pointed out (by Tom, IIRC) that the connection to the backend
may be lost just after you've sent out the commit, while you're still
waiting for a confirmation. So I built a class that expends some
optional extra effort to reconnect after this type of failure, and find
out whether the transaction was actually committed or not. It involves
writing a record to a dedicated "transaction log" table inside the
transaction, and deleting it right after the commit--or it could be the
other way around: write the record before the transaction, then delete
it inside the transaction. If the library ever needs to reconnect to
find out if the transaction went through, it looks up this record and
its mere existence tells it what it needs to know.

A long time after, somebody else pointed out a remaining hole in this
approach: what if your transaction took so long that it's still doing
its commit when you reconnect, and so the state you see on reconnect
suggests that the transaction was aborted--when in reality it may still
complete successfully? The only things I could think of that didn't
require new database features were ways to slightly improve the odds:
wait a while before reconnecting; ensure that constraints are checked
before sending the commit; only resolve the in-doubt situation if the
answer is "yes, there was definitely a commit."
But now my bright friend and colleague German Rodriguez has come up
with an idea that might just work. It involves locking, which I don't
know much about so I'd be interested to hear from people with a better
grasp of how locking works and what's possible in the postgres
implementation.

The proposed scheme, based on German's idea, goes as follows:

1. create a log record
2. begin transaction
3. lock record
4. delete record
5. perform "payload" transaction
6. commit--let's say this fails!
7. reconnect
8. check if log record exists:- if no, the transaction was successfully committed- if yes, and we can read it:
transactionwas aborted- let's say yes, but it's still locked: transaction still in progress

9. poll for a change in the record's situation

For extra points, I might even use a trigger to reduce the need for
polling (though I don't think that's a particularly good idea since it
won't help the abort case and sacrifices performance of the normal case)

What do you guys think--could this work?

TIA,

Jeroen

Re: In-doubt window

From

Tom Lane

Date:

20 October 2003, 13:19:17

"Jeroen T. Vermeulen" <jtv@xs4all.nl> writes:
> Some of you may remember that we discussed libpqxx's higher-level
> support for transactionality.  

> [complicated scheme]

Another way you can look to see if a transaction has completed is to see
if there is still an ExclusiveLock record for it in pg_locks.  For this
you need to know the transaction's internal XID, but you can get that
from the XMAX of the log record it deleted.  In other words:
insert log record;begin;delete log record;<<payload work>>commit;

On reconnect:
log record not present: it committedelse, read XMAX from log recordzero -> failed before the delete :-(not zero -> look
formatch in pg_locksfound -> transaction still in progress, waitnot found -> it aborted
 

This is all pretty messy though.  Seems like it would be a whole lot
easier if the backend exposed ways to (a) obtain your current
transaction number and (b) inquire directly about the commit status of
a transaction.  However, with the present transaction infrastructure
(b) would only work for a limited time window after the original
transaction (until the corresponding clog space got recycled).  So that
might or might not be good enough for your purposes.
        regards, tom lane

Re: In-doubt window

From

"Jeroen T. Vermeulen"

Date:

20 October 2003, 14:20:01

On Mon, Oct 20, 2003 at 12:18:44PM -0400, Tom Lane wrote:

> Another way you can look to see if a transaction has completed is to see
> if there is still an ExclusiveLock record for it in pg_locks.  For this
> you need to know the transaction's internal XID, but you can get that
> from the XMAX of the log record it deleted.  In other words:

Does this mean there's already a record in the database that (exists &&
is locked) if and only if the transaction is still in progress?  Dirty
implementation details aside, that would be simple enough--sounds like
just what I need.

If only I gain the ability to see if the transaction is still in
progress, I can just patch that into my existing implementation to
close the hole.  Anything else is optimization!

BTW, I guess I'd need to either create or update the record inside the
main transaction to get a useful XID on it, right?

>     log record not present: it committed
>     else, read XMAX from log record
>     zero -> failed before the delete :-(
>     not zero -> look for match in pg_locks
>     found -> transaction still in progress, wait
>     not found -> it aborted
It may be very postgres-specific but it's not really all that messy.
At least not if I structure my code right.  C++ exceptions make it very
easy to "drop out of the algorithm" at any stage to convey the message
that the transaction failed.  Which isn't even bad news under the 
circumstances; the only bad news is no news.

And something I really like about your algorithm is that there's no
inconclusive exit.  That's the important thing.

> This is all pretty messy though.  Seems like it would be a whole lot
> easier if the backend exposed ways to (a) obtain your current
> transaction number and (b) inquire directly about the commit status of
> a transaction.  However, with the present transaction infrastructure
> (b) would only work for a limited time window after the original
> transaction (until the corresponding clog space got recycled).  So that
> might or might not be good enough for your purposes.

We discussed that with Bruce at the FOSDEM last February...  But if I
can do it without API and FE/BE protocol changes, barring any major
performance difference in the normal case, that's obviously better.

Might be a useful test vehicle as well.  If we end up with something
that works but only leaves some garbage collection task that the client
isn't really the right place for, there's no need to make highly
specific changes to the core just to support one algorithm.  Something
like the "active notification" Bruce came up with recently might do the
trick.  Or maybe we'd want a "kill" feature similar to the Unix command
to force an abort onto an ongoing transaction.  That way we could force 
Schroedinger's cat to be either dead or alive, so to speak.

Thanks for your help BTW.

Jeroen

Re: In-doubt window

From

Tom Lane

Date:

20 October 2003, 14:37:20

"Jeroen T. Vermeulen" <jtv@xs4all.nl> writes:
> On Mon, Oct 20, 2003 at 12:18:44PM -0400, Tom Lane wrote:
>> Another way you can look to see if a transaction has completed is to see
>> if there is still an ExclusiveLock record for it in pg_locks.  For this
>> you need to know the transaction's internal XID, but you can get that
>> from the XMAX of the log record it deleted.  In other words:

> Does this mean there's already a record in the database that (exists &&
> is locked) if and only if the transaction is still in progress?

It's not actually a record in the database ... pg_locks is a view of
internal lock-manager state.  (It's new as of 7.3, BTW; dunno how far
back you want your stuff to work.)  But it will get the job done for
you.
        regards, tom lane

Re: In-doubt window

From

"Jeroen T. Vermeulen"

Date:

20 October 2003, 15:59:24

On Mon, Oct 20, 2003 at 01:35:13PM -0400, Tom Lane wrote:
> 
> It's not actually a record in the database ... pg_locks is a view of
> internal lock-manager state.  (It's new as of 7.3, BTW; dunno how far
> back you want your stuff to work.)  But it will get the job done for
> you.

Sure beats what I have now--I can even compile it in conditionally or
check for it at runtime.

One question though: how likely is the code to require changes for
future postgres versions?

Jeroen

Re: In-doubt window

From

Tom Lane

Date:

20 October 2003, 16:40:04

"Jeroen T. Vermeulen" <jtv@xs4all.nl> writes:
> One question though: how likely is the code to require changes for
> future postgres versions?

Impossible to say ... I wouldn't foresee any near-term changes, though.
Whether the other approach would be any more future-proof is not clear
either.
        regards, tom lane

Re: In-doubt window

From

"Jeroen T. Vermeulen"

Date:

20 October 2003, 16:44:48

On Mon, Oct 20, 2003 at 03:39:31PM -0400, Tom Lane wrote:
> 
> Impossible to say ... I wouldn't foresee any near-term changes, though.
> Whether the other approach would be any more future-proof is not clear
> either.

How standardized is row locking anyway?  If I can just ensure that the
record stays locked (or flagged in some other externally visible way) 
while the transaction is running, we have the problem licked.

Jeroen

Re: In-doubt window

From

Tom Lane

Date:

20 October 2003, 16:57:29

"Jeroen T. Vermeulen" <jtv@xs4all.nl> writes:
> On Mon, Oct 20, 2003 at 03:39:31PM -0400, Tom Lane wrote:
>> Whether the other approach would be any more future-proof is not clear
>> either.

> How standardized is row locking anyway?

Not at all, which is one reason I think that you might be fooling
yourself to imagine that a solution dependent on row locking is more
future-proof than one that depends explicitly on some Postgres-specific
behavior.  The pg_locks approach is at least likely to be more obvious
about it if it breaks ...
        regards, tom lane