Re: XA end then join fix for WebLogic - Mailing list pgsql-jdbc

From Heikki Linnakangas
Subject Re: XA end then join fix for WebLogic
Date
Msg-id 4551B0C5.3010708@enterprisedb.com
In response to Re: XA end then join fix for WebLogic  (Heikki Linnakangas <heikki@enterprisedb.com>)
Responses Re: XA end then join fix for WebLogic
List pgsql-jdbc
ludovic orban wrote:
> 2006/11/7, Heikki Linnakangas <heikki@enterprisedb.com>:
>> Well, if the server has the resources to do more concurrent work than
>> the pool size allows, just increase the pool. If it doesn't, then it
>> doesn't, no matter what you do.
>>
>> Did you try increasing the pool size instead of interleaving normal work
>> and commits?
>
> No, I didn't try with a different pool size. I just checked that with 10
> connections you could do 25% more work with interleaved transactions.
>
> I guess I could achieve the same performance by increasing the pool
> size, but wouldn't that add some overhead in memory utilization on the
> DB engine? I mean, you can do less with the same amount of resources
> when using XA compared to local transactions.

Well, it depends on the DBMS implementation. If the DBMS allocates a
significant amount of memory per connection, then yes. If it doesn't,
no. But you also have to take into account the cost of switching
transaction context in the DBMS, which might be expensive, again
depending on its internal architecture.

There isn't a generic answer, but I'd say that either way the overhead
is probably negligible compared to all the other CPU work and memory
used for shared buffers etc. What matters most is the total amount of
CPU and I/O workload you submit to the server, not how you submit it.

>> I guess it depends on the RM, but let's assume a typical client/server
>> scenario where one XAConnection has one TCP connection to the server.
>> The commit() will certainly have to submit the commit message to the
>> server somehow, and unless it uses carrier pigeons or some other
>> out-of-band channel, it has to use the TCP connection it has. If that
>> connection is busy, waiting for another query to finish, the commit will
>> have to wait.
>>
>> In fact, because the commit() call will have to wait for the current
>> statement to finish, that might lead to a longer response time.
>> Especially if you have a mix of long and short transactions.
>>
>> I just tried this with Firebird (1.5.3) and JayBird (2.1.0) that you did
>> your performance test with. It *does* seem to suffer from the scenario I
>> described. See attached test case.
>
> I disagree. The reason why commit blocks in your use case is because
> of transaction isolation (ie: update locks) in the DB engine and
> nothing else. You could reproduce this exact scenario with two local
> transactions.

You missed the point of the test case. It mimics the call pattern an RM
would see in an environment consisting of:

1. An application server
2. A TM that does 'on statement acquirement'
3. A connection pool of 1 connection
4. A transaction that executes: UPDATE foo SET bar = ?

If two of these transactions, let's call them A and B, arrive at the
application server roughly at the same time, you can get a deadlock with
this call pattern:

A: gets connection from pool
A: executes UPDATE
A: returns connection to pool, but doesn't commit yet
B: gets connection from pool
B: executes UPDATE. UPDATE blocks because it has to wait for transaction
A to finish.

Now when the TM tries to get a connection from the pool to commit A, it
can't, because the single connection in the pool is already in use. Or,
if it doesn't try to acquire a connection but just issues
XAResource.commit on the single connection, the commit call will block
because the underlying physical connection to the server is busy
waiting for B's UPDATE statement to finish.

In XATest.java, transaction A == neverfinishxid and transaction B ==
blockxid. The TM trying to issue the commit of A is the Committer thread.
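You can sketch the same call pattern without a database at all. In the following illustrative simulation (not real JTA/XA code, and none of these names come from the driver), the pool of one connection is modelled as a one-permit Semaphore and the row lock taken by A's UPDATE as a ReentrantLock that is only released at commit:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

// Illustrative simulation of the deadlock, not real JTA/XA code: the pool of
// one connection is a one-permit Semaphore, and the row lock held by A's
// UPDATE is a ReentrantLock that would only be released at commit.
public class InterleaveDeadlockDemo {
    static final Semaphore pool = new Semaphore(1);           // pool of 1 connection
    static final ReentrantLock rowLock = new ReentrantLock(); // lock taken by the UPDATE

    public static void main(String[] args) throws Exception {
        // A: gets the connection, executes its UPDATE (takes the row lock),
        // then returns the connection to the pool without committing yet
        // (the xares.end(..., TMSUCCESS) step).
        pool.acquire();
        rowLock.lock();
        pool.release();

        // B: gets the connection, then blocks inside its UPDATE on A's row lock.
        CountDownLatch bHasConnection = new CountDownLatch(1);
        Thread b = new Thread(() -> {
            try {
                pool.acquire();
                bHasConnection.countDown();
                rowLock.lockInterruptibly();  // blocks forever: A never commits
            } catch (InterruptedException e) {
                // demo teardown
            }
        });
        b.start();
        bHasConnection.await();

        // Committer: tries to get a connection to commit A, but the only
        // connection is busy running B's UPDATE. That's the deadlock; here
        // it just times out instead of hanging.
        boolean got = pool.tryAcquire(200, TimeUnit.MILLISECONDS);
        System.out.println("committer got a connection: " + got);

        b.interrupt();
        b.join();
    }
}
```

Running this prints "committer got a connection: false": the commit of A can never proceed, exactly as in the test case.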

This scenario wouldn't happen without interleaving of commits and other
work. Transaction A wouldn't release the connection to the pool until it
commits, and therefore transaction B would block until it commits, and
there would be no deadlock.

Of course, as the number of connections in the pool increases, the
chance of this happening decreases, but it is never fully eliminated. Some
usage patterns would make it quite likely. For example, if you have a
fairly long-running transaction that's executed periodically and takes a
full exclusive table lock, you might have 20 other transactions blocking
on that transaction, exhausting the pool and causing a deadlock because
the long-running transaction could never be committed.

There are ways to fix that in the TM. For example, the TM could reserve a
dedicated connection for doing commits, though that seems bad for
concurrency. Or it could keep a separate pool of connections for commits.
Or it could simply not return the connection to the pool until commit.
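Continuing the same semaphore-as-pool modelling as above, a minimal sketch of the first option might look like this (ReservedCommitPool and its method names are hypothetical, not any real TM's API): commits draw from their own one-permit pool, so a blocked worker connection can never starve them.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of the "dedicated commit connection" idea: ordinary
// work and commits draw from separate pools, so a blocked worker connection
// cannot starve a commit. A real TM would hold actual XAConnections here.
public class ReservedCommitPool {
    private final Semaphore workPool;
    private final Semaphore commitPool = new Semaphore(1); // reserved for commits

    ReservedCommitPool(int size) {
        workPool = new Semaphore(size);
    }

    boolean acquireForWork(long timeoutMs) throws InterruptedException {
        return workPool.tryAcquire(timeoutMs, TimeUnit.MILLISECONDS);
    }

    boolean acquireForCommit(long timeoutMs) throws InterruptedException {
        return commitPool.tryAcquire(timeoutMs, TimeUnit.MILLISECONDS);
    }

    public static void main(String[] args) throws Exception {
        ReservedCommitPool p = new ReservedCommitPool(1);
        p.acquireForWork(0);                       // B occupies the only worker connection
        boolean commitOk = p.acquireForCommit(0);  // the commit still gets through
        System.out.println("commit acquired: " + commitOk);
    }
}
```

The concurrency cost is visible in the sketch: the reserved connection sits idle whenever no commit is pending.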

> If the two transactions you run concurrently aren't put to sleep
> because of locked data access, they'd both run in parallel without a
> hitch.
>
> To convince yourself, edit your test and change
> conn.createStatement().executeUpdate("UPDATE foo SET bar = 0000");
> into
> conn.createStatement().executeUpdate("INSERT INTO foo VALUES(0000)");

No, that's not what my imaginary application does; it does UPDATEs.
Fixing the problem by changing the application is cheating :).

> Then add the missing
> xares.end(blockxid, XAResource.TMSUCCESS);
> after
> System.out.println("Second update committed");
>
> and finally change this incorrect call
> conn.commit();
> into
> xares.commit(blockxid, true);

Good catch. I missed that because the test case never runs that far.

>> > With 'on statement creation' what I expect will happen: update A
>> > committed and update B rolled back while with 'on connection
>> > acquirement' both will get committed because update B is executed in a
>> > local transaction context since it has not been enlisted in the
>> > sub-transaction's context.
>>
>> Umm, sub-transaction?
>
> A badly chosen term to indicate the transaction that runs between
> tm.suspend() and tm.resume() in the example.
>
> With 'on connection acquirement' UPDATE B is committed because it runs
> on a resource that has not been enlisted in the current transaction.
> This means a default auto-committed local transaction is silently used
> instead.

Well, I guess you could detect that in the TM by keeping track of
connections and their associations with threads, if you cared. But this
is getting off-topic.
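The bookkeeping could look roughly like this (EnlistmentTracker and everything in it is a hypothetical sketch, not any real TM's API): the TM records which connections each thread has enlisted in its current transaction, so a statement executed on an unenlisted connection can be flagged instead of silently running in a local auto-commit transaction.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: the TM tracks, per thread, which connections have
// been enlisted in the current transaction. A statement executed on a
// connection the thread never enlisted can then be flagged instead of
// silently running in a local auto-commit transaction.
public class EnlistmentTracker {
    private final Map<Thread, Set<Object>> enlisted = new HashMap<>();

    synchronized void enlist(Object conn) {
        enlisted.computeIfAbsent(Thread.currentThread(), t -> new HashSet<>())
                .add(conn);
    }

    synchronized boolean isEnlisted(Object conn) {
        Set<Object> conns = enlisted.get(Thread.currentThread());
        return conns != null && conns.contains(conn);
    }

    public static void main(String[] args) {
        EnlistmentTracker tm = new EnlistmentTracker();
        Object connA = new Object();
        Object connB = new Object();
        tm.enlist(connA);
        // connB was handed out but never enlisted: a statement on it would
        // run outside the transaction, which the TM could now detect.
        System.out.println("A enlisted: " + tm.isEnlisted(connA));
        System.out.println("B enlisted: " + tm.isEnlisted(connB));
    }
}
```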

> This is exactly my point: I can accept that a database, JMS server or
> transaction manager has some limitations but *NOT* inconsistencies.
> Especially *NOT* when it comes to distributed transactions where the
> whole point is to secure data coherence, even at a high cost. I could
> accept an exception being raised when the TM or the RM detects an
> unsupported call but silently messing the consistency is not
> acceptable.

Sure. If you find anything like that in the PostgreSQL driver, please
drop a note (or even better, a patch).

--
   Heikki Linnakangas
   EnterpriseDB   http://www.enterprisedb.com
